A climate model intercomparison for the Antarctic region: present and past

Eighteen General Circulation Models (GCMs) are compared to reference data for the present, the Mid-Holocene (MH) and the Last Glacial Maximum (LGM) for the Antarctic region. The climatology produced by a regional climate model is taken as a reference climate for the present. GCM results for the past are compared to ice-core data. The goal of this study is to find the best GCM that can be used to drive an ice sheet model that simulates the evolution of the Antarctic Ice Sheet. Because temperature and precipitation are the most important climate variables when modelling the evolution of an ice sheet, these two variables are considered in this paper. The models are ranked according to how well their output corresponds with the references. In general, present-day temperature is simulated well, but precipitation is overestimated compared to the reference data. Another finding is that model biases play an important role in simulating the past, as they are often larger than the change in temperature or precipitation between the past and the present. Considering the results for the present day as well as for the MH and the LGM, the best performing models are HadCM3 and MIROC 3.2.2.


Introduction
Variations in ice volume of the Antarctic Ice Sheet (AIS) have a large impact on sea level and ocean circulation. Since the Last Glacial Maximum (LGM), at approximately 21 ka, the AIS has undergone many changes (e.g. Huybrechts, 2002; Bentley, 1999). This is especially true for the West Antarctic Ice Sheet, which is potentially unstable (see for example, Hughes, 1975; Thomas, 1979; Bamber et al., 2009).
To study variations in the AIS with a dynamical ice-sheet model, realistic (near-surface) air temperature and precipitation are needed as input. These variables may be given by a General Circulation Model (GCM) or a Regional Climate Model (RCM), which in its turn may be driven by a GCM at its lateral boundaries. Therefore it is important to know which GCMs perform well in the Antarctic region.
A model is often judged to perform well when it is close to the ensemble mean, a criterion used by Zweck and Huybrechts (2005). However, the best GCM for a specific study or use might not be the one closest to the ensemble mean. For instance, when using a GCM to drive an RCM, it is important that the GCM produces realistic output close to the boundaries of the RCM, whereas certain regional biases in the GCM may play a bigger role when studying a part of the AIS. In the literature, different criteria have been described for a GCM to perform "well", such as high resolution (Ren et al., 2011) and low bias (Murphy et al., 2002). However, these studies focus on either only one model or one criterion, instead of intercomparing a larger set of GCMs.
Comparisons of larger sets of GCMs have been done through the Paleoclimate Modelling Intercomparison Project Phase II (PMIP2, Braconnot et al., 2007), which has a large database with output from GCMs for the present, the Mid-Holocene (MH) and the LGM. Intercomparison studies of the models in this database have been done by, amongst others, Braconnot et al. (2007), Yanase and Abe-Ouchi (2007), Brewer et al. (2007) and Masson-Delmotte et al. (2006). Only the study of Masson-Delmotte et al. (2006) focuses on the polar regions (and therefore Antarctica). They conclude that the PMIP2 models' simulations agree reasonably well with ice-core signals for both the MH and the LGM, although there are uncertainties in the models' ice-sheet topography, which is based on ICE-5G (Peltier, 2004). However, their study focuses on the ensemble mean of all the models under consideration and less on the differences between models.
In order to decide which GCMs perform best in the Antarctic region, we compare the individual output of the models to ice-core reconstructions for the MH and LGM. Furthermore, as ice-core data have large uncertainty and do not cover the entire Antarctic region, we compare present-day GCM data to a reference state from RACMO2/ANT (Lenaerts et al., 2012a). RACMO2/ANT (simply "RACMO" hereafter) is a regional climate model, which has been developed especially for polar regions and has been thoroughly validated (e.g. van de Berg et al., 2005; Lenaerts et al., 2012b).

Method
Eighteen models from the PMIP2 database (see Table 1) are compared with reference data from RACMO for the present-day climate and with ice-core climate reconstructions for the past. The GCM data used for this study originate from coupled ocean-atmosphere models. Some of the models are closely related to others: UBRIS-HadCM3 and HadCM3 are much alike; CSIRO-1.1 is the same as CSIRO-1.0, but with a doubled oceanic resolution; MRI-fa uses flux adjustments for heat and water fluxes and wind stress, whereas MRI-nfa does not; and MIROC 3.2.2 is the same as MIROC 3.2, except that an error in the land surface scheme of MIROC 3.2 has been corrected in MIROC 3.2.2, affecting the wind stress calculation over ice sheets and resulting in somewhat lower temperatures. Nonetheless, MIROC 3.2 has been included in this study because there are additional (Mid-Holocene) simulations available for this model.
The present-day reference state originates from RACMO, at a horizontal resolution of 27 km. RACMO is forced at its lateral boundaries by ERA-Interim reanalysis data for 20 yr. RACMO has been chosen because it provides data at a high resolution. Furthermore, its temperature and precipitation have a smaller bias than reanalysis products such as ERA-40 or ERA-Interim (van de Berg et al., 2006, 2007; van de Berg, 2008; Ettema et al., 2010; Lenaerts et al., 2012b). The uncertainty in RACMO precipitation is about 10 % (Lenaerts et al., 2012b). The uncertainty in 2 m air temperature is more difficult to determine; a plot of the difference between yearly average RACMO skin temperature and observed temperatures at 10 m depth is shown in Fig. 1. The difference between modelled and observed temperatures is small, except for parts of the West Antarctic Ice Sheet.
The RACMO domain runs from 90° S to approximately 47° S. We compared 2 m air temperature and annual mean precipitation from the GCMs with RACMO data. For this purpose all GCM data are interpolated onto the RACMO grid. No lapse rate correction has been applied to the GCM data to compensate for the mismatches in surface height between the GCMs and RACMO. When the analysis described in this paper was repeated on GCM data with a lapse rate correction of −11.6 K km⁻¹ (Masson-Delmotte et al., 2011), the results changed little, while the correction introduced uncertainties that were not in the GCM output initially.
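For illustration, the lapse rate correction that was tested (and not retained) could look like the sketch below. Only the value of −11.6 K km⁻¹ comes from the text; the function name, the argument names and the example heights are hypothetical, and the linear form of the correction is an assumption.

```python
# Lapse rate quoted in the text (Masson-Delmotte et al., 2011), in K per km.
LAPSE_RATE_K_PER_KM = -11.6


def lapse_rate_correct(t_gcm_k, z_gcm_m, z_ref_m, lapse_rate=LAPSE_RATE_K_PER_KM):
    """Shift a GCM temperature from the GCM surface height to the reference height.

    Assumes a simple linear correction: T_corr = T + lapse_rate * (z_ref - z_gcm).
    """
    dz_km = (z_ref_m - z_gcm_m) / 1000.0
    return t_gcm_k + lapse_rate * dz_km


# Hypothetical grid point: the GCM surface lies 500 m below the reference surface,
# so the corrected temperature is lower than the raw GCM temperature.
corrected = lapse_rate_correct(250.0, 2500.0, 3000.0)
print(round(corrected, 2))
```

With a lapse rate this steep, a height mismatch of a few hundred metres shifts the temperature by several kelvin, which is of the same order as the model biases reported below.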
The data are compared regarding bias, root mean square deviation (rmsd), and correlation coefficient (ρ):

$$\mathrm{bias} = \overline{x_G} - \overline{x_R},$$

$$\mathrm{rmsd} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{G,i} - x_{R,i}\right)^{2}},$$

$$\rho = \frac{\sum_{i=1}^{N}\left(x_{G,i} - \overline{x_G}\right)\left(x_{R,i} - \overline{x_R}\right)}{\sqrt{\sum_{i=1}^{N}\left(x_{G,i} - \overline{x_G}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(x_{R,i} - \overline{x_R}\right)^{2}}},$$

in which the subscripts G and R stand for GCM and RACMO, respectively, N is the number of grid points, and $\overline{x}$ indicates the average of the variable x over all grid points i. The correlation coefficient indicates how well temperature and precipitation patterns are simulated by a model, whereas the bias (mean deviation of the model from the reference) and the rmsd (a measure for the absolute deviation of the model from the reference) quantify how much the model output deviates from the reference state as a whole. A distinction is made between results over the ice sheet, including ice shelves (Fig. 3), and results over the ocean (Fig. 4). The bias, rmsd and correlation coefficient have been chosen because together they show whether the GCMs can reproduce both the correct patterns and realistic absolute values of temperature and precipitation.
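The three statistics can be computed as in the minimal sketch below. It assumes the GCM field has already been interpolated onto the reference grid and flattened into a list, and it weights all grid points equally. Area weighting is not mentioned in the text, so leaving it out is an assumption. The function names and the sample values are hypothetical.

```python
import math


def bias(gcm, ref):
    """Mean of the GCM field minus mean of the reference field."""
    n = len(gcm)
    return sum(gcm) / n - sum(ref) / n


def rmsd(gcm, ref):
    """Root mean square deviation between paired grid-point values."""
    return math.sqrt(sum((g - r) ** 2 for g, r in zip(gcm, ref)) / len(gcm))


def correlation(gcm, ref):
    """Pearson (spatial) correlation coefficient between the two fields."""
    n = len(gcm)
    mean_g, mean_r = sum(gcm) / n, sum(ref) / n
    cov = sum((g - mean_g) * (r - mean_r) for g, r in zip(gcm, ref))
    var_g = sum((g - mean_g) ** 2 for g in gcm)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / math.sqrt(var_g * var_r)


# Hypothetical 2 m temperatures (K) at four grid points.
gcm_t = [250.0, 240.0, 230.0, 260.0]
ref_t = [248.0, 241.0, 228.0, 257.0]
print(bias(gcm_t, ref_t), rmsd(gcm_t, ref_t), correlation(gcm_t, ref_t))
```

In this example the correlation is high although the bias and rmsd are of the order of kelvins, which is why the three measures are reported together: the first describes the pattern, the other two the absolute values.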
In the second part of this study, GCM output for the MH (6 ka) and the LGM (21 ka) is compared to the present. Differences between the past and the present are evaluated, using reconstructions from ice cores (see Fig. 2 for their locations). Temperature data are available from six ice cores for both the MH and the LGM:
- EPICA Dome C (EDC), a deuterium-excess-based temperature reconstruction by Jouzel and Masson-Delmotte (2007).
- EPICA Dronning Maud Land (EDML), a deuterium and δ¹⁸O based temperature reconstruction. The temperature change (ΔT) for the MH was read from Fig. 7b of Stenni et al. (2010); the ΔT for the LGM is mentioned in their paper as well.
- Law Dome (LD), a δ¹⁸O reconstruction given in Fig. 5 of van Ommen et al. (2004). Past temperatures may be calculated from this graph by using a conversion of 0.44 ‰ °C⁻¹. Details were communicated in van Ommen (2011).
Precipitation records are scarce as they are more difficult to derive from ice cores. The precipitation reconstructions used in this study are:
- Law Dome, where the accumulation rate is determined from a flow model together with age ties in van Ommen et al. (2004); the reconstructions are given in Table 2 of their paper.
- Talos Dome, where a δ¹⁸O based precipitation reconstruction is given in Fig. 7 of Buiron et al. (2011) in cm ice equivalent per year. To get the precipitation change in mm water equivalent per year, the number is multiplied by 9.2.
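The two unit conversions mentioned in the lists above can be written out as a short sketch. The factor 9.2 and the 0.44 ‰ °C⁻¹ conversion are taken from the text. Reading 9.2 as 10 mm per cm times an ice-to-water density ratio of about 0.92 is an interpretation, and the function names are hypothetical.

```python
# Interpretation of the factor 9.2 used in the text:
# 10 mm per cm times an assumed ice-to-water density ratio of about 0.92.
MM_PER_CM = 10.0
ICE_TO_WATER_DENSITY_RATIO = 0.92

# Law Dome conversion quoted in the text, in per mil of delta-18-O per degree C.
D18O_PERMIL_PER_DEGC = 0.44


def cm_ice_to_mm_water(accumulation_cm_ice_per_yr):
    """Convert accumulation from cm ice equivalent per year to mm water equivalent per year."""
    return accumulation_cm_ice_per_yr * MM_PER_CM * ICE_TO_WATER_DENSITY_RATIO


def d18o_change_to_temperature_change(delta_d18o_permil):
    """Convert a change in delta-18-O (per mil) into a temperature change (degrees C)."""
    return delta_d18o_permil / D18O_PERMIL_PER_DEGC


# Hypothetical inputs: 1 cm ice equivalent per year, and a 2.2 per mil isotope shift.
print(cm_ice_to_mm_water(1.0))
print(d18o_change_to_temperature_change(-2.2))
```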
The model output is compared to ice-core data with respect to the temperature difference between the past and the present, the precipitation difference between the past and the present and the ratio of past to present precipitation, where both past and present-day data originate from the GCMs. The precipitation ratio is given because some models give a correct change in precipitation, but overestimate the actual amount both in the past and for the present-day. In this case the modelled ratio will be larger than the ratio deduced from the corresponding ice core. The comparison is carried out by interpolating the data from the four grid points of the GCM closest to the location of the ice core.
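A sketch of the site comparison described above is given here. The text only states that the four GCM grid points closest to the ice core are interpolated. Bilinear interpolation on a regular longitude-latitude cell is one common way to do this and is an assumption, as are all names and numbers in the example.

```python
def bilinear(lon, lat, lon0, lon1, lat0, lat1, v00, v10, v01, v11):
    """Bilinear interpolation inside one grid cell.

    v00 is the value at (lon0, lat0), v10 at (lon1, lat0),
    v01 at (lon0, lat1) and v11 at (lon1, lat1).
    """
    tx = (lon - lon0) / (lon1 - lon0)
    ty = (lat - lat0) / (lat1 - lat0)
    south = v00 * (1.0 - tx) + v10 * tx
    north = v01 * (1.0 - tx) + v11 * tx
    return south * (1.0 - ty) + north * ty


def past_present_change(past, present):
    """Return (difference, ratio) of a past value relative to the present value."""
    return past - present, past / present


# Hypothetical precipitation (mm/yr) at the four grid points around a core site.
site = dict(lon=112.8, lat=-66.8, lon0=110.0, lon1=115.0, lat0=-70.0, lat1=-65.0)
p_present = bilinear(v00=300.0, v10=340.0, v01=500.0, v11=560.0, **site)
p_past = bilinear(v00=280.0, v10=310.0, v01=450.0, v11=520.0, **site)
difference, ratio = past_present_change(p_past, p_present)
print(difference, ratio)
```

Reporting the difference and the ratio together separates two kinds of model error: a model can reproduce the change in mm per year while its ratio is off because its absolute precipitation is biased in both periods.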
The goal of this study is to find the best models regarding simulations of temperature and precipitation. To do this a simple ranking system is introduced: the best model for a certain variable, e.g. temperature bias of the present-day output, gets 10 points, the next gets 9 points, etc. For every period (i.e. present-day, MH and LGM) these points are added up per model, resulting in a ranking of the models for each of the periods. When multiple models have the same number of points, the spread is taken into account. That is to say, a model is judged to be better when it scores an intermediate number of points for all the variables than when it scores the maximum number of points for only half of the variables.

Present-day results
Figure 3a shows the bias (in red) and the rmsd (in blue) for the present-day temperature comparison between the PMIP2 models and RACMO over the ice sheet and ice shelves. The biases range from −3.8 K (MRI-nfa) to +4.8 K (Ecbiltclio) and the rmsd values go up to 10.3 K (Ecbiltclio). Temperature correlation coefficients (shown in red in Fig. 3c) are close to 1, ranging from 0.89 to 0.97, for all models except Ecbiltclio and Ecbiltcliove-code. Precipitation bias and rmsd are presented in Fig. 3b. The highest bias is +349 mm yr⁻¹ for FGOALS, which also shows the highest rmsd value of 463 mm yr⁻¹. Precipitation correlation coefficients show a larger spread than for temperature, from 0.51 to 0.82. The largest bias and rmsd are found for the precipitation output of FGOALS and the temperature output of Ecbiltclio, which might be due to the low resolution of these models. As mentioned before, MIROC 3.2.2 should give lower temperatures (and therefore a smaller temperature bias) than MIROC 3.2 due to a corrected error in MIROC 3.2.2, which is indeed the case.
In Fig. 4 the same variables are presented as in Fig. 3, but for a domain that only incorporates the ocean grid points of RACMO. Again, the temperature correlation coefficients are mostly close to 1, ranging from 0.86 to 0.96. Precipitation correlation coefficients are slightly larger here than over the ice sheet (from 0.60 to 0.87). Rmsd values are smaller, while bias values are generally somewhat larger, i.e. more negative, over the ocean.
The four models that simulate the present-day climate best are UBRIS, HadCM3, ECHAM5 and IPSL for temperature and UBRIS, HadCM3, ECHAM5, and MIROC 3.2 for precipitation. This is based on the ranking method, applied to the combination of the results over the ice sheet and the ocean. The difference fields between these best models and RACMO are shown in Fig. 5 for temperature and Fig. 6 for precipitation.
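The ranking method from the Method section can be sketched as follows. Several details are assumptions, because the text does not specify them: points continue downward from 10 and stop at 0, bias and rmsd are ranked by smallest absolute value, correlation is ranked by largest value, and the "spread" used to break ties is taken as the standard deviation of a model's points. The model names and scores are hypothetical.

```python
from statistics import pstdev


def points_for_metric(scores, higher_is_better=False):
    """Give 10 points to the best model, 9 to the next, and so on (assumed floor of 0).

    scores maps model name to metric value. For bias and rmsd the smallest
    absolute value is best; for correlation the largest value is best.
    """
    if higher_is_better:
        ordered = sorted(scores, key=lambda m: -scores[m])
    else:
        ordered = sorted(scores, key=lambda m: abs(scores[m]))
    return {model: max(10 - position, 0) for position, model in enumerate(ordered)}


def rank_models(metrics):
    """Rank models by total points; ties are broken in favour of the smaller spread.

    metrics is a list of (scores, higher_is_better) pairs, one per variable.
    """
    points = {}
    for scores, higher_is_better in metrics:
        for model, p in points_for_metric(scores, higher_is_better).items():
            points.setdefault(model, []).append(p)
    return sorted(points, key=lambda m: (-sum(points[m]), pstdev(points[m])))


# Hypothetical scores for three models and three present-day temperature metrics.
metrics = [
    ({"model_a": 1.0, "model_b": -3.0, "model_c": 0.5}, False),    # bias (K)
    ({"model_a": 2.0, "model_b": 5.0, "model_c": 3.0}, False),     # rmsd (K)
    ({"model_a": 0.95, "model_b": 0.80, "model_c": 0.90}, True),   # correlation
]
print(rank_models(metrics))
```

Because the points depend only on the order of the models and not on how far apart their scores are, two nearly identical models can end up a full point apart. The text addresses this limitation indirectly through the spread criterion and the signal-to-noise analysis.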
A notable feature in Fig. 5 is that the modelled temperatures over the Ross Ice Shelf (see Fig. 2) are too high, which is the case for almost all models. At the locations of the ice shelves, the GCMs model land that is only partly covered with ice. In contrast, over the Amery Ice Shelf region the models simulate too low temperatures. This is something to take into account when deciding which model to use. For example, when focussing on West Antarctica, HadCM3 shows less (negative) bias there than the other models and might be a better choice because RACMO shows a negative bias here as well when compared to observations. The modelled temperatures are closer to the reference data over the ocean, at the edges of the domain.
Fig. 3. The bias (red) and rmsd (blue) for temperature (a) and precipitation (b), and spatial correlation coefficients (c) for temperature (red) and precipitation (blue) for all PMIP2 models, as compared to the RACMO reference state. These results apply to the ice sheet, for the present-day climate.
Precipitation is generally overestimated inland. It is underestimated close to the coasts and strongly underestimated at the western side of the Antarctic Peninsula by all models. This is probably due to the fact that the steep orography of the Peninsula is not well represented in the GCMs. Consequently, the orographically enhanced precipitation is underestimated (Rojas et al., 2009).

Mid-Holocene results
Mid-Holocene temperature output from the models is compared to reconstructions from five ice cores in Table 2. The uncertainty ranges of these reconstructions are probably larger than the small differences in temperature between the MH and the present. The models also simulate small temperature differences between the MH and the present. However, the models do not capture the change in sign of the temperature differences between different locations, i.e. EDC and Fuji were colder in the MH than in the present and the temperature difference was largest at Law Dome.
Fig. 4. The bias (red) and rmsd (blue) for temperature (a) and precipitation (b), and spatial correlation coefficients (c) for temperature (red) and precipitation (blue) for all PMIP2 models, as compared to the RACMO reference state. These results apply to the ocean, for the present-day climate.
For temperature, the best models according to the ranking method are CSIRO-1.1 and IPSL. The spatial distribution of the temperature difference between the MH and the present (both MH and present temperature values are from GCM output) is shown in Fig. 7. Temperature differences are mainly positive, but small, except over the western South Pacific Ocean in Fig. 7b (IPSL). Although this difference between the models may not be of much importance when using GCM output in an ice-sheet model, it is important when only using the output to provide boundary conditions for an RCM. The negative temperature differences over the western ocean in IPSL can be neither confirmed nor refuted by ice-core reconstructions. It may therefore be concluded that the comparison of model output with ice-core reconstructions gives an indication of which models are better than others, but it is not conclusive. This is even more true for precipitation, as is argued below.
In Table 3 precipitation data are shown for three ice-core locations. The Law Dome data are not very accurate as only the average accumulation between age ties (2545 and 6778 yr ago) is known (van Ommen et al., 2004). At the Talos Dome location, the difference in precipitation between 6 ka and the present is captured by most GCMs, but the ratios are too high. This means that, at this location, the absolute amounts of precipitation are overestimated by the models in both present and past. This can be seen in Fig. 6 as well. Precipitation at the Vostok location is simulated quite accurately by most of the GCMs.
CCSM and Ecbiltcliove precipitation differences between the MH and the present are shown in Fig. 8, as these are the best models according to the ranking method. It is clearly visible that the patterns are not the same for these two models. The question remains which model is the better one. The differences between 6 ka and the present are small, and the biases are of the same order of magnitude. This makes it hard to distinguish between the GCMs in terms of performance for the MH.
To investigate the influence of biases in simulating the present climate on model performance when simulating the past, a signal-to-noise ratio has been calculated for both temperature and precipitation. The signal is the difference, in temperature or precipitation, between 6 ka and the present. The noise is the present-day bias of a model, as shown in Figs. 3 and 4. For precipitation the average signal-to-noise ratio of all GCMs is 0.09, which is very low. This means that the signal is practically indistinguishable from the noise. The mean signal-to-noise ratio for temperature is 0.21. Given these low ratios and the presumably large uncertainties in the ice-core reconstructions relative to the signal, it cannot be judged accurately which models achieve the best results for the MH.
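As an illustration, the signal-to-noise ratio defined above could be computed per model and then averaged, as in the sketch below. Taking absolute values of both signal and noise, using domain-mean values, and averaging the per-model ratios are assumptions, since the text does not spell out these steps. All numbers are hypothetical.

```python
def signal_to_noise(past_mean, present_mean, present_bias):
    """Ratio of the past-minus-present change (signal) to the present-day bias (noise).

    Absolute values are used for both terms (an assumption). A bias of exactly
    zero would make the ratio undefined and is not handled here.
    """
    signal = abs(past_mean - present_mean)
    noise = abs(present_bias)
    return signal / noise


def mean_signal_to_noise(entries):
    """Average the per-model ratios; entries holds (past, present, bias) tuples."""
    ratios = [signal_to_noise(past, present, b) for past, present, b in entries]
    return sum(ratios) / len(ratios)


# Hypothetical domain-mean temperatures (K) and present-day biases (K) for three models.
models = [
    (250.5, 250.0, 2.5),   # ratio 0.2
    (248.2, 248.0, -2.0),  # ratio 0.1
    (251.9, 251.0, 3.0),   # ratio 0.3
]
print(round(mean_signal_to_noise(models), 3))
```

A ratio well below 1 means that the simulated change between the two periods is smaller than the error the model already makes for the present day.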

LGM results
In Table 4 modelled temperature differences between the LGM and the present are compared to data from five ice cores. At Law Dome the temperature difference is the largest, which is not captured by any of the GCMs, except for Ecbiltclio. However, Ecbiltclio generally simulates too small temperature differences between the LGM and the present. This holds for CNRM as well, whereas FGOALS overestimates the temperature differences at four of the five locations. According to the ranking method, MIROC 3.2.2 and CCSM are the best models. Output from these models is shown in Fig. 9.
MIROC 3.2.2 simulates smaller temperature differences than CCSM, which is also visible in Table 4. Both models show larger temperature differences over West Antarctica, which is the case for almost all models with LGM output. This is probably due to the change in topography, as the difference between the LGM and the present in ice thickness of the West Antarctic Ice Sheet is larger than the difference of the East Antarctic Ice Sheet. This agreement between the models regarding the temperature pattern over the ice sheet gives some confidence when using either one as input in an ice-sheet model. However, when using the data to drive an RCM, the boundaries become important, as has been noted before, and the differences between MIROC 3.2.2 and CCSM might play a bigger role.
Modelled precipitation differences between the LGM and the present are compared to reconstructions for Law Dome, Talos Dome and Vostok in Table 5. For Law Dome the LGM precipitation was less than 10 % of the present-day value. Law Dome is located near the coast, where it receives precipitation from cyclonic systems. These systems have probably changed since the LGM, causing a large change in precipitation in coastal regions (van Ommen et al., 2004). None of the models has captured this change, suggesting that the representation of cyclonic systems is deficient, something also noted by Rojas et al. (2009). Figure 10 shows the precipitation difference fields between the LGM and the present-day climate for HadCM3 and MIROC 3.2.2, which appear to be the best models regarding this variable. Overall the LGM was drier than the present, while the tip of the Antarctic Peninsula is modelled to have been wetter. This applies to most of the models and might be related to the underestimation of western Antarctic Peninsula precipitation in the present-day output.
Signal-to-noise ratios have been calculated for the LGM as well to study the influence of the biases of the models on the simulation of precipitation and temperature patterns at 21 ka. The average signal-to-noise ratio for temperature is 3.8, which is well above 1. This means that the signal of temperature change from the LGM to the present is discernible in the model output.
For precipitation the mean signal-to-noise ratio is 1.4. The precipitation signal-to-noise ratios are thus lower than the temperature signal-to-noise ratios for both the MH and the LGM. The reason for this is probably that precipitation is harder to model correctly than temperature, and therefore the biases are relatively larger. Although the signal-to-noise ratios for the LGM are higher than for the MH, it is still essential to be aware of the (present-day) bias of a model to correctly assess its output for the LGM.

Conclusions
In this paper we compared present-day output from GCMs to a reference state from the regional climate model RACMO2/ANT for the Antarctic region. We found that air-temperature patterns are generally well simulated, as the correlation coefficients between the GCM output and the reference data are close to 1. Temperature is generally simulated more accurately over the ocean than over the ice sheet. The temperature over the ice shelves is too high in most of the models, probably because the GCMs place land that is only partly covered with ice at the locations of the ice shelves. Precipitation patterns are also well simulated in general, but the amount of precipitation is often underestimated over the ocean. In addition, a strong negative bias is observed over the western coast of the Antarctic Peninsula. The GCMs probably do not resolve the circulation pattern and the orography well enough to simulate the additional precipitation in this region (Rojas et al., 2009). Considering temperature and precipitation results for the present day, the top five models are HadCM3, UBRIS (a HadCM3-based model), ECHAM5, MIROC 3.2, and IPSL.
The differences in temperature and precipitation between the Mid-Holocene and the present are small in ice-core reconstructions and in the output from the GCMs. Generally, both temperature and precipitation are higher during the MH than in the present climate. The biases of the GCMs are of the same order of magnitude as these differences, or even larger. Therefore, it is hard to judge individual model performances. For the MH, the signal-to-noise ratios are 0.21 for temperature and 0.09 for precipitation. These low signal-to-noise ratios indicate that to find a model that performs well when modelling the past, it is important to take its present-day performance into account. Furthermore, the uncertainties in ice-core data are presumably also as large as the signal, making it even harder to judge the performance of the models. Based on the comparison between the output of the GCMs and the ice-core reconstructions for the MH, the five best models are Ecbiltcliove, CCSM, MRI-fa, MRI-nfa and CSIRO-1.0. However, in the final judgement of which GCMs perform best overall, the MH will not be taken into consideration as the biases in the models are too large to make the intercomparison trustworthy.
In the LGM, temperatures were lower and there was less precipitation than in the present-day climate, according to both ice-core reconstructions and GCM output. Also, the temperature difference between the LGM and the present is modelled to be larger over the West Antarctic Ice Sheet than over the East Antarctic Ice Sheet by most GCMs. The precipitation differences between the LGM and the present over the Antarctic Peninsula are generally modelled to be smaller than elsewhere, or even positive (wetter at the LGM than in the present). The differences between the past and the present are larger for the LGM than for the MH, and therefore the signal-to-noise ratios are higher: 3.8 for temperature and 1.4 for precipitation. This means that more confidence can be placed in the ranking of the models, which identifies MIROC 3.2.2, CCSM, HadCM3, ECHAM53 and MIROC 3.2 as the five best GCMs for the LGM.
The low signal-to-noise ratios indicate large uncertainties in the output of the models, but there are other sources of uncertainty in the comparison between model results and ice-core reconstructions. The first source, important to the judgement of present-day performance of the GCMs, is the uncertainty in RACMO data. This is negligible in this particular study according to van de Berg (2008) and Lenaerts et al. (2012a). The second source is the uncertainty in the ice-core reconstructions; part of this is due to the uncertainty in temperature and precipitation reconstruction and part is due to the uncertainty in the determination of the age of the ice in the ice core. The third source is the elevation. As Masson-Delmotte et al. (2006) state in their paper, there probably is a discrepancy between the elevation of the surface in the past and the elevation that is used in the models. However, the past elevation of the ice sheet is not known with great accuracy either, nor is the lapse rate, so we decided not to correct for this discrepancy, which is probably within the uncertainty margin of the ice-core reconstructions.
To conclude, some models simulate temperature and precipitation significantly better than others, according to our ranking methods. Not all models provided data for the MH or the LGM, but the results for the MH are judged to be less significant due to large relative uncertainty in model output. Finally, considering both present-day and past simulations, the best performing models according to our comparison, in simulating temperature and precipitation in the Antarctic region, are HadCM3 and MIROC 3.2.2.