Historical and idealized climate model experiments: an intercomparison of Earth system models of intermediate complexity

. Both historical and idealized climate model experiments are performed with a variety of Earth system models of intermediate complexity (EMICs) as part of a community contribution to the Intergovernmental Panel on Climate Change Fifth Assessment Report. Historical simulations start at 850 CE and continue through to 2005. The standard simulations include changes in forcing from solar luminosity, Earth’s orbital conﬁguration, CO 2 , additional greenhouse gases, land use, and sulphate and volcanic aerosols. In spite of very different modelled pre-industrial global surface air temperatures, overall 20th century trends in surface air temperature and carbon uptake are reasonably well simulated when compared to observed trends. Land carbon ﬂuxes show much more variation between models than ocean carbon Published by Copernicus Publications on behalf of the European Geosciences Union. ﬂuxes, and recent land ﬂuxes appear to be slightly underestimated. It is possible that recent modelled climate trends or climate–carbon feedbacks are overestimated resulting in too much land carbon loss or that carbon uptake due to CO 2 and/or nitrogen fertilization is underestimated. Several one thousand year long, idealized, 2 × and 4 × CO 2 experiments are used to quantify standard model characteristics, including transient and equilibrium climate sensitivities, and climate–carbon feedbacks. The values from EMICs generally fall within the range given by general circulation models. Seven additional historical simulations, each including a single speciﬁed forcing, are used to assess the contributions of different climate forcings to the overall climate and carbon cycle response. The response of surface air temperature is the linear sum of the individual forcings, while the carbon cycle response shows a non-linear interaction between land-use change and CO 2 forcings for some models. Finally, the preindustrial portions of the last millennium simulations are used to assess historical model carbon-climate feedbacks. Given the speciﬁed forcing, there is a tendency for the EMICs to underestimate the drop in surface air temperature and CO 2 between the Medieval Climate Anomaly and the Little Ice Age estimated from palaeoclimate reconstructions. This in turn could be a result of unforced variability within the climate system, uncertainty in the reconstructions of temperature and CO 2 , errors in the reconstructions of forcing used to drive the models, or the incomplete representation of certain processes within the models. Given the forcing datasets used in this study, the models calculate signiﬁcant land-use emissions over the pre-industrial period. This implies that land-use emissions might need to be taken into account, when making estimates of climate–carbon feedbacks from palaeoclimate reconstructions.


Introduction
Climate models are powerful tools that help us to understand how climate has changed in the past and how it may change in the future. Climate models vary in complexity from highly parameterized box models to sophisticated Earth system models with coupled atmosphere-ocean general circulation model (AOGCM) subcomponents, such as those involved in the Coupled Model Intercomparison Project Phase 5 (CMIP5; Taylor et al., 2012). Different models are designed to address different scientific questions. Simple models are often useful in developing and understanding individual processes and feedbacks, or teasing apart the basic physics of complex systems. However, they usually lack the complex interactions that are an integral part of the climate system. Current "state-of-the-art" Earth system models are both sophisticated and complex, but the number and length of simulations that can be performed is limited by the availability of computing resources. Another class of models, known as Earth system models of intermediate complexity (EMICs), helps fill the gap between the simplest and the most complex climate models .
Usually EMICs are complex enough to capture essential climate processes and feedbacks while compromising on the complexity of one or more climate model component. Often EMICs are used at lower resolution, and model components may have reduced dimensionality. While generally simpler, EMICs sometimes include more subcomponent models than Earth system AOGCMs. New subcomponents (for example, continental ice sheets, representations of peatlands, wetlands or permafrost) are often developed within the EMIC framework before they are embedded into coupled AOGCMs, because development and testing is less computationally expensive. In addition, there are some processes operating within the Earth system (e.g. carbonate dissolution from sediments or chemical weathering) with very long inherent timescales that can only be integrated by EMICs.
With relatively abundant proxy data, the last millennium is an important test bed for validating the longer term climatecarbon response of models. Understanding the role of climate forcing over this period will help to reconcile any inconsistences between the models and the various palaeo-forcing and data reconstructions and improve our confidence in future simulations. There are a very limited number of studies that have modelled the climate-carbon cycle response over this period (Gerber et al., 2003;Goosse et al., 2010;Jungclaus et al., 2010), and while a few AOGCMs have carried out climate simulations over the last millennium (for example, González-Rouco et al., 2003;Jungclaus et al., 2010;Swingedouw et al., 2010;and Fernández-Donado et al., 2012), these studies were limited in the number of experiments that could be performed. See Fernández-Donado et al. (2012) for a comprehensive review of the current state of climate simulation and reconstruction over the last millennium.
Given their relatively rapid integration times, EMICs are capable of simulating climate change over many thousands of years. They are able to perform the many simulations needed to investigate the sensitivity of climate to various external forcings. EMICs often have relatively low levels of internal variability, which makes them particularly useful in experiments investigating the forced response of the climate system. Many EMICs also include long timescale carbon processes such as changes in carbonate sedimentation, wetlands or permafrost, processes which are usually missing in models designed for shorter simulations. These characteristics make EMICs particularly well suited to investigate the forced climate-carbon response over the last millennium.
Over the years a number of model intercomparison projects have been designed using EMICs (e.g. Pethoukhov et al., 2005;Rahmstorf et al., 2005;Brovkin et al., 2006;Plattner et al., 2008). More typically, however, EMICs have been included in model intercomparisons with coupled AOGCMs (e.g. Gregory et al., 2005;Stouffer et al., 2006;Friedlingstein et al., 2006). Results from EMIC simulations descriptions are provided in Appendix A. Unlike the EMICs cited in the AR4, several models now calculate land-use change carbon fluxes internally (B3, DC, GE, I2, ML, UV) and/or include ocean sediment and terrestrial weathering (B3, DC, GE, UV). Eight EMICs (B3, DC, GE, I2, ME, ML, UM and UV) include interactive land and ocean carbon cycle components that allow them to diagnose emissions that are compatible with specified CO 2 concentrations.

Methods
To be consistent with other intercomparison projects, forcing for the initial condition and the historical period were obtained from the Paleoclimate Modelling Intercomparison Project Phase 3 (PMIP3) and CMIP5-recommended datasets. Specified forcings included orbital configuration (from Berger, 1978), trace gases from various ice cores (Schmidt et al., 2012;Meinshausen et al., 2011), volcanic aerosols (Crowley et al., 2008), solar irradiance (Delaygue and Bard, 2009;Wang et al., 2005), sulphate aerosols (Lamarque et al., 2010) and land use (Pongratz et al., 2008;Hurtt et al., 2011). The warming from black carbon and the indirect effect of ozone, and the cooling from the indirect effect of sulphate aerosols were not included. Forcing data from PMIP3 and CMIP5 were concatenated or linearly blended before 1850 when necessary. From 1850 to 2005, all specified forcings are identical to the historical portion of the Representative Concentration Pathway (RCP) scenarios. See Appendix B for details on the implementation of the forcing protocol and how this may differ between models. All years in this paper are given as Common Era (CE) unless stated otherwise.
Diagnosed model equivalent radiative forcing estimates, for the various specified external forcings, are shown in Fig. 1. Given the large diversity of EMICs, not all models are able to apply the specified forcing in the same way. Some diagnosed estimates of equivalent radiative forcing are only approximate. This is especially true for sulphate aerosol and land-use forcing, which are more complex to implement and diagnose than most of the other externally specified forcings. The multi-model mean radiative forcing estimates for the direct effect of trace gases, sulphates and land-use change are very similar to estimates from the Task Group: RCP Concentrations Calculation and Data (Meinshausen et al., 2011), and present-day estimates are similar to those given in the AR4 (Forster et al., 2007). The ranges in diagnosed radiative forcing in Fig. 1 are also similar to the ranges from the models described by Fernández-Donado et al. (2012), with most of the variation between models due to the implementation of anthropogenic forcing (which includes aerosols and land-use change).
Models that participated in the historical last millennium simulations (B3, C2, C3, DC, GE, IA, I2, LO, ME, MI, UM, UV) used steady forcing to create the initial equilibrium state. Most models started from a steady state using forcing www.clim-past.net/9/1111/2013/ Clim. Past, 9, 1111-1140, 2013  with uncoupled (Müller et al., 2006) hydrology (Wania Strassmann et al., 2008;et al., 2009aet al., ,b) Stocker et al., 2011, BV  C2: CLIMBER-2.4 SD, 3-D, CRAD,  (Cox et al., 1999) Totterdell, 2001 al., 2012) (Pope et al., 2000) L20 (Gordon et al., 2000) GE: GENIE EMBM, 2-D(ϕ, λ),  (Holian et al., 2001), N/A N/A (Sokolov et al., ICL, CHEM, anomaly diffusing, al., 1984) al., 2008) BT (Melillo et al., 1993; (Hansen et Felzer et al., 2004;Liu, (Sokolov andal., 1984) 1996) Stone , (Brovkin et al., 2002) L30 (Huybrechts, 2010) 1998) Fichefet, 1999) Fig. 1. Annual average radiative forcing estimates, from specified forcing changes in the historical simulations, for 11 of the participating models. Changes in radiative forcing from changes in solar irradiance are shown in (a), volcanic aerosols in (b), greenhouse gases in (c), land use in (d), the direct effect of sulphate aerosols in (e), and all forcings in (f). Due to the complexity of implementing forcing from land use and the direct effects of sulphate aerosols in some models, the diagnosed radiative forcing estimates should be considered to be only approximate. Models that specified equivalent task group radiative forcings may be exactly overplotted and appear to be missing (see Appendix B for details on forcing). Land-use radiative forcing was not available for model I2. Radiative forcing is plotted as an anomaly of the multi-model mean over the century centred at year 900. Forcing estimates in panels (a) to (e) are unfiltered while the results in (f) have been processed with a 30 yr moving average, rectangular filter. Changes in radiative forcing due to changes in orbit are much smaller than any of the other forcings and are not shown separately, but they are included in the total radiative forcing shown in (f).
from the year 850, but the B3 model started from equilibrium at the Last Glacial Maximum in order to ensure that its permafrost and peatland components were in a consistent initial state by the year 850.
Models were then integrated to the year 2005 under various specified transient forcings. Nine historical simulations with specified CO 2 concentrations were performed. Seven simulations specified only one of the following transient forcings: non-CO 2 trace gases, CO 2 , land use, solar luminosity, orbital parameters, sulphate aerosols or volcanic aerosols. The other two simulations specified either all or none of these forcing changes. The simulation with no changes in forcing is merely a continuation of the equilibrium simulation and can be considered a control experiment. The all-forcing simulation was used as the initial condition for future simulations in Zickfeld et al. (2013).
For models with complete carbon cycles (B3, DC, GE, I2, ME, UM and UV), three additional free CO 2 historical Clim. Past, 9, 1111-1140, 2013 www.clim-past.net/9/1111/2013/ simulations were performed. In these experiments, the initial conditions and forcings were the same as for the equivalent historical, specified CO 2 simulations, but CO 2 concentrations were no longer specified. In the equivalent "all" forcing experiment, changing anthropogenic CO 2 emissions were specified. In the equivalent "control", no anthropogenic CO 2 emissions were applied. The third simulation had changes only in natural forcings (orbital parameters, solar luminosity and volcanic aerosols) and no anthropogenic CO 2 emissions. Several idealized experiments were also performed in order to calculate standard climate and carbon cycle metrics. All of these experiments were started from an equilibrium state with a CO 2 concentration near 280 ppm and integrated for 1000 yr. There were seven idealized experiments performed in total. Two experiments specified an instantaneous increase of CO 2 to a constant concentration at 2 × and 4 × the initial concentration. These were used to help assess equilibrium climate sensitivity. Another experiment specified an instantaneous increase to 4 × the initial CO 2 but then allowed CO 2 to evolve freely. This experiment was used to determine the time scales over which carbon perturbations are removed from the atmosphere.
The other four idealized experiments specified an increase in CO 2 at 1 % per year until reaching 4 × the initial CO 2 concentration. One experiment allowed CO 2 to freely evolve after reaching 4 × the initial concentration. This experiment was used to assess the models' carbon-climate response (CCR). This is calculated as the change in surface air temperature (SAT) divided by the total amount of accumulated emissions from some reference period (Matthews et al., 2009). For the other three experiments, which started with an increase in CO 2 at 1 % per year, CO 2 was fixed after reaching 4 × the initial concentration. These experiments were used to determine the models' carbon cycle feedbacks. One experiment was fully coupled, one excluded the changes in radiative forcing from increasing CO 2 , and one excluded the direct effects of increasing CO 2 on land and ocean carbon fluxes. The CO 2 concentration-carbon sensitivity can be calculated directly as the change in land or ocean carbon, in the experiment that excludes the radiative forcing from increasing CO 2 (radiatively uncoupled), divided by the change in atmospheric CO 2 . The climate-carbon sensitivity can be calculated directly as the change in land or ocean carbon, in the experiment that excludes the effects of increasing CO 2 on land and ocean carbon fluxes (biogeochemically uncoupled), divided by the change in SAT. The fully coupled simulation was also used to assess transient climate sensitivities, ocean heat uptake efficiency and zonal amplification. Ocean heat uptake efficiency is the global average heat flux divided by the change in SAT, and zonal temperature amplification is the zonal average SAT anomaly divided by the global average SAT anomaly.

Historical climate response
There is a very large range in the absolute SAT simulated over the historical period by the models involved in this intercomparison. Absolute SAT is a difficult quantity to measure, but Jones et al. (1999) estimate the absolute global average value of SAT to be approximately 14 • C between the years 1960 and 1990. As seen in Fig. 2a, the annual average SAT at 850 varies from 12.3 to 17.2 • C. All of the models are using similar externally specified forcing, so this large range in initial conditions must be due to internal model differences. Most comparisons between models and observational datasets only examine anomalies from a particular reference period (as in Fig. 2b). However, the large differences between initial states might influence the models' responses to changing climate forcing. Some feedbacks, such as the albedo changes from reductions in snow or ice, and hence an individual model's climate sensitivity (Weaver et al., 2007), would likely depend on the models' initial state.
Although the average model trend over the 20th century (0.79 • C) is close to the observed trends (0.73 • C) (Morice et al., 2012), there is still considerable spread in model response (0.4 to 1.2 • C). One difficulty in comparing EMICs involves the large variation in complexity between models. The implementation of some of the forcing needs to be highly parameterized or even specified in many models. Aerosols and landuse change can be particularly challenging to implement. A few EMICs are not able to apply all of the forcings specified in the experimental design, which adds to the model spread.
For the specified external forcings over the 20th century, five models appear to stay mostly within the observational uncertainty envelope for this period, five tend to overestimate the observed trends, and two tend to underestimate the trends (Fig. 2b). The model with the largest trend (ME) did not include any sulphate aerosol forcing. Without the cooling associated with this forcing, the model would be expected to overestimate 20th century warming. On the other hand, the UM model, which simulates the 20th century trend well, includes estimates of the indirect effect of sulphate aerosols but not the countering ozone and black carbon forcing (see Appendix B). Given the large number of models included in this intercomparison, the variation in the application of external forcing appears to average out, and the model mean trend agrees well with observations. It is unclear if there is a relationship between the preindustrial climate state and the climate response over the 20th century. The two models with the strongest 20th century response also start from the coldest initial state. The model with the weakest response starts from the second warmest state. Using all models, the linear correlation (r) between initial state and 20th century warming is about 0.4. If the two models with the strongest and the model with the weakest response are excluded, there is no clear relationship (r = 0.2). . The dark grey line shows changes in SAT, and the grey shading indicates the uncertainty, from Morice et al. (2012). The model results and the data estimates are shown as anomalies from the average over the decade centred at year 1900 and have been processed with a 5 yr moving average, rectangular filter. Temperature anomalies before 1850 are shown in (c). The dark grey line shows changes in global SAT, and the grey shading indicates the uncertainty, from the errorin-variables (EIV) reconstruction of Mann et al. (2008). The grey dashed line shows the model mean, and the light grey shading the model range, of the CMIP5 models that carried out the PMIP3 "last millennium" experiment. The model results and the data estimates are shown as anomalies from the average over the entire period (900 to 1850 or 900 to 1800 for the CMIP5 models) and have been processed with a 30 yr moving average, rectangular filter. Given that many factors other than initial state influence a model's 20th century climate response, a strong relationship is not expected.
Over the pre-industrial period (Fig. 2c), the models tend to show a relatively weak response compared to the Mann et al. (2008) global SAT reconstruction. The reconstruction indicates a relatively warm period near year 1000 (often called the Medieval Climate Anomaly, or MCA) and a cooler period near year 1700 (often referred to as the Little Ice Age, or LIA). In terms of anomalies from today, it is mostly the MCA that is not well reproduced by the models since they agree reasonably well with the reconstructed difference in temperature between the LIA and present climate (∼ 1 • C).
It does appear that, on average, the models show a stronger cooling response to several large volcanic eruptions than is indicated by the reconstructions (see eruptions near years 1280 and 1810 in Fig. 2). Given that volcanic forcing is generally short-lived, it is difficult to determine if the strong cooling response is due to excessive volcanic forcing or a lack of temporal resolution in the SAT reconstruction. It is also possible some of the proxies used in the reconstructions are not very sensitive to volcanic cooling . Temperature reconstructions over the last millennium (as in Mann et al., 2008or Frank et al., 2010 tend to show little agreement. Several CMIP5 models have also carried out the PMIP3 "last millennium" or CMIP5 "past1000" experiment (bcc-csm1-1, CCSM4, FGOALS-gl, GISS-E2-R, IPSL-CM5A-LR, MIROC-ESM, MPI-ESM-P), and these results are available through the CMIP5 data portal. The range and mean of these CMIP5 model simulations are also shown in Fig. 2c. The mean response of the CMIP5 models is very similar to the mean EMIC response, including a strong SAT response to volcanic forcing. This is perhaps expected since both sets of models are following the same PMIP3 external forcing protocol. A similar strong response to volcanic forcing is also seen in Fernández-Donado et al. (2012). Given the low levels of internal variability in EMICs, the range in the EMIC's SAT response is mostly contained within the CMIP5 model range. This weakly forced response between the MCA and LIA, found here in both EMICs and CMIP5 models, is also seen in several other modelling studies -especially for models that specify weak solar irradiance variation between these periods (Goosse et al., 2010;Jungclaus et al., 2010;Servonnat et al., 2010;Fernández-Donado et al., 2012). However, Feulner (2011) finds good agreement with Northern Hemisphere temperature reconstructions for a solar constant of 1361 W m −2 , weak solar variations and the Crowley (2000) volcanic forcing.
The amount of heat taken up by the ocean is an important factor in determining transient climate response and sea level change. The models' changes in ocean heat content over the 20th century are shown in Fig. 3a. While the data estimates are only to 2000 m depth and the model heat content change shown is over the entire ocean depth, it appears  Levitus et al. (2012). Ocean heat content (a) and thermosteric sea level (b) are plotted as anomalies from the year 1957 to 2005 average. Note that the model heat content and sea level changes are averages over the entire ocean so these would be expected to be somewhat larger than the values estimated over the first 2000 m. The North Atlantic overturning index (c) is shown as a percent change from the decade centred at year 1900 and was processed with a 30 yr moving average, rectangular filter. that many models may be overestimating ocean heat uptake. Some of the modelled differences from observations could be due to too much or too little surface warming. The two models that agree well with ocean heat uptake estimates are the same models that slightly underestimate atmospheric surface warming over the 20th century. Estimates of past thermosteric sea level rise (Fig. 3b) show similar differences between the models' and data estimates, with the models generally simulating more thermosteric sea level rise than observed. This is not surprising since the largest component of thermosteric sea level changes is from changes in ocean heat content.
The response of the thermohaline circulation in the Atlantic, as indicated by a simple Atlantic meridional overturning index (defined as the maximum value of the overturning stream function in the North Atlantic), indicates a moderate slowing in all models (between about 0.8 and 2.1 Sv, or from 3 to 13 %). Direct measurements of the thermohaline circulation are difficult, and trends are hard to distinguish from natural variability. There is, therefore, still some controversy as to the response of the meridional overturning circulation over the 20th century (Latif et al., 2006). However, this moderate response to a warming climate is similar to what has been seen in previous studies (Plattner et al., 2008).

Idealized climate response metrics
In order to assess the models' responses in a more controlled environment, several standard idealized experiments were performed. Idealized experiments with CO 2 increasing at a rate of 1 % per year until reaching two or four times the initial level of pre-industrial CO 2 were used to assess the transient climate response and equilibrium climate sensitivities. Here we define the equilibrium climate sensitivity to be the change in SAT after 1000 yr, even though the models are not truly in equilibrium. There is a relatively large range in the equilibrium climate sensitivity (see Fig. 4a and Table 2). Equilibrium climate sensitivity at 2 × CO 2 ranges between 1.9 and 4.0 • C, and at 4 × CO 2 between 3.5 and 8.0 • C. For comparison, the range of 2 × CO 2 equilibrium climate sensitivity for CMIP3 models is 2.1 to 4.4 • C, and the range for CMIP5 models is estimated to be 2.1 to 4.7 • C (Andrews et al., 2012). The ocean heat uptake efficiency was also calculated from this 4 × CO 2 idealized experiment, and it is interesting to note that the model with the highest uptake efficiency (LO) is also one of the models with the lowest ocean heat uptake over the 20th century. This implies that the lower than average (but closer to observed) heat uptake is most likely due to the model's lower than average 20th century warming. This may not be the case with B3, which also has lower than average 20th century ocean heat uptake, but shows one of the lowest heat uptake efficiencies. Figure 4b shows the models' zonal SAT amplification, which is calculated as the zonal SAT change divided by the global mean change at year 140 of the 1 % increase to 4 × CO 2 experiment. The temperature amplification at year 70 (when 2 × CO 2 is reached, not shown) is similar. Some models (DC, ME, MI, ML) exhibit very little polar amplification, showing a nearly flat zonal response. Two models (IA and LO) show polar amplification in the north to be larger than 3.0. Although the model with the highest polar amplification has one of the lowest climate sensitivities, and starts from one of the warmest initial states, there appears to be no simple relationship between polar amplification, climate sensitivity and initial state.

Historical carbon-cycle response
The ability to reproduce trends in the carbon cycle is another important requirement for models that are used to predict the fate of anthropogenic carbon. For the historical all-forcing experiment (as for RCPs), CO 2 concentrations are specified, but models with a complete carbon cycle can still calculate emissions that are compatible with the specified CO 2 . The overall average EMIC carbon cycle response for the 1990s is within the uncertainty range of estimated values, except for diagnosed emissions, which are slightly underestimated (see Table 3). The EMIC mean in Table 3 excludes the two models (ME, UM) that do not transfer carbon with land-use change. These models would be expected to overestimate diagnosed emissions due to the lack of emissions from land-use change. This can be seen in the accumulated fluxes from 1800-2000, where they underestimate land fluxes to the atmosphere, and thus overestimate total diagnosed emissions by 61 to 103 Pg of carbon.
The fluxes of carbon to the atmosphere are shown in Fig. 5a. All models reproduce the estimated fluxes to the ocean within the uncertainty ranges, between 1980 and 2005. Although all models remain within the large range of uncertainty for land fluxes in the 1980s, many appear to underestimate recent land fluxes, especially since 2000. A few models are able to reproduce trends in emissions reasonably well, but most underestimate recent emissions, apparently because of insufficient net uptake on land. Figure 5b shows the accumulated fluxes of carbon since 1800. The integral changes in pools (or emissions) are also shown with associated uncertainty as cross bars at 1994 (estimates from Sabine et al., 2004). Again, all models estimate total ocean uptake within the range of uncertainty (see Table 3). Total land pool changes are much more variable, with only half of the models estimating fluxes within the range of uncertainty. However, most models do remarkably well at estimating total emissions between 1800 and 1994. Two models overestimate total emissions, and one underestimates emissions. Figure 5c breaks down the land fluxes into two components. The land-use change (LUC) flux component is estimated from a simulation with only land-use change forcing. The residual flux is the total flux from an all-forcing simulation minus the LUC component. As in Houghton (2008), the LUC component should not include any interaction between land-use change and changes in climate or CO 2 . Any interaction terms are part of the residual flux. As expected, the UM model, which does not directly include carbon transfer as part of the model's land-use forcing, shows near-zero LUC carbon fluxes. If the diagnosed LUC flux is correct, then most other models also appear to underestimate carbon fluxes to the atmosphere from LUC. Most models also tend to underestimate residual land uptake.
It would appear that any underestimation of LUC and the residual flux partially cancel, allowing some models to generate reasonable overall land fluxes. If LUC fluxes were higher, then the residual uptake by land would need to be greater. In general, it would appear that all models are either overestimating the response of the land carbon cycle to climate change or not taking up sufficient carbon through fertilization of vegetation (either from CO 2 or deposition of N). Only the I2 model includes an interactive land nitrogen cycle, which incorporates nitrogen limitation of photosynthesis. None of the models include anthropogenic nitrogen deposition as part of their vegetation component forcing. There is a great deal of uncertainty in future vegetation response, but even the  1980-1989, 1990-1999 and 2000-2005 in (a) and (c) are from Table 1 in Denman et al. (2007).
The data and uncertainty estimates at year 1994 in (b) are from Sabine et al. (2004). The solid black line in (b) indicates fossil fuel emission estimates from Boden et al. (2012) and in (c) the LUC flux estimates from Houghton (2008). current response does not appear to be well simulated by most models. It is possible that the diagnosed partitioning between the LUC and residual flux is poorly estimated by the LUC only forcing simulation -at least in terms of the definition of Houghton (2008). There may be a small feedback on the carbon cycle due to the cooling from local albedo changes. Additional simulations with GENIE (not shown) indicate that there is likely some underestimation of LUC fluxes in the simulation with only land-use forcing due to the climatecarbon feedback on soil respiration. The cooling from the albedo change reduces soil respiration, which in turn reduces the apparent LUC flux. Simulations with fixed albedo (and thus climate) indicate that this underestimation may be significant. Further investigation is required to determine if this small feedback leads to a significant underestimation of the LUC flux in other models.

Idealized carbon-cycle response metrics
Standard carbon cycle metrics are also calculated from specified 1 % increasing to 4 × CO 2 experiments. In addition to the standard fully coupled experiment, two additional partially coupled experiments were done by EMICs with a complete carbon cycle. One experiment excluded only the direct greenhouse radiative effects of increasing CO 2 (radiatively uncoupled) while the other experiment excluded only the direct effects of increasing CO 2 on land and ocean carbon fluxes (biogeochemically uncoupled). For specified CO 2 experiments, the CO 2 concentration-carbon sensitivity (β) can be calculated directly as the change in land or ocean carbon divided by the change in atmospheric CO 2 in a radiatively uncoupled simulation. The climate-carbon sensitivity (γ ) is calculated directly as the change in land or ocean carbon divided by the change in SAT in a biogeochemically uncoupled simulation.
These parameters are calculated differently from the C 4 MIP intercomparison project (Friedlingstein et al., 2006) due to the specification of CO 2 concentrations rather than emissions, which results in somewhat lower estimates of γ (Plattner et al., 2008;Gregory et al., 2009;Zickfeld et al., 2011). Carbon cycle sensitivities can also be calculated indirectly from the difference between a fully coupled simulation and either a biogeochemically uncoupled simulation (for γ ) or a radiatively uncoupled simulation (for β). The value of γ , and to a lesser extent β, is highly dependent on the method of calculation for models with large nonlinear climate and CO 2 interactions (Zickfeld et al., 2011). Plattner et al. (2008) calculated β directly and γ indirectly.
Directly calculated sensitivities at year 140 (and year 995) are shown in Table 4, and Fig. 6 shows how these sensitivities change through time. The CO 2 concentration-carbon sensitivities for land (β L ) are relatively constant after 140 yr (CO 2 quadrupling) for most models. This is similar for γ L , except for the B3 model and, to a lesser extent, the UM model. The www.clim-past.net/9/1111/2013/ Clim. Past, 9, 1111-1140, 2013 Table 2. Standard metrics that help characterize model response. T PI is the average surface air temperature between 850 and 1750, and T 20th is the change in surface air temperature over the 20th century, both from the historical "all" forcing experiment. TCR 2X , TCR 4X , and ECS 4X are the changes in global average model surface air temperature from the decades centred at years 70, 140, and 995 respectively, from the idealized 1 % increase to 4 × CO 2 experiment. The ocean heat uptake efficiency, κ 4X , is calculated from the global average heat flux divided by TCR 4X for the decade centred at year 140, from the same idealized experiment. Note that ECS 2x was calculated from the decade centred at about year 995 from a 2 × CO 2 pulse experiment.

Model
T  Table 3. Average carbon fluxes to the atmosphere over the 1990s and accumulated carbon fluxes from 1800 to 1994. Land LUC is an estimate of land-use change fluxes from simulations with only land-use forcing. Land Res is the residual land flux, which is derived from the land flux from a simulation with all forcing minus Land LUC . All other model fluxes are from differences between the all-forcing and control simulations. Estimates of average fluxes are from Table 7.1 of Denman et al. (2007), and estimates of accumulated fluxes are from Table 1 of Sabine et al. (2004). The change in atmospheric carbon storage between 1800 and 1994 is estimated to be 164 ± 4 Pg in Sabine et al. (2004). Although the change in CO 2 between 1800 and 1994 was specified to be 75 ppm in the all-forcing simulations, due to different estimates of atmospheric volume, the change in atmospheric carbon storage in the models is between 158 and 165 Pg. 3 to −0.9 −2.6 to −1.8 6.0 to 6.8 11 to 67 −137 to −99 224 to 264 * The ME and UM models were excluded from the EMIC model mean and range calculations, because they did not include any direct carbon exchange due to changes in land use. Only the total land flux is reported for these models. , 9, 1111-1140, 2013 www.clim-past.net/9/1111/2013/ Table 4. Carbon cycle sensitivities and metrics from idealized 4 × CO 2 experiments. β L (or β O ) is the change in land (or ocean) carbon divided by the change in atmospheric CO 2 in a radiatively uncoupled simulation. γ L (or γ O ) is the change in land (or ocean) carbon divided by the change in atmospheric temperature, in a biogeochemically uncoupled simulation. CCR is the carbon-climate response and is calculated as atmospheric temperature change divided by diagnosed emissions (Matthews et al., 2009). β L , γ L , β O , γ O , and CCR are all averages over the decade centred at about year 140 from a 1 % increase to 4 × CO 2 experiment. Numbers in parentheses are averages over the decade centred at year 995. Yr50 4X is the year that emissions remaining in the atmosphere fall below 50 %, from an instantaneous 4 × CO 2 pulse experiment.

Clim. Past
Model  large continuing change in γ L in the B3 model is likely due to the inclusion of permafrost in that model, which reacts to climate change over much longer timescales than most other land processes. The I2 model has the lowest value of β L , and this is likely due to nitrogen limitation reducing land uptake in this model . Over the ocean, the changes in both sensitivities, β O and γ O , largely occur after year 140 (CO 2 quadrupling). As expected, the response time of most land processes (possibly excluding permafrost and peat) is much faster than the response time of the ocean to either CO 2 or climate change.
www.clim-past.net/9/1111/2013/ Clim. Past, 9, 1111-1140, 2013 The dashed lines in Fig. 6 show sensitivities calculated indirectly (as differences from fully coupled simulations). If γ L is calculated indirectly, the I2 model indicates a positive rather than a negative land sensitivity (see Fig. 6b). This is due to a strong interaction between climate warming, which causes an increase in nitrogen availability and photosynthesis, and land carbon uptake. When calculated indirectly, the interaction is strong enough to change the climate-carbon feedback on atmospheric CO 2 from positive to negative for the I2 model. UM also has a positive γ L for a short period when calculated indirectly, while all other models always show negative γ L , using either method of calculation. UV, B3, GE and ME always show more negative γ and β, while I2 and UM always show more positive γ and β, when these sensitivities are calculated indirectly rather than directly. ML shows more positive sensitivities for land and more negative for the ocean, while DC is the opposite, when γ and β are calculated indirectly rather than directly. Presumably the nonlinear interactions that cause both γ and β to always change in the same direction (depending on the calculation method, for either the ocean or land) must be very different in the models. Figure 7a shows the residence time of CO 2 emissions from a 4 × CO 2 pulse simulation. This pulse is equivalent to about 1800 Pg. As seen in Table 4, the EMIC mean time for half of the emitted CO 2 to be absorbed by the land and ocean sinks is 130 yr. This is considerably longer than the estimate of 30 yr to remove 50 % of emissions given in Chapter 7 of the AR4 (Denman et al., 2007). The main reason for this difference is that the emission pulses used to assess the CO 2 absorption timescales in the AR4 were small (40 Pg) compared to both the pulse used here and any likely future emissions. Absorption timescales depend on the amount of emissions (Maier-Reimer and Hasselmann, 1987;Archer et al., 2009;, and this could have been stated more clearly in Chapter 7 of the AR4. Two models (B3 and ME) show considerably longer times to absorb half of emissions. The longer time for the B3 model is likely due to the increasing climate feedback over land due to the inclusion of permafrost and peat in that model.
The carbon-climate response has been proposed as a simple metric that combines both the climate and carbon cycle sensitivities into a single value. It has been suggested that this metric is relatively insensitive to emission scenarios and is approximately constant over several hundred years (Matthews et al., 2009). Figure 7b shows the CCR from a 1 % increasing CO 2 experiment which has zero emissions after reaching 4 × CO 2 . The EMIC results show that, at least for this scenario, CCR is not constant over time for any of the models, although the intra-model range is smaller for most models than the inter-model range. This metric decreases in all models until emissions are set to zero. After CO 2 is allowed to freely evolve, CCR generally increases and then declines in most models. After emissions are set to zero, any changes in CCR are just due to changes in SAT and so  (a), and the carbon-climate response (CCR) from a 1 % increase to 4 × CO 2 experiment is shown in (b). CO 2 is allowed to freely evolve in both experiments once CO 2 has reached 4 times the initial preindustrial level. This is equivalent to about 1800 Pg of carbon emissions. CCR is calculated as the change in SAT divided by the accumulated, diagnosed emissions. After year 140, emissions are zero and any changes in CCR are just due to changes in temperature.
CCR becomes a measure of a model's zero emissions commitment. Two models show a continual increase while one shows a continual decrease. At the time of CO 2 doubling, the range in CCR is between 1.4 and 2.5 • C Eg −1 of carbon (1 Eg or Tt = 1000 Pg or Gt), and after 500 yr the range is between 0.9 and 2.3 • C Eg −1 . Further discussion of the response of CCR in these models can be found in Zickfeld et al. (2013).

Contributions of forcing components to temperature
Several experiments were designed to examine the linearity of temperature and carbon cycle response to various climate forcings. In each experiment, only one major climate forcing was allowed to vary over the historical period (years 850 to 2005). The individual experiments applied forcing from "additional" or non-CO 2 greenhouse gases (AGG), CO 2 (CO2), land-use change (LUC), orbital (ORB), solar luminosity (SOL), sulphate aerosols (SUL) and volcanic aerosols (VOL). Figure 8 shows the EMIC mean results from Clim. Past, 9, 1111-1140, 2013 www.clim-past.net/9/1111/2013/ the individual forcing experiments compared to the experiment that applied all forcings together. Since specified CO 2 forcing is treated separately, any changes in CO 2 due to other forcings, either directly, as with land-use change, or indirectly, through climate-carbon feedbacks, are included as part of the CO 2 forcing. As expected, most of the change in SAT over the last millennium has occurred after industrialization (Fig. 8a). Since 1800, the direct albedo effect from land-use change caused a model average cooling of roughly 0.2 • C, while sulphate forcing also caused a cooling of about 0.2 • C. The sulphate forcing may seem weak, but it is mostly due to the exclusion of the indirect forcing from sulphates in the experimental design. The lack of negative forcing from the indirect effect of sulphates is balanced by also excluding similar positive forcing from ozone and black carbon (see Appendix B). The change in solar luminosity since 1800 was found to have a small positive effect on SAT (< 0.1 • C). Non-CO 2 greenhouse gases have had a large influence on SAT since 1800, but this is largely countered by the combined negative forcing from land use and aerosols. As a result, CO 2 alone is capable of providing the vast majority of the climate change signal since pre-industrial times.
Before industrialization, the net changes in SAT from any of the forcings over the previous 1000 yr are much weaker, although there are warmer and cooler periods during that time (Fig. 8b). It appears that changes in solar forcing have the largest effect on SAT over long timescales in these simulations. Over the last millennium, orbital forcing was found to have almost no effect on modelled SAT, at the hemispheric scale, in the annual mean, while LUC is associated with a small long-term cooling. Volcanic aerosol forcing has a large but short-lived negative effect on modelled SAT. This agrees with the conclusions of Shindell et al. (2003) and Fernández-Donado et al. (2012), although Schneider et al. (2009), Feulner (2011), Miller et al. (2012, and Schleussner and Feulner (2012) indicate the possibility of large volcanic forcing causing longer term persistent cooling in polar regions. Volcanic forcing does, however, contribute to the cooling between the MCA and the LIA because of the larger number of big eruptions during the latter period (see also Fig. 11).
As noted earlier, the models appear to underestimate the SAT changes between the MCA and the LIA compared to reconstructions (Mann et al., 2008;Frank et al., 2010). With similar forcing, the CMIP5 models also show a similar small change in SAT simulated over this period (Fig. 1c). A relatively small change in simulated SAT between the MCA and LIA is also seen in Fig. 15  With similar solar forcing to the datasets used here, many models are not capable of generating a large enough MCA to LIA temperature transition when compared to reconstructions (but see Feulner, 2011). Additionally, several studies have argued that internal variability may have played a significant role in explaining temperature changes during the past millennium (e.g. Jungclaus et al., 2010;Goosse et al., 2012). The models and experimental design used here do not take into account this potential contribution, and this may explain part of the disagreement between model results and reconstructions.
www.clim-past.net/9/1111/2013/ Clim. Past, 9, 1111-1140, 2013 3.6 Linearity of the forced response Figure 9a shows the difference between the anomalous response of the all-forcing experiment and the sum of the anomalous responses of the individual forcing simulations. If the climate system responded linearly to the individual forcings, then the resulting summation would be zero. In general this is the case for all models. There are some differences after 1900 for the ME and UM model, but these differences are still small compared to overall noise. It may be that EMICs are too simple to demonstrate significant nonlinearities, but given the diversity in the complexity of EMICs, this seems unlikely. Fernández-Donado et al. (2012) also suggest a high degree of linearity between forcing and simulated temperature. Figure 9b shows similar results for diagnosed carbon emissions. Here some models appear to show an interaction between individual forcing that is larger than noise, particularly the UV model. This interaction is between land-use change and CO 2 forcing. In models that simulate land-use forcing by removing vegetation (B3, GE, I2 and UV), it would be expected that there would be less CO 2 fertilization due to the removal of vegetation from land-use change (Strassmann et al., 2008). This reduction in land uptake by CO 2 fertilization would result in lower diagnosed emissions and thus total carbon in a simulation that has both land-use change and CO 2 fertilization acting together. In the DC model most of the land carbon removed by land-use change was taken from the soil rather than from the vegetation and so this model would not be expected to show a strong interaction. The ME and UM models also do not show this interaction between vegetation removal (due to land-use change) and the CO 2 fertilization feedback since they do not reduce vegetation or directly exchange carbon with land use. The UV model has one of the largest CO 2 concentration-carbon sensitivities (Table 4) and the largest recent land-use emissions ( Table 3). As such, it shows the largest interaction between land-use change and CO 2 . The GE model is not shown, because one of the ensemble members that constitute the ensemble mean was not useable.

Changes in climate from natural and anthropogenic forcing
Freely evolving CO 2 simulations have the advantage of not forcing the model into a specified state. Thus, these types of experiments can show how CO 2 might change under different scenarios. Two historical simulations in which CO 2 was allowed to freely evolve were conducted to determine the anthropogenic influence on the carbon cycle. While one simulation applied natural forcing, the other included natural and anthropogenic forcing, including specified emissions from fossil fuel combustion. The anomalous SAT from these two simulations since 1750 is shown in Fig. 10. The range in the simulations with only natural forcing is very small compared to the range of the models when anthropogenic forcing is also applied. This is not surprising since the magnitude of the anthropogenic forcing is much greater than the magnitude of the natural forcing, so any differences in model response are amplified. The simulations with only natural forcing produce almost no change in overall SAT between 1750 and 2005. As seen in many other studies (see Hegerl et al., 2007 for a review), when only natural forcing is applied, the models are not capable of simulating the rise in SAT that has been observed over this time period.
It is important to understand the feedbacks between the carbon cycle and the climate system in order to have confidence in future projections. Most models show a positive climate-carbon cycle feedback (Friedlingstein et al., 2006). However, the magnitude, the constancy and perhaps even the sign of the feedback are still uncertain . Reconstructions of past climate variables and forcing, combined with carbon cycle model simulations, may help constrain model climate-carbon cycle feedbacks. On the other hand, climate reconstructions are also often highly uncertain Clim. Past, 9, 1111-1140, 2013 www.clim-past.net/9/1111/2013/  Fig. 10. The surface air temperature response of the models in freely evolving CO 2 simulations that include only natural forcing (Nat.) or all-forcing (All). Note all-forcing includes both natural (orbital, solar, stratospheric and volcanic aerosols) and anthropogenic (greenhouse gas, land use, tropospheric aerosol) forcing. As in Fig. 7, the light grey shading indicates the uncertainty range in SAT, from Morice et al. (2012). The model results and the data estimates were processed with a 5 yr moving average, rectangular filter. Model results are shown as anomalies from the decade centred at 1750. The data uncertainty estimates are shown as an anomaly, which has been offset to show the same SAT anomaly as the model average over the decade centred on the year 1900. and disparate. Models may then be useful in assessing the plausibility of differing palaeoclimate reconstructions, by allowing reconstructions and forcing to be compared in a physically consistent manner.

Changes in temperature between the MCA and the LIA
The pre-industrial SAT anomalies from the naturally forced, freely evolving CO 2 simulation are shown in Fig. 11a. The models simulate a modest MCA and a small decline in SAT into the LIA that is similar to the specified CO 2 experiment. Although there is considerable uncertainty, estimates from palaeoclimate reconstructions suggest about 0.38 • C as the difference between the warmest (1071-1100) and coolest (1601-1630) pre-industrial periods in the Northern Hemisphere over the last millennium (Frank et al., 2010). The start and end of these climate periods are still debated, but for the simple analysis used here we define an MCA index period to be between 1100 and 1200, and a LIA index period between 1600 and 1700. The 100 yr periods used for these indices were chosen to represent the models' warmest period between year 950 and 1250 and the coolest period between 1450 and 1750. These periods are slightly later than the times estimated from reconstructions for the highest and lowest temperature change for the pre-industrial portion of the last millennium (Frank et al., 2010). Between these 100 yr MCA and LIA periods, the model average difference in globally averaged SAT is 0.19 • C. The largest difference occurs in GE, which simulates a drop in temperature of 0.33 • C between the reference periods. For the all-forcing simulation (with land-use change and specified CO 2 ), the average EMIC global SAT response for the transition between the MCA and the LIA index periods is about 0.21 • C (see Fig. 11b). The slightly smaller change in SAT in the naturally forced, free CO 2 simulation is in part due to the lack of cooling from land-use change (which is not included in the natural forcing simulation), but mostly because the simulated drop in CO 2 is not as great as suggested from ice cores (see Fig. 11c). With the additional cooling from the specified reduction in CO 2 , the largest difference once more occurs in GE, which simulates a drop in temperature of 0.35 • C. It is, however, difficult to compare SAT changes averaged over different periods and regions. For example, the UV model simulates a 0.05 • C greater drop in SAT over the Northern Hemisphere (for which the observational estimate was reconstructed) than globally. However, even after correcting for this difference, most models appear to be underestimating the change in SAT over this period.
The lack of cooling in the models may be from underestimating the specified forcing changes over this period. The individual forcing component contributions to the change in SAT are shown in Fig. 11b. The CO 2 , solar and volcanic forcings are, in nearly equal proportions, the major contributors to the total drop in SAT between the MCA and the LIA index periods. There is also a small cooling contribution from the direct climate effects of land-use change, and very small warming contributions from changes in orbit and non-CO 2 greenhouse gases. There are earlier periods in the simulations that are as cold or nearly as cold as the LIA index period. These early minima in simulated temperature are mostly caused by a series of volcanic eruptions.

Changes in CO 2 between MCA and LIA
CO 2 can be extracted directly from ice cores and is relatively well reconstructed over the past millennium, although there are still uncertainties in both the timing of the CO 2 record and in determining how well CO 2 records from individual ice cores represent global values. The large fluctuations in modelled CO 2 before the LIA are not seen in the PMIP3 CO 2 dataset ( Fig. 11c and d), although other records may be more variable (Mitchell et al., 2011). If we assume that the CO 2 record is accurate and that changes in CO 2 are determined by changes in SAT over this period (through climate-carbon cycle feedbacks), then there would appear to be an inconsistency between the lack of change in the CO 2 record and the large model temperature changes generated by the (mostly volcanic) forcing before the LIA. This implies the following: the specified volcanic forcing is not realistic, or the CO 2 record is not reliable, or the climate-carbon cycle feedback is weak or masked in the CO 2 record by other factors. It is beyond the present experiment to say which is the case, but this does warrant more research.  (Frank et al., 2010). The CO 2 response is shown for the freely evolving CO 2 simulations with only natural forcing in (c) and all-forcing in (d). When multiple model results are shown, the model mean is indicated by a dashed grey line. The solid dark grey lines in (c) and (d) are an estimate of reconstructed CO 2 from the PMIP3 forcing dataset (Schmidt et al., 2012). Horizontal lines over two, century-long index periods (1100-1200 and 1600-1700), which roughly correspond to the Medieval Climate Anomaly (MCA) and Little Ice Age (LIA), indicate model or data averages over these periods. All model results and data estimates are shown as anomalies from the average over the years 1100 to 1200 and were processed with a 30 yr moving average, rectangular filter.
Along with temperature, the models also seem to underestimate the reduction in CO 2 during the MCA-LIA transition. The model average reduction in CO 2 is 2.4 ppm compared to 7.9 ppm from the PMIP3 protocol reconstruction (Schmidt et al., 2012), calculated over the same MCA and LIA index periods. Since most models appear to have a positive climatecarbon cycle feedback over this period (see Fig. 11a and c), some of the small reduction in CO 2 would be attributable to an underestimated reduction in SAT.

Estimating the carbon-climate sensitivity from the MCA to LIA transition
Assuming all of the reduction in CO 2 is driven by changes in climate, a crude estimate of the climate feedback can be calculated from the simulated change in CO 2 and SAT over the MCA-LIA transition. This estimate of the climate-carbon cycle sensitivity has a model average of 2.4 ppm/0.19 • C = 12.6 ppm • C −1 , with a range between 5.1 and 18.8 ppm • C −1 . This estimate is relatively insensitive to the reference periods. Using the average change in SAT and CO 2 between 950 to 1250, and 1450 to 1750 yields very similar results (model average of 13.5 ppm • C −1 with a range between 4.5 and 23.3 ppm • C −1 ). If this estimate of sensitivity were to hold for larger changes in temperature, then the model average drop in CO 2 for a 0.4 • C reduction in temperature would be 5.1 ppm. Even scaling the response of the model with the largest sensitivity (18.8 ppm • C −1 for B3) would still only produce a drop in CO 2 of 7.5 ppm. Apparently, in order to simulate the large observed drop in CO 2 during the MCA-LIA transition, either the temperature drop must be larger than 0.4 • C or the climate-carbon cycle feedback must be large (> 18 ppm • C −1 ). Few models have attempted to simulate a closed carbon cycle over the past millennium. Gerber et al. (2003) used a carbon cycle model with a sensitivity of about 12 ppm • C −1 to constrain the temperature transitions over the pre-industrial portion of the last millennium to less than Clim. Past, 9, 1111-1140, 2013 www.clim-past.net/9/1111/2013/ 1 • C. Both Goosse et al. (2010) and Jungclaus et al. (2010) have used sophisticated models to simulate the evolution of CO 2 over the past millennium. Goosse et al. (2010) show almost no change in CO 2 between the MCA and LIA (see their Fig. 16), and Jungclaus et al. (2010) also seem to underestimate the reduction in CO 2 over this period (see their Fig. 6).
As with the models in this study, some of the small reduction in CO 2 in Goosse et al. (2010) and Jungclaus et al. (2010) is likely due to the small reduction in simulated SAT between the MCA and LIA. It seems that many models are not able to reproduce the reduction in CO 2 shown in the PMIP3 protocol reconstruction (Schmidt et al., 2012) over this period. The direct comparison of EMIC model results with proxy data reconstructions over the last millennium is hampered by the large uncertainty in developing annual-mean proxies from sporadic point source measurements in space and time. Inherent internal variability in the climate system may also make it difficult to compare EMICs, which show little internal variability, to the palaeorecord, which is a single sample of the climate system's natural variability. In addition, our EMIC results are globally averaged and there are few proxy records in the Southern Hemisphere so that current SAT reconstructions focus only on the Northern Hemisphere records. Even after accounting for the different areas and averaging periods over which the MCA and LIA are defined, results suggest that the models may be underestimating both the drop in SAT and CO 2 during the transition from the MCA into the LIA. This in turn could be the result of inadequately modelled, large, long-term, unforced climate variability, errors in the reconstructions of volcanic and/or solar radiative forcing used to drive the models, or the incomplete representation of certain processes within the models.
It is possible that past changes in CO 2 are unrelated to changes in SAT. Long timescale natural variability, unrelated to any climate-carbon feedbacks, could be responsible for many of the past large changes in SAT or CO 2 . If preindustrial land-use changes were significant, then land-use emissions to the atmosphere would also alter the climate-carbon sensitivity estimated from palaeoclimate records. While the effects of unresolved variability are difficult to assess, the possible influence of land-use change on the diagnosed palaeoclimate-carbon sensitivity can be investigated with the freely evolving CO 2 experiments. The evolution of CO 2 in the simulation including anthropogenic forcing, which before 1800 was almost entirely from land-use change, is shown in Fig. 11d. Including anthropogenic forcing reduces the model average estimate of the climate-carbon cycle sensitivity from 12.6 to 8.4 ppm • C −1 . Other land-use reconstructions, which suggest larger pre-industrial land-use fluxes (as in Kaplan et al. 2010), may result in an even larger reduction in apparent sensitivity. However, if the models simulated a larger reduction in CO 2 between the MCA and LIA (due to a higher sensitivity), then, proportionally, the influence of modelled land-use emissions on estimates of the palaeoclimate-carbon cycle sensitivity would be smaller. This result clearly depends on the simulation of uncertain land-use change, but at least for these experiments, there is a considerable sensitivity in diagnosing climate-carbon feedbacks when including emissions from land-use change.

Conclusions
We have evaluated EMIC simulations over the last millennium with respect to other models and historical data. Although some model defects are noted, the EMICs in this intercomparison generally perform well. There is a large range in initial pre-industrial model state, at least in terms of SAT (12.3 to 17.3 • C), but this seems to have little relationship to the models' transient responses to recent changes in radiative forcing. A few models appear to overestimate ocean heat uptake and sea level rise compared to observational estimates over the last several decades. All models show a small decline in the Atlantic meridional overturning circulation over the last century. Ocean carbon uptake is well simulated by all models (within observational uncertainty estimates), but recent land carbon uptake appears to be slightly underestimated by most models. The low land uptake is likely not due to an overestimation of land-use change emissions, which are generally underestimated, but due to overly low residual uptake. This may be due to an overestimation of climate-carbon feedbacks, but is more likely due to an underestimation of the fertilization of photosynthesis.
Idealized experiments were used to calculate a number of standard climate and carbon cycle metrics. The range in the transient climate response is similar to that of CMIP3 and CMIP5 models (Andrews et al., 2012). The model climate sensitivities (diagnosed at year 1000 from 2 × pulse experiments) range from 1.9 to 4.0 • C, spanning most of the likely range given in the IPCC AR4 (2.0 to 4.5 • C). The model average climate sensitivity is 3.0 • C. The models also show a large range in the carbon cycle, CO 2 concentration and climate feedbacks, but all models show negative concentration feedbacks and most show positive climate feedbacks. The carbon climate response (CCR) is not constant for most models either before or after emissions cease, which suggests caution when using this metric. On average, the models suggest that the time to absorb half of an atmospheric CO 2 perturbation (from a relatively large pulse of approximately 1800 Pg) is 130 yr (also see .
The linearity of the climate and carbon cycle components and the importance of different external forcings were assessed with a series of simulations in which each forcing was applied separately. In general, the SAT of a simulation with all forcings is well represented by the sum of the individual forcings. This is not always the case with diagnosed emissions, due to interactions between changing land-use and CO 2 concentrations, for some models. The response of the all-forcing simulations is very similar to simulations with only CO 2 forcing. This implies that historical www.clim-past.net/9/1111/2013/ Clim. Past, 9, 1111-1140, 2013 and modern-day climate forcing can largely be captured by CO 2 , alone, as most other forcings tend to cancel. Free CO 2 simulations were also performed to assess how CO 2 might evolve under different forcing scenarios. Simulations without anthropogenic forcing show almost identical SAT and CO 2 in 2005 compared to 1750. It is only when anthropogenic forcing is added that the models warm by 0.8 • C, on average, over the 20th century.
The climate-carbon sensitivity over the preindustrial portion of the last millennium was compared to palaeoclimate proxy estimates. The uncertainties in palaeoclimate estimates of model forcing and climate make it difficult to constrain the models' climate-carbon cycle response. None of the models were able to reproduce the reconstructed drop in SAT or CO 2 during the MCA to LIA transition.
The effects of land-use emissions on estimates of climatecarbon cycle sensitivities were assessed, and these were found to significantly reduce the diagnosed sensitivity. This suggests that land-use emissions need to be considered when making estimates of climate-carbon feedbacks from palaeoclimate reconstructions. While some of our conclusions remain tentative, our analyses suggest that EMICs can be useful tools in helping to reconcile different palaeoclimate proxy datasets in a physically consistent manner.

Model descriptions
-B3: Bern3D-LPJ is an Earth system model of intermediate complexity with a fully coupled carbon cycle and components that represent the ocean and sea ice, the ocean sediments, the atmosphere, and the terrestrial biosphere. The ocean component is a seasonally forced three-dimensional frictional geostrophic global ocean model with a resolution of 36 × 36 boxes in the horizontal direction and 32 vertical layers (Edwards et al., 1998;Müller et al., 2006). Marine biogeochemical cycles are implemented following OCMIP-2 (Najjar and Orr, 1999;Orr et al., 2000) with the addition of prognostic formulations for biological productivity and the cycling of iron, silica, 13 C and 14 C Tschumi et al., 2008), as well as a sedimentary component (Tschumi et al., 2011;Gehlen et al., 2006;Heinze et al., 1999). The atmosphere is represented by a singlelayer energy and moisture balance model with the same horizontal resolution as the ocean component (Ritz et al., 2011). The CO 2 forcing is calculated after Myhre et al. (1998) and the model is tuned to produce an equilibrium climate sensitivity of 3 • C (global mean SAT increase for a doubling of preindustrial CO 2 , excluding land albedo and other terrestrial feedbacks). Other greenhouse gases and volcanic aerosols are prescribed as global radiative forcing, while tropospheric sulphate aerosols are taken into account by changing the surface albedo locally (Steinacher, 2011;Reader and Boer, 1998). The terrestrial biosphere component is based on the Lund-Potsdam-Jena (LPJ) dynamic global vegetation model at 3.75 • × 2.5 • resolution (Joos et al., 2001;Gerber et al., 2003;Sitch et al., 2003). Vegetation is represented by 12 plant functional types, and CO 2 fertilization is modelled according to the modified Farquhar scheme (Farquhar et al., 1980). The model has recently been extended with modules to account for land use (Strassmann et al., 2008;Stocker et al., 2011), peatlands and permafrost dynamics (Gerten et al., 2004;Wania et al., 2009a,b), and land surface albedo (Steinacher, 2011). The LPJ component is driven by global mean CO 2 concentrations and changes in surface air temperature relative to a reference period using a pattern scaling approach Steinacher, 2011).
-C2: The CLIMBER-2.4 model (Petoukhov et al., 2000;Ganopolski et al., 2001) is a fully coupled climate model without flux adjustments. It consists of a 2.5dimensional statistical-dynamical atmosphere module with a coarse spatial resolution of 10 • in latitude and 360 • /7 in longitude, which does not resolve synoptic variability. The vertical structures of the temperature and humidity are parameterized as well. The ocean component has three zonally averaged basins with a latitudinal resolution of 2.5 • and 20 unequal vertical levels. The model also includes a zonally averaged sea ice module, which predicts ice thickness and concentration and includes ice advection.
-C3: The intermediate complexity climate model CLIMBER-3α  shares CLIMBER-2's statistical-dynamical atmosphere (Petoukhov et al., 2000), but operates at a higher horizontal resolution of 22.5 • in longitude and 7.5 • in latitude. Additionally, it employs a general circulation model for the ocean component. This ocean module is based on MOM3 (Pacanowski and Griffies, 1999), but includes a second-order moments tracer advection scheme (Hofmann and Morales Maqueda, 2006) as well as changes to the parameterizations of diffusivity and convection . In CLIMBER-3α, the ocean model has a horizontal resolution of 3.75 • in both latitude and longitude. It divides the ocean into 24 vertical levels of variable height, ranging from 25 m at the surface to about 500 m at the largest depths. Sea ice is represented by the thermodynamic-dynamic sea ice model ISIS (Fichefet and Morales Maqueda, 1997). Land surface types (including vegetation) are prescribed in the coupler.
-DC: The DCESS model consists of fully coupled modules for the atmosphere, ocean, ocean sediment, land biosphere and lithosphere (Shaffer et al., 2008). The Clim. Past, 9, 1111-1140, 2013 www.clim-past.net/9/1111/2013/ model geometry consists of one hemisphere, divided into two 360 • × 52 • zones. Long-term climate sensitivity has been calibrated to 3 • C. The atmosphere component considers radiation balance, heat and gas exchanges with other modules, and meridional transport of heat and water vapour between low-midlatitude and high latitude zones. The ocean component is 270 • wide and extends from the Equator to 70 • latitude. Both ocean sectors are divided into 55 layers with 100 m vertical resolution. Each layer is assigned an ocean sediment section, with width determined from observed ocean depth distributions. Sea ice and snow cover are diagnosed from estimated atmospheric temperature profiles. Circulation and mixing are prescribed, with values calibrated from observations as in the HILDA model (Shaffer and Sarmiento, 1995). The ocean sediment component considers calcium carbonate dissolution as well as oxic/anoxic organic matter remineralization. The land biosphere component includes leaves, wood, litter and soil. For this experiment, it has been modified to include prescribed land-use change carbon losses, distributed in proportion to the initial inventory sizes of the module components. With this change, the model CO 2 fertilization factor, originally 0.65, has been recalibrated to 0.37. Finally, the lithosphere component considers outgassing and climate-dependent weathering of carbonate and silicate rocks, as well as rocks containing old organic carbon and phosphorus.
-FA: FAMOUS (Smith et al., 2008) is a low resolution AOGCM with no flux adjustments, based on the widely used HadCM3 climate model (Gordon et al., 2000). The atmosphere component is the primitive equation model HadAM3 (Pope et al., 2000), with resolution 5 • × 7.5 • and 11 vertical levels. It uses an Eulerian advection scheme, with a gravity-wave drag parameterization. Radiative transfer is modelled using six shortwave bands and eight longwave bands, while convection follows a mass-flux scheme, with parameterizations of convective downdrafts and momentum transport. Some of the parameter values in HadAM3 which are poorly constrained by observations have been systematically tuned so that FAMOUS produces a climate more like that of HadCM3 (Jones et al., 2005;Smith et al., 2008). FA-MOUS uses a coastal tiling scheme which combines the properties of land and sea in coastal grid boxes in the atmosphere model. The ocean component is HadOM3 (Gordon et al., 2000;Cox, 1984). The resolution is 2.5 • × 3.75 • , with 20 vertical levels. It is a rigid lid model, where surface freshwater fluxes are converted to virtual tracer fluxes via local surface tracer values. HadOM3 uses isopycnal mixing and thickness diffusion schemes with a separate surface mixed layer, while diapycnal mixing of momentum below the mixed layer is parameterized using a Richardson-number dependent scheme. The momentum equations are slowed by a factor of 12 with Fourier filtering applied at high latitudes to smooth instabilities. Outflow from the Mediterranean is parameterized by simple mixing between an area in the Atlantic and an area in the Mediterranean from the surface to a depth of 1300 m. Iceland has been removed to facilitate ocean heat transport, and an artificial island is used at the North Pole to alleviate the problem of converging meridians. The sea-ice component uses simple, zero-layer thermodynamics, which is advected with the surface ocean currents. Land processes are modelled via the MOSES1 land surface scheme (Cox et al., 1999).
-GE: The GENIE-1 physical model comprises the 3-D frictional geostrophic ocean model GOLDSTEIN, at 10 • × (3-19) • horizontal resolution with 16 vertical levels, coupled to a 2-D energy moisture balance atmosphere and a thermodynamic-dynamic seaice model (Edwards and Marsh, 2005). Recent developments (Marsh et al., 2011) include the incorporation of stratification-dependent mixing, a more general equation of state through a parameterization of thermo-baricity, and improvements to the representation of fixed wind forcing. The land surface component is ENTS, a dynamic model of terrestrial carbon storage (Williamson et al., 2006) with a relatively simple implementation of spatiotemporal land-use change . Ocean chemistry is modelled with BIOGEM , including iron limitation , and is coupled to the sediment model SEDGEM with fixed weathering, diagnosed during the model spin-up to simulated observed ocean alkalinity . All GENIE results are derived from ensembles applying the same 20-member parameter set. The selected parameters were filtered from a 100-member, 28-parameter pre-calibrated ensemble, constrained for plausible present-day CO 2 concentrations.
-I2: The MIT-IGSM2.2  is an Earth system model of intermediate complexity. The atmospheric component is a zonally averaged primitive equation model (Sokolov and Stone, 1998) with 4 • latitudinal resolution and 11 vertical levels. Each zonal band can consist of land, land ice, ocean, and sea ice. Surface temperature, turbulent and radiative fluxes, and their derivatives are calculated over each type of surface. The ocean component is a mixed layer/seasonal thermocline model with 4 • × 5 • horizontal resolution. Heat mixing into the deep ocean is parameterized through diffusion of the temperature anomaly at the bottom of the seasonal thermocline (Hansen et al., 1984). Embedded in the ocean model are a thermodynamic sea ice model and a carbon cycle model (Holian et al., 2001) al., 2008) for surface heat fluxes and hydrological processes, TEM (Melillo et al., 1993;Felzer et al., 2004) for carbon dynamics of terrestrial ecosystem, and NEM (Liu, 1996) for methane and nitrogen exchange. The coupled CLM/TEM/NEM model system represents the geographical distribution of land cover and plant diversity through a mosaic approach, in which all major land cover categories and plant functional types are considered over each cell, and are area-weighted to obtain aggregate fluxes and storages. A distinguishing feature of TEM is explicit interactions between the terrestrial carbon and nitrogen cycles . These simulated carbon/nitrogen interactions allow the model to consider the limiting effects of nitrogen availability on plant productivity and how changes in this availability from changing environmental conditions, such as warming  or the application of nitrogen fertilizers (Felzer et al., 2004), might influence future uptake and storage of carbon. For this study, the model also assumes that no nitrogen fertilizers were applied to croplands before 1950, but then after 1950, the proportion of fertilized croplands increased linearly until all croplands were assumed to be fertilized by 1990 and afterwards.
-IA: The IAP RAS climate model (Muryshev et al., 2009;Eliseev and Mokhov, 2011) employs a multilayer statistical-dynamical atmosphere component with a comprehensive radiation scheme, interactive cloudiness, and parameterized synoptic-scale fluxes. The ocean component is a primitive equation global circulation model developed at the Institute of Numerical Mathematics, Russian Academy of Sciences. The sea ice component uses a zero-layer thermodynamic scheme with two-level ice thickness distribution (level ice and leads). IAP RAS includes a comprehensive soil scheme with a high vertical resolution (Arzhanov et al., 2008), as well as a terrestrial and oceanic carbon cycle (Eliseev and Mokhov, 2011). Ice sheets are prescribed. The model does not use flux adjustments for coupling between the model components.
-LO: LOVECLIM 1.2 (Goosse et al., 2010) consists of components representing the atmosphere (ECBilt), the ocean and sea ice (CLIO), the terrestrial biosphere (VECODE), the oceanic carbon cycle (LOCH) and the Greenland and Antarctic ice sheets (AGISM). ECBilt is a quasi-geostrophic atmospheric model with 3 levels and a T21 horizontal resolution (Opsteegh et al., 1998). It includes simple parameterizations of the diabatic heating processes and an explicit representation of the hydrological cycle. Cloud cover is prescribed according to present-day climatology. CLIO is a primitive equation, free-surface ocean general circulation model coupled to a thermodynamic-dynamic sea ice model (Goosse and Fichefet, 1999). Its horizontal resolution is 3 • × 3 • with 20 levels in the ocean. VECODE is a reduced-form model of vegetation dynamics and of the terrestrial carbon cycle (Brovkin et al., 2002). It simulates the dynamics of two plant functional types (trees and grassland) at the same resolution as that of EC-Bilt. A potential fertilization of net primary production (NPP) by elevated atmospheric CO 2 is accounted for by a logarithmic dependence of NPP on CO 2 . LOCH is a comprehensive model of the oceanic carbon cycle (Mouchet and François, 1996). It takes into account both the solubility and biological pumps, and runs on the same grid as CLIO. Finally, AGISM is composed of a three-dimensional thermomechanical model of ice sheet flow, a viscoelastic bedrock model and a model of mass balance at the ice-atmosphere and ice-ocean interfaces (Huybrechts, 2002). For both ice sheets, calculations are made on a 10 km by 10 km resolution grid with 31 sigma levels. Note that LOCH and AGISM were not activated in the experiments conducted for this intercomparison.
-ME: MESMO version 1 (Matsumoto et al., 2008) is based on the C-GOLDSTEIN ocean model (Edwards and Marsh, 2005). It consists of a frictional geostrophic 3-D ocean circulation model coupled to a dynamicthermodynamic sea ice model and atmospheric model of energy and moisture balance. Ocean production is based on prognostic nutrient uptake kinetics of phosphate and nitrate with dependence on light, mixed layer depth, temperature, and biomass. Interior ocean ventilation is well calibrated against natural radiocarbon on centennial timescale and against transient anthropogenic tracers on decadal timescale. Here MESMO1 is coupled to a simple prognostic land biosphere model (Williamson et al., 2006) that calculates energy, moisture, and carbon exchanges between the land and the atmosphere. Prognostic variables include vegetation and soil carbon as well as land surface albedo and temperature.
-MI: MIROC-lite (Oka et al., 2011) consists of a vertically integrated energy moisture balance atmospheric model, an ocean general circulation model, a dynamic-thermodynamic sea ice model, and a singlelayer bucket land-surface model. All model components have 4 • × 4 • horizontal resolution, and the ocean has 35 vertical layers. In the atmosphere component, heat is transported via diffusion and moisture is transported via both advection and diffusion. Internal diagnosis of wind is switched off, and externally specified wind based on observations is used for the computation of moisture advection in the atmosphere and air-sea flux exchanges. To close the water budget, excess land water overflowing from the bucket model is redistributed homogeneously over the entire ocean grid. No explicit flux corrections for heat and water exchanges are applied. , 9, 1111-1140, 2013 www.clim-past.net/9/1111/2013/ 1133 -ML: MIROC-lite-LCM (Tachiiri et al., 2010) has the same physical components as MIROC-lite, except that equilibrium sensitivity is tuned for 3 • C, runoff water is returned to the nearest ocean grid, freshwater flux adjustment is used between the Pacific and the Atlantic, and internally diagnosed wind is used in the physical components. Additionally, the marine carbon cycle is represented with an NPZD model (Palmer and Totterdell, 2001) in which a fixed wind speed is used. Daily variability from a previous simulation of MIROC3.2 (Hasumi and Emori, 2004) is used to drive the annual cycle of the terrestrial vegetation model Sim-CYCLE (Ito and Oikawa, 2002). The vegetation model is coupled back to MIROC-lite-LCM on an annual basis. To decrease computational cost, the model has a coarser horizontal resolution of 6 • × 6 • with 15 vertical layers in the ocean.

Clim. Past
-SP: SPEEDO (Severijns and Hazeleger, 2009) is an intermediate complexity coupled climate model. The atmospheric component of SPEEDO is a modified version of the AGCM Speedy (Molteni, 2003;Kucharski and Molteni, 2003), having a horizontal spectral resolution of T30 with a horizontal Gaussian latitude-longitude grid (approximately 3 • resolution) and 8 vertical density levels. Simple parameterizations are included for largescale condensation, convection, radiation, clouds and vertical diffusion. The ocean component of SPEEDO is the CLIO model (Goosse and Fichefet, 1999). It has approximately a 3 • × 3 • horizontal resolution, with 20 vertical layers ranging from 10 to 750 m in depth. It includes the sea ice model LIM (Fichefet and Morales Maqueda, 1997). A convective adjustment scheme, increasing vertical diffusivity when the water column is unstably stratified, is implemented. SPEEDO also includes a simple land model, with three soil layers and up to two snow layers. The hydrological cycle is represented with the collection of precipitation in the main river basins and outflow in the ocean at specific positions. Freezing and melting of soil moisture is included.
-UM: The UMD Coupled Atmosphere-Biosphere-Ocean model (Zeng et al., 2004) is an Earth system model with simplified physical climate components including a global version of an atmospheric quasi-equilibrium tropical circulation model Zeng et al., 2000), a simple land model , and a slab mixed layer ocean model with Q flux to represent the effects of ocean dynamics (Hansen et al., 1984). The mixed layer ocean depth is the annual mean derived from Levitus et al. (2000). All models are run at 5.6 • × 3.7 • horizontal resolution, limited by the atmospheric component. The terrestrial carbon model VEGAS (Zeng, 2003;Zeng et al., 2004Zeng et al., , 2005 is a dynamic vegetation model with full soil carbon dynamics. Competition among four plant functional types is determined by climatic constraints and resource allocation strategy such as temperature tolerance and height-dependent shading. Phenology is simulated dynamically as the balance between growth and respiration/turnover, so whether a plant functional type is deciduous or evergreen is interactively determined. There are six soil carbon pools with varying temperature dependence of respiration: microbial, metabolic and structural litter; fast, intermediate, and slow soil. A three-box ocean carbon model including low latitude, high latitude, and deep ocean (Archer et al., 2000) is coupled to the terrestrial component through a fully mixed atmosphere. No ocean biology or sea ice is included in the model.
-UV: The UVic ESCM version 2.9 (Weaver et al., 2001;Eby et al., 2009) consists of a primitive equation, 3-D ocean general circulation model coupled to a dynamic-thermodynamic sea-ice model and an atmospheric energy-moisture balance model with dynamical feedbacks (Weaver et al., 2001). The model conserves heat, moisture, and carbon between components to machine precision without flux adjustments. The land surface and terrestrial vegetation components are represented by a simplified version of the Hadley Centre's MOSES land-surface scheme coupled to the dynamic vegetation model TRIFFID (Meissner et al., 2003). Land carbon fluxes are calculated within MOSES and are allocated to vegetation and soil carbon pools (Matthews et al., 2004). Ocean carbon is simulated by means of an OCMIP-type inorganic carboncycle model and a NPZD marine ecosystem model with two nutrients (PO 4 and NO 3 ), two phytoplankton classes, and prognostic denitrification (Schmittner et al., 2008). Sediment processes are represented using an oxic-only model of sediment respiration (Archer, 1996). Terrestrial weathering is diagnosed from the net sediment flux during spin-up and held fixed at the equilibrium pre-industrial value for transient simulations. The model was spun up with boundary conditions from the year 1800 for more than 10 000 yr.

Forcing descriptions
The recommended externally specified forcings for the historical simulations follow the PMIP3 last millennium and CMIP5 RCP experimental protocols. Unlike the PMIP3 protocol only one forcing dataset was recommended. Models that were not able to incorporate the recommended forcing were encouraged to use the equivalent radiative forcing estimates from the Task Group: RCP Concentrations Calculation and Data (Meinshausen et al., 2011).
The recommended forcings were applied over the last millennium for the appropriate individually forced and allforcing simulations. Since most EMICs are not able to simulate the warming from black carbon, the indirect effect of ozone, or the cooling from the indirect effect of sulphate aerosols, these forcings were excluded. The excluded forcings are all highly uncertain and in general tend to cancel out, and so no extra net external forcing was specified in order to compensate for their exclusion (Meinshausen et al., 2011). A description of the recommended forcing and any model deviations from the experimental protocol are as follows.
-CO 2 forcing is from the recommended PMIP3 (years 850 to 1800) and the CMIP5 RCP (years 1765 to 2005) historical CO 2 concentrations (Schmidt et al., 2012;Meinshausen et al., 2011). The annual mean values were linearly blended between years 1765 and 1800 and converted to radiative forcing as in the AR4 (Forster et al., 2007). All models applied this forcing through changes in their radiative transfer schemes. Individual modelling groups supplied their own estimates of any emissions needed for the free CO 2 experiments.
-Land-use forcing is from the recommended PMIP3 data of Pongratz et al. (2008), between years 850 and 1699, and the CMIP5 historical RCP land-use dataset of Hurtt et al. (2011), between years 1500 and 2005. Only the changes in the annual mean area of crop and pasture were used. The datasets were linearly blended between years 1500 and 1699. The B3, DC, GE, I2, and UV models apply the forcing through changes in the local surface albedo, hydrology and as internally calculated carbon fluxes due to vegetation cover changes. The IA and LO model applied land-use forcing only as changes in surface albedo and hydrology. Only the equivalent radiative forcing from the Task Group: RCP Concentrations Calculation and Data, due to albedo changes, was applied as a change in net top-of-atmosphere (TOA) shortwave radiation by the C2, C3, MI, ME and UM models.
-Non-CO 2 greenhouse gas forcing is from the recommended PMIP3 (years 850 to 1800) and the CMIP5 (years 1765 to 2005) historical RCP non-CO 2 greenhouse gas concentrations (Schmidt et al., 2012;Meinshausen et al., 2011) converted to an aggregated radiative forcing as in the AR4 (Forster et al., 2007). The annual mean data were linearly blended between 1765 and 1800. All models applied non-CO 2 greenhouse gas forcing through changes in their radiative transfer schemes.
-Orbital forcing followed Berger (1978) and changed only between years 850 and 2005. Orbital parameters are fixed at their year 2005 values, after 2005. Orbital changes are applied as changes in the TOA incoming solar radiation. The IA, ME and UM models did not include changes in orbital forcing in any simulations.
-Solar irradiance forcing is from the PMIP3 recommended datasets of Delaygue and Bard (2009), between years 850 and 1609, and the CMIP5 forcing of Wang et al. (2005), between years 1610 and 2008. These datasets include annual variation from both the solar cycle and background solar irradiance. Since the solar irradiance from Wang et al. (2005) dataset is very close to that from Delaygue and Bard (2009) at year 1610, the datasets were spliced at this year. The spin-up used a constant average of the first solar cycle (after year 850) of 1365.76 W m −2 . After year 2008, a repeating solar cycle 23 was added. All models applied changes in solar irradiance as changes in incoming shortwave.
-Sulphate aerosol forcing is from the CMIP5 historical RCP SO 4 concentration data (Lamarque et al., 2010) converted to optical depth. SO 4 concentrations are vertically integrated to determine the atmospheric sulphate aerosol burden (g m −2 ). The burden is then multiplied by a constant specific extinction cross-section factor of 8 m 2 g −1 to obtain the optical depth. Only the direct effect of sulphate forcing was to be included. The data are monthly, from January 1855 to December 2105. The B3, IA, I2, LO and UV models calculated the forcing internally from a change in surface albedo. The equivalent radiative forcing from the Task Group: RCP Concentrations Calculation and Data, due to albedo changes, was applied as a change in the net TOA shortwave radiation by the C2, C3, GE, MI and UM models. The GE model applies the equivalent radiative forcing as a perturbation to the outgoing longwave. The UM model specified the direct and indirect effects of sulphates while the ME model did not include any sulphate forcing.
-Volcanic forcing is from the PMIP-recommended Crowley et al. (2008) dataset. For most models aerosol optical depth (AOD) was globally averaged and converted to radiative forcing (RF) and applied as a reduction in net TOA shortwave. A simple linear conversion constant of −20 is used to convert AOD to RF for negative forcing less than 1.5 W m −2 and 2/3 of this conversion constant for the portion of the negative forcing greater than 1.5 W m −2 . The smaller conversion constant effectively reduces the radiative forcing for large eruptions. Peak forcing for the Tambora eruption is reduced to about 6 W m −2 as in Crowley (2000), but peak forcing for Krakatoa and Pinatubo is kept near the 3 W m −2 estimated in the AR4. This dataset has approximately 10day temporal resolution for the years 850 to 1998. This forcing has been made into an anomaly with a positive radiative forcing of approximately 0.207 W m −2 , when there was no negative volcanic forcing. Anomalous volcanic forcing during the spin-up, and after 1998, was set Clim. Past, 9, 1111-1140, 2013 www.clim-past.net/9/1111/2013/ to zero. The GE model applies volcanic radiative forcing as a perturbation to the outgoing longwave. The ML model used equivalent radiative forcing from the Task Group: RCP Concentrations Calculation and Data. All models specified volcanic forcing.