Large-scale temperature response to external forcing in simulations and reconstructions of the last millennium

. Understanding natural climate variability and its driving factors is crucial to assessing future climate change. Therefore, comparing proxy-based climate reconstructions with forcing factors as well as comparing these with paleoclimate model simulations is key to gaining insights into the relative roles of internal versus forced variability. A review of the state of modelling of the climate of the last millennium prior

Abstract. Understanding natural climate variability and its driving factors is crucial to assessing future climate change. Therefore, comparing proxy-based climate reconstructions with forcing factors as well as comparing these with paleoclimate model simulations is key to gaining insights into the relative roles of internal versus forced variability. A review of the state of modelling of the climate of the last millennium prior to the CMIP5-PMIP3 (Coupled Model Intercomparison Project Phase 5-Paleoclimate Modelling Intercomparison Project Phase 3) coordinated effort is presented and compared to the available temperature reconstructions. Simulations and reconstructions broadly agree on reproducing the major temperature changes and suggest an overall linear response to external forcing on multidecadal or longer timescales. Internal variability is found to have an important influence at hemispheric and global scales. The spatial distribution of simulated temperature changes during the transition from the Medieval Climate Anomaly to the Little Ice Age disagrees with that found in the reconstructions. Thus, either internal variability is a possible major player in shaping temperature changes through the millennium or the model simulations have problems realistically representing the response pattern to external forcing. A last millennium transient climate response (LMTCR) is defined to provide a quantitative framework for analysing the consistency between simulated and reconstructed climate. Beyond an overall agreement between simulated and reconstructed LMTCR ranges, this analysis is able to single out specific discrepancies between some reconstructions and the ensemble of simulations. The disagreement is found in the cases where the reconstructions show reduced covariability with external forcings or when they present high rates of temperature change.

Introduction
The level of understanding of the present climate state relies to a large extent on the analysis of instrumental data (e.g. Forster et al., 2007;Trenberth et al., 2007;Lemke et al., 2007) and of experiments with atmosphere-ocean general circulation models (AOGCMs; e.g. Randall et al., 2007;Meehl et al., 2007). Current climate conditions can be viewed as the result of different processes interacting at a range of timescales, many of which are longer than the length of the instrumental period (Peixoto and Oort, 1984;Houghton, 2005;Jansen et al., 2007). Such sources of variability may be related to internal dynamics (Delworth and Zeng, 2012) and feedbacks in the system, or may be a response to changes in natural or anthropogenic external forcings (Ottera et al., 2010). The limited time span of the instrumental period (e.g. Brohan et al., 2006;Lawrimore et al., 2011) makes it difficult to study mechanisms operating on long temporal scales and characterize the level of internal and forced variability of the system. The use of AOGCMs and the analysis of indirect (proxy) sources of climate information can add to the knowledge obtained from instrumental records alone (Jones et al., 2009).
The Late Holocene climate offers an immediate temporal context, similar enough to the present climate, against which the recent warming can be compared (Jansen et al., 2007). The availability of high resolution proxy data in comparison to earlier periods allows the development of reconstructions of the time evolution, and sometimes the past spatial distribution, of some important climate parameters as well as of past external forcing factors. The latter have been used, in turn, as boundary conditions for climate model simulations, many of them spanning at least the last millennium. Reconstructions and simulations are both subject to their own strengths and weaknesses since they are both affected by different sources of uncertainty.
Climate reconstructions are based on documentary observations as well as on geological and biological data that offer proxy information of past climate variability (Jones et al., 2009). These data are used in statistical models that are calibrated with instrumental data and subsequently used to provide an estimation of the past climatic evolution of a particular parameter of interest (North et al., 2006). Proxy data have different temporal resolutions and spatial coverages, are often biased to certain seasons, and show sensitivity to different climate parameters, as well as environmental factors not necessarily related to climate (Jones et al., , 2009. When used in multiproxy approaches that integrate local or regional information from different sources and areas, all these factors may contribute to larger uncertainties and noise. Climate reconstructions have targeted different spatial scales, from local, regional and continental (e.g. Jones et al., 2009;Luterbacher et al., 2004;Büntgen et al., 2008;Garcia-Herrera et al., 2008) to hemispheric and global (e.g. Briffa et al., 1998;Mann et al., 2008). Although most of these stud-ies have focused on the reconstruction of past temperature and precipitation, a considerable number of them have also explored atmospheric circulation patterns and indices (e.g. Luterbacher et al., 2002;Trouet et al., 2009). The integration of local and/or regional proxy information into large-scale, hemispheric or global reconstructions is performed with a variety of methodological approaches, from simple compositing and scaling of local/regional series (e.g. Hegerl et al., 2007a;Mann et al., 2008;Ljungqvist, 2010) and regressionbased approaches of various levels of complexity (e.g. Tingley and Li, 2012;Luterbacher et al., 2004; to Bayesian methods (Tingley and Huybers, 2010;Li et al., 2010;Werner et al., 2012). Most of these methods are prone to various degrees of variance loss that can affect the amplitude of low-frequency variability. This loss can arise, among other factors, from variance underestimation implicit in regression methods, contribution of low-frequency nonclimatic noise from proxies that perturbs the calibration process, low climate signal to noise ratios in proxies or spatial underrepresentation (e.g. Buerger and Cubasch, 2005;Buerger et al., 2006;Juckes et al., 2007;Christiansen et al., 2009;Smerdon, 2012). This uncertainty adds to the previous factors inherent to each proxy source and will affect assessments involving comparisons of climate reconstructions and simulations.
Numerical simulations of the climate over the last millennium have used models of varying complexity, from energy balance models (EBMs; e.g. Crowley, 2000;Hegerl et al., 2006) and Earth system Models of Intermediate Complexity (EMIC; e.g. Bauer et al., 2004;Goosse et al., 2005) to comprehensive AOGCMs (e.g. González-Rouco et al., 2003;Servonnat et al., 2010;Swingedouw et al., 2010) or Earth system models, which include a more realistic representation of some system components, such as a dynamic vegetation or a carbon cycle component (e.g. Jungclaus et al., 2010). Such simulations contribute to the understanding of the climate of the last millennium and may have implications for estimations of future climate change. Some examples are a better understanding of the analysis of the response to natural and anthropogenic external forcings and the mechanisms involved (e.g. Zorita et al., 2005;Goosse et al., 2006b;, a validation of the simulated signal of the external forcing through comparison with climate reconstructions (e.g. Crowley, 2000;Bauer et al., 2003;Hegerl et al., 2011), a narrowing of the ranges of climate sensitivity estimates , and the use of the simulated climate as a pseudo-reality to validate the statistical methodologies applied in proxy-based climate reconstructions (see Smerdon, 2012, and references therein).
AOGCM simulations provide the most comprehensive and exhaustive representation of the climate system, but they also contain errors arising from simplification due to computational constraints and the limited knowledge of the climate system and external forcings. These sources of error also contribute to model uncertainty in projections of future climate change (Meehl et al., 2007;Randall et al., 2007). The slightly different approaches to simulating atmosphereocean dynamics and different schemes for unresolved physical processes such as cloud feedbacks in the atmosphere (e.g. Soden and Held, 2006) are major contributors to model uncertainty. Additionally, the limitations in the representation of the various climate subsystems like ice sheets, permafrost, land surface processes, convection parametrizations, etc., all contribute to each model having different climate sensitivities. Further, uncertainties in the estimation of the evolution of forcing factors during the last millennium add to the uncertainty of the model response. Forcing factors such as changes in orbital parameters are well known (e.g. Berger and Loutre, 1991;Laskar et al., 2004), and this is arguably the case also of changes in the concentrations of greenhouse gases (e.g. Joos and Spahni, 2008). Other forcings, such as solar variability, the effect of volcanic eruptions and land use, are comparatively more uncertain (Schmidt et al., 2011(Schmidt et al., , 2012. Thus, different simulation of the climate of the last millennium have used different forcing specifications as new and improved estimates became available. Also depending on the ability of models to account for given forcings in their code, the implementation of external forcings has been modeldependent (see Sect. 3). Within this context, the present paper attempts to take advantage of this model and forcing diversity to explore the range of simulated temperature for the last millennium and the sensitivity of models to external forcing.
In spite of existing uncertainties, assessing the consistency between climate reconstructions and simulations seems pertinent given that AOGCMs are the main tools for producing projections of future climate change (Meehl et al., 2007). The comparison between both approaches offers one of the few possibilities for climate model evaluation in periods of time before the instrumental era Braconnot et al., 2012). These exercises are also important because knowledge of past external forcings and the temperature response of the system is informative about the relative role of internal versus forced variability and ultimately about the Earth system energy balance (Crowley, 2000;Trenberth et al., 2009) and its climate sensitivity .
Assessments of consistency between climate reconstructions and simulations are not only burdened by the various sources of uncertainty discussed above, but also by the fact that both approaches target conceptually different representations of reality. While climate reconstructions aim to capture the precise evolution of a climate variable in the past, simulations provide a time evolution that is consistent with the physical equations and with the imposed initial and boundary conditions. In fact, different simulations performed with the same climate model generate different climate solutions (e.g. González-Rouco et al., 2009) when started from different initial conditions (Lorenz, 1963). Therefore, an ensemble of simulations made using identical boundary specifications (external forcing) and performed either with different models or with the same model starting from different initial condi-tions would only be comparable in those aspects that relate to the forced model response. Accordingly, climate simulations and reconstructions will only be correlated in the aspects that are driven by the the forced response of the system and to the extent that the current estimations of external forcing used to drive the model experiments are reliable.
An additional important point in model-data comparison relates to the spatial aggregation of proxy and AOGCM information. This rationale relates to the fact that AOGCMs show highest skill on large scales (von Storch, 1995(von Storch, , 2004 while proxy-based reconstructions often target local and regional scales. Strategies to circumvent this problem may be derived based on upscaling (e.g. Jones and Widmann, 2003), downscaling (e.g. Wagner et al., 2007) or forward modelling of proxy variables (e.g. Evans et al., 2006;Ohlwein and Wahl, 2012;Baker et al., 2012;González-Rouco et al., 2009). Alternatively, considering hemispheric and global scales, where the influence of internal variability is minimized by spatial averaging, constitutes a sound basis for the comparison of simulations and reconstructions, and optimizes the links to external forcings.
Several authors have presented assessments of consistency between simulated and reconstructed last millennium Northern Hemisphere (NH) temperature changes (Jones and Mann, 2004;Mann, 2007). The most exhaustive comparison of simulations and reconstruction uncertainties is provided in Jansen et al. (2007), who report an overall qualitative agreement of simulated and reconstructed climate in reproducing the major changes in the history of the last millennium, such as the relatively warm Medieval Climate Anomaly (MCA), the subsequent Little Ice Age (LIA) and the temperature rise during the industrial period. Changes associated with major volcanic eruptions and some anomalous solar activity intervals could be identified in the simulated and reconstructed climate.
This work presents a review of the current state of climate simulation and reconstruction of the last millennium, updating the assessment of Jansen et al. (2007) and elaborating on it. Jansen et al. (2007) gave account of the progress achieved since the previous IPCC report (Houghton et al., 2001) involving new reconstructions of last millennium NH temperatures and presenting a more complete evaluation of uncertainties. However, the consistency of the available reconstructions could be examined only using a few AOGCM experiments (González-Rouco et al., 2003Tett et al., 2007;Ammann et al., 2003) and was mostly based on a set of EMIC simulations. We consider herein several new reconstructions (e.g. Ljungqvist, 2010;Mann et al., 2008) and AOGCM millennial simulations (e.g. Swingedouw et al., 2010;Jungclaus et al., 2010;Servonnat et al., 2010) that have become available since Jansen et al. (2007). These reconstructions have incorporated new methodologies, or new or updated proxy data sets. In turn, the new simulations have been produced with a variety of models that consider 396 L. Fernández-Donado et al.: Last millennium temperature response different sets of forcing factors and often incorporate different estimations of the variations in each forcing.
In the first part of the manuscript (Sect. 2) the models and experiments included in this review are described. The simulations included herein correspond to most experiments performed before the joint CMIP5-PMIP3 (Coupled Model Intercomparison Project Phase 5-Paleoclimate Modelling Intercomparison Project Phase 3) effort Taylor et al., 2012) in which external forcing configurations have been agreed upon as described in Schmidt et al. (2011Schmidt et al. ( , 2012. In this respect, Sect. 3 provides a catalogue that allows comparison of the forcing configurations used in recent AOGCM simulations of the last millennium and complements the sets of external CMIP5-PMIP3 forcings recommended in Schmidt et al. (2011Schmidt et al. ( , 2012. Section 4 presents an update of the current state in climate reconstructions and associated uncertainties at hemispheric and global scales. The simulated climate is compared with existing reconstructions in Sect. 5, examining their behaviour in both the time and frequency domains. We use multidecadal averages to maximize the impact of external forcing. The spatial characteristics of simulations and reconstructions are specifically considered in the MCA-LIA transition. Finally, an evaluation of the response of the climate system to forcing changes is provided using reconstructions and simulations of NH temperatures (Sect. 6). We compare this evaluation to estimations of climate response obtained from future climate transient simulations and from double CO 2 equilibrium experiments. This allows us to establish a simple metric to assess the consistency between simulations and reconstructions.
These simulations have been developed during the last decade and constitute the currently available AOGCM simulations for the last millennium, prior to the ongoing CMIP5-PMIP3 experiments (Schmidt et al., 2011(Schmidt et al., , 2012Braconnot et al., 2012). The ensemble has been built by incorporating all AOGCM experiments presented in Jansen et al. (2007), except for the Stendel et al. (2005) simulation of the last 500 years with the ECHAM4/OPYC3 model, plus all AOGCM runs developed between then and the beginning of the coordinated CMIP5-PMIP3 effort. The ensemble is therefore heterogeneous in terms of forcing configurations and initial conditions, since the simulations were produced with different AOGCMs. Further and most importantly, different ex-ternal forcing boundary conditions were used, depending on the institutions that carried out the simulations and successive updates of the forcing estimates that progressively became available (see Sect. 3). The variety of forcing factors and forcing reconstructions considered herein complement those used for the CMIP5-PMIP3 experiments, by considering some estimations not included in Schmidt et al. (2011Schmidt et al. ( , 2012. This will allow exploration of a larger spectrum of plausible scenarios for the last millennium and, in some instances, assessment of the degree to which the agreement between simulated and reconstructed responses is modified by different specifications of the same forcing. This analysis focuses on the last 12 centuries (800-2000 AD). A description of the general details for each simulation involved herein, including horizontal resolution, number of atmospheric and oceanic levels, the set of external forcings considered and the exact period of simulation, is included in Table 1. The shorter simulations (the 550-yr HadCM3 and CCSM3 runs) are used in some parts of this study, while the longer runs (the 8-kyr ECHO-G and the 2-kyr CSIRO simulations) are only considered after 800 AD.
Given the number of models involved in this analysis, an in-depth description of each one is outside the scope of this paper. The reader is guided to references in Table 1 for that purpose. Six out of the eight models are effectively different AOGCMs, whereas the ECHO-G and the CCSM1.4 are earlier versions of the EC5MP and CCSM3 models, respectively. Some of the simulations (CCSM3, CSM1.4, CNRM, HadCM3) have been corrected for climate drift by estimating long-term trends from available pre-industrial control runs.

External forcing factors
The sets and temporal evolution of external forcings applied vary among simulations, sometimes also among those produced with the same model (Table 1). The simulations may include different sets of forcing factors, and the time evolution of each forcing factor may also be different. While all simulations incorporate solar and greenhouse gas (GHG) forcing and most of them consider volcanic forcing (except for the IPSL, one ECHO-G and some CSIRO simulations), only some of them introduce anthropogenic aerosols (CSM1.4, EC5MP, CNRM, IPSL and HadCM3) and land use changes (EC5MP, CNRM and HadCM3). This variety of configurations allows exploration, to some extent, of the uncertainty stemming from our lack of robust knowledge about the past evolution of some of the forcing factors. The estimations of past Total Solar Irradiance (TSI) can be clustered in two groups, which assume a weak (ss) or strong (S) amplitude of variations, respectively, and this classification carries over to the ensemble of simulations (Sect. 3.1). An example of the latter are the eight EC5MP simulations that group into two sub-ensembles (EC5MP-E2 and EC5MP-E1) produced following comparatively stronger and weaker changes in TSI L. Fernández-Donado et al.: Last millennium temperature response 397 Table 1. Models and experiments considered for the analysis (column 1), the resolution and number of levels in their atmospheric (column 2) and oceanic (column 3) components, the set of external forcings considered in the experiment configuration (column 4), the period of simulation (column 5) and the original reference describing the experiments (column 6). Legend for external forcing: (S) solar forcing using stronger changes in amplitude (i.e. larger than 0.2 % TSI change from the LMM to the present); (ss) solar forcing using weaker changes in amplitude (i.e. lower than 0.1 % TSI change from the LMM to the present; (V) volcanic activity; (G) greenhouse gases; (A) anthropogenic aerosols; (L) land use changes and (O) orbital variations.

Model
Atmosphere  Servonnat et al. (2010) through the last millennium, respectively (Jungclaus et al., 2010, see Sect. 3.1). This section illustrates and compares the main differences between the various forcing estimations shown in Fig. 1. See also Table 2 for the original references of the source reconstructions used with each model for natural forcings and well-mixed GHGs. The text will be organized herein into natural (Sect. 3.1) and anthropogenic (Sect. 3.2) forcing factors. In the case of GHGs, these will be included in the group of anthropogenic forcings, even if most contributions to their global variability before 1850 AD may be of natural origin. The same exception applies to land use. Additionally, in Sects. 5 and 6, a total external forcing expressed in radiative forcing units is obtained for each model integrating all natural and anthropogenic contributions for the purpose of a better comparison among the various forcing configurations, simulations and reconstructions (Sect. 3.3).

Natural forcing
Solar irradiance changes can play a major role in forcing decadal to centennial variability through the last millennium (e.g. Crowley, 2000;Zorita et al., 2005). The amplitude of its variations is nowadays estimated to be much smaller (e.g. Lean et al., 2002;Foukal et al., 2004;Solanki and Krivova, 2004;Wang et al., 2005;Krivova et al., 2007;Gray et al., 2010) than previous published estimates (e.g. Hoyt and Schatten, 1993;Lean et al., 1995;Bard et al., 2000). Yet, a recent reconstruction (Shapiro et al., 2011) still endorses large background variations in irradiance (see discussion in Schmidt et al., 2011Schmidt et al., , 2012. Figure 1a shows the estimations for solar forcing used to drive the suite of simulations listed in Table 1. TSI anomalies are shown with respect to the 1500-1850 AD period for consistency with the figures discussed in Sect. 5. All scenarios agree in depicting higher irradiance values during the socalled Medieval Maximum (ca. 1100-1250 AD) and between two minima in the 11th (Oort minimum; 1010-1080 AD) and 13th (Wolf; ca. 1280-1350) centuries. The lowest irradiance during the millennium is reconstructed during the 15th to 17th centuries in the Spörer (ca. 1460 to 1550) and Maunder (ca. 1645 to 1715) minima, the last interval with low reconstructed TSI being the Dalton minimum (ca. 1790 to 1820). During the 19th and 20th centuries all models use irradiance values that are comparable in magnitude or higher than those of the Medieval Maximum. The various solar forcing scenarios group into two types, one involving TSI variations of comparatively larger amplitude through the last millennium (STSI hereafter; comprising the CCSM3, CNRM, CSM1.4, EC5MP-E2, ECHO-G, HadCM3 and IPSL runs) and one involving changes of comparatively weaker amplitude (ssTSI hereafter; comprising the CSIRO and EC5MP-E1 runs). This is quantified in Table 3 where the percentage of TSI change between three key periods of the last millennium is provided: the interval of highest TSI during the Medieval Maximum (1130-1160 AD), the Late Maunder Minimum (LMM, 1680-1710 AD) and the late 20th century (1960( -1990. The STSI group clusters with an increase in TSI larger than 0.23 % from the LMM to the present and a decrease of more than 0.17 % from the Medieval Maximum to the LMM. The EC5MP and ECHO-G show the largest percentage of change in the transition from the LMM-present as is also evidenced in Fig. 1a. The ssTSI group displays a reduction of 0.04 % during the Medieval to LMM transition and an increase of less than 0.09 % from the LMM to the present. The coherent evolution within the STSI solar forcing stems from the use of a single record of production rates of the cosmogenic isotope 10 Be in Antarctic ice cores from Bard et al. (2000). The CSM1.4 and the EC5MP-E2 ensemble use the original values provided by Bard et al. (2000) (note that series overlap in Fig. 1a) and do not include estimations of the 11-yr solar cycle. In turn, the CCSM3, CNRM, ECHO-G, HadCM3 and IPSL simulations use a version of the Bard et al. (2000) record spliced by Crowley (2000) to a reconstruction of TSI (Lean et al., 1995) based on the sunspot record of solar activity since 1610 AD. Therefore, all these records include an estimate of the 11-yr solar cycle since this date. The slightly different forcings adopted by the various AOGCMs are due to different calibration of the net radiative forcing data provided by Crowley (2000) to the original TSI values of Lean et al. (1995).
Within the ssTSI group, the EC5MP-E1 simulations use TSI sunspot-based reconstructions since 1610 (Krivova et al., 2007) spliced to records of 14 C isotope concentrations in tree rings Usoskin et al., 2007;  Reconstructions of natural forcing and well-mixed GHG reconstructions applied to each model in Table 1. Key for labels: Amma03 , Bard00 (Bard et al., 2000), Batt96 (Battle et al., 1996), Blun95 (Blunier et al., 1995), Crow00 (Crowley, 2000), Crow03 (Crowley et al., 2003), Crow08 (Crowley et al., 2008), Crow12 (Crowley and Unterman, 2012), Ether96 (Etheridge et al., 1996), Ether98 (Etheridge et al., 1998), Fluck02 (Fluckiger et al., 2002), Gao08 (Gao et al., 2008), Johns03 (Johns et al., 2003), Kriv07 (Krivova et al., 2007), Lean95 (Lean et al., 1995), Marl03 (Marland et al., 2003), McFM06 (MacFarling Meure et al., 2006), Stein09 (Steinhilber et al., 2009). Lean95   Table 3. Percentage of TSI change between the period with highest TSI values (1130-1160) in the Medieval Maximum and the LMM, and from the LMM to the late 20th century. Percentages are calculated with reference to the LMM average. Note: a 0.25 % change between the LMM and late 20th century is equivalent to a variation of the TSI between the two periods of ∼ 3.4 W m 2 . Solanki, 2008). The reconstructed 11-yr cycle is extended before the 17th century by artificially superimposing the average 11-yr solar cycle between 1700 AD and the present. The CSIRO simulations use a 10 Be-based reconstruction by Steinhilber et al. (2009) with no 11-yr cycle. None of the simulations consider stratospheric photochemistry and ozone interactions (Shindell et al., 2001). Estimations of variability in solar wavelengths  are also not included. Figure 1a also shows the TSI reconstructions suggested by Schmidt et al. (2011) to serve as boundary conditions for the CMIP5-PMIP3 last millennium simulations. Figure 1a shows additionally the recent reconstruction of Shapiro et al. (2011) that estimates TSI changes of larger amplitude than any of the reconstructions discussed above (see Table 3). Such variability is difficult to reconcile with the solar forcing estimations in Schmidt et al. (2011) and with comparisons of climate reconstructions with simulations using the Climber-3α EMIC driven by the Shapiro et al. (2011) estimates (Feulner, 2011). Nevertheless, this solar forcing reconstruction may be useful for sensitivity modelling experiments (see Schmidt et al., 2012). Figure 1a also shows the mean value of TSI for each reconstruction calculated within the reference interval 1500-1850 AD. These vary from ∼ 1362 to 1369 W m −2 . The range of average TSI values is defined by IPSL and CNRM models in the lower and upper limit, respectively, both using identical TSI anomaly changes during the millennium. For the current study, however, the difference in TSI mean values between simulations is not expected to have an influence on the simulated climate evolution during the last millennium.
www.clim-past.net/9/393/2013/ Clim. Past, 9, 393-421, 2013 Table 4. Climate reconstructions used in this work. For each record, the temporal extension (column 2) and spatial coverage (column 3) are shown. Column 4 indicates the spatial scale (NH, SH, GLB) that the reconstruction was used for in Fig. 3. For the NH reconstructions, column 5 indicates if a reconstruction was used in Jansen et al. (2007), or if it is a new record. If the reconstruction substitutes a record used in Jansen et al. (2007) , then the reference of the replaced record is indicated. Notes: resolution is annual for all reconstructions except for Ljungqvist (2010), which is provided in decadal values; in the case of Mann et al. (2008) two reconstructions, the CPS and the EIV method, are considered. ( * ) In AR4 Pollack and Smerdon (2004) was considered. Instead, the reconstruction provided in Huang (2004) has been selected herein because it includes high-frequency variability that will be useful for the analysis in Sect. 6.

Reconstruction Period Region
Target spatial scale

AR4
Ammann and Wahl (2007)  Orbital forcing is globally small during the last millennium but potentially important for seasonality changes at high latitudes (Kaufman et al., 2009). Only CSIRO, IPSL, HadCM3 and one of the ECHO-G simulations include these changes following Berger (1978), and Laskar et al. (2004) in the case of the IPSL model. EC5MP follows Bretagnon and Francou (1988) for orbital changes and additionally considers nutation.
Volcanic forcing is included in all simulations except for the IPSL, the 8000-yr ECHO-G run and 3 simulations of the CSIRO model (Table 1). The various estimations of volcanic forcing are shown in Fig. 1b, in radiative forcing units. CCSM3 and CNRM originally incorporate this forcing in aerosol optical depth values, and their conversion into radiative forcing units has been done following Hansen et al. (2002), where a factor of −21 W m −2 is suggested for the conversion. This value is within the range of estimations made also by Wigley et al. (2005).
The reconstructions of stratospheric aerosols from volcanic eruptions are based on ice core data from Antarctica and Greenland. The derived time series of volcanic forcing tend to display consistent timing for major eruptions (Fig. 1b). However, they often present differences on the magnitudes of individual events. Our knowledge of volcanic forcing over the past millennium is poorly constrained, par-ticularly in regard to the strengths of individual eruptions (Forster et al., 2007;Schmidt et al., 2011).
The implementation of volcanic forcing into AOGCMs was done only by including the net effect of stratospheric volcanic aerosols on the global radiation balance (ECHO-G, CSIRO) or latitudinally resolved changes in optical depth in the stratosphere (EC5MP, HadCM3). These differences may have an impact on the climatic effects in the aftermath of volcanic eruptions on the atmospheric circulation, especially over the extratropical hemispheres during wintertime (e.g. Robock, 2000;Fischer et al., 2007). Although volcanic impacts are restricted to a few years, the temporal clustering of volcanic outbreaks may also have impacts beyond these time scales (see Sect. 5).
ECHO-G and CSIRO incorporate volcanic forcing following Crowley (2000) and Gao et al. (2008), respectively. HadCM3 uses also annual globally defined data updated from Crowley et al. (2003) and converted to monthly aersol depths assuming a Pinatubo optical-depth time decay. CCSM3, CSM1.4 and CNRM incorporate latitudinal distributions of aerosols following Ammann et al. (2003), albeit with different parametrizations and scalings. EC5MP uses time series of aerosol optical depth at 0.55 µm and of effective radius (Crowley et al., 2008;Crowley and Unterman, 2012). See original references in Tables 1 and 2 for details on aerosol load and forcing conversions and on parametrizations. The CMIP5-PMIP3 volcanic forcing standards for last millennium simulations will rely on the most recent reconstructions (Crowley et al., 2008;Gao et al., 2008;Crowley and Unterman, 2012). Comparison and details are given by Schmidt et al. (2011).

Anthropogenic forcing
Estimates of the concentration changes of the main wellmixed GHGs (CO 2 , CH 4 and N 2 O) are obtained from Antarctic ice cores (Forster et al., 2007;Joos and Spahni, 2008). The records used to produce the simulations in Table 1 were selected according to the availability of data at the time of production of model experiments. The CO 2 concentrations were prescribed in each model (Fig. 1c) except for the EC5MP, which calculates it interactively (see Jungclaus et al., 2010). Figure 1d shows an estimation of the GHG radiative forcing obtained from the concentrations of the three GHGs in each model following Myhre et al. (1998). This allows the comparison of the total effect of CO 2 , CH 4 and N 2 O between different simulations, and later with other anthropogenic and natural forcings.
HadCM3 used estimated changes in GHGs only in the industrial period; values after 1750 AD were taken from Johns et al. (2003), and constant pre-industrial values were assumed before this date. The ECHO-G model used last millennium reconstructions from Etheridge et al. (1996) for CO 2 and from Etheridge et al. (1998) for CH 4 ; Battle et al. (1996) estimates for N 2 O were included after 1850 AD and assumed constant before. CSM1.4 and CCSM3 used reconstructions from Etheridge et al. (1996) for CO 2 , Blunier et al. (1995) for CH 4 , and Fluckiger et al. (2002) for N 2 O. All simulations from the last three models use different spline interpolations to obtain annual concentration values. In addition to the different origin of the data, the different interpolation approaches produce variability in the evolution of pre-industrial concentrations and forcings in Fig. 1c Blunier et al. (1995) and Fluckiger et al. (2002), respectively. Before this date the concentrations are kept constant. These pre-1850 AD concentration values are higher than those suggested by MacFarling Meure et al. (2006). Thus the CO 2 concentrations in the CNRM and IPSL simulations were lowered by about 5 ppmv to compensate for the relatively high CH 4 and N 2 O levels. This can be appreciated in the lower CO 2 of CNRM/IPSL in Fig. 1c, while in Fig. 1d the GHG forcing of CSIRO, CNRM and IPSL co-vary in phase.
CO 2 and GHGs forcing evolve very similarly for all simulations that prescribed GHG concentration values in Fig. 1c, d. Excluding arbitrary changes produced by spline interpolations, the multicentennial changes displayed by the various forcings are due to natural feedbacks from the ocean and terrestrial biosphere in response to variations in climate; additional effects of land cover change are also possible . The diagnosed ensemble averages of CO 2 concentrations simulated by EC5MP (Fig. 1c) are below the MacFarling Meure et al. (2006) observations in the 20th century. This discrepancy is arguably due to an underestimation of the emissions related to land use change, among other factors discussed in Jungclaus et al. (2010). The pre-industrial CO 2 concentration values show more variability in the E2 ensemble, albeit with changes of somewhat smaller magnitude than in the MacFarling Meure et al. (2006) reconstruction. The larger variability in the E2 (relative to the E1) ensemble may be related to its slightly larger temperature fluctuations (see Sect. 5), with higher values during the MCA and lower during the LIA. The smaller number of members in E2 (3) relative to E1 (5) may also have contributed to this effect, with less chances of cancelling out deviations associated with internal variability among ensemble members. The observed minimum in the 17th and 18th centuries is not reproduced.
Land use and land cover changes are considered by some of the simulations that incorporated information from several data sets available through time. Land use changes as reconstructed by Ramankutty and Foley (1999) are used in the CNRM simulation from 1700 AD onward. HadCM3 uses land surface data from Wilson and Henderson-Sellers (1985) modified with crop history from Ramankutty and Foley (1999) and pasture change data from Goldewijk (2001). The EC5MP ensembles use a reconstruction of global agricultural areas and land cover from Pongratz et al. (2008). The latter is recommended for use in CMIP5-PMIP3 simulations (Schmidt et al., 2011) together with newly available reconstructions Goldewijk et al., 2011). Details regarding the inclusion of anthropogenic sulphate aerosols (CNRM, CSM1.4, HadCM3, IPSL) and halocarbons in the simulations are not provided here; the reader is addressed to the original references for more information. Figure 1e shows the equivalent anthropogenic forcing for all the models integrating the available forcing data of GHGs, anthropogenic sulfate aerosols and land use changes. Aerosol forcing data were available from the CNRM and IPSL experiments in which it was prescribed, but not for the CSM1.4, HadCM3, and EC5MP. For the latter, an aerosol-only sensitivity experiment that would allow assessment of the effects of aerosol parameterizations involved in each model does not exist. Instead, we used values from Forster et al. (2007) to provide an estimation of the potential effect of the forcing in these simulations. This approximation does not take into account either the original sulphate mass loading or any of the physics involved in the model parametrizations and therefore is not accurate in representing the actual forcing in the simulations. However, it is arguably more realistic than excluding the effect of anthropogenic aerosols when estimating total anthropogenic forcing in these simulations. A note of consistency is provided by the aerosol forcing calculated for the www.clim-past.net/9/393/2013/ Clim. Past, 9, 393-421, 2013 NCAR model, which is well within the range of the values estimated herein for the CSM1.4 model . Land use changes are considered through the whole period of interest in Fig. 1e in the EC5MP simulations. The radiative forcing associated with these land use changes was calculated off-line using the radiative code of the ECHAM5, thus including only the albedo effect (Jungclaus et al., 2010); it causes a long-term cooling that adds to that of aerosol forcing (Pongratz et al., 2009. The land use forcing used in the HadCM3 and the CNRM simulations after the 18th century was not available and hence not considered in subsequent analyses, even when the effect of this forcing during the 19th and 20th century may still be non-negligible (Bauer et al., 2003).
On the basis of the previous description, the total anthropogenic forcing illustrated in Fig. 1e accurately represents the actual forcings used in the model simulations over the whole millennium for the CCSM3, CSIRO, ECHO-G and IPSL, and for the CNRM, CSM1.4, EC5MP and HadCM3 until the 19th century; thereafter, these forcing estimations are subjected to the approximations and limitations described above (dashed lines in Fig. 1e).
During pre-industrial times all simulations where CO 2 concentration was prescribed display a very similar evolution of the total anthropogenic forcing. EC5MP shows less low-frequency variability during this period. In the 20th century the models in which the only anthropogenic forcings are GHGs (CCSM3, CSIRO and ECHO-G) are also, as expected, the ones showing the largest increase in total forcing. According to the approximations shown in Fig. 1e, the other simulations gradually decrease, with the EC5MP anthropogenic forcing being the lowest during the 20th century. The resulting temperature response in each model will be built upon the balance of this anthropogenic effect on forcing, the effect of natural forcings displayed in Fig. 1a, b and the climate sensitivity of each model.

Total external forcing
The addition of the natural and anthropogenic forcings considered in Sects. 3.1 and 3.2 builds a total external forcing (TEF hereafter) for each model as shown in Fig. 1f-h. The use of TEF helps us to better understand the temperature response of the models described in Sect. 5 and the assessment of climate sensitivity developed in Sect. 6. Figure 1f shows the sum of anthropogenic forcing in Fig. 1e and solar forcing, thus excluding the volcanic contribution. Figure 1g shows TEF by adding also the effect of volcanic forcing. For comparison with the analysis in the following sections, Fig. 1h shows 31-yr filtered outputs of TEF expressed as anomalies relative to 1500-1850 AD. Forcing changes in the pre-industrial period are dominated by solar and volcanic activity. The comparison of the IPSL TEF, for which volcanic forcing is not included, with that of the other models (see Fig. 1h) serves as an illustration of the non- negligible effect of volcanoes at low frequencies. This effect is noticeable for instance in the 12th, 13th, 15th and 19th centuries, during which decreases in TEF are produced during times of recurrent large volcanic events, sometimes also coinciding with minima in solar forcing. Note also the increase of low-frequency modulation in the EC5MP-E1 and CSIRO forcing due to volcanic activity in comparison to Fig. 1f. In the 20th century, the distribution of forcing trends is similar to that in Fig. 1e, with the exception of the EC5MP-E1 ensemble that is weighted down relative to EC5MP-E2 in which the low-frequency variability of solar forcing is larger. We examine the normalized spectra of the forcing time series in order to identify the forcing signatures in the frequency domain. The relative distribution of variance in the L. Fernández-Donado et al.: Last millennium temperature response 403 frequency domain is displayed for several of the forcing combinations in Fig. 1. Figure 2a shows spectra for the various solar forcing configurations (Fig. 1a). The estimation of these spectra are somewhat burdened by the fact that solar forcing specifications do not have variability at high frequencies, which produces Gibbs oscillations in this part of the spectrum (grey lines, Bloomfield, 1976) and precludes for this case a clear comparison of the relative contribution to total variance of low and high frequencies. However, this is useful as an illustration of the relative importance of the variance accumulation at decadal timescales, and for comparison with other forcings in Fig. 2. CSM1.4, CSIRO and EC5MP-E2 forcings do not include an 11-yr solar cycle signal and thus do not show any contribution to variance centred around the 11-yr band. Their spectra suffer from Gibbs oscillations below the 20-yr timescale. This problem is reduced in the other simulations that do consider an 11-yr solar cycle. In the case of EC5MP-E1, this variability is imposed through the whole millennium, thus showing maximum variance at these frequencies. CNRM, ECHO-G, HadCM3 and IPSL use the 11-yr variability in the last few centuries (see Sect. 3.1). This is reflected in an overlap of their spectra at these timescales, albeit showing a smaller proportion of variance than EC5MP-E1. It is also interesting to note the relative increases of variability from the 20-to 40-yr timescales. Figure 2a can be compared with Fig. 2b which shows the sum of solar and anthropogenic forcings. Here, the proportion of variability accounted for by the 11-yr solar cycle is importantly diminished and only noticeable in the EC5MP-E1 case. The radiative forcing associated with the land use changes in the EC5MP introduces variability at high and low frequencies (Jungclaus et al., 2010; Fig. 1a), thereby contributing to avoidance of Gibbs oscillations in the EC5MP spectra of anthropogenic forcing (red line in Fig. 2b). Figure 2c shows spectra for TEF (Fig. 1g). Two features are prominent. Firstly, the relative contribution of the 11-yr solar cycle has been greatly diminished in all cases. This suggests that only a small simulated global or hemispheric signal, at this frequency, is to be expected in the model response for the last millennium. Additionally, the proportion of variance from interannual to multidecadal timescales, up to 40-yr periods, is increased in comparison to Fig. 2a, b. This is arguably due to the multidecadal variability associated with the occurrence of large volcanoes. An exception here is the IPSL case, which does not include volcanic forcing and serves as a reference that can be compared with the other spectral curves. For longer timescales, the simulations that show relatively lower contributions to variance are the ssTSI group (EC5MP-E1 and CSIRO) and the CNRM forcing. For CNRM this may be due to the relatively higher proportion of high-frequency variability due to volcanic activity (this is the simulation with highest volcanic forcing in Fig. 1b) and the somewhat intermediate trends in TEF amplitude (Fig. 1g, h).

Hemispheric and global reconstructed temperatures
This section presents the set of hemispheric and global reconstructions considered in this study to update and discuss new evidence since Jansen et al. (2007). These reconstructions (Table 4) will be compared with the simulations and forcing estimates in the following sections. The criteria for including a reconstruction in Table 4 was that it used a new methodological approach or new data relative to existing ones. Some reconstructions that were considered to be superseded by new versions were not included in the data set. This is the case, for instance, of the Mann et al. (1999) reconstruction that was omitted in favour of an improved version provided by Ammann and Wahl (2007), or the case of Esper et al. (2002) which has been superseded by Frank et al. (2007). All reconstructions in Table 4 have a minimum length of four centuries and in some cases start well before 800 AD, the beginning of the simulations considered here (column 2). The time resolutions are annual for all reconstructions except for Ljungqvist (2010), which has decadal resolution. Even if records have annual resolution, this may not represent the real time resolution, for instance, in the case of borehole data (Huang et al., 2000) that provide information on multicentennial trends. This also occurs with some other reconstructions in the table that show low variance on interannual timescales despite the data being provided at annual resolution Hegerl et al., 2007b;Loehle, 2007;Leclercq and Oerlemans, 2012;Christiansen and Ljungqvist, 2011). Seasonality biases may also be present, particularly in summer due to the important contribution of tree ring data (Jones et al., 2009). The group of reconstructions is heterogeneous not only from the time domain perspective. Different reconstructions use proxy information from different regions and were developed from different land and ocean spatial coverages (column 3). The majority of them target NH temperatures, but some of them aim to reconstruct Southern Hemisphere (SH) and/or Global (GLB) scale temperatures (column 4). This will allow the state of knowledge at these spatial scales to be illustrated and compared. Column 5 in Table 4 indicates whether a given record had already been considered in AR4 (Jansen et al., 2007).
The present ensemble includes 16 reconstructions for the NH, out of which 7 are new records and 3 are updates or improved versions of their AR4 counterparts (see Table 4). Therefore, this assessment provides new evidence that leans on new data, methods or updates that improve previous versions. Figure 3 shows the evolution of hemispheric and global temperature anomalies using 1850-1990 AD as the reference period. This interval encompasses most calibration periods used for the reconstructions. Therefore, the spread before the 19th century is actually a measure of uncertainty in our knowledge of past temperatures. The grey shading is the overlap among the ensemble of reconstructions, taking into account their uncertainties as in Jansen et al. (2007). For each year, the temperature axis is divided into 0.01 • C pixels that receive a score of 1 (2) if they lie within the range of the reconstructed temperatures ±1.645 (±1) standard deviation. The scores are summed over all reconstructions considered and scaled to range within 0 and 100 % of overlap. The resulting uncertainty distribution is the basis for the modeldata comparison in Sect. 5.1. Contributions to the spread stem not only from uncertainties in proxy data but also from various other sources, including (Jansen et al., 2007;Jones et al., 2009) the different reconstruction methods; the diversity of data and periods used in the calibration process; the different proxy data sets that in some cases overlap and in others bring information from various source regions, the land or marine character of the source locations; and the different seasonalities. These limitations should be kept in mind in the comparison with model simulations discussed below, as well as in the evaluation provided in Sect. 6. Figure 3a shows a qualitative agreement among the reconstructions. The display depicts a warm MCA followed by a colder LIA and a subsequently warmer instrumental period. Temperatures in the first half of the 20th century are comparable to those in the MCA, but instrumental measurements in the last decades of the millennium are above those of any reconstruction since the MCA. However, some differences with AR4 are caused by two reconstructions: Christiansen and Ljungqvist (2012a) and Loehle and McCulloch (2008). The extratropical NH reconstruction of Christiansen and Ljungqvist (2012a) (Fig. 5 in their paper) displays larger low-frequency amplitude changes than any other reconstruction in the ensemble, noticeably enlarging the spread during the MCA and mid-20th century. As in the case of Christiansen and Ljungqvist (2011), reconstructions using the LOC method aim to preserve lowfrequency variability, perhaps to the detriment of high frequencies (Christiansen, 2011). An overestimation of variability can not be ruled out, and some studies suggest these reconstructions may be taken as an estimation of maximum bounds for low-frequency amplitude changes during the last millennium (Tingley and Li, 2012;Moberg, 2012;Christiansen, 2012;Christiansen and Ljungqvist, 2012b). This reconstruction shows noticeably more variance at low frequencies not only in the pre-industrial period but also during the 20th century. The corrected non-tree ring reconstruction of Loehle and McCulloch (2008) shows noticeably larger temperature anomalies during medieval times than at present and reaches values larger than late 20th-century observations during the 9th century.
As a noticeable difference with AR4, this ensemble shows greater multicentennial variability. This is illustrated in Fig. 3b where the ensemble average for AR4 and for this study are compared. The new ensemble average in Fig. 3b shows larger differences between the MCA and the LIA. These differences, however, fall within the envelope of uncertainty and therefore cannot be regarded as a significantly Fig. 3. NH (a, b), SH (c) and GLB (d) temperature anomalies wrt 1850-1990 AD as provided from reconstructions (colours) listed in Table 4 and observations (Brohan et al., 2006). Panel (b) shows the average of the ensemble of reconstructions (solid) in (a) in comparison to that of Fig. 6.10 in AR4 (Jansen et al., 2007). The grey shaded areas in the background are the overlap of uncertainty calculated from the errors provided with each reconstruction and following Jansen et al. (2007). SH and GLB are shown for informative purposes. Note that the small number of reconstructions precludes a reliable estimation of ensemble uncertainties. All series are 31-yr moving average filtered outputs for comparison with Jansen et al. (2007). Anomalies are calculated wrt 1850-1990 AD. different evolution of last millennium temperatures from this perspective. However, the new ensemble average has 1.55 times more variance than the AR4 average (1.81 times more variance for the 31-yr filtered series shown in Fig. 3b). This increase is significant based on an F-test with a significance level α = 0.05 (also accounting for the loss of degrees of freedom in the filtered version). An important part of this enhancement of low-frequency variability is due to the contribution of the Christiansen and Ljungqvist (2012a) curve. If this is not included in the evaluation, raw data variances are significantly increased by a factor of 1.23 (α = 0.05), whereas the low-frequency amplification of variability (1.33) is significant only at the α = 0.21 level. Figure 3c, d show similar plots for the SH and GLB scales. Some notes of caution are pertinent. These panels are shown for the sake of illustration of the present stage of information at these spatial scales, but the estimation of their uncertainty bounds is limited by the very few reconstructions available. For the SH, the five existing records reflect very similar multicentennial trends, whereas multidecadal variability can be quite different among the records. It should be kept in mind that three of the records considered share identical or overlapping proxy information or use comparable methods as in the case of the CPS approach Mann et al., 2008). The evidence shown in Fig. 3c is suggestive of relatively higher temperatures before the 15th century, comparable to those at the beginning of the 20th century, and a colder interval spanning the period between the 15th and 19th centuries. The last decades show values above the reconstructions during the whole period.
At a global scale, only three records are available (Fig. 3d): a borehole reconstruction (Huang et al., 2000), a glacier record (Leclercq and Oerlemans, 2012), and a multiproxy reconstruction (Mann et al., 2008); only the latter of these reconstructions goes back in time beyond 1600 AD (Mann et al., 2008). The amplitude of changes over the last 400 yr is in broad agreement among the three records.

Hemispheric and global simulated temperatures: model-data consistency
This section provides an analysis of the simulated temperature response in comparison to the forcings described in Sect. 3. The response of the simulations listed in Table 1 is also compared to available information from the last millennium climate reconstructions presented in Sect. 4. Section 5.1 assesses the millennial temporal evolution in the reconstructions and simulations. Section 5.2 explores the spatial detail of the transition from the MCA to the LIA. Figure 4 shows hemispheric and global temperature anomalies with respect to the period 1500-1850 AD for the suite of  Table 1. The reason for this choice instead of the 1850-1990 AD period used for reconstructions (Fig. 3) is that during 1850-1990 AD the various simulations use different forcings (see Sect. 3), and therefore their trends during this period are not comparable and render this time interval as invalid as a common reference for model simulations. Additionally, the choice of a longer period (e.g. the whole millennium) is precluded by the fact that some of the simulations in Table 1 only span the last 500 yr. Thus, the choice of 1500-1850 AD as a reference secures a period of data availability for all the simulations and in which the forcings applied were similar. The grey background shading represents the spread of the reconstructions using this reference period and is calculated as in Fig. 3. Note, however, that the spread changes from Fig. 3 to Fig. 4. Due to the choice of a different reference period in this figure, the grey shading is somewhat narrower during the 1500-1850 AD interval and slightly wider over the rest of the millennium relative to what is shown in Fig. 3. Hence, care must be taken not to interpret the spread in Fig. 4 as a measure of the reconstruction ensemble uncertainty of past temperature changes. The shape of the spread can be used for the purpose of a qualitative comparison of agreement or disagreement between simulations and reconstructions. Figure 4a shows results for the NH. As in the case of the reconstructions, the simulations show the sequence of temperature stages discussed above, i.e. higher temperatures in the MCA and industrial period and a relative minimum in the LIA. For the sake of clarity, Fig. 4b shows the reconstruction ensemble average (Fig. 3b) plotted together with the average across all simulations available for each model with an identical forcing configuration. In spite of the relative differences among the inter-model forcing configurations, the trajectory of all simulations shows a high degree of similarity. Most of the simulations show minima during the Wolf, Spörer, Maunder and Dalton intervals, albeit modulated by the presence of volcanic activity for those simulations that incorporate it. The simulations showing less low-frequency variability during pre-industrial times are the ssTSI group, comprising the EC5MP-E1 and the CSIRO ensembles, whereas the STSI group shows larger changes in amplitude. Overall, the simulated trajectories follow closely the TEF in Fig. 1h. The distribution of warming trends in the last two centuries of the simulations follow also a similar arrangement in spite of the limitations discussed regarding the estimation of TEF. The largest temperature increases are simulated by the runs incorporating only GHGs and natural forcing, and decreasing according to the inclusion of additional factors (aerosols, land use; see Sect. 3) that contribute with negative forcing during this period. The 20th-century trends are, however, not solely a function of the applied external forcing but also of model sensitivity. As a mean of complementary information, Table 5 shows values of equilibrium climate sensitivity (Schneider et al., 1980) and transient climate response (Knutti et al., 2005) for the various models in Table 1. Both quantities serve  Tables 1 and Solomon et al. (2007). The conversion to • C/2 × CO 2 was done following Houghton et al. (2001)  as informative estimates of the general sensitivity of climate against external perturbations and will be used below and in Sect. 6. The evolution of simulated NH temperatures in preindustrial times is embedded within the area of larger probability defined by the reconstruction spread (Fig. 4a), although some differences may be noted and briefly discussed herein. Firstly, the areas of larger density in the reconstruction spread tend to show more low-frequency variability than the simulations. These areas are mostly a contribution of the reconstructions showing larger amplitude changes at multidecadal and centennial timescales (Fig. 3). The STSI ensemble seems to follow most closely the low-frequency changes in the reconstruction spread, while the ssTSI ensemble shows somewhat less low-frequency variability, particularly in the transition from the MCA to the LIA. It is difficult to ascertain whether this could indicate a problem on the side of the models or on the side of the reconstructions. If the latter were to be taken as a reliable estimate of the amplitude of past low-frequency variability, this would suggest that either models underestimate the real world sensitivity or that forcing changes in pre-industrial times in the ssTSI group (Fig. 1) are underestimated. Alternatively, low-frequency multidecadal and centennial changes produced by internal variability at hemispherical scales could optionally also account for the reconstructed spread. However, such low-frequency variability is not simulated in any of the available control runs (not shown) and would mean that AOGCMs underestimate internal variability at these timescales. Secondly, the largest differences between the reconstructions and the simulations take place in the 10th and 11th centuries, during which the simulations are well below the area of maximum probability of the reconstructions. The weak model response during this period is due to the relatively low TEF values (Fig. 1h) during the 10th and 11th centuries. Thus the discrepancy can essentially be established on the ground of differences between reconstructions and forcings. Although forcing factors over the last millennium are not perfectly constrained (e.g. Plummer et al., 2012;Schmidt et al., 2011), the quality of their reconstructions (Sect. 3) can be considered homogeneous through time, and thus it can be argued that this discrepancy points to a problem in the reconstructions during this period. For instance, one possibility is that climate reorganizations during medieval times would invalidate the proxy-instrumental relationships during this period (Seager et al., 2007;Graham et al., 2011;Diaz et al., 2011;Trouet et al., 2012), thereby affecting reconstruction quality. If this were the case, the implications would be relevant since this is the period showing the largest temperature anomalies in the last 1200 yr. Thirdly, the simulations showing the largest discrepancies with the reconstruction spread in the 20th century are the CCSM3, the ECHO-G and the EC5MP ensembles. CCSM3 and ECHO-G clearly suffer from not including aerosols and land use, and they overestimate the warming in comparison to the reconstructions. In turn, the EC5MP temperature increase is lower than in the reconstructions. The reasons for this are unknown since EC5MP shows, together with HadCM3 and IPSL, one of the highest transient climate responses in future scenario simulations (see Table 5). Arguably, the physics related to the treatment of aerosols or land use changes may exacerbate the related cooling in this model during the 20th century. Figure 4a still shows overall a very similar situation to Fig. 6.13 in AR4 (Jansen et al., 2007). Progress since then is related to the existence of a considerable number of AOGCM simulations compared to the ensemble in AR4, which comprised only a few AOGCMs and a few EMIC simulations. The ensemble in Fig. 4a shows considerably more variability at multidecadal timescales than the simulations used in AR4 did. This may be related to internal variability in AOGCMs being larger than in EMICs. The response at lower frequencies and the amplitude of changes from the MCA to the LIA is larger in the STSI simulations shown in Fig. 4a, b than in the EMIC simulations illustrated in Fig. 6.13 of AR4. Additionally, the qualitative comparison that can be derived from simulations and reconstructions in Fig. 4a, b and 6.13 in AR4 evidences that, irrespective of using AOGCMs or EMICs, the evolution of simulated changes during the last millennium is very similar and suggestive of a linear relation between NH temperatures and TEF applied in each model simulation. Section 6 will introduce a metric that will provide a more quantitative approach to model-data comparison. Figure 4c, d show the model-data comparison for the SH and GLB averages. Similar conclusions can be reached to those obtained from Fig. 4a, b. The simulations show less spread in the SH than in the NH. Also, trends in the 20th century are of smaller amplitude. This is consistent with observations and in general with the few reconstructions available. Therefore, the smaller temperature spread in the SH reconstructions (Fig. 3c) may arguably be a realistic feature and not a result of having a small number of reconstructions. The  (Fig. 4). Grey lines depict Gibbs oscillations in reconstructions missing high-frequency variability. The inset in (b) shows the normalized spectra of the NH instrumental data (Brohan et al., 2006). lower multicentennial variability can be related in the SH to the lower extent of land areas. The comparison of GLB between simulations and reconstructions shows as expected an intermediate situation to that of the NH and SH. It is noteworthy that in the 9th and 10th centuries the simulations, consistent with the TEF (Fig. 1h), are below the Mann et al. (2008) EIV reconstruction. Some final comments related to specific simulations in the ensemble are pertinent. The dashed blue lines correspond to the high values simulated in the 10th and 11th centuries by one of the ECHO-G reconstructions (González-Rouco et al., 2003). These values have been demonstrated to be exceptionally high due to a problem in initial conditions (Goosse et al., 2005) and corrected for the NH using an EMIC (Osborn et al., 2006).

Temporal evolution
It is interesting to extend the comparison of simulations and reconstructions to the spectral domain. Figure 5 shows normalized spectra for NH temperatures in both the simulations and reconstructions. The reconstructions that do not provide variability at high frequencies (2-10-yr timescales) suffer from spectral noise (spectra with dashed lines; Gibbs oscillations depicted with grey colour). The spectra of the other reconstructions compare well with simulations at all frequency ranges. For the high frequencies (interannual www.clim-past.net/9/393/2013/ Clim. Past, 9, 393-421, 2013 timescales) the reconstructions tend to show a somewhat stable level of variance density, whereas most of the simulations (except for EC5MP and CNRM) show a continuous decay. Interestingly, the spectra of TEF show a similar decay for all models except for the CSIRO (Fig. 2) at high frequencies.
This suggests either a possible noise contamination at high frequencies on the proxy side or an underestimation of interannual variability by the models. The inset in Fig. 5b shows the normalized spectra of the NH instrumental data (Brohan et al., 2006), evidencing a behaviour that is more similar to that of the proxies and thereby suggesting an underestimation of interannual variability by the models. Additionally, some of the simulations (CNRM, EC5MP) show an anomalously high accumulation of variability at 3-5-yr timescales which is not visible in the reconstructions. This is produced in these models by enhanced variability in the tropics at these timescales (not shown, Zhang et al., 2010) and is also not supported by instrumental data. The instrumental observations accumulate noticeable variance in the 10-yr timescale. This is evident for many of the reconstructions (see solid lines in Fig. 5b) and less so in the simulations. The EC5MP-E1 ensemble is the forced simulations with more variability at this timescale, as expected, since they incorporate an 11-yr cycle in the solar variability throughout the millennium (Sect. 3.1).
A small decay of the spectral density between 10-and 20-yr timescales is apparent in the simulations, reconstructions and observations, as well as in TEF. At multidecadal and longer timescales, the spectra of the simulations and the reconstructions shows a similar shape, which also resembles that of TEF (Fig. 2). In Sect. 3.3 we argued that the relative increase of variance at 20-40-yr timescales was due to both solar and volcanic variability. The spectra of the EC5MP-E1 ensemble shows the lowest proportions of low-frequency variability, lying below those of the reconstructions. The factors that may contribute to this are the lower solar forcing variability and the small warming trends in the 20th century simulated by this model. Within the ssTSI group, the CSIRO simulations show levels of low-frequency variability similar to models of the STSI group, most likely as a result of the larger trends simulated by the former in the 20th century.

MCA-LIA transition
In this section, we focus on the transition between the MCA and the LIA in order to assess the response of models to large changes in forcing during the last millennium. Figures 1 to 5 suggest that there is low-frequency variability, which is in agreement in reconstructions and simulations, both of which are consistent with TEF changes. Despite the aforementioned discrepancies in the MCA temperatures, reconstructions and simulations (Figs. 3 and 4) show higher (lower) temperatures during the MCA (LIA) which are also seemingly in agreement with larger (lower) values in TEF. Therefore, external forcing (mostly solar and volcanic variability) may have been  Tables 1 and 4, respectively. Solid (hollow) dots depict simulations including (excluding) volcanic forcing. (b) TEF change corresponding to the forcing applied to the simulations in (a). In the case of models for which simulations are available with and without volcanic forcing, the change in the sum of anthropogenic and solar forcing (see Fig. 1f) is shown with a hollow dot and the TEF (see Fig. 1g) change is indicated with a full dot. In the case of the simulation with the ECHO-G model showing warmer medieval temperatures (González-Rouco et al., 2003), the version corrected by Osborn et al. (2006) is considered instead. In the cases where there is overlap of dots, these have been slightly moved in the vertical direction for visibility. an important contributor to the energy balance in the MCAto-LIA transition (e.g. Crowley, 2000;Bauer et al., 2003;Goosse et al., 2005). However, other factors like land cover changes (Goosse et al., 2006a) or internal variability (Goosse et al., 2012a,b) may have also been relevant, particularly at regional scales.
We examine the temperature response of each reconstruction and model simulation and compare them with the range of forcing variations during this climatic transition (Fig. 6). A proper definition for the extension of these periods is controversial at the NH spatial scale since temperature and hydrological changes at regional scales were not necessarily synchronous (e.g. Graham et al., 2011;Diaz et al., 2011;Trouet et al., 2012). Nevertheless, Ljungqvist et al. (2012) show evidence for widespread NH warming (cooling) in the MCA (LIA) that support the notion of a NH scale MCA and LIA for temperature. The convention adopted here is 1400-1700 AD for LIA and 950-1250 AD for MCA, the same as L. Fernández-Donado et al.: Last millennium temperature response 409 in , facilitating the comparison with previous works. Additionally, such a definition of the MCA (950-1250 AD) is convenient here since it spans a period long enough to include the respective medieval maxima of the simulations, their associated forcings and the reconstructions (see Sect. 5.1). Figure 6 displays positive MCA-LIA differences in forcing as well as in simulated and reconstructed temperature changes, except for the reconstruction of Frank et al. (2007). This curve peaks in the late 10th and early 11th century and reports steeply cooling temperatures until the beginning of the 14th century, similarly to D' Arrigo et al. (2006) and Christiansen and Ljungqvist (2012a). The balance of these temperature anomalies in the Frank et al. (2007) reconstruction during the MCA interval is negative. The range of temperature changes in the reconstructions is somewhat larger than in the models, which makes it difficult to determine which combination of solar and volcanic forcing is more realistic. Changes in forcing are clearly grouped into the STSI and ssTSI division established in Sect. 3. The temperature changes are also organized accordingly, suggesting that solar forcing is a major player in the model simulations of the MCA-LIA transition. Nevertheless, the ssTSI simulations of CSIRO and EC5MP-E1, with the largest temperature changes, present values close to those of some STSI experiments showing the lowest temperature change (CSM1.4 and CCSM3).
The effect of volcanic forcing is only evident in the ECHO-G simulations for which including volcanic forcing enhances both the MCA-LIA forcing change and the simulated response. For the CSIRO model, inclusion of volcanic forcing has no effect on the forcing and temperature changes. These results do, however, depend on the periods used to define the MCA and LIA (not shown). Interestingly, members of an ensemble sharing the same external forcings (CSIRO, ECHO-G, EC5MP) show a spread of temperature changes due to internal variability. This spread may be larger than the temperature difference due to the inclusion of volcanic forcing (ECHO-G and CSIRO) or than the differences between simulations within the ssTSI and STSI groups. For instance, intra-model variability in the EC5MP and the CSIRO ensembles is larger than inter-model differences between the CSM1.4, CCSM3, ECHO-G or IPSL simulations. Therefore, this suggests that internal variability could have had major impacts on the temperature response at hemispheric scales.
Additional insights on the relative roles of internal versus forced variability can be gained by considering the spatial distribution of simulated temperature changes during the MCA-LIA transition. Many studies (e.g. Seager et al., 2007;Graham et al., 2011) suggest that during this period there was a pattern of coordinated temperature and hydrological anomalies, evidencing an increased zonal gradient in the tropical Pacific produced by anomalous cooling in the eastern Pacific and anomalous warmth in the western Pacific and Indian Ocean. Additionally, a broad expansion of the Hadley cell with an associated northward shift of the zonal circulation might have led to a more positive North Atlantic Oscillation (NAO) like signature (Graham et al., 2011;Trouet et al., 2009Trouet et al., , 2012. The relative importance of forcing and internal variability in producing this coordinated pattern of anomalies is not clear.  showed a reconstructed pattern of MCA-LIA temperature change indicating enhanced and pervasive cooling in the eastern equatorial Pacific cold tongue region, often referred to as a La Niña-like background state, as well as positive anomalies dominating at mid-and high latitudes of the NH.  also showed that the negative anomalies in the eastern equatorial area were not reproduced by forced simulations with the GISS-ER and CSM1.4 models. Extratropical warmth was also reported by Ljungqvist et al. (2012) and was found to be consistent with results of assimilation experiments (Goosse et al., 2012a,b) in response to a weak solar forcing and a transition to a more positive Arctic Oscillation state. AOGCM experiments without data assimilation, however, do not seem to support an enhanced zonal circulation during medieval times (Lehner et al., 2012;Yiou et al., 2012). Figure 7 shows the MCA-LIA annual temperature differences (hatched areas indicate non-significance for an α < 0.05 level) in the forced simulations from each of the models considered herein except for the HadCM3 run, which is not included due to the limited time span of this simulation (see Table 2). The reconstructed MCA-LIA temperature pattern from the  multiproxy climate field reconstruction is also shown in Fig. 7 for comparison. The value of the spatial correlation coefficient between each simulated and the reconstructed MCA-LIA pattern is shown at the bottom of simulated panels in Fig. 7. For the ssTSI models a selection of the ensemble members is made considering either the runs with a more complete configuration of external forcings (i.e. the three members including volcanoes), in the case of the CSIRO, or the three runs that represent the most different spatial patterns, in the case of the EC5MP-E1 ensemble. For the EC5MP-E2 all the members of the ensemble are shown. For the ECHO-G, the run with a too-warm MCA (Osborn et al., 2006) is excluded.
All simulations tend to produce an MCA warming that is almost globally uniform. Warming tends to be higher, especially in STSI, over the continents than the oceans, particularly over the sea-ice boundary at high latitudes, in agreement with the temperature response pattern described in Zorita et al. (2005). The agreement among the suite of simulated MCA-LIA temperature differences can be quantified by the values of inter-model spatial correlations between all the possible combinations of two simulations within the STSI (ssTSI) group. Larger correlation values are obtained for STSI (r = 0.69 in the case of CNRM with a member of EC5MP-E2) than for the ssTSI (r = 0.55 between members of EC5MP-E1 and CSIRO), in consistency with the larger TEF changes applied to STSI. There also appear  Table 1 and in the multiproxy climate field reconstruction from . The spatial correlation (r) between each simulated MCA-LIA pattern and the reconstructed field is shown at the bottom right of each panel. For the simulations starting in 1000 AD (CCSM3, ECHO-G, IPSL, CNRM), the period 1000 to 1250 was selected instead to define the MCA.
Hatched areas indicate non-significant differences according to a two-sided t-test (α < 0.05). multiple regional features that are dependent on the model considered, i.e. cooling in the North Pacific (ECHO-G-1), in the North Atlantic (CCSM3 and ECHO-G-2) or in Northern Asia (CNRM). This causes the lowest inter-model spatial correlation values, both in the STSI (e.g. r = −0.09 between CNRM and CSM1.4) and in the ssTSI group (e.g. r = −0.14 between members of EC5MP-E1 and CSIRO). Many of these regional-scale features may well be simulationdependent and related to initial conditions and internal variability as evidenced by the differences within the members of each EC5MP or the CSIRO ensembles. For example, differences arise in the magnitude of warming and cooling over the North Pacific, South America or Africa in EC5MP-E2 or in the spread of cooling regions in the EC5MP-E1 and CSIRO members. These differences within an ensemble lead to intra-model spatial correlation values that range between 0.21 and 0.52 for the EC5MP-E1, between 0.05 and 0.52 for the CSIRO, and between 0.60 and 0.81 for the EC5MP-E2 subensemble. Among the different subensembles, EC5MP-E1 and CSIRO simulate more regional/largescale widespread cooling, a sign of the lower weight of TSI changes that allows for internal variability to become more prominent. This fact can be observed not only from the wider range of intra-model correlation values but also from the lower values obtained in the ssTSI group. Therefore, even if widespread warming is simulated in the MCA, the spatial pattern of temperature change is very heterogeneous and can considerably vary across models and even across simulations with the same model. The spatial pattern observed in the reconstructions by  is not obtained with the available model simulations. This is evidenced by the low values of spatial correlation obtained between the reconstructed and the suite of simulated patterns, ranging from −0.18 to 0.36 (Fig. 7).
The lower values obtained for these simulated-reconstructed pairs than for the inter-model comparison suggest a higher consistency among the simulated than between the simulated and reconstructed patterns.  present the only spatial reconstruction that offers global-scale information about the MCA-LIA transition, and, although supported by several studies (Seager et al., 2007;Graham et al., 2011), it is also subject to important uncertainties (Li and Smerdon, 2012;Smerdon et al., 2011). These uncertainties are mostly associated with the reconstruction methodology and the low proxy replication in the Pacific and North Atlantic basins. However, if this proxy-based reconstruction were to be considered reliable, two possible explanations could be suggested for the aforementioned model-data discrepancies. One is that the spatial pattern of changes for the MCA-LIA was largely influenced by internal variability. The other is that transient simulations with AOGCMs fail to correctly reproduce some mechanism of response to external forcing, as long as the changes in radiative forcing factors are considered to have contributed importantly to the the MCA-LIA temperature change. One example of the latter may be discussed in relation to the so-called "ocean thermostat" mechanism (Zebiak and Cane, 1987). The complex response of the tropical Pacific to radiative forcing still shows important inter-model disagreement in future climate change simulations (Collins et al., 2010). It is thus expected that AOGCMs will struggle to correctly represent potential responses in the past.
The models from the STSI group, which used high TSI variations from the MCA to the LIA, show a pattern of response that is typically a uniform warming in the earlier period (see three upper rows in Fig. 7). In spite of this, there are considerable differences among the simulations, highlighting a potential influence of initial conditions and internal variability. Furthermore, if the reduced levels of past TSI are given more credit (ssTSI group; Schmidt et al., 2011Schmidt et al., , 2012, as in the EC5MP-E1 or CSIRO ensembles, the temperature response for the MCA-LIA is less uniform in sign and more likely influenced by internal variability. Moreover, it also presents lower values than in the  pattern. Therefore, under both high and low TSI change scenarios, it is possible that the reconstructed MCA-LIA anomalies would have been largely influenced by internal variability .

Last millennium transient climate response
The temporal coherence in TEF and reconstructed and simulated temperatures described in Sect. 5 is quantitatively analysed herein based on the estimation of climate sensitivities.
The sensitivity of climate to changes in external forcing can be characterized in the context of future climate by two quantities: equilibrium climate sensitivity (ECS), defined as the temperature change, after reaching equilibrium, to a doubling of atmospheric CO 2 above pre-industrial levels (Schneider et al., 1980); and the transient climate response (TCR), defined as the change in global surface temperature in a 1 % CO 2 increase experiment at the time of atmospheric CO 2 doubling (Knutti et al., 2005). Here, we will define a last millennium TCR (LMTCR hereafter) based on regression estimates between TEF and NH temperature response in simulations and reconstructions of the last millennium. The use of this metric as a measure of consistency between reconstructions and simulations is motivated by the findings reported throughout this manuscript, to wit: (i) the broad agreement between the TEF applied in each simulation and the simulated temperature response (Figs. 1, 4, 5); and (ii) a tendency for the simulated temperature changes to cluster according to the different magnitudes of change in forcing (Fig. 6), despite the presence of a substantiated influence of internal variability.
The relationship between TEF and NH simulated temperatures for intermediate and low frequencies is illustrated in Fig. 8a, where the results of a simple linear regression between NH temperatures and TEF in one of the EC5MP-E2 ensemble members is shown. Simulated temperatures (green) are compared to temperatures estimated (black) from their respective TEF changes after linear regression between both variables for timescales longer than 31 yr; grey shading indicates uncertainties for regression estimates. This indicates that the NH temperature response can be, to a good approximation, considered linearly related to the imposed TEF forcing at these timescales.
Similar results to those in Fig. 8a  forcing was not included, the sum of anthropogenic and solar forcing is considered instead, and correlation values are depicted with hollow dots. Most simulations show high correlation values between temperature and the applied forcing, independently of belonging to the STSI or ssTSI ensembles. Minimum correlation values are attained for the CNRM model and some of the simulations of the EC5MP-E1 ensemble. In the case of the CNRM model there may be at least two reasons that contribute to this. On one hand, this linear relationship to forcing may be reduced by a relatively high internal variability and strong feedbacks in atmosphere dynamics (Swingedouw et al., 2010). On the other hand, the temperature response of the model is not proportional to the large volcanic forcing applied (see Figs. 1h and 4). The EC5MP-E1 ensemble shows correlations in the range of 0.38 to 0.71, highlighting again the influences that internal variability can have on the temperature response at hemispheric scales. Based on the linearity of the relationship between external forcing and temperature, the analysis is expanded to calculate the rate of changes in temperature relative to forcing. We define these ratios as the LMTCR and calculate them as the regression coefficients between both variables. These values constitute estimations of climate sensitivity that integrate the response of the climate system to different forcings operating from multidecadal to multicentennial timescales. It should be kept in mind that, because its definition is based on linear regression, LMTCR addresses only the quasi-instantaneous response of temperature to forcing changes. Any nonlinear feedbacks or delayed adjustment of temperature do not fall into this definition, which contrasts with the concept of TCR and ECS. Figure 8c shows LMTCR values with estimated regression errors obtained from simulated NH temperatures and TEF. Results for the models and forcings within the STSI and ssTSI groups are shown on the same panel and compared to model TCR and ECS values (Table 5). We only consider TEF and simulations that include volcanic forcing to allow sensible comparison with reconstructions (see discussion below). We establish ranges for LMTCR based on the minimum and maximum values obtained within the STSI and ssTSI groups. The resulting intervals have similar minimum values in the STSI and ssTSI groups, but their width is narrower for the ssTSI group (0.17-0.49 KW −1 m 2 ) relative to the STSI group (0.23-0.58 KW −1 m 2 ). As more simulations become available from CMIP5-PMIP3 ) the ssTSI range may potentially change. It is interesting to note that LMTCR values do not necessarily change when grouping the models into STSI and ssTSI, since EC5MP-E1 and E2 LMTCR estimates overlap with each other. Moreover, LMTCR estimated ranges are, as expected, lower than ECS values and overlap with the range of values in TCR. This is a reasonable feature since, as commented above, TCR and particularly ECS include system readjustments that involve nonlinear relationships, either in a monotonously warming climate simulation (TCR) or in equilibrium simulations (ECS).
It should also be kept in mind that ECS and TCR are climate sensitivity estimations based on global temperature response, while LMTCR is herein obtained based on simulated and reconstructed NH temperatures.
We now turn the focus to address whether the real system response to external forcing can be also considered linear from the evidence provided by the reconstructions. Figure  9a shows correlations that quantify their linear relationship with external forcing. Since it is not known which one of the available forcing specifications better represents the real past forcing, reconstructions are cross-compared with all external forcing configurations. Also, the reconstructed temperature response arguably includes the effects of volcanic activity, thus the forcing configurations that do not consider volcanic activity (e.g. IPSL) are excluded from this analysis. The correlations between the reconstruction ensemble average in Fig. 4b and all forcing series are also shown. Some of the correlations values of reconstruction-forcing pairs are lower than the minimum values obtained between simulations and forcing (grey shaded area), but most correlations are above this threshold, with maximum values reaching 0.98. The ensemble average shows correlation values around 0.6, the highest (lowest) being attained for the CCSM3, CSIRO and ECHO-G (EC5MP, CNRM and CSM1.4). Overall, this supports the hypothesis that the reconstructed temperatures show a linear relationship to the total forcings used by models.
The analysis can finally be extended to consider the LMTCR values that may be obtained from the regression of NH temperature reconstructions and TEF, and whether these may be consistent with those attained from simulations. However, some comments are pertinent at this point. Since LMTCR is based on regression estimates, the resulting values will depend both on the correlation between temperatures and TEF and on the ratio of temperature and TEF standard deviations. This implicitly establishes the requirement that reconstructions and TEF show covariability in time, a feature that is supported by many of the temperature-TEF pairs in Fig. 9a. However, if the correlation between both variables is low, this will bias the regression between reconstruction and forcing to zero. Figure 9b, c show LMTCR values obtained from the reconstructed NH temperatures after regressing each reconstruction with all possible forcings in the ssTSI (b) and STSI (c) groups (excluding those that do not consider volcanic contributions). Each estimate obtained from a reconstruction-forcing pair is plotted with the corresponding colour of the model that the forcing is ascribed to. The temperature-forcing pairs depicting low correlations (grey shading in (a)) are shown in grey. The results can be compared with the LMTCR ranges estimated from simulations (dashed lines), thus providing a metric of consistency. The LMTCR values obtained for the reconstruction ensemble lie well within the simulated LMTCR ranges both for ssTSI and STSI. In the STSI case, LMTCRs cluster around a value of 0.3 KW −1 m 2 , near the low end of the model range. Most,   Table 4 and the various estimations of TEF (Fig. 1h). Only the configurations that include volcanic forcing have been considered. The correlations with the reconstruction ensemble average in Fig. 3 are also shown. Colours identify the model to which forcings correspond according to the labels. but not all, LMTCR values derived from individual reconstructions lie within the simulated LMTCR ranges. Some of the values fall out of the model range both for ssTSI and STSI forcings due to low correlations between temperature and forcings. Also, some temperature-forcing pairs showing relatively high correlation stand out of the model range.
In the ssTSI panel (Fig. 9b), the simulated range is not consistent with the reconstruction of Huang (2004), Frank et al. (2007), and Christiansen and Ljungqvist (2012a) that show larger LMTCRs. Since these reconstructions present relatively high correlations, this discrepancy can be traced to a high ratio of temperature vs. TEF variability (standard deviation) relative to that in the simulations. In the STSI panel (Fig. 9c) the reconstruction of Rutherford et al. (2005) indicates a lower LMTCR than the simulation, while that of Christiansen and Ljungqvist (2012a) is above the model range for all forcing configurations (except marginally that of CNRM). Therefore, this analysis highlights discrepancies between reconstructed and simulated climate that report different rates of temperature response to forcings.
As a last comment, it can be argued that LMTCR adds a possible quantitative framework beyond the temporal comparison presented in Fig. 4. This new insight in the analysis of model-data consistency leans on two requirements in the reconstructed-simulated climate response. One is that past temperatures show covariability with forcings above multidecadal timescales. The other is that the rates of temperature change to forcing are comparable. If both of these are met, most reconstructions agree with simulated LMTCR ranges, albeit with some clear discrepancies, either based on a low correlation with forcing (e.g.  or based on reflecting different rates of temperature change compared to simulations (e.g. Christiansen and Ljungqvist, 2012a;Rutherford et al., 2005). No conclusive statements can be drawn regarding which level of solar variability is more realistic based on this assessment of consistency between reconstructions and simulations.

Conclusions
This work addresses the temperature evolution over the last millennium from the perspective of the available AOGCM simulations and of the forcings used to drive them, as well as the analysis of independent information provided by proxybased reconstructions. This enables the exploration of the consistency between the reconstructed and simulated forced temperature response and the evaluation of the importance of internal variability at hemispheric scales.
A catalogue of the forcing factor reconstructions used to drive last millennium AOGCM simulations has been presented and discussed. The set of simulations considered in this paper is those available prior to the CMIP5-PMIP3 project. Therefore, this part complements the information provided in Schmidt et al. (2011). The forcing configura-tions used by the different models have been classified into two groups according to the amplitude of low-frequency TSI changes.
An update of the set of reconstructions used in Jansen et al. (2007) has been presented for the NH, as well as available evidence for SH and the globe. The warmer medieval climate, followed by a cooler period and the subsequent warming during the industrial period are evident on both hemispheric and global scales. For the NH, the higher medieval temperatures are comparable to those in the first half of the 20th century and lower than instrumental measurements of the last few decades. This is the case for all but two reconstructions (Loehle and McCulloch, 2008;Christiansen and Ljungqvist, 2012a). A robust assessment of past variability for the SH and the globe would require more proxy reconstructions. Overall, the ensemble presented herein conveys a similar message to that in AR4. Variability and uncertainty are significantly increased, albeit mostly due to the influence of one reconstruction (Christiansen and Ljungqvist, 2012a).
A total of 26 forced simulations performed with 8 different models have been considered. This constitutes an improvement from the five members considered in Jansen et al. (2007) and allows additional insights into the relative roles of internal and forced variability. For multidecadal and lower frequencies there is a high degree of linearity in the simulated temperature response to the forcings imposed, despite the presence of internal variability.
Overall, reconstructed and simulated temperatures tend to agree on multicentennial timescales, albeit with some differences. An example is the period covering the 10th and 11th centuries, where proxies indicate higher temperatures that are not supported by the simulations and the reconstructions of external forcing. Models and reconstructions both agree that there was a warmer MCA followed by a colder LIA. The spatial distribution of changes during this transition is characterized in the simulations by a land-ocean thermal response with polar amplification at high latitudes. Internal variability contributes to pronounced inter-model differences, particularly in the low solar forcing variability scenarios. The spatial pattern of the simulated response shows little resemblance with that obtained from reconstructions ). If we rely on the information provided by multiproxy reconstructions, it is arguable that either the spatial pattern of changes for the MCA-LIA was largely influenced by internal variability or that transient simulations fail to correctly reproduce the potential causal mechanisms of response to external forcing.
We introduce an evaluation of the system response to external forcing based on regression estimates of the rates of temperature-to-forcing changes. This approach does not consider other system feedbacks and delayed responses that are implicit in the definitions of ECS and TCR in future climate change experiments. In fact, the ranges of calculated LMTCR are always lower than ECS and at the most overlap in some cases with TCR. The LMTCR of the mean of the ensemble of reconstructions and that of most single reconstructions agree with those of the simulated ranges. However, some reconstructions are identified for which a clear disagreement can be quantified with simulated values. These discrepancies could not be screened on the basis of the previous qualitative comparison of consistency among time series and uncertainties.
The definition of a LMTCR establishes an additional metric to compare reconstructed and simulated temperature. This may be considered a convenient approach for the climate of the last millennium, during which the magnitude of forcing changes is comparatively much lower than that of other past climate transitions (e.g. Last Glacial Maximum to Holocene, Crucifix, 2006). The implicit requirement in LMTCR of a covariability between reconstructed temperatures and external forcing changes arises, from the results shown herein, as a reasonable feature that should be expected to be found in reconstructions at multidecadal and longer timescales.