A regional climate palaeosimulation for Europe in the period 1500–1990 – Part 2: Shortcomings and strengths of models and reconstructions

This study compares gridded European seasonal series of surface air temperature (SAT) and precipitation (PRE) reconstructions with a regional climate simulation over the period 1500–1990. The area is analysed separately for nine subareas that represent the majority of the climate diversity in the European sector. In their spatial structure, an overall good agreement is found between the reconstructed and simulated climate features across Europe, supporting consistency in both products. Systematic biases between the two data sets can be explained by a priori known deficiencies


Introduction
Confidence in projections of future climate change is supported by a better understanding of current and past climate change and by the assessment of the skill of climate models in simulating past and present climate variations (Schmidt et al., 2014). In turn, evidence about the climate in preindustrial times stems from various sources, such as instrumental observations, documentary evidence, environmental proxy archives or climate simulations. Given this variety, gaining reliable insight into past climate variability requires climatological, statistical and dynamical consistency across these different sources, especially between reconstructions and simulations. However, numerous uncertainties affect the assessment of past climate variability.
Disagreements between simulations and reconstructions may be caused by deficiencies in reconstruction methods (e.g. Tingley et al., 2012), by model limitations that reflect inadequate spatial resolution and missing and simplified (parameterised) physical processes (Gómez-Navarro et al., 2011, 2013) or by both. Beyond these methodological shortcomings, both data sources ultimately rely on inferences from environmental archives, since simulations require, to some extent, input from reconstructions of past forcing data, for instance input related to changes in solar and volcanic activity or land use changes. Environmental proxies (Evans et al., 2013) record influences of various environmental factors and, in turn, palaeo-observations do not necessarily perfectly reflect one particular environmental variable (e.g. Franke et al., 2013). Rather, they usually only explain part of the variability in the variable of interest.
In addition to shortcomings in the data sets, internal variability may become dominant compared to externally forced signals in the variable of interest, especially on a regional scale (Gómez-Navarro et al., 2012). This implies that a single model simulation represents only one of infinitely many possible realisations of past climate evolution, constrained by initial and boundary conditions and the presence of unforced natural internal climate variability. Thus, a perfect agreement with reconstructions cannot be expected on local scales. An important aspect of model-reconstruction comparison exercises relates to the fact that part of the simulated and reconstructed variability is associated with non-climatic effects due to the intrinsic characteristics of the models and reconstruction methods, i.e. model deficiencies and proxy-specific error terms. On larger scales (continental to global), it is assumed that the random internal variability is averaged out. However, a recent comprehensive study indicates that, even on continental scales, (global) climate models fail to reproduce specific periods in the historical past, especially over the Southern Hemisphere and in periods immediately following volcanic eruptions (PAGES2k-PMIP3 group, 2015).
In addition, internal modes of climate variability may respond to external forcing events, such as large tropical volcanic eruptions (Yoshimori et al., 2005; Zanchettin et al., 2012) or variations in components of solar activity (Shindell et al., 2001; Vieira et al., 2011). However, the influence of low-frequency solar activity changes on climate and climate variability, in particular, is still under discussion (Gómez-Navarro and Zorita, 2013; Anet et al., 2013, 2014; Raible et al., 2014). Environmental archives integrate these internal variations, and while climate simulations cannot be expected to replicate the exact unforced variations, they ideally should be capable of replicating the forced variability (if they include the relevant processes). This is particularly the case for surface air temperature (SAT) and to a lesser extent for precipitation (PRE), as both variables are thought to be sensitive to the external forcing variability during the last millennium (Gómez-Navarro et al., 2012).
Attempts to reconcile climate simulations and reconstructions are further hampered by fundamental differences in the characteristics of the information they provide. Simulations and reconstructions represent data on different spatial and temporal scales. Simulations provide information with high temporal resolution, spatially averaged to the grid-cell size. Reconstructions are based on archives affected by local (environmental) climate conditions. Additionally, the specific relation between local- and large-scale environmental factors is only partially constrained (Kim et al., 1984). Various approaches exist for combining the information obtained from reconstructions and simulations. Among them are proxy-forward models (Phipps et al., 2013; Evans et al., 2013), data assimilation (Goosse et al., 2006, 2012; Widmann et al., 2010) and proxy surrogate reconstructions, i.e. analogue methods (Franke et al., 2010; Luterbacher et al., 2010a). In addition to these techniques, dynamical and statistical down- and upscaling methods are currently being introduced (Gómez-Navarro et al., 2011; Wagner et al., 2012; Gómez-Navarro et al., 2013; Eden et al., 2014).
Dynamical downscaling is based on the implementation of a regional climate model (RCM), driven at its boundaries by a global circulation model (GCM). This allows spatially highly resolved climate simulations over limited areas, consistent with the driving model. This downscaling approach provides the potential to bridge the spatial scale gap between simulated and reconstructed estimates of past climate variability. Besides refining the spatial resolution of the model dynamics, the more highly resolved orography of regional simulations also allows for an improved representation of the regional-scale boundary conditions. This approach has been successfully applied over the Iberian Peninsula (Gómez-Navarro et al., 2011) and the Baltic Sea (Schimanke et al., 2012). However, the relatively low number of available regional palaeoclimate simulations is a fundamental restriction. Recently, Gómez-Navarro et al. (2013) have shown how a high-resolution regional climate simulation with the RCM MM5 (Mesoscale Model version 5) is able to improve the performance of its driving GCM when compared to 20th-century observations over Europe for the distributions of precipitation over regions with complex terrain.
Despite the limitations of climate models, a potential benefit relates to their dynamically consistent estimates for different variables because the evolution of the climate within the model is produced by the application of well-known physical conservation laws. This allows us to assess, through a suitable comparison between reconstructed and simulated climates, to what extent the reconstructions provide dynamically consistent estimates of past climate variability. Likewise, it permits us to evaluate the consistency of climate reconstructions for different variables, their spatio-temporal distributions and their main variability modes.
Here, we extend the previous assessment of Gómez-Navarro et al. (2013) by evaluating the level of agreement between a regional simulation over Europe for the period 1500-1990 and available reconstructions of seasonal SAT and PRE. We focus our analysis on regions where Gómez-Navarro et al. (2013) found that the regional model provides added value beyond the skilful spatial scales of the global climate model. This way we increase our confidence not only in potential agreement between simulations and reconstructions but also in the conclusions we can draw from potential disagreements. That is, we do not benchmark the simulation against the reconstruction; instead, we jointly analyse both uncertain estimates with the aim of increasing our understanding of past seasonal climate changes in Europe.
The manuscript is organised as follows: in the following section we introduce the observations, simulation and reconstructions used for analysis, including a short overview of the methods. In Sect. 3 we discuss the past climate evolution in terms of seasonal surface air temperature and precipitation variability present in the data for a number of European subregions. We also analyse the evolution of probability density functions of precipitation and temperature. In Sect. 4 we turn our attention from the temporal agreement towards the variability modes; we first compare the dominant reconstructed and simulated variability modes (Sect. 4.1) for temperature and precipitation. Then, we investigate the consistency between these variables and sea level pressure in terms of canonical correlation. A discussion and subsequent concluding remarks close the study.

Climate simulations
Our analysis uses the output of a high-resolution climate simulation carried out with an RCM over Europe for the period 1500-1990. The RCM consists of a climatic version of the meteorological regional model MM5. This simulation is driven at its boundaries by the GCM ECHO-G. The horizontal model resolution is 45 km, and its domain covers Europe almost entirely (see Fig. 1). This nesting set-up is referred to hereinafter as MM5-ECHO-G. Both models are driven by identical reconstructions of several external forcings to avoid physical inconsistencies: greenhouse gases, total solar irradiance (TSI) and the radiative effect of tropical volcanic events. This simulation is described in detail by Gómez-Navarro et al. (2013), including a discussion of the skill of the model MM5-ECHO-G in reproducing the European climate against gridded observational precipitation and temperature data sets. Results of this validation indicate an added value with respect to the driving GCM. However, there are still deviations between the regional simulation and the observations. Prominent problems relate to the divergent 20th-century temperature trends. Gómez-Navarro et al. (2013) argued that this could originate from missing anthropogenic aerosol forcing in the simulation, which is an important factor with a potential net cooling effect, especially in the second half of the 20th century (Andreae et al., 2005). Furthermore, the driving simulation with ECHO-G simulates a strong positive trend in the North Atlantic Oscillation (NAO) index under anthropogenic forcing, which is absent in the observations. This leads to a negative trend in winter precipitation in southern Europe and a positive trend in SAT over northern Europe.

Observational data sets
The analysis employs various observational data sets: SAT and precipitation are taken from the monthly data set developed by the Climate Research Unit (CRU) at the University of East Anglia (Harris et al., 2014). This global gridded product includes several climatic variables over land areas with a spatial resolution of 0.5° × 0.5° for the period 1901-2005. In this comparison exercise only temperature and precipitation series up to 1990 are considered, since this is the overlap period between observations and simulation. The data are bilinearly interpolated onto the MM5 grid to provide a suitable basis for comparison. To maintain consistency with reconstructions, only land points are considered for the comparison.
The sea level pressure (SLP) field consists of monthly means of this variable extracted from the NCEP reanalysis for the period 1948-1990 (Kalnay et al., 1996). This data set has a spatial resolution of 2.5° × 2.5°, slightly higher than ECHO-G, and has been used on its original grid without any further spatial interpolation.

Gridded reconstructions
We use climate reconstructions for three variables: winter and summer SAT, PRE and SLP. In particular, we use the gridded data sets by Luterbacher et al. (2004, 2007) for SAT and Pauling et al. (2006) for precipitation. Both data sets consist of seasonal series on a 0.5° × 0.5° regular grid over land areas of Europe. As with the observations, these data sets were interpolated onto the MM5 grid prior to analysis. These reconstructions are based on a large variety of long instrumental series, indices from historical documentary evidence and natural proxies (see Luterbacher et al., 2004, 2007; Pauling et al., 2006, for details). The reconstructions rely on linear methods (i.e. principal component regression). Despite the underlying assumptions, e.g. the stationarity of the relationship between the proxy and the climatic variable, the method is able to provide gridded fields for both temperature and precipitation. Luterbacher et al. (2004, 2007) and Pauling et al. (2006) critically addressed the uncertainties and skills of their reconstructions, especially in the early period of the 16th and 17th centuries, when fewer records and only those with lower quality are available. Pauling et al. (2006) also provide performance maps for their precipitation reconstruction. This allowed for a rigorous assessment of the spatial pattern of the reconstruction's skill. An important characteristic of the reconstructed precipitation in contrast to reconstructed temperature relates to the large spatial heterogeneity caused by a considerably shorter spatial de-correlation distance of precipitation. This characteristic becomes critical when attempting to reconstruct hydrological fields from a sparse network of proxy data (Gómez-Navarro et al., 2014).
Additionally, the SLP reconstruction by Küttel et al. (2010) is used, which is based only on station pressure data and ship logbook information; it is thus completely independent from the SAT and PRE reconstructions. This selection ensures that the dynamic consistency between SLP and SAT and PRE reconstructions can be assessed avoiding circularity (Luterbacher et al., 2010a, b). This data set has a resolution of 5° × 5° and spans the period 1750-1990.

Framework of the joint analysis of simulated and reconstructed climate
As discussed in the introduction, besides model and reconstruction errors, the presence of internal variability and reconstruction-specific errors a priori prevents perfect agreement between the temporal evolution of the simulated and reconstructed climate variables (Gómez-Navarro et al., 2012).
A simple way to partially ameliorate this problem is low-pass filtering the climate series. The underlying argument is that the ratio of forced to internal variability increases at lower frequencies. Since the degree of required filtering is unknown, we apply a multi-decadal 31-year running mean using a Hamming window.
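The filtering step can be illustrated with a minimal Python sketch. It assumes yearly seasonal means and leaves the first and last 15 values undefined, since the treatment of the series ends is not specified here; the function name `hamming_lowpass` is hypothetical.

```python
import numpy as np

def hamming_lowpass(series, window=31):
    """Weighted running mean of a yearly series using a Hamming window.

    Returns an array of the same length as the input. The first and last
    window // 2 values are NaN because the full window does not fit there
    (an assumption of this sketch, not a documented choice).
    """
    series = np.asarray(series, dtype=float)
    if window % 2 == 0:
        raise ValueError("window must be odd so that the filter is centred")
    if series.size < window:
        raise ValueError("series is shorter than the filter window")
    weights = np.hamming(window)
    weights /= weights.sum()  # normalise so that a constant series is preserved
    half = window // 2
    out = np.full(series.shape, np.nan)
    # mode="valid" keeps only positions where the whole window overlaps the data
    out[half:series.size - half] = np.convolve(series, weights, mode="valid")
    return out
```

Applied to a series covering 1500-1990, this yields filtered values for 1515-1975.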
In the following we compare the temporal evolution of SAT and PRE as simulated by MM5-ECHO-G with the reconstruction of Luterbacher et al. (2004, 2007) and Pauling et al. (2006), respectively, in nine European subdomains (Fig. 1). The separation into these nine subregions is a compromise between being able to amalgamate information and taking into account Europe's climatic complexity. The division is based on the guidelines for coordinated efforts such as the project PRUDENCE (Prediction of Regional scenarios and Uncertainties for Defining EuropeaN Climate change risks and Effects) (Christensen and Christensen, 2007). We restrict the analysis to the period prior to 1900 to prevent an overlap with the calibration period. As the reconstructions are calibrated using the observational or reanalysis data sets, they should basically agree with the observations used in Gómez-Navarro et al. (2013) for validation purposes. The authors highlighted the general overestimation of temperature trends in the simulation during this period, which is strongest for winter in northern Europe. Similarly, precipitation trends in observations and the simulation during the 20th century are often not consistent. We note the contrast between observed wetter conditions and simulated drying in southern Europe in winter. Gómez-Navarro et al. (2013) also found that the regional simulation improved the representation of the observed climatology in the European subdomains of Scandinavia and the Baltic Sea (SCA), Britain and Ireland (BRI), the Iberian Peninsula (IBE), the Alps (ALP), the Balkan Peninsula (BAL), the Carpathian region (CAR) and Turkey (TUR) relative to the global simulation, whereas the representation did not improve much for central Europe (CEU) and eastern Europe (EEU). The reasons mostly pertain to the complex terrain over those regions, including a more complex coastline, whereas central and eastern Europe generally show less complex topographic characteristics. Therefore, we restrict our analysis to those seven regions which show an added value in the regional simulation.
A simple comparison between the reconstructed and simulated time series might be misleading given the presence of internal variability in the simulation. For this reason, we also use empirical orthogonal function (EOF) analysis to identify the main variability modes of mean seasonal SAT and PRE. These patterns are not critically dependent on the precise temporal evolution within each data set. Thus, they facilitate the comparison of the climate variability reproduced by the model and the reconstructions. Similarly, canonical correlation analysis (CCA) helps to identify the representation of the spatial co-variability between climate variables in a linear sense, which indicates potential underlying physical mechanisms. Thus, this statistical tool allows us to assess the dynamical consistency among different reconstructions. The two aforementioned techniques are widely used in climate research; therefore, we provide only a brief introduction here (see von Storch and Zwiers, 1999, for a comprehensive overview).
The basic philosophy of EOF analysis relates to decomposing the spatial (anomaly) fields of the climate variable under consideration into patterns representing most of the variable's variance. An important characteristic of the resulting patterns (denoted as EOFs) and their corresponding time-dependent amplitudes relates to the fact that they are mutually orthogonal in space and time. From a statistical point of view this characteristic is often of interest, but from a more physical point of view the interpretation of the EOF patterns may be complicated because the real-world processes and patterns are not necessarily orthogonal. Therefore, the physical interpretation of EOFs has to be performed with caution, especially when consecutive EOFs explain similar amounts of variance. To overcome this limitation, several techniques have been proposed to rotate EOFs. They allow us to obtain other variability patterns as linear combinations of the original ones. However, there is no unique criterion to perform such a rotation, and thus results are affected by a certain degree of subjectivity (von Storch and Zwiers, 1999). Given that in this study we are concerned with the way variance is distributed throughout the spectrum of EOFs rather than with obtaining physical meaning from such modes, we restrict the analysis to the standard EOFs.
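As an illustration of the decomposition described above, standard (unrotated) EOFs can be obtained from a singular value decomposition of the anomaly matrix. The following Python sketch is a generic textbook formulation, not necessarily the implementation used in this study; it assumes a complete (time, space) array without missing values and, for simplicity, omits any area weighting of grid cells. The function name `eof_analysis` is hypothetical.

```python
import numpy as np

def eof_analysis(field, n_modes=3):
    """Standard EOFs of a (time, space) array via singular value decomposition.

    Returns (eofs, pcs, explained):
      eofs      -- spatial patterns, shape (n_modes, space), orthonormal rows
      pcs       -- time-dependent amplitudes, shape (time, n_modes)
      explained -- fraction of total variance represented by each mode
    """
    field = np.asarray(field, dtype=float)
    anomalies = field - field.mean(axis=0)  # remove the time mean at each grid cell
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    explained = s**2 / np.sum(s**2)         # squared singular values measure variance
    eofs = vt[:n_modes]
    pcs = u[:, :n_modes] * s[:n_modes]      # amplitudes carry the units of the field
    return eofs, pcs, explained[:n_modes]
```

The returned variance fractions give the spectrum of EOFs directly, which is the quantity of interest when comparing how variance is distributed across modes in two data sets.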
CCA is a technique related to EOF analysis. It also decomposes the original variable into a number of components or patterns. However, in this case the aim is to identify pairs of patterns in two variables whose temporal components in the original series exhibit maximal correlation. As with EOFs, the resulting CCA pairs of time series are ranked according to their mutual correlation, although an important difference compared to EOFs is that, in this technique, the canonical pairs do not form an orthogonal decomposition of the original space. Instead, the CCA time series corresponding to consecutive pairs are uncorrelated in time.
Often the most physically meaningful information is spanned by the leading CCA patterns, although the associated patterns may not explain the largest amount of variance. An advantage of CCA for our purposes is that it helps to disentangle the most important (canonical) relationships between climate variables in the observations, the reconstructions and simulations. Hence, from a physical point of view the leading patterns should show similar characteristics when the mechanisms leading to the relationships between the climate fields are controlled by the same processes. Conversely, deviations from this behaviour are indicative of physical inconsistencies among variables.
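The ranking of canonical pairs by correlation can be sketched in Python as follows. This is a generic formulation via orthonormal bases, not necessarily the implementation used here. It assumes that both inputs are (time, variables) arrays of full column rank with more time steps than variables, as is the case when the fields are first truncated to a few leading principal components. The function name `cca` is hypothetical.

```python
import numpy as np

def cca(x, y):
    """Canonical correlation analysis of two (time, variables) arrays.

    Returns (corr, tx, ty): the canonical correlations in descending order
    and the corresponding canonical time series (unit-norm columns).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must share the time dimension")
    xa = x - x.mean(axis=0)
    ya = y - y.mean(axis=0)
    # Orthonormal bases of the spaces spanned by the columns of each data set
    ux, _, _ = np.linalg.svd(xa, full_matrices=False)
    uy, _, _ = np.linalg.svd(ya, full_matrices=False)
    # Singular values of the cross-product are the canonical correlations
    a, corr, bt = np.linalg.svd(ux.T @ uy, full_matrices=False)
    tx = ux @ a
    ty = uy @ bt.T
    return corr, tx, ty
```

By construction, the i-th columns of `tx` and `ty` are correlated with coefficient `corr[i]`, while time series belonging to different pairs are uncorrelated.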
3 Temporal agreement of regional series and climatologies

Regional time series
Figures 2 and 3 depict the evolution of the averaged winter and summer SAT, respectively. It is estimated as the pointwise median value within each subregion in the MM5-ECHO-G model, in the driving GCM and in the Luterbacher et al. (2004, 2007) reconstruction. For the sake of brevity, the figures corresponding to the intermediate seasons are shown in the Supplement, but the respective main characteristics are also outlined here. As mentioned in Sect. 2, the series are low-pass filtered (with a 31-year Hamming low-pass filter) to emphasise the low-frequency variability. The evolution of the interquartile range (25th to 75th percentiles) is also shown in order to illustrate the heterogeneities within each subregion. A first result is the reduction in warm biases in winter through downscaling the GCM output, mainly over areas of a strong land-sea contrast near the Mediterranean (Figs. 2 and 3). The width of the interquartile range is similar in the data sets, although the GCM exhibits a larger width of the probability density function (PDF) of winter SAT in the BAL and SCA regions. In summer (Fig. 3) the RCM is not able to reduce biases clearly, and both simulations are generally too cold. It is noteworthy how the RCM increases the width of the PDF compared to the driving GCM, resulting in better agreement with the reconstructions. Intermediate seasons (see Supplement) show a more heterogeneous pattern. Absolute biases in autumn are generally smaller: ECHO-G exhibits biases that are positive and negative depending on the season, whereas MM5-ECHO-G is systematically colder. A similar behaviour is found in spring, when the RCM simulations are slightly but consistently colder than reconstructions. However, the sign of the biases is reversed across areas, and also in different seasons, which precludes drawing a simple picture of the behaviour of biases. The added value of the RCM becomes more clear-cut in the width of the PDF in areas of complex topography such as ALP or IBE, where the GCM produces too small a variability (Figs. 2 and 3). These results resemble those described for observations (see Fig. 10 in Gómez-Navarro et al., 2013). This is an indication that the biases between the simulation and the reconstructions are probably associated with model deficiencies (e.g. too zonal a simulated atmospheric circulation) rather than with potential errors in the gridded reconstructions. Similarly, variability is larger in winter than in summer in both data sets as well as in northeastern areas (note the different scales in Figs. 2 and 3). This agreement is related to the skill of the model set-up to reproduce the general climatic features of the European climate (Gómez-Navarro et al., 2013) and the fact that the reconstructions are calibrated with observational records over the 20th century. Hence, this agreement is linked to the consistency of both data sources and their ability to reproduce the observed climate during the 20th century.
Focusing on the temporal evolution, the RCM follows the evolution of SAT of the GCM. Therefore, the following discussion is solely based on MM5-ECHO-G. Both the reconstruction and the RCM simulation generally agree better in their low-frequency evolution over northern Europe. Over southern Europe no clear-cut similarities are found. Regarding the centennial to decadal evolution, the simulation and reconstruction generally agree until 1700. There are anomalous episodes which appear to be synchronised between different regions (Figs. 2 and 3). This can be seen for both the reconstructions and the simulation and is indicative of prominent anomalies taking place on larger spatial scales. However, these episodes are not synchronised across both data sets, indicating that these decadal variations might be unrelated to variations in external forcings. Since the early 19th century, the simulated summer and winter temperatures show a clear warming trend across all regions. The trends in reconstructed temperatures start rising later, are generally lower and/or restricted to one of the two seasons. Thus, regional decadal anomalies of simulated and reconstructed data diverge for most regions over the past approximately 200 years. However, disagreement on decadal scales increases in some regions as early as the beginning of the 18th century. While IBE, BRI, ALP, BAL and TUR reconstructed and simulated series start to diverge in the early or the mid-19th century, CAR and SCA show pronounced anomalies in the 18th century which lead to large simulation-reconstruction deviations. This is also seen in the central and eastern European domains. Overall, there are no statistically significant correlations between the filtered series of reconstructed and simulated SAT (taking into account the presence of serial autocorrelation in the filtered time series). Also, the temporal agreement does not show any seasonality signal. Considering that SAT is potentially strongly influenced by the external forcings (Gómez-Navarro et al., 2012), the lack of agreement points toward inconsistencies between the smoothed simulated and reconstructed SAT that cannot be explained by internal variability alone.
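The text does not state how serial autocorrelation is taken into account in the significance test. One common option, shown here purely as an assumption, is to replace the sample size by an effective sample size derived from the lag-1 autocorrelations of both series before computing the t statistic of the correlation. The function names in this Python sketch are hypothetical.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation coefficient of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def correlation_with_effective_n(x, y):
    """Pearson correlation with an autocorrelation-adjusted sample size.

    Returns (r, n_eff, t_stat). The effective sample size follows the
    lag-1 approximation n * (1 - r1 * r2) / (1 + r1 * r2), which is an
    assumption of this sketch rather than the documented method.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, y)[0, 1])
    r1, r2 = lag1_autocorr(x), lag1_autocorr(y)
    n_eff = n * (1.0 - r1 * r2) / (1.0 + r1 * r2)
    n_eff = float(np.clip(n_eff, 3.0, n))  # keep at least one degree of freedom
    denom = max(1.0 - r**2, 1e-12)         # guard against |r| == 1
    t_stat = r * np.sqrt((n_eff - 2.0) / denom)
    return r, n_eff, t_stat
```

For series smoothed with a 31-year filter, the lag-1 autocorrelation is close to one, so the effective sample size shrinks to a small fraction of the number of years and even sizeable correlations may fail to reach significance.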
The time series of seasonal precipitation are shown in Figs. 4 and 5. In contrast to temperature, the RCM improves the seasonal precipitation compared to the driving GCM, which is in agreement with earlier findings (Gómez-Navarro et al., 2013). This is mainly due to the fact that precipitation processes are more notably influenced by orographic features, which are better resolved in the RCM. As with SAT, there are noticeable biases that can be explained with model deficiencies. For example, the model tends to overestimate winter precipitation in central and northern Europe in the observational period since 1905 (Gómez-Navarro et al., 2013), which generates a wet bias in SCA (also in CEU and EEU; see Supplement). It is noteworthy that biases are not as prominent in summer. This is also the case when the model is compared to observations for the 20th century (Gómez-Navarro et al., 2013). Indeed the RCM is able to improve the general underestimation of precipitation of the GCM in summer (Figs. 4 and 5). In autumn and spring, biases are generally smaller and do not show any systematic sign because the systematic biases in the zonal circulation play a minor role in the precipitation during these seasons. Independently from the biases, the agreement between simulation and reconstruction is expected to be lower for this variable due to the great importance of internal and small-scale variability in precipitation (Gómez-Navarro et al., 2012, 2014).
A comparison between seasonal reconstructed and simulated precipitation shows less variability in northern than in southern areas. The temporal variability appears to be particularly large in areas of complex orography such as ALP, TUR or IBE. Both data sets show strong low-frequency variations in most regions with pronounced dry and wet episodes over the period 1500-1900. However, these episodes are synchronised neither between the two data sets nor between the two seasons (Figs. 4 and 5). Variability also appears to change over time. For instance, simulated winter variability increases in TUR, whereas reconstructed summer variability weakens in CAR.
The most prominent features and discrepancies between reconstructions and the simulation are as follows. In the early 16th century, CAR and ALP suggest prominent summer dryness, which is absent in the other series. Reconstructions further show wet winters in BRI in the 16th century. There are hints of coherence between reconstructed and simulated summer ALP precipitation. Reconstructed summer precipitation in the 17th century indicates very wet conditions for CAR, BAL and ALP, while BRI summers appear to have been dry. Anomalous dryness is also seen in the early 18th century in summer in CAR, TUR, BAL and ALP reconstructions, while summers were wet in BRI and SCA during that period. Winter wetness in the 19th century is prominent in many regions in the simulation (Fig. 4).
A regional peculiarity is a pronounced alternation between drier and wetter conditions with diminishing amplitude and a shortening period between 1500 and 1800 in reconstructed CAR summer precipitation. Variations in TUR winter precipitation are very large in the simulation but rather low in the reconstruction. Iberian winter precipitation shows an apparent antiphase between simulation and reconstruction.
In summary, we do not find clear temporal agreement between the simulation and the reconstructions, especially for PRE. Although forcing leaves an imprint in the simulated SAT, no general congruence between the simulation and reconstructions is found. Pronounced anomalous periods are evident in reconstructed winter temperature in the early 18th century and in reconstructed 17th- and 18th-century summer precipitation but are absent in the simulation. Section 3.3 assesses the anomalies in some key periods in more detail.

Evolution of climatological PDFs
The nine regions depicted in Fig. 1 are comparatively large in their spatial extent. Indeed, they often include very different climatic characteristics, where the model produces opposite biases (see Figs. 4 to 8 in Gómez-Navarro et al., 2013). Further, the mean value potentially discards valuable information, such as regional deviations or a widening of the distributions of temperature or precipitation within a region in different periods of time. To account for this important aspect of climatic variability, Figs. 2 to 5 also show the interquartile range time series of the spatial distribution of the seasonal means of grid-cell temperature and precipitation within each region. This range is used as a proxy for the actual PDFs, which are not shown to avoid figures that are too complex. This range provides information beyond the mean value alone, also enabling the evaluation of the evolution of SAT and PRE spatial PDFs within regions, particularly the presence of skewness in the distributions. Low-frequency variability in the median generally translates to variability in the PDF, i.e. the distributions shift in time as a whole, with little change in their shape. This indicates that the median is a valid indicator for the regional evolution of all percentiles. The relation holds less well for precipitation, especially in summer and to a larger degree in the reconstruction. This is potentially due to the convective and localised character of summer precipitation that leads to non-normal PDFs (Gómez-Navarro et al., 2014).
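The regional median and interquartile range series described above can be computed per time step from the grid cells of a region. The following Python sketch assumes a (time, ny, nx) array of seasonal means and a boolean land mask selecting the cells of one subregion; both the data layout and the function name `regional_quartiles` are illustrative assumptions.

```python
import numpy as np

def regional_quartiles(field, mask):
    """Spatial 25th, 50th and 75th percentiles within a region per time step.

    field -- seasonal means, shape (time, ny, nx); NaN marks missing cells
    mask  -- boolean array, shape (ny, nx), True for cells inside the region
    Returns (q25, median, q75), each of shape (time,).
    """
    field = np.asarray(field, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    cells = field[:, mask]  # (time, n_cells): only the region's grid cells
    q25, median, q75 = np.nanpercentile(cells, [25, 50, 75], axis=1)
    return q25, median, q75
```

Comparing `median - q25` with `q75 - median` over time gives a simple indication of skewness in the spatial distribution.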
The median series in Figs. 2 to 5 already suggest that differences between the Late Maunder (1675-1715) and Dalton (1780-1820) minima and the recent 20th-century climatology disagree between the simulation and reconstructions. However, while the percentiles reflect changes in the mean temperature, shifts in the distributions are rather small, of the order of 1 to 2 °C colder means and quartiles. Most notable is the cooling for both periods in the winter SCA temperature. Distinct precipitation changes occur only for SCA and only in winter, with low solar forcing periods being drier than the recent climatology (see Fig. 4).
The underlying temperature PDFs generally agree well in the simulation and the reconstruction, in contrast to the evolution of the median time series.Simulated winter temperature distributions are similar for IBE, SCA, BRI, TUR and ALP.Simulated summer temperature distributions are clearly biased towards a colder mean in all regions.Nevertheless, the shape of the distributions is generally similar (not shown).
The simulation and reconstruction disagree more regarding the PDFs of winter and summer precipitation.The differences between the Late Maunder Minimum, the Dalton Minimum and the late 20th-century climatology are spatially less homogeneous across regions.Generally, the mean is underestimated and the extremes are overestimated for southern European winter precipitation, while summers are generally less dry in those regions in the simulation.On the other hand, northern Europe shows the opposite for both seasons.

SAT anomalies during key periods
Given its relevance for assessing climate sensitivity and given that it is an important benchmark for climate reconstructions, we analyse SAT anomalies around a prominent cold period in the preindustrial period, the Dalton Minimum (DM).This event is characterised by the simultaneous occurrence of lower TSI and two strong tropical explosive volcanic eruptions.Fig. 6 shows the anomalies of winter and summer SAT in the simulation and the reconstruction.Note that the other seasons show an intermediate behaviour and are omitted here.The simulation (top row) exhibits a clear cold period, in particular in northeastern Europe in winter and central and southern Europe in summer.These results, and the particularly cold summers in Iberia, are consistent with results obtained by Gómez-Navarro et al. (2011).The reconstruction (bottom row) shows slightly negative SAT anomalies in northern Europe, particularly around the Baltic Sea.Compared to the simulation, the reconstruction exhibits no cold anomaly at all in summer.A similar spatial distribution as with the cold period mentioned above can be found for the period of the Late Maunder Minimum (1675-1715), and the comparison between reconstruction and simulation yields similar results, i.e. the model reproduces a stronger cold anomaly.Again, this lack of agreement can have multiple explanations.Given the relatively small variability in the reconstructions (see Gómez-Navarro et al., 2011, and the results in the next section), especially in summer, this mismatch might be partly attributable to an underestimation of variance in the SAT reconstruction.
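As an illustration of how such anomaly maps can be derived, the following minimal sketch subtracts the mean of a reference period from the mean of a target period at every grid cell. The periods (1780-1820 and 1900-1990) are those given in the caption of Fig. 6; the function name, the array layout and the synthetic field with an artificially imposed cooling are assumptions made only for this example.

```python
import numpy as np

def period_anomaly(field, years, period, reference):
    """Mean of `field` over `period` minus its mean over `reference`.

    field: array of shape (time, ...); years: array of shape (time,).
    period, reference: (first_year, last_year), both bounds inclusive.
    """
    in_period = (years >= period[0]) & (years <= period[1])
    in_reference = (years >= reference[0]) & (years <= reference[1])
    return field[in_period].mean(axis=0) - field[in_reference].mean(axis=0)

# Synthetic seasonal SAT field on a tiny 4 x 5 grid (hypothetical data).
years = np.arange(1500, 1991)
rng = np.random.default_rng(3)
sat = rng.normal(size=(years.size, 4, 5))
sat[(years >= 1780) & (years <= 1820)] -= 1.0  # impose an artificial cooling

dalton_anomaly = period_anomaly(sat, years, (1780, 1820), (1900, 1990))
```

The same function applies unchanged to the simulation and to the reconstruction once both are on a common grid.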
A remarkable feature in reconstructed winter SAT is the strong warming trend during the first decades of the 18th century in several parts of northern Europe. Indeed, this warming is embedded in a very anomalous period characterised by a large climatic variability and culminating in an exceptionally cold winter in 1740 (Luterbacher et al., 2002; Jones and Briffa, 2006; Zorita et al., 2010). The anomalous warming trend is mostly detected in areas such as SCA or EEU and less notably so in ALP or CAR. Fig. 7 depicts the winter SAT anomalies in the reconstruction and the simulation in the 1700-1750 period with respect to the preceding century. There is an apparent warm anomaly in winter temperatures extending from northern to southeastern Europe. Such an anomaly is not reproduced in the simulation, neither in this nor in any other period prior to the late 20th century. The inability of the model to reproduce such a noticeable anomaly has several implications: on the one hand, internal variability could be responsible for such an anomalous event, rendering an agreement very unlikely. On the other hand, the fact that such an anomalous period is not reproduced in any other period of the simulation points towards fundamental limitations in the simulation, unrealistically restricting the spectrum of possible simulated extreme events (see also the discussions in Wetter et al., 2014).

The available gridded reconstructions of winter and summer temperatures, precipitation and sea level pressure allow us to not only evaluate the temporal evolution at certain locations but also to analyse the spatial structures of dominant modes of variability. Moreover, the temporal evolution of such variables in different periods and the relation between modes of different variables can be investigated with CCA. With this approach we gain insight into the dynamical consistency among reconstructions and between reconstructions and the simulation (Luterbacher et al., 2010a, b).

Modes of variability for SAT and PRE
Figure 8 shows the first EOF for winter (left) and summer (right) SAT for the CRU data set (top row), MM5-ECHO-G (middle) and the Luterbacher et al. (2004, 2007) reconstructions (bottom row). The patterns are based on observations for the 1901-1990 period, whereas for the model and the reconstructions, they are calculated for the period 1500-1990. The time period used to calculate the EOFs appears to be of minor relevance. Indeed, the patterns are robust, exhibiting only minor changes when the 1901-1990 period is used in the simulation and reconstructions (see discussion below). The second and third EOFs, also representing a remarkable amount of variance, are discussed here just briefly and are shown in the Supplement. Note that the EOF patterns, i.e. the eigenvectors, are not normalised but contain the corresponding units for each variable, so the spatial integral of the square of the pattern is proportional to the variance explained by the respective pattern. In order to facilitate the comparison, the same colour scale is used in all maps. Therefore, the patterns are multiplied by different scaling factors, indicated in the top right corner of each panel (Fig. 8).
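The scaling convention for the patterns can be made concrete with a minimal sketch of an EOF analysis based on the singular value decomposition. It is an illustration under stated assumptions and not the processing chain of the study: the function name and the (time, space) array layout are hypothetical, and no area weighting (e.g. by the cosine of latitude) is applied because the text does not specify one.

```python
import numpy as np

def eof_analysis(field, n_modes=3):
    """EOFs of a (time, space) field via SVD of the anomalies.

    Returns
    patterns:  (n_modes, space), in the units of the field; the sum of
               squares of each pattern equals the variance of that mode.
    pcs:       (time, n_modes), principal components with unit variance.
    explained: (n_modes,), fraction of total variance per mode.
    """
    anomalies = field - field.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    n_time = anomalies.shape[0]
    variance = s**2 / (n_time - 1)            # eigenvalues of the covariance
    explained = variance / variance.sum()
    # Unit-length eigenvectors scaled by sqrt(eigenvalue) carry the units.
    patterns = vt[:n_modes] * np.sqrt(variance[:n_modes, None])
    pcs = u[:, :n_modes] * np.sqrt(n_time - 1)
    return patterns, pcs, explained[:n_modes]
```

With this convention, the product of a principal component and its pattern reproduces the contribution of the mode to the anomalies, and a pattern with small amplitudes directly signals a mode with little variance, which is why a larger scaling factor is needed to display it on a common colour scale.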
Reconstruction and simulation agree well on the shape of the main EOF pattern for winter and summer SAT variability. They represent similar amounts of variability (indicated in each map), and the total variance is also similar. Note, for example, that the scaling factors consistently vary among data sets and that summer maps had to be multiplied by a larger factor, indicating that summer series show less variability, as already pointed out by Gómez-Navarro et al. (2013) and discussed in Sect. 3. Although only the leading variability mode is shown, this general conclusion applies also to the EOFs that have a higher index (see Supplement). The simulation, coherently with the observations, exhibits a monopole pattern centred over eastern Europe, whereas this centre is slightly shifted towards the Baltic Sea in the reconstruction in both seasons. The resemblance between observations and reconstructions increases when the 1901-1990 period alone is considered (not shown), indicating a slight sensitivity of the pattern to the choice of period. Note that a resemblance between the CRU data and the reconstructions can be expected, especially when the same period is used for the calculation. This is due to the fact that the reconstruction is calibrated against observations and the reconstructions are bound by PCA regression to show very similar EOF patterns through the whole period (Raible et al., 2006). In the simulation, there is a larger agreement between the 20th-century and the full-period EOFs (not shown), suggesting that the main patterns of variability are not very sensitive to their respective base period and, more importantly, that the arguably short length of observation records appears to be adequate to calibrate the proxy data.
The simulated and reconstructed SATs tend to attribute more variance to the first EOF in winter (71 and 72 % of total variance in the model and reconstructions, respectively) compared to observations (61 %). This difference is stronger in summer, when the leading mode in the observations represents 36 % of the total variance compared to 57 and 48 % in the model and reconstructions, respectively. This indicates that the simulated temperature covariance matrix is too homogeneous, particularly in summer, which is a reminder of the limitations of climate simulations: the zonal circulation in the driving GCM is too strong. This leads to a circulation regime in the RCM that is reminiscent of that observed in winter. Regarding the reconstruction, the larger proportion of variance represented by the reconstruction's leading EOF highlights again that using a truncated EOF basis in the PCA regression results only in a partial representation of the true variability. The reconstructions and observations for summer temperature are broadly similar in the second and third EOFs, but their ranking is swapped: the reconstructed pattern in one corresponds to the observed pattern in the other, and vice versa. They still show similar gradient-like patterns, with the direction of the greatest gradient slightly tilted in the simulation compared to that in the reconstructions and the observations.

Figure 9 is similar to Fig. 8 but for PRE (higher-order modes of variability are shown in the Supplement). In winter all data sets agree well and show a strong north-south dipole with the node at about 55° N. This pattern highlights the well-known difference between the Mediterranean area and northern Europe. However, although the spatial structure agrees, the first mode represents more variance in the reconstruction than in the observations. The simulated leading variability mode represents 34 % of the winter variance compared to 30 % in the CRU data set. However, the difference is larger in the reconstruction, where this mode explains up to 46 % of the total variance. In summer the leading mode of variability represents just 15 % in the observations. This can be explained by the fact that the precipitation regime is less influenced by the large-scale circulation. Despite the zonal circulation that is too strong in the driving global simulation, this variability is consistent with the regional simulation, where the leading EOF also represents a low percentage of variance (12 %). However, this is in strong contrast to the reconstruction, where the first EOF alone is able to account for 40 % of total variance. For the summer season, the spatial patterns of the observed and the simulated precipitation agree relatively well, while in the reconstruction the north-south gradient observed in these data sets is changed mostly to a strong pole over the Alpine region, with a slight gradient to the northeast. The dominating first mode in the reconstructions shows that the reconstructed precipitation regime is too homogeneous. This conclusion matches similar findings obtained through pseudoproxy experiments (Gómez-Navarro et al., 2014), where it has been shown how the linear regression used in Pauling et al. (2006) tends to underestimate the high spatial variability in precipitation.
In the following we briefly describe how the main variability modes compare regarding the GCM and the RCM. This comparison allows identifying when the downscaling adds value, and it represents an aspect of the analysis not shown by Gómez-Navarro et al. (2013). The main variability modes of SAT exhibit very similar patterns in both models and seasons, although the GCM reproduces less spatial variability, as is to be expected from its coarser spatial resolution (not shown). The percentage of variability represented by the main mode is 72 % in winter, indistinguishable from the RCM (Fig. 8). In summer this percentage drops to 38 %, in better agreement with observations, although the spatial structure generally shows less resemblance to observations, with a lower southwest-northeast gradient. For PRE, the GCM compares worse than the RCM with CRU. In winter, the GCM is able to reproduce the characteristic main variability mode dominated by a north-south gradient shown in other data sets (see left column in Fig. 9). However, the imprint of orography that is clear in the RCM is not seen in the GCM, resulting in too spatially homogeneous a pattern. In summer, not only is the spatial structure not realistic, but the main variability mode also represents 26 % of variance. Thus, results indicate that main variability modes are similar in both simulations, resulting from the strong forcing provided by the GCM through the boundaries of the domain. Nevertheless, the RCM is able to add regional details to the simulated fields. However, this depends on the variable and season. SAT is more strongly influenced by the driving conditions than precipitation, where the presence of complex orography is more important. This is especially evident in summer, where precipitation in the GCM is barely able to reproduce the observed patterns. These results agree with similar findings described in other RCM studies (Gómez-Navarro et al., 2011, 2014).

Dynamical consistency between variables
CCA provides insight into the interrelation between different variables in the spatial domain.Comparing observed relationships with the corresponding simulated ones provides an assessment of the model skill.Evaluating these relationships in reconstructions of different variables gives an indication of the consistency among independent reconstructions (e.g.Luterbacher et al., 2010a).Figure 10 shows the canonical pair of patterns of SLP and SAT and of SLP and PRE with the largest canonical correlation as simulated by the MM5-ECHO-G and their counterpart in the observational record in winter.Note that in summer the evolution of temperature and especially precipitation is driven to a lesser degree by the large-scale circulation.This is reflected by small canonical correlations.Hence, CCA is more useful for the winter season, and therefore only results for this season are discussed in detail.
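For readers unfamiliar with the technique, the following minimal sketch shows one common way to compute canonical pairs: both fields are first projected onto a few leading principal components, and the singular value decomposition of the cross-covariance of the whitened components then yields the canonical correlations. This EOF pre-filtering, the number of retained modes and all names are assumptions for illustration; the text does not state which CCA variant was used.

```python
import numpy as np

def leading_pcs(field, n_modes):
    """Anomalies of a (time, space) field and its leading whitened PCs."""
    anomalies = field - field.mean(axis=0)
    u, _, _ = np.linalg.svd(anomalies, full_matrices=False)
    return anomalies, u[:, :n_modes]          # orthonormal columns

def cca(x, y, n_modes=5):
    """CCA of two (time, space) fields in truncated PC space.

    Returns the canonical correlations (descending), the canonical
    patterns of x and y in field units, and the canonical time series
    (unit-norm columns).
    """
    ax, ux = leading_pcs(x, n_modes)
    ay, uy = leading_pcs(y, n_modes)
    a, rho, bt = np.linalg.svd(ux.T @ uy)     # rho: canonical correlations
    tx = ux @ a
    ty = uy @ bt.T
    scale = np.sqrt(x.shape[0] - 1)
    # Regression of the anomalies onto unit-variance canonical series.
    px = (ax.T @ tx / scale).T
    py = (ay.T @ ty / scale).T
    return rho, px, py, tx, ty

# Two synthetic fields sharing one common signal (hypothetical data).
rng = np.random.default_rng(2)
n = 200
signal = rng.normal(size=n)
slp = np.outer(signal, rng.normal(size=12)) + 0.3 * rng.normal(size=(n, 12))
sat = np.outer(signal, rng.normal(size=10)) + 0.3 * rng.normal(size=(n, 10))
rho, p_slp, p_sat, t_slp, t_sat = cca(slp, sat, n_modes=3)
```

In this construction a small leading canonical correlation means that no linear combination of the retained modes of one field covaries strongly with the other field.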
Figure 10 shows the results for the observations and the simulation in the control period. Considering the first canonical pair of SLP and SAT (top row), the canonical correlation is 0.93 for the observations. The patterns represent 42 % of total variance for SLP and 53 % for SAT, respectively. The SLP pattern resembles the NAO pattern and is related to a north-south gradient pattern in SAT. The physical explanation for this correlation is the well-known relationship between NAO and European temperature: a more zonal circulation in the north of Europe advects oceanic warm and moist air eastwards, leading to a positive temperature and precipitation anomaly in northern Europe (Luterbacher et al., 2010a).
A similar SLP pattern and physical mechanism is found for the SLP-PRE pair (third column), with a correlation of 0.95 (Fig. 10).The SLP-PRE pair roughly resembles the SLP-SAT pair, although the zonal circulation is shifted southwards.Despite the fact that the zonal circulation supports the same physical relation between variables, in this case the canonical correlation is lower (ρ = 0.75).The SAT pattern represents a large amount of variance and indeed resembles the leading EOF (see Fig. 8).The leading canonical pair of SLP-PRE exhibits a centre of high pressure in the North Atlantic which reinforces the northwestern component of wind and is responsible for increasing precipitation in western Europe, whereas it produces precipitation deficits in Norway and Turkey.This mechanism results in a strong link, producing a correlation of 0.91, although it explains a relatively small amount of winter precipitation variability (only 19 % in the simulation).
Using the period , where SLP reconstructions are also available, shows that none of the patterns for the longer period resembles the pair in the observations perfectly (compare Figs. 10 and 11), indicating that relationships between variables are sensitive to the period used.There are two potential reasons for this lack of robustness: first, the strong forcing in the 20th century may influence the canonical pairs either due to the strong anthropogenic trend in the zonal circulation in the driving simulation or due to a strong trend component in the temperature field.Second, we have to keep in mind the simplified covariance and the potentially reduced signal in the reconstruction.The simulated canonical pair of SLP-SAT has a canonical correlation of 0.79 whereas the correlation for the reconstruction is 0.28.Again, the canonical pairs appear to be dominated by the temperature variability.The leading pairs for reconstruction and simulation both show a temperature gradient from the southwest to the northeast, which is dynamically related to a slight wave-like disturbance of the zonal flow and related changes in the advection of air masses.The reconstruction and the simulation disagree on the location and character of flow centres.
The first SLP-PRE pair in the simulation (fourth row in Fig. 11) corresponds to the second canonical pair over the 1948-1990 period in the observations (not shown).Note that the first two pairs derived from observations are very similar, especially with respect to canonical correlations but also considering the representation of variances.However, the second pair represents more SLP variance than the first one.The separation between both pairs is more distinct in the longer period of analysis, and in that case the ranking of the two leading pairs is exchanged.Hence we show the third canonical pair for the reconstruction, which is the apparent dynamic equivalent to the simulated one but shows much smaller canonical correlations (0.12 in the reconstruction and 0.89 in the simulation) while representing a broadly consistent amount of variance.The small correlation signals that dynamical relations between both patterns may be weak.Indeed we would expect the NAO-like SLP pattern to link the intensified zonal flow to a decrease in precipitation in southern Europe, which is the opposite of the pattern implied by the reconstructed pair.

Discussion and conclusions
This study investigates agreements and disagreements between a regional climate (high-resolution) simulation for Europe and empirical proxy-based reconstructions for SAT, PRE and SLP from the 16th century to the 20th century.Our analyses complement the work by Gómez-Navarro et al. (2013), who compared the same simulation to observations for the 20th century.
Results indicate biases in regional means, especially noteworthy for summer temperature and winter precipitation.The biases between the simulation and reconstructions are similar to those described when comparing the model with an observational data set.In part, they are explained by an enhanced zonal circulation in the GCM simulation that cannot be substantially ameliorated by the RCM rather than being explained by deficiencies within the reconstructions.Although reconstructions and the simulation seem to correctly reproduce most of the spatio-temporal variability, there is little agreement in their temporal evolution.The mismatch in the temperature, especially in the last decades, can originate from the missing anthropogenic aerosol forcing in the simulation.Additionally, early instrumental time series can show warm biases caused by the lack of modern thermometer screens (Frank et al., 2007a, b).Although we do not necessarily expect the reconstructed and simulated temperature evolution to agree in the earlier periods due to the potentially dominant internal variability, we also acknowledge that the lack of stratospheric dynamics in both the regional and the global simulation may account for some disagreement.Specifically, too low a top atmospheric layer in the model and no ozone chemistry reduce the ability of the model to correctly represent the potential top-down influences of solar activity changes on the atmospheric circulation in the North Atlantic sector, e.g. the North Atlantic Oscillation, and in turn European climate variability (Shindell et al., 2001;Anet et al., 2013).Finally, the simplification of using reduced TSI for volcanic forcing might be an additional source of errors reducing the agreement between the simulation and reconstructions.
Obviously, the reconstructions also suffer from uncertainties, which have to be considered in addressing the reliability of the simulation by comparing it to the proxy-based data sources.A prominent disagreement is the winter warming trend within the first half of the 18th century (Jones and Briffa, 2006), which stands out in the reconstructions but is not present in the simulation.This disagreement could be an indication of too simplistic a simulated climate, which is not able to produce extreme situations comparable to this event recorded in the reconstructions.Also internal variability could dominate the temporal evolution, effectively hiding the imprint of external forcing on the regional scale.A further source of error complicating the comparison between models and reconstructions relates to method-specific nonclimatic errors.These can be related to simplified physics and too coarse a resolution in the models and proxy-type-specific uncertainties in empirical reconstructions.
Internal variability, reconstruction uncertainty and potential shortcomings of the simulation in representing forced climate may also explain the disagreement in the magnitude of change between recent decades and the periods of the Maunder and Dalton minima.Again, the lack of 20th-century anthropogenic aerosol forcing is likely the most important factor.
EOF and CCA analysis unveiled the lack of dynamic consistency between reconstructions and the weak explanatory power of dominant canonical pairs.Although this is not surprising, it highlights the large uncertainties in our estimates about past climates.This further implies that we should not expect to understand past climate changes based on one data source alone.On the other hand, the plausibility of simulated dynamics has to be assessed through tests with proxy-based hypotheses.
Other assessments of consistency among independent reconstructions have been carried out in the literature.Casty et al. (2007) employed gridded reconstructions of SAT, precipitation and geopotential height at 500 hPa to investigate combined patterns of climate variability over Europe for the 1766-2000 period.A prominent difference compared to the data sets employed in the present analysis is that the three reconstructions employed by Casty et al. (2007) use completely independent indicators, entirely based on instrumental data for each variable.This reduces the length of the reconstructions but in turn ensures independence, which enabled the authors to evaluate the consistency between reconstructions through EOF analysis applied to the combined fields of the three variables.The authors reported similar NAO-like behaviour to that described in this study for the observations and simulations, with the large-scale flow driving seasonal temperature and precipitation over Europe, especially in winter.They also analysed the co-variability between SAT and precipitation.This study carefully avoids establishing such a link, since the data sets used here are not fully independent (both SAT and precipitation reconstructions share some indications).However, the CCA approach adopted here allows studying the co-variability between SLP and the other two variables.The weaker and physically inconsistent link we identify, especially with respect to the Pauling et al. (2006) reconstruction, raises concerns about the reliability of these reconstructions.
Coordinated reconstruction efforts as, for instance, related to PAGES2k (Past Global Changes) (PAGES 2k Consortium, 2013) will increase the number of available proxy records. This, in conjunction with newly developed reconstruction methods, is expected to provide more realistic uncertainty estimates of the spatial fields and spatially averaged reconstructions. In addition, proxy system models (e.g. Evans et al., 2013) will provide a better basis for proxy-model comparisons as they enable a direct modelling of the proxy under consideration within the virtual world of a climate model. This may help to evaluate, e.g., the stationarity of proxy-climate relationships and the different sources and degrees of uncertainty implicit in empirical reconstruction methods.
In conclusion, although regional climates are generally better represented by the RCM compared to the driving GCM (Gómez-Navarro et al., 2013), the downscaling is not able to compensate for biases in the driving circulation.This leads to biases in the comparison with the reconstructions that are clearly attributable to model deficiencies.However, we cannot describe simulated and reconstructed anomalies with respect to today's climate as generally inconsistent, although the temporal evolution is different enough to raise concerns over the ability of the simulation to produce exceptionally anomalous situations comparable to those recorded by the SAT reconstructions during the first decades of the 18th century.Furthermore, the dynamical inconsistencies that we identify between the reconstructions of SLP, SAT and PRE hamper addressing the reliability of forced changes in the dynamics.It remains an open question whether a lack of common forced signals is due to weak forcing effects relative to the internal variability in the climate system, due to erroneous representation of climate dynamics in the model or due to uncertainty in the reconstructions.
The Supplement related to this article is available online at doi:10.5194/cp-11-1077-2015-supplement.

Figure 1. Topography and land mask implemented in the regional simulation, with a horizontal resolution of 45 km. The rectangles show the nine subregions used for more detailed analysis. IBE - Iberian Peninsula; BRI - Britain and Ireland; CEU - central Europe; EEU - eastern Europe; SCA - Scandinavian peninsula and Baltic Sea; CAR - Carpathian region; BAL - Balkan peninsula; ALP - Alps; TUR - Turkey.

Figure 2. Temporal series of winter SAT in the seven areas indicated in Fig. 1 that exhibit added value in the MM5-ECHO-G simulation compared to the GCM alone, according to Gómez-Navarro et al. (2013). The series corresponding to the three different data sets are shown with different colours: driving GCM (i.e. ECHO-G model alone) - black; RCM (i.e. MM5-ECHO-G) - orange; and gridded reconstruction (i.e. the Luterbacher et al. (2004) reconstruction) - blue. Bold lines correspond to the median, whereas the light shading indicates the 25-75 interquartile range to illustrate heterogeneities within each region. After the calculation of the annual values, the series are smoothed through a Hamming window of 31 time steps to emphasise the low-frequency variability. Note the different scale in different panels.

Figure 3. As Fig. 2 but for simulated summer SAT.

Figure 5. As Fig. 4 but for summer PRE.

Figure 6. SAT anomalies in winter (left) and summer (right) around the Dalton Minimum (1780-1820) with respect to the control period (1900-1990). Top and bottom rows show the results corresponding to the simulation and the reconstruction, respectively.

Figure 7. SAT anomalies in winter during the first decades of the 18th century (1700-1750) with respect to the previous century (1600-1700).Left and right maps show the results corresponding to the simulation and reconstruction, respectively.

Figure 8. First EOF of winter (left) and summer (right) SAT. Rows depict the results for the CRU data set (top), the MM5-ECHO-G simulation (middle) and the reconstructions (bottom). For the first case, the 1901-1990 period is employed, whereas for the others the period 1500-1990 is considered. Note that the patterns carry the units of the variable, and thus they are proportional to the square root of the variance that each pattern represents. Hence, and to facilitate the comparison, each pattern has been multiplied by a scaling factor, indicated in the top right corner of the figure. The percentage of total variance represented by each pattern is also indicated. The units are °C.

Figure 9. As Fig. 8 but for PRE. The units are millimetres per month.

Figure 10. Canonical correlation pattern pairs of SLP and SAT (rows 1 and 2) and SLP and precipitation (rows 3 and 4) in winter. Each panel depicts the percentage of variance explained by each pattern and the canonical correlation associated with the pair. The results are calculated in the observational record (rows 1 and 3) and in the MM5-ECHO-G data set (rows 2 and 4) during the period 1901-1990. Note that the SLP has been obtained directly from the driving GCM, since the window of interest lies outside the RCM domain. As in Figs. 8 and 9, the patterns have been multiplied by a scaling factor that allows using the same colour scale in every map. The SAT unit is °C, SLP is shown in Pa, whereas precipitation units are millimetres per month.