Northern Hemisphere temperature patterns in the last 12 centuries

We analyse the spatio-temporal patterns of temperature variability over Northern Hemisphere land areas, on centennial time-scales, for the last 12 centuries using an unprecedentedly large network of temperature-sensitive proxy records. Geographically widespread positive temperature anomalies are observed from the 9th to 11th centuries, similar in extent and magnitude to the 20th century mean. A dominance of widespread negative anomalies is observed from the 16th to 18th centuries. Although we find that the amplitude and spatial extent of the 20th century warming are within the range of natural variability over the last 12 centuries, we also find that the rate of warming from the 19th to the 20th century is unprecedented in the context of the last 1200 yr. The positive Northern Hemisphere temperature change from the 19th to the 20th century is clearly the largest between any two consecutive centuries in the past 12 centuries. These results remain robust even after removing a significant number of proxies in various tests of robustness, showing that the choice of proxies has no particular influence on the overall conclusions of this study.


Introduction
A number of Northern Hemispheric (NH) temperature reconstructions covering the last 1-2 millennia, using temperature-sensitive proxy data, have been made to place the observed 20th century warming into a long-term perspective (Ammann and Wahl, 2007; Briffa, 2000; Christiansen and Ljungqvist, 2011; Cook et al., 2004; Crowley and Lowery, 2000; D'Arrigo, 2006; Esper et al., 2002; Hegerl et al., 2007; Jansen et al., 2007; Jones et al., 1998; Jones and Mann, 2004; Juckes et al., 2007; Ljungqvist, 2010; Mann et al., 1999, 2008; Mann and Jones, 2003; Moberg et al., 2005; Osborn and Briffa, 2006). Temperature variability during the last 1-2 millennia on a regional scale has been studied for the Arctic region by Kaufman et al. (2009), for eastern China by Ge et al. (2006, 2010) and Yang et al. (2002), for Europe by Büntgen et al. (2011), Goosse et al. (2006), Guiot et al. (2010) and Guiot (2012), for the Mediterranean region by Luterbacher et al. (2012), for the Tibetan Plateau by Yang et al. (2003) and for southern South America by Neukom et al. (2011). These studies generally agree on the occurrence of warmer conditions ca. 800-1300 AD and colder conditions ca. 1300-1900 AD, followed by a strong warming trend in the 20th century (Jansen et al., 2007). The earlier warm period is usually referred to as the Medieval Warm Period (MWP) or Medieval Climate Anomaly (MCA) (Bradley et al., 2003; Broecker, 2001; Diaz et al., 2011; Hughes and Diaz, 1994) whereas the later colder period is usually referred to as the Little Ice Age (LIA) (Grove, 1988; Juckes et al., 2007; Matthews and Briffa, 2005; National Research Council, 2006; Wanner et al., 2008). Related to this issue is the question of whether or not the current warmth has exceeded the level and geographic extent of the warmth in the last millennium; answering this question would, in turn, refine our understanding of plausible climate forcings.
It has been suggested that only large-scale climate averages reflect a response to external forcing (Jansen et al., 2007), and recent studies of reconstructed global temperature patterns imply that a dynamic response of climate variability to natural radiative forcing is detectable. At the same time, it has been argued that the use of too few, noisy and poorly replicated proxies precludes a satisfactory assessment of spatial temperature anomalies, particularly in medieval times (Broecker, 2001). Therefore, it is essential to refine our knowledge of the temporal evolution of spatial climate variability. We suggest this cannot be satisfactorily done without considering all the available proxy evidence.
Recent hemispheric-scale temperature reconstructions over the past millennium, with two notable exceptions (Mann et al., 1999, 2009), have focused on reconstructing temperatures in the time domain only, an understandable consequence of the few and sparsely distributed high-resolution proxies that can be calibrated directly against instrumental observations. The unique approach of Mann et al. (1999, 2009) attempts to overcome this problem by taking advantage of statistically determined spatial teleconnections between instrumental temperature fields and temperature, precipitation or drought sensitive proxy data. An example is the strong correlation between the moisture-sensitive tree-ring series in the American Southwest and sea surface temperatures in the tropical Pacific ENSO region (Wilson et al., 2010). This method relies heavily on the assumption that both the spatial and temporal relationships found between the modern (proxy vs. climate) measurements have remained constant through time and that these relationships are linear. Nevertheless, due to the method, the Mann et al. (2009) reconstructed medieval period is still based on relatively few, spatially well distributed, proxies. The numbers of Northern Hemisphere proxies, extending beyond 1000 AD, used in previous global-scale multi-proxy temperature reconstructions are shown in Table 1. Arguably, a substantially denser proxy network should produce a more robust reconstruction. This can be done if one accepts proxies with lower temporal resolution and if the proxies used are constrained to be indicators of local temperature. However, the decision to include low-resolution proxies results in the loss of temporal detail (resolution) and the inability to produce a temperature-calibrated reconstruction. We suggest these drawbacks are not detrimental to the exercise and in fact permit accurate descriptions of climate variability in both time and space on centennial time-scales.

Proxy data and method
Here, we present a new reconstruction of the spatio-temporal patterns of centennial temperature variability over the NH land areas for the last twelve centuries based on 120 proxy records (Fig. 1; Table A1). [Fig. 1. Locations of the 120 proxy records (see Table 1 for details). Note that a few pairs of proxies share the same geographical position but are based on different archives. Not depicted on this map is the Indo-Pacific Warm Pool sea sediment record located at latitude 3.53° S and longitude 119.27° W.] An extensive search of the literature was conducted for proxy records possessing annual to sub-centennial resolution, covering at least the last millennium, and considered by their authors to be temperature sensitive. The proxies are retrieved from a wide range of archives including, but not limited to, ice-cores, pollen, marine sediments, lake sediments, tree-rings, speleothems and historical documentary data (Table A1). We concede that each proxy type has its inherent strengths and weaknesses as a palaeo-thermometer. Numerous books and articles describe the use and interpretation of the proxy types used in this experiment. Therefore, we forego a lengthy discussion on climate proxies here and instead refer the reader to Bradley (1999) and Jones et al. (2009), and the references therein, for a comprehensive overview of palaeoclimatology.
The data are diverse not only in their type, resolution and location but also in the temperature signal they are reported to contain. Most high-latitude proxies primarily record summer temperatures, while most low-latitude proxies primarily record annual mean temperatures. The mid-latitude proxies may carry either a summer or an annual mean temperature signal. Only eight of the proxies used are purported to be expressions of winter temperature.
This network sacrifices some temporal detail compared with annually resolved reconstructions but delivers substantially larger spatial coverage, particularly prior to ca. 1400 AD. Since many of the proxies used cannot be reliably calibrated into temperatures, we use centennial mean anomalies normalized with respect to the 11th-19th centuries. This is the period fully covered by all 120 proxies.
For proxies sampled at time steps greater than one year, a linear interpolation is used to produce an annually resolved time-series that is then smoothed with a 167-yr spline, which has a frequency response similar to a 100-yr moving average. Fitting the spline to the interpolated values minimizes undesirable effects of the linear interpolation step, particularly for those proxies with few data points per century. The resulting splines are then passed through a 100-yr box filter, lagged 25 yr, producing a new time-series of 45 centennial means from 850 AD to 1950 AD for each proxy (Figs. A1-A2). The 45 centennial means from each proxy record are then normalized by their mean and standard deviation over the 11th to 19th centuries (1000 AD to 1899 AD). The twelve normalized centennial mean anomalies, located in the middle of each whole century (i.e., 850 AD, 950 AD, …, 1950 AD), are used for the spatial comparisons in Figs. 2-3. The 45 normalized centennial means are used for producing the time-series plots in Fig. 4. See Appendix A for details.
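The pre-processing chain can be sketched as follows. This is a simplified stand-in, not the authors' code: the 167-yr smoothing spline is approximated here by a 100-yr running mean (which, as noted above, has a similar frequency response), and the function name and inputs are hypothetical.

```python
import numpy as np

def centennial_anomalies(years, values):
    """Sketch: interpolate to annual resolution, smooth on a ~100-yr
    scale, sample 45 centennial means, normalize over 1050-1850 AD."""
    annual_years = np.arange(850, 1951)
    annual = np.interp(annual_years, years, values)     # linear interpolation
    # Stand-in for the 167-yr smoothing spline: a 100-yr running mean
    kernel = np.ones(100) / 100.0
    smooth = np.convolve(annual, kernel, mode="same")
    centers = np.arange(850, 1951, 25)                  # 45 centennial means
    cent = smooth[centers - 850]
    # Normalize by mean/std over the 11th-19th centuries (33 values)
    base = cent[(centers >= 1050) & (centers <= 1850)]
    return centers, (cent - base.mean()) / base.std()
```

By construction, the 33 base-period values of the output have zero mean and unit standard deviation, matching the normalization described above.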
The spatio-temporal evolution of anomalies is dynamically displayed in an 1101-yr animation from 850 AD to 1950 AD. At every proxy location an Akima spline (Akima, 1970) is fit to each proxy's 45 centennial mean values (raw and weighted), producing a smooth centennial-trend interpolation with a time step of one year. The four animations produced, available as an electronic Supplement, are (i) the filtered spline values, (ii) the gridded, filtered spline values, (iii) the proxy-centred, weighted mean, filtered spline values, and (iv) the gridded, proxy-centred, weighted mean, filtered spline values. The first purpose of this exercise is to demonstrate how the weighted mean and gridding algorithms affect the transformation of the raw data. The Akima spline is very efficient in handling discontinuous time-series data, producing continuous interpolations without inducing spurious wiggles, because no parametric curve form is assumed and only the local data nodes are taken into account (Akima, 1970). The second purpose is that producing these 1101 slices of the spatial field permits one to examine the temporal stability of both proxy-local and proxy extra-local patterns produced by the analysis.
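A minimal sketch of this interpolation step, using SciPy's `Akima1DInterpolator` as a stand-in for the authors' implementation; the sine series below is purely illustrative:

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator

# 45 centennial mean values for one proxy (850-1950 AD at 25-yr steps);
# the sine curve is an illustrative stand-in for real anomalies.
centers = np.arange(850, 1951, 25)
values = np.sin(centers / 300.0)

# Akima (1970) spline: local, no global parametric form, avoids the
# spurious wiggles a global polynomial fit can introduce.
akima = Akima1DInterpolator(centers, values)
annual = akima(np.arange(850, 1951))   # 1101 yearly frames for the animation
```

The spline passes exactly through each centennial mean, so the animation frames at the 25-yr nodes reproduce the input values.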

Weighted anisotropic averaging and gridding
The real spatial variability of centennial mean temperatures is certainly more coherent than the centennial mean anomalies shown in Fig. 2 suggest. We infer from Jones et al. (1997) that the global-mean correlation decay length, for unforced centennial temperature variability, is at least ∼2000 km and decreases from low to high latitudes. The correlation decay length is the distance at which the spatial temperature correlation between meteorological stations, on average, falls to ≈0.37 (see Appendix A for more details). Due to the diversity of proxies used, it is more relevant to look at how groups of neighbouring proxies behave than to focus on any individual record. This approach is not dissimilar to the use of ensemble means in the evaluation of General Circulation Models (Annan and Hargreaves, 2011; Collins, 2007; Knutti et al., 2010; Masson and Knutti, 2011; Tebaldi and Knutti, 2007).
Compared to instrumental data (Appendix C), the proxy records contain more noise; therefore, spatial averaging of proxy anomalies is reasonable. To obtain a clearer view of the spatial patterns of temperature variability provided by the proxies, we first applied a weighted averaging to the centennial mean anomalies, centred over each proxy location, for all 45 centennial means. A Gaussian weight function that decreases from 1, at the proxy node, to e^(-2) ≈ 0.14 at the search periphery was used to compute a weighted mean.
Proxy-centred, weighted mean, centennial values are computed only if a proxy has two or more neighbours with data for the same century and those neighbours lie within a meridionally defined, anisotropic, search radius that decreases from 2000 km at the equator to 1000 km at the North Pole. Gridding of these spatially weighted, proxy-centred, centennial means was performed using a modified near-neighbour gridding algorithm that requires at least 3 proxies within the search radius of each node of a 1° × 1° Cartesian grid over the Northern Hemisphere. The grid values are calculated from the weighted-mean centennial proxy values using the same Gaussian weight and anisotropic search functions described above. Though the oceans have been masked on the maps, coastal marine proxy records may contribute to the land area grid (see Appendix A for more details). The gridding procedure smooths small-scale variations as seen in the individual proxies (in Fig. 2) and retains only those variations of the proxy means that are spatially distinct (Fig. 3).

Test of robustness
To test the robustness of the proxy data used and the observed spatial patterns they produce, we undertook a number of experiments, like the one shown in Fig. 3, using different subsets of the proxies. The five different experiments performed are: (i) excluding one proxy data type at a time; (ii) using only those proxies that begin before 816 AD and end after 1984 AD; (iii) using only proxies with 4 or more, and also with 10 or more, observations per century; (iv) requiring that each proxy series used must have data coverage up to 1995; and (v) excluding the 43 proxy series that have either a negative correlation to the mean time-series of their proxy-centred, within-search-radius neighbours or fewer than two within-search-distance neighbours. No result from these five experiments significantly changes our main observations regarding the spatio-temporal patterns of past temperature variability. The results of these experiments are shown in Figs. S1-S13, with supporting text, in the accompanying supplement to this article.

Results
The spatial and temporal patterns of centennial temperature proxy anomaly values at each proxy location, for the last twelve centuries, are illustrated in Fig. 2. In addition to the large-scale patterns that clearly emerge (a dominance of warm anomalies in the 9th-11th centuries, cold anomalies in the 17th century, warm again in the 20th century) there is notable small-scale spatial variability among the individual proxies.
Temperatures from the 9th to 12th centuries are generally above the long-term mean, gradually cooling to below the mean in the 16th to 19th centuries and reaching a maximum cooling in the 17th century. The 20th century warming raised the centennial mean back to a level comparable to that of the 9th to 11th centuries. The resulting maps (Fig. 3) reveal remarkable large-scale spatial coherency of warm and cold conditions over the NH land areas for the past twelve centuries. The dominance of warm anomalies during the MWP and cold anomalies during the LIA is substantiated by results from the sign test (Fig. B1) that shows where and when there is significant agreement between the sign, positive or negative, of the proxies within their search radius (for more details see Appendix B). [Fig. 1 legend residue: Tree-rings (30), Ice-cores (10), Speleothems (8), Sea sediments (19), Lake sediments (28), Pollen (13), Other (6), Documentary (…)]
The tests of robustness (see the accompanying supplement to this article) clearly reveal that when the spatial coverage decreases as proxies are excluded, the remaining spatial patterns of warm and cold anomalies are not substantially different from when all proxies are used. For example, the requirement that the proxies used must have data up to 1995 reduces the number of usable proxies to 34, yet, for the limited areas still covered, the overall patterns remain the same (compare Fig. 3 with Fig. S12). Together the various experiments indicate that the observed large-scale spatial patterns of reconstructed normalized temperature anomalies, as seen in Fig. 3, are a robust feature of NH temperature variability over the last twelve centuries. Such an approach to assessing robustness is only possible with a large number of proxy records. The averaged centennial mean anomalies and their block bootstrap confidence intervals (Wilks, 1997), expressed as ±2 standard errors, for subsets of proxies grouped by type, continent, latitude and seasonality of signal are presented in Fig. 4. Essentially the same overall temporal trends are found, with the exception of those proxy groups that have insufficient 20th century data (mainly pollen and sea sediment records).
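The block bootstrap estimation of the standard error can be sketched as a moving-block bootstrap of the mean. This is an illustrative stand-in: the block length, replicate count and function name are our assumptions, not values taken from the paper:

```python
import numpy as np

def block_bootstrap_se(series, block_len=5, n_boot=2000, seed=0):
    """Moving-block bootstrap standard error of the mean, preserving
    serial dependence within blocks (cf. Wilks, 1997)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(series, dtype=float)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    # Random block start indices for each bootstrap replicate
    starts = rng.integers(0, n - block_len + 1, size=(n_boot, n_blocks))
    means = np.empty(n_boot)
    for b in range(n_boot):
        resampled = np.concatenate([x[s:s + block_len] for s in starts[b]])[:n]
        means[b] = resampled.mean()
    return means.std(ddof=1)
```

The ±2 standard error intervals in Fig. 4 would then correspond to the group mean plus or minus twice this quantity.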
Computing the rate of change within the last twelve centuries produces eleven maps of centennial first differences (Fig. 5). These maps show that the greatest rate of change over a widespread area occurred between the 19th and 20th centuries, when strong warming is observed over nearly all areas with sufficient data. Comparable rates of warming between consecutive centuries are only seen for limited regions, such as over Greenland from the 9th to the 10th century. The second most geographically extensive warming between consecutive centuries occurs from the 17th to the 18th century, when almost all of North America, and much of the eastern half of Asia, warmed. A cooling trend is seen for most regions between the 10th and 13th centuries. The most widespread cooling between two consecutive centuries is from the 16th to the 17th.

Discussion
The density of proxies is comparatively high over Europe, Greenland, China and parts of North America, implying that the observed patterns over those regions are the most robust. The coverage is sparse over interior Asia and nonexistent in North Africa and the Middle East. Consequently, these areas are either poorly replicated or left blank on the maps, which is unfortunate as these are regions important to understanding teleconnection patterns in the climate system (e.g., El Niño/La Niña-Southern Oscillation and drought over southwestern North America, North Atlantic Oscillation and drought over China) (Graham et al., 2011). More temperature proxies are thus needed, particularly in the interior of Asia, the Middle East and northern Africa, to firmly assess past climate variability. It is also essential to reconstruct climate patterns in the Southern Hemisphere (SH) and over the oceans in order to better understand the dynamics of internal variability and external forcings on global climate. This is presently difficult to achieve due to the scarcity of marine and SH proxy data (see, e.g., Neukom and Gergis, 2012). Our reconstructed spatial anomalies cannot be directly compared with the calibrated climate field reconstruction by Mann et al. (2009), but we observe that our reconstructed patterns are not in disagreement. It is worth noting that our anomaly differences in the 9th to 11th centuries look very similar to the MWP-LIA difference in Mann et al. (2009) when the influence of their 1961-1990 baseline period is removed (Fig. 6).
Analyses of instrumental data (Brohan et al., 2006) show that the last decade of the 20th century was much warmer than the 20th century mean nearly everywhere over NH land areas with sufficient data (Fig. C1). Moreover, the first decade of the 21st century was even warmer in most locations, thus providing evidence that the long-term, large-scale NH warming that began in the 17th century and accelerated in the 20th century has continued unabated (see Appendix C for more details).
The warming from the 17th to the 20th century did not occur uniformly or simultaneously over all NH land regions (Figs. 3,5). Almost all of North America, western Europe and much of central and eastern Asia warmed from the 17th to the 18th century but not Greenland, eastern Europe and northwestern Asia. Notable cooling occurred from the 18th to 19th century in northern Europe and much of Asia except in the south to southwest. This cooling caused the 19th century to be the coldest over much of northwestern Eurasia. Only from the 19th to the 20th century is warming observed over nearly all areas. Notable changes between consecutive centuries are also observed before the 17th century but these are more characterised by variability within smaller regions and no clear large-scale spatial patterns emerge apart from the overall long-term cooling from the 10th to the 17th century.

Conclusions
A principal contribution of this study is that it helps demonstrate that the science of paleoclimatology, particularly the collection and interpretation of proxy records, is capable of producing a body of evidence that can reveal many details of climate variability over time and space. Our results show, in a comparative manner, the degree to which the various proxy types can be used to assess regional temperature variability on centennial time-scales. We conclude that during the 9th to 11th centuries there was widespread NH warmth comparable in both geographic extent and level to that of the 20th century mean. Our study also reveals that the 17th century was dominated by widespread and coherently cold anomalies representing the culmination of the LIA. Understandably, the centennial resolution of this study precludes direct comparison of past warmth with that of the last few decades. However, our results show the rate of warming from the 19th to the 20th century is clearly the largest between any two consecutive centuries in the past 1200 yr.
It is clear that not all proxies from the same locale need exhibit the same centennial signal to infer a robust regional climate pattern, provided a sufficient number of proxies are available to compute a meaningful average. For the same reason, it is also clear that the choice of proxies used does not significantly change the overall conclusions of this study. Even after removing a significant number of proxies within the various tests of robustness, the significant spatial patterns of warm and cold anomalies remain the same as when all 120 proxies are used. This implies that our results depicting the large-scale spatio-temporal patterns of warm and cold conditions, as revealed by using all available temperature-sensitive proxy records, can be considered a robust reconstruction of the thermal conditions over the Northern Hemisphere over the last 12 centuries.

A1 Proxy data
The peer-reviewed literature was systematically searched for all reported temperature proxy records spanning at least the 11th to 19th centuries and considered by their authors to be primarily a quantitative measure of local and/or regional temperature variability. Only records with at least two observations per century were considered. The large majority of raw data were either obtained from public databases (e.g., http://www.ncdc.noaa.gov/paleo/ and http://www.pangaea.de/) or by direct request from their authors. Those data that could not be acquired in either of the aforementioned ways were obtained by digitizing the figures where the data were published. The longitude, latitude, proxy type, sample resolution, seasonality and original reference of all 120 proxy records used are given in Table A1. The locations of all the different proxy records are given in Fig. 1.
The proxy data are divided into eight different categories: (1) Documentary, (2) Ice-core, (3) Lake sediments, (4) Pollen, (5) Sea sediments, (6) Speleothems, (7) Tree-rings, and (8) Other. All types of information from historical records used to reconstruct past temperatures are included in the category Documentary. The category Ice-core only includes δ18O ice-core records. In the category Lake sediments all archives from lakes and peat bogs, excluding any pollen records, are included. The Pollen category includes all pollen records regardless of whether the pollen is derived from lake sediments, peat layers, ice-cores, or sea sediments. Sea sediments include all sediment records that are stated to reflect sea surface temperature. The category Speleothems includes δ18O records and annual layer thickness from speleothems. The category Tree-rings includes tree-ring width and maximum latewood density (MXD) chronologies but not stable isotope records. Those proxies that did not fit into one of the above seven data categories were placed in the category "Other". This data category includes fossil wood remains, indicating changes in tree-line elevation, δ13C tree-ring records, and an N2 and Ar isotopic ice-core record.
For the purpose of simplification, we have collated the proxy data into three categories of seasonal temperature response: annual, winter, and summer temperature. Documented spring and early autumn temperature proxies are considered summer season records. Proxies expressing a late autumn season signal are included in the winter category. Records reflecting only spring or autumn temperature were so few that it was deemed inadequate to create separate categories for them. If no information on a proxy record's seasonality was available, we assumed the proxy to be an annual mean temperature record. We recognize that Greenland δ 18 O ice-core records, though stated to be a measure of annual mean temperature and used as such in this experiment, may actually be dominated by a winter temperature signal (Vinther et al., 2010).
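The collapsing of reported seasonality into three categories can be expressed as a small classification function. This is a hypothetical helper; the label strings are our assumptions, not the paper's metadata vocabulary:

```python
def seasonal_category(reported_season):
    """Collapse a proxy's reported seasonality into the three categories
    used above: spring/early autumn -> summer; late autumn -> winter;
    unknown seasonality -> annual (the stated default)."""
    s = (reported_season or "annual").lower()
    if s in ("summer", "spring", "early autumn"):
        return "summer"
    if s in ("winter", "late autumn"):
        return "winter"
    return "annual"
```

A record with no seasonality information thus defaults to the annual mean temperature category, as described above.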
In those cases where there exist multiple versions of a proxy record from the same site (e.g., the Torneträsk tree-ring record) the latest published version has been used. Whenever possible, preference was given to the highest resolution record available. If a tree-ring record exists both as a chronology of tree-ring widths and MXD we used the MXD record since this measure has stronger correlations to temperature (Briffa et al., 2002;D'Arrigo et al., 2009) and is generally reported as an integration of the whole growing season, whereas tree-ring width records primarily reflect conditions in the warmest months of the growing season (Tuovinen et al., 2009;Wilson et al., 2007).

A2 Centennial variability and normalization of proxy records
The proxies' observational sampling rates vary from annual to a minimum of two observations per century. Prior to fitting a 167-yr interpolative cubic smoothing spline, which has a frequency response equivalent to that of a 100-yr moving average, those proxies with other than annual resolution are converted to an annually resolved time-series using simple linear interpolation. Once annually resolved, the spline is fit to the interpolated data and every 25th spline value from the year 850 AD to 1950 AD is retained. These 45 spline values become a new time-series representing the average centennial temperature variability as expressed by the proxy. The 45 spline values are further normalized by their mean and standard deviation over a base period defined as the 11th to the 19th centuries (i.e., the mean and standard deviation of the 33 spline values at the time points 1050 AD, 1075 AD, …, 1850 AD). Twelve of the 45 centennial mean anomalies, those at the time points 850 AD, 950 AD, …, 1950 AD, representing the 9th to 20th centuries, are the centennial mean anomalies presented in the many maps throughout this experiment. Two examples that illustrate the pre-processing procedure are given in Figs. S1 and S2.

A3 Correlation decay length of centennial temperature variability and anisotropic search radii
In order to find an appropriate search distance for the spatial averaging, the sign test and producing maps of gridded anomalies, we need to consider the correlation decay structure of centennial temperatures and the spatial density of the available proxy dataset. [Fig. A5. Spatial gridding of centennial proxy anomalies on a 1° × 1° grid using a modified near-neighbour gridding algorithm. Capital R is the search radius from each grid node as computed by Eq. (A2), where lat is the latitude of the grid node. Lower-case x is the great-circle distance from the grid node to a proxy location. Provided there are 3 or more proxies within search distance R, the grid node value is computed as the weighted average of the proxies' centennial mean anomalies using weights defined by Eq. (A3).] The correlation of temperature variability at different locations on the Earth's surface typically decreases with increasing distance between locations. This correlation decay may be expressed as a negative exponential equation of the form

r = e^(-x/x_0)  (A1)

Here, r is the correlation between temperature variations at distinct locations, x is the distance between the locations, and x_0 is the characteristic correlation decay length (CDL). The rate at which the correlation decay takes place is dependent on the time scale of the variations; the correlation decays more slowly for longer than for shorter time-scales. The CDL also varies geographically and between seasons (Jones et al., 1997). For the current study it is useful to have some knowledge about the CDL of centennial temperature variability because this helps determine the size of geographic regions/areas within which climate can be assumed to behave similarly. If the proxies within a CDL-defined region contain a meaningful temperature signal, one would expect that, when a century was anomalously cold (or warm), the majority of within-area proxies would respond similarly.
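Under this negative-exponential decay, the expected correlation at any separation follows directly from the CDL; a one-line sketch:

```python
import math

def correlation_at(distance_km, cdl_km):
    # Negative-exponential correlation decay (Eq. A1): r = exp(-x / x0).
    # At x = x0 (one CDL), r = 1/e, i.e. approximately 0.37.
    return math.exp(-distance_km / cdl_km)
```

For a CDL of 2000 km, two sites 2000 km apart would thus be expected to correlate at about 0.37, the threshold used in the CDL definition above.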
Therefore, the mean temperature anomaly, calculated from all proxies within the CDL-region, should be a fair estimate of central tendency for that region.
The CDL-region should be small enough to ensure that the real centennial temperature (within-area) variability is preserved and large enough to capture a sufficiently large number of proxies for calculating meaningful areal averages. However, the regions should not be so large that spatial details of temperature variability across the hemisphere cannot be distinguished. Hence, the determination of the size of the region must be based on a judgment that takes into account both the spatial distribution and density of the available proxies and some knowledge about the correlation decay structure for centennial temperature variations.
Unfortunately, the CDL for centennial mean temperatures is not well known, as it cannot be estimated directly from the comparatively short instrumental record. Hence, climate model simulations are needed to help obtain some estimates. Jones et al. (1997) studied global patterns of the CDL from both an instrumental observational dataset and three climate model simulations at inter-annual and decadal time-scales. They also analysed the CDL on centennial time-scales from one model simulation. Their study reveals that the CDL for internal variability, seen in control simulations, is typically shorter than that seen for externally forced simulations and shorter than in the instrumental observations, which must be assumed to contain a certain amount of externally forced variability. Different climate models provide different CDL values. Hence it is not possible to uniquely determine the structure of the CDL for centennial temperature variations directly. However, Table 1 in Jones et al. (1997) suggests that the global mean value of the CDL for unforced decadal variability, based on the two models that apparently produced the most realistic results, is on the order of ∼2000 km for annual mean temperatures and ∼1500 km for summer temperatures (which is the season with the shortest CDL). Certainly, the global mean value of the CDL for real centennial temperatures must be longer. Table 5 in Jones et al. (1997) suggests that it could be on the order of ∼75 % longer than that for decadal temperatures. The CDL can also vary geographically and is typically longer at the equator than at the pole. An average CDL for centennial temperature variability of at least ∼1500 to ∼2000 km is supported by the findings of Wirtz et al. (2010), who studied spatial patterns from 124 globally distributed climate proxy archives for the Holocene.
To determine the size of regions within which centennial temperature variability can be expected to be rather strongly and positively correlated, one should choose regions where the distance between the centre and periphery, i.e., the search radius, is smaller than the CDL for centennial time-scales. Guided by the results in Jones et al. (1997), we conclude that a flexible search radius of 2000 km at the equator, allowed to decrease linearly with latitude to 1000 km at the pole, is small enough to ensure that the mean centennial temperature variability at the search-centre should be positively correlated with most locations within the search radius. Such an anisotropic search function can be expressed mathematically as

R = r_max - (r_max - r_min) · lat/90  (A2)

Here, R is the radius of a circle centred on any proxy or grid node at latitude lat in the NH, r_min is the radius of a circle centred on the North Pole, and r_max is the radius of a circle centred on the equator (Fig. A3). Such circles, with r_min = 1000 and r_max = 2000, are wide enough to capture a reasonably large number of proxies and small enough to ensure that large-scale spatial patterns in temperature variability can be distinguished, and are thus used in our sign tests, spatial averaging and production of maps of gridded values as described below (note that if Eq. (A2) is used in the Southern Hemisphere, the latitude lat must be given as its absolute (positive) value). The relative weight given to each proxy decreases from 1 at the grid node to e^(-2) ≈ 0.14 at the search periphery (Fig. A4), following the Gaussian weight function

weight = e^(-2(x/R)^2)  (A3)

Here, weight is the weight given to a proxy value located at distance x from a grid or proxy node and R is the radius of the search circle defined by Eq. (A2). The Gaussian weight function is chosen because the Gaussian filter is frequently used as a low-pass filter for noise suppression both in time-series analysis and image processing (Wessel and Smith, 1998).
In our notation, the quantity R/2 corresponds to what is usually referred to as the standard deviation or the scale. In our application the scale varies between 1000 km at the equator and 500 km at the pole. What is important is that the weights decay from large values at the grid node to small values at the search periphery. This is well achieved by Eq. (A3).
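The two functions above are simple enough to sketch directly. The following is a minimal illustration, not the authors' code; the function names are our own, and r_min and r_max default to the 1000 km and 2000 km values used in the paper. Note that at x = R the weight equals e^(−2) ≈ 0.14, matching the stated periphery value, and that R/2 plays the role of the Gaussian scale:

```python
import math

def search_radius_km(lat_deg, r_min=1000.0, r_max=2000.0):
    """Anisotropic search radius (Eq. A2): decreases linearly from
    r_max at the equator to r_min at the pole."""
    lat = abs(lat_deg)  # absolute latitude, as required in the Southern Hemisphere
    return r_max - (r_max - r_min) * lat / 90.0

def gaussian_weight(x_km, radius_km):
    """Gaussian weight (Eq. A3): 1 at the node, e^(-2) ~ 0.14 at the
    search periphery.  The scale (standard deviation) is R/2."""
    return math.exp(-2.0 * (x_km / radius_km) ** 2)
```

For example, a proxy at 45° N has a search radius of 1500 km, and a neighbour sitting exactly on the periphery of any search circle receives a relative weight of about 0.14.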

A4 Anisotropic spatial smoothing
The same search and weight functions are also used for calculating weighted means of neighbouring proxy anomalies, where an anisotropic search is centred over the location of each proxy as opposed to the nodes of a Cartesian coordinate system. A weighted mean of centennial mean anomalies for a proxy location is performed if there are two or more neighbouring proxies (within the search distance) and all proxies, including the centre proxy, possess a value for the century being considered. Thus, the minimum number of proxies contributing to any weighted-mean centennial anomaly is three. Using these criteria the maximum number of proxies contributing to a single weighted-mean centennial anomaly, given the length and spatial distribution of the data used in this experiment, is twenty (Fig. A6).
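The smoothing rule described above can be sketched as follows. This is an illustrative reconstruction under the stated criteria (at least two neighbours, so three contributing proxies in total), not the authors' implementation; the function name and input layout are our own:

```python
import math

def weighted_local_mean(center_anomaly, neighbours, radius_km):
    """Gaussian-weighted mean of centennial anomalies around one proxy.

    center_anomaly : anomaly at the central proxy (distance 0, weight 1).
    neighbours     : list of (distance_km, anomaly) pairs for proxies
                     already found within the search radius.
    Returns None unless there are at least two neighbours, so that the
    minimum number of contributing proxies is three (Appendix A4).
    """
    if len(neighbours) < 2:
        return None
    pairs = [(0.0, center_anomaly)] + list(neighbours)
    weights = [math.exp(-2.0 * (d / radius_km) ** 2) for d, _ in pairs]
    total = sum(w * a for w, (_, a) in zip(weights, pairs))
    return total / sum(weights)
```

A caller would first collect all neighbours inside the anisotropic radius for the proxy's latitude and then apply this function century by century, skipping centuries in which any of the local proxies lacks a value.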

A5 Anisotropic spatial gridding
The gridding of proxy data over a polar projection of the NH is done using a modification of the near-neighbour algorithm. We employ an anisotropic search radius (Eq. A2) to compute the values at each node of a 1° × 1° grid covering the hemisphere. Figure A5 illustrates the procedure: all centennial proxy anomaly values within the search radius from each grid node contribute to a weighted mean assigned to the node's location, provided there are two or more node-local proxies. The weights used are defined by Eq. (A3).

Appendix B
The sign test – a simple robust anomaly test

Figure B1 presents results of sign tests (Arbuthnott, 1710) showing the degree of spatial agreement, for each of the 12 centuries considered, of the signs of the anomalies among neighbouring proxies within an anisotropic search radius that decreases from 2000 km at the equator to 1000 km at the pole. The null hypothesis is that all the local proxy anomalies located within a given search circle, centred over each proxy, are equally likely to be positive as negative. If this hypothesis is true, then a strong majority in either direction is unlikely. Hence, when such a majority is observed, we reject the null hypothesis and conclude that the observed agreement between the proxy anomalies indicates the presence of a signal in this direction.
Using the 5 % significance level and a normal approximation, one finds that the number of agreeing anomalies needed is n/2 + √n or, to put it differently, that the number of disagreeing anomalies can be at most n/2 − √n. One of the assumptions underlying the sign test is that the observations are independent, which is difficult to verify in the present situation. For this reason the sign test should be viewed as a simple robust method for deciding which anomalies show reasonable agreement with their neighbours (Table B1).
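The decision rule above is easy to state in code. The sketch below applies the n/2 + √n criterion exactly as written in the text (the function names are our own, and anomalies of exactly zero are counted with the negatives here, a detail the text does not specify):

```python
import math

def max_disagreeing(n):
    """Maximum number of disagreeing anomalies, d, for n proxies to still
    pass the sign test under the normal approximation in Appendix B:
    agreement is significant when at least n/2 + sqrt(n) proxies share a
    sign, i.e. d <= n/2 - sqrt(n)."""
    return math.floor(n / 2 - math.sqrt(n))

def sign_test_passes(anomalies):
    """True if the anomalies agree in sign strongly enough to reject the
    null hypothesis that positive and negative anomalies are equally
    likely within the search circle."""
    n = len(anomalies)
    positive = sum(1 for a in anomalies if a > 0)
    return max(positive, n - positive) >= n / 2 + math.sqrt(n)
```

For instance, with n = 16 proxies in a search circle, at least 12 must agree in sign, i.e. at most 4 may disagree, for the test to pass.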
The regional sign tests strengthen the overall impressions from Fig. 3: the widespread agreement of positive anomalies in the 10th century and negative anomalies in the 17th century. However, in the 20th century there is notably less widespread agreement on the sign of the anomalies. In particular, proxies from land areas in and surrounding the North Atlantic region and western Asia do not agree that the last century, as a whole, was warmer than the 11th-19th century average. The lack of agreement on the sign in the 20th century does not necessarily mean that the proxies fail to capture the thermal state of the climate in the last century: it could be that the proxy values are sufficiently close to the mean over the nine-century-long baseline period for a substantial number of them to end up on either side of the baseline-period mean. However, not all proxy records that are used for the 20th century analysis have data that completely cover the last 15 yr (1985-1999). This period is known to have been warmer than the mean of the last century (Fig. C1). If all the proxy records had data up to the end of the last century, more widespread agreement of positive anomalies would be expected.

Table B1. The maximum number of permissible disagreeing proxies (d) for a given number of total proxies (n) found within a search radius to pass the sign test.

Appendix C

Spatial patterns of decadal mean temperatures in gridded instrumental observations
To obtain a visual comparison between the spatio-temporal patterns of NH centennial temperatures seen in the proxy data for the last twelve centuries and the instrumentally observed NH temperatures we plot, in a similar manner as in Fig. 2, the decadal means of the 5° × 5° grid box temperatures from the HadCRUT3 dataset (Brohan et al., 2006). Figure C1 shows, for each of the last twelve decades, the temperature anomalies (in °C) for each grid box expressed as deviations from the 1900-1999 mean. Grid boxes located over ocean areas are masked for the sake of comparison. The decadal deviation is calculated and plotted wherever a grid box has 80 % or more monthly data in the period 1900-1999 and 80 % or more monthly data in the decade in question.

Fig. C1. Maps of decadal mean temperature anomalies (in °C) from the 1900-1999 mean, for all NH land grid boxes in the HadCRUT3 dataset (Brohan et al., 2006) having at least 80 % complete monthly data. The labels 1890s and 1900s, etc., denote the means for the periods 1890-1899 and 1900-1909, etc. Grid boxes over ocean areas are masked.
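The completeness criterion for a single grid box can be sketched as follows. This is an illustrative reconstruction of the stated 80 % rule, not code from the paper; the function name and the dictionary-based data layout are our own:

```python
def decadal_anomaly(monthly, decade_start, base_start=1900, base_end=2000,
                    min_frac=0.8):
    """Decadal mean anomaly for one grid box relative to the base-period mean.

    monthly      : dict mapping (year, month) -> temperature; missing months
                   are simply absent from the dict.
    decade_start : first year of the decade (e.g. 1990 for the 1990s).
    Returns the anomaly only if both the decade and the base period have at
    least min_frac (80 %) of their monthly values present; otherwise None.
    """
    def period_mean(y0, y1):
        vals = [monthly[(y, m)] for y in range(y0, y1)
                for m in range(1, 13) if (y, m) in monthly]
        needed = min_frac * 12 * (y1 - y0)
        return sum(vals) / len(vals) if len(vals) >= needed else None

    dec = period_mean(decade_start, decade_start + 10)
    base = period_mean(base_start, base_end)
    if dec is None or base is None:
        return None
    return dec - base
```

A grid box failing either completeness test is left blank (masked) in the corresponding map panel.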
A widespread NH warming since the late 19th century is clearly illustrated in the maps. The regions with sufficient data show that the 1890s to 1910s were colder than the 20th century mean and that the 1990s were the warmest decade in the last century. The first decade of the 21st century was more than 1 °C above the 20th century mean. At a few locations, in southern Greenland and North America, temperatures in the last decade were colder than the century mean. A well-documented early warm period is seen in the 1930s and 1940s (Callendar, 1938; Delworth and Knutson, 2000; LiJuan et al., 2007; Tett et al., 2002), but the warmth in that period was not as geographically widespread as the post-1990 warmth (Brohan et al., 2006). The last decade (2000-2009) was the warmest observed decade in the NH land areas and also the decade with the most widespread warmth.
In Fig. C1 the spatial coherence of the instrumental decadal temperatures is clearly stronger than that of the proxy-based centennial temperature anomalies in Fig. 2. Because spatial coherence is expected to increase with increasing time-scales, this comparison reveals that the proxy series contain a substantial amount of noise, which motivates the use of spatial averaging of proxy anomalies. Figure C1 also shows that the areas with poor coverage of instrumental temperature observations are often the same areas as those where proxy data are lacking. Consequently, even if new proxy series are retrieved from areas currently devoid of proxy information, it will still be difficult to calibrate them.