Detecting instabilities in tree-ring proxy calibration

Evidence has been found for reduced sensitivity of tree growth to temperature in a number of forests at high northern latitudes and alpine locations. Furthermore, at some of these sites, emergent subpopulations of trees show negative growth trends with rising temperature. These findings are typically referred to as the “Divergence Problem” (DP). Given the high relevance of paleoclimatic reconstructions for policy-related studies, it is important for dendrochronologists to address this issue of potential model uncertainties associated with the DP. Here we address this issue by proposing a calibration technique, termed “stochastic response function” (SRF), which allows the presence or absence of any instabilities in growth response of trees (or any other climate proxy) to their calibration target to be visualized and detected. Since this framework estimates confidence limits and subsequently provides statistical significance tests, the approach is also very well suited for proxy screening prior to the generation of a climate-reconstruction network. Two examples of tree growth/climate relationships are provided, one from the North American Arctic treeline and the other from the upper treeline in the European Alps. Instabilities were found to be present where stabilities were reported in the literature, and vice versa, stabilities were found where instabilities were reported. We advise to apply SRFs in future proxy-screening schemes, next to the use of correlations and RE/CE statistics. It will improve the strength of reconstruction hindcasts. Correspondence to: H. Visser (hans.visser@pbl.nl)


Introduction
Evidence for reduced sensitivity of tree growth to temperature has been reported for several forest sites along highnorthern latitudes and from some alpine locations. This phenomenon appears to reflect the inability of certain tree-ring width and maximum latewood density chronologies from initially temperature-limited sites to track the warming trends seen in instrumental measurements from some northern locations since around the mid-20th century. A related phenomenon is that some formerly temperature sensitive trees may be losing their ability to reflect high-frequency climate signals in some boreal and alpine forests (e.g., Wilmking et al., 2005;Pisaric et al., 2007;Zhang et al., 2009;Büntgen et al., 2006Büntgen et al., , 2008Büntgen et al., , 2009). These observations have been described as the "Divergence Problem" (DP), which has been discussed in a number of recently published articles (e.g., Jansen et al., 2007;Wilson et al., 2007;D'Arrigo et al., 2008;Wilmking and Singh, 2008;Büntgen et al., 2009;Esper and Frank, 2009;Loehle, 2009). DP has not only led to discussions within the scientific community, but recently also to considerable controversy within the public arena (the "CRU affair", "climategate", the "trick"). See Schiermeier (2010, p. 286-287) for a summary.
If the DP is in fact a real and widespread phenomenon, co-occurring with global warming as well as anthropogenicinduced changes of atmospheric composition, it would potentially put into question the overall ability of tree ringbased temperature reconstructions to capture earlier periods of putative warmth, such as the so-called Medieval Warm Period (MWP), and subsequently to model possible future reactions of forest ecosystems in a warming world. Clearly, if some proxies show unstable relationships with a particular target variable, the validity of the large-scale reconstructions Published by Copernicus Publications on behalf of the European Geosciences Union.

368
H. Visser et al.: Detecting instabilities in tree-ring proxy calibration may be compromised. Error may be magnified for reconstructions representing smaller spatial scales.
In both cases, the Uniformitarian principle -"the present is the key to the past" -would no longer be valid. This principle states that physical and biological processes which link today's environment with today's variations in tree growth, must have been in operation in the past (Fritts, 1976, p. 14-15). Thus, if certain tree-ring proxies exhibit non-stable relations to climate in recent decades, we could expect such relations during the MWP as well, thus conforming with the Uniformitarian principle.
In general, given the high relevance of climate reconstructions for policy-related studies (e.g., National Research Council, 2006;Jansen et al., 2007), it is important for paleoclimatologists to address this issue of potential model uncertainties associated with the DP (Visser and Petersen, 2009;Schiermeier, 2010). Here, we address this issue of uncertainties and non-stable relations by proposing a calibration technique, termed stochastic response functions (SRFs), that can be readily used to trace instabilities in proxy/climate relationships. Since SRFs are one means of detecting and visualizing such instabilities, they are also potentially useful as a screening tool for judging the acceptability of particular proxies in a given reconstruction-network approach. While our focus is on the tree-ring proxy, the SRF method could be applied to the analysis of other proxies as well, such as documentary archives (Dobrovolný et al., 2010) or sediment cores (Oppo et al., 2009).
First, we describe the model underlying SRFs along with specific tests for residuals and model assumptions, with statistical details given in Appendix A. Second, the utility of SRFs as a screening tool for individual proxies is assessed. Third, we apply the SRF approach to two recent dendroclimatic examples in northern interior Canada (D'Arrigo et al., 2009), and the European Alps (Büntgen et al., 2008). Our discussion focuses on (i) the appropriateness of truncation of calibration periods in order to omit the period of instability, and (ii) potential implications for use of SRFs as a screening tool.

Stochastic response functions
Calibration and validation methodologies have been well described within the field of dendroclimatology (Fritts, 1976;Cook and Kairiukstis, 1990). We propose a modification of the traditional calibration approach (Appendix, model (4)). This modification is used to generate an explicit visualization of model instability. It is based on the application of a sub-model from the class of so-called structural time series models (STMs) and denoted hereafter as the stochastic response function model: Here, the "intercept" µ, which is traditionally a constant, is replaced by a slowly bending trend model µ t , the integrated random walk (IRW) model. The constant response weight α has been replaced by a stochastic counterpart α t , based on random walk models for individual climate variables. Both trend and response weights are estimated using the discrete Kalman filter (Harvey, 1989;Durbin and Koopman, 2001). The filter is ideal in the sense that it yields the minimum mean square error estimates (MMSE, normally distributed noise processes) for µ t and α t along with maximum likelihood estimates for unknown noise variances. One of the attractive properties of model (1) is that the traditional multiple regression model is just a special case (models (A2a) and (A2b) in Appendix A). By using maximum likelihood (ML) estimation we also let the model "choose" between being constant or time-varying in nature. Mathematical details of model (1) are given in the Appendix. Applications of SRFs are not new in dendroclimatology (Visser, 1986;Molenaar, 1988, 1992;Van Deusen, 1990;van den Brakel and Visser, 1996). However, the application of STMs for reconstruction studies has been quite limited. Only one other sub-model from STMs has recently been applied in dendroclimatology, using PRECON software (Cook and D'Arrigo, 2002;Rozas, 2005;D'Arrigo et al., 2009;Wilson et al., 2010). Here, Wilson et al. show an interesting application of STMs other than proxy calibration. They explored the coherence between four ENSO reconstructions (period 1550-2000, cf. their Fig. 6).
The STM applied in latter four references is equivalent to model (1), but with the omission of the trend component µ t . One concern regarding this "model without intercept" is that the actual presence of an intercept in certain sub-periods of the calibration period will force response estimatesα to show time-dependent behavior. Therefore, the "model without intercept" is best avoided.

Discussion of model (1)
Four main points should be considered when using the model (1) for the analysis of time-series in palaeoclimatology: First, two other methods in dendroclimatology need to be mentioned which address the issue of model (in)stabilities, in addition to the SRF model. The first method is that of moving response functions (MRFs) (Biondi and Waikul, 2004;Oberhuber et al., 2008). Here, the model (A1) is applied to a number of sub-periods of the calibration period. The length of these sub-periods is equal and fixed, and called the "window". In fact, MRFs can be seen as a special case of SRFs where the IRW trend model µ t is replaced by a random walk (see Appendix). The second method is that of moving correlations, which also utilizes a window (Biondi and Waikul, 2004;Carrer et al., 2007;Wilmking and Myers Smith, 2008;Zhang et al., 2009). The SRF counterpart is the use of model (1) with only one explanatory variable X t .
While the moving correlations method is attractive in its simplicity, care should be taken when tree growth and climate variables interact. For example, if one has estimated a model I t = µ + α 1 · X 1,t +ε t , then the value of α may change considerably if the model is extended with some variable X 2,t due to the correlation between X 1,t and X 2,t ("multicollinearity"). We note that there exist different opinions on the use of correlations versus response functions (Blasing et al., 1984).
Second, model (1) assumes a linear relationship between the proxy and the climate target. For the application of nonlinear models, e.g. those based on neural networks, we refer to Woodhouse (1999) and Carrer and Urbinati (2001). Another recently described method is based on visual inspection, where both I t and X t are scaled to zero mean and unit variance over some predefined period and then plotted and analyzed (e.g., Esper et al., 2010). Loehle (2009) discusses the special case where the correspondence between I t and X t is non-linear, either within the calibration period or outside of it. It is this latter case that we address below. Third, in model (1) the proxy is the dependent variable and climate variables serve as independent variables. After having estimated the response function and having selected the relevant independent variable(s), a linear transfer function model can be estimated to generate the actual reconstruction. The selected climate variables then serve as dependent variables and the chronology I t (or a group of tree-ring chronologies I t or any other set of proxies) serves as the independent variable. The weights α may be scalar, a vector or a matrix (Cook et al., 1994).
Fourth, the climate variables X t in model (1) are assumed to be local instrumental (station) data. Some reconstruction studies base their calibration inferences on climate temperatures deduced from globally gridded temperature or precipitation data, downscaled to the locations of interest, as in Mann et al. (2008, SI). Since such climate variables X t may contain considerable uncertainties, the technique of errorsin-variable (EIV) regression is applied (also denoted as total least square regression), as in Hegerl et al. (2007) and Ammann et al. (2010). This situation is not covered by model (1) since uncertainty is only assumed in I t .
We note that gridded target variables are model-based constructs, and form uncertain substitutes for actual measurements. Hindcasts in the reconstruction period are thus predictions for this constructed target and can significantly deviate from the real historic values of this target at the location of the proxy (this is also true for station data as the proxies are typically at remote locations, often hundreds of kilometers from the closest station). Therefore, such target hindcasts should be interpreted with care.

Diagnostic checks, climate envelope
To test for the validity of model (1), a number of diagnostic checks can be used: homoscedasticity of residuals.
tests on residuals for structural breaks, based on the cumulative sum (CUSUM) of residuals.
scatterplots of residuals against individual explanatory variables. If some non-linear relationship exists between the proxy and X t variables, a systematic deviation from a horizontal straight line will be seen.
For more details the reader is referred to Harvey (1989, Ch. 5.4) and Van Deusen (1990). The latter mentioned scatterplots of residuals are used for checking the assumption of a linear calibration relationship rather than a non-linear relationship of the form The importance of this test has been stressed by Loehle (2009) within the context of dendroclimatic reconstructions.
In addition to evaluation of scatterplots, we propose a second check for linearity, in which model (1) can be estimated with the extension of a quadratic term: Coefficientsα t andβ t can be tested for statistical significance. The quadratic model can be useful for describing the concept of a climatic threshold or tipping point, as discussed in Wilmking et al. (2004) andD'Arrigo et al. (2004). The idea of modeling thresholds is based on the well-known law of the minimum, initially formulated by Carl Sprengel in the 19th century. This law states that linearity may exist over "normal" values of X t , but may level off (or even become reversed in sign) when values of X t become extreme.
As long as such non-linearities exist within the calibration period, tests will show them to be present. However, as suggested by Loehle (2009), non-linearities may also appear in extreme cases where explanatory variables X t attain values not found during the calibration period. If this is the case, reconstructions will be less accurate during such times. Therefore, it may be advantageous to identify a climate envelope [X min , X max ] from the calibration data and discuss the occurrence of X t hindcasts which fall outside this envelope (cf. Fritts, 1976 p. 15). We will return to this point in Sect. 4.1.

Proxy screening
Having estimated µ t and α t , their patterns over time can be used for screening purposes. There are three ways to draw conclusions on model (in)stabilities. The first is simply a visual inspection of the graphs of µ t and α t over time. The second method is of a statistical nature and uses estimates of trend differences [µ t − µ s ] and response weight differences [α t − α s ], along with their corresponding (2-σ ) confidence limits. These confidence limits follow from the Kalman filter equations (these limits are in fact a unique "selling point" for this filter). Here, the symbols µ and α come from model (1) and the indices "t" and "s" refer to two arbitrary years within the calibration period.
A third method is the Likelihood Ratio (LR) test statistic, which is based on maximized and non-maximized noise variances as proposed by Harvey (1989, p. 236). Here, a χ 1 distribution is used with a 2-α significance level rather than a 1-α significance level for a test of size α (the χ 1 threshold is 2.7 for a test with α=0.05). Note that the symbol α used here should not be confused by its use in model (1).
A screening procedure for individual proxies could then be formulated as follows: 1. plot both µ t and α t patterns over time along with 2-σ confidence limits.
3. evaluate these patterns for their biological relevance.
4. as a further confirmation of the constancy of the patterns, one can use the LR test, which measures whether noise variances are zero (H 0 hypothesis) or positive (H 1 hypothesis).

D'Arrigo et al. (2009) presented white spruce (Picea glauca) data for two latitudinal treeline sites in northern interior
Canada: one along the Coppermine River in the Northwest Territories and the other in the Thelon River Sanctuary, Nunavut. Both ring-width (TRW) and density (MXD) chronologies were generated for these two sites. Individual tree measurements were detrended using negative exponential or straight-line curve fits. Local meteorological data were obtained from the closest adjacent stations, for Coppermine over the period 1933-2003 (Coppermine station), and for the Thelon site (Baker Lake station) over the period 1950-2002. These GHCN data underwent rigorous quality assurance reviews. The calibration/validation technique applied was that of splitting the calibration period in two parts and calculating/comparing correlation coefficients over both time periods (their Figs. 4 and 5). Target variables were summer temperatures for various seasons (JJ, JA and AMJJA). We note D'Arrigo et al. did not explicitly generate climatic reconstructions in their study We have applied the SRF approach to the TRW and MXD chronologies for the Coppermine and Thelon sites. Our results indicated that two of the four chronologies showed stable responses using the SRF method: the TRW series for Coppermine and the MXD series for the Thelon (Table 1). Bothα t andμ t appear to be constant. Note that the series are different with regards to level of explained variance: 27% and 39%, and with regards to length of the calibration period: 1933-2003 versus 1950-2002. The explained variance as used herein is defined as In other words, the explained variance is a measure for the explanatory "power" of adding X t to model (1). Trend and response values are based on the smoothed Kalman filter estimates.
We checked the residuals, or innovations in Kalman-filter terms, of the four models for their statistical properties. These innovations should follow a white noise process, i.e. no serial correlations, and should preferable follow a normal distribution (Appendix A). The whiteness results were satisfactory for all four models (based on autocorrelation functions with lags up to 20 years and a log-plot for visual inspection of first-order correlations). Normality was perfect for second and third model in Table 1, reasonable for the fourth model and moderate for the first model (based on visual inspection of normality plots). Our overall judgment of the four innovation series is that they satisfy the necessary condition of whiteness and reasonably satisfy the (not necessary) condition of normality.
As an example we have plotted the MXD estimates for Coppermine in Fig. 1. The trendμ t shows a clearly timedependent behavior (green line in upper panel). The lower panel shows that the trend estimate in the final year 2003 is significantly higher than for the period 1957-1995 (2-σ confidence limits). The response weightα t is time-varying (significance tested by LR test) and shows a steady decrease from ∼0.90 in 1933 to ∼0.50 in 2003 1 . A second example is given in Fig. 2 where the Thelon TRW chronology is analysed. Model estimates show a constant weighing factor (α=0.47±0.22), along with a statistical significant decreasing linear trend. Büntgen et al. (2008) presented a network of 124 larch (Larix decidua Mill.) and spruce (Picea abies Karst.) TRW chronologies across the European Alpine arc. Two Alpine 1 We make a short note here for the interpretation of trend and weight differences, as shown in Fig. 1. Looking at the middle panel one would judge that changes [α t − α s ] will be statistically nonsignificant: the upper bound in 2003 does not exceed the lower bound in the year 1933. The same holds for the trend pattern shown in the upper panel: the changes seem small, relative to the high variability. However, the lower panel shows that trend differences [µ 2003 −µ t ] are significant over the period 1957-1995. There seems to be a contradiction here.

Tree growth in the European Alps
The explanation is as follows. The variance of the difference of any two stochastic variables X and Y is only equal to the sum of the respective variances if X and Y are mutually uncorrelated. If not: var(X − Y )=var(X)+var(Y )−2·cov(X,Y ). In other words, suppose that the variables X and Y show a small difference while their respective standard deviations are large. Then, if X and Y are highly correlated, their difference [X − Y ] may still be statistically significant due to the covariance term. In the extreme case of 100% correlation we have that var(X−Y )=0 and any difference, however small, will be significant. Clearly, weights α t and α s are highly correlated, and the same holds for any trend values µ t and µ s . mean chronologies of 40 larch and 24 spruce sites were selected based on their correlation with early  instrumental temperatures to assess their ability of tracking recent  summer warming. The larch and spruce TRW datasets were standardized in two ways: using 300 yr cubic smoothing splines and using the Regional Curve Standardization (RCS) approach (e.g. Esper et al., 2002). Thus, four index chronology proxies are obtained. Meteorological data consisted of a homogenized long-term (1864-2003) mean of 13 instrumental stations, located >1500 m a.s.l. (Auer et al., 2007). The calibration/validation technique was based on regression and scaling models where the calibration period was split up in two parts. For these periods the R 2 , RE, CE and DW statistics were calculated, using the June-July temperature as the target variable (see Appendix for explanation). Büntgen et al. used the RCS-standardized larch series to reconstruct June-July temperatures up to the year 1000 AD.
We have applied the SRF model to all four chronologies to trace instabilities in their response to climate, where the results for larch are identical since both standardization techniques yielded index chronologies with only marginal differences ( Table 2). The explained variance is quite high in all four cases, ranging from 43% to 50% over almost 150 years. The SRF method, however, indicates that none of the four chronologies has both a stableα t and a stableμ t estimate when using the full calibration period 1864-2003.
We checked the innovation series of the four models presented in Table 2. The whiteness results were satisfactory for the first and second model in the table. The third and fourth model showed a small but significant first-order correlation (R=0.22 and 0.25, resp.). Normality was perfect for the fourth model in Table 1, reasonable for the first and second model, and moderate for the third model. Our overall judgment of the four innovation series is that they reasonably satisfy the necessary condition of whiteness and reasonably satisfy the (not necessary) condition of normality.
The first case from Table 2, that of Larch, detrended by RCS, is given in Fig. 3. The trend estimate (green line, upper panel) shows non-stationary behavior at the record's end, from 1970-2003. The lower panel shows that the trend estimate in the final year (µ 2003 ) is significant larger than those for 1864-1917 and 1937-1995 (2-σ confidence limits). The response weightα t has a slightly decreasing pattern, which the LR test showed to be non-significant.

Discussion
We discuss two topics in this section: (i) the omission of data (truncation) over the final decades of a chronology, one method which has been used to deal with divergencetype phenomena; and (ii) potential implications of SRFs for screening procedures of proxy data.

Omission of data over recent decades
One solution for resolving instabilities towards the end of a calibration period is simply to omit all data after the year in which these instabilities emerge. If validation statistics show a stable response and intercept over the remaining period, a proxy might still be used for reconstruction purposes (e.g., Briffa et al., 2001;Cook et al., 2004;Wilson and Elling, 2004;Mann et al., 2008, SI). Esper et al. (2010) performed a sensitivity study in which the role of the calibration period was explicitly taken into account (amongst other factors such as standardization technique and instrumental data corrections). However, their study deviates from our approach herein in that no regression technique was applied. One potential application of the SRF method in this context is that we can accurately follow the patterns ofα i,t and µ t to determine if it might be advantageous to confine the response analysis to a shorter calibration period. As an illustration, we re-estimated the MXD series for Coppermine, shown in Fig. 1, choosing the shorter calibration period . The results reveal a more variable response weight than shown in the middle panel of Fig. 1. Thus, the omission of recent data does not improve the stability of the climate-tree growth model in this case. We also reestimated the larch example shown in Fig. 3. Theα t andμ t patterns suggest that a re-estimation over the period 1864-1950 yield values that are in fact stable. The estimation results are shown in Fig. 4. There appears to be a small linear trend, which is statistically non-significant. The response weight appears to be constant:α=0.70±0.15. Therefore, this re-analysis shows that the RCS-detrended larch chronology shows a stable response over a shorter calibration period. Taken together, the truncation of calibration periods seems to be a good means of retaining proxies which would not otherwise pass a screening procedure. However, one could argue that if a loss of proxy-target sensitivity occurs in recent decades, it could also have occurred in the past. A notable example is during the MWP, when temperatures have been reconstructed for some regions to be comparable to those of the 20th century (Jones et al., 2009). This argument is in line with the Uniformitarian principle as formulated in the Introduction. The principle implies that the same kinds of limiting conditions affected the same kinds of processes in the same ways in the past as in the present; only the frequencies, intensities, and localities of the limiting conditions affecting growth may have changed (Fritts, 1976). Loehle (2009, p. 241) comes to a similar conclusion, using a mathematical approach: "if a reconstruction already shows divergence, it is an indication that recent temperature are already in the non-linear zone; such reconstructions should not be used for evaluating past climates." Loehle also suggests an argument in favor of omitting data over recent decades. If we could address the cause of recent instabilities to an external factor, only recently in operation (air pollution, soil acidification, etc.), we could truncate the calibration period. However, this introduces a new problem: how could we uniquely attribute instabilities to such (anthropogenic) drivers? E.g., in case of the example shown in Figs. 3 and 4 we do not have such a unique clue.
In conclusion, we feel that the omission of data over recent decades is not a sufficient means to accept a specific chronology for use in a reconstruction network.

SRFs and screening of proxy data
Since SRFs can potentially be used to detect pattern (in)stabilities, along with statistical significance testing, it is interesting to re-evaluate the two examples from the preceding Section. Certainly, any two calibration/validation approaches can show different results for specific proxies. For instance, one may split a calibration period into two parts of equal size and calculate correlation coefficients between the 26 1 Fig. 4 Larch data, standardized by RCS, 1864RCS, -1950 In this example we re-estimated the results 2 shown in Fig. 3. The response weight appears to be constant: α = 0.70 ± 0.15. The explained variance 3 is 55%. The slightly increasing linear trend is statistically non-significant.  , 1864-1950. In this example we re-estimated the results shown in Fig. 3. The response weight appears to be constant:α=0.70±0.15. The explained variance is 55%. The slightly increasing linear trend is statistically nonsignificant.
proxy and target variable over both periods. Supposing that both R 2 values are considered reasonably strong, say above 0.44 (as in Table 1 of Büntgen et al., 2009, for larch and splines) one might conclude that the results are satisfactory. In another approach, CE/RE values are calculated over the same two periods. Negative CE values might cause one to reject the proxy. This marked difference is explained from the fact that well-fitting models, leading to high R 2 values, might have a poor prediction performance on independent data (cf. National Research Council, 2006 - Fig. 9-3). This methodological uncertainty is real since both methods ( Tables 1 and 2? D'Arrigo et al. use a split calibration period and calculate correlation coefficients over both periods. In three out of four cases, a strong decrease in correlations in the most recent period occurs. Their conclusion is that the recent loss of temperature sensitivity varies with tree parameter (TRW or MXD) and site studied. This sensitivity, as well as uncertainties in tree-ring data standardization (Esper et al., 2009) and possibly in instrumental data, needs to be addressed in future efforts, as they state. Our findings, summarized in Table 1, show slightly different results. Two out of four chronologies show constant behavior, both forα t andμ t (MXD Thelon and TRW Coppermine). Thus, these two chronologies pass the SRF screening test. However, three additional remarks have to be made. First, corresponding explained variances are moderate to low: 39% and 27%, respectively. Second, the calibration periods are relatively short (53 years for Thelon, 71 years for Coppermine). Third, non-homogeneities in the instrumental data may play a role, despite the GHCN tests. Therefore, we come to the same conclusion as D' Arrigo et al. (2009) that some of these series have potential losses in sensitivity to climate.
Büntgen et al. split the calibration period in two and calculated R 2 values as well as RE and CE values for each of the four chronologies (two standardization techniques and two tree species). They find CE values around zero for both spline-standardized series, a finding which is consistent with the findings presented herein in Table 2. Thus, the spline detrended series do not perform well in either approach. For the RCS standardized series, much better CE values are found, between 0.25 and 0.29, guaranteeing prediction skill. Squared correlation coefficients over sub-periods vary between 0.18 and 0.55. Büntgen et al. conclude that there is no indication of a DP for either species after scaling the RCS chronologies, although larch generally tracks summer temperature better than spruce (lower R 2 values). Our findings here show time-varying behavior for the RCS chronologies as well. After shortening the calibration period (data after 1950 omitted) we find stable responses for the RCS-standardized larch chronology. Unfortunately, this shortening does not guarantee stability in the past, as discussed in Sect. 4.1.
Clearly, these two examples, covering three screening techniques, are not sufficient for a full evaluation of methods. Further, any comparison of any three approaches will reveal results which differ in some detail. With that respect the analysis shown here is certainly incomplete and more research is needed.

Conclusions
Peer-reviewed literature has described unstable relationships between tree-ring proxies and climate target variables for recent decades at some northern and higher elevation sites, often denoted by the term "divergence". Within this context we have drawn attention to a statistical approach which is ideally suited for tracing instabilities in proxy-target calibration, that of stochastic response functions. This method is also well suited for performing proxy data screening prior to network aggregation. Our inferences in this study have mainly focused on methods of calibration and screening within the field of dendroclimatology. However, the proposed technique is potentially applicable to the calibration of other proxies besides tree rings. Furthermore, the (non-)equivalence of independent reconstructions can be explored as shown by Wilson et al. (2010).
We draw the following conclusions: 1. Stochastic response functions are useful for providing insight when instabilities are expected to occur in generation of climatic reconstructions. Changes in response and trend can be readily observed from year to year.
No specific time window is needed, in contrast with the MRF or moving correlations approach. Furthermore, the use of the Kalman filter allows us to test for timedependent changes. It should be noted that the time stability of the intercept and response weights are of equal importance (in the literature the (in)stability of the first term is never mentioned explicitly).
2. It is advisable to define a climate envelope and determine whether hindcasts fall outside this envelope over the length of the climatic reconstruction. Tests for linearity are also important.
3. Stochastic response functions are ideally suited to localize instabilities over time. However, if these instabilities occur in recent decades ("divergence") and if the cause of these instabilities can not be traced/attributed to drivers which are only in operation during these recent decades, the omission of recent decades in the calibration period is not a valid means of generating an unbiased reconstruction network.

4.
Two examples have been discussed as illustrations of the potential application of stochastic response functions to climatic reconstructions. For both examples, we find screening results that are only partly comparable to those found using other methods of validation (R 2 , RE, CE). It is unclear if the stochastic response methodology would filter out more proxies than these traditional methods. Clearly, much more analysis is necessary to evaluate the various screening methods.
5. Given the policy relevance of paleoclimatic reconstructions, dendrochronologists are especially advised to be more explicit about dealing with uncertainty in proxy screening.

Response function models in detail
Calibration of a tree growth/climate model is typically based on a linear regression relationship between a proxy I t and some climate variable(s) X t : Model (A1) can be termed a response function model, where the vector α with constant regression weights is the response function. The parameter µ is the intercept and is often omitted from the model as both I t and X t are normalized to zero mean and unit variance. If the relationship between I t and X t are multiplicative rather than linear, a log transformation can be applied. The vector X t may consist of the original climate variables or principal components derived from these variables. The reader is referred to Cook and Kairiukstis (1990) and Hughes (2002) for more details.
The suitability of model (A1) is typically tested by splitting the calibration period in two or three sub-periods of generally equal length. Model (A1) is then calibrated for one sub-period and used to estimate I t values over the other subperiod(s). Finally, model performance is assessed using several statistics typically used in dendroclimatology: R, RE, CE and DW. Here, R stands for the well-known correlation coefficient. RE stands for the Reduction of Error statistic and is a measure for the prediction performance for the calibration period chosen. CE stands for the Coefficient of Efficiency statistic and measures the prediction performance for a particular validation period. For this reason the CE statistic is more stringent than the RE statistic. Please refer to National Research Council (2006, p. 92-95 for strengths and weaknesses of these statistics). DW stands for the Durbin Watson test, which tests for lag-1 autocorrelation in residuals of the calibration model. See Fritts (1976), Cook and Kairiukstis (1990), Cook et al. (1994) andNational Research Council (2006) for details. Bürger and Cubasch (2007) use the RE and CE statistics in combination with resampling schemes.
Stochastic response functions can be modeled by choosing a sub-model from the class of structural time series models (STMs), in combination with the discrete Kalman filter (Harvey, 1989;Visser and Molenaar, 1995;Durbin and Koopman, 2001). These models allow the estimation of a regression model with time-varying coefficients: Here, I t stands for a climate proxy, such as a standardized ring-width chronology, or the logarithm thereof (depending on the additivity resp. multiplicativity assumptions in the growth model). The variables X i,t may stand for standardized climate variables or PCs thereof. The coefficients α 1,t , ....., α m,t together form the stochastic response function and ε t is a white noise process with variance σ 2 ε . A timedependent response is gained by defining m random walk models: α i,t =a i,t−1 +η i,t , i=1, ...., m. Here, η i,t stands for a noise process with zero mean and variance σ 2 η,i . These variances may be found by applying maximum likelihood optimization. For the trend component µ t , resembling the age-related trend plus external factors, the integrated random walk trend model (IRW model) is taken (Visser, 2004). This model reads as µ t − 2 · µ t + µ t = η m+1,t , with η m+1,t a white noise process with variance σ 2 η,m+1 . Dendroclimatic consequences for chronology building and the selection of variables in the context of time-varying coefficients are described in Visser and Molenaar (1988), and van den Brakel and Visser (1996).
The discrete Kalman filter is an optimal filter in that it yields the minimum mean square error estimates (MMSE) for µ t and α t along with maximum likelihood estimation for unknown noise variances. This result holds if all noise processes are normally distributed. If noise processes involved do not follow a normal distribution, the Kalman filter still yields the minimum mean square linear estimators (MM-SLE) for µ t and α t . For details please refer to Harvey (1989, p. 111). All estimates shown in this article are based on smoothed estimates using a fixed-interval smoother.
As for initial values of noise variances we have chosen the approach of a so-called diffuse or non-informative prior. This means that we set the initial covariance matrix to the unity matrix with large numbers on the main diagonal. Thus, we simply "tell" the filter that we have no information whatsoever at the first iteration. See Harvey (1989, p. 121) for details. The consequence of this approach is that the filter needs some iterations to arrive at stable state-space estimates. For the models presented in Tables 1 and 2 of this article we have chosen for a transient period of 20 years. The consequence of a "diffuse prior" is that the innovations series starts after these 20 years. In the process of ML optimization the transient period is excluded. See Harvey (1989, p. 256) for details. We note that this transient period is not seen in Figs. 1 through 4. The reason is simply that these graphs do not show the filtered estimates for µ t and α t , but the smoothed estimates.
The IRW trend model reduces to the well-known OLS fit of a straight line if σ 2 η,m+1 is set to zero. And constant responses are found if the variances σ 2 η,i are set to zero. Thus, if all noise variances in model (A2a) are set to zero, this model reduces to I t =α 0 +β · t+α 1 · X 1,t +α 2 · X 2,t +.....+α m · X m,t +ε t (A2b) For stationary data the term β will be zero and Eq. (A2b) equals the well-known multiple regression model.
The SRF model has similarities to the moving response functions (MRFs) (Biondi and Waikul, 2004) if the trend component is set to a random walk, rather than the IRW trend model. However, the advantage of SRFs over MRFs is that no window has to be chosen and, thus, that estimates are found for the full sample period. Furthermore, within the framework of SRFs the selection of variables can be performed over the whole calibration period (Visser and Molenaar, 1988), while MRFs have to do that for a much shorter window, in the order of 50 years of length. This selection of variables may become problematic if the number of explanatory variables becomes large (>24).
Commercial software for estimating model (A2a) is available from Timberlake LTD. They supply the STAMP package which is able to estimate a wide range of STMs, including the case where I t is multivariate rather than univariate. A non-commercial package is TrendSpotter and is freely available from the first author. It should be noted that STAMP does not supply estimates for trend differences [µ t − µ s ] or response weight differences [α t − α s ], based on De Jong and Mackinnon (1988). In the present version of TrendSpotter uncertainty estimates are supplied for [µ t − µ s ] (Visser, 2004). However, uncertainties for [α t − α s ] are not supplied. This would be an important extension of the software in the context of climate reconstruction research.