Correcting for signal attenuation from noise: sharpening the focus on past climate

Technical Note: Correcting for signal attenuation from noise: sharpening the focus on past climate

C. M. Ammann, M. G. Genton, and B. Li

National Center for Atmospheric Research, 1850 Table Mesa Drive, Boulder, CO 80307-3000, USA
Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA
Department of Statistics, Purdue University, West Lafayette, IN 47907, USA

Received: 29 May 2009 – Accepted: 3 June 2009 – Published: 16 June 2009
Correspondence to: C. M. Ammann (ammann@ucar.edu)
Published by Copernicus Publications on behalf of the European Geosciences Union.

correct for attenuation (Fuller, 1987; Carroll et al., 2006), even at annual resolution. The impact is illustrated in the context of a Northern Hemisphere mean temperature reconstruction. An inescapable trade-off for achieving an unbiased reconstruction is an increase in variance, but for many climate applications the change in mean is a core interest.

The problem of noisy predictors
Random noise in any linear system affects the estimation of the regression coefficients that tie the explanatory variable(s) X to the response Y. Uncertainty in Y can be quantified through the variance of the error from an ordinary least squares (OLS) fit, which in this case is by definition unbiased (hence the name "BLUE": best linear unbiased estimator). Errors in the predictor(s) X, however, cause the regression slope to be attenuated towards zero, and the resulting signal in the prediction or reconstruction period will invariably be biased (Fuller, 1987). Figure 1a illustrates this effect for a simple 1:1 linear process where the response Y is only observed over the interval 0.9 to 1 while X is available over the full range of 0 to 1. Increasing the noise contained in X attenuates the OLS-derived slope parameter away from the true linear relationship.
Why does noise in the predictors cause attenuation of the true signal? Consider a simple linear regression model $Y = \beta_0 + \beta_1 X + \varepsilon$ for which we have instrumental observations $Y$ and the noisy proxy record $W = X + U$, where $X$ is the desired climate signal and $U$ is the contaminating noise. An OLS regression of the instrumental data $Y$ is therefore not directly on $X$ but actually on $W$, and thus the result is not a consistent estimate of the desired regression coefficient $\beta_1$ (Fuller, 1987; Carroll et al., 2006). Rather, the regression slope is, in fact, $\frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2}\,\beta_1$, where $\sigma_X^2$ and $\sigma_U^2$ denote the variance of $X$ and $U$, respectively. Therefore, the larger the noise $U$, the stronger the attenuation of the regression slope will be.
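The attenuation factor can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes Gaussian signal and noise, a unit-variance signal, and a response error standard deviation of 0.1, all chosen purely for illustration. The function names `ols_slope` and `simulate_attenuation` are hypothetical.

```python
import numpy as np

def ols_slope(w, y):
    """OLS slope of y regressed on w (with intercept)."""
    w_c = w - w.mean()
    return float(np.dot(w_c, y - y.mean()) / np.dot(w_c, w_c))

def simulate_attenuation(noise_sd, n=200_000, beta0=0.0, beta1=1.0, seed=0):
    """Return (empirical OLS slope on W, theoretical attenuated slope)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)        # clean signal X with variance 1 (assumed)
    u = rng.normal(0.0, noise_sd, n)   # predictor noise U
    eps = rng.normal(0.0, 0.1, n)      # error in the response (assumed sd)
    y = beta0 + beta1 * x + eps
    w = x + u                          # noisy proxy W = X + U
    lam = 1.0 / (1.0 + noise_sd**2)    # sigma_X^2 / (sigma_X^2 + sigma_U^2)
    return ols_slope(w, y), lam * beta1

for sd in (0.0, 0.5, 1.0, 2.0):
    empirical, theoretical = simulate_attenuation(sd)
    print(f"noise sd={sd:.1f}  OLS slope={empirical:.3f}  theory={theoretical:.3f}")
```

As the noise standard deviation grows, the empirical OLS slope should track the theoretical factor and shrink towards zero.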

Method
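Ideally, $\sigma_U^2$ can be obtained through independent replicates of the noisy predictors. Where this is not possible (such as in most paleoclimate applications), it has to be estimated from the data. In a simple linear regression model, if the variance $\sigma_U^2$ of the noise in the predictor is known, then an Attenuation Corrected Ordinary Least Squares (ACOLS) estimator of the slope is

$$\hat\beta_{1,\mathrm{ACOLS}} = \hat\beta_{1,\mathrm{OLS}}\,\frac{\hat\sigma_W^2}{\hat\sigma_W^2 - \sigma_U^2}, \qquad (1)$$

where $\hat\sigma_W^2$ is the sample variance of $W$. In the absence of replicated $W$ to estimate $\sigma_U^2$, we first obtain the residual variance $\hat\sigma_U^2$ from the OLS regression of $W$ on $Y$, i.e. $W = \beta_{0*} + \beta_{1*} Y + \varepsilon_*$. If the noise in $W$ is much larger than the noise in $Y$, then $\hat\sigma_U^2$ will be a good estimator of $\sigma_U^2$. However, if this is not the case, then a correction of the form $\tilde\sigma_U^2 = \hat\sigma_U^2 - k\,\hat\beta_{1*,\mathrm{OLS}}^2$ must be made, where $k \ge 0$ is determined by 5-fold cross-validation on the calibration period with the objective of minimizing the prediction bias. To ensure finite moments and superior small-sample properties of $\hat\beta_{1,\mathrm{ACOLS}}$, we follow Sect. 2.5 in Fuller (1987) and replace $\sigma_U^2$ in Eq. (1) by $\left(1 - \frac{\alpha}{n-1}\right)\tilde\sigma_U^2$, where $\alpha > 0$. Although Fuller (1987) provides an optimal choice of $\alpha$ that minimizes the mean squared error of $\hat\beta_{1,\mathrm{ACOLS}}$, we are instead interested in minimizing the bias in the reconstruction, and to this end $\alpha$ needs to be close to zero. Hence we simply set $\alpha = 0.01$, but note that our results are insensitive to values around this choice.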
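The simple-regression procedure can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes the correction takes the form slope_OLS × var(W) / (var(W) − var(U)), which is the inverse of the attenuation factor given earlier, and it estimates var(U) from the residuals of the reverse regression of W on Y. The function name `acols_simple` is hypothetical, and `k` is passed in directly because the 5-fold cross-validation used to choose it is not implemented here.

```python
import numpy as np

def acols_simple(w, y, k=0.0, alpha=0.01):
    """Attenuation-corrected slope for y regressed on a noisy predictor w.

    Returns (intercept, corrected slope, uncorrected OLS slope).
    k is assumed to be chosen elsewhere (e.g. by cross-validation).
    """
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    n = w.size
    var_w = np.var(w, ddof=1)
    var_y = np.var(y, ddof=1)
    cov_wy = np.cov(w, y, ddof=1)[0, 1]

    slope_ols = cov_wy / var_w                # forward regression: Y on W
    slope_rev = cov_wy / var_y                # reverse regression: W on Y
    resid = (w - w.mean()) - slope_rev * (y - y.mean())
    var_u_hat = np.sum(resid**2) / (n - 2)    # residual variance of W given Y

    var_u = var_u_hat - k * slope_rev**2      # optional correction, k >= 0
    var_u = (1.0 - alpha / (n - 1)) * var_u   # small-sample modification
    denom = var_w - var_u
    if denom <= 0:
        raise ValueError("estimated noise variance exceeds the variance of W")
    slope = slope_ols * var_w / denom
    intercept = y.mean() - slope * w.mean()
    return intercept, slope, slope_ols

# Synthetic check: true intercept 2, true slope 1, signal-to-noise ratio 1:1.
rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 1.0 * x + rng.normal(0.0, 0.1, n)
w = x + rng.normal(0.0, 1.0, n)
intercept, slope, slope_ols = acols_simple(w, y)
print(f"OLS slope={slope_ols:.3f}  corrected slope={slope:.3f}  intercept={intercept:.3f}")
```

In this synthetic setting the noise in Y is small relative to the noise in W, so `k=0` is a reasonable default; with noisier Y the reverse-regression residual variance overstates var(U) and a positive `k` would be needed.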
Using the variance of $U$, one can correct the attenuation and recover the true slope $\beta_1$ for $X$ (Fuller, 1987; Carroll et al., 2006). This straightforward approach can also be implemented in a multiple linear framework, where the vector of slopes is attenuated and hence needs to be corrected. Consider now a multiple linear regression model $Y = \beta_0 + \boldsymbol{\beta}^T \mathbf{X} + \varepsilon$ with an observed $p$-dimensional vector $\mathbf{W} = \mathbf{X} + \mathbf{U}$ representing the signal $\mathbf{X}$ contaminated by noise $\mathbf{U}$, with variance-covariance matrices $\Sigma_{XX}$ and $\Sigma_{UU}$, respectively. If $\Sigma_{UU}$ is known, then the ACOLS estimator of $\boldsymbol{\beta}$ is

$$\hat{\boldsymbol{\beta}}_{\mathrm{ACOLS}} = \left(\hat\Sigma_{WW} - \Sigma_{UU}\right)^{-1} \hat\Sigma_{WW}\,\hat{\boldsymbol{\beta}}_{\mathrm{OLS}},$$

where $\hat\Sigma_{WW}$ is the sample variance-covariance matrix of $\mathbf{W}$. To estimate $\Sigma_{UU}$, we first obtain the residual variance-covariance matrix $\hat\Sigma_{UU}$ from separate OLS regressions of $W_i = \beta_{0i*} + \beta_{1i*} Y + \varepsilon_{i*}$ for each $i = 1, \ldots, p$. Then we make the correction $\tilde\Sigma_{UU} = \hat\Sigma_{UU} - k\,\hat{\boldsymbol{\beta}}_{*,\mathrm{OLS}} \hat{\boldsymbol{\beta}}_{*,\mathrm{OLS}}^T$, where $\hat{\boldsymbol{\beta}}_{*,\mathrm{OLS}} = (\hat\beta_{11*}, \ldots, \hat\beta_{1p*})^T$. The rest of the procedure is analogous to above.

In the simple linear case, why does an attenuation-corrected estimate of $\beta_1$ help in reconstructing $Y$ when only the noisy predictor(s) $W$ is available instead of the clean signal $X$? In the statistical literature it is often believed that noise in predictors is of no concern if the sole goal is prediction (Fuller, 1987; Carroll et al., 2006). However, this only applies in situations where the range of both $W$ and $Y$ is well represented in the calibration period. If this is not the case, then the noise in $W$ does introduce bias in the prediction (Fig. 1a). Intuitively, as $W$ becomes dominated by noise, the OLS-based regression line is attenuated away from the true relationship between $X$ and $Y$ and approaches a horizontal line, where it simply estimates the mean of $Y$ in the calibration period.
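Returning to the multiple-regression correction, a minimal sketch for the case where the noise covariance matrix is known might look as follows. It assumes the correction has the form (Σ_WW − Σ_UU)⁻¹ Σ_WW β_OLS, which reduces to the simple-regression factor var(W) / (var(W) − var(U)) when p = 1. The function name `acols_multiple`, the three synthetic predictors, and their noise variances are illustrative assumptions; estimating Σ_UU from residuals and choosing k by cross-validation are not implemented.

```python
import numpy as np

def acols_multiple(W, y, sigma_uu):
    """Attenuation-corrected slopes for y regressed on noisy predictors W.

    W has shape (n, p); sigma_uu is the (p, p) noise covariance, assumed known.
    Returns (intercept, corrected slopes, uncorrected OLS slopes).
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    n = W.shape[0]
    Wc = W - W.mean(axis=0)
    yc = y - y.mean()
    sigma_ww = Wc.T @ Wc / (n - 1)           # sample covariance of W
    sigma_wy = Wc.T @ yc / (n - 1)           # sample covariance of W with y
    beta_ols = np.linalg.solve(sigma_ww, sigma_wy)
    beta_acols = np.linalg.solve(sigma_ww - sigma_uu, sigma_ww @ beta_ols)
    intercept = y.mean() - W.mean(axis=0) @ beta_acols
    return intercept, beta_acols, beta_ols

# Synthetic check with three independent unit-variance signals.
rng = np.random.default_rng(2)
n = 200_000
beta_true = np.array([1.0, -0.5, 2.0])
noise_var = np.array([0.5, 1.0, 0.25])       # assumed known noise variances
X = rng.normal(0.0, 1.0, (n, 3))
U = rng.normal(0.0, 1.0, (n, 3)) * np.sqrt(noise_var)
y = 0.3 + X @ beta_true + rng.normal(0.0, 0.1, n)
W = X + U
intercept, beta_acols, beta_ols = acols_multiple(W, y, np.diag(noise_var))
print("OLS slopes:      ", np.round(beta_ols, 3))
print("corrected slopes:", np.round(beta_acols, 3))
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit matrix inverse; if the estimated Σ_WW − Σ_UU is close to singular, the corrected slopes become unstable, which is one way the variance cost of the correction shows up.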
Applying attenuation correction in the ordinary least squares (ACOLS) solution effectively eliminates the bias seen in OLS-based reconstructions (Fig. 1b, c). Orthogonal regression methods such as total-least-squares (TLS) can also recover the correct regression coefficients (Hegerl et al., 2006) but, in contrast to ACOLS, their implementation is not straightforward. Carroll and Ruppert (1996) have warned that such TLS implementations can be dangerous because: (a) the ratio η of the variance of ε to the variance of U can be sensitive to small changes in its two estimated components; (b) an additional variance component that should represent the "equation error" (Fuller, 1987), arising from the fact that even in the absence of measurement error data typically do not fall onto a straight line, is often omitted from the numerator of η; and consequently (c) the corresponding TLS solution will often overcorrect the attenuation. The range of TLS answers is indicated in Fig. 1b, c by its two practical end-members, η=0 (all proxy noise) and η=1 (equal proxy and instrumental noise). The ACOLS results are more stable and avoid the difficulty of estimating the components of the ratio η.
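The sensitivity of a TLS-type slope to the assumed ratio η can be explored with the standard closed-form Deming regression slope, which takes the ratio of the response-error variance to the predictor-noise variance as input. This is a sketch only: treating Deming regression as a stand-in for the TLS variants used in the cited studies is an assumption, and the function name `deming_slope` and the synthetic data are hypothetical.

```python
import numpy as np

def deming_slope(w, y, eta):
    """Deming regression slope of y on w for eta = var(eps) / var(U).

    eta = 0 attributes all noise to the predictor (inverse regression);
    eta = 1 is orthogonal regression. Requires a nonzero sample covariance.
    """
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    n = w.size
    wc = w - w.mean()
    yc = y - y.mean()
    s_ww = wc @ wc / (n - 1)
    s_yy = yc @ yc / (n - 1)
    s_wy = wc @ yc / (n - 1)
    d = s_yy - eta * s_ww
    return float((d + np.sqrt(d * d + 4.0 * eta * s_wy**2)) / (2.0 * s_wy))

# Synthetic data: true slope 1, var(U) = 1, var(eps) = 0.01, so true eta = 0.01.
rng = np.random.default_rng(3)
n = 50_000
x = rng.normal(0.0, 1.0, n)
y = x + rng.normal(0.0, 0.1, n)
w = x + rng.normal(0.0, 1.0, n)

slope_ols = float(np.cov(w, y, ddof=1)[0, 1] / np.var(w, ddof=1))
slope_eta0 = deming_slope(w, y, 0.0)
slope_true_eta = deming_slope(w, y, 0.01)
slope_eta1 = deming_slope(w, y, 1.0)
print(f"OLS={slope_ols:.3f}  eta=1: {slope_eta1:.3f}  "
      f"eta=0.01: {slope_true_eta:.3f}  eta=0: {slope_eta0:.3f}")
```

In this example the slope obtained with η=1 stays well below the true value while η=0 slightly exceeds it, which illustrates why the result depends on how well η is known.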
Applications in a paleoclimate context
"Measurement error" correction has already been employed in various disciplines (Carroll et al., 2006). Although analyses based on noisy predictors are common in climate research, the need for correction against attenuation has only recently been recognized (Allen and Stott, 2003; Hegerl et al., 2006; Mann et al., 2007, 2008). In fact, the potential magnitude of the problem in paleoclimate reconstructions, which are based on indirect and thus inherently noisy proxy records, has only been fully recognized as climate model output has been used in synthetic exercises to test reconstruction methods (von Storch et al., 2004; Hegerl et al., 2006; Wahl et al., 2006; Ammann and Wahl, 2007; Mann et al., 2007; Lee et al., 2008; Riedwyl et al., 2009). An often-discussed example concerns the true amplitude of Northern Hemisphere (NH) mean temperature over past centuries and millennia (Mann et al., 1998; Jones et al., 2001; Esper et al., 2002; von Storch et al., 2004; Moberg et al., 2005; Osborn and Briffa, 2006). Currently neither the proxies (because of concerns about potentially unreliable low-frequency information) nor the models (because of uncertainty in the magnitude of the forcings as well as in the overall climate sensitivity) can resolve this issue. Lately, different strategies that reduce such amplitude loss have been explored (Juckes et al., 2007; Lee et al., 2008). They include one or a combination of the following approaches: the selection of a longer, more representative calibration period (Ammann and Wahl, 2007), partial (Mann et al., 2007) or overall smoothing of the data (Crowley and Lowery, 2000; Mann and Jones, 2003; Hegerl et al., 2006; Lee et al., 2008), explicit inclusion of specific low-frequency proxy data (Moberg et al., 2005), application of a Kalman filter (KF) based reconstruction (Lee et al., 2008), variance-matching scaling (Jones et al., 1998), as well as total-least-squares (TLS) regression (Allen and Stott, 2003; Hegerl et al., 2006, 2007; Mann et al., 2007, 2008). TLS has received significant attention, and new NH reconstructions based on this technique generally exhibit more pronounced amplitude (Hegerl et al., 2006, 2007; Mann et al., 2008; Riedwyl et al., 2009). While in some applications the dangers mentioned above might be less severe (Ammann and Wahl, 2007) (e.g., Canonical or Principal Component Regression (Luterbacher et al., 2004; Riedwyl et al., 2009) separates by design the signal from noise) or even minor (Hegerl et al., 2007; Mann et al., 2007), appropriate independent estimates for the necessary parameters might not always be available. ACOLS offers an easy, straightforward alternative that can be implemented in a wide range of simple (univariate) and multiple regression applications.
A practical illustration of ACOLS in a climate reconstruction application is shown in Fig. 2. Using output from a coupled Atmosphere-Ocean General Circulation Model simulation (Ammann et al., 2007), we subsampled the annual temperature field at the grid locations of the real-world proxies used in Hegerl et al. (2007). The correlation between model grid-point information and hemispheric temperature in the model is similar to that in the real-world data (Hegerl et al., 2007), and thus the important signal-to-noise level represented in the model-based example is broadly comparable.
The twelve distinct annual grid-point samples were calibrated over the period 1900-1999 against the true model NH temperature in both simple (composite plus scale, CPS) and multiple regression approaches. OLS-based reconstructions (Fig. 2a) show significant attenuation of the true climate amplitude over the prediction period. ACOLS-derived reconstructions (Fig. 2b), on the other hand, are essentially unbiased in the evolving temperature amplitude, and the true NH temperatures remain inside the 95% confidence interval of the reconstruction. This is achieved here despite the full annual resolution of the data throughout the reconstruction; results were smoothed only for visualization (see Supplementary Material: http://www.clim-past-discuss.net/5/1645/2009/cpd-5-1645-2009-supplement.pdf). Recent results from TLS and other methods shown in Lee et al. (2008) potentially benefited from decadal smoothing prior to reconstruction (or from the inclusion of a low-frequency step, Mann et al., 2007), a process that significantly reduces the noise relative to the signal. Other than TLS, only the KF approach in Lee et al. (2008) explicitly takes noise in the predictors into consideration, and is thus expected to avoid attenuation from noise even at annual resolution. Its implementation, however, is much more involved and, in the multiple regression framework, also computationally much more expensive.

Discussion and conclusions
One trade-off that has to be accepted in regression-based reconstructions is that the correction for bias comes at the cost of increased variance (see Supplementary Material: http://www.clim-past-discuss.net/5/1645/2009/cpd-5-1645-2009-supplement.pdf). This variance increase is mostly concentrated at the interannual scale, and thus decadal smoothing of the reconstruction results essentially compensates for it. ACOLS, therefore, could provide a much simpler and more stable way of warding off attenuation in regression-based reconstructions than previously proposed methods. Such improvements are not only possible for the large-scale climate application demonstrated here, but are equally expected in any other regression-based inference where the predictors carry substantial noise. In paleoclimatology, for example, this includes local or regional reconstructions based on records such as tree rings, pollen, corals, or isotopic composition. Because an a priori assumption of "no change" in mean between the calibration and prediction/reconstruction period is not commonly possible (particularly not under current climate, where a trend dominates the instrumental record), attenuation correction is not merely helpful; it is necessary if a faithful representation of the true amplitude of the climate signal is to be recovered. If the noise in the predictors approaches zero and no correction is necessary, ACOLS simply tends towards the OLS solution. While a watchful eye must be kept on the variance, reconstructions that include attenuation correction will be unbiased.
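The bias-variance trade-off can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration rather than a reproduction of the supplementary analysis: it uses 100 calibration points (loosely mirroring a 100-year calibration period), a 1:1 signal-to-noise ratio, a true slope of 1, and a slope correction of the form slope_OLS × var(W) / (var(W) − var(U)) with var(U) taken from the residuals of the reverse regression of W on Y. The cross-validated k and the small-sample α modification are omitted, and the function name `slopes_one_replicate` is hypothetical.

```python
import numpy as np

def slopes_one_replicate(rng, n, noise_sd):
    """Return (OLS slope, attenuation-corrected slope) for one synthetic sample."""
    x = rng.normal(0.0, 1.0, n)
    y = x + rng.normal(0.0, 0.1, n)           # true slope is 1
    w = x + rng.normal(0.0, noise_sd, n)      # noisy predictor
    wc, yc = w - w.mean(), y - y.mean()
    s_ww, s_yy, s_wy = wc @ wc, yc @ yc, wc @ yc
    slope_ols = s_wy / s_ww
    var_w = s_ww / (n - 1)
    var_u = (s_ww - s_wy**2 / s_yy) / (n - 2)  # residual variance of W on Y
    slope_corrected = slope_ols * var_w / (var_w - var_u)
    return slope_ols, slope_corrected

rng = np.random.default_rng(4)
results = np.array([slopes_one_replicate(rng, n=100, noise_sd=1.0)
                    for _ in range(2000)])
mean_ols, mean_corrected = results.mean(axis=0)
sd_ols, sd_corrected = results.std(axis=0, ddof=1)
print(f"OLS:       mean slope={mean_ols:.3f}  sd={sd_ols:.3f}")
print(f"corrected: mean slope={mean_corrected:.3f}  sd={sd_corrected:.3f}")
```

Under these assumptions the corrected slope should be centred near the true value while its spread across replicates is larger than that of the OLS slope, which is the trade-off described above.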
In the climate arena, re-evaluation of existing reconstructions using ACOLS will likely confirm the recent supposition of enhanced amplitudes (Huang et al., 2000; Esper et al., 2002; Moberg et al., 2005; Hegerl et al., 2006; Mann et al., 2008) over the recent past compared to earlier estimates. The overall structure of climate and its interpretation, however, should not be affected, because in most cases we are simply dealing with a change in the slope, and thus a scale factor, of a linear relationship(s). Further research is now necessary to evaluate how the full annual resolution of ACOLS can be used for spatial field reconstructions, where enhanced variance, after a good and unbiased estimate of the mean has been achieved, has to be controlled at the regional scale to preserve the dynamical structure of interannual climate variability (Luterbacher et al., 2004; Rutherford et al., 2005; Mann et al., 2007). Climate model output will again play a key role in such evaluation exercises of the PAGES/CLIVAR Paleoclimate Reconstruction (PR) Challenge (see: http://www.pages-igbp.org/science/prchallenge/).
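where $\hat\sigma_W^2$ is the sample variance of $W$. In the absence of replicated $W$ to estimate $\sigma_U^2$, we first obtain the residual variance $\hat\sigma_U^2$ from the OLS regression of $W$ on $Y$, i.e. $W = \beta_{0*} + \beta_{1*} Y + \varepsilon_*$. If the noise in $W$ is much larger than the noise in $Y$, then $\hat\sigma_U^2$ ...

... for each $i = 1, \ldots, p$. Then we make the correction $\tilde\Sigma_{UU} = \hat\Sigma_{UU} - k\,\hat{\boldsymbol{\beta}}_{*,\mathrm{OLS}} \hat{\boldsymbol{\beta}}_{*,\mathrm{OLS}}^T$, where $\hat{\boldsymbol{\beta}}_{*,\mathrm{OLS}} = (\hat\beta_{11*}, \ldots, \hat\beta_{1p*})^T$. The rest of the procedure is analogous to above.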

Fig. 1. Influence of increasing noise in the predictors of simple linear models where the calibration is restricted to the interval 0.9-1.0 in the response variable Y and prediction extends to 0. (a) Traditional OLS regression exhibits a rapid increase in attenuation of the true (orange) linear relationship as the signal-to-noise ratio (S:N) becomes dominated by the noise. (b) The OLS regression result (green) for S:N=1:1 from (a) is corrected using two possible TLS answers (blue), based on the assumption that either all noise is in X (η=0) or the noise in X and Y is equal (η=1). Depending on the estimate of the parameter η, TLS solutions will most likely lie somewhere in between. ACOLS (red) solutions are close to the true regression coefficients (orange), yet the variance is somewhat increased. (c) Box plots representing the range of solutions over 1000 replicates for the applications shown in (b).