Inferring climate variability from nonlinear proxies : application to palaeo-ENSO studies

Inferring climate from palaeodata frequently assumes a direct, linear relationship between the two, which is seldom met in practice. Here we simulate an idealized proxy characterized by a nonlinear, thresholded relationship with surface temperature, and we demonstrate the pitfalls of ignoring nonlinearities in the proxy–climate relationship. We explore three approaches to using this idealized proxy to infer past climate: (i) methods commonly used in the palaeoclimate literature, without consideration of nonlinearities; (ii) the same methods, after empirically transforming the data to normality to account for nonlinearities; and (iii) using a Bayesian model to invert the mechanistic relationship between the climate and the proxy. We find that neglecting nonlinearity often exaggerates changes in climate variability between different time intervals and leads to reconstructions with poorly quantified uncertainties. In contrast, explicit recognition of the nonlinear relationship, using either a mechanistic model or an empirical transform, yields significantly better estimates of past climate variations, with more accurate uncertainty quantification. We apply these insights to two palaeoclimate settings. Accounting for nonlinearities in the classical sedimentary record from Laguna Pallcacocha leads to quantitative departures from the results of the original study, and it markedly affects the detection of variance changes over time. A comparison with the Lake Challa record, also a nonlinear proxy for El Niño–Southern Oscillation, illustrates how inter-proxy comparisons may be altered when accounting for nonlinearity. The results hold implications for how univariate, nonlinear recorders of normally distributed climate variables are interpreted, compared to other proxy records, and incorporated into multiproxy reconstructions.


Introduction
A principal goal of palaeoclimatology is to infer climate information from geochemical, physical, or lithological signals embedded in various proxy archives.Implicit in most analyses of palaeoclimate records is the assumption that the observations are linearly related to the target climate quantity, so that traditional calibration approaches (e.g.Brown, 1994) allow climate to be inferred given the proxy values.
The climate system, however, is rife with nonlinearities, as are the processes translating climate signals into proxy records (see e.g.Evans et al., 2013, for a review).Examples of nonlinear processes known to markedly distort climate signals include biological threshold effects on tree growth (Vaganov et al., 2006;Anchukaitis et al., 2006;Tolwinski-Ward et al., 2011), karst effects on speleothem δ 18 O records (Baker et al., 2012;Jex et al., 2013), and hydrodynamic effects on flood proxies.Nonlinearities are especially pronounced in terrestrial proxy records from the tropics, where temperature experiences its lowest dynamic range and precipitation its highest dynamic range, resulting in distributions that are non-normal, with strong positive skew.These records are frequently interpreted as reflecting some aspect of the El Niño-Southern Oscillation (ENSO) phenomenon, involving hydrological balance (Conroy et al., 2008(Conroy et al., , 2009)), river runoff (Rodbell et al., 1999;Moy et al., 2002;Rein et al., 2004Rein et al., , 2005)), or wind speed (Wolff et al., 2011) -all nonlinear functions of ENSO state.These proxies can feature statis-Published by Copernicus Publications on behalf of the European Geosciences Union.tical distributions that are different from those of the climate phenomenon they purport to record, and different from one another.What are the consequences of inferring changes in ENSO state from such land-based, nonlinear records?More generally, do nonlinear proxies allow for reliable statements about changes in climate variability, and if so, under which conditions? Can nonlinearity be overcome to correctly infer changes in the underlying climate?
Even when the target climate quantity is well approximated by a normal distribution, nonlinearities often manifest themselves as non-normality in the proxy distribution.This is because a normally distributed proxy is only expected if the proxy is a linear recorder of a normally distributed climate variable, as the linear transform of a Gaussian random vector is also Gaussian.Temperature fluctuations, especially if averaged over a month or more, typically obey normal statistics, so linear temperature proxies tend to obey normal statistics as well.In contrast, tropical run-off proxies, for instance, are nonlinearly related to land precipitation, which is heavily skewed (thus non-normal) and only indirectly linked to sea-surface temperature (SST).Such records are not directly amenable to analysis using common techniques assuming linearity and normality (e.g.spectral analysis, principal component analysis, least-squares regression methods, correlation analysis, and parametric hypothesis testing).
One strategy to overcome this difficulty is to use generalized linear models (GLMs; e.g.McCullagh and Nelder, 1989;Lindsey, 1997) that enable linear inference methods despite nonlinear relationships between predictors and predictands.Such methods require the specification of the functional relationship between proxy and climate, which may not always be known.Even if these link functions are known, GLMs require a case-by-case treatment, with each proxy potentially displaying a different relation to climate.Such a case-specific treatment can become prohibitive in multiproxy studies, which range from graphical comparisons of a halfdozen records in a stack to assess the synchronous nature of climate fluctuations (e.g.Conroy et al., 2008, Fig. 5, or Kirby et al., 2014, Fig. 5) to using several hundreds of proxies to reconstruct temperature over the late Holocene using statistical models (e.g.Mann et al., 1998;Luterbacher et al., 2004;Moberg et al., 2005;Mann et al., 2008;Kaufman et al., 2009;Mann et al., 2009;Barriopedro et al., 2011;Emile-Geay et al., 2013a, b;Tingley and Huybers, 2013).In such cases, it would be desirable to account for nonlinearity so that nonlinear proxies may be used alongside linear proxies without recourse to individual treatment.
This article draws attention to the pitfalls of ignoring nonlinearity in the proxy-climate relationship and explores a number of approaches to using nonlinear proxies to infer climate variability when the underlying climate obeys normal statistics.Our objective is not to consider all of the vagaries of fitting a straight line through noisy data, including calibration vs. regression, measurement errors in predictor variables, and various regularization techniques.For in-depth discussions of these issues, relevant and complementary to the problem at hand, see Brown (1994) and Christiansen (2014).Here we focus on the related challenge of inferring climate variability from proxy archives that are nonlinear, univariate recorders of the target climate variable.We consider both simple, generic strategies that are amenable to multiproxy studies, and more case-specific treatments.
Section 2 presents a simple toy model for a nonlinear proxy, used to illustrate the effects of nonlinearity on inferences about climate variability.Section 3 outlines various practical solutions to the problem, which we then apply to the Laguna Pallcacocha and Lake Challa records in Sect. 4. Section 5 provides discussion and concluding remarks.

A simple proxy model
To illustrate the challenges posed by nonlinear recorders of climate, we consider an idealized model for a run-off proxy that displays a univariate and stationary but nonlinear response to ENSO-induced rainfall.Sediment-based run-off proxies are expected to feature nonlinearities for at least four reasons: sediment mobilization is a nonlinear function of flow speed (Rose, 1993); flow speed is a nonlinear function of rainfall (e.g.Maidment, 1993); rainfall is a nonlinear function of sea-surface temperature (e.g.Lengaigne and Vecchi, 2010); and finally, we assume that rainfall at the location of the proxy occurs only during the warm phase of ENSO, inducing an asymmetric relationship with climate.
To characterize ENSO state, we use December-January-February averages of the NINO3.4index (average seasurface temperature anomaly in the region (5 • N-5 • S; 170 • W-120 • E)) from the Extended Reconstructed Sea Surface Temperature (ERSSTv3) data set (Smith et al., 2008).Although NINO3.4 statistics are well known to be skewed at monthly scales, the seasonal averaging results in a nearly Gaussian distribution, so that for this example non-normality in the proxy arises on account of the proxy-climate relationship rather than the underlying climate.To add stochasticity, we assume that the idealized land-based proxy is sensitive to the local expression of NINO3.4 via c (t) = c(t) + 1 (t), where c(t) is the NINO3.4index, c (t) is the local expression of ENSO variability, and for convenience 1 is a zero-mean Gaussian white-noise process with variance δ 2 .An additive noise 2 , in the proxy units, represents measurement error.Our idealized model for the run-off proxy then takes the form where α > 0 and β > 1.The exact value of β is relatively unimportant in the context of our example; in the following, we choose α = 1 and β = 3, though our results are qualitatively insensitive to this choice.
As we expect the error introduced by the local vs.global ENSO signal to be much larger, we neglect measurement error in the remainder for the sake of simplicity of exposition.We also ignore chronological errors, though they would be important in nature.This example may be viewed as a bestcase scenario: it is the most parsimonious representation of the fact that the ENSO signal is global but experienced locally by the proxy, with deviations controlled by δ.
A proxy generated according to Eq. ( 1) selectively records El Niño events, and it does so with great sensitivity -in this respect it is an ideal proxy.Fig. 1 shows the NINO3.4index and a pseudo-proxy series derived from it using Eq.(1).It is qualitatively similar to run-off proxy records commonly interpreted as reflecting ENSO variability (e.g.Rodbell et al., 1999;Moy et al., 2002;Rein et al., 2004).The pseudo-proxy correctly identifies maxima in the target NINO3.4index but exaggerates their relative magnitudes, while the minima in the target series are clipped at zero.Analysis of this skewed pseudo-proxy will thus lead to correct conclusions regarding the timing of positive excursions in the NINO3.4index, but the nonlinearity and thresholding can potentially result in overestimates of changes in ENSO variability.Quantitatively, the ratio of the NINO3.4sample variances within the 1950-1999 period and the 1900-1949 period is 1.25, while the distribution of variance ratios for 10 000 realizations of the pseudo-proxy, differing in their noise alone, is peaked at higher values (Fig. 3), with a median of 5.25 and interquartile range (Wilks, 2011, p. 26) of 6.20.Indeed, the variance ratio for 86 % of the realizations of p(t) is more than 50 % higher than that calculated from the original NINO3.4series, indicating the large impact of the nonlinear transform (Eq. 1) on the second-moment properties of the proxy as compared with the climate.Note that this assessment considered positive values of the proxies only, but results are nearly identical without this restriction.

Coping with nonlinearity
Our aim is to explore the quantitative information about climate variability that can reliably be inferred from a proxy record generated according to Eq. ( 1), and to explore the pitfalls of failing to account for the nonlinear relationship with climate.A full review of all existing inference strategies in the presence of nonlinearity and non-normality is beyond the scope of this paper; instead, we focus on three workable approaches that may be useful in palaeoclimatology.
-Functional transformation: when the transfer function between proxy and climate is known a priori, it may be leveraged to infer climate.Indeed, in the absence of noise, Eq. ( 1) expresses a one-to-one mapping between proxy values and local climate when the latter displays positive excursions.This relationship over positive values is therefore invertible and may be used for inference, for instance by specifying this relationship as a
link function within a GLM (Nelder and Wedderburn, 1972).Linear regression can then be used to perform inference on climate given the proxy in a manner that accounts for nonlinearities.However, as all negative excursions are mapped to zero, the GLM framework cannot be used over such points.A related approach is to embed the mechanistic model describing how climate signals are recorded in the proxy archives within a hierarchical model.Under some circumstances, such models may be inverted using Bayes' rule to yield quantitative information about the underlying climate (e.g.Tolwinski-Ward et al., 2011, 2013).This is similar in spirit to the GLM framework, as both exploit the functional relationship between proxy and climate.A limitation common to both frameworks is that the parametric from of the proxy-climate relationship must be known, which may not be the case in practice.
One important difference between the two approaches concerns the choice of dependent vs. independent variable.Under the GLM framework, there is ambiguity in the set-up of the model, as the goal is to infer climate from proxies but the proxies are best understood as the dependent variable.Bayes' rule, in contrast, naturally inverts the etiologically correct specification of climate as the dependent variable to infer climate conditional on the proxies (Brown, 1994;Tingley et al., 2012;Christiansen, 2014).
-Empirical transformation: when a parametric relationship is not known a priori, or when parameter estimation is unreliable, empirical transforms offer a useful alter- native.For example, the well-established power transforms (Box and Cox, 1964) render non-normal time series approximately normal, but the estimation of their exponents can be unstable in the presence of noise (Bickel and Doksum, 1981).An alternative is afforded by inverse transform sampling (ITS; e.g.van Albada and Robinson, 2007), a robust approach for converting any distribution to a standard normal.The transform in this case proceeds by evaluating the inverse normal cumulative distribution at the percentiles of the observations according to the fitted cumulative distribution function; the resulting values are then approximately normally distributed.Because it is quantile-based, this method is less sensitive to outliers than power transforms.It is also non-parametric and hence more widely applicable.We note that the widely used standardized precipitation index (SPI; McKee et al., 1993McKee et al., , 1995)), often used to characterize drought conditions, amounts to a parametric ITS assuming a gamma distribution for precipitation measurements.
We now apply those approaches to better understand past ENSO variability as depicted by nonlinear proxies.

Application to variance changes
In the absence of information on the proxy-climate link, ITS is computationally simple and non-parametric, and allows the transformed time series to be analysed using classical tools.Fig. 2 shows the result of applying this transform to a pseudo-proxy record generated according to Eq. ( 1).The amplitude of positive excursions in the transformed series is now comparable to the original NINO3.4series, and in repeated experiments the distribution of variance ratios between the early and late twentieth century (Fig. 3) is centred at 1.5, much closer to the true value of 1.25, and the interquartile range is now reduced to 0.75 (compare with 5.25 and 6.20, respectively, using the raw proxy values).Similarly, the probability of overstating the true change in variance by more than 50 % is only 22 % (vs.86 % using the raw proxy values).Similar results are obtained by inverting Eq. ( 1), showing that application of the correct transform can effectively correct for nonlinearity.
However, while the transform adequately corrects for the cubic nonlinearity, it cannot overcome the fundamental limitation that this idealized proxy only records El Niño events, while information about La Niña events is irretrievably lost.Proxies generated according to Eq. ( 1) thus represent a lossy transformation of the underlying climate signal, one that no amount of statistical ingenuity can ever rectify.A good probabilistic analysis should nonetheless yield reasonable estimates of positive climate excursions, and accurate error bounds for all climate excursions.

Inference from nonlinear proxies: three approaches
We compare the relative merits of three approaches to reconstructing past climate from a proxy generated according to Eq. ( 1), given a calibration period when both climate and proxy observations are available.The problem is more general than the estimation of variance changes considered above, since it seeks to estimate climate values and quantify the associated uncertainties.We note that the problem of statistical calibration of noisy predictors has received extensive attention in the statistical (e.g.Brown, 1994) and climate (e.g.Christiansen, 2014, and references therein) literature.
Our goal is neither to present an exhaustive list of possibilities nor to re-invent the wheel, but rather to investigate some choice practical methods that may be of use to palaeoclimatologists.

Three approaches to inference
-Method 1 (RAW): nonlinearity is ignored, and the proxy series p(t) is used as a predictor in a standard linear regression (an assumption implicit to many palaeoclimate studies).
-Method 2 (ITS): nonlinearity is recognized, empirically corrected via inverse transform sampling, and used as a predictor in a standard linear regression.
-Method 3 (BPM): nonlinearity is recognized, and a probability model that allows for Bayesian inversion of  1900-1949 and 1950-1999, using positive proxy excursions.Light blue: distribution of ratios from 10 000 samples of p(t) derived according to Eq. ( 1) with noise variance set to 0.25 (compare with the dark grey curve in Fig. 1); dark blue: distribution of ratios estimated from z(t), the empirical transform of each sample of p(t) (compare with the dark grey curve in Fig. 2); green: distribution of ratios estimated from the functional transform ((p/α) β ).In each case, the median is indicated with a dashed vertical line of the same colour, while the ratio estimated from the original NINO3.4series is indicated in red.
the structural dependence of the proxy on the climate (Eq. 1) is constructed.The model, described extensively in Appendix A, is used to infer the posterior distribution of climate values given proxy observations, f (c|p).
These three cases involve increasing levels of sophistication but do not attempt to cover all possible choices.Instead, our purpose here is to illustrate the danger of ignoring nonlinearity (RAW), outline an optimal (but difficult) solution (BPM), and find a practical compromise between the two (ITS).The experimental sample is composed of 10 000 surrogate climate time series c(t) of length n = 300 years generated according to an AR(1) process, with mean zero and a lag-1 autocorrelation of γ = 0.7 to mimic the serial correlation typical of observed sea-surface temperature time series; conclusions are not sensitive to this choice.The innovation standard deviation, σ , is chosen so that the standard deviation of c(t) is unity.We add 1 ∼ N(0, δ), with δ = 0.5 × σ , to produce c (t), and the idealized proxy series is then generated according to Eq. ( 1), with α = 1 and β = 3.
In the RAW and ITS cases, inference is made via linear regression, where the regression model is trained on a calibration interval formed from the most recent 150 time points.To minimize erroneous inference when p = 0, the regression model is fit using only proxy observations that are greater than zero.We adopt a heuristic approach to the p(t) = 0 ob-servations in the prediction interval, assigning to them the value c− , the calibration sample mean of c(t) calculated over time points for which p(t) = 0.While somewhat unorthodox in the regression context, the divergent treatments in the p = 0 and p > 0 cases is a reasonable ad hoc procedure warranted by the thresholding behaviour of the proxy.
For the two regression cases, uncertainties are quantified via 95 % prediction intervals (PIs) following standard regression theory (e.g.Wilks, 2011, Eq. 7.22), where the variance of residuals is again computed separately for p = 0 and p > 0. In the Bayesian model, the posterior mean serves as the optimal prediction of c(t), and 95 % credible intervals (Jaynes, 1976;Gelman et al., 2004) are readily obtained from the posterior samples.
For simplicity, we assume in the Bayesian treatment that all model parameters (α, β, µ, σ, δ) are known.Doing so permits for closed-form expressions for the required posterior distributions that aid the interpretation of results.In practical applications these parameters would be inferred from the data as well, via a hierarchical model and Markov chain Monte Carlo-based inference (e.g.Gelman et al., 2004).By constructing the Bayesian model to match the proxygenerating process, in both structure and parameter values, we give it an unfair advantage over the other methods.We use this best-case scenario as a benchmark, expected to provide the best estimates of climate given the skewed proxy measurements, as well as reliable estimates of uncertainty.Appendix B explores the role played by the prior specification of the variance of the climate process, σ 2 .

Results
Results of performing inference on c(t) using each of the three approaches are plotted in Fig. 4 for one realization of the climate and noise processes.The raw regression overestimates positive excursions (cf.time points 110 to 130), while in contrast the amplitudes of positive excursions as inferred from the regression on the transformed proxies and using the Bayesian inversion of the mechanistic model are in reasonable agreement with c(t).The raw regression could thus exaggerate any changes in climate variability prior to the calibration period, a direct consequence of linearizing the nonlinear climate-proxy relationship.
We investigate the properties of the three inference methods, and the sensitivity of the results to the ratio δ/σ , using residual plots (Fig. 5).For small noise levels (δ = 0.1σ ), and for p > 0, there is most structure in the residual plot for the RAW method and least for the BPM, indicating that the RAW and BPM approaches provide, respectively, the poorest and best fits to the data.The curvature of the residual plot for p > 0 is positive for the RAW regression, as the nonlinearity magnifies large excursions.In contrast, the residual plot for the ITS regression displays negative curvature, indicating that the transform overly compresses large excursions and overly magnifies small ones.In the least noisy case (δ = 0.1σ ), the Bayesian model yields near-perfect estimates for p > 0, while for p = 0 the distribution of residuals is similar for all three methods.As the noise level increases (middle and bottom rows), the residual plots become less structured and the vertical spread in each case becomes comparable to that of the p = 0 case.For higher noise levels, there are large positive outliers in the residuals from the RAW proxy regression, as expected since the method makes no account of the nonlinearity.In contrast, the ITS regression and Bayesian inversion yield generally smaller residuals even for high noise levels.
We further investigate the diagnostic features of the inference procedures using the 10 000-member ensemble of realizations of the climate and noise processes for the intermediate noise level δ = 0.5σ (Table 1).To quantify the properties of the uncertainty intervals, we first evaluate their width, considering both the p = 0 and p > 0 cases for each inference method (top row).All three methods show larger uncertainties in the p = 0 case (around 2.7 • C) with small differences between methods, consistent with the lack of information of-fered by the proxy in this regime.When p > 0, uncertainties are reduced for all methods, as expected from the information provided by the proxy.The raw proxy regression results in the widest uncertainty intervals, while those from the Bayesian inversion are narrowest.
To see how well such intervals encompass the true climate fluctuations, we calculate their actual coverage rates (Guttman, 1970), to be compared with a nominal coverage rate of 95 % (Table 1, rows 3-4).For the two regression approaches, the uncertainties are slightly too permissive in both cases, with actual coverage rates of about 94 %.The coverage rate for the mechanistic model is correct for the p = 0 case but only about 90 % when p > 0. In contrast to RAW and ITS, the mechanistic model does not involve two separate calibrations and solutions for the p = 0 and p > 0 cases.As such, the coverage rates are more appropriately assessed without division into the two cases, yielding PIs with a coverage rate of 93 with a standard error of ±2 %, compatible with the nominal rate of 95 %.While coverage rates can always be increased by widening PIs, a useful probabilistic model should yield predictions that are both sharp and on point.A complementary view comes from the interval score (Gneiting and Raftery, 2007, Eq. 43), which balances coverage with sharpness.That is, the score rewards the narrowest intervals that encompass the true values at the nominal rate (here, 95 %) and should be as small as possible.Computed scores (Table 1, rows 5-6) are lowest for the BPM, whose slightly permissive coverage is more than compensated for by smaller widths; the scores are also the least variable for this method.At the opposite end, the raw regression displays the largest and most variable scores, while the ITS regression displays scores much closer to the BPM, and with comparable variability, particularly when data speak (p > 0).
The bias in the mean is quite small for all methods (rows 7-8) but is slightly more variable for the two regression methods (raw and transformed) than for the Bayesian model.The bias in the Bayesian inference for p > 0 results from the balance between the erfc term in Eq. (A5) giving greater weight to more positive values and the informative prior shrinking the posterior mean towards the prior mean (Eq.A6); given the particular parameters, the erfc terms has a stronger effect and there is a slight positive bias.More appropriately, assessing the Bayesian inference jointly for the p = 0 and p > 0 cases results in a bias that is not inconsistent with zero (0.05 with standard error 0.08).Changing the prior sharpness for the climate process has competing effects on the bias (and other diagnostics) in the p = 0 and p > 0 cases; these issues are investigated further in Appendix B.
Table 1.Inference diagnostics derived from 10 000 realizations of the climate process c(t), for each of the three inference approaches, with δ = 0.5.CI width is the width of the 95 % uncertainty intervals produced by each method (see Fig. 5) averaged over the 300 time points.The coverage rate should be compared to its nominal value of 95 %.The interval score should be as small as possible.The bias is defined as the difference between the estimated and true climate, averaged over the entire interval.All statistics have been stratified according the sign of p to account for the varying amounts of uncertainty in each case.For the top three rows, we report the mean and standard error (s.e.) of each distribution, obtained from the 10 000 realizations.The variance ratio statistics pertain to the distributions plotted in Fig. 6; variance ratios for p = 0 are not reported because all methods produce constant estimates in that case.

Quantity
Statistic RAW ITS BPM p = 0 p > 0 total p = 0 p > 0 total p = 0 p > 0 total CI width ( Finally we summarize how closely each method estimates the variance ratio between the calibration (t > 151) and validation (t ≤ 150) intervals (Table 1, rows 9-10).We calculate the ratio of sample variances between the two periods for each of 10 000 inferences of c(t) using each method.We restrict our attention to the case where p > 0, as when p = 0 each method estimates c(t) by a constant value and the variance of the corresponding climate cannot be estimated.To remove the effect of differences in sample variances calculated over the two intervals for realizations of the actual climate c(t), we divide the ratio inferred from each pseudo-proxy by the ratio calculated from the corresponding realization of c(t).
An ideal estimator of past climate should therefore result in a distribution of normalized variance ratios that is tightly distributed about unity, and indeed we find this to be the case for regression on the transformed proxies and for the Bayesian inversion (Fig. 6, blue and green curves; Table 1).In contrast, regression on the raw proxies results in a much broader distribution, peaked at about 0.4, with a much heavier upper tail (Fig. 6, cyan curve; Table 1).More tellingly, regression on the raw proxies results in a normalized variance ratio in excess of 1.5 in 29 % of experiments, more than 4 times as frequently as for the other two methods (Table 1, last row): in other words, regression on the raw proxies ups the risk of overestimating changes in variability by a factor of 4 as compared to the other methods.
Taken together, the results of our numerical experiments lead to a number of general observations about the potential pitfalls of ignoring nonlinearity in proxies, as well as the strengths and weaknesses of potential solutions.A failure to account for strong nonlinearity in a proxy often presents itself in structured residuals (Fig. 5) and incurs a high risk Estimated ratio

Probability density
Variance ratio, N = 10000 Regression, raw, p > 0 Regression, transform, p > 0 Bayesian posterior mean, p > 0 Figure 6.Distribution of variances ratios between the calibration and validation intervals, as inferred from each method and normalized by the same ratio calculated from the actual realization of c(t).Cyan: raw regression; blue: regression on transformed pseudoproxies; green: Bayesian inversion.The black dashed line denotes unity, which, due to the normalization, is the result under perfect inference for each realization of c(t).Distributions are based on 10 000 realizations of c(t) with δ = 0.5.
of overstating changes in climate variability (Table 1, last row).Since palaeoclimate proxies are almost never considered in isolation, but (rightfully so) often stacked against comparable records, it is necessary that all proxies be approximately linearly related before interpreting discrepancies between records.This caveat is especially applicable to mul-tiproxy analyses, as a climate reconstruction based on linear regression that includes such nonlinear proxy predictors would inherit the same potential pitfalls, though they may be attenuated by the presence of other sources of information.
Applying ITS to the nonlinear proxy record prior to regression analysis results in a much better fit according to the residual plots, as well as more faithful estimates of changes in variance.Such a transformed proxy series will thus be less likely to lead to spurious conclusions about past climate variability if included in a multiproxy reconstruction or a proxy intercomparison.
An important shortcoming of the regression methods used here is that inference is generally limited to a best-estimate of past climates and an uncertainty interval, whereas the Bayesian framework naturally provides the full distribution of climate conditional on the proxy observations (for further discussion see Tingley et al., 2012).However, we note that the excellent performance of the Bayesian inversion in all numerical experiments performed here is to be expected given that the inference model is constructed using the correct mechanistic model and parameter values.In practical applications, the parametric form of the mechanistic model would likely not be an exact representation of the data-generating mechanism, while the parameters themselves would need to be estimated from the data.This can be computationally demanding when no closed-form solution exists for the posterior distribution of model parameters (as in this case) and would naturally broaden the posterior distributions, i.e. lead to more uncertainties about the reconstructed climate.

Laguna Pallcacocha
We now investigate how inferences made from the iconic Laguna Pallcacocha record of river run-off from the southern Ecuadorian Andes (Moy et al., 2002) are affected by different considerations of nonlinearity.The primary variable of interest here is the red colour intensity of the sediment, which is controlled by sediment delivery to the lake.Episodes of alluvial deposition are hypothesized to occur predominantly during El Niño events (Rodbell et al., 1999), similar to our idealized proxy (Sect.2).Our intent here is not to dispute qualitative conclusions made from the Laguna Pallcacocha record -they turn out to be robust across the analyses we consider -but to show how nonlinearity may affect the quantitative inference made from such a proxy, and we view the analysis primarily as an illustrative example.
The time series of Pallcacocha red colour intensity displays pronounced positive skew (Fig. 7a), closely following a generalized extreme value distribution (GEV, Fig. 7b).In contrast, the distribution of December-January-February averages of NINO3 values is fairly close to a Gaussian, so using the Pallcacocha red colour record to infer changes in ENSO variability requires inferring a normally distributed climate variable from a strongly non-normal proxy time series -as per the experiments in Sects. 2 and 3. Fitting a mechanistic model would necessitate determining both its structural form and parameters, so for simplicity we explore the consequences of an empirical transform (ITS).By design, applying the transform to the empirical distribution produces a nearly Gaussian series (Fig. 7d).
We investigate how the transform affects the time series of sample variances computed on disjoint, 100-year segments (Fig. 8).While the 100-year variance time series calculated from the raw series suggests an abrupt increase in ENSOrelated variance at about 5000 years BP (grey dashed line), the series calculated from transformed values displays a more gradual increase.To establish if either or both series record significant increases in variance around 5000 years BP, we compute the variance ratio between two segments delineated by the grey line (centred at 3150 BCE), with lengths that vary between 200 and 3600 years.A window length of 200 years, for instance, means that we compare variances over the period 3350-3151 and 3150-2951 BCE.As the statistic of interest is a variance ratio, results are unchanged under linear transformation of the series: while accounting for the nonlinearity is necessary for meaningful inference, calibrating to climate units is not.
On account of the non-normal and serially correlated nature of both series, a standard F test for the significance of the observed variance ratio is not appropriate.Instead, we assess the significance of the variance ratio for each series via a block bootstrap resampling plan (Efron and Tibshirani, 1993;Kunsch, 1989), using a block length of 25 years (Table 2).That is, we randomly sample with replacement the original time series in 25-year blocks, compare variances before and after the year 3150, and do so N = 2000 times to obtain a null distribution, smoothed by kernel averaging (e.g.Wilks, 2011, Chap 4).The P value for each observed ratio (the probability of observing a ratio at least as large as that observed from chance alone) is then estimated from the bootstrap-derived null distributions.The results (Table 2) show two salient features: firstly, the variance ratio is generally larger with the raw than with the transformed series, as would be expected from Sects. 2 and 3. Secondly, the variance ratio estimated from the raw series is deemed significant at the 5 % level for all segment lengths, while the transformed series requires segment lengths of at least 1000 years to make this conclusion.Hence, with the raw series one would detect a significant shift at the boundary in a matter of 200 years, while when using the power-transformed series, this transition would be seen to occur much more gradually, taking at least 1000 years to become manifest.We note that the change in variance is largely driven by the record's accumulation rate (Rodó and Rodriguez-Arias, 2004), so we refrain from interpreting this aspect of the record.
Following Moy et al. (2002), we next apply Morlet wavelet analysis (Torrence and Compo, 1998), modified to account for energy conservation (Liu et al., 2007), to both the raw  Both series were standardized prior to analysis.and transformed series.Results are qualitatively similar in both cases (Fig. 9): interannual variability is muted prior to the mid-Holocene, while millennial and centennial variabil-ity is relatively persistent.However, the two wavelet transforms differ in detail, in particular regarding which "islands" of multidecadal power are significant with respect to a red noise benchmark (Torrence and Compo, 1998).In the latter 5000 years of the record, high-frequency variability appears more consistently significant with the empirical transform than without, indicating that the mid-Holocene shift for highfrequency variance is more consistently significant under the transform, even though the shift in total variance appears less substantial.
Our analysis does not contradict the qualitative conclusions of the original study by Moy et al. (2002).Indeed, the transformed series also suggests that ENSO events "become more frequent over the Holocene until about 1200 years ago, and then decline towards the present" (Moy et al., 2002).Wavelet analysis accounting for energy conservation (Liu et al., 2007) also suggests a statistically significant modulation in the 2000-year range, though it is nearly equal in amplitude to a ∼ 500 yr broadband peak and lies mostly within the cone of influence.Nonetheless, results of the statistical analyses are markedly changed from a quantitative standpoint when nonlinearity is recognized and the proxy series transformed to approximate normality.Applying classical estimators to the raw series may lead to erroneous conclusions about the magnitude of changes in the variability of the un- derlying climate, as well as their localization in frequency space.The empirical transform counterbalances the nonlinearities inherent to run-off proxies such as the Pallcacocha red colour intensity record, and the resulting transformed series is nearly normal and more likely to have an approximately linear relationship with climate (cf. the numerical examples of Sects. 2 and 3).One is therefore able to interpret changes in variability estimated from the transformed proxy record as indicating changes in climate variability (cf.Fig. 6).

Lake Challa
Recently, Wolff et al. ( 2011) used varved thickness at Lake Challa (equatorial east Africa) to retrace changes in ENSO activity since the Last Glacial Maximum.The proxy is interpreted as follows: El Niño (La Niña) events are associated with wetter (drier) conditions in east Africa and decreased (increased) surface wind speeds, which drive turbulent mixing in the lake, thereby bringing nutrients to the surface and boosting primary productivity.Thicker varves are therefore indicative of La Niña conditions, and thinner varves of El Niño conditions, resulting in a large anti-correlation (r = −0.48) between varve thickness and NINO3.4SST (Wolff et al., 2011).The raw varve thickness data are plotted in Fig. 10a for the past 3000 years.The histogram is highly skewed and closely fits a log-normal distribution with parameters (µ, σ ) = (0.33, 0.31) (Fig. 10b).The transformed series (Fig. 10c) closely follows a standard normal distribution (Fig. 10d).
As the raw Lake Challa series has been used as a proxy for ENSO activity (Emile-Geay et al., 2013a, b), it is worth asking how much it resembles the Lake Pallacacocha red colour intensity.Fig. 11 plots the two time series, as well as their standard deviations on sliding 40 yr windows, before and after transforming each record to normality.While Fig. 11c shows a fair amount of structure in each time series (peaks near 500 BCE and 2000 CE in Lake Challa varve thickness variability, quasi-periodic modulations of red colour intensity variability with an approximate recurrence time of 300 years), the transformed series (Fig. 11d) show different features: variability in red colour intensity now displays more pronounced late Holocene excursions than its untransformed counterpart, while variations in varve thickness now appear to recur at centennial, rather than millennial, scales.
This illustrative comparison is necessarily limited, as the proxies are located at either end of the Indo-Pacific system and are sensitive to different aspects of tropical Pacific SST (Challa responds more to central Pacific anomalies; Pallcacocha responds more to variability along the Peruvian coast, though see Rodbell et al. (2008) for a non-climatic interpretation), which are not simply related to each other.In addition, the Pallcacocha sensitivity to SST is highly nonlinear, as noted before, as is the Challa relationship: Wolff et al. (2011) note that "the varve thickness record is particularly sensitive  to La Niña conditions and shows some evidence for saturation during El Niño years".The main message is that the visual impression changes with the application of the transforms.Given that each proxy is non-normal due to a nonlinear relationships with normally distributed SSTs, there is a strong rationale for comparing the transformed rather than the untransformed series.

Discussion
Much of palaeoclimatology is concerned with the characterization of climate variability in the pre-instrumental era.Many proxy archives exhibit nonlinear relationships to the underlying climate, whereas many commonly used tools assume, either explicitly or implicitly, that the proxies are linearly related to a normally distributed climate variable.As illustrated for an idealized proxy (Sects. 2 and 3), and for ENSO-sensitive proxies (Sect.4), direct variance estimates on different segments of nonlinear proxy records generally give a misleading picture of climate variability.
There are a number of techniques that allow for meaningful conclusions about changes in the underlying climate from such nonlinear proxy archives.Applying an empirical transform to such a series can render it approximately normal, so that standard data-analytic tools may be applied.In the context of the pseudo-proxy study (Sect.2), the inverse transform sampling approach corrected for nonlinearity and reduced the risk of overstating variance changes.In the context of inferring climate values from proxy values (Sect.3), the transform provided a regression fit comparable to that of an optimal benchmark and led to normally distributed resid-uals, in accordance with the assumptions of the regression framework.These examples also highlighted the dangers of ignoring nonlinearity in the proxy-climate relationship.
Applying the transform to the Lake Pallcacocha record (Sect.4.1) allows conclusions from a wavelet analysis and estimates of changes in variance to be interpreted in terms of climate, even without explicit knowledge of the mechanism responsible for the nonlinearity.Transformation allows for classical tools to be correctly applied to the analysis of palaeoclimate records, and it is a simpler approach than specifying and fitting a mechanistically informed statistical model.Transformation to normality may thus be seen as a pragmatic compromise between scientific and statistical rigour on the one hand and the need for practical solutions to analyse nonlinear proxy time series on the other.A limitation of the transform approach is that it is blind to the datagenerating mechanism and may conflate various sources of nonlinearity.
We thus stress that generic recipes are no substitute for a mechanistic understanding, and ultimately the scientific interpretation of a proxy in terms of its climatic, ecological, or geological controls should guide the choice of statistical methodology.The example of Sect. 3 illustrates that explic-itly modelling the mechanism giving rise to non-normality yields much better estimates of the underlying climate.In the example considered here, both the functional form and parameters of the mechanistic model were assumed known, while in real applications the form would likely be an approximation and the parameters would be estimated as part of a fully Bayesian analysis (Gelman et al., 2004;Tingley and Huybers, 2010a, b;Tolwinski-Ward et al., 2013), together with their associated uncertainties.
In instances where many proxies are used as predictors of past climates, such as climate reconstructions of the Common Era (e.g.Mann et al., 1998;Luterbacher et al., 2004;Moberg et al., 2005;Mann et al., 2008;Kaufman et al., 2009;Mann et al., 2009;Barriopedro et al., 2011;Emile-Geay et al., 2013b;Tingley and Huybers, 2013), individually modelling each proxy may prove challenging.Ideally, one would apply mechanistic forward models to each proxy (e.g.Dee et al., 2015) and combine them within a fully Bayesian inference procedure (Tingley et al., 2012).Until a full suite of validated models and computationally accessible Bayesian tools are available, however, empirical transforms provide a simple expedient to ensure that all input series fit the assumption of linear regression from a multivariate normal model that underpins most climate field reconstructions.We stress that such transforms are not a panacea, however, as they do not address other modelling challenges with proxy data, including time uncertainty (Anchukaitis and Tierney, 2012;Comboul et al., 2014), the heterogeneous distribution of observations in space and time, spatial covariance modelling (Tingley and Huybers, 2010a; Guillot et al., 2015), and the differing spatial and temporal averaging inherent to the proxies (Tingley et al., 2012;Hanhijärvi et al., 2013).
Finally, while this article has focused on nonlinear proxy records with power-law type relationships, it is worth pointing out that a number of other valuable climate proxies may deviate from linearity in other ways.In particular, records based on proportions (e.g.pollen counts, lithological fractions, fractions of certain faunal assemblages), being in the range [0, 1], also involve a nonlinear transform of Gaussian inputs like temperature and therefore require specific inference tools.Another key factor complicating inference from climate proxies is the existence of multiple influences on the measured variable, e.g.temperature and soil moisture controls on tree-ring width (e.g.Anchukaitis et al., 2006;Vaganov et al., 2006;Tolwinski-Ward et al., 2011;Evans et al., 2014) or temperature and seawater composition controls on the oxygen isotopic composition of biocarbonates (e.g.Thompson et al., 2011;Russon et al., 2013;Dee et al., 2015).We will explore solutions to these problems in future work.

Conclusions
In this article we showed that nonlinearity fundamentally alters the information content of climate proxies and that it must be dealt with, lest some erroneous inferences be made.Though we advocate for mechanistic modelling whenever possible, we showed that a simple empirical transform (ITS) can often be sufficient to remedy many of the ills of nonlinearity, and we recommend that palaeoclimatologists working with nonlinear proxies adopt such simple transforms in their work.Matlab code for implementing the transform is available at https://github.com/CommonClimate/common-climate/blob/master/gaussianize.m.Consider the proxy model of Eq. ( 1) and assume for simplicity that both c and are normally distributed, with c ∼ N (µ, σ 2 ) and ∼ N (0, δ 2 ).
We are interested in inferring c(t) from p(t) and, more specifically, in finding the posterior probability distribution of climate given the proxy, f (c|p).Using Bayes' theorem, the posterior distribution f (c|p) can be re-expressed as a function of the distribution of the proxy conditional on climate, f (p|c), and the prior distribution of climate states, π (c): In what follows, we set the prior on the climate states, π (c), to follow the same distribution as the process used to generate the climate itself.The prior is therefore informative and unbiased, and the posterior will be a mix of the information supplied by the prior and information from the data.As the prior distribution and the distribution of climate itself are the same in this illustrative example, in the absence of any information from the data, the posterior distribution of the climate will converge to the distribution used to generate climate, N (µ, σ 2 ).
On account of the thresholding behaviour of the proxy, it is necessary to distinguish two cases in calculating the posterior distribution of climate.With some rearrangements, the integral may be rewritten with the help of erfc, the complementary error function 1 : and the normalization constant can then be calculated numerically.The posterior is the product of two terms.
The first gives the likelihood that p = 0 given any value of c, while the second is the prior on c.Note that the likelihood increases as c decreases, and in the absence of the prior the uncertainty interval for c would be unbounded from below.The prior distribution thus provides important regularization and allows for uncertainty estimates with finite width when p = 0.

Figure 3 .
Figure 3. Distributions of estimated NINO3.4 variance ratios within the periods1900-1949 and 1950-1999, using positive proxy excursions.Light blue: distribution of ratios from 10 000 samples of p(t) derived according to Eq. (1) with noise variance set to 0.25 (compare with the dark grey curve in Fig.1); dark blue: distribution of ratios estimated from z(t), the empirical transform of each sample of p(t) (compare with the dark grey curve in Fig.2); green: distribution of ratios estimated from the functional transform ((p/α) β ).In each case, the median is indicated with a dashed vertical line of the same colour, while the ratio estimated from the original NINO3.4series is indicated in red.

Figure 4 .
Figure 4. Inferences on simulated c(t) using three different methods.(a) Inference using regression on raw proxy values (blue) and the Bayesian model (grey).(b) Inference using regression on transformed proxy values (blue) as well as the Bayesian model (grey).In both panels, c(t) and c (t) are shown in solid and dashed red, respectively; blue shading represents 95 % frequentist prediction intervals; and grey intervals represent pointwise 95 % credible intervals from the Bayesian posterior distribution (see, for example, Tingley and Huybers, 2013).

Figure 5 .
Figure5.Residuals (differences between estimated and true values of the climate) as a function of the estimated climate.The three inference methods (columns) are compared for three values of δ (rows).Blue circles: residuals when p > 0; box plots: the distribution of residuals for p = 0. Whiskers extend to the 2.5 % and 97.5 % quantiles; boxes denote the 25 % and 75 % quantiles; circles mark the median.The middle row corresponds to the simulated proxy plotted in Fig.4.

Figure 7 .
Figure 7. Raw and transformed time series of the Laguna Pallcacocha record (Moy et al., 2002).(a) Raw time series, interpolated at 2-year intervals; (b) empirical histogram (bars) and parametric fit to a GEV distribution with parameters (k P , σ P , µ P ) = (0.07, 15.59, 72.13) (red line); (c) transformed time series with λ = 0.31; (d) empirical histogram of the transformed time series (bars) and parametric fit to a standard normal (red line).

Figure 8 .
Figure 8. Variances calculated over non-overlapping 100-year segments of the Pallcacocha red colour intensity record (thick cyan line) and after applying the empirical transform (thin blue line).Both series were standardized prior to analysis.

Figure 9 .
Figure 9. Wavelet analysis of the Laguna Pallcacocha record (Moy et al., 2002).(a) Morlet wavelet coefficients for raw values of the red colour intensity series, interpolated at 2-year intervals; (b) global wavelet spectrum (black), with the AR(1) benchmark plotted in grey.(c, d) As in (a, b) but for the transformed time series.Colour bars are omitted since units are arbitrary.

Figure 10 .
Figure 10.Raw and transformed time series of Lake Challa varve thickness (Wolff et al., 2011).(a) Raw time series, interpolated at yearly intervals; (b) empirical histogram (bars) and parametric fit to a log-normal distribution (cyan line); (c) transformed time series; (d) empirical histogram of the transformed time series (bars) and parametric fit to a standard normal (blue line).

Figure 11 .
Figure 11.Comparison of Lake Challa and Laguna Pallcacocha records.Top row: raw (untransformed) records.Bottom row: transformed records.Left: time series interpolated at biannual resolution.Right: standard deviation on sliding 40 yr windows.

Table 2 .
Significance of variance changes across the 3150 BCE boundary (grey line in Fig.8) in Laguna Pallcacocha red colour intensity.The table lists P values for a block bootstrap test (25-year blocks, N = 2000 realizations) of the null hypothesis that the variance jump across the boundary could have arisen from chance alone.The test is deemed significant at the α level whenever P ≤ α and is conducted on time segments of variable length, to identify how long an interval is needed to detect a shift in variance.Results are virtually identical to 10-year blocks.