Inferring late-Holocene climate in the Ecuadorian Andes using a chironomid-based temperature inference model

Presented here is the first chironomid calibration data set for tropical South America. Surface sediments were collected from 59 lakes across Bolivia (15 lakes), Peru (32 lakes), and Ecuador (12 lakes) between 2004 and 2013 over an altitudinal gradient from 150 m above sea level (a.s.l) to 4655 m a.s.l, between 0–17 S and 64–78W. The study sites cover a mean annual temperature (MAT) gradient of 25 C. In total, 55 chironomid taxa were identified in the 59 calibration data set lakes. When used as a single explanatory variable, MAT explains 12.9 % of the variance (λ1/λ2 = 1.431). Two inference models were developed using weighted averaging (WA) and Bayesian methods. The best-performing model using conventional statistical methods was a WA (inverse) model (R2 jack = 0.890; RMSEPjack = 2.404 C, RMSEP – root mean squared error of prediction; mean biasjack =−0.017 C; max biasjack = 4.665 C). The Bayesian method produced a model with R2 jack = 0.909, RMSEPjack = 2.373 C, mean biasjack = 0.598 C, and max biasjack = 3.158 C. Both models were used to infer past temperatures from a ca. 3000-year record from the tropical Andes of Ecuador, Laguna Pindo. Inferred temperatures fluctuated around modern-day conditions but showed significant departures at certain intervals (ca. 1600 cal yr BP; ca. 3000–2500 cal yr BP). Both methods (WA and Bayesian) showed similar patterns of temperature variability; however, the magnitude of fluctuations differed. In general the WA method was more variable and often underestimated Holocene temperatures (by ca.−7± 2.5 C relative to the modern period). The Bayesian method provided temperature anomaly estimates for cool periods that lay within the expected range of the Holocene (ca.−3± 3.4 C). The error associated with both reconstructions is consistent with a constant temperature of 20 C for the past 3000 years. We would caution, however, against an over-interpretation at this stage. The reconstruction can only currently be deemed qualitative and requires more research before quantitative estimates can be generated with confidence. Increasing the number, and spread, of lakes in the calibration data set would enable the detection of smaller climate signals.


Introduction
Holocene climate variability (11.7 kcal yr BP -present) offers the most recent opportunity to parameterize climate and ecosystem responses to natural forcing under current boundary conditions in the absence of intense anthropogenic activity (Mayewski et al., 2004;Oldfield and Steffen, 2014). Furthermore, quantitative estimates of past climate over long timescales (> 1000 years) are vital to improving the reliability of modelling and the prediction of present and future climate variability (Mayewski et al., 2004). The spatial distribution of palaeoclimate records, however, is currently uneven around the world. Quantitative reconstructions of past climate are common from mid-to high latitudes of both hemispheres, but data are much scarcer from low-latitude 1264 F. Matthews-Bird et al.: Inferring late-Holocene climate in the Ecuadorian Andes (tropical) regions (Jansen et al., 2007). Tropical climate is the dominant driver of atmospheric circulation (Ivanochko et al., 2005) and the source of intermittent phenomena, such as the El Niño-Southern Oscillation (ENSO), which has a global influence on climate (Collins et al., 2010). Quantitative estimates of past climate from the low-latitude tropics, therefore, are crucial for investigating not only regional climate processes but also teleconnections on long timescales (> 1000 years; Garreaud et al., 2009;Jomelli et al., 2009;Vuille et al., 2000). Here we develop the first chironomidbased temperature inference model for tropical South America. The model is applied to a Holocene lake sediment sequence to generate a chironomid-inferred temperature reconstruction from the tropical east Andean flank.
Chironomidae (non-biting midges) are a family of twowinged aquatic insects of the order Diptera. The family is globally distributed and one of the most diverse within aquatic ecosystems (Armitage et al., 1995). Many species are stenotopic, and their short life cycles and ability to colonize favourable regions quickly means that the insects are extremely sensitive to environmental change (Pinder, 1986). The head capsules of chironomid larvae are well preserved in lake sediments and have been used extensively as palaeoecological proxies (Brooks, 2006;Walker and Cwynar, 2006). Chironomid-based temperature inference models, derived from modern calibration data sets, have been applied across North America (reviewed in Walker and Cwynar, 2006), and Eurasia (reviewed in Brooks, 2006), and more recently the method has been applied in the Southern Hemisphere in Patagonia (Massaferro and Larocque, 2013;Massaferro et al., 2014), Central America (Wu et al., 2014), east Africa (Eggermont et al., 2010), and Australasia (Dimitriadis and Cranston, 2001;Woodward and Shulmeister, 2006).
Transfer functions make a number of underlying assumptions; in particular, the environmental variable to be reconstructed is an ecologically important determinant in the system, and environmental variables other than the one being reconstructed have a negligible effect on species assemblages (Juggins, 2013). Rarely are ecological systems as simple as transfer functions would imply, and violations of these assumptions will undermine the validity of the environmental reconstruction (Juggins, 2013). Nevertheless, despite known inherent problems associated with transfer functions (Huntley, 2012;Juggins, 2013;Velle et al., 2010), quantitative reconstructions from chironomid assemblages often produce consistent results that compare well with other proxy estimates of past temperature (Brooks, 2000;Brooks et al., 2012;Heiri et al., 2007). The best-performing inference models can reconstruct temperatures with errors of ca. 1 • C Eggermont et al., 2010;Heiri et al., 2003;Olander et al., 1999;Rees et al., 2008;Self et al., 2011), providing high-resolution insights into past changes in climate (Brooks and Langdon, 2014) and the validation of climate models (Heiri et al., 2014).

Holocene climate variability in tropical South America
The most notable feature of current South American climate is the annual migration of the Intertropical Convergence Zone (ITCZ), which affects rainfall patterns across the tropical Andes (Garreaud et al., 2009;Hastenrath, 2012). On Holocene timescales, however, there remain large uncertainties regarding the patterns and processes of climate change in the Andes with evidence for both rapid (ca. 100-1000 years) precipitation (Haug et al., 2001) and temperature (Thompson et al., 2006;Wanner et al., 2008) change. A further point to note is the spatial heterogeneity of Holocene climate variability in the tropical Andes (Baker and Fritz, 2015), particularly regarding precipitation. Ice core records from the Peruvian and Bolivian Andes since ca. 5400 cal yr BP suggest that the overall trend is towards a drier climate with high-amplitude fluctuations and periods of significant aridity. Precipitation reached a minimum during the period between 3800-2800 cal yr BP and the LIA (Haug et al., 2001;Thompson et al., 1986Thompson et al., , 1995. Speleothem records from the Central Andes of Peru contradict this, however, and indicate instead that from the 15th to 18th century precipitation was on average about 10 % higher than in the present day (Reuter et al., 2009). The mid-to late Holocene (ca. 6000 cal yr BP to present) is a period of cooling climate in South America. Pollen evidence suggests that montane vegetation replaced Andean forest taxa as the treeline was lowered with modern vegetation patterns becoming established by ca. 3000 cal yr BP (Markgraf, 1989). Long-term cooling in the late Holocene culminated in a minimum during the 17th and 18th centuries, coinciding with evidence for a precipitation minimum during the LIA in northern South America (Haug et al., 2001;Thompson et al., 1986Thompson et al., , 1995. Further south, Patagonian proxy records infer periods that were wet and cold enough to allow glacial advance (Meyer and Wagner, 2008). In the South American tropics, where the relationship between changes in temperature and precipitation are complex (Baker et al., 2001;Garreaud et al., 2009), more independent quantitative estimates of past temperature are needed in order to resolve climate patterns over the tropical Andes during the Holocene.

Aims
In this study, we have developed the first chironomid-based temperature calibration data set from the tropical Andes (0 to 17 • S). Surface sediment samples from 59 lakes along the eastern flank of the Andes to Amazonia are analysed. Two approaches are used to develop the inference model, the widely used weighted averaging method (Brooks and Birks, 2000) and a Bayesian approach (Holden et al., 2008) which has rarely been used before. The models are applied to fossil chironomid assemblages in a late-Holocene lake sediment record from Laguna Pindo, central Ecuador, to reconstruct mean annual temperature (MAT) changes over the past ca. 3000 years.

Modern calibration data set
Surface sediments were collected from 59 lakes across Bolivia (15 lakes), Peru (32 lakes), and Ecuador (12 lakes) between 2004 and 2013 over an altitudinal gradient from 150 m above sea level (a.s.l) to 4655 m a.s.l, between 0-17 • S and 64-78 • W (Fig. 1). The study sites cover an MAT of 25 • C; the coldest lake in the data set is 0.8 • C MAT and the warmest is 25 • C MAT (Table 1). The deepest lake is 25 m and the shallowest is 0.1 m; mean water depth of all the study sites is 5 m. Cold, high-elevation lakes are more common within the calibration data set and there are no lakes between 16 and 20 • C. Sediment samples used in this study were taken from the uppermost centimetre (0-1 cm), which represents the most recent deposits (approx. 5-20 years) (Frey, 1988) and is therefore most comparable with the available climate data for calibration.

Fossil chironomid record
Laguna Pindo is a small shallow lake on the eastern flank of the Ecuadorian Andes (1 • 27.132 S; 78 • 04.847 W) (Fig. 1). The site is located at an elevation of 1248 m a.s.l. MAT is ca. 20 • C with little seasonal variation, and mean annual pre- cipitation (MAP) can reach ca. 4000 mm per year (Hijmans et al., 2005). Currently the lake is not directly fed by a stream inflow and has no visible stream outflow; the lake receives water from surface run-off and direct precipitation. There are no obvious geomorphological causes for the escarpment of the lake and we hypothesize that it is tectonic in origin.
At the time of field work (January 2013), maximum water depth was ca. 1 m; the lake is heavily overgrown with aquatic macrophytes, making a detailed bathometric survey difficult. A sedimentary sequence 929 cm long was extracted using a cam-modified piston Livingston corer (Colinvaux et al., 1999) from the centre of the lake to minimize the chance of encountering a sedimentary gap caused by any periods of lake area reductions. Sediments were recovered in aluminium tubes and sealed on site before being transported to the UK and stored at ca. 4 • C. A total of six samples was analysed for 14 C radiocarbon using accelerator mass spectrometry (AMS) dating at the Scottish Universities Environmental Research Centre (SUERC) radiocarbon facility, East Kilbride (Table 2). An age-depth model was created using version 2.2 of the statistical package clam.R (Blaauw, 2010) and the Southern Hemisphere calibration curve SHCal13.14C (Hogg et al., 2013) (Fig. 2). The sampling interval for chironomid analysis was not uniform due to a varied sedimentation rate. To achieve as even a coverage as possible over the time interval, samples were taken between every 10 and 20 cm.

Chironomid analysis
Chironomid preparation and identification from both lake surface and core sediments followed standard methods as described by Brooks et al. (2007). The wet sediment was deflocculated in 10 % KOH for 2 min at 75 • C. The sediment was then washed through 212 and 90 µm sieves with water. Chironomids were picked from the residues in a Bogorov counting tray using a stereomicroscope at 25× magnification. Head capsules were mounted in Euparal, ventral side up, and identified to the highest possible taxonomic resolution under a compound light microscope at 200-400× magnifications with reference to Wiederholm (1983), Epler (2001), Rieradevall and Brooks (2001), and Brooks et al. (2007), and local taxonomic works including Prat et al. (2011), andTrivinho-Strixino (2011). Some taxa could not be formally identified and so were given informal names. Images and descriptions of informally named taxa are provided in Matthews-Bird et al. (2015).

Environmental variables
Environmental variables (depth, pH, conductivity, and water temperature) were measured at each lake in the field. Organic content of the sediment was established through loss on ignition following standard methods as described by . Climate data (MAT, MAP) were obtained from high-resolution, interpolated climate surfaces (Hijmans et al., 2005). A summary of all the environmental variables measured can be found in Table 1.

Exploratory statistics
Detrended correspondence analysis (DCA) was initially used as an indirect ordination method to assess the gradient lengths in compositional units of taxon turnover (Hill and Gauch, 1980). The gradient length of DCA axis 1 was 5.2 standard deviation units (SD), which suggests a unimodal response and that linear ordination methods were not appropriate (ter Braak, 1987). Canonical correspondence analysis (CCA) was used to explore the influence of the measured environmental variables on the distribution and abundance of taxa. Highly correlated variables were partialled-out by Figure 2. Sediment description, radiocarbon dates ( 14 C age) and age-depth models of Laguna Pindo. Key colour for sediment descriptions: black or dark brown -organic-rich sediments (peat and clay respectively); white -dark sandy intervals; green -greenish sandy clay, not compacted.
analysis of the variance of their regression coefficients indicated by their variance inflation factors (VIFs). Variables with high VIFs were systematically removed from the environmental variable data set until the remaining variables had VIFs below 20. Detrended canonical correspondence analysis (DCCA) was used to test how much of the variance in the assemblage data was explained by each individual explanatory variable. The ratio of λ 1 : λ 2 (i.e. the ratio of first constrained DCCA axis 1 and second unconstrained DCA axis 2) was used to assess the influence an explanatory variable has in describing the variance in the chironomid community assemblage and hence its predictive power (Juggins, 2013). All taxa were retained in the statistical analysis and rare taxa were down-weighted in the weighted average transfer function (down-weighting of rare species is implicit in the Bayesian approach). Multivariate analysis was carried out on square-root-transformed chironomid percentage data. Inference models were developed using two separate approaches. The first method relied on weighted averaging methods, a tried and tested technique well established in quantitative palaeoecology (Birks, 1998;Birks et al., 2012;ter Braak and Juggins, 1993;ter Braak and Looman, 1986). The second method uses a Bayesian approach, which in general has received less attention (Holden et al., 2008). There are a number of inherent problems associated with quantitative inference models (Huntley, 2012;Juggins, 2013;Velle et al., 2010), so the two independent methods were used to compare results and assess the strengths and weaknesses of each method.
The assemblage data were unimodal suggesting that transfer functions using weighted averaging partial least squares (WA-PLS) were appropriate (ter Braak and Juggins, 1993). Inference models were also developed using classical and inverse weighted averaging (WA) to compare performance. The optimal number of components was assessed using leave-one-out cross validation (jack knifing) and a minimum 5 % change in prediction error between components. Sample-specific errors for the inferred temperatures were obtained through bootstrapping 999 cycles.
Bayesian model selection was used to generate probability-weighted species response curves (SRCs) for each taxon in the calibration data set. Each taxon is assigned 8000 possible SRCs. Each of these SRCs has a probability weight based on its relative ability to describe the training data for that taxon. To perform a reconstruction, likelihood functions (temperature probability distributions) are derived from each taxon in a fossil sample, considering all 8000 SRCs. Combining the likelihood functions of all the taxa in the fossil sample derives the reconstruction. The power of the Bayesian approach is that it ascribes a probability distribution to the reconstruction, providing a reconstruction-specific uncertainty. An important benefit is that all taxa in the sample provide potentially useful information, even those with low counts that would be largely neglected in a weighted averaging approach. To illustrate, a few counts of a taxon with a narrow temperature tolerance may constrain the Bayesian reconstruction more than a very high count of a taxon with a broad tolerance.
Although the Bayesian model was developed for application to pH reconstructions from diatom assemblages, it is generally applicable whenever it is appropriate to assume a unimodal species response to an environmental gradient. The only modification required is the specification of appropriate priors. The a priori probability distribution for optimum temperature in the SRCs was assigned to be uniform in the range −4.2 to +30.8 • C (training set range ±5 • C). The a priori probability for SRC tolerance was assigned to be uniform in the range 2 to 10 • C. Other SRC priors were unchanged from those in Holden et al. (2008).
DCCA detrending by segments and non-linear rescaling and constrained by radiocarbon age was used to determine compositional turnover constrained within the stratigraphic sequence (Birks and Birks, 2008). The goodness of fit to temperature was evaluated by including the fossil chironomid samples passively in a CCA ordination space of the modern training set samples constrained by MAT. Fossil samples with a squared residual distance within the extreme 10 % of the modern calibration data set samples are considered as having a poor fit to temperature. The modern analogue technique was used to test whether fossil samples had good analogues within the modern calibration data set. Any fossil sample with a squared chord distance larger than the 95 % threshold of the calibration data set is considered to have no good modern analogues (Birks, 1998;Velle et al., Figure 3. Canonical correspondence analysis (CCA) of the calibration data set lakes and environmental variables with elevation, longitude, and latitude removed after variance inflation analysis. MAP: mean annual precipitation; MAT: mean annual temperature; WT: water temperature; LOI: loss on ignition. Grey circles denote calibration lakes; dark grey triangles mark species. 2005). Data were untransformed prior to analysing the dissimilarity using the modern analogue technique. The significance of the final reconstruction was tested by comparing the amount of variance in the fossil data explained by that reconstruction, compared with inferences produced by transfer functions trained on randomly generated environmental data (Telford and Birks, 2011a). In this case, 999 random environmental variables were generated in order to produce the null distribution.
In total, 55 chironomid taxa were identified in the 59 training set lakes. Chironomus anthracinus-type was the most widespread taxon, occurring over the entire temperature gradient (Fig. 4). Orthocladiinae are generally most abundant towards the cold end of the temperature gradient. Cricotopus/Paratrichocladius type III is the domi- nant taxon of the coldest lake and is not present in sites > 10 • C MAT. Figure 4 shows the weighted average and Bayesian optima and tolerance of each taxon ordered by lowest to highest optima as modelled in the weighted averaging approach. In general, the temperature optima predicted by each method are similar; however, Tanytarsus type II and Cricotopus/Paratrichocladius type VII have colder optima when modelled using a Bayesian approach. Cricotopus/Paratrichocladius type IV has the coldest temperature optimum (ca. 3.3 • C; Fig. 5). Few Chironominae were found at the cold end of the calibration data set, but, for example, Parachironomus and Tanytarsus type II were only found in lakes cooler than ca. 8 • C and had optima of ca. 7.5 and ca. 6.5 • C respectively. Paratanytarsus and Pseudosmittia are important components of the chironomid assemblage between 4 and 12 • C, forming > 50 % of the chironomid community in some lakes, and have optima of ca. 9.1 and 8.3 • C respectively. Tanytarsus type I, Micropsectra, and Einfeldia are dominant taxa at mid-temperatures between ca. 10 and 22 • C. The absence of lakes between ca. 16 and ca. 20 • C limits a complete understanding of the distribution of taxa occurring at these temperatures. DCCA analysis, constrained by MAT, indicates an assemblage shift across the temperature gradient of 2.2 SD units. The biggest change in assemblage composition occurs above 12 • C MAT (Fig. 4). Goeldichironomus, Cladotanytarsus, and Tanytarsus type III were only found in lakes with MAT warmer than ca. 22 • C. Tanypodinae were in greatest abundance at the warm end of the temperature gradient between ca. 10-26 • C, Procladius was the most common Tanypodinae. It occurred between ca. 10-26 • C and had an optimum of ca. 21 • C. Both methods (WA and Bayesian) produced similar performance statistics. The bestperforming model using conventional statistical methods was a WA (inverse) model (Table 4, Fig. 6) (R 2 jack = 0.890; RMSEP jack = 2.404( • C), RMSEP -root mean squared error of prediction; mean bias jack = −0.017( • C); max bias jack = 4.665( • C)). The Bayesian method produced a slightly higher-performing model with R 2 jack = 0.909, RMSEP jack = 2.373( • C), mean bias jack = 0.598( • C), and max bias jack = 3.158( • C).

Laguna Pindo fossil chironomids and dating
Chironomid remains were found only in the upper 416 cm of the 929 cm sequence of Laguna Pindo (Fig. 7). In total, 2489 individual chironomid head capsules were analysed. The entire assemblage was made up of 32 taxa in 26 genera and 4 subfamilies. Among the taxa identified, 17 were Chironomini, eight Orthocladiinae, and three Tanypodinae The best-fit age-depth model for Laguna Pindo was a smooth spline (Fig. 2). Due to the absence of chironomids at the bottom of the sequence, six radiocarbon samples were used for building the model with the total depth of the sediment considered being 461 cm ( Table 2). The sedimentation rate ranged between 0.03 and 0.5 cm yr −1 , with a sampling interval resolution of 97 years between samples on average (range from 27 to 196 years).

Palaeotemperature reconstruction
Both transfer functions (WA inverse and Bayesian) show similar patterns in the temperature reconstruction (Fig. 8). From 3000 to 2500 cal yr BP inferred temperatures are cold relative to the modern period (20.2 • C). The minimum WA inverse temperatures are much colder (13.5 • C ± 2.5) than the inferred Bayesian temperatures (17.5 • C ± 3.7) for the early section of the sequence. From 2400 to 1700 cal yr BP inferred temperatures from both methods oscillate around ca. 18-19 • C but remained depressed relative to the modern period. A notable feature of both reconstructions is the sudden drop in inferred temperatures at 1600 cal yr BP. Inferred temperatures fall by ca. 2 to 17.5 • C ± 2.7. This abrupt drop in temperature is short-lived in both reconstructions and temperatures return to previous values in the subsequent sample. From 1500 cal yr BP to the present, the chironomid-inferred temperatures stabilize and steadily rise. Peak temperatures for the entire record (21.9 • C ± 3.5) are inferred between 400 and 700 cal yr BP. Temperatures begin to cool from 400 cal yr BP in both reconstructions, reaching a minimum of ca. 17 • C ± 2.5 ca. 100 cal yr BP before rising rapidly to between 20-21 • C ± 2.5 in the most recent sediment sample. On average the Bayesian model infers warmer temperatures than the WA model.
The fossil samples of Laguna Pindo plot within the modern variation of chironomid assemblages when included passively in a CCA analysis of the calibration data set (Fig. 9). This suggests that the calibration data set is appropriate for the fossil sequence of Laguna Pindo. The fossil samples plot along the MAP gradient suggesting precipitation is an important variable controlling the variance in the fossil assemblages. The sites associated with high precipitation in the calibration data set are located in the same region of the Ecuadorian Andes as the fossil site. With a modern MAT of ca. 20 • C, however, Laguna Pindo is located in a region of the temperature gradient that is poorly covered in the calibration data set (Fig. 4). Seven taxa found in the Laguna Pindo sequence do not occur in any of the analysed calibration data set lakes. These include three unknown morphotypes, three Xestochironomus morphotypes, and Metriocnemus eurynotus-type. These taxa, however, never comprise more than 10 % of the chironomid assemblage of any one sample.
Fourteen of the fossil samples are considered to have a poor goodness of fit to temperature and all fossil samples are considered as having poor modern analogues in the calibration data set (Fig. 10). Although the modern analogue technique is not used to infer past temperatures the lack of modern analogues in the fossil assemblage is important when considering the reliability of any reconstruction.
DCCA constrained by radiocarbon age shows an abrupt change at 1475 cal yr BP between zones 3 and 4 and a turnover of 1.6 SD units over the whole sequence (Fig. 10). Much of the variation in goodness of fit and DCCA sample scores is mirrored by changes in count size and head capsule concentration. The sudden drop in head capsule concentration occurs at a step change in DCCA assemblage variation (1475 cal yr BP) (Fig. 10). Periods of increased count size and head capsule concentration in older sediments (2100-2250 cal yr BP) also coincide with periods of improved goodness of fit (Fig. 10). The WA inferred MAT values using the modern calibration data set explain more of the variance than 95 % of randomly generated variables, and so the WA MAT reconstructions can be deemed statistically significant (p = 0.032) (Fig. 11) (Telford and Birks, 2011a).

Chironomids and environmental variables
Chironomids have been shown to respond to temperature on a variety of spatial scales and taxonomic levels (Brooks, Figure 5. Weighted average and Bayesian optima (solid grey circles) and tolerances (thick lines) of the 55 chironomid taxa included in the calibration data set, and MAT range (dashed lines). Taxa are organized by WA temperature optima from cold to warm. Table 4. Summary of the performance statistics of chironomid-based MAT( • C) inference models developed using classical and Bayesian methods based on leave-one-out cross validation and allowing weighted averaging inverse and classical (WAinv, WAcla), with and without tolerance down-weighting (TOL), weighted averaging partial least squares (WA-PLS), coefficient of determinant between predicted and observed (r 2 jack ), and root mean squared error of prediction (RMSEP jack ) as percentage of the gradient. 2006; Eggermont and Heiri, 2011). Temperature is a key variable in controlling chironomid development at all stages of their life cycles and influences voltinism, behaviour, and metabolism (Armitage et al., 1995). Across the Northern Hemisphere, over large temperature gradients, mean July air temperature has been shown to be the major determinant of variation in chironomid assemblages (Brooks, 2006;Walker and Cwynar, 2006). July is the warmest month of the year, which reflects the developmental period of most species. As a result, many quantitative temperature inference models have been developed to reconstruct mean July air temperature. Across the tropics, however, seasonal variation is small and many chironomids are multivoltine (Walker and Mathews, 1987) so temperatures throughout the year are likely to be relatively more influential. In tropical east Africa, Eggermont et al. (2010) demonstrated that mean annual air temperature was a significant driver of chironomid assemblage composition and developed a chironomid-based inference model on this basis. Similarly, Wu et al. (2014) showed MAT to be the most important environmental variable when developing a chironomid inference model for Central America. When attempting to make quantitative inferences from fossil assemblages, it is first crucial to establish that the variable of interest is an important ecological determinant. The variable to be reconstructed must describe a statistically important component of the variance within the assemblage data (Juggins, 2013). Compared to other measured variables, mean annual temperature explained the largest amount of chirono-mid assemblage variance and had the highest eigenvalue ratio (λ 1 : λ 2 ) in the Andean calibration data set (Table 3). The explanatory strength of temperature in the calibration data set meets the minimum criterion proposed by Juggins (2013) (i.e. λ 1 : λ 2 > 1.0) for temperature being a suitable variable to reconstruct from this calibration data set.

Model
The DCCA results suggest that precipitation is also a strong ecological determinant (λ 1 : λ 2 = 0.9); the passive plot of fossil samples with calibration samples further supports this conclusion. The fossil samples of Laguna Pindo are strongly associated with MAP. Precipitation in Andean landscapes, however, is spatially heterogeneous and geographically close localities experience significantly different rainfall patterns (Garreaud et al., 2009). Lakes associated with high rainfall (Fig. 3) are actually in areas of the northern Andes with two rainy seasons a year. It is very likely that the bimodality of rainfall in these areas is as important in controlling chironomid populations as the total amount of rainfall as measured by MAP. Precipitation is also intrinsically linked to temperature as both temperature and precipitation increase with decreasing latitude in tropical South America (Garreaud et al., 2009). Unlike temperature, precipitation affects chironomids indirectly making any quantitative inference difficult. Precipitation will alter a suite of environmental variables (e.g. pH, conductivity, depth, substrate) making quantitative inferences of precipitation problematic. As chironomid life cycles are strongly controlled by temperature and many tropical chironomid species tend to be multivoltine, we suggest that the most appropriate variable both ecologically and statistically to reconstruct using the Andean calibration data sets is MAT although the of influence of precipitation cannot be overlooked.
The optima and temperature tolerances (Fig. 5) of many taxa found in the current study are similar to that noted in other Neotropical chironomid calibration data sets, further supporting the conclusion of temperature being an important ecological determinant. For example, Wu et al. (2014) in Central America found taxa of the genera Beardius, Labrundinia, and Goeldichironomus to have optima between 23-24 • C, whilst Limnophyes and Corynoneura were more abundant at the colder end of the gradient with optima of 15 and 18 • C respectively. In the current data set Beardius, Labrundinia, and Goeldichironomus all have optima between 23 and 24 • C, and Limnophyes and taxa of Corynoneura also have optima of 15 and 19 • C respectively. Limnophyes also has one of the broadest tolerances of all taxa in both calibration data sets, suggesting that the genus is probably represented by many species (Matthews-Bird et al., 2015). More work is needed in order to refine chironomid larval taxonomy in South America; however, the current data suggest the potential for a larger calibration data set applicable to a wider area, incorporating the northern Neotropics and Central America.

Model performance
Although both models (WA inverse and Bayesian) perform well (WA: RMSEP = 2.4 • C/9.6 % of training set range; Bayesian: RMSEP= 2.3 • C / 9.2% of training set range), some of the best-performing chironomid-based temperature inference models have prediction errors closer to 1.0 • C Heiri et al., 2007Heiri et al., , 2011Olander et al., 1999). The highest-performing chironomid inference models often have in excess of 100-150 calibration sites compared with just 59 in the current model, and this may account for its reduced performance. Furthermore, the lakes in the calibration data set are not evenly distributed over the temperature gradient. The cold end of the gradient has a higher number of lakes (34 cold, high-elevation lakes) than at warm and intermediate temperatures (15 warm, midto low-elevation lakes). Uneven sampling has been shown to lead to biases which may reduce RMSEP (Telford and Birks, 2011b). Furthermore, the over-representation of cold lakes in Figure 9. Distribution of Laguna Pindo fossil samples (black circles) included passively within a CCA of the calibration data set lakes (grey circles) constrained using the significant environmental variables. MAP: mean annual precipitation; MAT: mean annual temperature; WT: water temperature. The first and last fossil sample in the sedimentary sequence has been labelled (total sediment depth); there are no directional trends through time. Calibration lakes that lie at similar elevations as Laguna Pindo have been labelled.
the current data set may result in an underestimation of the temperature optima of some taxa and, therefore, bias temperature estimates towards cold values. In the Andean data set, as analysis of residuals shows, temperatures around 10 • C are often underestimated (Fig. 6). Furthermore, the inferred temperatures of Laguna Pindo are on average cooler than the modern-day conditions.
The absence of lakes in part of the temperature gradient may limit the reliability of estimates of optima and tolerances of taxa and also create "edge effects" in the middle of the temperature range, in addition to those that occur at the cold and warm end of the temperature gradient (Eggermont et al., 2010). Such problems are inherent to WA models as predicted values are pulled towards the mean of the training set resulting in under-and overestimations of high and low values (ter Braak and Juggins, 1993). However, despite having no lakes between 16-20 • C in the calibration data set, additional edge effects are not a feature of the current inference model. The gap of ca. 4 • C does not appear to have compromised model performance, probably as the interval is not significant and taxa have tolerances that span these temperatures.
Chironomid larval head capsule concentrations can vary significantly between lakes, due to differences in preservation or abundance. Low counts can have adverse effects on the performance of inference models and the reliability of quantitative environmental reconstructions when using conventional methods Quinlan and Smol, 2001). A minimum count size of 50 head capsules per sample is advised Quinlan and Smol, 2001); however, good model performance has been achieved even when several samples include as few as 15-30 head capsules (Massaferro et al., 2014). In some lakes in the current training set, head capsule concentrations were as low as two head capsules per gram of sediment. Fifteen lakes in the data set produced fewer than 50 head capsules, and three lakes had fewer than 30. On average 77 individuals were analysed from each lake, with a minimum count of 23 and a maximum of 164 (Table 1). Lakes with low head capsule counts were retained in the model in order to maintain as even a coverage as possible across the temperature gradient.
Polypedilum nubifer-type and Chironomus anthracinustype make up a large component of the chironomid assemblages in lakes across the entire temperature gradient (Fig. 4). Such eurythermic taxa probably include several different species. It is difficult to model reliable, or even meaningful, optima for eurythermic taxa. Poor model performance or unreliable reconstructions may result if the assemblage is dominated by eurythermic taxa. We note that eurythermic taxa are described by high-tolerance SRCs in the Bayesian approach, leading to increased uncertainty in reconstructions through broad-likelihood functions that contribute little information to the posterior. Inferred temperatures of ca. 10 • C are likely to be underestimated as many taxa found at these temperatures also occur in cold lakes, which are over-represented in the calibration data set. In African lakes, Eggermont et al. (2010) found that the presence of eurythermic taxa such as Chironomus type Kibos caused an overestimation of temperatures in lakes at the warm end of the gradient. They also found that the occurrence of Limnophyes minimus-type and Paraphaenocladius type OI Bolossat overestimated the temperature of lakes close to where gaps occurred in the gradient (Eggermont et al., 2010). Similarly, in a New Zealand calibration data set developed by Woodward and Shulmeister (2006), Chironomus was present in both high-elevation, cold, oligotrophic lakes and lower-elevation, warm, eutrophic lakes. The intermediate temperature optimum estimated for this taxon resulted in overestimated temperatures of cold lakes and underestimates of warm lakes (Woodward and Shulmeister, 2006). Eurythermic taxa may be contributing to the overestimation of cold temperatures and the underestimation of temperatures in the middle of the gradient in the Andean inference model.

WA vs. Bayesian
Despite similar performance statistics between the Bayesian and WA methods, the inferred pattern of late-Holocene temperature change is different. Temperatures inferred ca. 2700 cal yr BP (400 cm) (Fig. 8) using the WA inverse method are extremely cold (ca. 14 • C) compared with the rest of the record. This reconstruction is driven by the high abun- Figure 10. Left to right: chironomid-inferred WA inverse MAT with sample-specific errors generated using bootstrapping. Bayesian reconstruction with sample-specific errors. Goodness of fit of the fossil assemblages to temperature: vertical dotted line indicates the 90th percentile of squared residual distances of modern samples to first axis in a CCA; samples to the right of the line have a poor fit to temperature. Nearest modern analogue analysis: vertical dotted line indicates the 5th percentile of squared chord distances of the fossil samples in the modern calibration data set; samples to the right of the line have no good modern analogues. Detrended canonical correspondence analysis (DCCA) sample scores, with radiocarbon age used as the sole constraining variable. Head capsule concentration per gram of sediment. Zones are derived from optimal partitioning of fossil assemblages using a broken-stick model to define significant zones. Sq res dis: square residual distance; Sq chrd dis: square chord distance; SD units: standard deviation units; hc gram −1 : head capsule per gram of sediment. dance of Tanytarsus type II, a taxon that has a WA temperature optimum of 6.5 • C. The Bayesian reconstruction for this sample of 17.8 ± 2.8 • C is in line with more modest temperature shifts that would be expected in the late Holocene (Wanner et al., 2008). One advantage of the Bayesian methodology is the transparency of the reconstruction through the consideration of individual likelihood functions for this assemblage (Fig. 12). Although Tanytarsus type II is abundant in the sample, its influence in the reconstruction is moderated by several other taxa with higher temperature optima that are present at low abundances. This temperature estimate demonstrates that the Bayesian reconstruction can be sensitive to a few counts of a species that have a negligible effect in a WA approach. The likelihood function for Chironomini type II, which has an abundance of only 2.3 % in the sample, constrains the reconstruction more than Tanytarsus type II, which has an abundance of 74 %. This is because Chironomini type II is only found in the warmest lakes in the calibration set, each time with a low abundance. We note that because it is found in only three training set sites, Chironomini type II is associated with many (671) high-probability SRCs, defined as having a probability great that 10 % of the most likely SRC. For this reason, its likelihood function is relatively broad and extends to temperatures far lower than the temperature of the sites in which the taxon is found in the training set.

Temperature and secondary environmental variables
Whilst the λ1 / λ2 of 1.431 indicates that MAT is appropriate for reconstruction using this calibration data set (Juggins, 2013), it does not necessarily mean that reliable temperature reconstructions can be obtained from a fossil record (Telford and Birks, 2011a). Before attempting to interpret any reconstruction several metrics can be used to assess the validity of a reconstruction (Juggins and Telford, 2012). The modern analogue technique compares the similarity of the fossil samples to the modern samples in the calibration data set. All fossil samples are greater than the 5th percentile of the square chord distance (Fig. 10), which suggests there is no close modern analogue in the calibration set to any fossil sample (Birks, 1998;Juggins and Birks, 2001). The lack of modern analogues in the Laguna Pindo fossil sequence is due to the many taxa present in the fossil samples that are not present in the calibration data set. This may reflect the lack of lakes in the calibration data set with MAT values close to those of Laguna Pindo. Nevertheless, WA and WA-PLS models have been shown to perform well in non-analogue situations . The Bayesian method generates temperature reconstructions from likelihood functions of species in the calibration data set. Although analogous assemblages are not required for the Bayesian reconstruction (each taxon is treated equally and individually), species that are absent from the training set cannot contribute information to the posterior, thereby increasing the uncertainty associated with the reconstruction. One advantage of the Bayesian methodology is that this uncertainty is explicitly incorporated into the Bayesian reconstruction (Holden et al., 2008).
During periods of poor fit to temperature, variables other than temperature may have been affecting the composition of the chironomid assemblage. As noted previously, the CCA biplot of fossil samples included passively with the significant explanatory variables (Fig. 9) shows that MAP was also important in driving the assemblage variance. During times of poor fit to temperature the influence of precipitation as a secondary variable may be more important than temperature in influencing the chironomid assemblage composition. Indeed, precipitation has been shown to be an important variable in controlling the modern distribution of chironomid taxa in the tropical Andes (Matthews-Bird et al., 2015).
Samples with poor fit to temperature also corresponded to samples having low numbers of head capsules. The number of head capsules retrieved will directly affect how representative a sample is of the chironomid fauna (Heiri, 2004;Quinlan and Smol, 2001). The cold oscillations inferred from the Bayesian reconstruction are more in line with what is expected during the late Holocene (1-3 • C); the likelihood functions of rare species, which favour warm conditions, combine to rule out the anomalously cold temperatures suggested by some of the WA reconstructions. As discussed above, the over-representation of cold lakes in the calibration data set will likely bias species optima to colder values in a weighted average approach, so there may be a tendency for the model to underestimate temperature, especially during cold periods. This problem is likely exaggerated when head capsule concentration is low; cold indicator taxa may have higher abundances than would be the case if all taxa were accurately represented.
The DCCA results indicate that there was a distinct change in the composition of the chironomid assemblage after 1600 cal yr BP (210 cm). This largely coincides with an increase in head capsule concentration, possibly indicating an increase in lake productivity, and the shift in chironomid-inferred temperatures from low to high. Indeed post 1600 cal yr BP, (210 cm) samples are inferred as being on average 2-3 • C warmer than early sections using Bayesian and WA models respectively.
Although the temperature reconstruction has a good ecological basis, because chironomids globally are highly sensitive to temperature and Laguna Pindo is on an ecotonal boundary that is sensitive to temperature changes, precipitation is influential as a secondary variable. The WA inverse MAT reconstruction, however, is statistically significant based on the criteria described by Telford and Birks (2011a) (Fig. 11), suggesting that despite precipitation as a possible confounding variable, a temperature signal can be obtained from Neotropical chironomids. We would caution, however, against an over-interpretation at this stage. Due to some of the limitations discussed previously, the reconstruction can only currently be deemed qualitative and requires more re- Figure 12. Individual likelihood functions for the fossil taxa in the coldest sample of the Laguna Pindo sequence (396 cm total depth, ca. 2700 cal yr BP). The posterior probability distribution for temperature for the fossil sample is plotted in red; note that this is plotted on an independent axis. search before quantitative estimates can be generated with confidence.

Conclusions
The chironomid fauna of the tropical Andes have been shown to be sensitive to climate variables, particularly temperature and precipitation. Both variables (MAT and MAP) meet the basic criteria for being used in an environmental reconstruction using the Andean calibration data set. MAT, however, is an important determinant of chironomid species distribution and abundance and was therefore more appropriate to be reconstructed. The influence of precipitation should be explored further and must be considered as an important secondary variable, especially when reconstructing past conditions in the region. It is very likely that the influence of precipitation noted here relates to the annual variability in rainfall across the Andes as opposed to the overall amount, making any quantitative interpretations even more difficult.
The two techniques used to develop inference models (WA and Bayesian) show comparable performance statistics (WA inverse model: R 2 jack = 0.890, RMSEP jack = 2.404( • C), mean bias jack = −0.017( • C), max bias jack = 4.665( • C); Bayesian model: R 2 jack = 0.909, RMSEP jack = 2.373( • C), mean bias jack = 0.598( • C), max bias jack = 3.158( • C)). This work demonstrates a proof of method; however, a larger calibration data set with a more even coverage of calibration sites is needed in order to improve model performance. The Bayesian approach provided a transparent reconstruction less susceptible to the effect of an uneven distribution of calibration sites and performed particularly well during periods of low count size and when inferring cold intervals. The chironomid-based MAT reconstruction from the Laguna Pindo is often colder than would be expected for Holocene timescales. The underestimated temperatures are most likely the direct result of an over-representation of cold lakes in the calibration data set. The addition of more calibration sites between 12 and 20 • C would expand our understanding of tropical Andean chironomid distribution and significantly improve model performance and reconstruction reliability.
Knowledge of past tropical climate dynamics is fundamental not only to understanding regional climate but also global climate patterns and hemispherical teleconnections. Quantitative temperature proxies, such as chironomids, will provide valuable data on past climate variability in the region. The reconstructions presented here demonstrate the potential of the proxy and also highlight the complexity of late-Holocene climate change in tropical South America.

Information about the Supplement
The Supplement includes the tropical Andean calibration data set and the fossil counts of Laguna Pindo.
The Supplement related to this article is available online at doi:10.5194/cp-12-1263-2016-supplement.