Examining bias in pollen-based quantitative climate reconstructions induced by human impact on vegetation in China

Human impact is a well-known confounder in pollen-based quantitative climate reconstructions as most terrestrial ecosystems have been artificially affected to varying degrees. In this paper, we use a “human-induced” pollen dataset (H-set) and a corresponding “natural” pollen dataset (N-set) to establish pollen–climate calibration sets for temperate eastern China (TEC). The two calibration sets, taking a weighted averaging partial least squares (WA-PLS) approach, are used to reconstruct past climate variables from a fossil record, which is located at the margin of the East Asian summer monsoon in north-central China and covers the late glacial Holocene from 14.7 ka BP (thousands of years before AD 1950). Ordination results suggest that mean annual precipitation (Pann) is the main explanatory variable of both pollen composition and percentage distributions in both datasets. The Pann reconstructions, based on the two calibration sets, demonstrate consistently similar patterns and general trends, suggesting a relatively strong climate impact on the regional vegetation and pollen spectra. However, our results also indicate that the human impact may obscure climate signals derived from fossil pollen assemblages. In a test with modern climate and pollen data, the Pann influence on pollen distribution decreases in the H-set, while the human influence index (HII) rises. Moreover, the relatively strong human impact reduces woody pollen taxa abundances, particularly in the subhumid forested areas. Consequently, this shifts their model-inferred Pann optima to the arid end of the gradient compared to Pann tolerances in the natural dataset and further produces distinct deviations when the total tree pollen percentages are high (i.e. about 40 % for the Gonghai area) in the fossil sequence. 
In summary, the calibration set with human impact used in our experiment can produce a reliable general pattern of past climate, but the human impact on vegetation affects the pollen–climate relationship and biases the pollen-based climate reconstruction. The extent of human-induced bias may be rather small for the entire late glacial and early Holocene interval when we use a reference set called natural. Nevertheless, this potential bias should be kept in mind when conducting quantitative reconstructions, especially for the recent 2 or 3 millennia.

bias may appear in pollen-based quantitative climate reconstructions using Chinese pollen data.
In the past two decades, a number of modern pollen studies have been conducted in China to investigate regional pollen–vegetation–climate relationships (Herzschuh et al., 2010; Li et al., 2009; Lu et al., 2011; Luo et al., 2009; Shen et al., 2006; Xu et al., 2007; Zhang et al., 2012; Zheng et al., 2008) and human impact on vegetation (Ding et al., 2011; Liu et al., 2006; Pang et al., 2011; Wang et al., 2009; Yang et al., 2012; Zhang et al., 2014; Zhang et al., 2010). At the same time, representative modern reference datasets (Cao et al., 2014; Xu et al., 2010a; Yu et al., 2000; Zheng et al., 2008; Zheng et al., 2014) and fossil pollen datasets (Cao et al., 2013; Ren and Beug, 2002; Sun et al., 1999) have been assembled, which make it possible to the main determinant in the dataset; otherwise, the reconstruction of the variable should be conducted with caution (Juggins, 2013). HII was also analysed in the same way to evaluate the human impact on the pollen data.
The WA-PLS approach (ter Braak and Juggins, 1993) has been tested, along with other statistical techniques, for eastern China data and demonstrated to give better results (Cao et al., 2014; Xu et al., 2010a) due to its generally good performance under non-analogue situations and ability to cope with spatial autocorrelation (Cao et al., 2014; Juggins and Birks, 2012). The optimal number of WA-PLS components was selected using a randomisation t-test (van der Voet, 1994). Low root mean squared error of prediction (RMSEP), low average and maximum biases, a high coefficient of determination (R²) between the predicted and observed climate values, and a rule-of-thumb threshold of 5% (reduction in RMSEP for adding a component) were all considered when selecting a model (Birks, 1998; Birks et al., 2010; Juggins and Birks, 2012).
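The RMSEP, R² and 5% rule-of-thumb criteria can be illustrated with a minimal sketch. It does not reproduce the WA-PLS fitting or the randomisation t-test; the function names and the RMSEP values in the usage note are hypothetical and serve only to show how the threshold operates.

```python
import math


def rmsep(observed, predicted):
    """Root mean squared error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def select_components(rmsep_by_component, min_gain=0.05):
    """Accept an extra component only while it lowers RMSEP by >= min_gain.

    rmsep_by_component[0] is the RMSEP of the 1-component model,
    rmsep_by_component[1] that of the 2-component model, and so on.
    """
    chosen = 1
    for k in range(1, len(rmsep_by_component)):
        previous = rmsep_by_component[k - 1]
        current = rmsep_by_component[k]
        if (previous - current) / previous >= min_gain:
            chosen = k + 1
        else:
            break  # stop at the first component that does not pay off
    return chosen
```

With hypothetical cross-validated RMSEP values of 100, 90 and 89 mm for one, two and three components, `select_components([100.0, 90.0, 89.0])` returns 2: the second component reduces RMSEP by 10%, the third by only about 1%.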
The significance of the obtained reconstructions was also tested. The proportion of variance in the fossil sequence explained by 999 transfer functions trained with random data was calculated from a constrained ordination (Telford and Birks, 2011).
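Only the final comparison step of this test is sketched here, assuming the variance explained by the real reconstruction and by each random-data reconstruction has already been obtained from the constrained ordination. The function name and its inputs are hypothetical.

```python
def reconstruction_p_value(observed_variance, null_variances):
    """Share of random-data reconstructions that explain at least as much
    variance in the fossil sequence as the real reconstruction."""
    as_good = sum(1 for v in null_variances if v >= observed_variance)
    return as_good / len(null_variances)
```

Under this reading, a reconstruction would be regarded as significant at the 5% level when fewer than 5% of the 999 random-data reconstructions explain as much variance as the real one.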
To help understand the bias mechanism of human impact on pollen assemblages, we estimated the weighted average (WA) optima and tolerances (Birks et al., 1990; ter Braak and Looman, 1986) of selected climate variables for major taxa. The five closest modern analogues for each fossil sample were calculated using MAT (Simpson, 2007). The mean HII value at the analogue location site was used to examine the potential human influence on analogue samples, and further, to evaluate the bias in climate reconstruction for that fossil sample. All numerical analyses were performed using vegan version 2.
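Both steps can be sketched in a few lines. The WA optimum of a taxon is its abundance-weighted mean of the climate variable, and the tolerance is the corresponding weighted standard deviation. For the analogue step, the squared chord distance is an assumption, since the dissimilarity measure is not stated here; all function names and example values are hypothetical.

```python
import math


def wa_optimum_tolerance(abundances, climate):
    """Weighted-average optimum and tolerance of one taxon.

    abundances: abundance of the taxon in each modern sample
    climate:    climate value (e.g. Pann) at each modern sample
    """
    total = sum(abundances)
    optimum = sum(y * x for y, x in zip(abundances, climate)) / total
    variance = sum(y * (x - optimum) ** 2 for y, x in zip(abundances, climate)) / total
    return optimum, math.sqrt(variance)


def squared_chord_distance(p, q):
    """Assumed dissimilarity between two samples given as pollen proportions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def mean_hii_of_analogues(fossil, modern_samples, modern_hii, k=5):
    """Mean HII at the k modern samples most similar to a fossil sample."""
    ranked = sorted(
        range(len(modern_samples)),
        key=lambda i: squared_chord_distance(fossil, modern_samples[i]),
    )
    closest = ranked[:k]
    return sum(modern_hii[i] for i in closest) / len(closest)
```

As a hypothetical illustration of the bias mechanism, a tree taxon with abundances of 10% and 30% at sites with Pann of 400 and 800 mm has a WA optimum of 700 mm. If human impact lowers its abundance at the wetter site to 10%, the optimum falls to 600 mm, that is, towards the arid end of the gradient.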

Relationship between modern pollen and climate
Ordinations are based on square-root transformed pollen data of 99 taxa in the N-set and 93 taxa in the H-set after noise reduction. DCA showed that the length of the first axis is 2.65 SD (standard deviation units) in the N-set and 2.36 SD in the H-set, suggesting that linear ordination techniques (e.g. RDA) are appropriate to present the distribution of pollen taxa along the climate gradients in our datasets. When using each of the climatic variables as a sole predictor, Pann explains 20.56% (highest) of the pollen assemblage variance in the N-set, while the thermal variables have much lower explanatory power (Tann: 2.83%, Mtco: 3.49%, Mtwa: 6.35%). For the H-set, Pann explains 6.31%, which is slightly less than Mtwa (6.62%). If we assess the marginal contribution of a variable after partialling out the interaction effect of other variables in an RDA, Pann explains the highest amount of variance in both the N-set (10.56%) and the H-set (5.85%). HII explains more variance in the H-set (2.29%) than in the N-set (1.12%), and has a marginal contribution in both the H-set (0.55%) and the N-set (0.76%). Pann has the highest λ1/λ2 ratio in both the N-set (1.28) and the H-set (0.34); the λ1/λ2 ratios for all thermal variables and HII are much less than one (Table 1). Our ordination results suggest Pann is the main determinant of pollen distribution in TEC, and Pann in the N-set is used to establish a standard calibration set. We then use the H-set to establish a contrasting pollen-Pann calibration set to compare the deviation in the reconstructions, and to see the extent of the potential bias induced from human impact on the modern pollen assemblages.
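The "sole predictor" figures can be understood through a minimal sketch of an RDA with a single constraining variable. After square-root transformation and centring, each taxon is regressed on the predictor, and the explained share is the summed fitted variance divided by the total variance. This is an unscaled, single-constraint simplification written for illustration, not the vegan routine used for the analyses; the function name is hypothetical.

```python
import math


def rda_explained_share(pollen, predictor):
    """Share of pollen variance explained by one predictor (single-constraint RDA).

    pollen:    list of samples, each a list of taxon percentages
    predictor: one climate (or HII) value per sample
    """
    n = len(pollen)
    n_taxa = len(pollen[0])
    # Square-root transform, as applied to the pollen data before ordination.
    transformed = [[math.sqrt(v) for v in row] for row in pollen]

    x_mean = sum(predictor) / n
    x = [v - x_mean for v in predictor]
    xx = sum(v * v for v in x)

    constrained = 0.0  # variance of the values fitted by the predictor
    total = 0.0        # total variance of the transformed pollen data
    for k in range(n_taxa):
        column = [row[k] for row in transformed]
        y_mean = sum(column) / n
        y = [v - y_mean for v in column]
        xy = sum(a * b for a, b in zip(x, y))
        constrained += xy * xy / xx
        total += sum(v * v for v in y)
    return constrained / total
```

A value such as 0.2056 from this function would correspond to the statement that Pann explains 20.56% of the pollen assemblage variance as a sole predictor. The marginal contributions and λ1/λ2 ratios additionally require partialling out the other variables and extracting eigenvalues, which this sketch does not attempt.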

Test of the WA-PLS models
A 2-component WA-PLS model performed best with the lowest RMSEP and highest R² for the H-set, and a 3-component model for the N-set (Table 2).

Figure caption (fragment): the mean HII value of the five best analogues in the N-set (blue, with lower side standard deviation) and the H-set (red, with higher side standard deviation) for fossil samples. Six time windows (TWs), delineated according to the deviation pattern between the two reconstructions, are separated by grey dashed lines.