A Bayesian Hierarchical Model for Reconstructing Sea Levels: From Raw Data to Rates of Change

We present a holistic Bayesian hierarchical model for reconstructing the continuous and dynamic evolution of relative sea-level (RSL) change with fully quantified uncertainty. The reconstruction is produced from biological (foraminifera) and geochemical ({\delta}13C) sea-level indicators preserved in dated cores of salt-marsh sediment. Our model is comprised of three modules: (1) A Bayesian transfer function for the calibration of foraminifera into tidal elevation, which is flexible enough to formally accommodate additional proxies (in this case bulk-sediment {\delta}13C values); (2) A chronology developed from an existing Bchron age-depth model, and (3) An existing errors-in-variables integrated Gaussian process (EIV-IGP) model for estimating rates of sea-level change. We illustrate our approach using a case study of Common Era sea-level variability from New Jersey, U.S.A. We develop a new Bayesian transfer function (B-TF), with and without the {\delta}13C proxy and compare our results to those from a widely-used weighted-averaging transfer function (WA-TF). The formal incorporation of a second proxy into the B-TF model results in smaller vertical uncertainties and improved accuracy for reconstructed RSL. The vertical uncertainty from the multi-proxy B-TF is ~28% smaller on average compared to the WA-TF. When evaluated against historic tide-gauge measurements, the multi-proxy B-TF most accurately reconstructs the RSL changes observed in the instrumental record (MSE = 0.003). The holistic model provides a single, unifying framework for reconstructing and analysing sea level through time. This approach is suitable for reconstructing other paleoenvironmental variables using biological proxies.


Introduction
Paleoenvironmental reconstructions describe Earth's response to past climate changes and consequently offer a context for current trends and analogs for anticipated future changes (e.g., Mann et al., 2009). Reasoning by analogy underpins the use of biological proxies to reconstruct past environments (e.g., Rymer, 1978;Jackson and Williams, 2004;Bradley, 2015). The ecological preferences of biological assemblages observed in modern environments are used to derive a paleoenvironmental reconstruction from their counterparts preserved in dated sediment cores under the assumption that the ecological preferences were unchanged through time (Juggins and Birks, 2012). This approach commonly utilizes data consisting of one environmental variable and counts from multiple proxy species (e.g., Imbrie and Kipp, 1971;Fritz et al., 1991;Birks, 1995). Numerical techniques known as transfer functions formalize the relationship between biological assemblages and the environmental variable. This process is termed calibration. To quantify environmental change through time it is necessary to combine the paleoenvironmental reconstruction with a chronology of sediment deposition and an appropriate methodology to describe temporal trends. These three components can be developed and applied independently of one another or assimilated in a single, holistic framework.
Relative sea-level (RSL) reconstructions can constrain the relationship between temperature and sea level and reveal the long-term, equilibrium response of ice sheets to climate forcing (e.g., Dutton et al., 2015). Salt-marsh foraminifera are sea-level proxies, because species have different ecological preferences for the frequency and duration of tidal submergence, which is primarily a function of tidal elevation (e.g., Scott and Medioli, 1978;Edwards and Wright, 2015). Under conditions of RSL rise, salt marshes accumulate sediment to maintain an elevation in the tidal frame. The resulting sedimentary sequence is an archive of past RSL changes that may be accessed by collecting sediment cores. After extraction, these sediment cores are sliced into layers (samples), from which foraminifera are counted. The transfer functions commonly used to reconstruct RSL impose a single ecological response to tidal elevation on all species of foraminifera (or other biological groups such as diatoms). Other analyses performed on the same layers can provide a multi-proxy approach to reconstructing RSL, although this often relies on informal approaches to combine results from independent proxies (e.g., Kemp et al., 2013a;Gehrels, 2000). For example, on organogenic salt marshes on the U.S. Atlantic coast the primary source of organic carbon is in-situ plant material and measurements of bulk sediment δ 13 C reflect the dominant plant community (e.g. Kemp et al., 2012). Some sediment layers are dated using radiocarbon or recognition of pollution markers of known age. Since there are typically fewer dated layers than total layers, a statistical age-depth model is used to estimate the age of undated layers with uncertainty (e.g., Bronk Ramsey, 2008;Haslett and Parnell, 2008;Blaauw and Christen, 2011). Although Bayesian age-depth models and methods for estimating rates of sea-level change already exist, Bayesian methods are yet to be applied in the calibration phase of reconstructing RSL. This prevents the appropriate propagation of uncertainties, which is the primary advantage of using a holistic numerical framework.
We develop a Bayesian transfer function (B-TF) to reconstruct RSL using counts of foraminifera and measurements of bulk sediment δ 13 C from salt-marsh sediment. This model allows each species of foraminifera to have a different ecological response to tidal elevation and provides a formalized approach to combine multiple proxies and consequently reduce reconstruction uncertainty. Following the framework of  we combine this new calibration module with an existing chronology module (Bchron), and an existing process module (the Errors-In Variables Integrated Gaussian Process (EIV-IGP) model of Cahill et al., 2015) to create a holistic Bayesian hierarchical model. Through application of the model to a case study of Common Era and instrumental RSL change in New Jersey (USA), we compare the utility of the B-TF with an existing weighted averaging transfer function (WA-TF) approach and demonstrate the advantage of combining the three parts of a RSL reconstruction in a single and shared numerical framework rather than treating each as an independent and discrete step.

Previous calibration methods
Transfer functions are empirically-derived equations for reconstructing past environmental conditions from the abundance of multiple species. The term refers not to a single numerical method, but to a range of regression-based techniques that are classified into two categories depending on whether the underlying model maps environmental variables to species abundances (classical calibration) or vice versa (inverse calibration). Classical approaches are underpinned by the ecologically-intuitive assumption that the distribution of species is driven by environmental variables (Birks, 2012). Inverse approaches gained popularity because of their reduced computational complexity (e.g., Birks, 2010) resulting in quicker processing compared to classical methods.
Furthermore, inverse methods often demonstrate equal or superior performance when compared to classical approaches (e.g., ter Braak and Juggins, 1993;Korsman and Birks, 1996). The parameters in transfer functions are estimated using empirical data (a modern training set) from environments likely to be analogous to those encountered in core material (e.g., Juggins and Birks, 2012) and are treated as fixed and known. Studies seeking to reconstruct RSL from salt-marsh sediment employ transfer functions developed using a modern training set of paired observations of tidal elevation and microfossil assemblages (most commonly foraminifera or diatoms) to reconstruct RSL from their counterparts preserved in sediment cores (e.g., Gehrels, 2000;Kemp et al., 2013b;. Although the different types of transfer function have advantages and weaknesses compared to one another, these regression-based techniques share the limitations of applying a single response form to all species and treating model parameters as fixed and known. These characteristics can result in misleading or inaccurate paleoenvironmental reconstructions if the response curve is not appropriate for all species (Smith, 1983) and does not account for the inherent uncertainty in model parameters that results from ecological noise and the influence of secondary environmental variables, which in RSL reconstructions can include salinity and sediment texture and composition (e.g., Shennan et al., 1996;Zong and Horton, 1999).
Bayesian calibration methods are inherently classical and have recently been given growing attention to produce paleoenvironmental reconstructions using biological proxies (e.g., Vasko et al., 2000;Haslett et al., 2006;Li et al., 2010;Tingley et al., 2012;Tolwinski-Ward et al., 2013.  and Vasko et al. (2000) developed a Bayesian model to reconstruct temperature from chironomid counts. Haslett et al. (2006) adopted elements of the model proposed by  in a more complex Bayesian hierarchical model for reconstructing multivariate climate histories from pollen counts. Li et al. (2010) proposed a Bayesian hierarchical model to reconstruct temperature using a multi-proxy approach. Similarly, Tingley et al. (2012) considered a Bayesian hierarchical space-time model for inferring climate processes. More recently, Tolwinski-Ward et al. (2013 and  expanded on the aforementioned approaches of Haslett et al. (2006) and Tingley et al. (2012) for reconstructing climate variables. To date, Bayesian methods have not been used for reconstructing RSL using biological proxies.

A Bayesian hierarchical model for reconstructing and analysing former sea levels
We now describe our statistical model, which produces estimates of RSL and associated rates from raw inputs including foraminifera counts and radiocarbon dates from a sediment core. We add two major novelties to existing approaches: 1. A B-TF model using a penalized spline (P-spline) as a non-parametric model of the multinomial response of foraminifera to tidal elevation. This model allows for multi-modal and non-Gaussian species response to environmental variables; 2. A full hierarchical model which incorporates the B-TF, a chronology model accounting for time uncertainty, and a rich stochastic process for quantifying sea level rate changes.
We use the JAGS package (Just Another Gibbs Sampler; Plummer, 2003) to fit the model via Gibbs sampling.
We start by outlining our notation: • y are the observed foraminifera abundances from the sediment core. y il is the abundance of species l in layer i. We denote y i the L-vector of foraminifera counts for each layer i in the sediment core, where i = 1, . . . , N layers and l = 1, . . . , L species; • r are the observed radiocarbon dates in the sediment core. r k is the k th radiocarbon date, k = 1, . . . , K. Usually K N. Due to the nature of radiocarbon, these are given in radiocarbon years rather than calendar years. A known calibration curve is used to transform the radiocarbon ages into calendar ages as part of the chronology model (Sect. 3.2); • d are the observed depths in the sediment core. d i is the depth associated with layer i; • e is paleo marsh elevation (PME), which is the tidal elevation at which a layer originally accumulated. e i is the PME for sediment core layer i; • s is RSL. s has a deterministic relationship with e and d given some fixed parameters ω so that s = g ω (e, d). Producing s will require correcting PME for sample tidal elevation (a function of sediment core depth). ω includes values for the the sample tidal elevation (E) so that s i = E i − PME i . s i is the RSL for sediment core layer i; • t represents the calendar ages (in years before present (1950); BP) of all layers in the sediment core. It is unknown and estimated with uncertainty as part of the chronology module from the radiocarbon dates r and observed depths d. t i represents the age of sediment core layer i; • y m are the observed modern foraminifera counts. y m jl is the abundance of species l in surface sample j. y m j is an L-vector of modern foraminifera counts for modern sample j with j = 1, . . . , J modern samples. T j = ∑ L l=1 y m jl are the row totals of species counts for calibration sample j in the matrix of species abundances; • e m are the observed modern tidal elevations. e m j is the tidal elevation for surface sample j. Together y m and e m are used to calibrate the relationship between foraminifera abundance and tidal elevation; • z is the sediment core δ 13 C where z i is the δ 13 C for layer i. We include this as a secondary proxy though it is an optional part of the model and can be removed if unavailable in other sediment cores; • θ are a set of parameters governing the relationship between foraminifera counts and tidal elevation; • ψ are a set of parameters governing the sedimentation process (i.e. linking age and depth); • φ are a set of parameters governing the RSL process, including its smoothness and variability; • α are a set of parameters governing the relationship between δ 13 C and tidal elevation.
Using the notation above we create a Bayesian hierarchical model to produce a posterior distribution of our parameters given data: Before describing the components of the model that we use, we note that this is an extremely complex and computationally demanding model to fit, being of very high dimension with rich stochastic processes being required for many of the sub-models. We follow  in making some simplifying assumptions. We first assume that the calibration parameters θ can be learnt solely from the modern calibration data y m and e m . Thus the sediment core data contains no further information about this relationship. This is a common assumption in many palaeoclimate studies (see e.g. Haslett et al., 2006;Tolwinski-Ward, 2015). Second we assume that the model can be modularised into three parts: the aforementioned calibration, chronology and process modules. This is a conservative assumption and follows from the restriction on the calibration parameters.
Following these assumptions we obtain the three modules: We note that if there is no additional δ 13 C proxy information then z and α (and hence the last two terms on the RHS of the process module) are removed from the equation.

The calibration module: multinomial P-splines (B-TF)
In this module we aim to estimate the parameters θ that govern the relationship between forami-nifera and tidal elevation by using the model as specified in the previous section. The probability density function (pdf) p(y m j |e m j , θ) used as the likelihood here provides the data-generating mechanism from which foraminifera abundances can be simulated given tidal elevation. The likelihood we use for the modern model is: where p j = p j1 , p j2 , ....p jl is the vector containing the probability of finding species l at the tidal elevation associated with sample j.
The probability vectors p j are estimated from a latent response λ jl (i.e. the response of species l for sample j) which is a function of tidal elevation e m j . λ l is a J-vector including the latent response of species l for all samples j. The relationship between probability of foraminifera species occurrence and tidal elevation is expected to be non-linear so we model these using P-splines (De Boor, 1978;Dierckx, 1993) via a softmax transformation. The softmax transformation is given as: The λ parameters are given P-spline prior distributions. P-splines are created from B-spline basis functions penalised to produce a smooth curve. The B-spline basis functions are constructed from piecewise polynomial functions that are differentiable to a given degree q, here cubic. The component cubic B-spline basis functions look like individual Gaussian curves, however, they will be non-zero only over the range of q + 2 knots; this has numerous computational advantages. We refer to the B-spline matrix as B. The columns of B are the tidal elevations e m , transformed by the appropriate basis function. The resulting relationship is: B is a J × M matrix of basis functions where M is the number of knots, and J is the number of modern samples. To obtain the penalised smooth behaviour for λ we apply a prior such that the first differences of β l are normally distributed with mean 0 and precision τ β . The parameter τ β controls how close the weights are related to each other and will therefore control smoothness.
An error term, l ∼ N(0, ν l ), is added to the mean for λ here to ensure that we do not encounter problems with over-dispersion by under or over-estimating the variance in the observed data.We do not assume a constant variance; to account for the changing variation in the data the precision parameters ν l are also estimated using P-splines. This allows the variance to adapt given the data and will allow it to increase/decrease where necessary.
Similarly to λ m , the basis functions are penalised by parameters γ l to produce ν and we apply a prior such that the first differences of γ l are normally distributed with mean 0 and precision τ γ . Therefore, the calibration model has parameters θ = β l , γ l , λ, τ γ , τ β ; l = 1, . . . , L , which can be fitted in a single Bayesian model for all species simultaneously.
The B-TF produces posterior estimates for the multinomial probability vector p for each modern sample. For each species of foraminifera, we compare the probability of species occurrence (at each modern observed tidal elevation) estimated from the B-TF with the empirical probability of foraminifera species occurrence estimated from the observed data. The model vs. empirical probability comparison provides evidence to support the validity of the model, indicating if the model is capable of capturing the within-species variability of occurrence probabilities across changing tidal-elevations. Once run, the B-TF can produce predictions of elevation for each layer in the sediment core from this relationship.
We evaluate the performance of the B-TF via 10-fold cross validation on the modern data, where the data are divided up into 10 randomly drawn equal size sections (known as folds) which are removed in turn. We create predictions for the left out sections repeatedly until every observation has an out-of-sample prediction value.To allow direct and meaningful comparison between models we also cross validated the WA-TF using the same approach on the same randomly drawn folds. We showcase the output of this exercise for our case study in Sect 5.1.2.

The chronology module: Bchron
The chronology module is concerned with estimating the ages t of the foraminifera in the sediment core. These ages will necessarily be uncertain, since the radiocarbon dates r are observed with uncertainty which, when transformed into calendar years, provide highly non-Gaussian probability distributions. An interpolation step is then required to obtain estimated ages at all depths, which adds further uncertainty. A useful constraint is that age must increase with depth (older sediments lie deeper in the core, known as superposition) so a monotonic stochastic process is used. Bchron (Haslett and Parnell, 2008) assumes that the integrated sedimentation rate (i.e. the accumulation of sediment over a fixed period of time) arises as the realisation of a Compound Poisson-Gamma (CPG) process. Bchron calibrates the radiocarbon (and non-radiocarbon) dates, estimates the parameters of the CPG (here ψ) and identifies outliers. Other age-depth models are available (see Parnell et al. 2011 for a review), but Bchron was designed specifically for use in palaeoenvironmental reconstructions.
Once Bchron has been run, we obtain a joint posterior distribution of ages for every layer in the sediment core, which we denote as p (t|r, d, ψ). Each individual chronology sample from Bchron satisfies the law of superposition. However, we approximate the age of each layer in the posterior, i.e. p(t i |r, d, ψ) as a normal distribution, so that t i |r, d, ψ∼N(µ t i , σ 2 t i ). This may seem like a severe relaxation, since the ages of layers may now overlap, but we find this has minimal effect on the resulting sea level curves since the ages are further updated during the process module. Further simulations justifying this assumption have been carried out using chronological models in late Holocene sea level reconstructions from saltmarsh sediments (Parnell and Gehrels, 2015).

The process module: errors-in-variables integrated Gaussian process (EIV-IGP)
Our final step is to take the output from the previous two modules, namely estimates of the posterior PME e i for each sediment core layer from the calibration module, and estimates of the age of each layer t i from the chronology module. In cases where the secondary δ 13 C proxy is available, the posterior estimated for e i will include the likelihood p(z|e, α). This is a normal likelihood z i ∼ N(µ i , τ z ) where the precision τ z is constant and µ i will correspond to the dominant δ 13 C value at e i . δ 13 C reflect dominant plant communities on a marsh and the observed modern boundries between communities can correspond to a tidal datum (TD). As a result δ 13 C measured in bulk sediment can be related to tidal elevation as follows; µ i 's are given informative uniform priors with upper and lower limits corresponding to the maximum and minimum δ 13 C values represented in a given elevational range. The prior information required here is location specific. The details needed for priors related to our case study are presented in Section 4.2.
We can transform e i into RSL s i via the relationship s i = g ω (e i , d i ). We thus have a set of bivariate probability distributions for each layer consisting of pairs (t i , s i ) which represent the raw layer-by-layer estimates of RSL and age. To use these in the EIV-IGP framework of Cahill et al. (2015), we approximate each bivariate probability distribution as bivariate Gaussian. The model makes use of two well known statistical approaches. Firstly, the EIV approach (Dey et al., 2000) accounts for measurement error in the explanatory variable, here time. The EIV approach is necessary when dealing with proxy reconstructions that include temporal uncertainty from dating the sediment core. Secondly, the Gaussian process approach (Rasmussen, 2006) is useful for nonlinear regression problems and is a practical approach to modelling time series data. A Gaussian process is fully specified by a mean function (set to zero) and a covariance function that relates the observations to one another.
We use an integrated Gaussian process approach (Holsclaw et al., 2013;Cahill et al., 2015). A Gaussian process prior is placed on the rates of sea-level change and the mean of the distribution assumed for the observed data is derived from the integral of the rate process. This integrated approach is useful when there is interest in the rate process as the analysis allows for estimates of instantaneous rates of sea-level change. Furthermore the current sea level estimate is derived as the integral of all the previous sea level rates that have occurred, matching the physical behaviour of sea level evolution over time. By embedding the integrated Gaussian process (IGP) model in an errors-in-variables (EIV) framework (which takes account of time uncertainty), we can estimate rates with quantified uncertainty. We use the same priors for the parameters φ as described in Cahill et al. (2015) where technical details of the IGP-EIV model can be found.

Case study: New Jersey RSL
On the Atlantic coast of southern New Jersey (Figure 1), salt marshes form in quiet-water, depositional settings and display a zonation of plants into distinct vertical zones corresponding to ecologically important tide levels. Elevations below mean tide level (MTL) are not vegetated and the inorganic sediment is comprised of silt and fine sand with shell material. Low salt-marsh environments between MTL and mean high water (MHW) are vegetated by Spartina alterniflora (tall form), which is a C 4 plant. Sediment in this zone is organic grey silt and clay. High salt-marsh environments exist between MHW and highest astronomical tide (HAT). This zone is typically a wide, flat meadow vegetated by Spartina patens and Distichlis spicata (C 4 species). The sediment deposited in this zone is brown peat with abundant plant remains. The transition between high salt marsh and the freshwater upland is vegetated by C 3 plants such as Phragmites australis, Iva fructescens, Schoeneplectus americanus, and Typha augusitfolia. This community exists at tidal elevations above mean higher high water (MHHW), including freshwater environments above the reach of tidal influence, and occurs with black, amorphous, organic sediment.

Modern training set
At twelve salt marshes in southern New Jersey Kemp et al. (2013a) established transects across the prevailing environmental gradient from lower to higher tidal elevations ( Figure 1). The twelve sites were selected to span a wide range of physiographic settings including brackish marshes located up to 25 km from the coast with a strong fluvial influence. The sites share a common climate and oceanographic regime and therefore constitute a regional-scale training set. At stations along each transect a surface sediment sample was collected to describe the assemblage of foraminifera (count sizes ranged from 8 to 307 dead individuals). The tidal elevation of each sample was measured in the field.
Since the great diurnal tidal range (MLLW to MHHW) varies among sites in the study region it is necessary to express tidal elevation as a standardized water level index (SWLI; e.g. , where a value of 0 corresponds to MLLW and 100 is MHHW. At NOAA tide gauges in New Jersey, measured HAT occurs at SWLI values of 127 in Atlantic City and 123 at Cape May.

Modern counts of foraminifera
The modern dataset comprised of 172 paired observations of 18 foraminiferal species (including many zeros) and tidal elevation. The highest occurrence of foraminifera in the modern dataset is 141.5 SWLI. Higher samples were devoid of foraminifera and interpreted as being from a freshwater environment above marine influence. This modern training set demonstrates that foraminifera (like plants) form distinct assemblages that correspond to elevation in the tidal frame (e.g., Scott and Medioli, 1978), but with a secondary influence of salinity (e.g., de Rijk, 1995). Throughout southern New Jersey, low-marsh environments are occupied by Miliammina fusca and Ammobaculties spp. (groups D and E in Figure 2). High salt-marsh environments are characterized by a number of foraminiferal assemblages including groups dominated by Trochammina inflata, Arenoparella mexicana, and Tiphotrocha comprimata. High salt marshes at sites with strong fluvial influence and correspondingly low (brackish) salinity are occupied by Ammoastuta inepta (group G in Figure 2). At some sites elevations above MHHW are characterized by a group of foraminifera in which Haplophragmoides manilaensis is the dominant species (group A in Figure 2). The pattern (uniform low marsh and diverse high marsh) and composition of these assemblages is similar to those identified elsewhere on the U.S. Atlantic coast (e.g., Murray, 1991;Gehrels, 1994;Kemp et al., 2009a;Wright et al., 2011;Edwards et al., 2004). This modern training set, was previously used to develop a WA-TF (Kemp et al., 2013a), and is also used to develop our B-TF.

Modern bulk-sediment δ 13 C measurements
In the mid-Atlantic and northeastern U.S. the low salt-marsh and high salt-marsh zones are dominated by C 4 species such as Spartina alterniflora, Spartina patens, and Distichlis spicata, while the transitional marsh and surrounding upland zones are dominated by C 3 species. In New Jersey the boundary between C 3 and C 4 plant communities corresponds to MHHW and δ 13 C measured in bulk sediment can be used to reconstruct RSL by determining if a sample formed above or below the MHHW tidal datum.
Based on the modern dataset of bulk sediment δ 13 C from three sites in southern New Jersey (Figure 1) and presence or absence of foraminifera, Kemp et al. (2013a) recognized three types of sediment that were likely to be encountered in cores of organic coastal sediment.
1. Samples with δ 13 C values more depleted than -22.0‰ and with foraminifera present formed at tidal elevations from 100-150 SWLI. The lower limit of this range corresponds to MHHW and the upper limit is conservatively set to extend slightly beyond the observed highest occurrence of foraminfera (141.5 SWLI) in the modern dataset.

Proxy data
Cores of salt-marsh sediment were recovered from two sites in southern New Jersey (Cape May Courthouse and Leeds Point; Figure 1) and sliced into 1-cm thick samples. Three types of data were generated for each sediment core and were originally presented by Kemp et al. (2013b).

Fossil counts of foraminifera
In the Cape May Courthouse core Jadammina macrescens and Trochammina inflata were the dominant species from 1.72 m to 1.29 m ( Figure 3A If core samples exceeded the 20 th percentile of dissimilarity measured using the Bray-Curtis metric among all possible pairings of modern samples then the core sample was deemed to lack a suitable modern analogue and was excluded from further analysis by Kemp et al. (2013b). We did not reconstruct PME for these samples because they may lack ecological plausibility (e.g., Jackson and Williams, 2004). On Figure 3 these samples are lacking PME reconstructions (panels B, C, and E).

Fossil bulk-sediment δ 13 C measurements
In the Cape May Courthouse core all samples were less depleted than -18.9‰ and were interpreted as having formed below MHHW in a salt marsh dominated by C 4 plants ( Figure 3D, upper panel).
In the lowermost part of the Leeds Point core (below 3.35 m) the presence of foraminifera and bulk-sediment δ 13 C values more depleted than -22.0‰ indicate that the sediment accumulated above MHHW in an environment dominated by C 3 plants, but below the highest occurrence of foraminifera ( Figure 3D, lower panel). Between 3.31 m and 2.86 m bulk-sediment δ 13 C values were variable and interpreted to record the transition from highest salt marsh to high salt-marsh environments. Above 2.86 m, all samples were less depleted than -18.9‰ and were interpreted as having formed below MHHW in a salt marsh dominated by C 4 plants.

Age-depth profile estimated by Bchron
Age-depth models for the Cape May Courthouse and Leeds Point cores were previously developed using Bchron (Kemp et al., 2013a) and are used here in the chronology module without modification ( Figure 3F). The Cape May Courthouse core was dated by recognition of pollution markers, the appearance of Ambrosia pollen as a land clearance marker, and radiocarbon dating of in situ and identifiable plant macrofossils. These data were combined into a single age-depth model that estimated the age of every 1-cm thick sediment sample in the core with an average uncertainty of ∼30 years for the period since ∼700 CE ( Figure 3F, upper panel). Anthropogenic modification of the Leeds Point site limited dating and RSL reconstruction to the interval from ∼500BC to ∼1750 CE. The core was dated using only radiocarbon measurements performed on in situ and identifiable plant macrofossils. These data were combined into a single age-depth model that estimated the age of every 1-cm thick sediment sample in the core with an average uncertainty of ∼50 years ( Figure 3F, lower panel).

Instrumental data
A tide gauge is an instrument that automatically measures the sea surface height with reference to a control point on the land many times during a day. These measurements are averaged to obtain annual values to minimize the effects of weather and tidal variability. In New Jersey tidegauge data are available since 1911 CE when the Atlantic City tide gauge was installed. The Sandy Hook, Cape May, and Lewes (Delaware) tide gauges began measurements in 1932 CE, 1966 CE, and 1919 CE respectively. A single regional record was compiled by averaging annual data from these four local tide gauges. RSL is zero between 2000-2010 CE to be roughly equivalent to the age of a surface sample in the core (by definition RSL=0 m when the core was collected). The resulting record minimizes spatial variability and the influence of decadal-scale RSL variability (Douglas, 1991). A linear regression of the averaged record shows that RSL rose at an average rate of 4.03 mm/yr between 1911 CE and 2012 CE (Kemp et al., 2013a). . Bchron provided the chronology for both cores (F). Note that some samples with counts of foraminifera lack a corresponding PME reconstruction because they lack a suitable modern analog using the criteria applied by Kemp et al. (2013).

Results
We reconstruct PME using the original WA-TF of Kemp et al. (2012Kemp et al. ( , 2013a and our new B-TF. We developed a third reconstruction by incorporating downcore δ 13 C values with our B-TF to inform the posterior distribution for PME (see Section 3: process module). These results are combined with the existing Bchron age-depth model for each sediment core to reconstruct RSL. The resulting reconstructions are analysed using the EIV-IGP model to capture the continuous and dynamic evolution of RSL change while taking account of uncertainty in both sea level and age reconstructions. Our goal is honest assessment of uncertainty rather than reduced uncertainty.

Species-response curves
The B-TF estimated a response curve (mean with a 95% credible interval) for each species of foraminifera (expressed as raw counts) to tidal elevation (expressed as SWLI) from the modern training set of 172 samples (Figure 4). The response curves are estimated from a multinomial distribution (in which species compositions are considered as a whole) parameterized by a probability vector p, which is the probability of a species being present at a given tidal elevation. The multinomial model (described in detail in Section 3.1) utilises the combined species information from these observed responses to provide estimates for PME. The species prediction intervals (dashed red lines in Figure 4) will aid in providing uncertainty for the PME estimates.
Broadly, we identify two forms of species-response curve in southern New Jersey. First, a skewed, unimodal form describes the distribution of Haplophragmoides manilaenis with a maximum probability of occurrence of ∼0.2 at 123 SWLI compared to a WA-TF species optimum that was also 123 SWLI. Jadammia macrescens and Trochammina inflata also have a skewed, unimodal form with the highest probability of occurrences found in high salt-marsh environments at 124 SWLI (p ≈ 0.3) and 110 SWLI (p ≈ 0.4) respectively. The species optima for these species were situated at lower elevations by the WA-TF (99 SWLI and 100 SWLI respectively). Both Ammobaculities spp. (highest probability of occurrence of ∼0.3 at 22 SWLI) and Miliammina fusca (highest probability of occurrence ∼0.5 at 31 SWLI) have skewed unimodal distributions with maximum probability of occurrence in low salt-marsh environments. The WA-TF estimated species optima of -19 and 49.33 SWLI respectively for these two species.
Ammoastuta inepta also has a skewed, unimodal form (maximum probability of occurrence of ∼ 0.2 at 140 SWLI). This species generally has a low probability of being present because its distribution in southern New Jersey is restricted to sites with brackish salinity such as those located up river in Great Egg Harbor (Figure 1). Relatively few samples from these environments are included in the modern training set and therefore in the dataset as a whole it is a rare, but ecologically-important, species. Second, a unimodal Gaussian-like form describes the distributions of Arenoparrella mexicana (maximum probability of occurrence of ∼0.07 between 60 and 70 SWLI) and Tiphotrocha comprimata (maximum probability of occurrence of ∼0.3 at 78 SWLI). In this case the WA-TF indicated species optima of 80 and 90 SWLI respectively for these two species. These results suggest that the number and type of ecological response curve prescribed to all species in the WA-TF model (and other transfer functions) might be inappropriate for accurately describing the relationship between salt-marsh foraminifera and tidal elevation. Figure 4: The response of foraminifera species to elevation estimated from the modern training set using the Bayesian transfer function. The blue circles represent the probabilites of species occurance as determined from the raw count data (empirical probabilities). The response probabilites of occurance estimated by the Bayesian transfer function model are shown in red with a mean (heavy line), a credible interval for the mean (light line), and a prediction interval (dashed line). The green vertical lines represent the species optimum determined from the weighted average transfer function.

Cross validation of the modern data
Performance of the new B-TF and existing WA-TF was judged using 10-fold cross validation ( Figure 5).The uncertainty bounds (±2σ) for elevations predicted by the B-TF contained the true elevation 90% of the time compared to 92% for the WA-TF. The average 2σ uncertainties are larger in the WA-TF (28 SWLI) than in the B-TF (21 SWLI). The pattern of residuals in the WA-TF displayed a structure in which the elevation of low salt-marsh samples is over predicted (negative residuals) and the elevation of high salt-marsh samples is under predicted (positive residuals). For example, the WA-TF showed an average residual of -16.6 between ∼10 and ∼70 SWLI and an average residual of 22.5 between ∼120 and ∼140 SWLI. This structure is absent in the B-TF suggesting that this model is better suited to reconstructing values close to the extremes of the sampled elevational gradient.

PME reconstructions
We reconstructed PME in the Cape May Courthouse and Leeds Point sediment cores using the WA-TF and B-TF models. At Cape May Courthouse, the B-TF estimated an average PME close to MHHW (SWLI=100) of 96.2 SWLI, with a standard deviation 14.1 SWLI ( Figure 3C, upper panel). The WA-TF also estimated an average PME close to MHHW of 96.7 SWLI, with a standard deviation of 4.1 SWLI ( Figure 3B, upper panel). The 2σ uncertainties for the B-TF reconstructions ranged from 15.1 to 45.7 and are more variable than those from the WA-TF (28.1 to 29.0 SWLI).
At Leeds Point, the B-TF estimated an average PME close to MHHW of 92.8 SWLI, with a standard deviation 12.8 SWLI ( Figure 3C, lower panel). The WA-TF estimated PME close to MHHW for all samples except for the 3.00-2.80 m interval where Miliammina fusca was present and PME reconstructions were correspondingly lower ( Figure 3B, lower panel). The 2σ uncertainties for the B-TF reconstructions ranged from 11.5 to 59.8 SWLI and were more variable than those from the WA-TF (28.0 to 28.5 SWLI).
Comparison of the two models shows that the B-TF typically reconstructed PME with greater variability among samples than the WA-TF model. Similarly, reconstruction uncertainties were more variable for the B-TF model than the WA-TF model. Within their uncertainties the PME reconstructions for the B-TF and WA-TF overlap for all samples in both sediment cores.

Multi-proxy reconstruction of PME
Measurements of δ 13 C in bulk-organic sediment are useful sea-level proxies because they readily distinguish between sediment that accumulated above MHHW in an environment dominated by C 3 plants and sediment that accumulated below MHHW in an environment dominated by C 4 plants. This additional paleoenvironmental information was combined with the B-TF to generate a multi-proxy reconstruction of PME. The inclusion of the δ 13 C did not treat MHHW as a hard bound for PME, rather, if a sample is dominated by C 3 plants then the probability of PME being above MHHW increases. Likewise if a sample is dominated by C 4 plants the probability of PME being below MHHW water increases.
For Cape May Courthouse using the downcore δ 13 C values as a secondary proxy in the B-TF estimated an average PME of 87.5 SWLI (a reduction of 9 SWLI compared to the original B-TF). The 2σ uncertainties estimated for the PME reconstructions were reduced by 32% on average and up to ∼60% for some samples ( Figure 3E, upper panel). For the Leeds Point core, the inclusion of the secondary δ 13 C proxy resulted in an estimated average PME of 86.1 SWLI (a reduction of 7 SWLI compared to the original B-TF). The 2σ uncertainties decreased by ∼25% on average ( Figure 3E, lower panel). However, for samples where δ 13 C values and the presence of foraminifera indicate deposition in the transitional marsh (above MHHW, but below the highest occurrence of foraminifera) the uncertainty was reduced by an average of ∼50% and up to ∼70% for some samples because the constraint on PME changed from 0-150 SWLI to 100-150 SWLI. These results demonstrate that incorporating a second line of proxy evidence in the B-TF framework is an efficient and formalized way to reduce uncertainty in RSL reconstructions in some sedimentary environments.

Comparison among reconstructions
We applied the EIV-IGP model to the RSL reconstructions produced from the WA-TF, B-TF and multi-proxy B-TF to describe RSL trends along the coast of southern New Jersey since ∼500BCE ( Figure 6). The WA-TF and B-TF models show ∼3.9 m of RSL rise compared to ∼4.1 m of RSL rise for the multi-proxy B-TF model. The multi-proxy B-TF reconstructed lower RSL at the beginning of the record (∼ -4.2 m) compared to the B-TF and the WA-TF (∼ -3.8 m) because of the additional constraint placed on the lowermost section of the Leeds Point core by δ 13 C values that indicate deposition at 100-150 SWLI.
All of the reconstructions show three phases of RSL behavior ( Figure 6). The period from ∼500 BCE to ∼500 CE is a characterized by a continuous increase in the rate of RSL rise. The second stage shows a decline in rates of RSL rise from ∼500 CE to ∼1400 CE. After 1400 CE there is a continuous acceleration in the rate of RSL rise until reaching historic rates, which are unprecedented for at least 2500 years. Figure 6: The EIV-IGP model results for reconstructions produced using the B-TF, the WA-TF and the multi-proxy Bayesian transfer function. The upper panel shows individual data points (represented by rectangular boxes that illustrate the 95% confidence region) and include age and relative sea-level uncertainties. The middle panels show the posterior fit of the errors-in-variables integrated Gaussian process model to the relative sea-level reconstructions. Solid line represents the mean fit with the 68% and 95% credible intervals (C.I.) denoted by shading. The lower panels are the rates of relative sea-level (RSL) change. Shading denotes 68% and 95% credible intervals (C.I.) for the posterior mean of the rate process. The average rate for each phase of the reconstruction is given (in mm/yr) with a 95% credible interval.
However, there are some differences among the three reconstructions. For example, the B-TF shows the highest modern rate of rise at 4.1 mm/yr (95% C.I. 3.27-4.92 mm/yr) in 2000 CE compared to 3.16 mm/yr (95% C.I. 2.68-3.65 mm/yr) and 3.11 mm/yr (95% C.I. 2.45-3.77 mm/yr) for the multi-proxy B-TF and the WA-TF respectively. The B-TF consistently estimated RSL lower than the multi-proxy B-TF and the WA-TF between ∼1400 to ∼1700, resulting in the observed difference in the rates into the 21st century. When compared to the observed tide-gauge data for the last ∼100 years from New Jersey (Figure 7), the quality of the estimated RSL mid-point reconstructions can be assessed using mean squared error (MSE). For the multi-proxy B-TF the MSE was estimated at 0.003 m 2 compared to 0.014 m 2 for the B-TF and the WA-TF, indicating that the multi-proxy B-TF mid-points provide better estimates for RSL in comparison to the B-TF and the WA-TF.  1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 Relative Sea Level (m) 2 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 Relative Sea Level (m) 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 Relative Sea Level (

Discussion
The B-TF provides an alternative to the (non-Bayesian) regression-based transfer functions commonly used for reconstructing RSL (e.g., Gehrels, 2000; and in conjunction with the previously developed chronology and process modules enables RSL to be reconstructed in an entirely Bayesian framework. A key difference between the B-TF and existing transfer functions (e.g. the WA-TF) is the modeled relationship between species of foraminifera and tidal elevation. The number and type of species-response curves estimated by the B-TF model stands in contrast to the WA-TF that assumes a unimodal Gaussian form for all species. The optima and tolerance estimated for each species by the WA-TF shows overlap with the B-TF species-response curves, particularly those that have a Gaussian form such as Tiphotrocha comprimata. However, this form is only appropriate for two of the eight dominant species in the southern New Jersey training set. The flexible species-specific response provided by the B-TF is more appropriate given that models based on the assumption of a single response do not adequately explain the ecological behavior of the dominant species in New Jersey, or other salt marsh foraminiferal assemblages (e.g., , or species from other biological groups used in RSL reconstructions such as diatoms (e.g., Zong and Horton, 1999;Gehrels, 2000). This variability in species response to a single environmental variable arises from ecological complexity and the influence of secondary environmental variables (e.g., Murray, 1991). Therefore, we propose that the additional flexibility of the B-TF will produce more accurate PME (and consequently RSL) reconstructions than existing transfer functions such as the WA-TF.
The implication of the flexibility of the B-TF is illustrated in the cross validation results. The WA-TF appeared to suffer from edge effects (i.e., a tendency for the model to bias PME predictions towards the mean of the training data), a common artifact of using weighted average based methods (ter Braak and Juggins, 1993;Birks, 1995). Our B-TF does not suffer from this prediction bias and outperformed the WA-TF in the upper and lower extremes of tidal elevation.
The consequences of such an improvement are significant where true PMEs lie close to the ends of the sampled environmental gradient. For example, on subduction zone coastlines such as the Pacific Northwest coast of North America (e.g., Nelson et al., 1996, cyclical tectonic activity contributes to reconstructed RSL trends. During a slow (100s to 1000s of years) inter-seismic phase, accumulation of strain results in uplift of the coast (RSL fall). Conversely, the strain is released during an instantaneous co-seismic phase in which the coastline subsides (RSL rise). These processes cause significant and very rapid shifts in depositional environment that can span the full elevational range of coastal environments from sub-tidal settings to supra-tidal, freshwater uplands. In contrast, the sediment sequences targeted for reconstructing Common Era RSL on passive margins (e.g. New Jersey) are commonly comprised of unbroken sequences of high salt-marsh peat that are less susceptible to these edge effects.
Further motivation for the development of a Bayesian model for RSL reconstruction lies in the quantification of reconstruction uncertainty. Non-Bayesian transfer function methods (e.g. the WA-TF model) assume that model parameters are fixed and known. Therefore, they do not incorporate uncertainty into the estimation of the PME reconstruction itself, rather, the uncertainty is produced separately either before or after PME was estimated. This uncertainty is the root mean square error from two sources (S1 and S2; Birks et al., 1990;Juggins and Birks, 2012). The S1 contribution is sample-specific and is the standard deviation of bootstrapped PME reconstructions. The S2 contribution is the difference between observed and predicted tidal elevations established by cross validation of the modern training set ( Figure 5). The uncertainties for PME reconstructed by the WA-TF model show very little variability among samples (2σ ranges from 28.0 to 29.0). This pattern arises because the contribution from the sample-specific (S1) uncertainty is very small compared to the model uncertainty (S2) which is common to all samples. As a result the PME reconstructions for all sediment core samples have very similar uncertainties despite biological variability in species composition.
Alternatively, Bayesian methods explicitly model the uncertainty associated with individual reconstructions. Uncertainty for PME (and other unknown parameters) is included in the probability model through prior distributions. Assuming distributions for unknown parameters (in contrast to non-Bayesian approaches that use point estimates) allows the parameter uncertainty from the calibration step to be formally propagated into the reconstruction step. Therefore, estimates of PME produced by the B-TF take fuller account of the uncertainties related to the model and its parameters than non-Bayesian counterparts. The uncertainties estimated by the B-TF (excluding a secondary proxy) show more pronounced variability among core samples (2σ uncertainties range between 15.1 to 45.7). This variability arises from the observed response distribution of each species to tidal elevation (estimated from the modern data; Figure 4). For each individual species there is variability in both the uncertainty of the mean response curves and in the prediction intervals (i.e. uncertainty is greater in some parts of the elevational gradient than at others).
The variability of reconstructed PME from the B-TF is a more accurate reflection of the observed trends in downcore foraminiferal populations and is therefore a more ecologically plausible reconstruction than the WA-TF model. In the New Jersey case study in both cores the dominant groups of foraminifera are characteristic of a high salt-marsh environment.  concluded that samples identified as being of high salt marsh origin formed between MHW (SWLI values of 93 at Cape May and 90 at Leeds Point) and HAT (SWLI values of 123 at Cape May and 127 at Leeds Point). But there is a pronounced lack of variability reconstructed PME using the WA-TF model (average of ∼95 SWLI with a standard deviation of 5.5). This lack of variability in reconstructed PME is at odds with the observed downcore variability in species assemblages. For example, the key, high salt-marsh species Jadammina macrescens and Trochammina inflata vary in relative abundance from 0% (absent) to 100% and approximately 80%, respectively ( Figure 3). In contrast, PME reconstructions from the B-TF are also estimated at an average of ∼95 SWLI, but with a larger standard deviation of 13.1.
The majority of quantitative RSL reconstructions employ a single proxy (e.g., Kemp et al., 2011). A number of other proxies are available to support RSL reconstructions primarily produced from salt-marsh foraminifera. Additional biological proxies could include different groups of organisms with a relationship to tidal elevation such as diatoms (e.g., Zong and Horton, 1999;Shennan et al., 1994) or thecomebians (e.g., Charman et al., 2010;Roe et al., 2002). These organisms can be incorporated as presence/absence data or as species counts from a modern training set of paired observations of species abundance and tidal elevation. A number of lithological proxies (e.g., Nelson, 2015) are also available which can be qualitative (such as field and lab-based descriptions of sediment as high marsh or low marsh) or quantitative (such as measurements of organic content; e.g., Plater et al., 2015) and may provide thresholds in a similar fashion to sediment geochemistry in New Jersey. Although secondary proxies are often available to provide additional and independent constraints, a barrier to their use is the lack of an accessible and formal framework for combining multiple proxies with appropriate consideration of uncertainty. A strength of our B-TF is its ability to accommodate these secondary proxy sources. In the example from New Jersey we primarily used a biological proxy (assemblages of foraminifera), but amended the model to include information from a geochemical proxy (bulk sediment δ 13 C). On average this approach reduced the uncertainty for PME reconstructions by ∼28%. The reduction in uncertainty consequently provides constraints on this history of RSL change and more precise estimates of rates of RSL change through time. This is highlighted in the reconstruction of sea level between ∼ 500 BCE and 500 CE where the multi-proxy B-TF shows rates of rise for this period reach a maximum of ∼1.9 mm/yr which is greater than the rates produced by the B-TF and the WA-TF (∼1.5 and 1.6 mm/yr respectively). Uncertainty for these rate estimates was reduced by 25% for the multi-proxy B-TF compared to the WA-TF and the B-TF. These results highlight the specific utility of bulk sediment δ 13 C measurements as a sea-level indicator along the Atlantic coast of North America and the general utility of employing a multi-proxy approach to reconstructing RSL where the goal is to produce reconstructions with the best possible precision.
A practical and intuitive means to illustrate the improved performance of the B-TF over the WA-TF model is to compare RSL reconstructions with long-term measurements made by nearby tide gauges (Kemp et al., 2009b(Kemp et al., , 2013bGehrels, 2000;Leorri et al., 2008). We compare the reconstruction provided by the WA-TF, B-TF and multi-proxy B-TF with regional tide-gauge measurements from New Jersey (Figure 7). The tide-gauges measured about 30 cm of RSL change over the period 1911 to 2012 CE. Considering the 1σ errors in the reconstructions are of the order ±10 cm it is unsurprising that the uncertainty bounds of reconstructions capture the tide gauge observations. However, the RSL mid-points reconstructed by the multi-proxy B-TF are notably better at capturing the variability observed in the tide-gauge data. This suggests that the variability in the PME estimates produced from B-TF is relevant (the model is picking up a signal (as opposed to noise) due to downcore changes in foraminifera assemblages) and the estimates are improved by the addition of a secondary proxy. The improved fit between the instrumental records and the proxy reconstruction using the multi-proxy B-TF model indicates that it is possible to reconstruct decadal to multi-decadal RSL trends using salt-marsh sediment in New Jersey and similar regions. This is a noticeable advantage over existing approaches such as the WA-TF that reconstruct multi-decadal to centennial scale trends because in the absence of reconstructed PME variability, the resulting RSL reconstructions are primarily driven by the age-depth model.

Conclusion
To accurately reconstruct the continuous and dynamic evolution of sea-level change, we developed a Bayesian hierarchical model comprised of three formally interconnected modules.
(1) A B-TF for the calibration of foraminifera into tidal elevation, which is flexible enough to formally accommodate additional proxies (bulk-sediment δ 13 C). (2) An existing chronology developed from a Bchron age-depth model. (3) An existing EIV-IGP model for estimating rates of sea-level change. Previous reconstructions treated these three components as independent and employed existing approaches that were developed in a variety of numerical frameworks.
Our new B-TF provides an alternative to existing transfer functions. We illustrate the improved performance of our approach by applying the B-TF and a WA-TF model to a dataset of common Era salt-marsh foraminifera from southern New Jersey, U.S.A. The relationship between species of salt marsh foraminifera and tidal elevation was described using a regional-scale modern training set (n = 172) comprised of paired observations of species abundance and elevation. Results from the B-TF show that six of the eight most dominant foraminifera do not conform to the unimodal, Gaussian response curve prescribed by the WA-TF and other existing transfer functions.
We propose that the B-TF produces more accurate RSL reconstructions with a more complete evaluation of uncertainty and greater ecological plausibility than the WA-TF model. We applied the transfer functions to cores of salt-marsh sediment that were recovered from two sites in southern New Jersey. The flexible approach utilized in the B-TF results in more variability in reconstructed PME and associated uncertainty among samples than the WA-TF model. This variability is consistent with observed changes in foraminiferal population in core samples.
The B-TF allows results from additional, independent sea-level proxies to be formally incorporated alongside the primary biological proxy to produce a multi-proxy reconstruction. In New Jersey, we used bulk sediment δ 13 C values to determine if a core sample formed above or below the MHHW tidal datum. The addition of a second proxy reduced reconstruction uncertainty by an average of 28% and up to ∼70% for some samples.
We assess the ability of the multi-proxy B-TF, B-TF and the WA-TF to reconstruct RSL through comparison with observed tide-gauge data from New Jersey. Results showed that the 2σ uncertainty bounds for all reconstructions capture the observations from the tide gauge. However, the multi-proxy B-TF provides improved estimates (MSE = 0.003 m 2 ) for the reconstructed RSL mid points compared to the B-TF and the WA-TF (MSE = 0.014 m 2 ), indicating that the multi-proxy B-TF has the potential to capture the decadal-scale variability seen in the tide gauge data.