Palaeo sea-level and ice-sheet databases: problems, strategies and perspectives

Introduction
The rapid growth in the acquisition of palaeoclimate data and the development of strategies to assimilate these data into models have resulted in a growing need for open-access, user-friendly databases (Overpeck et al., 2011). One area where such a need is highly relevant is in the data used to reconstruct the areal extent and elevation of former ice sheets and past changes in sea level, particularly as new approaches for integrating these data into ice-sheet (e.g., Tarasov et al., 2012; Briggs et al., 2014) and sea-level (e.g., Kopp et al., 2009; Lambeck et al., 2014) models continue to emerge. Further advances in these areas are particularly important for improving our understanding of the palaeo-context of ice sheets and sea level in projections of future ice-sheet and sea-level change (Church et al., 2013; Masson-Delmotte et al., 2013).
This paper addresses strategies for developing a standardised database of geological records that constrain ice-sheet and sea-level histories. Any such database must recognise that these records are subject to three distinct phases (measurement, documentation, and interpretation) to infer palaeo-sea level and/or characteristics of palaeo-ice sheets at a certain point in time and space. In particular, measurements of palaeo-ice sheet and palaeo-sea level records, and especially their interpretations, are often subjective and associated with uncertainty. For example, a moraine records the former position of an ice margin, but a priori it is unclear whether the moraine records a maximum ice-margin position, an ice-margin re-advance to that position, or a pause at that position during overall retreat. Similarly, shallow-water corals typically record the elevation of a former sea level with a vertical uncertainty that reflects the palaeodepth interpretation (e.g., Lighty et al., 1982). Even if uncertainties are based on the taxonomy of the coral only, they may be much larger than reported because modern ecosystems may not be accurate analogues for all past environments (e.g., Woodroffe and Webster, 2014). Additional factors that may affect the interpretation of these records in terms of global mean sea level include uncertain corrections for vertical displacement due to tectonics or isostasy (Raymo et al., 2011; Creveling et al., 2015) and regional variability associated with sea-level fingerprints (Clark et al., 2002; Kopp et al., 2009; Raymo et al., 2011; Törnqvist and Hijma, 2012; Hay et al., 2014). Moreover, the numerical ages of the records are based on geochronometers that have uncertainties related to, among other factors, the isotopic half-life, the production rate, or the calibration curve used to transform isotopic ages into calendar years. Finally, the research field is split into many (sub)communities and research groups, where differences in data reporting can lead to confusion if the methodologies are not clearly explained.
All of these issues complicate efforts to develop standardised databases that are needed to address regional- to global-scale research questions. Furthermore, development and calibration of ice-sheet (Tarasov et al., 2012; Briggs et al., 2014) and glacial-isostatic adjustment (GIA) (e.g., Peltier et al., 2015; Lambeck et al., 2014) models rely heavily on high-quality databases, and systematic uncertainties therein can cause spurious results (Siddall et al., 2009; Siddall and Milne, 2012).
Attempts to generate databases of former ice-sheet and sea-level changes began almost a century ago (Daly, 1934; Godwin, 1940). The call for internationally coordinated compilations of sea-level data started with IGCP Project 61 in 1974 (e.g., van de Plassche, 1986), which also formulated strategies for computer storage, although that could not be satisfactorily achieved at that time. The continued limitations and challenges of existing databases were recognised with the launch of the PALeo constraints on SEA level rise (PALSEA) working group in 2008 (Siddall et al., 2010). This project, sponsored by Past Global Changes (PAGES) and the International Union for Quaternary Research (INQUA), has grown to involve many research groups working on databases of ice-sheet and sea-level records. One of the goals of PALSEA and its successor PALSEA2 is to facilitate the construction of open-source, quality-controlled relative sea-level (RSL) and ice-sheet databases. In this short communication, we build on the experience of the PALSEA community in outlining strategies for designing self-consistent and standardised databases of changes in sea level and ice sheets. In particular, we build on existing guidelines for data reporting (e.g., Shennan et al., 2015; Balco et al., 2008) and we identify key components towards successful database creation.

The community structure
Most members of the community interact with databases in one way or another: those who develop the architecture (the database creators), those who populate the database (the data creators), and those who utilise or interrogate the database (the end-users). The end-users extract inferences from the data, often (though not always) by comparison to other datasets such as model results. There is often overlap between the communities, with many data creators also being data end-users. In some instances, the person or group who compiles published data to produce a larger, unified database is not the original data creator, but a "compiler". It is essential that a data compiler understands the details of the datasets being joined or works closely with someone who does.
Most funding agencies require that data collected in the framework of a project are archived and made available through data repositories. Despite this requirement, dedicated funding for database creation is rarely available, as funding mostly prioritises projects that follow the classic hypothesis-driven research approach (e.g., the improvement of measurements of a certain indicator in a particular location). There is therefore a need for funding opportunities that deviate from this approach and favour research collaborations that focus specifically on database development and challenges, including projects that do not collect new data but rather amalgamate and re-analyse published datasets into new databases. This is particularly true in view of the large monetary investments that have been made for the collection of ice-sheet and sea-level data in the first place.

Standardised measurements and data reporting
Compared to many other palaeoclimate time series that often consist of a limited number of proxy measurements along with age information, ice-sheet and sea-level data are notoriously complex. Reconstructions of former ice-sheet margins and thicknesses and elevation of past sea levels are based on data that are first measured in the field and then interpreted. For example, the elevation of a palaeo beach deposit, if measured with high-precision techniques and referred to a standard geodetic datum, is an objective measurement (Woodroffe and Barlow, 2015). Attribution of the elevation of palaeo sea level at the same beach is subject to interpretation of its relationship to former tide levels (e.g., is it a storm deposit, deposited above the tidal zone?). It is therefore essential to use standard protocols, both to measure the record and to document the original, "raw" data and the associated uncertainties. These raw data are invariable in time. By comparison, interpretation of a sea-level indicator, such as its indicative range, may change over time with improvements in measurements and analysis. It is thus important to separate the elements of interpretation from direct measurements. Following this strict approach will make it possible to reuse or reinterpret "raw" data in the future.
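The separation of fixed raw data from revisable interpretation can be illustrated with a minimal sketch. It assumes one commonly used convention that is not spelled out in the text: relative sea level is the measured elevation minus a reference water level, and the vertical uncertainty combines the measurement error with half the indicative range in quadrature. All class names, field names, and numbers are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class RawMeasurement:
    """Field observation; stored once and never modified."""
    sample_id: str
    elevation_m: float        # relative to a stated geodetic datum
    elevation_error_m: float  # 1-sigma measurement uncertainty


@dataclass(frozen=True)
class Interpretation:
    """Indicative meaning assigned to a sample; may be revised later."""
    reference_water_level_m: float  # midpoint of the indicative range
    indicative_range_m: float       # full vertical range of the indicator
    source: str                     # who made this interpretation


def relative_sea_level(raw, interp):
    """Return (RSL, uncertainty) under the convention assumed above."""
    rsl = raw.elevation_m - interp.reference_water_level_m
    err = math.sqrt(raw.elevation_error_m ** 2
                    + (interp.indicative_range_m / 2.0) ** 2)
    return rsl, err


# Hypothetical beach deposit: one raw measurement, two interpretations.
raw = RawMeasurement("beach-01", elevation_m=2.0, elevation_error_m=0.3)
original = Interpretation(1.0, 0.8, source="original study")
revised = Interpretation(1.5, 1.6, source="later re-analysis")

rsl_original = relative_sea_level(raw, original)
rsl_revised = relative_sea_level(raw, revised)
```

Because the raw record is immutable, a revised indicative meaning yields a new sea-level estimate without touching the field data.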
In general, the final aim of studies of past sea level and ice sheets is to obtain a temporal and spatial record of the former position of the relative sea level or ice mass. Therefore, deviations from standardised measurements or missing data pertaining to location, age, or sample type/characteristics (e.g., mollusk species needed to infer depth habitat) are the most common causes of discrepancies and problems related to database building. Other issues include applying different dating techniques to estimate the age of the same indicator, or discrepancies between calibration curves used in different studies.
Along with the measurements of age and location (including elevation), the uncertainties associated with these parameters must also be described in as much detail as possible. This includes not only the uncertainties from direct field or laboratory measurements, but also how the uncertainties have been calculated, and which uncertainties derive from direct measurements and which ones derive from interpretations. In general, each parameter in a database should carry an uncertainty and a full description of how it has been calculated or estimated based on interpretation. This also opens up room for improving the quantification of the uncertainties. Uncertainties are usually treated as normal distributions, but in some cases (e.g., sea-level limiting data that only provide information on maxima or minima) it may be necessary to allow for other probability distributions.
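As a rough illustration of why limiting data need a different distribution, the sketch below compares a normal likelihood for a data point with a finite range against a one-sided likelihood for a limiting point. The function names and the labels "index", "lower_limit", and "upper_limit" are assumptions made for this example, and the one-sided form (a cumulative normal) is only one of several reasonable choices.

```python
import math


def normal_pdf(x, mean, sigma):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sigma):
    """Cumulative probability of a normal distribution up to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def likelihood(predicted, observed, sigma, kind="index"):
    """Likelihood of a predicted sea level given one observation.

    kind="index":       two-sided, normal around the observation
    kind="lower_limit": sea level was at or above the observation
    kind="upper_limit": sea level was at or below the observation
    """
    if kind == "index":
        return normal_pdf(predicted, observed, sigma)
    if kind == "lower_limit":
        return normal_cdf(predicted, observed, sigma)
    if kind == "upper_limit":
        return 1.0 - normal_cdf(predicted, observed, sigma)
    raise ValueError(f"unknown data kind: {kind}")


# Hypothetical observation at 0 m with a 1 m (1-sigma) uncertainty.
well_above = likelihood(5.0, 0.0, 1.0, kind="lower_limit")
well_below = likelihood(-5.0, 0.0, 1.0, kind="lower_limit")
```

A prediction far above a lower limit is fully compatible with the observation, whereas treating the same point as normally distributed would wrongly penalise it.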
Unfortunately, there are currently large disparities between studies in terms of standards used to take measurements in the field and the laboratory. Additionally, incomplete data reporting limits the longevity of some data. An important goal for the future is for different communities to agree on standardised measurements and data reporting norms. This will facilitate seamless interfacing with database systems for archiving and further analysis.
Any database containing information on heterogeneous samples that are sparsely and unevenly distributed in space and time, like ice-sheet or sea-level data, rests on standardised documentation of a few fundamental data fields: (i) location (latitude, longitude and elevation or depth); (ii) age, including lab identification number and details on the technique used and, wherever available, the raw data; (iii) a description of the feature that separates objective description from interpretation and, if applicable, records possible dual interpretations; and (iv) uncertainties and how they have been measured or inferred. All these fundamental fields might include subfields, which are related to special needs of the reported data type. There are many databases currently in existence (see table in online supplement for a list of examples), all with unique strategies and emphases, but often lacking essential information for future researchers to reinterpret the data.
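The four groups of fundamental fields could be sketched as a single record type with a completeness check. The field names below are hypothetical and chosen only for illustration; a real database would add subfields specific to the data type.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Record:
    # (i) location
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    elevation_m: Optional[float] = None
    # (ii) age
    age: Optional[float] = None
    age_uncertainty: Optional[float] = None
    lab_id: Optional[str] = None
    dating_method: Optional[str] = None
    # (iii) description, kept apart from interpretation
    description: Optional[str] = None
    interpretation: Optional[str] = None
    # (iv) uncertainties and how they were obtained
    elevation_uncertainty_m: Optional[float] = None
    uncertainty_method: Optional[str] = None


def missing_fields(record):
    """List the fundamental fields that were left empty."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) is None]


# Hypothetical, incompletely reported sample.
incomplete = Record(latitude=52.1, longitude=4.3, elevation_m=-12.4,
                    age=8200.0, dating_method="radiocarbon",
                    description="basal peat")
gaps = missing_fields(incomplete)
```

Running such a check at submission time would flag records that cannot be reinterpreted later because key information was never reported.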
Regardless of the type of data reported, one fundamental choice in the design of a database is the type of relationship between the data columns, or fields. If the relationship is one-to-one, then a simple spreadsheet can be used (e.g., Hijma et al., 2015). The same is true for Last Glacial Maximum ice-sheet databases (Clark et al., 2009; Briggs and Tarasov, 2013). If the one-to-one relationship fails, then one needs more advanced relational databases including many-to-one or other relationships that are difficult to handle with a single spreadsheet.
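A case where the one-to-one relationship fails is a sample dated more than once. The sketch below uses Python's built-in sqlite3 module to show a many-to-one layout; the table names, column names, and values are hypothetical and not taken from any existing database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    sample_id   TEXT PRIMARY KEY,
    latitude    REAL NOT NULL,
    longitude   REAL NOT NULL,
    elevation_m REAL NOT NULL
);
-- Many age determinations may point to the same sample.
CREATE TABLE age_determination (
    lab_id          TEXT PRIMARY KEY,
    sample_id       TEXT NOT NULL REFERENCES sample(sample_id),
    method          TEXT NOT NULL,
    age             REAL NOT NULL,
    age_uncertainty REAL NOT NULL
);
""")

conn.execute("INSERT INTO sample VALUES (?, ?, ?, ?)",
             ("S1", 52.1, 4.3, -12.4))
conn.executemany(
    "INSERT INTO age_determination VALUES (?, ?, ?, ?, ?)",
    [("LAB-001", "S1", "radiocarbon", 8200.0, 60.0),
     ("LAB-002", "S1", "luminescence", 8350.0, 120.0)],
)

# Count the age determinations attached to each sample.
rows = conn.execute("""
    SELECT s.sample_id, COUNT(a.lab_id)
    FROM sample AS s
    JOIN age_determination AS a ON a.sample_id = s.sample_id
    GROUP BY s.sample_id
""").fetchall()
```

In a single spreadsheet the same information would require either duplicated sample rows or an open-ended set of age columns.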
In general, it is good practice, before building a database, to reason through what its structure should be and which fields are likely to become necessary in the future. This should involve informatics experts to develop the software architecture. These experts should be closely involved in research projects so they can interface directly and continuously with earth scientists. An essential goal for the future of ice-sheet and sea-level databases is to make database designs more flexible to incorporate additional elements that may be added after the initial set-up.
Databases have driven many major developments in our understanding of ice sheets and sea level in the Earth system. For example, sea-level databases help to constrain model estimates of the rates of GIA during and following the last deglaciation (e.g., Bradley et al., 2011; Milne and Mitrovica, 2008; Peltier et al., 2015; Whitehouse et al., 2012), which in turn have constrained estimates of current rates of ice-sheet mass loss and sea-level rise from geodetic observations (e.g., Vaughan et al., 2013). These databases have also provided estimates of the magnitude of the sea-level highstand during the last interglacial period (e.g., Dutton and Lambeck, 2012; Kopp et al., 2009) and helped to improve our understanding of global ocean volume during the Pliocene (e.g., Rovere et al., 2014, 2015). Likewise, the worldwide timing of the Last Glacial Maximum is well constrained by ice-sheet databases (e.g., Clark et al., 2009).
For databases to have high scientific value, it is important to consider the needs of the end-users during database creation. It is essential to provide supporting metadata, such as extensive definitions of fields, to make the database clear to a non-specialist (e.g., a GIA modeller) who may not be familiar with specific terminology (e.g., taxonomy of salt marsh foraminifera). In spreadsheet databases, it can be advantageous to link fields by equations, but this is no substitute for suitable metadata. Providing the opportunity to visualise the data is essential, as it is an easy means to show the information contained in the database and an important tool for quality control. For example, simply plotting the data in a thematic map can effectively detect gross typing errors for location information. More complex visualisations can be used to give users a first hint at which data may be suitable for their specific needs. To develop general visualisation approaches (e.g., Unger et al., 2012; Rovere et al., 2012), it is essential to use standardised approaches to store the data and to use consistent data types per column. It is also important to consider complementary data that may be of use to the data re-user, such as information regarding the reference frame used when GPS measurements were made. This may be done directly within the single database or through links to supplementary databases. An example for such a scheme is the International Geo Sample Number (IGSN), which links a 9-digit alphanumeric code to a uniquely identified geological sample. These additional meta- and supporting data allow interested users more insight into the original data background. The connection of many databases with different tasks and foci will be an essential element for future research. Global, quality-controlled databases are necessary for answering the challenging questions about the Earth system.
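The map-based check for gross location errors can be approximated without any plotting by testing coordinates against the expected bounding box of a regional database. This is a stand-in for the visual inspection described above, not a replacement for it; the function name, record layout, and coordinates are hypothetical.

```python
def flag_suspect_locations(records, lat_range, lon_range):
    """Return the IDs of records that fall outside the expected region."""
    suspects = []
    for rec in records:
        lat, lon = rec["latitude"], rec["longitude"]
        on_globe = -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
        in_region = (lat_range[0] <= lat <= lat_range[1]
                     and lon_range[0] <= lon <= lon_range[1])
        if not (on_globe and in_region):
            suspects.append(rec["sample_id"])
    return suspects


# Hypothetical regional compilation expected within 50-54 N, 2-8 E.
records = [
    {"sample_id": "A", "latitude": 52.1, "longitude": 4.3},
    {"sample_id": "B", "latitude": 4.3, "longitude": 52.1},   # swapped
    {"sample_id": "C", "latitude": 52.4, "longitude": -4.9},  # sign error
]
suspects = flag_suspect_locations(records, (50.0, 54.0), (2.0, 8.0))
```

Such a check only works if coordinates are stored with a consistent data type and sign convention per column, which is the point made above about standardised storage.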

Data citation and unique identification
Data and database creators should both receive scientific credit for their work and effort (Costello, 2009; Kattage et al., 2014). For the database creator, the problem of citation is relatively simple to overcome. Scientific credit is often granted through the publication of the database in a journal, and indexed as such. Apart from journal publication, it is possible to publish the data alone and assign a Digital Object Identifier (DOI) to the database, which allows the citation of data in a comparable way to a journal publication (Paskin, 2005; Quadt et al., 2012). The requirement of stable, unchangeable versions of the datasets with DOIs may be a problem in the case where databases are structured to evolve and expand over time. It has to be stressed that data archiving is a requirement for many funding agencies and that publishing data will help to fulfil the requirement with spin-offs for the invested work (Düsterhus and Hense, 2014). This is of additional benefit when the requirement to publish data within a reasonable time after finishing the evaluation within a project is enforced. With regard to crediting data creators, the problem is more complex. While the rules for citation of small datasets are well established and often complied with by citing original publications directly, large databases may contain hundreds or even thousands of references. When a database of many records is published, it is to be expected that in the future only the database is cited and not the underlying original publications. As a consequence, a problem emerges because the credibility measures used in science (e.g., H-Index; Hirsch, 2005) ignore such re-citations in their calculations.
Although the entry of such material into a reviewed database implies that the data creator's work passed a further evaluation procedure as outlined above, the risk increases that original authors are no longer rewarded for their work (Kattage et al., 2014). Consequently, it may be hard for database creators to motivate data creators to contribute their data to a given database, which is critical to achieve completeness. As this is a general problem for data-intensive sciences, the solutions to such problems are an important challenge for the future that extends well beyond our community.

Centralised relational databases
Ideally, ice-sheet and sea-level databases would become centralised and interconnected via the Internet. One possible platform for the databases discussed herein is the NOAA World Data Center for Paleoclimatology (e.g., Wahl et al., 2010; http://www.ncdc.noaa.gov/data-access/paleoclimatology-data), currently the most widely used data resource among Quaternary palaeoclimatologists. For data end-users, standardised accessibility from one source is an advantage, but is hard to achieve with a single database due to the diversity of the data described above. As a consequence, new approaches are necessary to create such an interface for the community. As spreadsheet databases are already available in many fields, it is reasonable to use this information as a basis for future databases. Scripts that read the machine-readable data could handle this task, rather than the otherwise necessary input interfaces for the different disciplines. The "database of databases" could act as a stepping-stone towards a centralised configuration.
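A script of the kind mentioned above might read spreadsheet exports with differing column headers and map them onto one shared schema. The sketch below uses Python's csv module; the header variants, the shared field names, and the sample values are all hypothetical.

```python
import csv
import io

# Hypothetical header variants mapped onto one shared schema.
COLUMN_MAP = {
    "lat": "latitude", "Latitude": "latitude",
    "lon": "longitude", "Longitude": "longitude",
    "elev_m": "elevation_m", "Elevation (m)": "elevation_m",
}
NUMERIC_FIELDS = {"latitude", "longitude", "elevation_m"}


def read_spreadsheet(text, column_map=COLUMN_MAP):
    """Parse CSV text and return rows keyed by the shared field names."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {}
        for header, value in row.items():
            name = column_map.get(header, header)  # keep unknown headers
            record[name] = float(value) if name in NUMERIC_FIELDS else value
        rows.append(record)
    return rows


# Two hypothetical source spreadsheets exported as CSV.
source_a = "lat,lon,elev_m\n52.1,4.3,-12.4\n"
source_b = "Latitude,Longitude,Elevation (m)\n-33.9,151.2,1.5\n"
merged = read_spreadsheet(source_a) + read_spreadsheet(source_b)
```

Keeping the header mapping in one table means a new source can be added by extending the mapping rather than by writing a new input interface.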
Key elements of interdisciplinary databases are simple accessibility, long-term availability, transparent data processing, continual updating and trust. Transparency is simpler to achieve with relational databases than with spreadsheet databases. Relational databases allow for a more detailed description of the uncertainties by offering dedicated views for different re-user communities. Including such information, like percentiles for non-Gaussian density functions, in spreadsheet databases would drastically reduce their usability. The processing of data, including quality assurance of the data and interpretation of the raw data, can also be handled in more transparent ways. Furthermore, trust in such a database requires good software design, and long-term availability and citability of the procedures. The latter is only achievable with long-term funding, which is currently an obstacle to many centralised databases. Maintaining databases to ensure that they stay up to date requires not only stable funding but also community buy-in. This should become part of the process of generating and reporting new data.
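To show what a percentile description of a non-Gaussian uncertainty might look like, the sketch below summarises a skewed sample by its 2.5th, 50th and 97.5th percentiles using Python's statistics module. The lognormal distribution and its parameters are arbitrary assumptions chosen only to produce a skewed example.

```python
import random
import statistics


def percentile_summary(samples):
    """Return the 2.5th, 50th and 97.5th percentiles of the samples."""
    # n=40 gives 39 cut points spaced 2.5 percentage points apart.
    cuts = statistics.quantiles(samples, n=40)
    return {"p2.5": cuts[0], "p50": cuts[19], "p97.5": cuts[38]}


# Hypothetical skewed uncertainty, drawn from an arbitrary lognormal.
rng = random.Random(42)
samples = [rng.lognormvariate(0.0, 0.5) for _ in range(10000)]
summary = percentile_summary(samples)

# For a skewed distribution the two half-widths differ.
upper_width = summary["p97.5"] - summary["p50"]
lower_width = summary["p50"] - summary["p2.5"]
```

A single symmetric "plus or minus" column cannot carry this asymmetry, which is why several percentile fields per parameter are easier to accommodate in a relational layout.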

Conclusions
We have highlighted new advances that demonstrate substantial progress by the palaeo-sea-level and -ice-sheet communities in developing strategies for interdisciplinary data availability and organisation. The main challenges for the future reside in standardising data processing, from the collection of the data to their final interpretation. New databases and new approaches to community database-building efforts could help to achieve this goal. Nevertheless, funding agencies tend to see databases as deliverables in research projects rather than broader community tasks. This makes the creation of unified and standardised databases challenging. One approach to address this issue is to design research goals around the need for having such databases so that the database becomes an essential component towards successfully answering the research questions. Each of the goals defined herein, if achieved, will bring large advances to the field of palaeoclimatology, opening opportunities for new scientific insights. To achieve these goals, an interdisciplinary approach is imperative, and communities currently not involved in this research field should be included at least as data end-users. The PALSEA2 community (http://people.oregonstate.edu/~carlsand/PALSEA2/Home.html) is working towards fulfilling these goals with a focus on the longevity of such databases.