Writing Your Data Management Plan

Workshop Summary

Introduction

This document summarizes material presented December 14, 2010, at the AGU workshop “Writing Your Data Management Plan” sponsored by the Earth Science Information Partners.  The workshop brings together data experts from different agencies and organizations to describe the essentials of data management and stewardship.  A major focus is the new NSF requirement for a data management plan to be submitted with every proposal.  Funding agencies and science initiatives around the world are placing new emphasis on the role that data sharing can play in advancing science.  Shared data enable new ways of scientific problem solving.  Data enable the informatics revolution, from data mining to cloud computing.  The revolution is also placing new burdens and demands on you as Earth Scientists.  Not only are you expected to share your data, but you are now also expected to do more to describe those data, to make them useful to those in other disciplines and to future generations.  These goals can be accomplished by developing a data management plan, by employing the best practices developed by your community to document and describe your observations, and by working with Long Term Archives to ensure the preservation of your data.

 

Data Management Plans

Successful data stewardship begins with a data management plan.  A data management plan identifies the materials that will be created; identifies the standards and organization of the materials; states the access, sharing, and re-use policies; and addresses backups, archives, and preservation.  The best time to write this plan is when a project is initially developed.  Developed early, the plan can guide the acquisition of needed data, ensure that volatile information is acquired before it is lost, safeguard the data from misuse, and ensure maximal utility of the data long after the project is completed.

Checklists have been created to help scientists develop data management plans, yet it remains difficult to prescribe a single solution that fits all projects.  Standards are evolving, expert communities often have unique requirements, and achieving the broadest impacts requires some creativity and individuality.  The best data management plan will satisfy the requirements of its specific user community while following the broad principles for data documentation and stewardship that enable the data to be found by others and allow those users to assess the data's suitability for their own purposes many years from now.

 

Long Term Archives

Data experts working with Long Term Archives (such as the World Data Centers) can help Earth Scientists archive and preserve their data, serve as a resource for questions and methodologies, and act as partners for long-term stewardship.  Among many other supporting roles, these experts can help scientists decide what data to archive (and what not to archive), provide a host of functions supporting the data lifecycle, and help with data descriptions and documentation.

What to Archive: Archiving everything is neither warranted nor possible, while archiving nothing fails to achieve the broadest use and impacts of Earth Science data.  Appropriate data for long term archives include unique measurements, measurements that are part of a time series, and raw “level 0” data from laboratory, field, and Earth observing platforms.  Data lacking descriptive information (metadata), interim results that will be superseded by more complete measurements, and data that can easily be reproduced are often not appropriate for long term preservation.  Some requirements will be determined by funding agencies, stakeholders, and user communities, or be mandated by legal or contractual obligations such as privacy protections or intellectual property rights.

Data Lifecycle:

Data Description and Documentation: Data need sufficient documentation so that they can be found by others and their suitability for use determined.  Content standards exist and range from the general and brief (Dublin Core is designed to describe nearly any data type with fifteen properties) to extensive descriptions that include geospatial information (the Federal Geographic Data Committee Metadata Content Standard is promoted in the US).  Within a specific content standard, approved glossaries and naming conventions, such as the Climate and Forecast convention for netCDF data, can greatly increase the quality and usefulness of the data.  In many communities, scientists have worked with data experts to develop and promote the use of these conventions.
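To make the "general and brief" end of that range concrete, the following minimal Python sketch lists the fifteen Dublin Core elements and checks a metadata record against them.  The record contents and the helper names (missing_elements, unknown_elements) are hypothetical and chosen only for illustration.  Dublin Core itself treats every element as optional, so reporting "missing" elements here reflects an assumed local policy, not a requirement of the standard.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_elements(record):
    """Return the Dublin Core elements that are absent or empty in record.

    Assumption: the project has chosen to aim for all fifteen elements.
    Dublin Core does not require any of them.
    """
    return [name for name in DUBLIN_CORE_ELEMENTS if not record.get(name)]


def unknown_elements(record):
    """Return record keys that are not Dublin Core element names."""
    return sorted(set(record) - set(DUBLIN_CORE_ELEMENTS))


# Hypothetical record describing an invented data set.
example_record = {
    "title": "Hourly air temperature, example field site",
    "creator": "Example Investigator",
    "date": "2010-12-14",
    "format": "text/csv",
    "coverage": "Example region, 2009-2010",
    "rights": "Open access with attribution",
}

if __name__ == "__main__":
    print("Still to document:", ", ".join(missing_elements(example_record)))
    print("Not Dublin Core:", unknown_elements(example_record))
```

A check of this kind could be run before data are submitted to an archive, so that gaps in the description are found while the people who can fill them are still working on the project.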

 

Where to Go For Additional Information

Internet resources on data management and stewardship are vast.  The ESIP pages listed below provide access to some of these links. Links to NSF policies and several Data Management Plan Checklists are also provided.