ESIP Endorses Guidance for Science on Schema.org Metadata
The Earth Science Information Partners (ESIP) voted to endorse new metadata guidance to help make Earth science data more discoverable and interoperable. The endorsement covers the ESIP Schema.org Cluster’s newly released Science On Schema.Org (SOSO) Guidance Documents, version 1.3.0.
Image: Science on Schema.org. The Schema.org Cluster works on metadata standards to help improve Earth science data discoverability. Credit: Chris Marsicano/ESIP
Across the sciences, researchers produce millions of peer-reviewed scientific papers in hundreds of thousands of publications every year. But this is only one look at research productivity, which many in the Earth science data community see as too narrow and sometimes problematic.
In the era of Big Data and the pressures of “publish or perish,” finding discoverable and reusable datasets is like diving for pearls: they are hard to find within a whole ocean, and hunting skillfully takes years of practice.
To make Earth science data easier to find, Schema.org markup offers a solution through standardization and consistency. In ESIP, the Schema.org Cluster has developed guidelines to help data-generating groups format the metadata on dataset web pages. Clearer metadata in domain-specific formats is like providing a pearl-diving map and seafloor scanner, making datasets easier for search engines to read, categorize, and share.
Now endorsed by the ESIP Partner Assembly, the Science On Schema.Org (SOSO) Guidance Documents version 1.3.0 (DOI 10.5281/zenodo.4477164) can help on multiple scales, from individuals to labs, from research institutions to repositories.
The cluster chair is Adam Shepherd, technical director at ESIP’s partner organization Biological and Chemical Oceanography Data Management Office (BCO-DMO). As Shepherd puts it, “Providing discovery-level metadata through schema.org is the gateway to enabling web-based discovery of earth and environmental data for researchers and the public.”
How Science on Schema.org Works
On the Schema.org website FAQ, the collaboration lays out its main purpose: “Make it easier for people to find relevant information on the web.” Schema.org does this by providing a simple system for headings, keywords and other metadata that make it easier for a search engine to browse and catalog a web page.
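As a rough illustration of what such markup can look like on a dataset landing page, here is a minimal Python sketch that builds a schema.org Dataset description as JSON-LD and wraps it in the script tag that search engines read. The dataset name, description, URL and keywords are hypothetical placeholders, and the handful of properties shown is only a small sample of what the SOSO guidance covers.

```python
import json


def build_dataset_jsonld(name, description, url, keywords):
    """Return a schema.org Dataset description as a JSON-LD string."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
    }
    return json.dumps(record, indent=2)


def embed_in_html(jsonld):
    """Wrap JSON-LD in the script element used to embed it in a web page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# Hypothetical values, for illustration only.
example = build_dataset_jsonld(
    name="Example glacier thickness survey",
    description="Illustrative record; not a real dataset.",
    url="https://example.org/datasets/glacier-thickness",
    keywords=["glaciers", "ice thickness"],
)

print(embed_in_html(example))
```

A repository would typically generate this block from its existing metadata records and place it in the HTML of each dataset page.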
That is straightforward for a blog or recipe, but geo data can be more complex. From gleaning updates on real-time volcanic eruptions to weaving together glacial changes spanning decades, centuries, millennia or more, Earth science data carries information and challenges that are often unique.
“Describing geologic time can be complicated, and schema.org by itself does not provide sufficient detail for describing geologic events,” said Dave Vieglais, one of the Cluster participants and a senior scientist at the University of Kansas. “SOSO provides guidance on how events and periods in geologic time can be represented succinctly in schema.org metadata through some community defined extensions to help describe the extended periods and uncertainty often associated with geologic time.”
Standards for geologic time are just one element of how Science on Schema.org markup can enhance Earth science datasets. Other examples include guidance on provenance and on validating data repository pages using the Shapes Constraint Language (SHACL).
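To give a feel for what validation means here, the sketch below checks a dataset record for a few expected properties. It is a plain-Python stand-in, not SHACL: a real SHACL workflow expresses such constraints as shapes and runs them through a SHACL validator. The list of required properties is an assumption made for illustration and is not taken from the SOSO guidance.

```python
# Hypothetical list of properties to require; not the SOSO guidance's own list.
REQUIRED_PROPERTIES = ("name", "description", "url")


def find_problems(record, required=REQUIRED_PROPERTIES):
    """Return a list of human-readable problems found in a dataset record."""
    problems = []
    if record.get("@type") != "Dataset":
        problems.append("@type must be 'Dataset'")
    for prop in required:
        value = record.get(prop)
        # Treat absent values and blank strings as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing required property: {prop}")
    return problems


# Hypothetical record that lacks a url.
incomplete = {
    "@type": "Dataset",
    "name": "Example glacier thickness survey",
    "description": "Illustrative record; not a real dataset.",
}

for problem in find_problems(incomplete):
    print(problem)
```

An empty list means the record passed these simple checks; it says nothing about whether the record conforms to the full guidance.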
“Adoption of the SOSO conventions by repositories represents a sea change in cross-repository data discovery,” said Matt Jones, another cluster participant and the director of DataONE. “With standardized schema.org metadata, federated data search providers like DataONE can expand their rich, domain-relevant search and discovery services across a vast swath of the data provider community, making Earth science data broadly accessible for reuse and synthesis.”
Metadata makes Earth science data more discoverable.
The Canadian Consortium for Arctic Data Interoperability (CCADI) is one example of how schema.org markup can make data easier to find and help it be stewarded properly. CCADI brings together Inuit communities, scientists, data managers, and institutions to make sure Indigenous data sovereignty is preserved while lowering barriers to dataset access. CCADI's data repositories implement schema.org at the repository level and are integrated into a metadata mediator where the consortium's data is kept and shared.
“Another example includes the collaboration UNESCO Ocean InfoHub,” said Doug Fils, one of the cluster co-chairs. “Alongside Pier Luigi Buttigieg from the Helmholtz Metadata Collaboration and ESIP Semantic Harmonization Cluster, we have collaborated on schema.org markup to make the world’s oceanic data more accessible.”
The Schema.org Cluster is working with other groups and organizations to share and implement their SOSO guidance. While the overarching goal is improving data discoverability, the successful execution of that is done by digging into the details of individual datasets. By connecting the discovery dots with schema.org markup, more Earth science data can become accessible and reusable.
“ESIP endorsement is the first step to making the guidance more widely adopted in the Earth science data community,” said Susan Shingledecker, ESIP Executive Director. “The initiative began in community and it is fitting to see the collaborative vision recognized at the community level. Our endorsement process is community acknowledgement of the thoughtful work of the Schema.org Cluster.”
This blog post was written by Allison Mills with edits by Adam Shepherd, Doug Fils, Dave Vieglais, Chantelle Verhey and Matt Jones. The Schema.org Cluster worked collaboratively on the version 1.3.0 release, which can be found on the ESIP GitHub.
ESIP stands for Earth Science Information Partners and is a community of partner organizations and volunteers. We work together to meet environmental data challenges and look for opportunities to expand, improve, and innovate across Earth science disciplines.
Learn more at esipfed.org/get-involved and sign up for the weekly ESIP Update for #EarthScienceData events, funding, webinars and ESIP announcements.