Guest Blog: The Power of Soil Data

There are about 30 Collaboration Areas in the Earth Science Information Partners (ESIP) community. Each one focuses on a specific domain or technical challenge – like soil data. The Soil Ontology and Informatics Cluster harmonizes soil data collection and taxonomy by connecting the soils research community.

In her guest blog, Cluster participant Vaasuki Marupaka explains why multidisciplinary approaches are crucial for soil science and understanding global carbon dynamics.


Soils are a precious resource that sustains life on Earth! I have always been fascinated by soils and their capacity to counter global carbon emissions and the changing climate caused by anthropogenic disturbances.

Being a part of the ESIP Soil Ontology and Informatics Cluster exposed me to the richness of open science and to the value of sharing data and maintaining provenance in working towards data harmonization and synthesis.

Soil data is critical for understanding climate change.

Soils are home to millions of living organisms from diverse groups of species. These organisms make up a major part of soil organic matter, which decomposes and enriches soils with carbon.

Soil organic matter provides us with an estimate of soil organic carbon. This soil organic carbon (SOC) is the game changer in the context of carbon balance in the Earth system. Because soil is both a source and a sink for this carbon, it is vital to study soil carbon in the context of global climate change.

The Soil Ontology and Informatics Cluster held a webinar series to bring together interdisciplinary tools and efforts. This is their playlist from the ESIP YouTube channel with all 21 presentations shared between 2020 and 2023.

Soil Ontology and Informatics Cluster

The Soil Ontology and Informatics Cluster helps connect the soils research community and those whose research incorporates soils data to informatics tools for better research. The group develops semantics and ontologies resources — including a series of presentations — as well as community building efforts to make connections between people who might not otherwise have the chance to meet.

But we need better soil data first.

Several studies have been done, and are currently taking place, centered around soil carbon. So much data is produced. Soil data are available publicly through repositories, government organizations, and individual studies spread across disciplines such as agriculture, environmental science, and Earth system science. 

However, these soil data are often: 

  • Heterogeneous – different formats and methods are used to observe this diverse natural resource described by many physical, chemical, biological, and anthropogenic properties
  • Disparate – unique and independent cases driving different observational strategies reflected in the data
  • Complex – any one property is the result of an interplay with a multitude of variables and parameters

Historic soil data exists from the rich legacy of soil surveys. These surveys, however, are composed of a collection of standardized soil classification systems and survey manuals across the world. This legacy data combined with recent data collections can provide insights on long-term processes like soil carbon sequestration from local to regional to global scales. 

Researchers are constantly adding new observations from their studies that expand our understanding of soil science. By working with evolving data, we can build a soil data warehouse that can serve as a platform for policymakers to aid in decision making.

What it means to harmonize soil data.

As the saying goes, the more the merrier! While many open questions could be addressed by larger, more complete datasets, the climate draw-down potential of soils lends urgency. We need soil-specific data collections that span decades of collection across the globe.

Since no single data collection effort spans this space-time scale, we must harmonize data across multiple historical and contemporary efforts.

Harmonizing data means making datasets speak to one another so that intercomparable data are presented together. Harmonization involves mapping variables across datasets, and to do this we need to agree on what those variables mean. Semantics is basically how the words are defined: the use of a standardized vocabulary to describe observations. With soil data, the challenge is that measured variables are not always comparable, so we need an adaptable, hierarchical description of these variables in order to craft data products for their intended purpose.
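To make the idea of mapping variables concrete, here is a minimal sketch in Python. It is only an illustration, not the Cluster's actual workflow: the dataset names (`survey_a`, `survey_b`), the local variable names, the shared terms, and the choice of percent and centimetres as common units are all hypothetical. It shows a small "crosswalk" that renames each dataset's variables to an agreed term and rescales them to a common unit.

```python
# Hypothetical crosswalk: (dataset, local variable) -> (shared term, factor).
# The factor converts the local unit into the shared unit assumed here:
# percent by mass for organic carbon, centimetres for depth.
CROSSWALK = {
    ("survey_a", "oc_pct"): ("soil_organic_carbon", 1.0),    # already percent
    ("survey_b", "SOC_g_kg"): ("soil_organic_carbon", 0.1),  # g/kg -> percent
    ("survey_a", "depth_cm"): ("sample_depth", 1.0),         # already cm
    ("survey_b", "depth_m"): ("sample_depth", 100.0),        # m -> cm
}


def harmonize(dataset, record):
    """Rename and rescale one record's variables using the crosswalk.

    Variables with no agreed mapping are returned separately rather than
    being silently dropped or guessed at.
    """
    harmonized, unmapped = {}, {}
    for name, value in record.items():
        entry = CROSSWALK.get((dataset, name))
        if entry is None:
            unmapped[name] = value
            continue
        term, factor = entry
        harmonized[term] = value * factor
    return harmonized, unmapped


# Two made-up records describing the same kinds of measurements differently.
record_a = {"oc_pct": 2.0, "depth_cm": 10}
record_b = {"SOC_g_kg": 25, "depth_m": 0.5, "colour": "10YR 3/2"}

row_a, extra_a = harmonize("survey_a", record_a)
row_b, extra_b = harmonize("survey_b", record_b)
```

After this step both rows use the same variable names and units and can sit side by side, while the unmapped `colour` value is kept aside until the community agrees on a term for it. The hard part is not the code but the agreement encoded in the crosswalk.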

By pooling these data, we can better parameterize our biogeochemical models and in turn better understand the carbon draw-down potential of these soils. 

I also think about data coverage across the globe. Specifically, less data is available from the Global South. This sparks questions: what are the reasons for the poor representation of soil data across these regions, and what can be done to fill this gap? To achieve climate goals through soil carbon sequestration, we need comprehensive data coverage. Integrating soil data from the Global South can shed light on regional trends in soil carbon sequestration across space and time. This data-centered task could build community involvement and collaboration across national boundaries.

How Can We Leverage Soil Data?

If you had infinite time and money, you could try to curate and harmonize soil data manually. Indeed, this has been done in the past, with data compiled and curated by hand, often by a poor junior colleague. But because that is a tedious task, we pivoted towards using informatics and technology to advance our science.

While the data management involved in linking one relational database to another is relatively straightforward, the vocabularies and semantics to support those linkages are not broadly agreed upon. There is a gap in community support in soil science for these semantics. To incentivise the soils community, we must first demonstrate the promise of this approach.

The Soil Ontology and Informatics Cluster in partnership with the International Soil Carbon Network is developing a workflow framework for harmonization of soil data to start a dialogue between these diverse datasets. 

Join us on our next Cluster call and help integrate global soil datasets!


This blog was written by Vaasuki Marupaka with edits by Allison Mills and Megan Carter from ESIP.

ESIP stands for Earth Science Information Partners and is a community of partner organizations and volunteers. We work together to meet environmental data challenges and look for opportunities to expand, improve, and innovate across Earth science disciplines.

Learn more at esipfed.org/get-involved and sign up for the weekly ESIP Update for #EarthScienceData events, funding, webinars and ESIP announcements.