ESIP Community Fellow and M.S. student, Alexis Garretson, reflects on her summer fellowship with the Environmental Data Initiative, an ESIP Partner Organization, and how the Data-at-Risk Matrix in development by the ESIP Data Stewardship Committee aided her in this work.
During the summer of 2019, I was an Environmental Data Initiative Fellow placed at the Mohonk Preserve, a land trust in the state of New York. The data management needs of Mohonk Preserve are unusual because the preserve houses the archives and collections of naturalist Daniel Smiley. Daniel Smiley was born and raised alongside Lake Mohonk, a glacial lake in the Shawangunk Mountain Range. Excepting four short years away at college, he spent his entire life on the land that became Mohonk Preserve in 1963. Throughout his life, he collected an enormous volume of data about the things he observed on the land around him. He began counting the number and life stages of amphibians in 1931, began recording weather observations in 1938, and began taking pH measurements of the groundwater springs in the mid-1970s. Smiley and his successor, Paul Huth, were strong believers in what is known as ‘serendipitous science,’ the practice of recording all interactions they had with the natural world. Because of this practice, the filing cabinets of the archive are filled with thousands of pages of recorded species observations.
Much of the data in the Daniel Smiley Research Center is considered at-risk and in need of rescuing for integration into the body of circulating environmental and ecological data. The archives include 123 years of daily weather observations and nearly 100 years of natural and cultural history observations. The data is primarily stored as handwritten narratives on notecards and looseleaf papers, but the center also has more than 60,000 physical specimens and 9,000 photographs. Because of the ongoing data rescue needs at the preserve, the Daniel Smiley Research Center has partnered with the Environmental Data Initiative to receive additional support for its data management needs.
Over the summer fellowship, I provided expertise in cleaning and presenting datasets extracted from the digitized records. My first project focused on packaging the amphibian breeding ecology data collected from 1931 to the present. This dataset included more than 2,000 sampling events and more than 150,000 unique individuals across all 9 species. I took this data from multiple Excel sheets, combined them into one species occurrence dataset and one environmental quality dataset, created metadata in the Ecological Metadata Language, and published the resulting data package on the Environmental Data Initiative Repository. In addition to the amphibian data, I am continuing to work on packaging the data from vegetation sampling plots, phenology records, and stream monitoring.
Finally, my role involved developing a strategy for managing the data resources in the archives and helping plan for the data rescue needs. One tool that was extremely helpful in this process was the Data-at-Risk Matrix currently under development by the ESIP Data Stewardship Committee. This matrix helps an individual or organization prioritize the steps in the data rescue process for its holdings by identifying the impacts of different risk factors and how they might be mitigated. At Mohonk Preserve, the matrix provided a way to have structured conversations about the holdings in the library and the steps needed to preserve them. One risk it helped us think through was the lack of metadata and documentation of methods for much of the historical data. Much of the information about specific protocols and methods, including the meanings of codes and locations, was not written down alongside the outputs, which is a serious risk when preparing the data for reuse. Ensuring that this metadata is recorded before it is lost is a top priority for the preserve in the resulting long-term data management plan.
The Environmental Data Initiative Fellowship Program was a great opportunity to connect with ESIP partner organizations while developing my information management expertise. The fellowship opened with an in-person crash course in data publishing, including instruction on cleaning and structuring datasets, on using and creating Ecological Metadata Language, and on publishing data packages in the EDI data repository. This training, combined with the hands-on experience in data management at Mohonk Preserve and other host sites, has helped me define and explore career opportunities in environmental information management roles.
More about Alexis: Alexis is a master’s student in the Department of Evolutionary Biology at George Mason University. She is broadly interested in ecological and evolutionary modeling, legacy data integration, and ecological informatics. Before her master’s work, she was a post-baccalaureate researcher in the Department of Biostatistics at Harvard, and she received a BS in biology from George Mason University in Virginia. Alexis is working with the Data Stewardship Committee.