
What We Wish We’d Learned in Grad School

In this post, ESIP Community Fellows Ben Roberts-Pierel, Ellie Davis Pierel, and Yuhan Rao share highlights from their highly successful 2020 ESIP Summer Meeting session, entitled “What We Wish We’d Learned in Grad School.” The group continues to engage in efforts to raise awareness about the need for data management training at the graduate school level and invites you to share your input.

When the inkling of an idea starts to take shape, it can be difficult to anticipate the final, or even intermediary, result. Even more difficult to anticipate is the reaction from the community. Did Frankenstein set out to create a piecemeal humanoid that would be called a monster? No, his goal was to create life. The journey to create it, the final result, and its reception were likely unforeseen in the scientist’s original project plan. While we certainly have not done any grave robbing and hopefully pitchforks are not necessary, the results of the “What We Wish We’d Learned in Grad School” session held at the 2020 ESIP Summer Meeting have surpassed our wildest dreams.

In the beginning, we identified a serious gap in the traditional earth science graduate school experience: a major lack of formal data management training. After a survey of our peers, we found that most had the same experience. Our first exposure to thorough data management and the data management lifecycle came from ESIP. As ESIP Community Fellows, we all felt that data management is a critical skill in earth science and we all wished we had an opportunity to integrate it from the beginning in our graduate school experience.

We immediately identified a number of hurdles:

  1. It can be difficult to get graduate students to care about something that does not obviously impact completion of their degree. 
  2. There are lots of resources to train people in data management but they do not necessarily fit directly into the conventional academic timeline of a graduate student. 
  3. The resources that exist can be slightly overwhelming to a data management newcomer.

We knew we didn’t need to reinvent the wheel of data management training but we did need to create a roadmap for new graduate student drivers. We then asked ourselves, “Who better to help us overcome these challenges than the creative and knowledgeable ESIP community?” And that is how our ESIP Summer Session was born.

During the session, we asked participants to identify a section of the DataONE Data Lifecycle they would like to work on in a Steve Diggs-style do-a-thon.

Each group then answered three questions:

  1. Why is this part of the Data Lifecycle important to graduate students?
  2. How can this be integrated into the conventional graduate school timeline?
  3. What are the resources graduate students can use for this part of the Data Lifecycle?

We had participants engage with these questions in an iterative process, in which multiple groups reviewed each set of questions and answers as a peer-review and gap-filling exercise.

The session yielded a lot of valuable content, and a few themes emerged:

  1. Start early in a graduate student’s career. As we noted previously, it can be difficult to get graduate students to engage with something seen as peripheral to their main academic track, so getting buy-in early and convincing students of the longer-term value to their education and careers is important.
  2. Integrate training into existing curricula or requirements wherever possible. Finding natural synergies between existing student timelines and data management training will increase the chance of student participation. A short lesson in existing courses or orientations and a data management timeline in a graduate student handbook could cover much of this introductory material and bring it into the standard workings of graduate programs. 
  3. Think about hosting and dissemination and how best to reach the audience with outputs. We discussed where these resources should be archived but, maybe more importantly, best practices for getting them in the hands of people who can use them, such as graduate program directors and university librarians.

The project continues to grow (it was even included in the FSCI do-a-thon!) and we are now working to put the ideas, resources, and feedback together into a Graduate School Data Management Training Roadmap for Earth Sciences. Our hope is that this resource can then be distributed, along with example data management webinars, to earth science graduate programs.

Want to provide ideas or resources? Contribute to the documents linked below that we created during the do-a-thon. These sections and definitions were adopted from the DataONE Data Lifecycle to provide continuity between existing data management frameworks and the graduate student roadmap.

  • Plan – “Description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime.”
  • Collect – “Observations are made either by hand or with sensors or other instruments and the data are placed into digital form.”
  • Assure – “The quality of the data are assured through checks and inspections.”
  • Describe – “Data are accurately and thoroughly described using the appropriate metadata standards.”
  • Preserve – “Data are submitted to an appropriate long-term archive (i.e. data center).”
  • Discover – “Potentially useful data are located and obtained, along with the relevant information about the data (metadata).”
  • Integrate – “Data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed.”
  • Analyze – “Data are analyzed.”

In many ways, Frankenstein’s Monster and the Graduate School Data Management Training Roadmap have a lot in common. They are not made up of anything new, but are simply a recombination of elements that make an idea come to life.