
Guest Blog: Open by Design – 5 Lessons I’ve Learned About Open Science

In his guest blog, Anthony Pignatelli shares insights on open science through his experiences as a graduate student at the University of South Carolina and as an Earth Science Information Partners (ESIP) Community Fellow. Anthony has supported the EnviroSensing Cluster this year, and his interdisciplinary research brings together hydrology, ecology and data science.

The last few decades have brought a cultural shift in academia and STEM. In 2024, we have an abundance of open access databases: repositories for data from published papers, such as the one operated by Dryad, and government-funded and operated databases, such as those offered by NEON. Replicability is a key aspect of scientific research and discovery, and the growing amount of freely accessible data helps reinforce scientific advancement.

As an early-stage PhD student in quantitative ecology, I benefit from this wealth of data. Throughout my educational journey and my time as an ESIP Community Fellow supporting the EnviroSensing Cluster, I have learned a lot about what open science really is and how to make my own science more open.

[Image series: landscape photos overlaid with open science slogans – “ENVISION open and accessible SCIENCE,” “Streamline OPEN ACCESS journals & publications,” “Share REPRODUCIBLE SCIENCE through coding & data,” “More with less. DATABASES can be specific,” “Collaboration is key,” and “Encourage citizen science.”]

Lesson 1: Journal Articles & Publications

I want to start with a hot topic in the sciences right now: open access journal articles.

The first time I ever read a scientific journal article was during my undergraduate education. The paper was provided by my professor in a biology course. Reading it made me want to learn more, so I googled the topic to find more papers, which led me to Google Scholar and Web of Science (www.webofscience.com/wos/woscc/basic-search). It also led me to paywalls.

Luckily, my university paid for access to the specific journal I needed, but it made me wonder: what if you are not a student or employee at a university? If science is supposed to benefit society and inform policymakers, why can’t the average person easily afford or gain access to a particular journal?

Another concern is the price of open access. Many journals charge high fees, usually in the thousands of dollars, to make a paper open access rather than publishing it behind a subscription. Some journals have begun lowering these fees or forgoing them altogether, which helps graduate students and early career scientists who might not have the grant support to cover open access charges. Based on my experiences, my goal is to publish in open access journals, and I have become more conscious of which journals I want to publish in. If we want more young people to stay in academia and train the next generation of scientists, then we need to make publishing less financially draining and less stressful for early career folks.

Lesson 2: Coding for Reproducibility

Like many ecologists, I primarily use R (https://www.r-project.org/) in my research, an open source language for coding that can handle large datasets and perform a wide range of statistical analyses. The language itself is easy to learn and straightforward to read, and its file formats make scripts easy to share with colleagues. Anybody can write code, but how many can re-read it later and understand what is going on?

When I first began coding in a quantitative biology class in undergrad, my R scripts were a chaotic mess. Line spacing and functions were all over the place and usually only made sense in my head. As I worked through my master’s and then started a PhD program, I realized I needed to be more aware of how I write code. At the University of South Carolina, my advisor, Tad Dallas, teaches a class on coding and open science called Ecoinformatics (https://ecoinformatix.github.io/), which he makes publicly available.

The biggest thing I have learned so far is the importance of annotating my code, especially functions. In R, you can write your own functions to perform analyses. While a function I wrote makes sense to me, if I do not document what it does, how can I expect someone else to know? When writing a function, you need to explain the arguments and parameters involved. Simple documentation of your code can make it much easier for others to replicate what you did. Looking back at the code I wrote during my undergrad and master’s, I realize I was not applying this simple practice, and even code I wrote a year or two ago makes almost no sense to me.
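
To show what I mean, here is a minimal sketch of a documented R function. The function name and numbers are made up for illustration, not taken from my actual analyses; the point is simply that the arguments, the return value and each step are spelled out so someone else (or future me) can follow it.

# compute_shannon(): Shannon diversity index (H') for one sampling event.
# (A hypothetical example function used here only for illustration.)
#
# Arguments:
#   counts - numeric vector of individuals counted per taxon, e.g. c(12, 4, 30)
#   base   - logarithm base for the index; defaults to the natural log
#
# Returns:
#   A single numeric value, or NA if the sample contains no individuals.
compute_shannon <- function(counts, base = exp(1)) {
  counts <- counts[counts > 0]         # drop absent taxa so log() is defined
  if (length(counts) == 0) return(NA)  # guard against empty samples
  p <- counts / sum(counts)            # relative abundances
  -sum(p * log(p, base = base))        # Shannon's H'
}

# Example: diversity of a hypothetical macroinvertebrate sample
compute_shannon(c(12, 4, 30, 0, 7))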

The takeaway here – Document your code!

Lesson 3: Databases (So close yet so far)

Open access databases with large amounts of usable data can be difficult to come by, and the ones that do exist hold very different types of data.

GBIF, for example, contains mostly presence and absence data for various taxa collected by researchers across the globe, while NEON, funded by the US government, offers datasets of regularly sampled abiotic and biotic parameters, ranging from water quality and meteorological data to organismal data such as aquatic macroinvertebrates and terrestrial insects. The hard part of doing quantitative ecology is finding out whether the data exist to answer the question I am asking, and at what resolution and duration they were sampled. NEON data only go back to 2014, so I cannot really ask decadal-scale questions with them. Datasets in Dryad, Figshare or GBIF may only contain data for a couple of years.
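
As a rough illustration of that kind of check, this is how I might probe GBIF from R before committing to a question. It uses the rgbif package with a placeholder species name, and which columns come back depends on the records themselves, so treat it as a sketch under those assumptions rather than a recipe.

# Sketch: check the temporal coverage of GBIF occurrence records for a taxon
# before building a long-term analysis around them.
library(rgbif)  # R client for the GBIF API

# Placeholder taxon; swap in whatever group you actually study.
occ <- occ_search(scientificName = "Daphnia pulex",
                  hasCoordinate = TRUE,
                  limit = 500)

records <- occ$data
# If a "year" column is present, its range hints at whether the records can
# support decadal-scale questions or only a couple of years of sampling.
range(records$year, na.rm = TRUE)
table(records$year)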

Since I am interested in ecological community change over time, having long-term datasets is crucial. I am fortunate that the University of South Carolina (USC) has the Baruch Marine Field Station, which maintains a 40-year time series of zooplankton community data sampled every two weeks. With this resolution and duration of sampling, I can get at long-term trends in community dynamics. But if I did not attend USC, I might never have known about this dataset, as it is not readily available.

New databases are popping up every year. For example, in the next year or two a new database for freshwater ecological stoichiometric data will become available. This is made possible by an NSF-funded group, the Ecological Stoichiometry Cooperative, which I was part of during my master’s program. The goal of this cross-institution, interdisciplinary group was to compile all known available datasets containing the elemental content of freshwater organisms into one open access database.

In a way, this might not be so different from something like Dryad or Figshare, but the idea here is that this database is subject-specific. I would know to go to it if I were looking at nutrient dynamics in freshwater systems. Sometimes less is more, and streamlining data accessibility could mean smaller, more targeted collections.

The ESIP community has called for and supported domain-specific repositories for niche communities for decades. Coverage is not on equal footing across scientific communities, however. For example, terrestrial data are more prevalent and have been collected for longer than aquatic data.

Lesson 4: Collaboration is Key to Open Science

Globalization has opened new avenues of collaboration between researchers. Research groups can now span multiple cities, states and even countries. Being able to communicate with people great distances away in seconds has grown our circle of networking. Sharing research no longer has to be restricted to journal publications or catching up at annual conferences; instead, collaborations are just an email or Zoom call away.

In graduate education, it is not uncommon to move away from home, or even to a new country. My educational journey has taken me from Pennsylvania to Ohio to Arkansas, and now to South Carolina. Along the way, I have widened my network of colleagues and have been able to meet people outside of my academic field.

Collaboration fuels interdisciplinary research. For example, the Ecological Stoichiometry Cooperative is not made up of just freshwater ecologists, but includes statisticians, evolutionary biologists and even artists and art historians. In addition to this breadth of subjects, the group is spread across five universities in Alaska, Wyoming, Nebraska, Arkansas and Vermont. This brings different perspectives to bear on ecological questions.

ESIP is another great example of collaboration across subjects and geographies. I am the Community Fellow for the EnviroSensing Cluster. Something we have been discussing recently is finding new ways to broaden our outreach by collaborating with the Education Cluster. The Cluster has also been developing environmental sensing kits that collect basic data such as temperature and soil moisture. These kits are being tested with the goal of putting the equipment in the hands of citizen scientists. This is a great example of bringing together people of different backgrounds to exchange ideas.

Lesson 5: Coming down from our Ivory Towers

Community outreach is important for building trust. But you know the stereotype: the absent-minded professor so consumed by their research that they don’t know how to talk to anyone else.

A problem that academics, myself included, have is that we are consumed by jargon. In journal publications, it almost seems like a competition over who can out-jargon whom. Each niche subject has its own set of jargon, and some terms mean something completely different in other fields. While speaking in jargon to fellow researchers can speed up collaboration, it is certainly an issue if we want the general public to understand our work.

Pop science writing is something I have seen more and more of over the years. I even remember an assignment in an undergraduate climate change class where we had to take a journal article and summarize it without the jargon. We can learn from news articles that try to summarize often confusing and complex studies for the public. As researchers and data managers, we need to remember that much of the work we do will affect other people.

How do we do it? More public engagement. Here at USC, we host the STEM Heroes fair each year. It is a chance for K-12 students to engage in fun scientific activities they might not get in school, and I enjoy engaging younger students who might be inspired to pursue a research career.

My journey through academic research has shown me how important it is to make science open and accessible. These five lessons are some of the biggest I have learned over the years, and I encourage readers to follow them and to think of new ways to make science open. From publications to public outreach, there is much more we can do to bring open science practices into ecology, Earth science, informatics and data science.