The Community PROV Challenge is an effort to find creative solutions to Earth data challenges through ideation and prototype development.

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. Source: https://www.w3.org/TR/prov-overview/

Use and adoption of the W3C-PROV standard have blossomed over the last few years. The US Geological Survey (USGS) is interested in exploring how these new community-based provenance and annotation capabilities enhance scientific integrity throughout the data lifecycle. To do this, the USGS has partnered with the ESIP Lab to find creative solutions to PROV challenges.

To learn more about the motivation behind the challenge, see Sky Bristol's blog post.

The first stage of the Community PROV Challenge focused on ideation. Individuals contributed ideas to the IdeaScale platform answering the question, “How would YOU improve how W3C-PROV systems interoperate across agencies or institutions to enable a more complete picture of provenance?” Ideas were then voted and commented on, creating further dialogue around the challenge question. A follow-on breakout session was convened at the ESIP Summer Meeting in June 2017 in Bloomington, IN. The author of the idea that received the most ‘up votes’ received paid travel to the 2018 ESIP Winter Meeting.

Outcomes:

  • 17 Ideas Posted
  • 27 Comments
  • 35 Votes
  • 41 Individual Participants

Most popular idea: Visualization of Provenance Traces

Submitted by Tom Narock: Some simple visualization tools that will graphically show the lifecycle of a dataset would be very helpful. I'd suggest a web-based visualization service (perhaps using D3) that can aggregate related PROV and visualize the resulting data lifecycle. The service would dynamically generate a list of all datasets it knows about, users would select one, and the service would visualize the provenance. Additional features might include the ability to graphically compare two or more provenance traces and highlight differences.

How exactly this is implemented is going to depend on the underlying inter-agency architecture and structure of the provenance documents.
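As a rough illustration of the idea above, the sketch below shows one way a service might trace a dataset's lineage and compare two provenance traces before handing the result to a visualization layer such as D3. It is only a sketch under stated assumptions: the triple format, the `ex:` identifiers, and the function names `lineage` and `diff_traces` are hypothetical and not part of any submitted prototype. Only the relation names (`wasGeneratedBy`, `used`, `wasDerivedFrom`, `wasAssociatedWith`) come from the W3C PROV vocabulary.

```python
from collections import defaultdict

# Assumption: each provenance statement is a simplified
# (relation, subject, object) triple. Real PROV documents
# (PROV-JSON, PROV-O, PROV-N) carry more detail than this.
LINEAGE_RELATIONS = {"wasGeneratedBy", "used", "wasDerivedFrom"}


def lineage(statements, dataset):
    """Return every entity or activity upstream of `dataset`."""
    upstream = defaultdict(set)
    for relation, subject, obj in statements:
        if relation in LINEAGE_RELATIONS:
            upstream[subject].add(obj)

    seen = set()
    stack = [dataset]
    while stack:
        node = stack.pop()
        for parent in upstream.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def diff_traces(trace_a, trace_b):
    """Split two traces into shared statements and statements unique to each."""
    a, b = set(trace_a), set(trace_b)
    return {"only_in_a": a - b, "only_in_b": b - a, "shared": a & b}


# Hypothetical traces for two versions of the same dataset.
trace_a = [
    ("wasGeneratedBy", "ex:streamflow_v2", "ex:qc_run"),
    ("used", "ex:qc_run", "ex:streamflow_raw"),
    ("wasAssociatedWith", "ex:qc_run", "ex:analyst"),
]
trace_b = [
    ("wasGeneratedBy", "ex:streamflow_v2", "ex:qc_run"),
    ("used", "ex:qc_run", "ex:streamflow_raw"),
    ("used", "ex:qc_run", "ex:rating_curve"),
]

ancestors = lineage(trace_a, "ex:streamflow_v2")
differences = diff_traces(trace_a, trace_b)
```

Here `ancestors` holds the nodes a lifecycle view would draw upstream of the selected dataset, and `differences` holds the statements a comparison view could highlight. How the traces are fetched and aggregated would still depend on the inter-agency architecture.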

The ESIP Lab sought proposals from qualified teams to develop, extend, or fully test a prototype community-mediation capability for provenance and annotation generated and exposed by disparate sources but summarized, synthesized, or distilled into tractable forms for community use. Prototypes were based on a distributed system where organizations, from data centers to analytical labs, have different means of and underlying technologies for generating and storing PROV and annotation, but expose compatible APIs that follow a constrained set of standards and/or conventions using the W3C specifications. The ESIP Lab awarded one project $15,000 to create a prototype solution that was presented at the 2018 ESIP Winter Meeting.

Outcomes:

To culminate the Community PROV Challenge and create a bridge to ‘what's next’ in the world of improved provenance and annotation at an interagency level, we are organizing an in-person synthesis working group. The workshop will be held at the eScience Institute at the University of Washington, March 27 – 29. Partnering with eScience for this workshop will enable unique interactions between workshop participants and eScience, whose cross-disciplinary mission is to engage researchers across disciplines in developing and applying advanced computational methods and tools to real-world problems in data-intensive discovery.

 

For questions about the Community PROV challenge, please email annieburgess@esipfed.org.