Guest Blogger
Submitted by administrator on Thu, 10/06/2011 - 09:18
Editor's note: This post was submitted by Brian Wee, NEON as a guest post from a Type I ESIP. Thanks Brian! NEON has a blog too.
A fair number of us have spent at least parts of our professional lives fretting over data representation. How do we accurately capture real-world processes in a way that preserves the relationships between entities, and manage the data such that it is amenable to processing? Can we represent the state of ‘reality’ with sufficient fidelity that we can resuscitate a representation of that reality on demand? How do we enable technical interoperability to facilitate data discovery and retrieval, semantic interoperability to facilitate data integration, plus all the other forms of interoperability that one may choose to define?

In the sci-fi movie “Another Earth” that was released at the 2011 Sundance Film Festival to high acclaim, a duplicate earth appears in our skies, apparently populated by human duplicates of this earth. If not for the moral implications (even if there were no humans!), an earth replica would be excellent for conducting large-scale experiments to answer questions about climate mitigation, adaptation, and geo-engineering. In the absence of such a capability to create Earth 2.0, we are nevertheless on a long-term trajectory towards observing and virtualizing the environment at different geospatial scales: from the soil microbial community, to intensely measured sites, to high-resolution landscape-scale measurements, to global satellite-based observations. The NSF NEON infrastructure, currently under construction, encompasses observations at microbial to landscape scales (with over 500 primary measurements acquired at each of 60 sites), using satellite-based observations for the interpolation of variables of interest across the US continent.
Reality has been happily humming along just fine for eons; the difficulty lies in modeling it at the relevant scales of time and space. This is especially challenging when developing models to integrate natural processes operating at various temporal and spatial scales. When we measure and capture environmental data, are we doing it at the relevant temporal and spatial resolution? How do we formulate archival policies that do not inadvertently eliminate data that may at first seem irrelevant but that we later discover to be useful? The business and military intelligence community invests resources to deal with unstructured data because reality is inherently messy and there is a great need to obtain analyses and assessments in a timely fashion: to what extent are these approaches applicable to observational and experimental environmental data?
This community is, knowingly or not, involved in the business of virtualizing reality. In the Tron universe, “Users” were digitized into a virtual world by a laser (my favorite Tron object: the troop carrier!). This community is collectively building an equivalent instrument that comprises physical infrastructure, cyberinfrastructure, standards, applications, tools, and best practices to digitize slices of the “reality cube” (with reference to spectral data cubes and the proposed NSF Earth Cube) into a form that can be manipulated for scientific understanding and forecasting. At the end of the day I would really like to see us build something (with a benign MCP please... we know what happened in the Tron universe...) that will give us a virtual troop carrier for cruising around on Earth 2.0!
Submitted by administrator on Tue, 01/05/2010 - 16:01
The opening plenary of the ESIP Winter meeting in DC is just underway. Listening to Michael Freilich, the director of NASA's Earth Science division. He's talking about how the measurements we make about the Earth need to inform the models that can tell us the longer-term picture of the Earth's climate. Sea level rise is a combination of adding water and heating the water (about half from each today). "Snapshots for most earth systems don't work." We need longer-term measurements. NASA makes its data available freely. Recently, the European Space Agency has also been moving to the NASA position.
Helen Wood, Senior Advisor for Satellite and Information Services at NOAA, takes the podium. She mentions that the NASA open data policy has helped NOAA to open up its data policies. NOAA is looking to build a National Climate Service that can pull together all of NOAA's efforts in this area. NOAA is also interested in sustainable fisheries and sustainable coastal communities, as well as weather forecasting and science.
Pai-Yei Whung, Chief Scientist at the US EPA, has stepped up to the podium. The EPA needs to quantify the impacts of its regulations to assure that they are beneficial for society. Moving from data to information (through tools) to decisions and assessment requires the best available science practices, including observational data analysis. In the Midwest there were two 500-year floods in 15 years (1993 and 2008). Increased water flows can cause sewer systems to fail. The increased likelihood of recurring sewage incidents in the water system due to climate change may require new standards for sewer systems. Dr. Whung notes that "Community is KEY" to the AirNOW effort at the EPA. What are the emergent Internet-based tools that would help the EPA to grow this community effort?
Bryant Cramer, Associate Director for Geography, USGS, has stepped up to the podium. He notes that it is the uncertainty in the physical models for the future that earth science decision makers need to manage to be able to drive processes/policies such as carbon cap-and-trade. The critical first step is to take earth system models to a new level of certainty. He also noted that when USGS stopped charging for Landsat scenes the amount of data delivered skyrocketed.
Written by Bruce Caron
Submitted by administrator on Thu, 12/10/2009 - 23:00
Tom Cheatham (U. of Utah) and Tim Clark (Harvard Medical School), the final speakers at the IEEE escience 2009 meeting at Oxford University, present the two sides of the escience spectrum. Cheatham’s work on bio-molecular modeling can consume as much HPC and advances in programming as the planet will permit, and then demand more. The outputs from his model runs could fill high-speed fiber for weeks. His graduate students abandon terabytes of information (soon to be petabytes) when they move on. Clark’s work on collaborative publishing, with semantics under the hood to resolve the complex structured tagging that can make this cross-community, involves scientists using web services to alter their workflow in ways that make their output far more discoverable, usable, and available to non-scientists. In the middle of the two projects are computer scientists solving real problems and developing the requisite standards. Both speakers noted the need for multidisciplinary teams to keep their work nimble and linked to the larger picture of medical knowledge. Because medical knowledge, from new pharmaceuticals to new treatment plans, can save lives, the resources devoted to this are time sensitive. Cheatham looks to extend the time-scale for modeling organic molecular interactions, and so foster new information on drug interactions. Clark looks to socially network a range of medical sub-disciplines to quicken the pace of knowledge transfer with these communities. Without knowing it, they are working in concert on two ends of the same problem. The technical and the social sides of escience are like the two strands of the DNA helix; we need them both. In the world of funding, big iron HPC infrastructure may garner the most funding today. But the value of these investments can only be realized when agencies also devote sufficient funds for the software and theoretical/practical tasks of rebuilding science as a social knowledge engine.
This is something NASA discovered when it first funded the ESIP Federation. It's been great seeing the many sides of eScience. Thanks for reading! Love to get your comments. Bruce
Written by Bruce Caron
Submitted by administrator on Wed, 12/09/2009 - 23:00
This data-intensive science paradigm is also a feature of the emerging datafullness of the object of study. Satellites and sensorwebs, CCTVs and Streetviews, MRIs and CAT scans, Facebook and YouTube: what we study is no longer data poor, but increasingly data-full. The question is no longer one of how to scrape up enough data to create a study, but rather how to winnow the emerging data deluge. Sociologists can no more ignore the data available from online social networks than meteorologists can ignore an emerging Mid-Atlantic tropical depression. In his talk at the IEEE eScience meeting, Jeff Dozier also mentioned that earth sciences are entering a new task horizon. In the 1800s-1900s, the earth sciences were discipline-oriented sciences. From the 1980s onward we saw the development of earth system science. Emerging now: earth knowledge in service of policy to address planetary risks, such as climate change. The eScience challenges are many here. The increase in observational data makes it possible to refine the resolution of climate models, which pushes the limits of available HPC resources. The data processing algorithms designed for science must be made robust enough to sustain resource and environmental enforcement decisions. New venues for communication between scientists, data providers, and policy decision makers need to be supported and used. This is a real opportunity for organizations such as the ESIP Federation to become active forums for problem solving. Microsoft Research’s 4th Paradigm ebook is available under a CC license here: http://bit.ly/5fs21q
Written by Bruce Caron
Submitted by administrator on Tue, 12/08/2009 - 23:00
This is a note about meeting organization. The All Hands meeting was (and I would guess the IEEE meeting will be) a standard meeting type (too many plenaries, breakouts are PPT frenzies, people trying to network in the 15 minute breaks). 300 e-science experts listening (emailing) while a panel of four experts talk about a ten-year plan for e-science. How much better it would have been to set up 40 tables and have ten times the discussion and perhaps a chance of 10 new ideas? This room is ripe for a charrette! Also... a twitter-stream on the two video panels would have livened up this place. The four panelists have a median age of (I guess) 69. Not that that’s bad, per se. However, there is a decided lack of young turks in this discussion. The moderator asked them another question... they are rolling through this exercise while the room nods off. It is 5:46 and we’ve been at this since 9 am. Question to the panel: The incentive models for sharing are not there. How do we change the social/cultural ways of doing science? Answer: we do need to change this culture. But there is a long road to get there. We have seen a rapid change to data sharing in the life sciences. The implication is that other disciplines can follow the lead of the life sciences. Back to meeting notes. As we sit here, the Copenhagen climate meetings are happening, and we all need to be aware of the impacts of travel on our carbon footprints. At the same time, we need to remember the value of face-to-face interactions. Face-to-face meetings need to change to reflect and attain their real value. The ESIP Federation takes this very seriously, and has an opportunity to be a leader in meeting technologies.
Written by Bruce Caron
Submitted by administrator on Mon, 12/07/2009 - 23:00
Software as a Service and Software as a Science: keynote by Tony Hoare (Microsoft scientist from Cambridge). http://research.microsoft.com/en-us/people/thoare/ The e-Science effort in the UK was to ensure that digital information technologies would have as great an impact on the practice of science as they were having in telecommunications, entertainment, and other aspects of society. In the human genome project, the people who were funded did not promise to cure a single patient in the first 15 years. The notion was that the overall knowledge gain was so significant that future advances in medical knowledge would ensue. In the same way, the growth of digital tools in science will not necessarily pay off in the short term, but will build, over time, those new tools that will move science to a new level of capability. The computer engineers who are engaged in e-science research are not just of service to “real scientists” but are also engaged in a real engineering science. And so Professor Hoare argues that the software products are not just a service to others but also the outcome of a science as “real” as chemistry or physics. Having browsed the booths and the breakouts, I can say that the entire meeting, 600 people talking and listening for 5 days, rolls on three wheels: high performance computing (and pooled data storage), and the means to distribute this capability for scientists in multiple locations; science tools and services built on top of this data/computing network; and collaboration practices that promote and manage a range of sharing from data sharing, to shared experiments, to the (open access) publication of results. The engineering of the HPC infrastructure and the building of the services on top of these are not the real transformative levers of e-science.
They mostly add efficiency and distribute resources more widely, so that science does not need to happen in a few concentrated locations (research labs at selected universities and corporate locations). This distribution of effort extends regionally, and eventually, globally. But this capability and the tools that allow its use replace similar tools that scientists at selected universities already use. The promise of new collaboration practices is where e-science has the potential to transform science in ways that are both intended and unintended. Last evening after dining at “high-table” at Christchurch College, I had a spirited conversation with a fellow on the phenomenon of Wikipedia. He was astonished by the amount of trust that users had in the quality of Wikipedia. I countered that the main value of Wikipedia was its ability to cover an amazing number of topics, far more than any previous encyclopedia. The real value of Wikipedia was its range, I proposed. This value was achieved the only way possible: by reinventing the role of the author/editor. Similarly, e-science will gain its promise only when it reinvents what it means to do science: who can do it; how it’s reviewed; where it’s published; how it’s used. Very little of this promise will simply grow from improvements in HPC and tools. Much of this will emerge as new users and new collaborative opportunities arise.
Written by Bruce Caron
Submitted by administrator on Sun, 12/06/2009 - 23:00
Anne Trefethen from Oxford is opening up the All Hands e-Science meeting. 186 submissions for presentations show the growth of interest and activity in the UK for e-Science research and practice. The meeting is on the outskirts of Oxford, at the football (soccer) stadium conference center. Next door (across the parking lot) is a bowling alley and multiplex cinema. No building older than 50 years anywhere in the vicinity. So the location looks more like Oxnard than Oxford. The crowd is appropriately geeky in an academic fashion. The opening keynote speaker (Helen Bailey) is a dancer, talking about e-Science in practice-led research. Where does e-Science lie in the larger field of technology? Is it simply science research informatics? Is it centrally HPC? Is it science 101 (hint... ASCII)? The “e” stands for “electronic,” an extension from e-mail and/or e-commerce; both of the latter refer to internet-enabled transactions. Much of the “e” in e-science involves the use of networks of computers to enable collaborations across locations. The research “transactions” flow beyond single laboratories/universities. Helen Bailey uses e-Science to build co-located dance performances where there are dancers from multiple locations in a single dance arena (using video feeds). This research focuses on the synchronous capabilities of an HPC network to support multiple video feeds in order to assemble a real-time event. Helen’s website: http://www.beds.ac.uk/departments/pae/staff/helen-bailey Photo Credit: http://www.arts-humanities.net/system/files/images/edance.jpg
Written by Bruce Caron
Submitted by administrator on Sun, 12/06/2009 - 23:00
Tom Rodden is looking at the history of e-Science, moving from infrastructure to collaborative tools (e.g., MyExperiment). After all, the digital world is in the foreground of their lives. 1.5 billion Internet users in 2010. The more that our lives are performed on digital platforms, the larger the footprint we leave. Google uses this footprint to target advertising. The next stop is a ubiquitous computing lifestyle. How then do we build a contextual footprint as a conscious activity? Computers will be able to sense human activities and use this sense to enable new forms of interaction. Some gathered quotes: “Half the world’s people have never made a phone call: 1990s.” “Half the World will use a Mobile phone by 2010.” “By year end 2012, physical sensors will create 20 percent of non-video internet traffic.” (Gartner group). Mobile phone use becomes a means of credit rating in countries with little credit history. Tom looks at the technology of amusement parks, where research is creating “fear sensors” that help park rides maintain an optimal amount of terror for each customer. Digital location services will help people find and share transportation services in real time. When DARPA released 10 red balloons, the main challenge was to create the reward system to get enough people to work together. Crowd sourcing: ReCaptcha and the search for Steve Fossett are examples of crowds enlisted for a common good. These are just the beginning of public engagement in digital crowd activities. As we become ever more embedded in digital activities, we need to remember: “What matters is not technology itself, but its relationship to us.” Mark Weiser and John Seely Brown (1996). Rodden is wary of the imbalance of knowledge/power when digital services can collect an ever widening swath of information about our human endeavors. How do we track this information flow? How do we resist?
Written by Bruce Caron
Submitted by administrator on Sat, 11/21/2009 - 23:00
In a couple of weeks I will be off to Oxford, England for the All Hands eScience and IEEE eScience joint meeting. I'm looking ahead to blogging and Tweeting about what is happening there. I would guess that most of the ESIPers will be headed west to the AGU meeting in San Francisco. eScience is a big topic, and it covers a lot of ground, from informatics to the governance of virtual organizations. A lot of ESIPs are already supporting eScience through the ESIP WIKIs and the SOAP services they provide for data access and manipulation. So... what's next in eScience? That's what I'm looking for. John Wilbanks from the Science Commons had a great quote recently: 'If we can lower the cost of failure and increase the interconnection and discoverability of the things we actually know, it's one of the only non-miraculous ways to systematically increase the odds in our favor to discover drugs, understand climate change, and generally make good choices in a complex world.' http://bit.ly/70kZvm
Written by Bruce Caron