There's an old saying that when the only tool you have is a hammer, everything looks like a nail. I know, it's clichéd and painfully overused, but stay with me here. When it comes to working with scientific data and processing it with a computer and a programming environment, this (clichéd or not) is the situation many young scientists find themselves in. Coming up through academia on a scientific path rather than an engineering or computer science route, we frequently work with data and programming languages without ever having had fundamental training in software best practices. The result is a lot of clunky code, and scientists wasting their time doing things the hard way: using a hammer for every task they need to perform on their data.
The guy who originally came up with the Hammers/Nails metaphor.
Many young scientists find themselves in an even worse situation: they have a thorough theoretical understanding of hammers and how they are manufactured, but nothing to actually bang nails in with except a nearby rock (let's call that rock "Excel"). The result is data analyses that are fragile and hard to reproduce.
How you feel when Excel freezes on your hundred-thousandth row
In my work with remote sensing and geospatial data, and with my interest in finding practical ways to handle the tsunami of data soon to come from unmanned aircraft, I was immediately drawn to the idea of a Software Carpentry Workshop: a place where I could learn the fundamental skills and tools I had missed along the way. With a generous travel grant from the University of Alaska Fairbanks Graduate School, I was able to attend a Software Carpentry Workshop at the University of Washington, and I am excited to report on the experience here.
As a little background, Software Carpentry Workshops are hosted through a partnership between individual organizations (like the University of Washington) and a non-profit, The Software Carpentry Foundation, which works to promote computing skills in the sciences and social sciences. The former provides the facilities, and the latter provides the instructors and lesson plans.
I have to admit that I approached the Software Carpentry Workshop with some trepidation about how useful it would actually be and how much I would learn. I have attended other workshops where I left feeling like I had been "lectured at" without picking up any practical skills. The Software Carpentry Workshop was entirely different. Every topic started from the basics and built up practical skills gradually, with the class writing code and completing exercises on their own computers in real time. Everything the instructor demonstrated was a task we also performed ourselves, which made for lasting memories and a very practical, hands-on skill set.
The workshop itself spanned four sessions: Bash programming, Python, Git and version control, and SQL for databases. These were very well chosen, as each addresses an important practical need in handling scientific data. Bash programming lets you write routines that sift through and organize files in a Unix environment, which is invaluable for simple, repetitive tasks. Python is used for more complicated data analysis and for writing custom data-processing scripts. Git and version control allow a research group to keep track of the latest version of a document or program and to see each other's contributions; think of the Microsoft Office "track changes" feature, but much more robust and powerful. And lastly, SQL is the query language used to work with databases, which organizations rely on to archive and access their data.
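To make the contrast with the spreadsheet approach a little more concrete, here is a minimal sketch in Python of the kind of repetitive chore these tools handle well: gathering every CSV file in a folder, loading the rows into an SQLite database, and then asking a question of the data with SQL. This is not taken from the workshop lessons; the folder layout, the `site`/`value` column names, and the function names are all hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv_files(folder: Path, conn: sqlite3.Connection) -> int:
    """Load every *.csv file in `folder` into a `readings` table.

    Assumes (hypothetically) that each file has `site` and `value` columns.
    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (source TEXT, site TEXT, value REAL)"
    )
    count = 0
    # Only files ending in .csv are picked up; anything else in the folder is ignored.
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                conn.execute(
                    "INSERT INTO readings VALUES (?, ?, ?)",
                    (path.name, row["site"], float(row["value"])),
                )
                count += 1
    conn.commit()
    return count


def mean_by_site(conn: sqlite3.Connection) -> list:
    """Use SQL to average the readings for each site across all files."""
    return conn.execute(
        "SELECT site, AVG(value) FROM readings GROUP BY site ORDER BY site"
    ).fetchall()
```

Run against a folder of a hundred files, this does the same work as a hundred rounds of copy-and-paste in a spreadsheet. Because every step is written down, it can be rerun and checked whenever new data arrive.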
I was impressed not only with how these topics were taught, but also with the selection of those four. Each one gave researchers a new tool to add to their toolkit and use in their own work, and the sessions were mutually independent: our comfort level with one session did not affect our ability to learn in the next. So the fact that I was groggy and chugging coffee during the morning session on Git and version control did not impair my ability to learn about SQL and databases that afternoon.
The Teaching Method
Software Carpentry uses an interactive teaching model with both an instructor and a number of roaming helpers. Each attendee is given two sticky notes at the start: a red one and a green one. As the lesson progresses, attendees who are following along and completing the exercises attach the green sticky note to the top corner of their laptop screen, so the instructor and helpers can see that they are doing okay. If attendees are confused or having trouble with an exercise, they swap the green note for the red one, and a helper comes over to get them unstuck. For me, this was much better than the classroom model, where a student is expected to raise their hand and interrupt the lesson when they are having trouble. A student in a classroom has to weigh whether their question is important enough to disrupt the class, and if they are lost enough, they may feel too embarrassed to call attention to it, only falling further behind in the process. The Software Carpentry model fixes this problem, not only by letting people ask for help without shame, but also by deliberately pacing the lessons on the assumption that people will get confused and time will need to be set aside for that.
“Okay, I'll ask my question, but don't look at me…”
One idea that stayed with me throughout the workshop was how immediately useful this could be to other students and researchers in the sciences. As I thought about it, I realized there are two outlets through which I could help make that happen. First, I would love to bring Software Carpentry Workshops to the University of Alaska Fairbanks, where I am a student. The University could partner with The Software Carpentry Foundation (just as UW did) and host these workshops, potentially as often as once per year. The other possible outlet is the Federation of Earth Science Information Partners (ESIP Federation), where I am a student fellow. A Software Carpentry breakout session could easily be added to one of ESIP's twice-annual meetings. I left very impressed with how practical the workshop was, and excited to apply what I learned in my own research and to teach it to others.
Obligatory “road forward” picture
Now fast-forward six months to today: I am still enthusiastic about Software Carpentry, and I am happy to report that I am currently going through instructor training with the Software Carpentry Foundation to become a session coordinator and instructor. I am also training to coordinate and instruct for Software Carpentry's sister organization, Data Carpentry, which has a similar aim but focuses more specifically on data.
This past week I even had the privilege of attending a Data Carpentry Hackathon in Boulder, CO (hosted by NEON Inc.), where a number of researchers who work with geospatial data gathered to develop a Data Carpentry workshop for this unique data type.
I believe that Software Carpentry and Data Carpentry provide a much-needed service to scientists and students, through both their revolutionary teaching model and their practical lessons. I am thrilled to be getting more involved with these organizations, and I see a great deal of overlap between their missions and that of the ESIP community at large.
More information can be found at http://software-carpentry.org/ and http://www.datacarpentry.org/.