Guest Blog: The Sprawling, Dynamic and Powerful World of Python for GIS
Python for GIS is not a game of snakes and legends. Geospatial programming is a core set of skills in the Earth Science Information Partners (ESIP) community. In his guest blog, Community Fellow Jake Gearon from Indiana University Bloomington shares his opinion on tools that smooth programming flows and mapping outputs. He focuses specifically on Python for GIS.
Geospatial knowledge is enhanced by our ability to use programming workflows to more deeply understand Earth phenomena like the recent flooding in northwestern Madagascar. Credit: NASA – MODIS, Jan 29, 2023
This post is for everyone, but especially for those who turn to Python first when encountering a problem. The rapid rise of GIS as an in-demand skill set, combined with the ever-steady improvement of programming languages and their ecosystems, has engendered a vast array of tools for the GIS end-user.
While there are more fit-for-purpose GIS solutions, ArcGIS and QGIS are heavily used throughout academia, industry and government roles. Let me be clear: these applications are incredibly useful and should continue to be implemented in workflows! However, as a self-proclaimed Python nerd, I constantly find myself asking how I might circumvent aspects of the GUI or automate common workflows.
There are well-rounded and robust Python APIs for both of these services (ArcPy and PyQGIS). But I found myself wanting more control over the details of operations, especially when it came to data input/output and interfacing with large, cloud-hosted datasets.
So, if this rings true to you, buckle in, let's learn about the world of Python for GIS.
As this blog post is aimed at a more technical crowd, I am intentionally glossing over the need for a package manager like Conda and the indefatigable Jupyter Notebook. If these concepts are new to you, I highly recommend starting there first before jumping directly into Python for GIS. Please note: this is not an exhaustive list, merely a reflection of my personal experiences, workflows and opinions.
Want to learn more about #EarthScienceData tools?
Join one of the ESIP Collaboration Areas!
- Cloud Computing Cluster – Meets monthly for presentations, working sessions, and hands-on tutorials to bring cloud-based science down to Earth.
- IT&I Tech Dive Webinars – News-you-can-use discussions led by the Information Technology and Interoperability Committee alongside the USGS Community for Data Integration.
- Envirosensing Cluster – The ESIP Collaboration Area that Jake supports as a Community Fellow. Fewer #GeoPython talks, but there are many facets to Earth science technology and computing, especially as they relate to real-time sensors.
Python for GIS: Local Ops
GeoPandas is likely my favorite Python package of all time.
It extends the incomparable dataframe utilities of the juggernaut Pandas to geospatial data, natively handling projections and common geometries (point, line, polygon) in GeoDataFrames. It is my first-line tool for any tabular/vector geospatial data (shapefiles, geopackages, CSVs, etc.) and can read and write a shocking number of file types, even the amazing Feather format!
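To make that concrete, here is a minimal sketch of the "table in, projected geometries out" pattern. The station names and coordinates are made up for illustration, and the choice of UTM zone 38S (EPSG:32738) is my assumption for points in northwestern Madagascar, not something a real dataset dictates.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical station table, as it might arrive from a plain CSV.
stations = pd.DataFrame(
    {
        "name": ["station_a", "station_b", "station_c"],
        "lon": [46.3, 47.0, 47.8],
        "lat": [-15.7, -16.2, -14.9],
    }
)

# Turn the lon/lat columns into point geometries and declare the CRS (WGS 84).
gdf = gpd.GeoDataFrame(
    stations,
    geometry=gpd.points_from_xy(stations["lon"], stations["lat"]),
    crs="EPSG:4326",
)

# Reproject to a metric CRS so distances come out in metres.
gdf_utm = gdf.to_crs(epsg=32738)

# Straight-line distance between the first two stations, in kilometres.
dist_km = gdf_utm.geometry.iloc[0].distance(gdf_utm.geometry.iloc[1]) / 1000
print(f"{dist_km:.1f} km between {gdf.name.iloc[0]} and {gdf.name.iloc[1]}")
```

From here, `gdf.to_file("stations.gpkg", driver="GPKG")` would write a GeoPackage and `gdf.to_feather("stations.feather")` a Feather file (the latter needs `pyarrow` installed); both file names are placeholders.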
Rasterio is to raster data what GeoPandas is to vector data. The package reads a raster into a NumPy array, making multi-band array operations like the Normalized Difference Vegetation Index (NDVI), computed as (NIR - Red) / (NIR + Red), astonishingly straightforward.
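As a rough sketch of that NDVI workflow: once the bands are NumPy arrays, the index is a couple of lines of arithmetic. The helper names, the default band numbers (3 for red, 4 for near-infrared) and the tiny test arrays are my assumptions; band order depends on the sensor and on how the file was written, so check your own data.

```python
import numpy as np


def ndvi(red, nir):
    """Return (nir - red) / (nir + red), with NaN where the denominator is zero."""
    red = np.asarray(red, dtype="float32")
    nir = np.asarray(nir, dtype="float32")
    denom = nir + red
    out = np.full(denom.shape, np.nan, dtype="float32")
    # Only divide where it is safe; everything else stays NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def read_red_nir(path, red_band=3, nir_band=4):
    """Read two bands from a raster file (band indexes are 1-based in Rasterio)."""
    import rasterio  # imported here so ndvi() can be tried without Rasterio installed

    with rasterio.open(path) as src:
        return src.read(red_band), src.read(nir_band)


# Tiny made-up reflectance arrays standing in for real bands.
red = np.array([[0.1, 0.2], [0.0, 0.3]])
nir = np.array([[0.5, 0.2], [0.0, 0.1]])
result = ndvi(red, nir)
print(result)
```

With a real multi-band GeoTIFF, `ndvi(*read_red_nir("scene.tif"))` would do the same thing on the full image, where `scene.tif` is a placeholder path.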
Finally, there is Leafmap, a Jupyter-based interactive analysis framework created by Python for GIS guru and community legend Qiusheng Wu. Leafmap puts the pieces together, enabling easy visualization and data I/O. Some of its key features include:
- adding XYZ, web map service (WMS) and vector tile services to the map
- converting CSV to points
- adding local vector and raster data
- performing full-stack geospatial analysis using WhiteboxTools
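A few of those features can be sketched in a single notebook cell. Treat this as an illustration under assumptions rather than a recipe: the map center, the `lon`/`lat` column names, and the WMS URL and layer name you pass in are placeholders, and the method signatures reflect my understanding of Leafmap's default (ipyleaflet) backend, so check them against the Leafmap documentation for your installed version.

```python
# Standard OpenStreetMap XYZ tile template.
OSM_TILES = "https://tile.openstreetmap.org/{z}/{x}/{y}.png"


def build_map(csv_path, wms_url=None, wms_layer=None):
    """Build an interactive map with an XYZ layer, an optional WMS layer, and CSV points."""
    import leafmap  # imported lazily; the map itself only renders inside Jupyter

    m = leafmap.Map(center=(-15.7, 46.3), zoom=7)

    # XYZ tile service.
    m.add_tile_layer(
        url=OSM_TILES,
        name="OpenStreetMap",
        attribution="© OpenStreetMap contributors",
    )

    # Web map service, only if the caller supplies an endpoint and layer name.
    if wms_url and wms_layer:
        m.add_wms_layer(
            url=wms_url,
            layers=wms_layer,
            name="WMS layer",
            format="image/png",
            transparent=True,
        )

    # CSV to points: the file is assumed to have "lon" and "lat" columns.
    m.add_points_from_xy(csv_path, x="lon", y="lat")
    return m
```

In a notebook, `build_map("stations.csv")` as the last line of a cell would display the map; `stations.csv` is a hypothetical file.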
Leafmap Demo, courtesy of Dr. Qiusheng Wu (@qiswqs)
Python for GIS: Cloud-Based Ops (Google Earth Engine)
In my opinion, Google Earth Engine (GEE) currently stands unparalleled in the cloud-hosted GIS space. The sheer amount of compute hours and data-hosting services Google provides for free (at least for noncommercial use) is mind-boggling. We are talking about terabytes or more of datasets hosted and seemingly endless amounts of compute delivered to the research community free-of-charge.
While I personally am not into Big-Tech worship, Google Earth Engine gets a pass; it is remarkable. This is not to say GEE has no competition: up-and-coming platforms, most notably Microsoft's Planetary Computer, are quickly proving their ability to go toe-to-toe with Google's service. However, GEE is the darling of the Earth science remote sensing community, as can be seen through the newly released open-source book: “Cloud-Based Remote Sensing with Google Earth Engine”, a truly herculean communal effort to warehouse tutorials and information on GEE in a FAIR way.
The benefit of working within a cloud-based system like GEE is that you are no longer yoked to the RAM available in your 2018 MacBook Air that you spilled coffee on in college. It is quite simple to do global analyses on massive datasets.
For example, calculating a slope raster from a 30-meter Digital Elevation Model (DEM) for the entire world is not only feasible but trivial in GEE. This is because the good people at Google's engineering team know what they're doing when it comes to parallelizing workflows for inordinately large geospatial datasets. They've been doing it for a while (hello, Google Maps!).
GEE does take some getting used to. However, that start-up inertia is no reason not to learn it: the benefits outweigh the cost, and as the software and its community have matured, GEE has only become easier to use.
This is where geemap comes in. The precursor to the aforementioned leafmap, geemap acts as a high-level Python interface to Google Earth Engine's massive data and compute resources. In my opinion, geemap stands as a triumph of Pythonic open-source software development. Prior to its release, the GEE Python ecosystem was…underdocumented…to put it charitably. Geemap is exceedingly well-documented; check out the tutorial page. One of my favorite use-cases to demonstrate is the ability to quickly visualize and export data right in my notebook environment.
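Here is a hedged sketch of that visualize-and-export loop, reusing the slope-from-DEM example from earlier. It assumes you have an Earth Engine account and have already authenticated; the function names, visualization settings, map center and export scale are my own choices. The SRTM 30-meter DEM used here is near-global rather than truly global, so swap in another DEM if you need the poles.

```python
SRTM_30M = "USGS/SRTMGL1_003"  # SRTM ~30 m DEM in the Earth Engine catalog
SLOPE_VIS = {"min": 0, "max": 60, "palette": ["white", "orange", "red"]}


def slope_map(project=None):
    """Return (map, slope image); slope is computed server-side in degrees."""
    import ee
    import geemap

    # Requires prior authentication; newer setups also need a Cloud project ID.
    ee.Initialize(project=project)

    dem = ee.Image(SRTM_30M)
    slope = ee.Terrain.slope(dem)

    m = geemap.Map(center=(-15.7, 46.3), zoom=7)
    m.addLayer(slope, SLOPE_VIS, "Slope (degrees)")
    return m, slope


def export_slope(slope, filename, bbox, scale=90):
    """Download a small clip of the slope image as a GeoTIFF.

    bbox is [xmin, ymin, xmax, ymax] in lon/lat. Direct downloads are size-limited,
    so keep the region small or raise the scale.
    """
    import ee
    import geemap

    region = ee.Geometry.Rectangle(bbox)
    geemap.ee_export_image(slope, filename=filename, scale=scale, region=region)
```

In a notebook, `m, slope = slope_map()` followed by `m` displays the layer, and `export_slope(slope, "slope.tif", [46.0, -16.0, 46.5, -15.5])` saves a clip; the file name and bounding box are placeholders.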
In conclusion, whether you're working locally or in the cloud, these packages form the backbone of a powerful and versatile GIS workflow in Python. Happy mapping!
Geemap Demo, courtesy of Qiusheng Wu (@qiswqs)
Learn more about our ESIP Community Fellows!
Extra Python for GIS Tools
Pangeo is an open-source project and community that aims to bridge the gap between research scientists and the publicly available data that they use. One benefit is the integration of cutting-edge open-source data engineering tools like xarray and dask. GEE may not always be available, so learning fully open-source alternatives is a good way to future-proof research methods. Check out the Q&A with Pangeo leader Ryan Abernathey.
Planetary Computer is Microsoft's new cloud-based platform for scientific exploration and research. It provides centralized access to planetary data, powerful data analysis capabilities, and APIs for easy integration into research workflows.
openEO is an open-source initiative that provides a standardized and easy-to-use API for cloud-based Earth Observation data processing, making it more accessible to a wider community of researchers, developers, and domain experts. Check out the IT&I Tech Dive Webinar with openEO.
Jake Gearon wrote this blog post as part of his ESIP Community Fellowship. Allison Mills and Megan Carter edited the piece.
ESIP stands for Earth Science Information Partners and is a community of partner organizations and volunteers. We work together to meet environmental data challenges and look for opportunities to expand, improve, and innovate across Earth science disciplines.
Learn more at esipfed.org/get-involved and sign up for the weekly ESIP Update for #EarthScienceData events, funding, webinars and ESIP announcements.