📣 Call for Sessions Send in your proposal by Nov 14 for our 2025 January ESIP Meeting: esipfed.org/jan25-sessions

pysheds: a fast, open-source digital elevation model processing library

pysheds: a fast, open-source digital elevation model processing library

Hey ESIP!

I wanted to share a pet project of mine that could be useful for those of you who work with GIS data (and especially for those of you who work with water data of any kind).

Pysheds is an open-source library designed to help with processing of digital elevation models (DEMs), particularly for hydrologic analysis. Pysheds performs many of the basic hydrologic functions offered by commercial software such as ArcGIS, including catchment delineation and accumulation computation.

I designed pysheds with speed in mind. It can delineate a flow direction grid of ~10 million cells in less than 5 seconds. Flow accumulation for a grid of 36 million cells can be computed in about 15 seconds. These methods can be readily automated and incorporated into web services: for instance, a web mapping service like leaflet.js could use pysheds to generate contributing areas for locations of interest on-the-fly.

The library is available at my github here: https://github.com/mdbartos/pysheds

A full feature list is given below:

  • Hydrologic Functions:
    • flowdir: DEM to flow direction.
    • catchment: Delineate catchment from flow direction.
    • accumulation: Flow direction to flow accumulation.
    • flow_distance: Compute flow distance to outlet.
    • resolve_flats: Resolve flats in a DEM using the modified method of Garbrecht and Martz (1997).
    • fraction: Compute fractional contributing area between differently-sized grids.
    • extract_river_network: Extract river network at a given accumulation threshold.
    • cell_area: Compute (projected) area of cells.
    • cell_distances: Compute (projected) channel length within cells.
    • cell_dh: Compute the elevation change between cells.
    • cell_slopes: Compute the slopes of cells.
  • Utilities:
    • view: Returns a view of a dataset at a given bounding box and resolution.
    • clip_to: Clip the current view to the extent of nonzero values in a given dataset.
    • resize: Resize a dataset to a new resolution.
    • rasterize: Convert a vector dataset to a raster dataset.
    • polygonize: Convert a raster dataset to a vector dataset.
    • check_cycles: Check for cycles in a flow direction grid.
    • set_nodata: Set nodata value for a dataset.
  • I/O:
    • read_ascii: Reads ascii gridded data.
    • read_raster: Reads raster gridded data.
    • to_ascii: Write grids to ascii files.

An overview with examples is given here:

Import modules

import numpy as np
import pandas as pd
from pysheds.grid import Grid
import geopandas as gpd
from shapely import geometry, ops
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')
sns.set_palette('husl')

%matplotlib inline

Instantiate a grid from a DEM raster

Data from USGS hydrosheds project: https://hydrosheds.cr.usgs.gov/datadownload.php

# Instatiate a grid from a raster
grid = Grid.from_raster('../data/n30w100_con', data_name='dem')

# Plot the raw DEM data
fig, ax = plt.subplots(figsize=(8,6))
plt.imshow(grid.dem, extent=grid.extent, cmap='cubehelix', zorder=1)
plt.colorbar(label='Elevation (m)')
plt.title('Digital elevation map')
plt.xlabel('Longitude')
plt.ylabel('Latitude')

Read a flow direction grid from a raster

Data from USGS hydrosheds project: https://hydrosheds.cr.usgs.gov/datadownload.php

# Read a raster into the current grid
grid.read_raster('../data/n30w100_dir', data_name='dir')

Examine grid

# The flow direction raster is contained in a named attribute (in memory)
grid.dir

Raster([[ 64,  64,  64, ...,   1,   2, 128],
        [ 32,  64,  64, ...,   1,   1, 128],
        [ 32,  32,  32, ...,   1, 128,  64],
        ...,
        [128,  64,  64, ...,  16,   8,   4],
        [128,  64,  64, ...,   4,  16,   4],
        [  8,  64,  64, ...,   8,   4,   4]], dtype=uint8)

# There are approximately 36 million grid cells
grid.dir.size

36000000

Specify flow direction values

#N    NE    E    SE    S    SW    W    NW
dirmap = (64,  128,  1,   2,    4,   8,    16,  32)

Examine grid

# Access the flow direction grid
fdir = grid.dir

# Plot the flow direction grid
fig = plt.figure(figsize=(8,6))
plt.imshow(fdir, extent=grid.extent, cmap='viridis', zorder=1)
boundaries = ([0] + sorted(list(dirmap)))
plt.colorbar(boundaries= boundaries,
             values=sorted(dirmap))
plt.title('Flow direction grid')
plt.xlabel('Longitude')
plt.ylabel('Latitude')

Delineate catchment

# Specify pour point
x, y = -97.294167, 32.73750

# Delineate the catchment
grid.catchment(data='dir', x=x, y=y, dirmap=dirmap, out_name='catch',
               recursionlimit=15000, xytype='label', nodata_out=0)

# Clip the bounding box to the catchment
grid.clip_to('catch')

# Get a view of the catchment corresponding to the current bounding box
catch = grid.view('catch', nodata=np.nan)

# Plot the catchment
fig, ax = plt.subplots(figsize=(8,6))
im = ax.imshow(catch, extent=grid.extent, zorder=1, cmap='viridis')
plt.colorbar(im, ax=ax, boundaries=boundaries, values=sorted(dirmap),
             label='Flow Direction')
plt.title('Delineated Catchment')
plt.xlabel('Longitude')
plt.ylabel('Latitude')

Get flow accumulation

# Compute flow accumulation at each cell 
grid.accumulation(data='catch', dirmap=dirmap, out_name='acc')

# Get a view and add 1 (to help with log-scaled colors)
acc = grid.view('acc', nodata=np.nan) + 1

# Plot the result
fig, ax = plt.subplots(figsize=(8,6))
im = ax.imshow(acc, extent=grid.extent, zorder=1,
               cmap='cubehelix',
               norm=colors.LogNorm(1, grid.acc.max()))
plt.colorbar(im, ax=ax, label='Upstream Cells')
plt.title('Flow Accumulation')
plt.xlabel('Longitude')
plt.ylabel('Latitude')

Extract river network

# Extract branches for a given catchment
branches = grid.extract_river_network(fdir='catch', acc='acc',
                                      threshold=50, dirmap=dirmap)

# Plot the result
fig, ax = plt.subplots(figsize=(6.5,6.5))
plt.title('River network (>50 accumulation)')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.xlim(grid.bbox[0], grid.bbox[2])
plt.ylim(grid.bbox[1], grid.bbox[3])
ax.set_aspect('equal')
for branch in branches['features']:
    line = np.asarray(branch['geometry']['coordinates'])
    plt.plot(line[:, 0], line[:, 1])

Get distances to upstream cells

# Compute flow distance using graph traversal
grid.flow_distance(data='catch', x=x, y=y, dirmap=dirmap, out_name='dist',
                   xytype='label', nodata_out=np.nan)

# Return a view of the flow distance grid
dist = grid.view('dist', nodata=np.nan)

# Plot the result
fig, ax = plt.subplots(figsize=(8,6))
im = ax.imshow(dist, extent=grid.extent, zorder=1,
               cmap='cubehelix_r')
plt.colorbar(im, ax=ax, label='Distance to outlet (cells)')
plt.title('Flow Distance')
plt.xlabel('Longitude')
plt.ylabel('Latitude')

Combine with land cover data

Data available here: https://www.mrlc.gov/nlcd2011.php

# Read land cover raster
grid.read_raster('../data/impervious_area/nlcd_2011_impervious_2011_edition_2014_10_10.img',
                 data_name='terrain', window=grid.bbox, window_crs=grid.crs)

interpolated_terrain = grid.view('terrain', nodata=np.nan)

fig, ax = plt.subplots(figsize=(8,6))
plt.imshow(interpolated_terrain, cmap='bone', zorder=1,
           extent=grid.extent)
plt.colorbar(label='Percent impervious area')
plt.title('Impervious area')
plt.xlabel('Longitude')
plt.ylabel('Latitude')

Clip soils shapefile to catchment

Data available here: https://tnris.org/data-download/#!/county/Tarrant

# Read soil dataset
soils = gpd.read_file('../data/nrcs-soils/nrcs-soils-tarrant_439.shp')
soil_id = 'MUKEY'
# Convert catchment raster to vector geometry and find intersection
shapes = grid.polygonize()
catchment_polygon = ops.unary_union([geometry.shape(shape)
                                     for shape, value in shapes])
soils = soils[soils.intersects(catchment_polygon)]
catchment_soils = gpd.GeoDataFrame(soils[soil_id],
                                   geometry=(soils.geometry
                                             .intersection(catchment_polygon)))
# Convert soil types to consecutive integer values
soil_types = np.unique(catchment_soils[soil_id])
soil_types = pd.Series(np.arange(soil_types.size), index=soil_types)
catchment_soils[soil_id] = catchment_soils[soil_id].map(soil_types)

# Plot the result
fig, ax = plt.subplots(figsize=(6.5, 6.5))
catchment_soils.plot(ax=ax, column=soil_id, categorical=True,
                     cmap='terrain', linewidth=0.5, alpha=1)
ax.set_xlim(grid.bbox[0], grid.bbox[2])
ax.set_ylim(grid.bbox[1], grid.bbox[3])
ax.set_title('Soil types (vector)')
plt.xlabel('Longitude')
plt.ylabel('Latitude')

Rasterize soil data in catchment

# Rasterize the soil polygons
soil_polygons = zip(catchment_soils.geometry.values,
                    catchment_soils[soil_id].astype(int).values)
soil_raster = grid.rasterize(soil_polygons, fill=np.nan)

# Plot the result
fig, ax = plt.subplots(figsize=(6.5, 6.5))
plt.imshow(soil_raster, cmap='terrain', extent=grid.extent, zorder=1)
ax.set_xlim(grid.bbox[0], grid.bbox[2])
ax.set_ylim(grid.bbox[1], grid.bbox[3])
ax.set_title('Soil types (raster)')
plt.xlabel('Longitude')
plt.ylabel('Latitude')