In our previous blog on Geospatial Workflows, we went through the process of creating geospatial workflows to estimate a city’s population.
To summarize, here are the steps we followed in a Jupyter Notebook to estimate the population of Fort Portal, Uganda's tourism town, at a 100x100m granularity:
- Load GeoJSON and population data
- Polygonize: Use Python to convert the raster into vector polygons
- Gridify: Break the tiles down into a 100x100m grid
- Load building footprints dataset: Buildings are used as a proxy for the human population
- Compute building statistics for each grid cell (a minimal sketch of this step follows the list)
- Remove extra grids and adjust the population
- Compute population at the grid level
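To make the building-statistics step above concrete, here is a minimal sketch, not the original notebook code, of counting how many building footprints intersect each grid cell with a GeoPandas spatial join. The file names are illustrative, and both layers are assumed to be in the same projected CRS (e.g. EPSG:32636 for Uganda):

```python
# Minimal sketch: count building footprints per 100x100m grid cell.
import geopandas as gpd

grid = gpd.read_file("fortportal_grid_100m.geojson")       # hypothetical grid layer
buildings = gpd.read_file("fortportal_buildings.geojson")  # hypothetical footprints layer

# Spatial join: attach the index of the grid cell each building falls into.
joined = gpd.sjoin(buildings, grid, how="inner", predicate="intersects")

# Aggregate to per-cell counts and attach them back to the grid.
counts = joined.groupby("index_right").size().rename("building_count")
grid = grid.join(counts).fillna({"building_count": 0})
```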
Implementing the principles of spatial data science and following these steps, our map looked like this:
Scaling the Solution
Let’s imagine you want to find the population of 100 cities with the help of Geospatial AI techniques. Executing the commands one by one in a Jupyter Notebook would be a Herculean task, right? Maintaining a separate copy of the notebook for each city is just as tedious. Can we automate this and make our workflow better? That is what we will explore in this post. First, let’s look at the problems we faced with the Jupyter Notebook.
Limitations with Jupyter Notebook
Version Control: Let’s face it. Jupyter Notebooks are great for executing chunks of code, but when you’re dealing with multiple versions of the same notebook, tracking changes becomes difficult. Tools like nbdime reduce the pain, but the notebook’s underlying JSON still changes every time you open it or click on a cell, adding noise to your commits.
Scalability: For every city you want to find the population for, you have to repeat the whole workflow and manually run the cells one at a time. The alternative is to move everything to a Python script and save the images to inspect later (but I don’t like that alternative either).
How can we do better?
Papermill gives us superpowers to parameterize and execute Notebooks. How does it solve the problems mentioned above?
First, we always clear the outputs of the parameterized Notebook before committing the code. This keeps our workflow and commit history sane.
Second, we can execute Notebooks from plain Python scripts. This leaves fully executed copies of our Notebooks ready and baked for further visual analysis.
Third, this workflow also makes it easier for us to spot bugs and fix them in place. (Don’t just take my word for it; try it out yourself!)
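To make this concrete, here is a minimal sketch of a single parameterized run using Papermill’s Python API. The input notebook name is the one used later in this post; the output path and parameter values are illustrative:

```python
# Minimal sketch: execute one notebook with Papermill, injecting parameters.
import papermill as pm

pm.execute_notebook(
    input_path="2020-11-06-gridded-population.ipynb",              # notebook from this post
    output_path="nb-output/gridded-population-fortportal.ipynb",   # illustrative output path
    parameters={"REGION": "fortportal", "UTM": 32636},             # overrides the tagged defaults
)
```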
How can we use this technique?
Let’s repeat the previous exercise, but this time, in addition to Fort Portal, let’s include Entebbe, a city in central Uganda.
- Parameterize the required cells in the Notebook. Here’s a GIF to help you get started.
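For reference, the tagged "parameters" cell is just an ordinary code cell holding default values; a minimal sketch for this workflow, using the same keys as config.json below, might look like this:

```python
# Cell tagged "parameters" in the notebook.
# Papermill injects a new cell with the supplied values right after it at run time.
REGION = "fortportal"            # default region slug
UTM = 32636                      # EPSG code of the local UTM zone
PIPELINE = "gridded-population"  # which pipeline this notebook belongs to
```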
- Define a configuration file config.json that supplies the values with which we want to execute the notebook; Papermill injects these over the defaults in the parameters cell (the cell we tagged in step 1).
```json
[
  {"REGION": "fortportal", "UTM": 32636, "PIPELINE": "gridded-population"},
  {"REGION": "entebbe", "UTM": 32636, "PIPELINE": "gridded-population"}
]
```
- Add an orchestrator.py to orchestrate the workflow
```python
import papermill as pm
from datetime import datetime as dt


def process(region, pipeline):
    '''
    Processes the pipeline for a given region and stores the executed
    notebook inside the nb-output/ folder.

    Input
    - region: Parameter dict for the region (REGION, UTM, PIPELINE)
    - pipeline: Pipeline to run (gridded-population / cluster / query-engine)

    Return
    None
    '''

    def execute(nb, pipeline, region):
        '''Execute a notebook using papermill.'''
        pm.execute_notebook(
            input_path=nb,
            output_path=f'nb-output/{pipeline}-executed-{region["REGION"]}_@_{dt.now().strftime("%Y-%m-%d_%I-%M-%S_%p")}.ipynb',
            parameters=region,
        )

    if pipeline == 'query':
        execute('2020-11-06-query-engine.ipynb', pipeline, region)
    elif pipeline == 'cluster':
        execute('2020-11-06-cluster.ipynb', pipeline, region)
    elif pipeline == 'gridded-population':
        execute('2020-11-06-gridded-population.ipynb', pipeline, region)
    else:
        print('Not a valid pipeline')
```
- Run the orchestrator.py file
This will execute the parameterized Notebooks and store the executed Notebooks inside the nb-output/ folder.
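The part of orchestrator.py that reads config.json and loops over the regions is not shown above; a minimal sketch of that driver, assuming the process() function defined earlier and the config.json from step 2, could look like this:

```python
# Hypothetical driver for orchestrator.py: read config.json and run each
# region through its pipeline via the process() function defined above.
import json

if __name__ == '__main__':
    with open('config.json') as f:
        regions = json.load(f)

    for region in regions:
        process(region, region['PIPELINE'])
```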
Check out the detailed version of this tutorial.
You can connect with us directly and explore our Spatial Analytics Solutions, which have helped organizations build urban resilience, conserve biodiversity, and fight deadly diseases.