PyGrunn: large scale python satellite image processing - Ivor Bosloper

Tags: pygrunn, python, django

(One of my summaries of a talk at the 2021 10th Dutch PyGrunn one-day python conference).

Satellites. There are almost 3000 satellites orbiting the earth. Some of them have cameras, which are the interesting ones for him. He started Dacom (now CropX), agricultural software. Satellite imagery is interesting for farmers as you can see how well the crops are growing by analyzing the images.

He did a live demo. They have a website with all 800,000 fields in the Netherlands. For every field they have image data. They can show it as a regular image, but also color-coded by amount of greenery. And of course as a nice graph throughout the year.
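
To make the "color-coded for amount of greenery" idea concrete, here is a minimal sketch. It assumes the greenness index is NDVI, computed from a near-infrared and a red band; the talk summary does not name the index, so that is my assumption. The function names and the sample values are made up for illustration.

```python
def ndvi(nir, red):
    """Return the NDVI for one pixel, a value between -1 and 1."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid dividing by zero on empty pixels
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() cell by cell to two equally sized 2D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Made-up reflectance values for a 2x2 patch, not real satellite data.
nir_band = [[0.8, 0.6], [0.3, 0.2]]
red_band = [[0.2, 0.2], [0.3, 0.2]]
greenness = ndvi_grid(nir_band, red_band)
```

Higher values mean more healthy vegetation, so a color ramp over `greenness` gives the kind of map shown in the demo. On real tiles you would do this with numpy arrays instead of nested lists.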

You can do all sorts of analysis on it. Look at the variation in crop yields within the field, for instance. You might have to use more fertilizer in the low-yield areas. But you also have to use other data sources, like an elevation map.

They started out experimentally with groenmonitor.nl in 2014. In 2015, ESA launched the “Sentinel 2a” satellite (with a twin, “2b”, in 2017). The data is free, part of the EU Copernicus project! They started using the data in 2016.

The images are huge: 800MB for a 100x100km tile. They download the useful images (the ones without too much cloud cover…) and process them, use filters, do statistics on them, etc. Lots of separate tools. They use python as the glue to tie everything together.
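
Selecting "the ones without too much cloud cover" could look roughly like this. It is only a sketch: the `Tile` class, the tile names and the 20% threshold are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    name: str           # hypothetical tile identifier
    cloud_cover: float  # percentage of the tile covered by clouds, 0-100


def usable_tiles(tiles, max_cloud_cover=20.0):
    """Keep only the tiles at or below the cloud cover threshold."""
    return [tile for tile in tiles if tile.cloud_cover <= max_cloud_cover]


# Made-up metadata, standing in for what a tile catalog would return.
tiles = [
    Tile("tile-a", 5.0),
    Tile("tile-b", 85.0),
    Tile("tile-c", 20.0),
]
selected = usable_tiles(tiles)
```

Filtering on metadata before downloading matters here: at 800MB per tile you do not want to fetch an image only to find out it is all clouds.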

Some of the processing is done by open source projects provided by ESA. They also used gdal a lot. They had to battle with performance issues. I/O overhead was one of the bigger problems. They started looking at software-as-a-service providers like sentinelhub: yes, that could work well. But they were not sure about the price they would have to pay for their huge datasets.

The EU provides the satellites and the data for free. But they still had the idea that more people could make use of it. So they recently started the “DIAS” initiative. Multiple datacenters throughout Europe with locally stored raw data and processed data. So you can host your software there without having to worry about huge data traffic bills. Nice!

They built a website with django where they store all the processed field data. So per date and per field you’d store min/max/mean/etc values. With postgis/geodjango of course for easy geographical handling.
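
The "per date and per field" storage can be sketched as one row per (field, date) pair. Note that this uses the standard library's sqlite3 as a stand-in so it runs without any setup; the talk used django with postgis, and the table and column names here are my own invention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE field_stats (
        field_id INTEGER NOT NULL,
        date TEXT NOT NULL,
        min_value REAL,
        max_value REAL,
        mean_value REAL,
        PRIMARY KEY (field_id, date)  -- one row per field per date
    )
    """
)

# Made-up statistics for two hypothetical fields.
rows = [
    (1, "2020-06-01", 0.35, 0.82, 0.67),
    (1, "2020-05-01", 0.21, 0.55, 0.40),
    (2, "2020-05-01", 0.10, 0.30, 0.22),
]
conn.executemany("INSERT INTO field_stats VALUES (?, ?, ?, ?, ?)", rows)


def mean_series(conn, field_id):
    """Return (date, mean) pairs for one field, oldest first."""
    cursor = conn.execute(
        "SELECT date, mean_value FROM field_stats "
        "WHERE field_id = ? ORDER BY date",
        (field_id,),
    )
    return cursor.fetchall()


series = mean_series(conn, 1)
```

Because the heavy raster work happens beforehand, a graph for one field is just a cheap query like `mean_series()`. In a django setup the table would be a model, with the field's geometry in a geodjango column.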

One of the core tools they use is rasterstats, which calculates the min/max/mean stats for raster images. Probably it uses gdal and numpy and so on behind the scenes. These statistics are then stored in django, ready for quick retrieval in the user interface.
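
What such a min/max/mean calculation boils down to can be shown in plain python. This is not how rasterstats is implemented: rasterstats works on real geometries and raster files (through its `zonal_stats` function), while this sketch assumes the field outline has already been turned into a boolean mask. All names and values are made up.

```python
from statistics import mean


def zonal_stats_for_mask(raster, mask):
    """Return min/max/mean over the raster cells where the mask is True."""
    values = [
        value
        for raster_row, mask_row in zip(raster, mask)
        for value, inside in zip(raster_row, mask_row)
        if inside
    ]
    if not values:
        # The field does not overlap any raster cell.
        return {"min": None, "max": None, "mean": None}
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# A made-up 2x3 greenness raster and a field covering the left two columns.
raster = [
    [0.2, 0.4, 0.9],
    [0.6, 0.8, 0.1],
]
field_mask = [
    [True, True, False],
    [True, True, False],
]
stats = zonal_stats_for_mask(raster, field_mask)
```

The resulting dictionary is the per-field, per-date record that ends up in the database.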

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.