(One of my summaries of a talk at the 2019 PyGrunn conference).
The speaker works for Dacom, a firm that writes software to help farmers be more effective. Precision farming is a bit of a buzzword nowadays. You can get public elevation data, you can let someone fly over your fields to take measurements, or a cart can take automatic ground samples. This way you can make a “prescription map” that shows where to apply more fertilizer and where less will do.
Another source of data is the equipment the farmer uses to drive over his field. As an example, the presentation looks at a potato harvester.
Which route did the harvester take through the field?
What was the yield (in potatoes per hectare) in all the various spots?
Some tools and libraries that they use:
Numpy: very efficient numerical processing. Arrays.
Pandas: data series and table-like data frames.
Matplotlib: graph plotting library.
Postgis: geographical extension to the postgres database.
Pandas is handy for reading in data, from csv for instance. It integrates nicely with matplotlib. With a one-liner you can let it create a histogram from the data.
With the .describe() function, you get basic statistics about your data.
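To make the reading, describing and histogram steps concrete, here is a minimal sketch. The sample data and the column names (x, y, yield_t_ha) are made up for illustration and are not from the talk.

```python
import io

import matplotlib

matplotlib.use("Agg")  # draw off-screen so this also runs without a display
import pandas as pd

# Made-up sample standing in for a harvester export (assumed columns).
CSV_DATA = """x,y,yield_t_ha
5.10,53.20,42.0
5.11,53.20,45.5
5.12,53.21,39.0
5.13,53.21,47.5
"""

# read_csv accepts a file name as well as a file-like object.
df = pd.read_csv(io.StringIO(CSV_DATA))

# Basic statistics (count, mean, std, min, quartiles, max) per numeric column.
stats = df.describe()
print(stats)

# The one-liner histogram: pandas hands the drawing over to matplotlib.
ax = df["yield_t_ha"].hist(bins=4)
```

With a real file you would pass the file name to read_csv instead of the StringIO object.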
Another example: a map (actually a graph, but it looks like a map) with color codes for the yield. The locations where the yields are lower are immediately clear this way.
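Such a map-like graph could be made with a colored scatter plot. This is only a sketch of one way to do it: the coordinates, the yields, the column names and the choice of colormap are all assumptions, not what was shown in the talk.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up measurement points with a yield value per location.
df = pd.DataFrame(
    {
        "x": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
        "y": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
        "yield_t_ha": [40.0, 44.0, 31.0, 42.0, 46.0, 29.0],
    }
)

fig, ax = plt.subplots()
# One dot per measurement; the color encodes the yield (red = low, green = high).
points = ax.scatter(df["x"], df["y"], c=df["yield_t_ha"], cmap="RdYlGn")
fig.colorbar(points, ax=ax, label="yield (assumed unit: t/ha)")
ax.set_aspect("equal")  # keep the field's proportions so it looks like a map
```

In this made-up data the low-yield spots at x=2 show up as red dots.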
When converting data, keep an eye on performance. What pandas can do by itself is much quicker than what it has to ask python to do. For instance, creating a datetime from a year field, a month field, etc. takes a long time, as it basically happens per row. It is way quicker to let pandas concatenate the yyyy/mm/dd + time info into one string and then convert that one string to a datetime.
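A small sketch of the two approaches, assuming hypothetical columns year, month, day and time. The exact code from the talk is not known to me, so treat this as an illustration of the idea only.

```python
import pandas as pd

# Made-up rows; the column names are assumptions.
df = pd.DataFrame(
    {
        "year": [2019, 2019],
        "month": [5, 5],
        "day": [10, 11],
        "time": ["09:30:00", "14:05:30"],
    }
)

# Slow: a python function is called once for every row.
slow = df.apply(
    lambda row: pd.Timestamp(
        f"{row['year']}-{row['month']:02d}-{row['day']:02d} {row['time']}"
    ),
    axis=1,
)

# Quick: pandas concatenates whole columns into one string column...
text = (
    df["year"].astype(str)
    + "/"
    + df["month"].astype(str).str.zfill(2)
    + "/"
    + df["day"].astype(str).str.zfill(2)
    + " "
    + df["time"]
)
# ...and converts that column to datetimes in one go.
df["timestamp"] = pd.to_datetime(text, format="%Y/%m/%d %H:%M:%S")
```

Both give the same timestamps; the difference only becomes noticeable with many rows.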
He showed the same example for creating a geometric point. It is quickest to create a textual POINT(1.234 8.234) string from two x/y fields and only then to convert it to a point.
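Building such a string column can be sketched like this. The x and y column names are assumptions. Note that the well-known text (WKT) format separates the two coordinates with a space.

```python
import pandas as pd

# Made-up coordinates; the column names are assumptions.
df = pd.DataFrame({"x": [1.234, 5.5], "y": [8.234, 52.25]})

# Column-wise string concatenation: no python function call per row.
df["wkt"] = "POINT(" + df["x"].astype(str) + " " + df["y"].astype(str) + ")"

print(df["wkt"].tolist())
```

The resulting strings can then be converted to real points in one step, for instance with ST_GeomFromText in postgis.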
Use the best tool for the job. Once he had massaged the data in pandas, he exported it to a postgis database table. Postgis has lots of geographical functions, like ST_CENTROID, ST_BUFFER, and ST_MAKELINE, which he used to do the heavy geographical lifting.
He then used the “geopandas” extension to let pandas read the result of the postgis query, which could again be plotted with matplotlib.
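Roughly, the round trip through postgis could look like the sketch below. The table name (harvest_points), the columns (field_id, recorded_at, geom), the connection URL and the load_routes helper are all hypothetical. The query reconstructs the harvester's route per field with ST_MakeLine; it is my guess at the kind of query used, not the one from the talk.

```python
# Hypothetical query: connect the measured points of every field, in time
# order, into one line per field.
ROUTE_SQL = """
SELECT
    field_id,
    ST_MakeLine(geom ORDER BY recorded_at) AS geom
FROM harvest_points
GROUP BY field_id
"""


def load_routes(connection_url):
    """Run the query and return the result as a geopandas GeoDataFrame."""
    # Imported here so the sketch can be read and loaded without a database
    # or geopandas being available.
    import geopandas
    from sqlalchemy import create_engine

    engine = create_engine(connection_url)
    return geopandas.read_postgis(ROUTE_SQL, engine, geom_col="geom")


# Usage, assuming a reachable postgis database:
# routes = load_routes("postgresql://user:password@localhost/farm")
# routes.plot()  # draws the lines with matplotlib
```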
Nice!
My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.