Pygrunn: how to solve a python mystery - Aivars Kalvāns

Tags: pygrunn, python

(One of my summaries of the 2025 pygrunn conference in Groningen, NL).

Aivars pointed at https://www.brendangregg.com/linuxperf.html as a good overview of Linux tools.

A good start is the /proc filesystem. You can use it to gather information on processes, for instance to grab the environment used by a process:

$ cat /proc/12345/environ | tr '\0' '\n'
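The same information can be read from Python. This is a minimal sketch, assuming a Linux system with /proc mounted and permission to read the target process. The function names parse_environ and read_environ are my own, not something from the talk.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Split the NUL-separated KEY=VALUE entries of an environ file."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the file ends with a trailing NUL
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid) -> dict:
    """Read the environment of a process (Linux only)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())
```

read_environ("self") returns the environment of the Python process itself, which is handy for trying it out.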

The files/sockets used by a specific process:

$ ls -l /proc/12345/fd/

You might have an unfindable file that takes up lots of space (like a logfile that has been deleted from a directory, but that is still open in some program). The long listing shows where each file descriptor points and puts (deleted) next to deleted files. Run it over /proc/*/fd/ and search the output for that string to find the process that still has such a big file open.
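The search for deleted-but-still-open files could look roughly like this in Python. It is a sketch, assuming Linux and enough permissions to read the fd directory of the process. The helper names is_deleted and deleted_open_files are made up for this example.

```python
import os


def is_deleted(target: str) -> bool:
    """The kernel appends ' (deleted)' to the link target of unlinked files."""
    return target.endswith(" (deleted)")


def deleted_open_files(pid):
    """Return (fd, target, size_in_bytes) for deleted files a process holds open."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        path = os.path.join(fd_dir, name)
        try:
            target = os.readlink(path)
        except OSError:
            continue  # descriptor was closed while we were looking
        if is_deleted(target):
            try:
                size = os.stat(path).st_size  # stat follows the link to the open file
            except OSError:
                size = -1  # size unknown
            found.append((int(name), target, size))
    return found
```

Looping over the numeric directory names in /proc and sorting on the size gives you the biggest offenders first.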

Another handy tool is strace, which traces Linux system calls. You don’t even need root access if you just want to trace your own processes. Example commands, the first for attaching to a running process and the second for starting a new one:

$ strace -f -ttt -o output.txt -s 1024 -p <PID>
$ strace -f -ttt -o output.txt -s 1024 your-new-process.sh

If your code does a system call (“read something from somewhere”), strace prints both the start and the end of the call. So you can find out exactly where something is blocking when a program hangs or fails. He mentioned https://filippo.io/linux-syscall-table/ as a good overview of the available system calls you might see in the output.
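Pairing up those start and end lines by hand gets tedious, so here is a rough Python sketch that does it. It assumes the output format of the strace flags shown above (-f, -ttt and -o), where every line starts with a PID and a timestamp and interrupted calls show up as "<unfinished ...>" and "<... name resumed>". The function name blocked_calls is my own, and the regular expressions only cover that one case.

```python
import re

# "1234 1715600000.123456 read(3,  <unfinished ...>"
UNFINISHED = re.compile(r"^(\d+)\s+(\d+\.\d+)\s+(\w+)\(.*<unfinished \.\.\.>$")
# "1234 1715600000.623456 <... read resumed>..., 1024) = 12"
RESUMED = re.compile(r"^(\d+)\s+(\d+\.\d+)\s+<\.\.\. (\w+) resumed>")


def blocked_calls(lines, threshold=0.0):
    """Return (slow, never_finished).

    slow: (pid, syscall, seconds) for calls that took at least `threshold`.
    never_finished: (pid, syscall) for calls that never returned in the trace.
    """
    pending = {}  # pid -> (syscall name, start time)
    slow = []
    for line in lines:
        line = line.rstrip()
        match = UNFINISHED.match(line)
        if match:
            pid, timestamp, name = match.groups()
            pending[pid] = (name, float(timestamp))
            continue
        match = RESUMED.match(line)
        if match:
            pid, timestamp, name = match.groups()
            started = pending.pop(pid, None)
            if started is not None and started[0] == name:
                duration = float(timestamp) - started[1]
                if duration >= threshold:
                    slow.append((int(pid), name, duration))
    never_finished = [(int(pid), name) for pid, (name, _) in pending.items()]
    return slow, never_finished
```

Feed it open("output.txt") and the calls that never finished are usually the place where your program is stuck.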

Disk IO problems? Try iostat -x to see where the IO is happening. When testing disk throughput, don’t just test with huge blobs of data, but make sure to use the block size your application actually uses (often 4k or 8k).
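To get a feeling for the effect of the block size, you could time the same amount of data written with different block sizes. The sketch below is only a rough illustration: the function name timed_write is made up, and a single fsync at the end says little about the behaviour of a real workload or about caching effects.

```python
import os
import time


def timed_write(path, total_bytes, block_size):
    """Write total_bytes to path in blocks of block_size, return elapsed seconds."""
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:  # unbuffered: one write call per block
        while written < total_bytes:
            # Raw writes may be partial, so count what was actually written.
            written += f.write(block[: total_bytes - written])
        os.fsync(f.fileno())  # make sure the data actually reached the disk
    return time.perf_counter() - start
```

Comparing timed_write(path, size, 4096) with timed_write(path, size, 1024 * 1024) for the same size shows how much the small blocks cost on your disk.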

When debugging network access, you often use ping or traceroute. But both use protocols (ICMP and UDP) that are often blocked by network admins. He suggests tcptraceroute, which uses TCP and often gives a better view of reality.
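In the same spirit, you can check from Python whether a TCP connection to the actual port works, instead of relying on ping. This is not a replacement for tcptraceroute, as it shows no intermediate hops: it only tells you whether the end point accepts a connection. The function name tcp_reachable is my own.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure...
        return False
```

Something like tcp_reachable("example.org", 443) tests the same protocol and port your application uses.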

With network latency problems, Nagle’s algorithm is a possible cause. You switch it off with the TCP_NODELAY socket option. See https://brooker.co.za/blog/2024/05/09/nagle.html for more information. Read this especially when you see the magic number 40ms in your logs, or only get 25 transactions per second (one transaction per 40ms).
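Both the arithmetic and the socket option fit in a few lines of Python. A sketch, assuming you create the socket yourself (many libraries set this option for you or have their own setting for it). The function name connect_nodelay is made up.

```python
import socket

# One request/response per 40ms delay caps you at 25 per second.
DELAY_SECONDS = 0.040
max_transactions_per_second = 1 / DELAY_SECONDS


def connect_nodelay(host, port, timeout=5.0):
    """Open a TCP connection with TCP_NODELAY set (Nagle's algorithm off)."""
    sock = socket.create_connection((host, port), timeout=timeout)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

With the option set, small writes are sent right away instead of waiting for the acknowledgement of the previous packet.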

Tip: set timeouts for everything. The defaults are often the cause of a hanging program.
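In Python, for instance, a socket blocks forever unless you give it a timeout, and subprocess.run waits as long as the child process takes. A small sketch of setting timeouts explicitly. The two function names are my own, and returning None on a timeout is just one way to handle it.

```python
import socket
import subprocess


def run_with_timeout(cmd, seconds):
    """Run a command; return None instead of waiting forever."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None  # the child process has been killed by subprocess.run


def recv_with_timeout(host, port, seconds):
    """Connect and read up to 1024 bytes; return None if the peer stays silent."""
    # The timeout applies to the connect and to later reads on the socket.
    with socket.create_connection((host, port), timeout=seconds) as sock:
        try:
            return sock.recv(1024)
        except socket.timeout:
            return None
```

Which timeout value is right depends on your situation, but nearly any explicit value beats waiting forever.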

https://reinout.vanrees.org/images/2025/pygrunn-1.jpeg

Photo explanation: picture from our Harz (DE) holiday in 2023

 

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
