(One of my summaries of the 2025 PyGrunn conference in Groningen, NL.)
Aivars pointed at https://www.brendangregg.com/linuxperf.html as a good overview of Linux tools.
A good start is the /proc filesystem. You can use it to gather information on processes, for instance to grab the environment used by a process:
$ cat /proc/1234455/environ | tr '\0' '\n'
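The same lookup can be done from Python. This is a minimal sketch, assuming a Linux /proc filesystem and permission to read the target process; `parse_environ` and `read_environ` are hypothetical helper names, not part of any library:

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict[str, str]:
    """Split the NUL-separated KEY=VALUE entries of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the file normally ends with a trailing NUL
        key, _, value = entry.decode(errors="replace").partition("=")
        env[key] = value
    return env


def read_environ(pid: int) -> dict[str, str]:
    """Read the environment of a process (needs read permission on it)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())
```

Something like `read_environ(12345).get("PATH")` would then show the PATH of that (hypothetical) process.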
The files/sockets used by a specific process:
$ ls -l /proc/12345/fd/*
You might have an unfindable file that takes up lots of space (like a logfile that has been deleted from a directory, but that is still open in some program). A long listing (ls -l) of that fd directory shows (deleted) next to deleted files, so you can search for that string in the output to find the process that still has such a big file open.
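Searching all processes at once is easy to script. A rough sketch, assuming a Linux /proc layout; the function name `find_deleted_open_files` is made up for this example, and without root you will only see your own processes:

```python
import os
from pathlib import Path


def find_deleted_open_files(proc_root: str = "/proc"):
    """Yield (pid, fd, target) for open files whose path was deleted."""
    for pid_dir in Path(proc_root).iterdir():
        if not pid_dir.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            fds = list((pid_dir / "fd").iterdir())
        except OSError:
            continue  # no permission, or the process already exited
        for fd in fds:
            try:
                target = os.readlink(fd)
            except OSError:
                continue  # descriptor was closed in the meantime
            if target.endswith(" (deleted)"):
                yield int(pid_dir.name), int(fd.name), target
```

Looping over `find_deleted_open_files()` and printing each tuple gives you the PID to restart or signal.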
Another handy tool is strace, which traces Linux system calls. You don’t even need root access if you just want to trace your own processes. Example commands, the first attaching to a running process and the second starting a new one:
$ strace -f -ttt -o output.txt -s 1024 -p <PID>
$ strace -f -ttt -o output.txt -s 1024 your-new-process.sh
If your code does a system call (“read something from somewhere”), strace prints both the start and the end of the call. So you can find out exactly where something is blocking in case of an error. He mentioned https://filippo.io/linux-syscall-table/ as a good overview of the available system calls you might see in the output.
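To find the blocking spot in a long output.txt, you can look for the biggest pauses between consecutive lines of the same process. A sketch, assuming the line format that -f and -ttt produce (an optional PID column, an epoch timestamp with microseconds, then the call); `slowest_gaps` is a hypothetical helper, not something strace ships:

```python
import re

# Optional PID column (from -f), epoch timestamp (from -ttt), then the call text.
LINE_RE = re.compile(r"^(?:(\d+)\s+)?(\d+\.\d+)\s+(.*)$")


def slowest_gaps(lines, top=5):
    """Return the `top` largest pauses as (seconds, pid, line printed before the pause)."""
    previous = {}  # pid -> (timestamp, text) of the last line seen for that pid
    gaps = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        pid = match.group(1) or "-"
        stamp = float(match.group(2))
        text = match.group(3)
        if pid in previous:
            last_stamp, last_text = previous[pid]
            gaps.append((stamp - last_stamp, pid, last_text))
        previous[pid] = (stamp, text)
    return sorted(gaps, reverse=True)[:top]
```

Called as `slowest_gaps(open("output.txt"))`, the first entries point at the calls after which the process sat waiting the longest.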
Disk IO problems? Try iostat -x to see where the IO is happening. When testing disk throughput, don’t just test huge blobs of data, but make sure to use the actual block size (often 4k or 8k).
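The block size effect is easy to try out yourself. A rough sketch, not a real benchmark: `write_throughput` is a made-up helper that writes the same amount of data in chunks of a given size and includes an fsync in the timing. A dedicated benchmarking tool will give more trustworthy numbers.

```python
import os
import time


def write_throughput(path, block_size, total_bytes):
    """Write total_bytes in block_size chunks, fsync, and return MB/s."""
    block = os.urandom(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        written = 0
        while written < total_bytes:
            written += os.write(fd, block)
        os.fsync(fd)  # include flushing to disk, not just filling the page cache
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return written / elapsed / 1_000_000
```

Comparing `write_throughput("testfile", 4096, 64 * 1024 * 1024)` with the same call using a block size of 1024 * 1024 shows how much the result depends on the block size.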
When debugging network access, you often use ping or traceroute. But both use protocols
(ICMP and UDP) that are often blocked by network admins. He suggests tcptraceroute
which uses TCP and often gives a better view of reality.
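If you only want to know whether the endpoint itself answers over TCP, a plain connect attempt already tells you a lot. This is not a traceroute (it shows no intermediate hops), just a sketch of a TCP-level check; `tcp_reachable` is a hypothetical helper name:

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, DNS failure or timed out
```

For instance `tcp_reachable("example.org", 443)` checks the actual port your application needs, which ping cannot do.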
With network problems, a missing TCP_NODELAY socket option is a possible cause. See https://brooker.co.za/blog/2024/05/09/nagle.html for more information. Read this especially when you see the magic number 40ms in your logs, or only get 25 transactions per second.
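In Python, the option is set per socket. A minimal sketch, assuming you control the client code; both function names are invented for this example. The second function only spells out the arithmetic that links the two magic numbers: if every transaction waits 40 ms and they run strictly one after another, you cannot get past 25 per second.

```python
import socket


def open_low_latency_connection(host, port, timeout=5.0):
    """Connect over TCP with Nagle's algorithm disabled (TCP_NODELAY set)."""
    sock = socket.create_connection((host, port), timeout=timeout)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def max_sequential_rate(delay_ms):
    """Upper bound on sequential transactions per second if each one waits delay_ms."""
    return 1000 / delay_ms
```

Many libraries and frameworks have their own setting for this, so check their documentation before reaching for the raw socket.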
Tip: set timeouts for everything. The defaults are often a cause for hanging.
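As an illustration of the tip, here is a sketch for external commands, where the default is to wait forever; `run_with_timeout` is a hypothetical wrapper around the standard library's subprocess.run:

```python
import subprocess


def run_with_timeout(command, seconds):
    """Run command; return its exit code, or None if it was killed after `seconds`."""
    try:
        return subprocess.run(command, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising
```

The same idea applies to sockets, HTTP clients and database connections: pass an explicit timeout instead of relying on the default.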
My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.