Note: This is an old post that was sitting in my drafts for a long time. It might be useful to someone still as most of the content is still relevant.
You should check out dotnet-monitor if you are using .NET Core 3 and above! It promises to make the whole thing a lot easier. I’ve left the below content for posterity, but it’s very out of date now.
We recently had an issue with an application in development which was pegging the CPU allowance of its Docker container.
This was a difficult issue to track down: it was caused by something in our code, but it wasn’t easily reproducible. It would run OK for hours, then the problem would start.
We looked at various methods of profiling .NET Core applications on Linux. The options are fairly limited at this time, and there aren’t really any APM products that support .NET Core on Linux yet. However, Microsoft provide a decent script which can capture a performance trace which you can open in their PerfView application. This is often how we do the same thing for our applications running on Windows.
So, I wrote a shell script that does all this, and downloads the resulting trace file to your workstation, so you can open it in PerfView (this is a Windows-only application). It works fine, but I hit a weird bug: the first time you try to run the performance trace, it kicks out lots of ugly errors about being unable to modprobe the lttng modules. I was running it with docker exec --privileged, so it ought to have been able to work. But it turned out that a Docker bug was probably stopping some system calls from happening.
For some reason, the trace works fine the second time you run the script. So it’s not a huge problem. But why on earth does it work the second time, and not the first? Perhaps some mystery in the lttng internals which I don’t want to delve into right now.
The Docker bug had some workarounds, one of which showed me that you can start a new container in the same PID namespace as an existing container (meaning it can see the processes of the other container). This is ideal for debugging! So instead, I decided to create a new debugging container, specifically for connecting to wayward apps. This saves us from having to either add all the dependencies to our own base .NET Core container image, or install them at the time of debugging, neither of which is optimal.
Again, MS provide something to start with for performance debugging, but it doesn’t use the approach of attaching a new container to an existing Docker process.
I decided to use the same Docker base image we’ve been working from for our apps, rather than a new Ubuntu-based one. Eventually I came up with a Dockerfile that looks like this:
FROM microsoft/dotnet
WORKDIR /app
RUN apt-get -qq update
RUN apt-get -qq install -y lttng-tools liblttng-ust-dev linux-tools unzip zip coreutils binutils
RUN mv /usr/bin/perf /tmp
RUN ln -s /usr/bin/perf_3.16 /usr/bin/perf
RUN curl -s https://raw.githubusercontent.com/dotnet/corefx-tools/master/src/performance/perfcollect/perfcollect --output perfcollect && chmod 755 perfcollect
# Set tracing environment variables.
ENV COMPlus_PerfMapEnabled 1
ENV COMPlus_EnableEventLog 1
Note: You need those last two env vars set when starting your .NET Core application in the other container too, so it will be debuggable by the script.
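As a rough sketch of what that note means in practice, the application container could be started with both variables passed via -e. The function name, the container name myapp and the image name my-dotnet-app are placeholders I've made up for illustration, not names from our setup, and the DOCKER override exists only so the command can be dry-run with echo.

```shell
#!/bin/sh
# Sketch: start the application container with the tracing variables set,
# so the perf capture script can later resolve its symbols and events.
# The container and image names passed in are hypothetical; use your own.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command
# instead of running it.
start_traceable_app() {
  name="$1"
  image="$2"
  if [ -z "$name" ] || [ -z "$image" ]; then
    echo "usage: start_traceable_app <container-name> <image>" >&2
    return 1
  fi
  ${DOCKER:-docker} run -d --name "$name" \
    -e COMPlus_PerfMapEnabled=1 \
    -e COMPlus_EnableEventLog=1 \
    "$image"
}
```

For example, start_traceable_app myapp my-dotnet-app would start the app detached under the name myapp, which is then the name to give to --pid container: when attaching the debugging container.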
Great! Now we just need a way of running the perf capture script against our wayward container. This will do it, when run on the relevant Docker host:
docker run -ti --privileged --pid container:<name | id> chrisgilbert42/dotnet-perfcapture /bin/bash
Inside that container, we can run the perf capture:
./perfcollect collect my-trace-file
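Once the capture has been stopped, the trace still has to get from the debugging container to somewhere PerfView can open it. The following is a minimal sketch of that step, not the script mentioned earlier. It assumes the trace was collected in /app (the WORKDIR in the Dockerfile above) and that perfcollect wrote it as my-trace-file.trace.zip. The function name and the debug container name are hypothetical, and DOCKER can be set to echo for a dry run.

```shell
#!/bin/sh
# Sketch: copy a finished perfcollect archive out of the debugging container
# onto the Docker host.
# Assumptions: the archive lives in /app and is named <trace-name>.trace.zip.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command
# instead of running it.
fetch_trace() {
  container="$1"               # name or id of the debugging container
  trace="${2:-my-trace-file}"  # name given to "perfcollect collect"
  dest="${3:-.}"               # destination directory on the host
  if [ -z "$container" ]; then
    echo "usage: fetch_trace <debug-container> [trace-name] [dest-dir]" >&2
    return 1
  fi
  ${DOCKER:-docker} cp "${container}:/app/${trace}.trace.zip" "${dest}/"
}
```

Since the debugging container above was started without --name, its id would need to be looked up with docker ps first. If the Docker host is a remote machine, the file still has to be copied from there to a Windows workstation before PerfView can open it.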