Investigating C++ compile times
Welcome back to the series!
In the previous posts we’ve explored ways to find how long a C++ build takes, installed some tools and found a way to create flame graphs out of MSBuild’s execution.
Join me this time as we explore a brand new SDK that lets us get better data in an easier manner!
C++ Build Insights SDK
Last november @KevinCadieuxMS shared a post over at Microsoft’s C++ Team Blog announcing a brand new tool called vcperf to get extensive data out of a C++ compilation. It required getting Visual Studio 16.4 as vcperf is part of the installation.
Last march they went further:
- They released the C++ Build Insights SDK so we can use it ourselves.
- They open sourced vcperf, which serves as an example tool using the SDK.
So first of all, let me thank Kevin and all of the team for providing us with these extra tools. They’re highly appreciated and useful!
And there I was, finishing the initial release of my MSBuildFlameGraph tool as they released this SDK that made my tool almost deprecated! So, I set myself on a quest to explore it and this is what I found.
Disclaimer: this post is based on the 1.0.0 version of the C++ Build Insights SDK. It’s expected it evolves over time and some of this info can become stale!
Instead of writing an incomplete and outdated list of steps to get it working, let me point you to the official documentation.
There, you’ll get info on how to collect a trace using vcperf or how the SDK is structured. We’ll be taking a look at some of its points now.
Activities and events
Put simply, the basic piece of information the SDK reports is the
Event. They have an id, a timestamp, a name and the real process/thread/processor they were emitted from. They can also have extra data depending on which event type they are.
Event is instant they call them
SimpleEvent, while it’s called
Activity if it takes time to complete (and also provides the timestamp when it finished).
You can check the full list of activities and events at the official docs.
Because activities have a
Duration and take some time to complete, there can be other activities and events that execute within them. This way, an
Activity can have both
SimpleEvent children. On the other hand, every
SimpleEvent is instant and can’t have any children.
After reading the event table and not being able to make a mental map myself, I decided to build a directed graph to visualize them.
These are the parent-child relationships for activities (yellow-bordered are top-level ones):
These are the parent-child relationships with
SimpleEvent included (gray-shaded):
If we were to check the event table from the docs, we’d see each one of the event types we’ve seen matches a class:
Let’s keep this graph in mind as we continue exploring.
Exploring the SDK via code
When you want to analyze a build you’ll usually perform these steps:
vcperf /start SomeTraceNamefrom an elevated command-line prompt.
- Build your project (from Visual Studio or command-line, no matter which).
vcperf /stopnoanalyze SomeTraceName OutputFile.etl.
OutputFile.etlwith your tool (using the SDK).
Please note that, although you can record traces yourself with the SDK, I’ll keep using vcperf for that.
This is the basic code:
That’s all of the setup you need to do to get working! Now we need to know how to get useful data from the trace.
Please note I’ll be using
CppBI as an alias to
Microsoft::Cpp::BuildInsights so code sections require as little horizontal scrolling as possible.
There are three main member functions you need to know from the
The SDK calls these whenever an
Event is read from the
.etl file. Which one depends on the kind of
Event that was reported (
So what’s that
EventStack? Whenever an
Event is emitted we also get its context via this
EventStack. Let’s say we get
OnStopActivity called. An example of the
EventStack could be:
Or say we get
But how do we retrieve different events within the stack?
Let’s say we want to know all of the command-line options within our trace. We explore the official docs and find there’s a
CommandLine event type. We’d add this method to our class:
The SDK has a very clever way of helping you get the events you’re interested in within a hierarchy:
MatchEventStackInMemberFunction call there makes magic and filters those stacks which contain the event types in
We now know all of the command-line options within the trace, but they’re all mixed up. Remember how
Linker activities were parents of
CommandLine events? Wouldn’t it be awesome to tell them apart?
Let’s create these two methods this time:
See how the signature now takes a
CommandLine preceded by either
Linker?. Now let’s update our
OnSimpleEvent method to use them both:
MatchEventStackInMemberFunction will match those stacks that have a
Compiler activity and then a
CommandLine event apart from those with a
Linker activity and then a
You can get more info on matchers and examples on the official docs.
Getting more useful data with the SDK
Now we know how to deal with the SDK, let’s try to get some data out of our build, shall we?
In this case we’ll be using a default Unreal Engine project. Oh, and because it’s the compiler and linker the ones emitting the events we can get data from projects not supported in MSBuildFlameGraph!
Collecting a trace
Before we can start, we need a trace to work with! These were my steps:
- Installed Visual Studio 15.9.22 (Visual Studio 2017).
- Cloned vcperf from GitHub and built it.
- Installed Unreal Engine 4.24.3, launched it and created the default
First Person Shooterproject with C++ code.
- Launched an elevated command prompt and navigated to the previous vcperf output directory.
vcperf /start UE4Project.
- In Visual Studio 2017,
Development|Win64our Unreal Engine solution.
vcperf /stopnoanalyze UE4Project UE4Project.etl.
Now we’ve got an
UE4Project.etl trace to analyze!
Oh, the computer I ran it on is a 8-year-old i5 laptop with no SSD and low RAM, so expect slow times here!
Listing slowest files to compile
Before we get some code, we should go back to the directed graph we built before:
See how both
BACK_END_PASS live within the
COMPILER activity? After a quick check with the official docs we can see both classes inherit from
CompilerPass. And that class has two interesting members:
InputSourcePath: path to the source file that gets compiled.
OutputObjectPath: path to the compiled output file.
Also, because they inherit from
Activity we can check the time it took from start to stop via
Nice! So we’ve got all the info we need. Now let’s write some code:
And that’s all we need! We can now run our code, sort the data we’ve collected and dump it:
That’s it, now we know which files take longer to compile! And it looks like front-end times are way larger than back-end ones!
Important note: it looks like projects compiled with Visual Studio 2017 (tested with version 15.9.22) report their
InputSourcePath as null (but their
OutputObjectPath is correct). This seems to work without a problem if you compile your project with Visual Studio 2019 (tested with version 16.5.2). For the sake of this section I had ommited this issue until now.
Listing slowest files to get included
This time we want to query the SDK for included files. Remember: file inclusion gets performed by the front-end. After a bit of searching we come up with
Activity is recursive, as a file can include some file that in turn includes some other file. However, we’re only interested in files that get included from other file. That means we want to capture a stack with two
Let’s see some code:
And that’s it! Now we have a list of the times it took to include each file (one file can be included from several files!). Why don’t we sort all inclusions by total inclusion time and dump the first ones?
Generate a directed graph for file inclusions
With the previous analyzer we could also get the inclusion relationships between files, can’t we? This time I won’t add any code, but a couple of screenshots of the resulting inclusion graph (
Yeah, no joke. That’s what I can see in a laptop resolution without zooming in! Let’s zoom a bit into something, shall we?
I’ve zoomed into the file we saw is the slowest to compile. It also shows which files include it and which files get included by it. To cut the graph down I installed Visual Studio’s
DefaultFPSCppProjectCharacter.h to select the node and clicked the
Generate a flame graph
There are other useful metrics to get, like how long do functions take to compile (careful, SDK reports their names mangled!), how many files get included on each
.h file or whether files compile in parallel at all. I’m sure you can come up with more! But this time, we want to visualize the build.
First of all, we have to go back to vcperf. When it analyzes a trace, it generates extra data to view within Windows Performance Analyzer (WPA). However, although it gives very useful insights, I find myself a bit clumsy while exploring it. Please refer to vcperf on GitHub to get a glimpse of how it looks.
We’d have to create a new analyzer to feed into the SDK that records all hierarchies and sets useful names for each entry in the graph. This means taking file paths from
FrontEndFile, function names from
Function and the like. Be careful, though, as reported events live in the stack and will get out of scope if you want to keep pointers to them!
Let’s take a look at the raw result!
Well, not specially readable either, isn’t it? To the left of the image we can see
ThreadId as reported by the SDK. And if we want to know what’s going on we have to scroll back and forth, trying to know which entry matches which parent. Or, what’s compiling in parallel? Found myself feeling as clumsy as when using WPA.
But, can we do any better?
Packing parent-children together
Let’s post-process the data and try to keep child entries as close as possible to their parents, being careful to leave the necessary space to fit them when we have entries running in parallel.
With a bit of work, we get this version:
As you can see,
ThreadId aren’t the real ones anymore. However, they approach to a more logical distribution of what’s running where! Definitely easier to read for me!
Let’s zoom into
What can we see here? As part of the
C1DLL activity we see that this
.cpp file includes three other files. We also see that the
CodeGeneration activity has all of the functions that get generated (they get generated in parallel, up to 4 because my computer has 4 logical cores).
But we can see something else: the
Compiler activity only spans the
BackEndPass of this file. And, if we go back to the full trace, all
Compiler activities only deal with one file. This maps to having a lot of
CL MSBuild tasks! Does this mean we aren’t compiling in parallel?
Well, apparently we are compiling in parallel we check the full trace. It looks UnrealHeaderTool is spawning as many
cl.exe in parallel as files we want to compile, instead of letting
cl.exe do the parallelization itself?
Getting template instantiation data
The last cool thing we’ll see requires using the open source version of vcperf as well as compiling with Visual Studio 16.4 (Visual Studio 2019). This version of vcperf reports some extra events than the one shipping with Visual Studio 2019.
Bruce wanted to make compilations slow and calculated Fibonacci in templates (with some tricks to prevent the compiler from reusing template instantiations). And this is the result!
Of course, these template instantiations can be seen in other projects! This time we’ll build the tool I’ve used for this post in
Release|x64 and zoom into
See that huge amount of small entries? Those are template instantiations! And they can be part of any file, so when you include some file you also get which templates get instantiated.
However, have you noticed the small
BackEndPass to the right? It’s got no children, hasn’t it? That’s because we’ve built a
Release version which has
LTCG enabled! Let’s take a look at the
Here’s where the
Function activities get compiled! In a previous graph we saw them as part of the
BackEndPass activity, but it didn’t have
Also, it looks like Visual Studio 2019 also adds
Thread activities between
This exploration of the C++ Build Insights SDK helped me take more informed decisions when investigating C++ compile times and I’m sure I’ll be using it in the future.
Again, let me thank the team that made the SDK possible, and @KevinCadieuxMS in particular!
To write this post I built a tool that’s now available on GitHub for everyone to try!
Thanks for reading!