Compiler-assisted benchmarking for the study of C++ metaprogram compile times.

GitHub project: https://github.com/jpenuchot/ctbench
Online documentation: https://jpenuchot.github.io/ctbench-docs/
Discord server: https://discord.gg/NvJFFrdS7p

ctbench lets you declare and generate compile-time benchmark batches for given ranges, run them, aggregate and wrangle Clang profiling data, and plot the results.

The project was made to fit the needs of scientific data collection and analysis. It is therefore not a one-shot profiler, but a set of tools that enable reproducible data gathering from user-defined, variably sized compile-time benchmarks, using Clang's time-trace feature to understand the impact of metaprogramming techniques on compile time. ctbench can also measure compiler execution time, which lets it support compilers such as GCC that do not have a built-in profiler.

It has two main components: a C++ plotting toolset that can be used as a CLI program and as a library, and a CMake boilerplate library to generate benchmark and graph targets.

The CMake library contains all the boilerplate code needed to define benchmark targets that are compatible with grapher, the C++ plotting toolset.

Rule of Cheese can be used as an example project for using ctbench.

Examples

As an example, here are benchmark curves from the Poacher project. The benchmark case sources are available here.

Clang ExecuteCompiler time curve from poacher, generated by the compare_by plotter

Clang Total Frontend time curve from poacher, generated by the compare_by plotter

Using ctbench

Build prerequisites

ArchLinux and Ubuntu 23.04 are officially supported: tests are compiled and executed on both of these Linux distributions. Other distributions, such as Fedora, should be compatible as long as they provide CMake 3.25 or higher.

Required ArchLinux packages: boost boost-libs catch2 clang cmake curl fmt git llvm llvm-libs ninja nlohmann-json tar tbb unzip zip
Required Ubuntu packages: catch2 clang cmake curl git libboost-all-dev libclang-dev libfmt-dev libllvm15 libtbb-dev libtbb12 llvm llvm-dev ninja-build nlohmann-json3-dev pkg-config tar unzip zip

The Sciplot library is required too. It can be installed on ArchLinux using the sciplot-git AUR package (NB: the non-git package isn't up-to-date). Otherwise, you can install it for your whole system using CMake or locally using vcpkg:

git clone https://github.com/Microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
./vcpkg/vcpkg install sciplot fmt
 
cmake --preset release \
  -DCMAKE_TOOLCHAIN_FILE=vcpkg/scripts/buildsystems/vcpkg.cmake

Note: fmt must be installed through vcpkg as well, because vcpkg breaks fmt's CMake integration if fmt is already installed on your system.

Installing ctbench

git clone https://github.com/jpenuchot/ctbench
cd ctbench
cmake --preset release
cmake --build --preset release
sudo cmake --build --preset release --target install

An AUR package is available for easier install and update.

Integrating ctbench in your project

ctbench can be integrated into a CMake project using find_package:

find_package(ctbench REQUIRED)

The example project is provided as a reference project for ctbench integration and usage. For more details, an exhaustive CMake API reference is available.

Declaring a benchmark case target

A benchmark case is represented by a C++ file. It will be "instantiated", i.e. compiled once for each value in a range that you provide, with BENCHMARK_SIZE defined to that value.

BENCHMARK_SIZE is intended to be used by the preprocessor to generate a benchmark instance of the desired size:

#include <boost/preprocessor/repetition/repeat.hpp>
 
// First we generate foo<int>().
// foo<int>() uses C++20 requirements to dispatch function calls across 16
// of its instances, according to the value of its integer template parameter.
 
#define FOO_MAX 16
 
#define DECL(z, i, nope)                                                       \
  template <int N>                                                             \
  requires(N % FOO_MAX == i) constexpr int foo() { return N * i; }
 
BOOST_PP_REPEAT(BENCHMARK_SIZE, DECL, FOO_MAX);
#undef DECL
 
// Now we generate the sum() function for instantiation
 
int sum() {
  int i = 0; // Must be initialized before the += calls generated below
 
#define CALL(z, n, nop) i += foo<n>();
  BOOST_PP_REPEAT(BENCHMARK_SIZE, CALL, i);
#undef CALL
  return i;
}
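
Because BENCHMARK_SIZE is an ordinary preprocessor definition, it can also be used wherever an integer constant is expected, for instance as the length of an integer sequence. The following is a minimal sketch of such a benchmark case that needs only the standard library (C++17). It is not taken from the ctbench sources: the names foo, sum_impl, and sum are illustrative, and the fallback value of BENCHMARK_SIZE is there only so that the file compiles outside of ctbench.

```cpp
#include <utility>

// ctbench defines BENCHMARK_SIZE for each benchmark instance. This fallback
// is an assumption made only so the sketch compiles on its own.
#ifndef BENCHMARK_SIZE
#define BENCHMARK_SIZE 8
#endif

// One distinct function template specialization per value of N.
template <int N> constexpr int foo() { return N; }

// Instantiates foo<0>() ... foo<BENCHMARK_SIZE - 1>() through a C++17 fold
// expression and adds up their results.
template <int... Ns>
constexpr int sum_impl(std::integer_sequence<int, Ns...>) {
  return (0 + ... + foo<Ns>());
}

constexpr int sum() {
  return sum_impl(std::make_integer_sequence<int, BENCHMARK_SIZE>{});
}
```

Note that this variant measures pack expansion and template instantiation rather than preprocessor repetition, so its curves are not directly comparable with those of the Boost.Preprocessor version.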

By default, only compiler execution time is measured. If you want to generate plots using Clang's profiler data, add the following:

add_compile_options(-ftime-trace -ftime-trace-granularity=1)

Note that plotting profiler data takes more time and will generate a lot of plot files.

Then you can declare a benchmark case target in CMake with the following:

ctbench_add_benchmark(function_selection.requires # Benchmark case name
  function_selection-requires.cpp                 # Benchmark case file
  1                                               # Range begin
  32                                              # Range end
  1                                               # Range step
  10)                                             # Iterations per size
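
To get a feel for how long a benchmark will take, it helps to count the compiler invocations a declaration implies. The sketch below does that arithmetic, assuming that the range end is inclusive and that every size is compiled once per iteration. Both points are assumptions to check against the CMake API reference, and the function names are hypothetical.

```cpp
// Number of benchmark sizes visited from `begin` to `end` in increments of
// `step`. ASSUMPTION: `end` is included in the range.
constexpr int size_count(int begin, int end, int step) {
  return (end - begin) / step + 1;
}

// Total number of compiler invocations, ASSUMING each size is compiled
// `iterations` times.
constexpr int total_compilations(int begin, int end, int step,
                                 int iterations) {
  return size_count(begin, end, step) * iterations;
}
```

Under those assumptions, the declaration above (range 1 to 32, step 1, 10 iterations per size) corresponds to total_compilations(1, 32, 1, 10), i.e. 320 compilations of function_selection-requires.cpp.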

Declaring a graph target

Once you have several benchmark cases, you can start writing a graph config.

Example configs can be found here, or by running ctbench-grapher-utils --plotter=<plotter> --command=get-default-config. A list of available plotters can be retrieved by running ctbench-grapher-utils --help.

{
  "plotter": "compare_by",
  "demangle": true,
  "draw_average": true,
  "draw_points": true,
  "key_ptrs": [
    "/name",
    "/args/detail"
  ],
  "legend_title": "Timings",
  "plot_file_extensions": [
    ".svg",
    ".png"
  ],
  "value_ptr": "/dur",
  "width": 1500,
  "height": 500,
  "x_label": "Benchmark size factor",
  "y_label": "Time (µs)"
}

This configuration uses the compare_by plotter. It compares features targeted by the JSON pointers in key_ptrs across all benchmark cases. This is the easiest way to extract and compare as many relevant time-trace features as possible at once.
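
The pointers in key_ptrs and value_ptr are JSON pointers (RFC 6901) that are resolved against each time-trace event. The sketch below illustrates how a pointer such as /args/detail walks an event one reference token at a time. It is a standard-library-only stand-in, not how grapher is implemented: the Node type, the resolve function, and the event values are all made up for illustration, and escape sequences and array indices are not handled.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a JSON object tree: a node has a key and either
// a leaf value or a list of children. This is NOT ctbench's data model.
struct Node {
  std::string_view key;
  std::string_view value;
  const Node* children = nullptr;
  std::size_t child_count = 0;
};

// Follows a JSON pointer such as "/args/detail" token by token.
// Returns nullptr if the pointer is malformed or a token has no match.
constexpr const Node* resolve(const Node* node, std::string_view pointer) {
  while (node != nullptr && !pointer.empty()) {
    if (pointer.front() != '/') return nullptr;  // tokens start with '/'
    pointer.remove_prefix(1);
    const std::size_t end = pointer.find('/');
    const std::string_view token = pointer.substr(0, end);
    pointer.remove_prefix(end == std::string_view::npos ? pointer.size()
                                                        : end);
    // Look for the child whose key matches the current token.
    const Node* next = nullptr;
    for (std::size_t i = 0; i < node->child_count; ++i) {
      if (node->children[i].key == token) next = &node->children[i];
    }
    node = next;
  }
  return node;
}

// A made-up event with the fields addressed by the config above.
constexpr Node args_children[] = {{"detail", "foo<3>"}};
constexpr Node event_children[] = {
    {"name", "InstantiateFunction"},
    {"dur", "1250"},
    {"args", "", args_children, 1},
};
constexpr Node event{"", "", event_children, 3};
```

With this model, resolve(&event, "/name") and resolve(&event, "/args/detail") reach the two values that identify a feature, and resolve(&event, "/dur") reaches the value that would be plotted for it.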

Back to CMake, you can now declare a graph target using this config to compare the time spent in the compiler execution, the frontend, and the backend between the benchmark cases you declared previously:

ctbench_add_graph(function_selection-feature_comparison-graph # Target name
  ${CONFIGS}/feature_comparison.json                          # Config
  function_selection.enable_if                                # First case
  function_selection.enable_if_t                              # Second case
  function_selection.if_constexpr                             # ...
  function_selection.control
  function_selection.requires)

For each group descriptor, a graph will be generated with one curve per benchmark case. In this case, you would then get 3 graphs (ExecuteCompiler, Frontend, and Backend) each with 5 curves (enable_if, enable_if_t, if_constexpr, control, and requires).

Related work

References

Citing ctbench

@article{Penuchot2023,
  doi = {10.21105/joss.05165},
  url = {https://doi.org/10.21105/joss.05165},
  year = {2023},
  publisher = {The Open Journal},
  volume = {8},
  number = {88},
  pages = {5165},
  author = {Jules Penuchot and Joel Falcou},
  title = {ctbench - compile-time benchmarking and analysis},
  journal = {Journal of Open Source Software},
}