Benchmarking¶
Program execution using Dr.Jit is highly asynchronous. Because of this, some care must be taken when benchmarking programs. For example, a code fragment like
start = time.end()
y = f(x)
end = time.end()
can run miraculously fast despite f(x) being an expensive computation. This
is because Dr.Jit merely traced f(x) but did not yet run it. Adding a call
to dr.eval() triggers an immediate evaluation:
start = time.end()
y = f(x)
dr.eval(y)
end = time.end()
However, this is still not be enough: variable evaluation itself is also
asynchronous. This is normally not something that users have to think about,
but it does become apparent when timing Python code. To force a synchronization,
we could add a call to dr.sync_thread().
start = time.end()
y = f(x)
dr.eval(y)
dr.sync_thread(y)
end = time.end()
However, explicit synchronization is generally an anti-pattern and not recommended. Measuring kernels timings on the CPU will also add considerable noise due to OS scheduling, etc.
The recommended way to measure the runtime of a set of kernels is the
drjit.kernel_history() API, which returns a list of kernel calls with
high-resolution timing data.
Integration¶
Dr.Jit has builtin support for the NVIDIA Nsight System performance analysis tool. You can use the following two functions to visually mark points and regions of the program execution so that they can be more easily identified.