Textures

Dr.Jit provides a convenient abstraction around the notion of a texture, i.e., a multidimensional array that can be evaluated at fractional positions. This feature leverages hardware texture units to accelerate lookups if possible, and it otherwise reverts to an efficient software implementation.

The texture implementation is fully differentiable and supports half, single, and double-precision floating-point textures in 1, 2 and 3 dimensions. Each lookup produces simultaneous evaluations for a set of channels, which conceptually increases the dimension of the underlying storage by one.

The easiest way to create a texture is by initializing it from a compatible tensor:

import drjit as dr
from drjit.auto.ad import TensorXf, Texture2f

n_channels = 3
tensor = dr.full(TensorXf, 1, shape=[1024, 768, n_channels])

# 2D texture with 3 output channels
tex = Texture2f(tensor)

To use a texture with a different number of dimensions or precision, adopt the class name appropriately (e.g., Texture3f16 for 3D half-precision).

You may optionally also specify filter and wrap modes that are used by subsequent interpolated lookups (see dr.FilterMode and dr.WrapMode for details).

tex = Texture2f(
    tensor,
    filter_mode=dr.FilterMode.Linear,
    wrap_mode=dr.WrapMode.Repeat
)

The .eval() function queries the function at a position on the unit cube. In this example involving a 2D texture, we must provide a 2D input point, and the evaluation produces three output channels.

pos = dr.cuda.Array2f([0.25, 0.5, 0.9],
                      [0.1,  0.3, 0.5])
out = tex.eval(pos)

Regular lookups use nearest neighbor or linear/bilinear/trilinear interpolation. The .eval_cubic() builds on this capability to provide a clamped cubic B-Spline interpolant at somewhat higher cost.

Note

When evaluating a texture, the numerical precision used during the interpolation is dictated by the floating point precision of the query point. You may, e.g., want to use a 32-bit position to query a 16-bit texture to avoid a loss of accuracy.

Hardware acceleration

Dr.Jit can accelerate texture lookups on the CUDA backend using hardware GPU texture units. Textures initialized with use_accel=True (the default) will create an associated CUDA texture object that leverages hardware intrinsics to perform sampling

tex = dr.cuda.Texture2f(tensor_data, use_accel=True)

Note

Only single and half-precision floating-point CUDA texture objects are supported. Double-precision textures work but won’t benefit from hardware-acceleration.

Warning

Hardware-accelerated lookups use a 9-bit fixed-point format with 8-bits of fractional value for storing the weights used for linear interpolation. See the CUDA programming guide for more details.

Migration

When hardware acceleration is disabled, Dr.Jit textures are a thin wrapper around the underlying tensor representation, which remains accessible:

tex = dr.cuda.Texture2f(tensor_data, use_accel=False)

tensor_data = tex.tensor() # Return the tensor backing this texture
array_data = tex.value()   # Same, but in array form

Hardware-accelerated Dr.Jit textures work differently: they migrate texture data into a CUDA texture object that is no longer directly accessible to Dr.Jit. This makes methods such as .tensor() and .value() rather expensive, since they must copy the texture data from the CUDA object back into memory.

If you desire access to a hardware-accelerated texture and at the same time retain the tensor representation, specify migrate=False to the texture constructor, i.e.,

tex = dr.cuda.Texture2f(tensor_data, use_accel=True, migrate=False)

This, however, doubles the storage cost associated with the texture.

Automatic differentiation

Suppose we want to compute the gradient of a lookup with respect to the input tensor of a texture

import drjit as dr
from drjit.cuda.ad import TensorXf, Texture1f, Array1f

N = 3

tensor = TensorXf([3,5,8], shape=(N, 1))

dr.enable_grad(tensor)

tex = Texture1f(tensor)
pos = Array1f(0.4)
out = Array1f(tex.eval(pos))

dr.backward(out)

grad = dr.grad(tensor)

In order to propagate gradients, the associated AD graph needs to track the collection of coordinate wrapping, texel fetching and filtering operations that are performed on the underlying tensor as part of sampling. While hardware-accelerated textures here rely on GPU intrinsics, such textures are indeed still differentiable. Internally, while the primal lookup operation is hardware-accelerated, a subsequent non-accelerated lookup is additionally performed solely to record each individual operation into the AD graph.

8-bit textures

The interface of Texture?f8u variants (e.g. Texture3f8u) slightly differs from their floating point counterparts. They store texture data compactly using unsigned 8-bit integers, but their eval_* members produce floating point output by transparently remapping 0..255 onto the interval [0, 1].

When the texture object was created with srgb=True, i.e.,

tex = Texture2f8u(tensor_u8, srgb=True)

evaluation will apply the nonlinear sRGB transfer function to turn gamma-encoded 8-bit values back into a linear scale. The CUDA and Metal backends perform both types of conversions in hardware.

Writing to textures

Hardware textures can also be written from within a kernel, turning a texture into a render target. Create the texture with writable=True and store per-texel values with .write(), which takes integer texel coordinates (one unsigned-integer array per dimension) and a list of per-channel values:

tex = Texture2f([height, width], channels=4, writable=True)

idx = dr.arange(UInt, width * height)
x, y = idx % width, idx // width
tex.write([x, y], [r, g, b, a])
dr.eval()

The access mode follows the operation, so a writable texture may be both written and sampled (a texture rendered in one kernel can be looked up in another). Its contents can be read back into a tensor via .tensor() / .value() as usual. Writing 8-bit textures clips and quantizes [0, 1] float inputs and sRGB-encodes them if needed.

This feature requires a JIT backend; it is unavailable for scalar textures.

Wrapping native textures

To share textures with a GUI or another GPU API, an existing native texture can be wrapped as a Dr.Jit texture with .from_native_handle(). The handle is an id<MTLTexture> pointer on the Metal backend or an OpenGL texture id on the CUDA backend; the shape, channel count, and component type are inferred from it (the dimensionality and precision must match the texture type used):

tex = dr.metal.Texture2f.from_native_handle(mtl_texture)                 # sample it
tex = dr.metal.Texture2f.from_native_handle(mtl_texture, writable=True)  # render into it

Pass writable=True to render into the application’s texture via .write(). The inverse, .native_handle(), returns a Dr.Jit-allocated texture’s native handle to hand to a GUI for display: the id<MTLTexture> on Metal, or the wrapped OpenGL texture id on CUDA (0 if the texture has no OpenGL identity).

A texture wrapping a cross-API handle (an OpenGL texture on the CUDA backend) is only usable between .map() and .unmap(), which must bracket each use; on Metal both are no-ops.

C++ interface

Textures are also avilable in C++. To do so, instantiate the template class drjit::Texture with any Dr.Jit array or scalar floating-point type and specify the desired number of dimensions:

using Float = dr::CUDAArray<float>;

size_t shape[2] = { 1024, 768 };
dr::Texture<Float, 2> tex(shape, 3);