Implementing Repeat & Reflect Mapping in a Parametrized Way

Repeat and reflect are extend modes that are used when implementing gradient and texture fetchers in a 2D rendering pipeline. Extend modes define how a texture or gradient address is calculated when the index goes out of bounds, or more precisely, how to walk the gradient or texture and never go out of bounds. There is an additional mode that is called pad, which is covered only briefly in this article. In total, there are 3 extend modes commonly implemented by 2D rendering pipelines:

Where result, x, and width are integers and result is always within [0, width) range (note that width is exclusive). A baseline implementation of pad, repeat, and reflect functions could look like the following:

static inline int pad(int x, int width) {
  if (x < 0)
    return 0;

  if (x >= width)
    return width - 1;

  return x;
}

static inline int repeat(int x, int width) {
  // Repeat from 0 to width.
  x = x % width;
  if (x < 0)
    x += width;

  return x;
}

static inline int reflect(int x, int width) {
  // Repeat from 0 to width * 2.
  x = x % (width * 2);
  if (x < 0)
    x += width * 2;

  // Reflect from 0 to width.
  if (x >= width)
    x = width * 2 - 1 - x;

  return x;
}

When the Width is a Power of 2

In many cases width would be a power of 2, especially when rendering gradients. In this case the expensive MOD operator % could be replaced by a cheap AND operator (&), which would make the implementation shorter and much faster:

static inline int repeat(int x, int width) {
  return x & (width - 1);
}

static inline int reflect(int x, int width) {
  // Repeat from 0 to width * 2.
  x = x & ((width * 2) - 1);

  // Reflect from 0 to width.
  if (x >= width)
    x = width * 2 - 1 - x;

  return x;
}

AND operator solves two problems - it implicitly handles the x < 0 condition, because the width is never negative (for example -2 & 255 == 254, which is the same as -2 + 256 in 2's complement arithmetic) and it replaces a very expensive MOD operator by a totally cheap AND operator. It does the same job in our case, because of power of 2 constraint. However, the reflect function is still pretty long and it could be simplified even more, especially if we want to vectorize it.

Reflect-Trick Suitable for SIMD

We know that width can be used to precalculate other values that can be used instead of it. For example if we always use width * 2 - 1 as a mask it would be better to just calculate it once and reuse the precalculated value. For the following reflect-trick we precalculate it so it becomes a mask instead of width:

static inline int precalc(int width) {
  return width * 2 - 1;
}

We also know that SIMD instructions like more min/max approach than if/mask approach, so instead of using the condition the reflection itself could be rewritten to use a single subtraction and minimum value:

// y is the precomputed value, equal to `width * 2 - 1`.
static inline int reflect(int x, int y) {
  // Repeat from 0 to width * 2.
  x = x & y;

  // Reflect by using the reflect-trick.
  return min(x, y - x);
}

The following reasons summarize why this implementation is generally better for SIMD implementation (and probably for scalar too):

Handling Both Repeat & Reflect Extend Modes at the Same Time

If we introduce another variable it would be possible to handle both repeat and reflect modes in a parametrized way. This would be extremely beneficial for 2D rendering pipelines that allow different repeat modes for X and Y. We name our constants as repeatValue and reflectValue, and precalculate them the following way:

static inline int precalcRepeatValue(int width, bool isReflecting) {
  return isReflecting ? width * 2 - 1 : width - 1;
}

static inline int precalcReflectValue(int width, bool isReflecting) {
  return width * 2 - 1;
}

And use such values in a single function:

static inline int repeatOrReflect(int x, int repeatValue, int reflectValue) {
  x = x & repeatValue;
  return min(x, reflectValue - x);
}

Based on the initialization, the repeatOrReflect() function will either repeat or reflect based on the parameters.

Reflect-Trick that uses SAR and XOR

On X86 architecture the reflection can also be performed by using a SAR (arithmetic shift right) and XOR instructions. The idea is to shift the interval of the value to be reflected from [0...width*2) to [-width..width). If we do so we can use the following:

static inline int reflect(int x) {
  const int kShift = (sizeof(int) * 8 - 1);
  return (x >> kShift) ^ x;
}

The function in this case expects value in a correct range and will always output a value within [0..width). On X86 the function would typically translate to 3 instructions:

mov idx, x   ; Copy x to idx
sar idx, 31  ; Copy sign bits across idx
xor idx, x   ; Reflect

Or 2 instructions if we can make sure that the input x is in eax register and output in edx register:

cdq          ; Sign-extend eax to edx:eax
xor edx, eax ; Reflect

Conclusion

These tricks can be used to improve the performance of gradient and texture fetching. In this post I focused mostly on repeating and reflection constrained by a power of 2, which is very useful for gradients, but it's not that much for textures. If the repeatValue is removed from repeatOrReflect function and the implementation handles the repeat during advancing x and y coordinates the same code could be used to repeat/reflect a value of any range, which is exactly how Blend2D texture fetcher works.