Petr Kobalicek

Software engineer focusing on writing

high-performance code in C++ & Asm

high-performance code in C++ & Asm

Repeat and reflect are extend modes that are used when implementing gradient and texture fetchers in a 2D rendering pipeline. Extend modes define how a texture or gradient address is calculated when the index goes out of bounds, or more precisely, how to walk the gradient or texture and never go out of bounds. There is an additional mode that is called pad, which is covered only briefly in this article. In total, there are 3 extend modes commonly implemented by 2D rendering pipelines:

- Pad - index is clamped
`result = pad(x, width)`

- Repeat - index is repeated
`result = repeat(x, width)`

- Reflect - index is reflected
`result = reflect(x, width)`

Where `result`

, `x`

, and `width`

are integers and `result`

is always within `[0, width)`

range (note that width is exclusive). A baseline implementation of pad, repeat, and reflect functions could look like the following:

```
static inline int pad(int x, int width) {
if (x < 0)
return 0;
if (x >= width)
return width - 1;
return x;
}
static inline int repeat(int x, int width) {
// Repeat from 0 to width.
x = x % width;
if (x < 0)
x += width;
return x;
}
static inline int reflect(int x, int width) {
// Repeat from 0 to width * 2.
x = x % (width * 2);
if (x < 0)
x += width * 2;
// Reflect from 0 to width.
if (x >= width)
x = width * 2 - 1 - x;
return x;
}
```

In many cases `width`

would be a power of 2, especially when rendering gradients. In this case the expensive MOD operator `%`

could be replaced by a cheap AND operator (`&`

), which would make the implementation shorter and much faster:

```
static inline int repeat(int x, int width) {
return x & (width - 1);
}
static inline int reflect(int x, int width) {
// Repeat from 0 to width * 2.
x = x & ((width * 2) - 1);
// Reflect from 0 to width.
if (x >= width)
x = width * 2 - 1 - x;
return x;
}
```

AND operator solves two problems - it implicitly handles the `x < 0`

condition, because the `width`

is never negative (for example `-2 & 255 == 254`

, which is the same as `-2 + 256`

in 2's complement arithmetic) and it replaces a very expensive MOD operator by a totally cheap AND operator. It does the same job in our case, because of power of 2 constraint. However, the reflect function is still pretty long and it could be simplified even more, especially if we want to vectorize it.

We know that `width`

can be used to precalculate other values that can be used instead of it. For example if we always use `width * 2 - 1`

as a mask it would be better to just calculate it once and reuse the precalculated value. For the following reflect-trick we precalculate it so it becomes a mask instead of width:

```
static inline int precalc(int width) {
return width * 2 - 1;
}
```

We also know that SIMD instructions like more min/max approach than if/mask approach, so instead of using the condition the reflection itself could be rewritten to use a single subtraction and minimum value:

```
// y is the precomputed value, equal to `width * 2 - 1`.
static inline int reflect(int x, int y) {
// Repeat from 0 to width * 2.
x = x & y;
// Reflect by using the reflect-trick.
return min(x, y - x);
}
```

The following reasons summarize why this implementation is generally better for SIMD implementation (and probably for scalar too):

- It uses only one constant
`y`

, which could be kept in register. - It doesn't use branching, which would need to be translated to comparison & masking.

If we introduce another variable it would be possible to handle both repeat and reflect modes in a parametrized way. This would be extremely beneficial for 2D rendering pipelines that allow different repeat modes for X and Y. We name our constants as `repeatValue`

and `reflectValue`

, and precalculate them the following way:

```
static inline int precalcRepeatValue(int width, bool isReflecting) {
return isReflecting ? width * 2 - 1 : width - 1;
}
static inline int precalcReflectValue(int width, bool isReflecting) {
return width * 2 - 1;
}
```

And use such values in a single function:

```
static inline int repeatOrReflect(int x, int repeatValue, int reflectValue) {
x = x & repeatValue;
return min(x, reflectValue - x);
}
```

Based on the initialization, the `repeatOrReflect()`

function will either repeat or reflect based on the parameters.

On X86 architecture the reflection can also be performed by using a SAR (arithmetic shift right) and XOR instructions. The idea is to shift the interval of the value to be reflected from `[0...width*2)`

to `[-width..width)`

. If we do so we can use the following:

```
static inline int reflect(int x) {
const int kShift = (sizeof(int) * 8 - 1);
return (x >> kShift) ^ x;
}
```

The function in this case expects value in a correct range and will always output a value within `[0..width)`

. On X86 the function would typically translate to 3 instructions:

```
mov idx, x ; Copy x to idx
sar idx, 31 ; Copy sign bits across idx
xor idx, x ; Reflect
```

Or 2 instructions if we can make sure that the input `x`

is in `eax`

register and output in `edx`

register:

```
cdq ; Sign-extend eax to edx:eax
xor edx, eax ; Reflect
```

These tricks can be used to improve the performance of gradient and texture fetching. In this post I focused mostly on repeating and reflection constrained by a power of 2, which is very useful for gradients, but it's not that much for textures. If the `repeatValue`

is removed from `repeatOrReflect`

function and the implementation handles the repeat during advancing `x`

and `y`

coordinates the same code could be used to repeat/reflect a value of any range, which is exactly how Blend2D texture fetcher works.