Flow Control Limitations

Pixel shader flow control instructions are subject to limits on how deeply they can be nested. In addition, there are restrictions on using gradient instructions inside per-pixel flow control.

Note

When you use the *_4_0_level_9_x HLSL shader profiles, you implicitly use the Shader Model 2.x profiles to support Direct3D 9-capable hardware. Shader Model 2.x profiles support more limited flow control than the Shader Model 4.x and later profiles.

Pixel Shader Instruction Depth Counts

ps_2_0 does not support flow control. The limitations for the other pixel shader versions are listed below.

Instruction Depth Count for ps_2_x

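Each instruction counts against one or more nesting depth limits. The following table lists the depth count that each instruction adds to or subtracts from the current depth. In the endif - ps and ret - ps rows, a parenthesized entry means that the decrement applies only when the matching opening instruction is of the type shown.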

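Instruction      | Static nesting    | Dynamic nesting                   | loop/rep nesting | call nesting
-----------------+-------------------+-----------------------------------+------------------+-------------
if bool - ps     | 1                 | 0                                 | 0                | 0
if_comp - ps     | 0                 | 1                                 | 0                | 0
if pred - ps     | 0                 | 1                                 | 0                | 0
else - ps        | 0                 | 0                                 | 0                | 0
endif - ps       | -1 (if bool - ps) | -1 (if pred - ps or if_comp - ps) | 0                | 0
rep - ps         | 0                 | 0                                 | 1                | 0
endrep - ps      | 0                 | 0                                 | -1               | 0
break - ps       | 0                 | 0                                 | 0                | 0
break_comp - ps  | 0                 | 1, -1                             | 0                | 0
breakp - ps      | 0                 | 0                                 | 0                | 0
call - ps        | 0                 | 0                                 | 0                | 1
callnz bool - ps | 0                 | 0                                 | 0                | 1
callnz pred - ps | 0                 | 1                                 | 0                | 1
ret - ps         | 0                 | -1 (callnz pred - ps)             | 0                | -1
setp_comp - ps   | 0                 | 0                                 | 0                | 0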

Nesting Depth

Nesting depth is the number of levels to which flow control instructions can be nested inside one another. The following table lists the maximum depth for each type of nesting.

Instruction type | Maximum
-----------------+---------
Static nesting   | 24 if D3DCAPS9.PS20Caps.StaticFlowControlDepth > 0; 0 otherwise
Dynamic nesting  | 0 to 24, see D3DCAPS9.PS20Caps.DynamicFlowControlDepth
rep nesting      | 0 to 4, see D3DCAPS9.PS20Caps.StaticFlowControlDepth
call nesting     | 0 to 4, see D3DCAPS9.PS20Caps.StaticFlowControlDepth (independent of the rep limit)
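To make the bookkeeping concrete, the following Python sketch applies the ps_2_x depth counts to an instruction sequence. It then compares the peak depths with the caps-dependent maximums. This is an illustrative model, not the Direct3D validator, and it rests on several assumptions:

- The shader is given as a flat trace of opcode names. Underscored forms such as if_bool stand for the instructions in the table.
- Each subroutine body is inlined between its call and its ret.
- The "1, -1" entry for break_comp - ps is read as one transient extra level of dynamic nesting.
- The function names ps_2_x_limits, max_depths and fits are hypothetical.

```python
# Illustrative model of ps_2_x nesting-depth bookkeeping (not the real validator).
# Assumption: the shader is a flat trace of opcode names, with each subroutine
# body inlined between its call and its ret.

STATIC, DYNAMIC, REP, CALL = range(4)

# Fixed (static, dynamic, rep, call) deltas taken from the ps_2_x table.
# endif, ret and break_comp are handled separately because their effect
# depends on context.
DELTAS = {
    "if_bool": (1, 0, 0, 0),
    "if_comp": (0, 1, 0, 0),
    "if_pred": (0, 1, 0, 0),
    "else": (0, 0, 0, 0),
    "rep": (0, 0, 1, 0),
    "endrep": (0, 0, -1, 0),
    "break": (0, 0, 0, 0),
    "breakp": (0, 0, 0, 0),
    "call": (0, 0, 0, 1),
    "callnz_bool": (0, 0, 0, 1),
    "callnz_pred": (0, 1, 0, 1),
    "setp_comp": (0, 0, 0, 0),
}


def ps_2_x_limits(static_flow_control_depth, dynamic_flow_control_depth):
    """Maximum (static, dynamic, rep, call) depths implied by the PS20Caps values."""
    static = 24 if static_flow_control_depth > 0 else 0
    return (static, dynamic_flow_control_depth,
            static_flow_control_depth, static_flow_control_depth)


def max_depths(trace):
    """Return the peak (static, dynamic, rep, call) depth reached by the trace."""
    depth = [0, 0, 0, 0]
    peak = [0, 0, 0, 0]
    open_ifs = []    # which counter each still-open if incremented
    open_calls = []  # True if the still-open call was a callnz pred

    def bump(kind, delta):
        depth[kind] += delta
        peak[kind] = max(peak[kind], depth[kind])

    for op in trace:
        if op == "endif":
            bump(open_ifs.pop(), -1)
        elif op == "ret":
            bump(CALL, -1)
            if open_calls.pop():
                bump(DYNAMIC, -1)
        elif op == "break_comp":
            # Assumed reading of "1, -1": one transient dynamic level.
            bump(DYNAMIC, 1)
            bump(DYNAMIC, -1)
        else:
            for kind, delta in enumerate(DELTAS[op]):
                if delta:
                    bump(kind, delta)
            if op.startswith("if_"):
                open_ifs.append(STATIC if op == "if_bool" else DYNAMIC)
            elif op.startswith("call"):
                open_calls.append(op == "callnz_pred")
    return tuple(peak)


def fits(trace, limits):
    """True if no peak depth exceeds the corresponding limit."""
    return all(p <= m for p, m in zip(max_depths(trace), limits))


shader = ["if_bool", "rep", "if_comp", "break_comp", "endif", "endrep", "endif"]
print(max_depths(shader))                 # (1, 2, 1, 0)
print(fits(shader, ps_2_x_limits(4, 2)))  # True
print(fits(shader, ps_2_x_limits(4, 1)))  # False
```

Under this reading, a break_comp inside an if_comp needs two levels of dynamic nesting. The same shader therefore fits when DynamicFlowControlDepth is 2 but not when it is 1.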

 

Instruction Depth Count for ps_2_sw

Each instruction counts against one or more nesting depth limits. This table shows the depth count that each instruction adds to or subtracts from the current depth.

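Instruction      | Static nesting    | Dynamic nesting                   | loop/rep nesting | call nesting
-----------------+-------------------+-----------------------------------+------------------+-------------
if bool - ps     | 1                 | 0                                 | 0                | 0
if pred - ps     | 0                 | 1                                 | 0                | 0
if_comp - ps     | 0                 | 1                                 | 0                | 0
else - ps        | 0                 | 0                                 | 0                | 0
endif - ps       | -1 (if bool - ps) | -1 (if pred - ps or if_comp - ps) | 0                | 0
rep - ps         | 0                 | 0                                 | 1                | 0
endrep - ps      | 0                 | 0                                 | -1               | 0
loop - ps        | n/a               | n/a                               | n/a              | n/a
endloop - ps     | n/a               | n/a                               | n/a              | n/a
break - ps       | 0                 | 0                                 | 0                | 0
break_comp - ps  | 0                 | 1, -1                             | 0                | 0
breakp - ps      | 0                 | 0                                 | 0                | 0
call - ps        | 0                 | 0                                 | 0                | 1
callnz bool - ps | 0                 | 0                                 | 0                | 1
callnz pred - ps | 0                 | 1                                 | 0                | 1
ret - ps         | 0                 | -1 (callnz pred - ps)             | 0                | -1
setp_comp - ps   | 0                 | 0                                 | 0                | 0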

Nesting Depth

Nesting depth is the number of levels to which flow control instructions can be nested inside one another. The following table lists the maximum depth for each type of nesting.

Instruction type | Maximum
-----------------+---------
Static nesting   | 24
Dynamic nesting  | 24
rep nesting      | 4
call nesting     | 4

Instruction Depth Count for ps_3_0

Each instruction counts against one or more nesting depth limits. This table shows the depth count that each instruction adds to or subtracts from the current depth.

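Instruction      | Static nesting    | Dynamic nesting                   | loop/rep nesting | call nesting
-----------------+-------------------+-----------------------------------+------------------+-------------
if bool - ps     | 1                 | 0                                 | 0                | 0
if pred - ps     | 0                 | 1                                 | 0                | 0
if_comp - ps     | 0                 | 1                                 | 0                | 0
else - ps        | 0                 | 0                                 | 0                | 0
endif - ps       | -1 (if bool - ps) | -1 (if pred - ps or if_comp - ps) | 0                | 0
rep - ps         | 0                 | 0                                 | 1                | 0
endrep - ps      | 0                 | 0                                 | -1               | 0
loop - ps        | 0                 | 0                                 | 1                | 0
endloop - ps     | 0                 | 0                                 | -1               | 0
break - ps       | 0                 | 0                                 | 0                | 0
break_comp - ps  | 0                 | 1, -1                             | 0                | 0
breakp - ps      | 0                 | 0                                 | 0                | 0
call - ps        | 0                 | 0                                 | 0                | 1
callnz bool - ps | 0                 | 0                                 | 0                | 1
callnz pred - ps | 0                 | 1                                 | 0                | 1
ret - ps         | 0                 | -1 (callnz pred - ps)             | 0                | -1
setp_comp - ps   | 0                 | 0                                 | 0                | 0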

Nesting Depth

Nesting depth is the number of levels to which flow control instructions can be nested inside one another. The following table lists the maximum depth for each type of nesting.

Instruction type | Maximum
-----------------+---------
Static nesting   | 24
Dynamic nesting  | 24
loop/rep nesting | 4
call nesting     | 4
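In ps_3_0, loop - ps and rep - ps count against a single loop/rep counter, so mixing the two does not provide any extra depth. The following Python sketch checks a nesting against the limit of 4. It assumes opcodes are plain strings and uses a hypothetical helper named loop_rep_depth; it is not the actual validator.

```python
# Illustrative check (not the real validator): in ps_3_0, loop and rep share
# one loop/rep nesting counter, whose maximum is 4.
PS_3_0_LOOP_REP_LIMIT = 4

OPENERS = {"loop", "rep"}
CLOSERS = {"endloop": "loop", "endrep": "rep"}


def loop_rep_depth(trace):
    """Peak loop/rep nesting of a flat list of opcode names; other opcodes are ignored."""
    stack = []
    peak = 0
    for op in trace:
        if op in OPENERS:
            stack.append(op)
            peak = max(peak, len(stack))
        elif op in CLOSERS:
            if not stack or stack.pop() != CLOSERS[op]:
                raise ValueError(f"{op} does not match an open {CLOSERS[op]}")
    if stack:
        raise ValueError(f"unclosed {stack[-1]}")
    return peak


mixed = ["loop", "rep", "loop", "rep", "endrep", "endloop", "endrep", "endloop"]
print(loop_rep_depth(mixed) <= PS_3_0_LOOP_REP_LIMIT)     # True: depth 4
too_deep = ["rep"] * 5 + ["endrep"] * 5
print(loop_rep_depth(too_deep) <= PS_3_0_LOOP_REP_LIMIT)  # False: depth 5
```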

 

Instruction Depth Count for ps_3_sw

Each instruction counts against one or more nesting depth limits. This table shows the depth count that each instruction adds to or subtracts from the current depth.

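Instruction      | Static nesting    | Dynamic nesting                   | loop/rep nesting | call nesting
-----------------+-------------------+-----------------------------------+------------------+-------------
if bool - ps     | 1                 | 0                                 | 0                | 0
if pred - ps     | 0                 | 1                                 | 0                | 0
if_comp - ps     | 0                 | 1                                 | 0                | 0
else - ps        | 0                 | 0                                 | 0                | 0
endif - ps       | -1 (if bool - ps) | -1 (if pred - ps or if_comp - ps) | 0                | 0
rep - ps         | 0                 | 0                                 | 1                | 0
endrep - ps      | 0                 | 0                                 | -1               | 0
loop - ps        | 0                 | 0                                 | 1                | 0
endloop - ps     | 0                 | 0                                 | -1               | 0
break - ps       | 0                 | 0                                 | 0                | 0
break_comp - ps  | 0                 | 1, -1                             | 0                | 0
breakp - ps      | 0                 | 0                                 | 0                | 0
call - ps        | 0                 | 0                                 | 0                | 1
callnz bool - ps | 0                 | 0                                 | 0                | 1
callnz pred - ps | 0                 | 1                                 | 0                | 1
ret - ps         | 0                 | -1 (callnz pred - ps)             | 0                | -1
setp_comp - ps   | 0                 | 0                                 | 0                | 0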

Nesting Depth

Nesting depth is the number of levels to which flow control instructions can be nested inside one another. The following table lists the maximum depth for each type of nesting.

Instruction type | Maximum
-----------------+---------
Static nesting   | 24
Dynamic nesting  | 24
loop/rep nesting | 4
call nesting     | 4

Interaction of Per-Pixel Flow Control With Screen Gradients

The pixel shader instruction set includes several instructions that produce or use gradients of quantities with respect to screen-space x and y. Gradients are most commonly used to compute the level of detail for texture sampling and, with anisotropic filtering, to select samples along the axis of anisotropy. Hardware implementations typically run the pixel shader on several pixels simultaneously (such as a 2x2 grid). The gradient of a quantity computed in the shader can then be approximated as the difference between its values at the same point of execution in adjacent pixels.
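The quad approximation can be modeled in a few lines of Python. This toy sketch assumes each pixel's value is stored in a dictionary keyed by its (x, y) position within a 2x2 quad, and it takes plain neighbor differences. Actual hardware may compute derivatives differently, for example with a single difference per quad.

```python
# Toy model of a 2x2 pixel quad (not how any particular GPU is wired).
# values maps each pixel's (x, y) position in the quad, with x and y in {0, 1},
# to a quantity the shader computed at the same point of execution.


def quad_gradients(values):
    """Approximate d/dx and d/dy by differencing horizontal and vertical neighbors."""
    ddx = {}
    ddy = {}
    for (x, y) in values:
        ddx[(x, y)] = values[(1, y)] - values[(0, y)]
        ddy[(x, y)] = values[(x, 1)] - values[(x, 0)]
    return ddx, ddy


# A quantity that grows by 2 per pixel in x and by 3 per pixel in y.
u = {(x, y): 2 * x + 3 * y for x in (0, 1) for y in (0, 1)}
ddx, ddy = quad_gradients(u)
print(ddx[(0, 0)], ddy[(0, 0)])  # 2 3
```

Suppose a divergent branch had skipped computing the value at (1, 0). The difference taken at (0, 0) would then have no defined operand (a KeyError in this model). This is the ambiguity that rules out gradient requests inside flow control that varies across pixels.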

When flow control is present in a shader, the result of a gradient calculation requested inside a given branch is ambiguous if adjacent pixels can take different paths. Therefore, it is illegal to use any pixel shader operation that requests a gradient calculation inside a flow control construct that could vary across the pixels of the primitive being rasterized.

The gradient-related pixel shader operations are partitioned into those that are not permitted inside such flow control and those that are permitted anywhere:

  • Scenario A: Operations that are not permitted inside flow control that could vary across the pixels in a primitive. These include the operations listed in the following table.

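    Instruction                                     | Is not permitted inside varying flow control when
    ------------------------------------------------+--------------------------------------------------
    texld - ps_2_0 and up, texldb - ps, texldp - ps | A temporary register is used for the texture coordinate.
    dsx - ps, dsy - ps                              | A temporary register is used for the operand.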

  • Scenario B: Operations that are permitted anywhere. These include the operations listed in the following table.

    Instruction                                     | Is permitted anywhere when
    ------------------------------------------------+---------------------------
    texld - ps_2_0 and up, texldb - ps, texldp - ps | A read-only quantity is used for the texture coordinate (it may vary per pixel, such as interpolated texture coordinates).
    dsx - ps, dsy - ps                              | A read-only quantity is used for the input operand (it may vary per pixel, such as interpolated texture coordinates).
    texldl - ps                                     | Always: the user provides the level of detail as an argument, so no gradients are computed and flow control is not an issue.
    texldd - ps                                     | Always: the user provides the gradients as input arguments, so flow control is not an issue.

These restrictions are strictly enforced by shader validation. Consider a branch whose condition appears to evaluate the same way across a primitive. It still falls into scenario A, and is not permitted, if any operand of the condition expression is computed by the pixel shader. Similarly, requesting gradients of a shader-computed quantity x inside dynamic flow control falls into scenario A and is not permitted, even if x does not appear to be modified in any of the branches.
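The scenario A and B rules, together with the conservative treatment of branch conditions, can be summarized as a small decision function. This Python sketch is a hypothetical model rather than the validator's actual logic. The Operand type, the function names and the register names r0 and t0 are illustrative, and an operand counts as shader-computed when it lives in a temporary register.

```python
# Hypothetical model of the scenario A/B rules; not the real shader validator.
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    name: str
    # True for temporaries (r#); False for read-only inputs such as
    # interpolated texture coordinates.
    computed_in_shader: bool


# Instructions that compute gradients implicitly from their operand.
IMPLICIT_GRADIENT_OPS = {"texld", "texldb", "texldp", "dsx", "dsy"}
# Instructions whose LOD or gradients are passed in explicitly.
EXPLICIT_OPS = {"texldl", "texldd"}


def condition_may_vary(condition_operands):
    """Conservative: any shader-computed operand makes the branch count as
    varying, even if it would in practice branch the same way everywhere."""
    return any(op.computed_in_shader for op in condition_operands)


def is_permitted(opcode, operand, inside_varying_flow_control):
    if opcode in EXPLICIT_OPS:
        return True  # scenario B: nothing is differenced across pixels
    if opcode in IMPLICIT_GRADIENT_OPS:
        if not inside_varying_flow_control:
            return True
        # Inside varying flow control: scenario B only for read-only operands.
        return not operand.computed_in_shader
    raise ValueError(f"{opcode} is not covered by these rules")


r0 = Operand("r0", computed_in_shader=True)   # temporary register
t0 = Operand("t0", computed_in_shader=False)  # interpolated texture coordinate
varying = condition_may_vary([r0])  # True, whatever value r0 holds at run time
print(is_permitted("texld", r0, varying))   # False: scenario A
print(is_permitted("texld", t0, varying))   # True: scenario B
print(is_permitted("texldd", r0, varying))  # True: scenario B
```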

Predication is subject to the same restrictions as flow control, so that implementations remain free to interchange branch instructions and predicated instructions.

The user can combine instructions from scenarios A and B. For example, suppose the user needs an anisotropic texture sample at a shader-computed texture coordinate, but only for pixels that satisfy some per-pixel condition. To meet these requirements, the user can compute the texture coordinate for all pixels outside per-pixel varying flow control, and immediately compute its gradients with the dsx - ps and dsy - ps instructions. Then, inside a per-pixel if_comp - ps/endif - ps (or if pred - ps/endif - ps) block, the user can use texldd - ps (texture load with user-provided gradients), passing the precomputed gradients. In other words, every pixel in the primitive computes the texture coordinate and takes part in the gradient calculation, but only the pixels that need a texture sample actually perform one.
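The pattern above can be simulated on a single 2x2 quad in Python. The sketch assumes that neighbor differences stand in for dsx - ps and dsy - ps. The helpers compute_coord, needs_sample and sample_with_gradients are hypothetical stand-ins for shader work; sample_with_gradients plays the role of texldd - ps and simply returns its inputs.

```python
# Toy 2x2-quad simulation of the gradient-then-branch pattern. compute_coord,
# needs_sample and sample_with_gradients are hypothetical stand-ins, not
# Direct3D or HLSL calls.


def compute_coord(x, y):
    return 2 * x + 3 * y  # one component of a shader-computed texture coordinate


def needs_sample(x, y):
    return (x + y) % 2 == 0  # per-pixel condition that diverges within the quad


def sample_with_gradients(coord, ddx, ddy):
    # Stand-in for texldd - ps: just records what it was given.
    return (coord, ddx, ddy)


def shade_quad():
    pixels = [(x, y) for y in (0, 1) for x in (0, 1)]
    # Outside varying flow control: every pixel computes the coordinate...
    coord = {p: compute_coord(*p) for p in pixels}
    # ...and its gradients, while all four pixels hold a valid value.
    ddx = {(x, y): coord[(1, y)] - coord[(0, y)] for (x, y) in pixels}
    ddy = {(x, y): coord[(x, 1)] - coord[(x, 0)] for (x, y) in pixels}
    # Inside the per-pixel branch: only some pixels sample, passing the
    # precomputed gradients instead of differencing inside the branch.
    return {p: sample_with_gradients(coord[p], ddx[p], ddy[p])
            for p in pixels if needs_sample(*p)}


print(shade_quad())  # {(0, 0): (0, 2, 3), (1, 1): (5, 2, 3)}
```

Pixels (1, 0) and (0, 1) skip the sample. The two pixels that do sample still receive well-defined gradients, because the differences were taken before the quad diverged.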

Regardless of these rules, the user is still responsible for ensuring that the register containing the source data has been initialized on all execution paths. This must hold before any gradient is computed, and before any texture sample that implicitly computes a gradient is performed. Initialization of temporary registers is not, in general, validated or enforced.
