Barkane's Stylistic Volumetric Clouds


§ Overview

  • The end goal of this walkthrough is to generate cotton-like 3D clouds

  • The constraints are

    • Viewing angle: because the player can freely rotate around the world, clouds could be viewed at any angle, not just from the bottom
    • Occlusion: the cloud must fit well in front of changing skyspheres and in-game objects

    This could not be fulfilled completely; see later sections for details

    • Cost: as clouds are purely cosmetic, they should not take up too much computation compared to the rest of the game
    • Variation: clouds should be unique enough when several are placed together
    • Stylistic: ultimately, the cloud should look like a cartoon cotton ball more than actual clouds
  • Naturally there are several approaches, and the main decision boils down to choosing between 2D and 3D noise textures. 3D noise is preferred because it provides the most nuance across different viewing angles.

  • This decision creates several issues, namely

    • How to generate 3D noise textures
    • How to sample from 3D noise textures
    • How to stylize the samples
  • The rest of the walkthrough will go over the development process and focus on how these issues are solved

Formulas borrow heavily from this presentation by Andrew Schneider from Guerrilla Games

§ Generating 3D Worley Noise

  • Typically for any generated rendering, nuance (i.e. non-repetitive details) comes from manually added details or random noise

  • For clouds, the "nuance" is the density distribution of the cloud itself

    • In games with open sky scenes, clouds are usually one large sheet. The density applies throughout that sheet and additional techniques are used to "carve" clouds out of the sheet. However, for Barkane, clouds are individual objects, so the "carving" sections of the referenced papers can more or less be skipped.
    • The only relevant noises are Perlin and Worley noises, for simplicity's sake
  • A Worley noise is a noise based on seed points: the value at each point of the sample space is the distance to the closest seed point in that space (image from Wikipedia)

    • The inverted version with several round "dots" in each "cell" is ideal for rendering cotton
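
To make the definition concrete, here is a tiny brute-force evaluation in C# (a hypothetical helper, not part of Barkane; the actual generator below uses a cell grid so it never has to scan every seed):

    // Brute-force Worley value: distance from `sample` to the nearest seed.
    // Illustration only; the compute shader below is the real generation path.
    using UnityEngine;

    static class WorleyExample
    {
        public static float ValueAt(Vector3 sample, Vector3[] seeds)
        {
            float min = float.MaxValue;
            foreach (var seed in seeds)
                min = Mathf.Min(min, Vector3.Distance(sample, seed));
            return min; // use (1 - min) for the inverted, "dotted" cotton look
        }
    }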

There are various examples online for creating 3D Worley noise directly from a 3D compute shader

However, for compatibility and performance concerns this walkthrough uses serialized 2D texture slices and combines them at runtime to form the 3D texture. This removes the need to run compute shaders inside the game.

Also, the noise generated below will not be inverted within the texture. This is to make sure it can be used as normal Worley noise in other contexts.

§ The 2D Compute Shader

  • The skeleton code of a 2D compute shader is as follows

    #pragma kernel CSWorley
    RWTexture2D<float4> _Result;
    [numthreads(8, 8, 1)]
    void CSWorley(uint3 id: SV_DispatchThreadID) {
      _Result[id.xy] = float4(id.x, id.y, id.z, 0);
    }
    

    which just writes the current thread position into the _Result buffer

    For more about thread indices, here is a Microsoft doc about it. The main idea is that each thread has a 3D coordinate, with each dimension going from 0 up to (but not including) the resolution
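
    For example, dispatching resolution / 8 by resolution / 8 by 1 groups of 8 x 8 x 1 threads (as done in the invocation section below) yields exactly one thread per texel of a resolution x resolution slice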

  • Some extra data must be added to the input

    int _Width; // Number of cells along one axis of the texture (all unit length)
    int _Density; // How many samples per cell
    float _z; // the current slice
    
    // The seed points, StructuredBuffer is used to transfer data from the C# program (in the CPU) to the shader (in the GPU)
    StructuredBuffer<float3> _Pts;
    
    // 3D cell offsets, reference from Jushii Tuomi https://github.com/jushii/WorleyNoise/blob/main/Assets/Shaders/Worley3D.compute
    static const int3 offsets[] =
    {
      int3(-1, -1, -1),
      int3(0, -1, -1),
      int3(1, -1, -1),
      int3(-1, -1, 0),
      int3(0, -1, 0),
      int3(1, -1, 0),
      int3(-1, -1, 1),
      int3(0, -1, 1),
      int3(1, -1, 1),
      int3(-1, 0, -1),
      int3(0, 0, -1),
      int3(1, 0, -1),
      int3(-1, 0, 0),
      int3(0, 0, 0),
      int3(1, 0, 0),
      int3(-1, 0, 1),
      int3(0, 0, 1),
      int3(1, 0, 1),
      int3(-1, 1, -1),
      int3(0, 1, -1),
      int3(1, 1, -1),
      int3(-1, 1, 0),
      int3(0, 1, 0),
      int3(1, 1, 0),
      int3(-1, 1, 1),
      int3(0, 1, 1),
      int3(1, 1, 1),
    };
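
    These 27 offsets enumerate the 3 x 3 x 3 neighborhood around a cell (including the cell itself). Since every cell contains exactly one seed point, the seed nearest to any sample is guaranteed to lie within this neighborhood.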
    
  • In the CSWorley kernel, we first figure out the cell index of the current thread based on its thread index. (Note that the z coordinate is injected instead of using the thread index)

    float fWidth = (float)_Width;
    float fDensity = (float)_Density;
    
    float3 raw = float3(id.x, id.y, _z); // XYZ position in [0, _Width * _Density) range
    float3 pos = raw / fDensity; // grid position in [0, _Width) range
    int3 cell = floor(pos); // cell index in [0, _Width) range
    
    float sqMin = 100000.0; // some very large value that is guaranteed to be updated later
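
    For concreteness, with _Width = 16 and _Density = 10 (so each slice is 160 x 160 texels), a thread at id = (37, 5) on slice _z = 12 gives raw = (37, 5, 12), pos = (3.7, 0.5, 1.2), and cell = (3, 0, 1)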
    
  • Then, for each neighboring cell, we compute the distance from the current sampling point to the seed point in that cell

    for (int i = 0; i < 27; i++) {
      int3 neighbor = cell + offsets[i]; // the neighboring cell's index
      int3 samplingCoordinate = neighbor;
      int ptIdx = samplingCoordinate.x + _Width * (samplingCoordinate.y + _Width * samplingCoordinate.z);
    
      float3 toNeighbor = neighbor - pos; // vector from the current sample to the neighboring cell's origin (not to its seed point)
      float3 displacement = _Pts[ptIdx] + toNeighbor; // vector from the current sample to the neighboring cell's seed point
    
      sqMin = min(sqMin, dot(displacement, displacement)); // squared distance
    
      // ...
    }
    
  • However, this actually leads to out-of-bounds errors, since

    • A negative offset on the 0th cell results in a negative index
    • A positive offset on the last cell goes out of bounds of the _Pts buffer
  • To fix this, we first introduce some helper functions

    bool legal(int dim) {
      return dim >= 0 && dim < _Width;
    }
    
    bool legal(int3 dims) {
      return legal(dims.x) && legal(dims.y) && legal(dims.z);
    }
    
    int idx(int3 dims) {
      return dims.x + _Width * (dims.y + _Width * (dims.z));
    }
    
  • Then we apply those helper functions inside the CSWorley loop

    // ...
    int3 neighbor = cell + offsets[i];
    int3 samplingCoordinate = neighbor;
    
    if (!legal(neighbor)) samplingCoordinate = (neighbor + _Width) % _Width;
    
    // ...
    
    • Instead of just ditching the sample, we wrap the sample with % so that negative indices sample from the back of the buffer and vice versa
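    • For example, with _Width = 16: a neighbor coordinate of -1 wraps to (-1 + 16) % 16 = 15, and 16 wraps to (16 + 16) % 16 = 0, which is also what makes the resulting noise tile seamlessly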
  • Finally, we write the result back to the _Result buffer so it transfers back to the CPU

    _Result[id.xy] = float4(sqMin, 0, 0, 0);
    
  • Note that instead of comparing the Euclidean distance we compared the squared distance (the result of the dot product). The actual distance is in the range from 0 to sqrt(3) instead of 0 to 1

    • The sqrt(3) comes from the fact that we are using cube cells and the diagonal of that is sqrt(3)

    • Instead of doing sqrt(sqMin) / sqrt(3) it is simpler to just use sqrt(sqMin / 3)

      _Result[id.xy] = float4(sqrt(sqMin / 3), 0, 0, 0);
      

§ Channel Usage

  • Note that the current shader only uses the R channel of the _Result buffer, while the other three channels are left empty

  • To utilize these other channels, 4 separate sets of seed points could be used to create 4 unique black and white textures

    // the four seed-point sets, declared like _Pts above
    StructuredBuffer<float3> _Pts1;
    StructuredBuffer<float3> _Pts2;
    StructuredBuffer<float3> _Pts3;
    StructuredBuffer<float3> _Pts4;
    
    [numthreads(8, 8, 1)]
    void CSWorley(uint3 id : SV_DispatchThreadID)
    {
      float fWidth = (float)_Width;
      float fDensity = (float)_Density;
    
      float3 raw = float3(id.x, id.y, _z);
      float3 pos = raw / fDensity;
      int3 cell = floor(pos);
    
      float4 sqMin4 = float4(10000, 10000, 10000, 10000);
      for (int i = 0; i < 27; i++) {
        int3 neighbor = cell + offsets[i];
        int3 samplingCoordinate = neighbor;
    
        if (!legal(neighbor)) samplingCoordinate = (neighbor + _Width) % _Width;
    
        int ptIdx = samplingCoordinate.x + _Width * (samplingCoordinate.y + _Width * samplingCoordinate.z);
    
        float3 toNeighbor = neighbor - pos;
    
        float3 displacement1 = _Pts1[ptIdx] + toNeighbor;
        float3 displacement2 = _Pts2[ptIdx] + toNeighbor;
        float3 displacement3 = _Pts3[ptIdx] + toNeighbor;
        float3 displacement4 = _Pts4[ptIdx] + toNeighbor;
    
        sqMin4 = float4(
          min(sqMin4.x, dot(displacement1, displacement1)),
          min(sqMin4.y, dot(displacement2, displacement2)),
          min(sqMin4.z, dot(displacement3, displacement3)),
          min(sqMin4.w, dot(displacement4, displacement4)));
      }
      _Result[id.xy] = sqrt(sqMin4 / 3.0);
    }
    

§ Shader Invocation And Texture Saving

Check out this Unity document for the relevant concepts

  • Note that the compute shader only uses random points, but it never actually generates them

  • The actual generation happens in another C# script and gets sent over to the compute shader's buffers

  • Each set of seed points is generated by

    ComputeBuffer PtsBuffer(int width)
    {
      var numCells = width * width * width;
      // sizeof(float) * 3 is used because sizeof(Vector3) is not allowed in safe C#
      var buf = new ComputeBuffer(numCells, sizeof(float) * 3, ComputeBufferType.Structured);
      var pts = new Vector3[numCells];
      for (int i = 0; i < numCells; i++)
          // relative location in the cell, in the range [0, 1]
          pts[i] = new Vector3(Random.value, Random.value, Random.value);
      buf.SetData(pts);
      return buf;
    }
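
    Note that the element stride passed above is sizeof(float) * 3 = 12 bytes, which matches the layout of the StructuredBuffer<float3> on the GPU side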
    
  • And to invoke the shader

    void Generate(int width, int density, ComputeShader shader)
    {
      var resolution = width * density;
    
      // the resulting 3D texture
      var t3D = new Texture3D(resolution, resolution, resolution, TextureFormat.RGBA32, false)
      {
        wrapMode = TextureWrapMode.Repeat
      };
    
      // setting relevant data
      var (pts1, pts2, pts3, pts4) = (PtsBuffer(width), PtsBuffer(width), PtsBuffer(width), PtsBuffer(width));
    
      shader.SetInt("_Width", width);
      shader.SetInt("_Density", density);
      shader.SetBuffer(0, "_Pts1", pts1);
      shader.SetBuffer(0, "_Pts2", pts2);
      shader.SetBuffer(0, "_Pts3", pts3);
      shader.SetBuffer(0, "_Pts4", pts4);
    
      // one slice
      var rt2D = new RenderTexture(resolution, resolution, 0)
      {
        enableRandomWrite = true,
        filterMode = FilterMode.Point
      };
      rt2D.Create();
    
      // serialization method, will elaborate later
      BakeTexture(t3D, rt2D, shader, resolution);
    
      // saving the 3D texture as an asset
      AssetDatabase.CreateAsset(t3D, $"Assets/Materials/Noise/{ShaderName}.asset");
      AssetDatabase.SaveAssets();
    
      // destroying allocated resources so Unity doesn't throw allocation errors
      #region release
      DestroyImmediate(rt2D);
      pts1.Release();
      pts2.Release();
      pts3.Release();
      pts4.Release();
      #endregion
    
    }
    
    void BakeTexture(Texture3D t3D, RenderTexture rt2D, ComputeShader shader, int resolution)
    {
      // get rendertexture to render layers to this 2D slice
      // then, 2D slices are transferred to a 1D array
      // the 1D array is used to create the 3D texture
      var temp = new Texture2D(resolution, resolution)
      {
        filterMode = FilterMode.Point
      };
    
      // serializing to a 1D array
      var area = resolution * resolution;
      var volume = area * resolution;
      var colors = new Color[volume];
    
      shader.SetTexture(0, "_Result", rt2D);
    
      //loop through slices
      for (var slice = 0; slice < resolution; slice++)
      {
        // get shader result
        // the original implementation uses Blit (Material-based), this is switched to the newer Compute Shader version of Dispatch
        shader.SetFloat("_z", slice);
        // these numbers need to match the `numthreads` tag in the compute shader,
        // so the resolution must be a multiple of 8
        shader.Dispatch(0, resolution / 8, resolution / 8, 1);
    
        // write the render texture into the 2D slice
        RenderTexture.active = rt2D;
        temp.ReadPixels(new Rect(0, 0, resolution, resolution), 0, 0);
        temp.Apply();
        RenderTexture.active = null;
    
        // 1D representation of the current slice
        var sliceColors = temp.GetPixels32();
    
        // transfer it to the 1D representation of the 3D texture
        var sliceBase = slice * area;
        for (var pixel = 0; pixel < area; pixel++)
        {
          colors[sliceBase + pixel] = sliceColors[pixel];
        }
      }
    
      // force the 3D texture to update before saving it in Generate
      t3D.SetPixels(colors);
      t3D.Apply();
    
      // similarly, temporary allocated resources are destroyed
      DestroyImmediate(temp);
    }
    
  • The result using _Width=16 and _Density=10 is as follows
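
As a side note, the baked asset can then be loaded and bound like any other texture. A minimal sketch, where both the asset path and the _Worley property name are placeholders rather than Barkane's actual names:

    // Hypothetical editor-side usage: load the baked 3D noise and hand it to a material.
    // Substitute the real asset path and shader property name for your project.
    var noise = AssetDatabase.LoadAssetAtPath<Texture3D>("Assets/Materials/Noise/Worley.asset");
    cloudMaterial.SetTexture("_Worley", noise); // cloudMaterial: an existing Material reference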

§ Ray-Marching

The original code is based on Unity and more specifically the Universal Render Pipeline. The following technical details are omitted, but feel free to look up the Unity forums for relevant information on how to

  • Incorporate shader code into URP + Shader Graph
  • Get light position from within the shader / Passing light position to the shader

The code in Barkane had to deal with duplicate parameters: for example, function A needs a1 and a2 but also calls B, which needs b1 and b2, so the code ends up with void A(a1, a2, b1, b2) and void B(b1, b2), which is redundant. For brevity, such functions are written here as void A(a1, a2) and void B(b1, b2).

  • This section will go over the ray march method itself, the rest of the cloud shader will be explained in the sections below
    • This section references:
      • Shaders from this project by Sebastian Lague
      • Code from the appendix of this paper by Fredrik Häggström, starting from B.6
      • Formulas from this presentation by Andrew Schneider from Guerrilla Games

§ Theory

  • Before any stylization, rendering a cloud is equivalent to computing the amount of light travelling from a point on the cloud to a pixel on the screen

    • Let P(x) denote the probability of x happening
  • This is an approximation of scattering, or light bouncing between particles within the cloud

    • In a general sense, light is less likely to travel along a ray the deeper the ray goes into the cloud, since part of it gets bounced into other directions
    • The more particles there are along the ray, the more objects there are to bounce the light away
      • To count the number of particles, ray-marching is needed to sample the density of the cloud at different points: linear density * length = mass.
      • Whenever we need to find this probability, also called transmittance, ray-marching must be added to the shader.
    • Later sections will go into depth for computing this probability
  • In this simplified model, light travels...

    • From the sun to the view ray, and then "bounced" at the intersection to a direction following the view ray
      • This is then approximated by only sampling a few points at the end of each view ray, rather than all points on the view ray
    • From the background to the camera, partially blocked by the cloud
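
In formula form (the standard volume-rendering result this approximates, not something specific to Barkane), the transmittance along a ray is

    T = e^{-\sigma \int \rho(s)\, ds} \approx e^{-\sigma \sum_i \rho(p_i)\, \Delta s} = \prod_i e^{-\sigma \rho(p_i)\, \Delta s}

where \rho is the sampled density, \Delta s the step size, and \sigma an absorption constant. The product form is what allows the marching loop below to fold a per-step Beer term into a running transmittance value.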

§ Relevant Values

  • v, the normalized view direction that goes from the eye to the point on the cloud (in object space)
  • p, the position vector that goes from the center of the cloud (object space (0, 0, 0)) to the point on the cloud (in object space)
  • l, the light-facing direction that goes from the point on the cloud to the sun (in object space)
    • The sun is visualized as an infinite plane, so this is just taking the opposite of where the sun "points" towards
  • r, the object space length of the radius, set to 0.5 to accommodate Unity's default sphere model

Any other values introduced are parameters of the shader. You can either pass them in as parameters or insert magic numbers in the code (which is not recommended). For brevity they will be used in the code but not written as parameters.

§ Marching

  • The two marching functions are just logical translations of what's stated in the theory section

  • Along the view ray

    // returned values:
    // - cloudColor: RGB colors of the cloud
    // - transmittance: overall transmittance of the cloud
    void march(float3 p, float3 v, float3 l, out float3 cloudColor, out float transmittance) {
      // The view ray enters the cloud at p, we need to find where it exits
      // The spherical shape allows a quick solution using quadratic formula (derivation omitted)
      float I = dot(v, p);
      float end = -I;
      I = I * I - dot(p, p) + r * r;
      if (I < 0) discard; // the ray does not actually enter the cloud
      end += sqrt(I);
    
      float energy = 0; // start with no light, gradually accumulate this value
      transmittance = 1; // start with totally transparent, gradually decrease this value
    
      const float stepSize = end / steps;
      for(uint i = 0; i < steps; i++) {
        const float t = stepSize * (i + 1);
        const float3 pSample = p + t * v; // the current position of the view ray in the cloud
        const float density = densityAt(pSample); // will elaborate in later sections
        const float mass = density * stepSize * cloudAbsorption; // cloudAbsorption is a stylistic constant
        const float sunTransmittance = sunMarch(pSample, l); // transmittance from sun to view ray, will elaborate below
        
        energy += mass * transmittance * sunTransmittance; // light accumulated on this segment of the view ray
        transmittance *= beer(mass); // this segment blocks light coming from later steps, will elaborate in later sections
      }
    
      const float4 backgroundColor = panoramic(v); // sampling the skysphere for background lighting, explanation omitted 
    
      cloudColor = backgroundColor.xyz * transmittance + energy * sunColor;
    }
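
  • For reference, the derivation omitted above is the standard ray-sphere intersection: a point on the view ray is p + t v, and it lies on the sphere exactly when

      |p + t v|^2 = r^2 \iff t^2 + 2t(v \cdot p) + (p \cdot p - r^2) = 0

    using |v| = 1. The quadratic formula then gives t = -(v \cdot p) \pm \sqrt{(v \cdot p)^2 - (p \cdot p) + r^2}; the code keeps the larger root (the exit point) and discards the pixel when the discriminant I is negative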
    
  • From the light to the view ray

    float sunMarch(float3 p, float3 l) {
      // similar endpoint calculation
      float I = dot(l, p);
      float end = -I;
      I = I * I - dot(p, p) + r * r;
      if (I < 0) return 1; // completely transparent if the ray doesn't enter the cloud, this shouldn't really happen
      end += sqrt(I);
      
      const float stepSize = end / steps;
      float mass = 0;
      for(uint i = 0; i < steps; i++) {
        float t = stepSize * (i + 1);
        float3 pSample = p + l * t;
        mass += stepSize * densityAt(pSample);
      }
    
      return beer(mass * sunAbsorption); // sunAbsorption is a stylistic constant
    }
    
  • Note that step sizes are calculated in object space. This isn't actually physical (since scale would need to be taken into account in stepSize) but ensures uniformity between clouds of different sizes: smaller clouds will not appear see-through even though they should in a more physical render.

  • Here beer refers to Beer's Law: T = e^(-m) (i.e. light is bounced away by cloud mass so transmittance T decreases)

    Schneider's presentation explored something they call "beer-powder approximation" which did not really fit the style of Barkane and was skipped. Beer's Law is just a starting point!

  • The cloudAbsorption value decides the overall brightness, and the sunAbsorption decides the top-bottom brightness difference

    • Low sunAbsorption

    • High sunAbsorption

  • As a side note, the material itself is set to unlit transparent with the alpha value set to 1 - transmittance

§ Scattering

  • So far, the cloud's looks are entirely dependent on the density

  • While the leveled view might be fine, views from extreme angles look really dull

    • There is none of the silver lining that is characteristic of clouds
  • To correct this, Mie scattering needs to be added into the light accumulation process

    • When added, lining up v and l will create strong highlights. This is similar to looking at the sun through the cloud.

    • Mie scattering is approximated by the Henyey-Greenstein phase function that takes in an angle (between the view direction and the light direction) and outputs a modifier to the light intensity

      float hg(float cosa, float g) {
        float g2 = g * g;
        return (1 - g2) / (4 * 3.1415 * pow(1 + g2 - 2 * g * cosa, 1.5));
      }
      
    • And inside the marching method...

      const float cosa = dot(v, l);
      const float phaseVal = hg(cosa, g);
      energy += stepSize * density * transmittance * sunTransmittance * phaseVal;
      
  • This gives the clouds silver lining when looking from below, but what about above?

    • The solution is to add a negative HG term that gives highlight when viewed along the sun's direction

    • The two terms are combined by averaging

      float twoTermHG(float t, float g1, float g2, float cosa, float gBias) {
        return lerp(hg(cosa, g1), hg(cosa, -g2), t) + gBias;
      }
      
    • And inside the marching method...

      const float phaseVal = twoTermHG(0.5, g1, g2, cosa, gBias);
      

      Where g1 stands for the forward scattering constant, g2 stands for the backward scattering constant.

  • Forward scattering preview

  • Backward scattering preview (the highlight between the first two clouds from the left)

  • Generally, larger values of g give smaller and more energized highlights
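    • For example, evaluating hg directly with g = 0.8 gives roughly 3.58 at cosa = 1 but only about 0.014 at cosa = 0, a ratio of about 260, which is why large g concentrates the energy into a tight highlight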

Another type of scattering that affects the color of the light is called Rayleigh scattering. This is skipped in Barkane because of stylistic reasons.

§ Density

  • In previous previews, densityAt is substituted with sampling the 3D texture directly

  • It is better to add a dropoff function that smoothly transitions out of the cloud

    float densityDrop(float3 p, float dropoff) {
      float sq = dot(p, p);
      float t = sq / (r * r); // t reaches 1 at the cloud's surface
      return exp(-dropoff * t) - exp(-dropoff); // the subtraction makes sure it is zeroed at t = 1
    }
    

    Where dropoff is some value greater than 1. The larger the value the more dramatically density drops.
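
    Writing k for dropoff, the falloff is f(p) = e^{-k |p|^2 / r^2} - e^{-k}, which equals 1 - e^{-k} at the center and 0 at the surface |p| = r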

  • The entire densityAt

    float densityAt(float3 p) {
      float3 pWorld = mul(unity_ObjectToWorld, float4(p, 1)).xyz;
      pWorld = pWorld * scl;
    
      return baseDensity
        * (weights.x + weights.y
          // originally the negation is built into the worley generation shader
          // it is moved here to make the worley noise texture itself more usable elsewhere
          - weights.x * worley.Sample(worleyState, pWorld).r
          - weights.y * worley.Sample(worleyState, pWorld).g)
        * densityDrop(p, densityDropoff);
    }
    
    • Here the ObjectToWorld transform is needed to make the noise resolution uniform between differently scaled clouds
    • scl is the scale of the noise
    • worley.Sample(worleyState, pWorld) samples the 3D noise with a sampler state and a position. This is what I found that works with Shader Graph, where the following parameters are passed from the graph itself
      • UnitySamplerState worleyState
      • UnityTexture3D worley
    • I found the two channels (R and G) enough. If you want more detailed noise, feel free to add the B and W channels as well
      • The 2 channels are sampled separately. You can also write this frequency difference into the texture itself and sample both channels in one call. This requires a slightly more complicated compute shader invocation.

§ Stylization

This section goes over additional steps specifically for stylizing. Previous steps are also stylized, just not specifically written for it.

§ Cutoff

  • A hard border fits the background better, meaning a cutoff needs to be added for points above a specific transmittance value

    // added at the bottom of the march function
    if (transmittance > cutoff) {
      transmittance = 1;
    }
    
  • Example with threshold at 0.5

§ Final Renderings

  • At steps = 50

  • At steps = 25

  • At steps = 10

  • At steps = 4

  • The first three renderings do not look that different, mainly because the texture scale is set to a very small value. For higher resolutions (and more frequency multiples than just two) the rendering might look different.

  • Without rigorously testing performance, 10 steps is where the framerate drop begins to be noticeable

  • Most of the performance should be taken up by texture sampling, as per-step computations are relatively light

    • By the nature of the ray marching, the number of density samples is O(steps^2)
    • This could be helped if different channels in the noise texture were already scaled according to frequency, this way only one sample is needed.
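    • Concretely, each of the steps view samples triggers a sunMarch of another steps samples, so one pixel costs steps + steps^2 calls to densityAt; at steps = 10 that is 110 calls, each currently sampling the texture twice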

§ Conclusion

§ Answering the Challenges

  • Viewing angle
    • Cloud lighting looks believable at most viewing angles
  • Occlusion
    • Artifacts still exist between adjacent clouds: a cloud can pop in front of an adjacent one when it "wins" the Z-test after a small change in viewing angle
    • The faded edges are not entirely convincing
  • Cost
    • Constant-step for loops are unrolled statically by the compiler, meaning there is little branching
    • Step count can be low enough (4 for both stage I and II) without causing visual issues
  • Variation:
    • Noise looks random enough
  • Stylistic
    • Several techniques were used to make the cloud look cartoonish
    • It is hard to balance between contrast and liveliness (grays look too dull but are necessary for giving "mass" to the cloud)

§ Future Improvements

  • Connecting step count with Unity's Quality settings
  • Using fiber-like textures to further stylize the noise
  • Fixing occlusion issues for adjacent clouds
