feat: Import 35+ skills, merge duplicates, add openclaw installer

Major updates:
- Added 35+ new skills from awesome-opencode-skills and antigravity repos
- Merged SEO skills into seo-master
- Merged architecture skills into architecture
- Merged security skills into security-auditor and security-coder
- Merged testing skills into testing-master and testing-patterns
- Merged pentesting skills into pentesting
- Renamed website-creator to thai-frontend-dev
- Replaced skill-creator with github version
- Removed Chutes references (use MiniMax API instead)
- Added install-openclaw-skills.sh for cross-platform installation
- Updated .env.example with MiniMax API credentials
Author: Kunthawat Greethong
Date: 2026-03-26 11:37:39 +07:00
parent 48595100a1
commit 7edf5bc4d0
469 changed files with 131580 additions and 417 deletions

# SDF Ambient Occlusion — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing a complete step-by-step tutorial, mathematical derivations, variant analysis, and advanced usage.
## Prerequisites
- GLSL basic syntax (uniform, varying, function definitions)
- **Signed Distance Field (SDF)** concept: `map(p)` returns the distance from point p to the nearest surface
- **Raymarching** basic loop: marching along a ray to find surface intersections
- **Surface normal computation**: Obtaining the normal direction via SDF gradient (finite differences)
- Vector math fundamentals: dot product, normalization, vector addition/subtraction
## Core Principles in Detail
The core idea of SDF ambient occlusion: **Sample the SDF at multiple distances along the surface normal and compare the "expected distance" with the "actual distance" to estimate the degree of occlusion.**
For a point P on the surface with normal N, at distance h:
- **Expected distance** = h (in completely open surroundings, the SDF value at the sample point equals its distance h back to the surface)
- **Actual distance** = map(P + N × h) (real SDF value)
- **Occlusion contribution** = h - map(P + N × h) (the larger the difference, the more nearby geometry is occluding)
The final result is a weighted sum of occlusion contributions from multiple sample points, yielding a [0, 1] occlusion factor:
- 1.0 = no occlusion (bright)
- 0.0 = fully occluded (dark corner)
Key mathematical formula (additive accumulation form):
```
AO = 1 - k × Σ(weight_i × max(0, h_i - map(P + N × h_i)))
```
Where `weight_i` typically decays exponentially or geometrically (closer samples have higher weight), and `k` is a global intensity coefficient.
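For intuition, the additive formula can be evaluated on the CPU. A minimal Python sketch (not shader code; the toy scene — a unit sphere plus an optional occluder sphere hovering above it — is made up for illustration), using the same 5-sample schedule and 0.95 weight decay as the classic `calcAO` in Step 3:

```python
import math

def length(p):
    return math.sqrt(p[0]**2 + p[1]**2 + p[2]**2)

def map_scene(p, with_occluder):
    # Unit sphere at the origin
    d = length(p) - 1.0
    if with_occluder:
        # Hypothetical second sphere hovering above the first
        d = min(d, length((p[0], p[1] - 2.2, p[2])) - 1.0)
    return d

def calc_ao(pos, nor, with_occluder):
    occ, sca = 0.0, 1.0
    for i in range(5):
        h = 0.01 + 0.12 * i / 4.0                         # Sample distance: 0.01 ~ 0.13
        q = tuple(pos[k] + h * nor[k] for k in range(3))
        occ += (h - map_scene(q, with_occluder)) * sca    # (expected - actual) * weight
        sca *= 0.95                                       # Weight decay
    return max(0.0, min(1.0, 1.0 - 3.0 * occ))

pos, nor = (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)  # Top of the unit sphere, normal +y
ao_open = calc_ao(pos, nor, False)
ao_blocked = calc_ao(pos, nor, True)
```

On the open sphere every sample along the normal returns exactly d = h, so the occlusion sum is zero and AO is 1.0; with the occluder overhead, the outermost samples fall short of h and AO drops below 1.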
## Implementation Steps in Detail
### Step 1: Build the Base SDF Scene
**What**: Define a `map()` function that returns the signed distance value for any point in space.
**Why**: AO computation relies entirely on SDF queries, so a working distance field is needed first.
```glsl
float map(vec3 p) {
float d = p.y; // Ground plane
d = min(d, length(p - vec3(0.0, 1.0, 0.0)) - 1.0); // Sphere
d = min(d, length(vec2(length(p.xz) - 1.5, p.y - 0.5)) - 0.4); // Torus
return d;
}
```
### Step 2: Compute Surface Normal
**What**: Compute the normal direction via finite difference approximation of the SDF gradient.
**Why**: AO sampling probes outward along the normal direction; the normal determines the sampling direction.
```glsl
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
```
### Step 3: Implement Classic Normal-Direction AO (Additive Accumulation)
**What**: Sample the SDF at 5 distances along the normal direction, accumulating occlusion.
**Why**: This is a classic method — the most concise and efficient SDF-AO implementation. 5 samples strike an excellent balance between quality and performance. The weight decays geometrically by a factor of 0.95 per sample, giving closer samples more influence (near-surface occlusion is more perceptually important).
```glsl
// Classic AO
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0; // Initial weight
for (int i = 0; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0; // Sample distance: 0.01 ~ 0.13
float d = map(pos + h * nor); // Actual SDF distance
occ += (h - d) * sca; // Accumulate (expected - actual) × weight
sca *= 0.95; // Weight decay
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
```
### Step 4: Apply AO to Lighting
**What**: Multiply the AO factor into ambient and indirect light components.
**Why**: AO simulates the degree to which indirect light is occluded. Physically, it should only affect ambient/indirect light, not the direct light source's diffuse and specular (direct light occlusion is handled by shadows). However, in practice AO is often multiplied into all lighting for a stronger visual effect.
```glsl
float ao = calcAO(pos, nor);
// Method A: Affect only ambient light (physically correct)
vec3 ambient = vec3(0.2, 0.3, 0.5) * ao;
vec3 color = diffuse * shadow + ambient;
// Method B: Affect all lighting (stronger visual effect)
vec3 color = (diffuse * shadow + ambient) * ao;
// Method C: Combined with sky visibility bias
float skyVis = 0.5 + 0.5 * nor.y; // Upward-facing surfaces are brighter
vec3 color = diffuse * shadow + ambient * ao * skyVis;
```
### Step 5: Raymarching Main Loop Integration
**What**: Integrate AO into the complete raymarching pipeline.
**Why**: AO is part of the lighting computation and needs to be calculated after hitting a surface but before final output.
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// ... camera setup, ray generation ...
// Raymarching loop
float t = 0.0;
for (int i = 0; i < 128; i++) {
vec3 p = ro + rd * t;
float d = map(p);
if (d < 0.001) break;
t += d;
if (t > 100.0) break;
}
// Compute lighting on hit
vec3 col = vec3(0.0);
if (t < 100.0) {
vec3 pos = ro + rd * t;
vec3 nor = calcNormal(pos);
float ao = calcAO(pos, nor);
// Lighting
vec3 lig = normalize(vec3(1.0, 0.8, -0.6));
float dif = clamp(dot(nor, lig), 0.0, 1.0);
float sky = 0.5 + 0.5 * nor.y;
col = vec3(1.0) * dif + vec3(0.2, 0.3, 0.5) * sky * ao;
}
fragColor = vec4(col, 1.0);
}
```
## Variant Details
### Variant 1: Multiplicative AO
**Difference from base version**: Starts at 1.0 and progressively multiplies down, rather than using additive accumulation then inverting. The multiplicative form naturally guarantees the result stays in [0, 1], avoids the need for clamping, and provides more natural falloff for multiple overlapping occlusions.
**Source**: Multiplicative accumulation approach
```glsl
// Multiplicative AO
float calcAO_multiplicative(vec3 pos, vec3 nor) {
float ao = 1.0;
float dist = 0.0;
for (int i = 0; i <= 5; i++) {
dist += 0.1; // Uniform step of 0.1
float d = map(pos + nor * dist);
ao *= 1.0 - max(0.0, (dist - d) * 0.2 / dist);
}
return ao;
}
```
### Variant 2: Multi-Scale AO
**Difference from base version**: Exponentially increases sampling distances (0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4), computing short-range and long-range occlusion separately. Short-range AO reveals contact shadows and surface detail; long-range AO reveals large-scale environmental occlusion. Fully unrolled with no loops, making it GPU-efficient.
**Source**: Multi-scale sampling approach
```glsl
// Multi-scale AO
float calcAO_multiscale(vec3 pos, vec3 nor) {
// Short-range AO (contact shadows)
float aoS = 1.0;
aoS *= clamp(map(pos + nor * 0.1) * 10.0, 0.0, 1.0); // Adjustable: distance 0.1, weight 10.0
aoS *= clamp(map(pos + nor * 0.2) * 5.0, 0.0, 1.0); // Adjustable: distance 0.2, weight 5.0
aoS *= clamp(map(pos + nor * 0.4) * 2.5, 0.0, 1.0); // Adjustable: distance 0.4, weight 2.5
aoS *= clamp(map(pos + nor * 0.8) * 1.25, 0.0, 1.0); // Adjustable: distance 0.8, weight 1.25
// Long-range AO (large-scale occlusion)
float ao = aoS;
ao *= clamp(map(pos + nor * 1.6) * 0.625, 0.0, 1.0); // Adjustable: distance 1.6
ao *= clamp(map(pos + nor * 3.2) * 0.3125, 0.0, 1.0); // Adjustable: distance 3.2
ao *= clamp(map(pos + nor * 6.4) * 0.15625, 0.0, 1.0); // Adjustable: distance 6.4
return max(0.035, pow(ao, 0.3)); // pow compresses dynamic range, min prevents total black
}
```
### Variant 3: Jittered Sampling AO
**Difference from base version**: Adds hash-based jitter on top of uniform sample positions, breaking the banding artifacts caused by fixed sample spacing. Also uses a `1/(1+l)` distance-decay weight so farther samples have less influence.
**Source**: Jittered sampling approach
```glsl
// Jittered sampling AO
float hash(float n) { return fract(sin(n) * 43758.5453); }
float calcAO_jittered(vec3 pos, vec3 nor, float maxDist) {
float ao = 0.0;
const float nbIte = 6.0; // Adjustable: number of samples
for (float i = 1.0; i < nbIte + 0.5; i++) {
float l = (i + hash(i)) * 0.5 / nbIte * maxDist; // Jittered sample position
ao += (l - map(pos + nor * l)) / (1.0 + l); // Distance-decay weight
}
return clamp(1.0 - ao / nbIte, 0.0, 1.0);
}
// Usage example: calcAO_jittered(pos, nor, 4.0)
```
### Variant 4: Hemispherical Random Direction AO
**Difference from base version**: Instead of sampling only along the normal direction, generates multiple random directions within the normal hemisphere. Closer to the true physical model of ambient occlusion (light arriving from all directions in the hemisphere), but requires more samples (typically 32) for smooth results.
**Source**: Hemispherical random direction approach
```glsl
// Hemispherical random direction AO
vec2 hash2(float n) {
return fract(sin(vec2(n, n + 1.0)) * vec2(43758.5453, 22578.1459));
}
float calcAO_hemisphere(vec3 pos, vec3 nor, float seed) {
float occ = 0.0;
for (int i = 0; i < 32; i++) { // Adjustable: sample count (16~64)
float h = 0.01 + 4.0 * pow(float(i) / 31.0, 2.0); // Quadratic distribution biased toward near-field
vec2 an = hash2(seed + float(i) * 13.1) * vec2(3.14159, 6.2831); // Random spherical coordinates
vec3 dir = vec3(sin(an.x) * sin(an.y), sin(an.x) * cos(an.y), cos(an.x));
dir *= sign(dot(dir, nor)); // Flip to normal hemisphere
occ += clamp(5.0 * map(pos + h * dir) / h, -1.0, 1.0); // Signed occlusion
}
return clamp(occ / 32.0, 0.0, 1.0);
}
```
### Variant 5: Fibonacci Sphere Uniform Hemisphere AO
**Difference from base version**: Uses Fibonacci sphere points instead of random directions, achieving quasi-uniform hemisphere sampling distribution. Avoids the clustering problem of pure random sampling, yielding higher quality at the same sample count. Can also be paired with a separate directional occlusion function (e.g., SSS/soft shadow) for multi-level occlusion.
**Source**: Fibonacci sphere sampling approach
```glsl
// Fibonacci sphere sampling AO
vec3 forwardSF(float i, float n) {
const float PI = 3.141592653589793;
const float PHI = 1.618033988749895;
float phi = 2.0 * PI * fract(i / PHI);
float zi = 1.0 - (2.0 * i + 1.0) / n;
float sinTheta = sqrt(1.0 - zi * zi);
return vec3(cos(phi) * sinTheta, sin(phi) * sinTheta, zi);
}
float hash1(float n) { return fract(sin(n) * 43758.5453); }
float calcAO_fibonacci(vec3 pos, vec3 nor) {
float ao = 0.0;
for (int i = 0; i < 32; i++) { // Adjustable: sample count
vec3 ap = forwardSF(float(i), 32.0);
float h = hash1(float(i));
ap *= sign(dot(ap, nor)) * h * 0.1; // Flip to hemisphere + random scale
ao += clamp(map(pos + nor * 0.01 + ap) * 3.0, 0.0, 1.0);
}
ao /= 32.0;
return clamp(ao * 6.0, 0.0, 1.0);
}
```
## Performance Optimization Details
### Bottleneck Analysis
The performance bottleneck of SDF-AO lies almost entirely in **SDF sample count** — each `map()` call is a full scene distance computation. For complex scenes, this can be very expensive.
### Optimization Techniques
#### 1. Reduce Sample Count
Classic normal-direction AO only needs 3~5 samples for acceptable quality. Hemispherical sampling is more physically correct but requires 16~32 samples; use it when the performance budget allows.
#### 2. Early Exit Optimization
Exit the loop early when accumulated occlusion is already large enough, avoiding unnecessary SDF computations.
```glsl
if (occ > 0.35) break; // Early exit when heavily occluded
```
#### 3. Unroll Loops
For fixed sample counts (especially 4~7), manually unrolling loops avoids branch overhead and is GPU-friendly. The multi-scale AO variant fully unrolls 7 samples.
#### 4. Simplify AO for Distant Objects
Objects far from the camera can use fewer AO samples or skip AO entirely.
```glsl
float aoSteps = mix(5.0, 2.0, clamp(t / 50.0, 0.0, 1.0));
```
#### 5. Precompilation Switches
Use `#ifdef` to disable AO in debug or low-performance modes.
```glsl
#ifdef ENABLE_AMBIENT_OCCLUSION
float ao = calcAO(pos, nor);
#else
float ao = 1.0;
#endif
```
#### 6. Hand-Painted Pseudo-AO Blending
For static or semi-static scenes, pseudo-AO values (based on material ID or position) can be precomputed and blended with real-time AO to reduce runtime computation.
```glsl
float focc = /* preset occlusion based on material */;
float finalAO = calcAO(pos, nor) * focc;
```
#### 7. SDF Simplification
A simplified version of `map()` (ignoring small details) can be used for AO sampling, since AO is inherently low-frequency information.
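As a rough numeric illustration (a CPU-side Python sketch with a made-up scene: a unit sphere plus one tiny detail sphere well below the AO sampling scale), dropping the detail from the AO map changes the result only slightly:

```python
import math

def length(p):
    return math.sqrt(p[0]**2 + p[1]**2 + p[2]**2)

def map_full(p):
    d = length(p) - 1.0                      # Main sphere
    q = (p[0] - 0.1, p[1] - 1.05, p[2])
    return min(d, length(q) - 0.01)          # Tiny detail sphere (radius 0.01)

def map_simple(p):
    return length(p) - 1.0                   # Detail dropped for AO queries

def calc_ao(map_fn, pos, nor):
    occ, sca = 0.0, 1.0
    for i in range(5):
        h = 0.01 + 0.12 * i / 4.0
        q = tuple(pos[k] + h * nor[k] for k in range(3))
        occ += (h - map_fn(q)) * sca
        sca *= 0.95
    return max(0.0, min(1.0, 1.0 - 3.0 * occ))

pos, nor = (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)
ao_full = calc_ao(map_full, pos, nor)        # Full map: slightly darker
ao_fast = calc_ao(map_simple, pos, nor)      # Simplified map: cheaper, nearly identical
```

The two results differ by only a few percent here, while the simplified map skips one distance evaluation per sample — the saving scales with how much detail the full `map()` carries.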
## Combination Suggestions in Detail
### 1. AO + Soft Shadow
The most common combination. AO handles indirect light occlusion (corners, crevices); soft shadows handle direct light occlusion. Simply multiply the two:
```glsl
float sha = calcShadow(pos, lightDir, 0.02, 20.0, 8.0);
float ao = calcAO(pos, nor);
col = diffuse * sha + ambient * ao; // Each handles its own domain
// Or more simply:
col = lighting * sha * ao;
```
### 2. AO + Sky Visibility
Use the normal's y component to estimate how upward-facing the surface is, and multiply it with AO to simulate sky light occlusion:
```glsl
float skyVis = 0.5 + 0.5 * nor.y;
col += skyColor * ao * skyVis;
```
### 3. AO + Subsurface Scattering / Bounce Light
AO can modulate bounce light and SSS intensity (occluded areas also don't receive bounce light):
```glsl
float bou = clamp(-nor.y, 0.0, 1.0); // Downward-facing surfaces receive ground bounce
col += bounceColor * bou * ao;
col += sssColor * sss * (0.05 + 0.95 * ao); // SSS also modulated by AO
```
### 4. AO + Convexity / Corner Detection
The same SDF probing loop can sample both outward (+N) and inward (-N), yielding AO and convexity information respectively, useful for edge highlights or wear effects:
```glsl
vec2 aoAndCorner = getOcclusion(pos, nor); // .x = AO, .y = convexity
col *= aoAndCorner.x; // AO darkening
col = mix(col, edgeColor, aoAndCorner.y); // Convexity coloring
```
### 5. AO + Fresnel Environment Reflection
AO should also modulate the environment reflection term; otherwise concave areas will show unnatural bright environment reflections:
```glsl
float fre = pow(clamp(1.0 + dot(rd, nor), 0.0, 1.0), 5.0); // rd points toward the surface: head-on gives 0, grazing gives 1
col += envColor * fre * ao; // Reduce environment reflection in occluded areas
```

# Analytic Ray Tracing - Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisite knowledge, step-by-step tutorial, mathematical derivations, and advanced usage.
## Prerequisites
- **Vector math fundamentals**: Dot product `dot()`, cross product `cross()`, vector normalization `normalize()`
- **Quadratic equation solving**: Discriminant `b²-4ac`, meaning of the two roots
- **Ray parametric representation**: `P(t) = ro + t * rd`, where `ro` is the ray origin, `rd` is the direction, `t` is the distance
- **GLSL fundamentals**: `struct`, `inout` parameters, `vec3`/`vec4` operations
- **ShaderToy framework**: `mainImage()` function, `iResolution`, `iTime`, and other uniforms
## Use Cases (Complete List)
- When rendering scenes composed of geometric primitives (spheres, planes, boxes, cylinders, tori, etc.)
- When precise surface intersection points, normals, and distances are needed (no iterative approximation required)
- When efficient ray intersection is needed in real-time rendering (several times faster than ray marching)
- Building the underlying geometric engine for ray tracers and path tracers
- Creating visualization effects for hard-surface modeling (jewelry, mechanical parts, chess scenes, etc.)
- Scenes requiring precise shadows, reflections, and refractions (analytic solutions have no sampling error)
## Core Principles in Detail
The core idea of analytic ray tracing is: substitute the ray equation `P(t) = O + tD` into the implicit equation of the geometric body, obtaining an algebraic equation in `t`, then solve it using closed-form formulas.
### Unified Framework
All analytic intersection functions follow the same pattern:
1. **Set up equation**: Substitute the ray parametric form into the geometry's implicit equation
2. **Simplify and solve**: Use algebraic identities to reduce to a standard form (quadratic/quartic equation)
3. **Discriminant check**: Discriminant < 0 indicates no intersection
4. **Select nearest intersection**: Take the smallest positive root satisfying distance constraints
5. **Compute normal**: Evaluate the gradient of the implicit equation at the intersection point
### Key Mathematical Formulas
**Sphere** `|P-C|² = r²` → quadratic equation: `t² + 2bt + c = 0`
**Plane** `N·P + d = 0` → linear equation: `t = -(N·O + d) / (N·D)`
**Box** Intersection of three pairs of parallel planes → Slab Method: `tN = max(t1.x, t1.y, t1.z), tF = min(t2.x, t2.y, t2.z)`
**Ellipsoid** `|P/R|² = 1` → sphere intersection in scaled space
**Torus** `(|P_xy| - R)² + P_z² = r²` → quartic equation, solved via resolvent cubic
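The sphere case can be checked numerically. A Python sketch of the optimized quadratic (the `h = b² - c` form used in Step 2 below), with a made-up ray for the worked example:

```python
import math

def sphere_intersect(ro, rd, radius):
    """Near intersection of a ray with a sphere at the origin.
    rd must be normalized; returns None on a miss."""
    b = sum(o * d for o, d in zip(ro, rd))        # b = dot(ro, rd)
    c = sum(o * o for o in ro) - radius * radius  # c = dot(ro, ro) - r^2
    h = b * b - c                                 # Discriminant (4a factor omitted)
    if h < 0.0:
        return None
    return -b - math.sqrt(h)                      # Smallest root = near intersection

# Ray from (0, 0, -3) along +z toward a unit sphere: b = -3, c = 8, h = 1 -> t = 2
t = sphere_intersect((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), 1.0)
# Ray offset far to the side: discriminant < 0 -> miss
miss = sphere_intersect((0.0, 5.0, -3.0), (0.0, 0.0, 1.0), 1.0)
```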
## Implementation Steps in Detail
### Step 1: Ray Generation
**What**: Generate a ray from the camera position through each pixel.
**Why**: This is the starting point of ray tracing. Each pixel corresponds to a ray from the camera through the near plane. The standard approach is to construct a camera coordinate system (right, up, forward) and map normalized screen coordinates to world-space directions.
```glsl
// Construct camera ray
vec3 generateRay(vec2 fragCoord, vec2 resolution, vec3 ro, vec3 ta) {
vec2 p = (2.0 * fragCoord - resolution) / resolution.y;
// Build camera coordinate system
vec3 cw = normalize(ta - ro); // forward
vec3 cu = normalize(cross(cw, vec3(0, 1, 0))); // right
vec3 cv = cross(cu, cw); // up
float fov = 1.5; // Adjustable: field of view control (larger = narrower angle)
vec3 rd = normalize(p.x * cu + p.y * cv + fov * cw);
return rd;
}
```
### Step 2: Ray-Sphere Intersection
**What**: Compute the exact intersection of a ray with a sphere. This is the most fundamental and commonly used intersection function.
**Why**: Substituting the ray `P = O + tD` into the sphere equation `|P - C|² = r²` and expanding yields a quadratic equation in `t`. The discriminant `h = b² - c` determines the number of intersections (0, 1, or 2); the smallest positive root is the nearest intersection.
This is a ubiquitous technique, with two common variants:
**Code (optimized version, assumes sphere centered at origin)**:
```glsl
// Ray-sphere intersection (optimized version for sphere at origin)
// ro: ray origin (sphere center offset already subtracted)
// rd: ray direction (must be normalized)
// r: sphere radius
// Returns: intersection distance, MAX_DIST if no intersection
float iSphere(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, float r) {
float b = dot(ro, rd);
float c = dot(ro, ro) - r * r;
float h = b * b - c; // Discriminant (optimized: 4a factor omitted)
if (h < 0.0) return MAX_DIST; // No intersection
h = sqrt(h);
float d1 = -b - h; // Near intersection
float d2 = -b + h; // Far intersection
// Select the nearest intersection within valid range
if (d1 >= distBound.x && d1 <= distBound.y) {
normal = normalize(ro + rd * d1);
return d1;
} else if (d2 >= distBound.x && d2 <= distBound.y) {
normal = normalize(ro + rd * d2);
return d2;
}
return MAX_DIST;
}
```
**Code (general version, arbitrary sphere center)**:
```glsl
// Ray-sphere intersection (general version, supports arbitrary sphere center)
// sph: vec4(center.xyz, radius)
float sphIntersect(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
if (h < 0.0) return -1.0;
return -b - sqrt(h); // Returns only the near intersection
}
```
### Step 3: Ray-Plane Intersection
**What**: Compute the intersection of a ray with an infinite plane.
**Why**: The plane equation `N·P + d = 0` substituted with the ray yields a linear equation, solved directly by division. This is the simplest intersection primitive, commonly used for floors, walls, Cornell Boxes, etc. Note: when `N·D ≈ 0`, the ray is parallel to the plane.
```glsl
// Ray-plane intersection
// planeNormal: plane normal (must be normalized)
// planeDist: distance from plane to origin (N·P + planeDist = 0)
float iPlane(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal,
vec3 planeNormal, float planeDist) {
float denom = dot(rd, planeNormal);
// Only intersects when ray hits the front face of the plane
if (denom > 0.0) return MAX_DIST;
float d = -(dot(ro, planeNormal) + planeDist) / denom;
if (d < distBound.x || d > distBound.y) return MAX_DIST;
normal = planeNormal;
return d;
}
// Quick version: horizontal ground plane (y-axis aligned)
// Note: caller must reject rd.y ≈ 0 (ray parallel to plane) and negative results
float iGroundPlane(vec3 ro, vec3 rd, float height) {
return -(ro.y - height) / rd.y;
}
```
### Step 4: Ray-Box Intersection (Slab Method)
**What**: Compute the intersection of a ray with an axis-aligned bounding box (AABB).
**Why**: The Slab Method treats the box as the intersection of three pairs of parallel planes. It computes the ray's intersection with each pair of planes `(tmin, tmax)`, then takes the maximum of all `tmin` values and the minimum of all `tmax` values. If `tN > tF` or `tF < 0`, there is no intersection. The normal is determined by which face was hit first.
```glsl
// Ray-box intersection (Slab Method, optimized version)
// boxSize: box half-size vec3(halfW, halfH, halfD)
float iBox(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, vec3 boxSize) {
vec3 m = sign(rd) / max(abs(rd), 1e-8); // Avoid division by zero
vec3 n = m * ro;
vec3 k = abs(m) * boxSize;
vec3 t1 = -n - k; // Near plane intersections
vec3 t2 = -n + k; // Far plane intersections
float tN = max(max(t1.x, t1.y), t1.z); // Entry distance into the box
float tF = min(min(t2.x, t2.y), t2.z); // Exit distance from the box
if (tN > tF || tF <= 0.0) return MAX_DIST; // No intersection
if (tN >= distBound.x && tN <= distBound.y) {
// Normal: determine which face was hit
normal = -sign(rd) * step(t1.yzx, t1.xyz) * step(t1.zxy, t1.xyz);
return tN;
} else if (tF >= distBound.x && tF <= distBound.y) {
normal = -sign(rd) * step(t2.xyz, t2.yzx) * step(t2.xyz, t2.zxy); // Exit face: use t2, not t1 (ray origin is inside the box)
return tF;
}
return MAX_DIST;
}
```
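The slab arithmetic is easy to verify on the CPU. A Python sketch (using `copysign` for the reciprocal so axis-parallel rays survive the 1e-8 guard), with a made-up ray against a box of half-size 1:

```python
import math

def box_intersect(ro, rd, box_size):
    """Slab method for an axis-aligned box centered at the origin.
    Returns (tN, tF) entry/exit distances, or None on a miss."""
    t1, t2 = [], []
    for o, d, s in zip(ro, rd, box_size):
        m = math.copysign(1.0, d) / max(abs(d), 1e-8)  # 1/rd with zero guard
        n, k = m * o, abs(m) * s
        t1.append(-n - k)                              # Near slab plane
        t2.append(-n + k)                              # Far slab plane
    tN, tF = max(t1), min(t2)                          # Entry = max of nears, exit = min of fars
    if tN > tF or tF <= 0.0:
        return None
    return tN, tF

# Ray from x = -5 along +x into a 2x2x2 box: enters the x = -1 face at t = 4, exits at t = 6
hit = box_intersect((-5.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 1.0))
# Same ray shifted above the box: the y slab interval kills the overlap -> miss
miss = box_intersect((-5.0, 3.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 1.0))
```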
### Step 5: Ray-Ellipsoid Intersection
**What**: Compute the intersection of a ray with an ellipsoid.
**Why**: An ellipsoid can be viewed as a sphere scaled differently along each axis. By dividing both the ray origin and direction by the ellipsoid radii `R`, a sphere intersection is performed in scaled space, then the normal is transformed back to the original space. This "space transformation" technique is one of the core ideas of analytic intersection.
```glsl
// Ray-ellipsoid intersection
// rad: vec3(rx, ry, rz) three-axis radii
float iEllipsoid(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, vec3 rad) {
// Transform to unit sphere space
vec3 ocn = ro / rad;
vec3 rdn = rd / rad;
float a = dot(rdn, rdn);
float b = dot(ocn, rdn);
float c = dot(ocn, ocn);
float h = b * b - a * (c - 1.0);
if (h < 0.0) return MAX_DIST;
float d = (-b - sqrt(h)) / a;
if (d < distBound.x || d > distBound.y) return MAX_DIST;
// Normal in original space: gradient of implicit equation |P/R|²=1 → P/(R²)
normal = normalize((ro + d * rd) / (rad * rad)); // Gradient is P/R², not P/R
return d;
}
```
### Step 6: Ray-Cylinder Intersection
**What**: Compute the intersection of a ray with a finite cylinder (with end caps).
**Why**: Cylinder intersection has two parts: (1) project the problem onto a plane perpendicular to the axis, solving a quadratic equation for side surface intersections; (2) check if the intersection is within the finite length, and if not, test the end cap planes.
```glsl
// Ray-capped cylinder intersection
// pa, pb: two endpoints of the cylinder axis
// ra: cylinder radius
float iCylinder(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal,
vec3 pa, vec3 pb, float ra) {
vec3 ca = pb - pa; // Cylinder axis vector
vec3 oc = ro - pa;
float caca = dot(ca, ca);
float card = dot(ca, rd);
float caoc = dot(ca, oc);
// Project onto plane perpendicular to axis, build quadratic equation
float a = caca - card * card;
float b = caca * dot(oc, rd) - caoc * card;
float c = caca * dot(oc, oc) - caoc * caoc - ra * ra * caca;
float h = b * b - a * c;
if (h < 0.0) return MAX_DIST;
h = sqrt(h);
float d = (-b - h) / a;
// Check if side intersection is within finite length
float y = caoc + d * card;
if (y > 0.0 && y < caca && d >= distBound.x && d <= distBound.y) {
normal = (oc + d * rd - ca * y / caca) / ra;
return d;
}
// Test end caps
d = ((y < 0.0 ? 0.0 : caca) - caoc) / card;
if (abs(b + a * d) < h && d >= distBound.x && d <= distBound.y) {
normal = normalize(ca * sign(y) / caca);
return d;
}
return MAX_DIST;
}
```
### Step 7: Scene Intersection & Shading
**What**: Traverse all objects in the scene, find the nearest intersection, and compute lighting.
**Why**: Scene traversal in analytic ray tracing is linear — each ray tests all objects sequentially. Through the unified intersection API (`distBound` parameter), each time a nearer intersection is found, the search range is automatically shortened, achieving implicit culling.
```glsl
#define MAX_DIST 1e10
// Unified scene intersection function
// Returns vec3(near bound, nearest hit distance, material ID)
vec3 worldHit(vec3 ro, vec3 rd, vec2 dist, out vec3 normal) {
vec3 d = vec3(dist, 0.0); // (distBound.x, distBound.y, matID)
vec3 tmpNormal;
// Ground plane
float t = iPlane(ro, rd, d.xy, normal, vec3(0, 1, 0), 0.0);
if (t < d.y) { d.y = t; d.z = 1.0; }
// Sphere
t = iSphere(ro - vec3(0, 0.5, 0), rd, d.xy, tmpNormal, 0.5);
if (t < d.y) { d.y = t; d.z = 2.0; normal = tmpNormal; }
// Box
t = iBox(ro - vec3(2, 0.5, 0), rd, d.xy, tmpNormal, vec3(0.5));
if (t < d.y) { d.y = t; d.z = 3.0; normal = tmpNormal; }
return d;
}
// Basic shading (Lambertian + shadow)
vec3 shade(vec3 pos, vec3 normal, vec3 rd, vec3 albedo) {
vec3 lightDir = normalize(vec3(-1.0, 0.75, 1.0));
// Diffuse
float diff = max(dot(normal, lightDir), 0.0);
// Ambient
float amb = 0.5 + 0.5 * normal.y;
return albedo * (amb * 0.2 + diff * 0.8);
}
```
### Step 8: Reflection & Refraction
**What**: Implement iterative reflection/refraction for non-recursive ray bounces.
**Why**: GLSL does not support recursion, so loops are used to simulate multiple bounces. At each bounce, the intersection point plus offset (epsilon) serves as the new ray origin, with the reflected/refracted direction as the new direction. The Fresnel term determines the energy distribution between reflection and refraction.
```glsl
#define MAX_BOUNCES 4 // Adjustable: number of reflection bounces (more = more realistic but slower)
#define EPSILON 0.001 // Adjustable: self-intersection offset
// Schlick Fresnel approximation
float schlickFresnel(float cosTheta, float F0) {
return F0 + (1.0 - F0) * pow(1.0 - cosTheta, 5.0);
}
vec3 radiance(vec3 ro, vec3 rd) {
vec3 color = vec3(0.0);
vec3 mask = vec3(1.0);
vec3 normal;
for (int i = 0; i < MAX_BOUNCES; i++) {
vec3 res = worldHit(ro, rd, vec2(EPSILON, MAX_DIST), normal);
if (res.z < 0.5) {
// No object hit → sky color
color += mask * vec3(0.6, 0.8, 1.0);
break;
}
vec3 hitPos = ro + rd * res.y;
vec3 albedo = getAlbedo(res.z); // User-defined material color lookup by ID
// Fresnel reflection coefficient
float F = schlickFresnel(max(0.0, dot(normal, -rd)), 0.04);
// Add diffuse contribution
color += mask * (1.0 - F) * shade(hitPos, normal, rd, albedo);
// Update mask and ray (reflection)
mask *= F * albedo;
rd = reflect(rd, normal);
ro = hitPos + EPSILON * rd;
}
return color;
}
```
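The Schlick term itself is easy to sanity-check: at normal incidence it returns the base reflectivity F0, and it climbs to 1 at grazing angles. A standalone Python sketch of the same formula:

```python
def schlick_fresnel(cos_theta, f0):
    # Schlick's approximation: F0 + (1 - F0) * (1 - cos)^5
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

head_on = schlick_fresnel(1.0, 0.04)   # Looking straight at the surface -> F0
grazing = schlick_fresnel(0.0, 0.04)   # Edge-on view -> total reflection
oblique = schlick_fresnel(0.5, 0.04)   # In between, monotonically increasing
```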
## Complete Code Template
For a complete runnable ShaderToy template, see the "Complete Code Template" section in [SKILL.md](SKILL.md), which includes sphere, plane, and box primitives with support for reflections and Blinn-Phong shading.
The following table describes the adjustable parameters in the template:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `MAX_DIST` | `1e10` | Maximum trace distance |
| `EPSILON` | `0.001` | Self-intersection offset |
| `MAX_BOUNCES` | `4` | Maximum number of reflections |
| `NUM_SPHERES` | `3` | Number of spheres |
| `FOV` | `1.5` | Field of view (larger = narrower angle) |
| `GAMMA` | `2.2` | Gamma correction value |
| `SHADOW_ENABLED` | `true` | Whether shadows are enabled |
## Variant Details
### Variant 1: Path Tracing
Difference from base version: Replaces deterministic reflection with random hemisphere sampling to achieve global illumination. Requires multi-frame accumulation and random number generation.
Key code:
```glsl
// Cosine-weighted random hemisphere direction
vec3 cosWeightedRandomHemisphereDirection(vec3 n, inout float seed) {
vec2 r = hash2(seed);
vec3 uu = normalize(cross(n, abs(n.y) > 0.5 ? vec3(1,0,0) : vec3(0,1,0)));
vec3 vv = cross(uu, n);
float ra = sqrt(r.y);
float rx = ra * cos(6.2831 * r.x);
float ry = ra * sin(6.2831 * r.x);
float rz = sqrt(1.0 - r.y);
return normalize(rx * uu + ry * vv + rz * n);
}
// Replace reflect in the bounce loop:
rd = cosWeightedRandomHemisphereDirection(normal, seed);
ro = hitPos + EPSILON * rd;
mask *= mat.albedo; // No Fresnel weighting
```
### Variant 2: Analytical Soft Shadow
Difference from base version: Uses the analytical distance from a sphere to the ray to compute soft shadow gradients, without additional sampling.
Key code:
```glsl
// Sphere soft shadow
float sphSoftShadow(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
// d: closest distance from ray to sphere surface, t: distance along ray
float d = sqrt(max(0.0, sph.w * sph.w - h)) - sph.w;
float t = -b - sqrt(max(h, 0.0));
return (t > 0.0) ? max(d, 0.0) / t : 1.0;
}
```
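A quick CPU check of the behavior (a Python sketch with a made-up blocker sphere): a shadow ray straight through the sphere returns 0, and a ray grazing past it returns a small penumbra value:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    l = math.sqrt(dot(v, v))
    return tuple(x / l for x in v)

def sph_soft_shadow(ro, rd, center, radius):
    oc = tuple(o - c for o, c in zip(ro, center))
    b = dot(oc, rd)
    c = dot(oc, oc) - radius * radius
    h = b * b - c
    d = math.sqrt(max(0.0, radius * radius - h)) - radius  # Gap between ray and surface
    t = -b - math.sqrt(max(h, 0.0))                        # Distance along the ray
    return max(d, 0.0) / t if t > 0.0 else 1.0

center, radius = (0.0, 5.0, 0.0), 1.0
blocked = sph_soft_shadow((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), center, radius)
grazing = sph_soft_shadow((0.0, 0.0, 0.0), normalize((0.3, 1.0, 0.0)), center, radius)
```

Note that for rays passing far from the sphere the ratio can exceed 1, so in practice the result is scaled by a penumbra sharpness factor and clamped before use.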
### Variant 3: Analytical Antialiasing
Difference from base version: Uses the analytical distance from a sphere to the ray to compute pixel coverage, achieving edge smoothing without multi-sampling.
Key code:
```glsl
// Sphere distance information (for antialiasing)
vec2 sphDistances(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
float d = sqrt(max(0.0, sph.w * sph.w - h)) - sph.w; // Closest distance
return vec2(d, -b - sqrt(max(h, 0.0))); // (distance, depth)
}
// In rendering, use coverage instead of hard boundary:
float px = 2.0 / iResolution.y; // Pixel size
vec2 dt = sphDistances(ro, rd, sph);
float coverage = 1.0 - clamp(dt.x / (dt.y * px), 0.0, 1.0);
col = mix(bgColor, sphereColor, coverage);
```
### Variant 4: Refraction (with Snell's Law)
Difference from base version: Adds refracted rays; requires detecting whether the ray hits the surface from outside or inside, and flipping the normal accordingly.
Key code:
```glsl
float refrIndex = 1.5; // Adjustable: index of refraction (glass≈1.5, water≈1.33)
// Add refraction branch in the bounce loop:
bool inside = dot(rd, normal) > 0.0;
vec3 n = inside ? -normal : normal;
float eta = inside ? refrIndex : 1.0 / refrIndex;
vec3 refracted = refract(rd, n, eta);
// Fresnel determines reflection/refraction ratio
float cosI = abs(dot(rd, n));
float F = schlickFresnel(cosI, pow((1.0 - eta) / (1.0 + eta), 2.0)); // schlickFresnel from Step 8
if (refracted != vec3(0.0) && hash1(seed) > F) {
rd = refracted;
} else {
rd = reflect(rd, n);
}
ro = hitPos + rd * EPSILON;
```
### Variant 5: Higher-Order Algebraic Surfaces (Quartic Surfaces - Sphere4, Goursat, Torus)
Difference from base version: Substitutes the ray into quartic equations, solving via the resolvent cubic method. Suitable for tori, super-ellipsoids, and similar shapes.
Key code:
```glsl
// Ray-Sphere4 intersection (|x|⁴+|y|⁴+|z|⁴ = r⁴)
float iSphere4(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, float ra) {
float r2 = ra * ra;
vec3 d2 = rd*rd, d3 = d2*rd;
vec3 o2 = ro*ro, o3 = o2*ro;
float ka = 1.0 / dot(d2, d2);
float k0 = ka * dot(ro, d3);
float k1 = ka * dot(o2, d2);
float k2 = ka * dot(o3, rd);
float k3 = ka * (dot(o2, o2) - r2 * r2);
// Reduce to depressed quartic, solve via resolvent cubic
float c0 = k1 - k0 * k0;
float c1 = k2 + 2.0 * k0 * (k0 * k0 - 1.5 * k1);
float c2 = k3 - 3.0 * k0 * (k0 * (k0 * k0 - 2.0 * k1) + 4.0/3.0 * k2);
float p = c0 * c0 * 3.0 + c2;
float q = c0 * c0 * c0 - c0 * c2 + c1 * c1;
float h = q * q - p * p * p * (1.0/27.0);
if (h < 0.0) return MAX_DIST; // Convex body: only need to handle 2 real roots case
h = sqrt(h);
float s = sign(q+h) * pow(abs(q+h), 1.0/3.0);
float t = sign(q-h) * pow(abs(q-h), 1.0/3.0);
vec2 v = vec2((s+t) + c0*4.0, (s-t) * sqrt(3.0)) * 0.5;
float r = length(v);
float d = -abs(v.y) / sqrt(r + v.x) - c1/r - k0;
if (d >= distBound.x && d <= distBound.y) {
vec3 pos = ro + rd * d;
normal = normalize(pos * pos * pos); // Gradient: 4x³
return d;
}
return MAX_DIST;
}
```
## Performance Optimization Details
### 1. Distance Bound Pruning
The most important optimization. Each time a nearer intersection is found, `distBound.y` is shortened, and subsequent objects are automatically skipped:
```glsl
// distBound.y continuously shrinks with opU
d = opU(d, iSphere(..., d.xy, ...), matId);
d = opU(d, iBox(..., d.xy, ...), matId); // Automatically skips objects farther than current hit
```
### 2. Bounding Sphere / Bounding Box Pre-Test
For complex geometry (tori, Goursat surfaces, etc.), test a simple bounding sphere first to check for possible intersection:
```glsl
// Test bounding sphere before torus intersection
if (iSphere(ro, rd, distBound, tmpNormal, torus.x + torus.y) > distBound.y) {
return MAX_DIST; // Bounding sphere missed, skip expensive quartic equation
}
```
### 3. Shadow Ray Early Exit
Shadow detection only needs to know "whether there is an occluder," not the nearest intersection, so a simplified intersection function can be used:
```glsl
// Fast sphere occlusion test (only checks for intersection, no normal computation)
float fastSphIntersect(vec3 ro, vec3 rd, vec3 center, float r) {
vec3 v = ro - center;
float b = dot(v, rd);
float c = dot(v, v) - r * r;
float d = b * b - c;
if (d > 0.0) {
float t = -b - sqrt(d);
if (t > 0.0) return t;
t = -b + sqrt(d);
if (t > 0.0) return t;
}
return -1.0;
}
```
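The quadratic logic above is easy to sanity-check off-GPU. A minimal Python port (assuming a unit-length ray direction, with plain tuples standing in for `vec3`):

```python
import math

def fast_sph_intersect(ro, rd, center, r):
    """Nearest positive hit distance along a unit-length ray, or -1.0 on a miss."""
    v = [ro[i] - center[i] for i in range(3)]
    b = sum(v[i] * rd[i] for i in range(3))
    c = sum(v[i] * v[i] for i in range(3)) - r * r
    d = b * b - c
    if d > 0.0:
        s = math.sqrt(d)
        if -b - s > 0.0:
            return -b - s   # near root: ray origin outside the sphere
        if -b + s > 0.0:
            return -b + s   # far root: ray origin inside the sphere
    return -1.0
```

For a shadow ray from the origin toward +z, a unit sphere centered at (0, 0, 5) is hit at t = 4, while a sphere off to the side reports a miss.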
### 4. Grid Acceleration Structure
For large numbers of identical primitives (e.g., hundreds of spheres), use a spatial grid to accelerate ray traversal:
```glsl
// 3D DDA grid traversal (for scenes with many spheres)
vec3 pos = floor(ro / GRIDSIZE) * GRIDSIZE;
vec3 ri = 1.0 / rd;
vec3 rs = sign(rd) * GRIDSIZE;
vec3 dis = (pos - ro + 0.5 * GRIDSIZE + rs * 0.5) * ri;
for (int i = 0; i < MAX_STEPS; i++) {
// Test spheres in current cell
testSphereInGrid(pos.xz, ro, rd, ...);
// DDA step to next cell
    vec3 mm = step(dis.xyz, dis.yzx) * step(dis.xyz, dis.zxy); // mask of nearest boundary axis
dis += mm * rs * ri;
pos += mm * rs;
}
```
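The same stepping rule can be sketched on the CPU to confirm that each DDA step advances exactly one cell along one axis. A hedged Python sketch, assuming `GRIDSIZE = 1` and a direction with no zero components:

```python
import math

def dda_cells(ro, rd, n_steps):
    """List grid cells visited by a ray, advancing one cell boundary at a time."""
    pos = [float(math.floor(c)) for c in ro]
    ri = [1.0 / d for d in rd]                      # assumes rd has no zero component
    rs = [1.0 if d > 0.0 else -1.0 for d in rd]
    dis = [(pos[i] - ro[i] + 0.5 + rs[i] * 0.5) * ri[i] for i in range(3)]
    cells = [tuple(pos)]
    for _ in range(n_steps):
        axis = min(range(3), key=lambda i: dis[i])  # nearest cell boundary wins
        dis[axis] += rs[axis] * ri[axis]
        pos[axis] += rs[axis]
        cells.append(tuple(pos))
    return cells
```

Consecutive cells differ in exactly one coordinate, which is what makes the traversal exhaustive: no cell pierced by the ray is skipped.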
### 5. Avoiding Unnecessary sqrt
Return early when the discriminant is negative, avoiding `sqrt()` on negative numbers. In some scenarios, the discriminant's sign can be used for coarse pre-filtering:
```glsl
// Check if ray is heading toward sphere and not inside it
if (c > 0.0 && b > 0.0) return MAX_DIST; // Fast cull
```
## Combination Suggestions in Detail
### 1. Analytic Intersection + Raymarching SDF
Use analytic primitives for large simple geometry (ground, bounding boxes), and SDF raymarching for complex details (fractals, smooth boolean operations). Analytic intersection provides precise start/end distances, accelerating marching convergence:
```glsl
float d = iBox(ro, rd, distBound, normal, boxSize); // Analytic box
if (d < MAX_DIST) {
// Refine with SDF inside the box
float t = d;
for (int i = 0; i < 64; i++) {
float h = sdfScene(ro + t * rd);
if (h < 0.001) break;
t += h;
}
}
```
### 2. Analytic Intersection + Volumetric Effects
Use analytic intersection to obtain precise entry/exit distances, then perform volumetric sampling (clouds, fog, subsurface scattering) within that range:
```glsl
// Use analytic ellipsoid intersection to obtain volume bounds
float tEnter = (-b - sqrt(h)) / a;
float tExit = (-b + sqrt(h)) / a;
float thickness = tExit - tEnter; // Analytic thickness
// Sample volume within [tEnter, tExit]
vec3 volumeColor = vec3(0.0);
float dt = (tExit - tEnter) / float(VOLUME_STEPS);
for (int i = 0; i < VOLUME_STEPS; i++) {
vec3 p = ro + rd * (tEnter + float(i) * dt);
volumeColor += sampleVolume(p) * dt;
}
```
### 3. Analytic Intersection + PBR Material System
Analytic intersection provides precise normals and intersection positions, feeding directly into Cook-Torrance and other PBR shading models:
```glsl
// Cook-Torrance BRDF (requires precise normals)
float D = beckmannDistribution(NdotH, roughness);
float G = geometricAttenuation(NdotV, NdotL, VdotH, NdotH);
float F = fresnelSchlick(VdotH, F0);
vec3 specular = vec3(D * G * F) / (4.0 * NdotV * NdotL);
```
### 4. Analytic Intersection + Spatial Transforms
Reuse the same intersection function for transformed geometry by rotating/translating/scaling the ray:
```glsl
// Rotate object: rotate the ray instead of the object
vec3 localRo = rotateY(ro - objectPos, angle);
vec3 localRd = rotateY(rd, angle);
float t = iBox(localRo, localRd, distBound, localNormal, boxSize);
// Transform normal back to world space
normal = rotateY(localNormal, -angle);
```
### 5. Analytic Intersection + Analytical AO / Soft Shadow / Antialiasing
A fully analytic rendering pipeline: intersection, shadows, occlusion, and edge smoothing all use closed-form formulas, producing zero noise:
```glsl
// Fully analytic pipeline (no random sampling, no noise)
float t = sphIntersect(ro, rd, sph); // Analytic intersection
float shadow = sphSoftShadow(hitPos, ld, sph); // Analytic soft shadow
float ao = sphOcclusion(hitPos, normal, sph); // Analytic ambient occlusion
float coverage = sphAntiAlias(ro, rd, sph, px); // Analytic antialiasing
```


@@ -0,0 +1,71 @@
# Anti-Aliasing Detailed Reference
## Prerequisites
- Understanding of screen-space derivatives (`dFdx`, `dFdy`, `fwidth`)
- Multipass buffer setup (for TAA)
- Basic signal processing concepts
## Sampling Theory (Nyquist)
The **Nyquist-Shannon theorem** states: to accurately represent a signal, sampling rate must be ≥ 2× the highest frequency present. In shader terms:
- Pixel grid = sampling rate
- Procedural detail / edge sharpness = signal frequency
- When detail frequency > pixel frequency → aliasing (moiré, crawling edges)
**Solutions**: either increase sampling rate (SSAA) or reduce signal frequency (analytical AA, filtering).
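A numeric illustration of the folding (not from the original text): sampling a 9 Hz sine at 10 Hz produces exactly the same sample values as a negated 1 Hz sine, because 9 Hz is above Nyquist (5 Hz) and folds down to |9 − 10| = 1 Hz:

```python
import math

fs = 10.0                      # sampling rate (Hz)
f_high = 9.0                   # signal frequency above Nyquist (fs/2 = 5 Hz)
f_alias = fs - f_high          # frequency it folds down to (1 Hz)

for n in range(100):
    t = n / fs
    s_high = math.sin(2.0 * math.pi * f_high * t)
    s_alias = -math.sin(2.0 * math.pi * f_alias * t)  # folded copy is sign-flipped
    assert abs(s_high - s_alias) < 1e-9  # indistinguishable at the sample points
```

At the sample points the two signals are indistinguishable, which is exactly the crawling-edge/moiré failure mode in shaders.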
## SSAA Implementation Details
### Jitter Patterns
- **Grid**: `offset = vec2(m, n) / AA - 0.5` — simple, uniform coverage
- **Rotated grid (RGSS)**: 4 samples at rotated positions — better edge coverage for near-horizontal/vertical lines
- **Halton sequence**: quasi-random low-discrepancy — best coverage for high sample counts
### Performance
AA=2 (4 samples) is the practical limit for real-time SDF scenes. AA=3 (9 samples) for offline/screenshot quality only.
## SDF Analytical AA Deep Dive
### Why `fwidth` Works
`fwidth(d) = abs(dFdx(d)) + abs(dFdy(d))` approximates how much the SDF value changes across one pixel. Using this as the smoothstep width:
- Edge transition spans exactly ~1 pixel regardless of zoom level
- No texture sampling needed — purely analytical
- Works for any SDF shape
### Signed Distance to Coverage
For a 2D SDF with value `d` at a pixel center:
```
coverage ≈ clamp(0.5 - d / fwidth(d), 0.0, 1.0)
```
This maps the signed distance to an approximate pixel coverage, equivalent to a box filter over the pixel footprint.
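A scalar sketch of this mapping, with a constant `w` standing in for `fwidth(d)` (which only exists in a fragment shader):

```python
def coverage(d, w):
    """Approximate pixel coverage from signed distance d and pixel footprint w."""
    return max(0.0, min(1.0, 0.5 - d / w))   # clamp(0.5 - d/fwidth(d), 0, 1)

w = 0.01                       # stand-in for fwidth(d): SDF change across one pixel
assert coverage(0.0, w) == 0.5          # exactly on the edge: half covered
assert coverage(5.0 * w, w) == 0.0      # well outside: empty
assert coverage(-5.0 * w, w) == 1.0     # well inside: full
```

The transition always spans about one pixel of distance, which is why the result is resolution- and zoom-independent.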
## TAA with Neighborhood Clamping
Full TAA pipeline:
1. **Jitter**: offset pixel center by Halton(2,3) sequence each frame
2. **Render**: full scene at jittered position → Buffer A
3. **Reproject**: use motion vectors to find previous frame's pixel for current position
4. **Clamp**: restrict history color to the min/max of current frame's 3×3 neighborhood (prevents ghosting)
5. **Blend**: `output = mix(current, clampedHistory, 0.9)`
### Neighborhood Clamping
```glsl
vec3 minCol = vec3(1e10), maxCol = vec3(-1e10);
for (int x = -1; x <= 1; x++)
for (int y = -1; y <= 1; y++) {
vec3 s = texelFetch(currentBuffer, ivec2(fragCoord) + ivec2(x,y), 0).rgb;
minCol = min(minCol, s);
maxCol = max(maxCol, s);
}
vec3 clampedHistory = clamp(history, minCol, maxCol);
```
## FXAA Algorithm Walkthrough
1. **Luma computation**: Convert 5 samples (center + NSEW) to luminance
2. **Edge detection**: `lumaRange = lumaMax - lumaMin` — skip if below threshold
3. **Edge orientation**: Compare horizontal vs vertical luma gradients to determine edge direction
4. **Sub-pixel blending**: Sample along the edge direction at 1/3 and 2/3 offsets
5. **Quality**: The simplified version uses 2 taps; full FXAA 3.11 uses up to 12 taps along the edge for better endpoint detection
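Steps 1–2 can be sketched in Python. Rec. 709 luma weights and the 0.125 contrast threshold are typical choices, not mandated by the walkthrough above:

```python
def luma(rgb):
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b    # Rec. 709 weights (assumption)

def needs_aa(center, n, s, e, w, threshold=0.125):
    """Step 2: skip pixels whose 5-sample luma range is below the threshold."""
    lums = [luma(p) for p in (center, n, s, e, w)]
    return max(lums) - min(lums) > threshold

white, black = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
assert needs_aa(white, white, black, white, white)      # contrast edge: filter it
assert not needs_aa(white, white, white, white, white)  # flat region: early out
```

The early-out on flat regions is what makes FXAA cheap: most pixels never reach the edge-orientation and blending stages.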


@@ -0,0 +1,571 @@
# Atmospheric & Subsurface Scattering — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, mathematical derivations, variant details, and complete combination code examples.
## Prerequisites
Foundational concepts required before using this Skill:
- **GLSL Fundamentals**: uniforms, varyings, built-in functions
- **Vector Math**: dot product, cross product, vector normalization
- **Ray-Sphere Intersection**: given a ray origin and direction, find the intersection distances with a sphere surface
- **Physical Meaning of Exponential Functions** (Beer-Lambert Law): light attenuates exponentially through a medium, `I = I₀ × e^(-σ×d)`, where σ is the extinction coefficient and d is the distance
- **Basic Ray Marching Concepts**: advancing step by step along a ray direction, accumulating information at each sample point
## Core Principles
Atmospheric scattering simulates the process of photons passing through the atmosphere and colliding with gas molecules/aerosol particles, changing direction. There are three core physical mechanisms:
### 1. Rayleigh Scattering (Molecular Scattering)
Caused by particles much smaller than the wavelength of light (nitrogen, oxygen molecules). **Short wavelengths (blue light) scatter much more strongly than long wavelengths (red light)** — this is why the sky is blue and sunsets are red.
The scattering coefficient is inversely proportional to the fourth power of wavelength:
```
β_R(λ) ∝ 1/λ⁴
```
Typical sea-level values for Earth: `β_R = vec3(5.5e-6, 13.0e-6, 22.4e-6)` (RGB channels, in m⁻¹)
**Rayleigh Phase Function** (describes the angular distribution of light scattering, symmetric front-to-back):
```
P_R(θ) = 3/(16π) × (1 + cos²θ)
```
### 2. Mie Scattering (Aerosol Scattering)
Caused by particles roughly the same size as the wavelength of light (water droplets, dust). **Wavelength-independent (all colors scatter equally)**, but with strong forward scattering characteristics, forming the halo around the sun.
Typical sea-level values for Earth: `β_M = vec3(21e-6)` (same for all channels)
**Henyey-Greenstein Phase Function** (describes the strong forward scattering of Mie scattering):
```
P_HG(θ, g) = (1 - g²) / (4π × (1 + g² - 2g·cosθ)^(3/2))
```
Where `g ∈ (-1, 1)` controls forward scattering strength; typical Earth atmosphere value `g ≈ 0.76 ~ 0.88`.
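Both phase functions above are normalized probability densities over the sphere. By azimuthal symmetry the solid-angle integral reduces to 2π ∫₋₁¹ P(u) du with u = cos θ, which a midpoint-rule check in Python confirms equals 1:

```python
import math

PI = math.pi

def phase_rayleigh(u):
    return 3.0 / (16.0 * PI) * (1.0 + u * u)

def phase_hg(u, g):
    return (1.0 - g * g) / (4.0 * PI * (1.0 + g * g - 2.0 * g * u) ** 1.5)

def integrate_over_sphere(phase, n=20000):
    # 2*pi * midpoint-rule integral over u = cos(theta) in [-1, 1]
    du = 2.0 / n
    total = sum(phase(-1.0 + (i + 0.5) * du) for i in range(n)) * du
    return 2.0 * PI * total

assert abs(integrate_over_sphere(phase_rayleigh) - 1.0) < 1e-3
assert abs(integrate_over_sphere(lambda u: phase_hg(u, 0.76)) - 1.0) < 1e-3
```

Normalization matters in practice: an unnormalized phase function silently brightens or darkens the whole sky.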
### 3. Beer-Lambert Attenuation
Exponential attenuation of light through a medium:
```
T(A→B) = exp(-∫ σ_e(s) ds) // Transmittance from A to B
```
Where `σ_e` is the extinction coefficient (extinction = scattering + absorption).
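Because optical depths add in the exponent, transmittance composes multiplicatively across path segments: T(A→C) = T(A→B) × T(B→C). This is what lets the marching loops accumulate optical depth instead of multiplying transmittances. A quick Python check for a homogeneous medium (σ chosen arbitrarily):

```python
import math

sigma_e = 0.7                  # extinction coefficient (arbitrary test value)

def transmittance(dist):
    return math.exp(-sigma_e * dist)   # Beer-Lambert, homogeneous medium

d1, d2 = 1.3, 2.4
# Splitting the path at any point leaves the total transmittance unchanged
assert abs(transmittance(d1) * transmittance(d2) - transmittance(d1 + d2)) < 1e-12
```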
### Overall Algorithm Flow
March along the view direction (ray march), at each sample point:
1. Compute the atmospheric density at that point (decreases exponentially with altitude)
2. Perform a second march toward the light source to compute the optical depth from the sun to that point
3. Use Beer-Lambert to calculate the sun light intensity reaching that point
4. Use the phase function to compute the amount of light scattered toward the camera
5. Accumulate contributions from all sample points
## Implementation Steps
### Step 1: Ray-Sphere Intersection
**What**: Compute the intersection points of the view ray with the atmospheric shell to determine the ray march start/end range.
**Why**: The atmosphere is a spherical shell around the planet; we only integrate within the shell.
```glsl
// Ray-sphere intersection, returns distances to two intersection points (t_near, t_far)
// p: ray origin (relative to sphere center), dir: ray direction, r: sphere radius
vec2 raySphereIntersect(vec3 p, vec3 dir, float r) {
float b = dot(p, dir);
float c = dot(p, p) - r * r;
float d = b * b - c;
if (d < 0.0) return vec2(1e5, -1e5); // No intersection
d = sqrt(d);
return vec2(-b - d, -b + d);
}
```
Derivation: sphere equation `|p + t·dir|² = r²` expands to `t² + 2t·dot(p,dir) + dot(p,p) - r² = 0`. Since `dir` is normalized, `a=1` can be omitted, and the two t values are solved directly with the quadratic formula.
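The derivation can be verified numerically: both returned t values must place the point back on the sphere, i.e. |p + t·dir| = r. A direct Python port (origin already expressed relative to the sphere center):

```python
import math

def ray_sphere_intersect(p, d, r):
    """p: origin relative to sphere center, d: unit direction, r: radius."""
    b = sum(p[i] * d[i] for i in range(3))
    c = sum(p[i] * p[i] for i in range(3)) - r * r
    disc = b * b - c
    if disc < 0.0:
        return (1e5, -1e5)     # miss: t_near > t_far signals no intersection
    s = math.sqrt(disc)
    return (-b - s, -b + s)

p, d, r = (0.0, 0.0, -10.0), (0.0, 0.0, 1.0), 3.0
t_near, t_far = ray_sphere_intersect(p, d, r)
for t in (t_near, t_far):
    hit = [p[i] + t * d[i] for i in range(3)]
    assert abs(math.sqrt(sum(x * x for x in hit)) - r) < 1e-9   # on the sphere
```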
### Step 2: Define Atmospheric Physical Constants
**What**: Set the scale parameters and scattering coefficients for the planet and atmosphere.
**Why**: These physical constants determine the sky's color characteristics. The different RGB values in Rayleigh produce the blue sky (blue channel has the largest scattering coefficient); Mie's uniform values produce white halos (all wavelengths scatter equally).
```glsl
#define PLANET_RADIUS 6371e3 // Earth radius (m)
#define ATMOS_RADIUS 6471e3 // Atmosphere outer radius (m), about 100km above Earth's radius
#define PLANET_CENTER vec3(0.0) // Planet center position
// Scattering coefficients (m⁻¹), sea-level values
#define BETA_RAY vec3(5.5e-6, 13.0e-6, 22.4e-6) // Tunable: Rayleigh scattering, changes sky base color
#define BETA_MIE vec3(21e-6) // Tunable: Mie scattering, changes halo intensity
#define BETA_OZONE vec3(2.04e-5, 4.97e-5, 1.95e-6) // Tunable: ozone absorption, affects zenith deep blue
// Mie phase function anisotropy parameter
#define MIE_G 0.76 // Tunable: 0.76~0.88, larger = more concentrated sun halo
// Scale heights (m): altitude at which density drops to 1/e
#define H_RAY 8000.0 // Tunable: Rayleigh scale height, larger = thicker atmosphere
#define H_MIE 1200.0 // Tunable: Mie scale height, larger = higher haze layer
// Ozone parameters (optional)
#define H_OZONE 30e3 // Ozone peak altitude
#define OZONE_FALLOFF 4e3 // Ozone falloff width
// Sample step counts
#define PRIMARY_STEPS 32 // Tunable: primary ray steps, more = higher quality
#define LIGHT_STEPS 8 // Tunable: light direction steps
```
Parameter tuning guide:
- Increase overall `BETA_RAY` → more vivid sky color
- Modify `BETA_RAY` RGB ratios → change sky base hue (e.g., increasing the red component produces a more purple sky)
- Increase `BETA_MIE` → brighter halo around the sun, more haze
- Increase `MIE_G` → halo more concentrated toward the sun direction (narrower disk)
- Increase `H_RAY` → effective atmosphere thickness increases, sky color more uniform
- Increase `H_MIE` → haze layer higher, low-altitude fog effect weakened
### Step 3: Implement Phase Functions
**What**: Compute the probability distribution of light being scattered at different angles.
**Why**: The Rayleigh phase is symmetrically distributed (scatters both forward and backward); the Mie phase is strongly biased forward. This determines the brightness distribution across the sky — brighter facing the sun (Mie dominant), with some brightness away from the sun (Rayleigh dominant).
```glsl
// Rayleigh phase function: symmetric front-to-back
float phaseRayleigh(float cosTheta) {
return 3.0 / (16.0 * 3.14159265) * (1.0 + cosTheta * cosTheta);
}
// Henyey-Greenstein phase function: forward scattering
// g: anisotropy parameter, 0 = isotropic, close to 1 = strong forward scattering
float phaseMie(float cosTheta, float g) {
float gg = g * g;
float num = (1.0 - gg) * (1.0 + cosTheta * cosTheta);
float denom = (2.0 + gg) * pow(1.0 + gg - 2.0 * g * cosTheta, 1.5);
return 3.0 / (8.0 * 3.14159265) * num / denom;
}
```
Note: the Mie phase function here uses the Cornette-Shanks improved version (with an additional `(1 + cos²θ)` term in the numerator and `(2 + g²)` normalization correction in the denominator), which is more physically accurate than the original HG.
### Step 4: Atmospheric Density Sampling
**What**: Compute the atmospheric particle density at a given point based on altitude.
**Why**: Atmospheric density decreases exponentially with altitude, and different components (Rayleigh, Mie, ozone) have different decay rates. Rayleigh particles (gas molecules) have a scale height of about 8km, Mie particles (aerosols) are concentrated in the lower layer with a scale height of about 1.2km, and ozone peaks at approximately 30km altitude.
```glsl
// Returns vec3(rayleigh_density, mie_density, ozone_density)
vec3 atmosphereDensity(vec3 pos, float planetRadius) {
float height = length(pos) - planetRadius;
float densityRay = exp(-height / H_RAY);
float densityMie = exp(-height / H_MIE);
// Ozone: peaks at ~30km altitude, approximated with Lorentzian distribution
float denom = (H_OZONE - height) / OZONE_FALLOFF;
float densityOzone = (1.0 / (denom * denom + 1.0)) * densityRay;
return vec3(densityRay, densityMie, densityOzone);
}
```
Mathematical explanation of ozone distribution: `1/(x² + 1)` is the form of a Lorentzian/Cauchy distribution, reaching its maximum value of 1 at `x=0` (i.e., `height = H_OZONE`), then symmetrically decaying on both sides. Multiplying by `densityRay` accounts for ozone also being affected by the overall atmospheric density decrease.
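A quick numeric check of the Lorentzian factor on its own (before the `densityRay` multiplier, which shifts the peak of the full product slightly below `H_OZONE`): it is 1 at the peak altitude and exactly 0.5 one falloff width away on either side:

```python
H_OZONE = 30e3        # ozone peak altitude (m)
OZONE_FALLOFF = 4e3   # falloff width (m)

def ozone_factor(height):
    x = (H_OZONE - height) / OZONE_FALLOFF
    return 1.0 / (x * x + 1.0)   # Lorentzian: 1 at x=0, 0.5 at x=±1

assert ozone_factor(H_OZONE) == 1.0
assert abs(ozone_factor(H_OZONE + OZONE_FALLOFF) - 0.5) < 1e-12
assert abs(ozone_factor(H_OZONE - OZONE_FALLOFF) - 0.5) < 1e-12
```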
### Step 5: Light Direction Optical Depth
**What**: From a sample point on the primary ray, march toward the sun to the atmosphere edge, accumulating optical depth.
**Why**: This determines how much the sunlight has been attenuated before reaching that point. At sunset, the light path passes through more atmosphere, and blue light is scattered away (because Rayleigh scattering coefficient's blue component is largest), leaving only red light — this is the physical reason sunsets are red.
```glsl
// Compute optical depth from pos along sunDir to the atmosphere edge
vec3 lightOpticalDepth(vec3 pos, vec3 sunDir) {
float atmoDist = raySphereIntersect(pos - PLANET_CENTER, sunDir, ATMOS_RADIUS).y;
float stepSize = atmoDist / float(LIGHT_STEPS);
float rayPos = stepSize * 0.5;
vec3 optDepth = vec3(0.0); // (ray, mie, ozone)
for (int i = 0; i < LIGHT_STEPS; i++) {
vec3 samplePos = pos + sunDir * rayPos;
float height = length(samplePos - PLANET_CENTER) - PLANET_RADIUS;
// If sample point is below the surface, it's occluded by the planet
if (height < 0.0) return vec3(1e10); // Fully occluded
vec3 density = atmosphereDensity(samplePos, PLANET_RADIUS);
optDepth += density * stepSize;
rayPos += stepSize;
}
return optDepth;
}
```
`stepSize * 0.5` as the starting offset is the midpoint sampling rule, which approximates the integral more accurately than endpoint sampling.
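The accuracy claim is easy to confirm on a toy integrand, exp(−x) on [0, 2], whose exact integral is 1 − e⁻². With the same step count, midpoint sampling lands far closer than left-endpoint sampling:

```python
import math

L, n = 2.0, 8
h = L / n
exact = 1.0 - math.exp(-L)

midpoint = sum(math.exp(-(i + 0.5) * h) for i in range(n)) * h  # samples at cell centers
endpoint = sum(math.exp(-i * h) for i in range(n)) * h          # samples at left edges

assert abs(midpoint - exact) < abs(endpoint - exact)  # midpoint error is O(h^2), not O(h)
```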
### Step 6: Primary Scattering Integral (Core Loop)
**What**: Ray march along the view direction, computing the in-scattering contribution at each sample point and accumulating.
**Why**: This is the core of the entire algorithm — integrating all scattered light along the view direction that reaches the eye. Each point's contribution = sunlight reaching that point × density at that point × attenuation from that point to the camera.
Mathematical expression:
```
L(camera) = ∫[tStart→tEnd] sunIntensity × T(sun→s) × σ_s(s) × P(θ) × T(s→camera) ds
```
Where T is transmittance, σ_s is the scattering coefficient, and P is the phase function.
```glsl
vec3 calculateScattering(
vec3 rayOrigin, // Camera position
vec3 rayDir, // View direction
float maxDist, // Maximum distance (scene occlusion)
vec3 sunDir, // Sun direction
vec3 sunIntensity // Sun intensity
) {
// Compute ray-atmosphere intersection
vec2 atmoHit = raySphereIntersect(rayOrigin - PLANET_CENTER, rayDir, ATMOS_RADIUS);
if (atmoHit.x > atmoHit.y) return vec3(0.0); // Missed atmosphere
// Compute ray-planet intersection (ground occlusion)
vec2 planetHit = raySphereIntersect(rayOrigin - PLANET_CENTER, rayDir, PLANET_RADIUS);
// Determine march range
float tStart = max(atmoHit.x, 0.0);
float tEnd = atmoHit.y;
if (planetHit.x > 0.0) tEnd = min(tEnd, planetHit.x); // Ground occlusion
tEnd = min(tEnd, maxDist); // Scene object occlusion
float stepSize = (tEnd - tStart) / float(PRIMARY_STEPS);
// Precompute phase functions (view-sun angle is constant along the entire ray)
float cosTheta = dot(rayDir, sunDir);
float phaseR = phaseRayleigh(cosTheta);
float phaseM = phaseMie(cosTheta, MIE_G);
// Accumulators
vec3 totalRay = vec3(0.0); // Rayleigh in-scatter
vec3 totalMie = vec3(0.0); // Mie in-scatter
vec3 optDepthI = vec3(0.0); // View direction optical depth (ray, mie, ozone)
float rayPos = tStart + stepSize * 0.5;
for (int i = 0; i < PRIMARY_STEPS; i++) {
vec3 samplePos = rayOrigin + rayDir * rayPos;
// 1. Sample density
vec3 density = atmosphereDensity(samplePos, PLANET_RADIUS) * stepSize;
optDepthI += density;
// 2. Compute light direction optical depth
vec3 optDepthL = lightOpticalDepth(samplePos, sunDir);
// 3. Beer-Lambert attenuation: total attenuation from sun through this point to camera
vec3 tau = BETA_RAY * (optDepthI.x + optDepthL.x)
+ BETA_MIE * 1.1 * (optDepthI.y + optDepthL.y) // 1.1 is Mie extinction/scattering ratio
+ BETA_OZONE * (optDepthI.z + optDepthL.z);
vec3 attenuation = exp(-tau);
// 4. Accumulate in-scattering
totalRay += density.x * attenuation;
totalMie += density.y * attenuation;
rayPos += stepSize;
}
// 5. Final color = scattering coefficient × phase function × accumulated scattering
return sunIntensity * (
totalRay * BETA_RAY * phaseR +
totalMie * BETA_MIE * phaseM
);
}
```
Key detail explanations:
- `1.1` is the Mie extinction/scattering ratio: Mie particles not only scatter light but also absorb a small amount, so the extinction coefficient ≈ 1.1 × scattering coefficient
- `optDepthI` records all three components simultaneously for correctly compositing all extinction contributions in the attenuation calculation
- Phase functions are precomputed outside the loop because the angle between view and sun directions is constant along the entire ray
### Step 7: Tone Mapping and Output
**What**: Apply tone mapping and gamma correction to the HDR scattering results.
**Why**: The scattering calculation outputs HDR linear values (potentially much greater than 1.0), which must be mapped to [0,1] for display. Different tonemapping methods affect the final look:
- **Exposure mapping `1 - exp(-x)`**: simplest, naturally saturates and never overexposes, but limited highlight detail
- **Reinhard**: preserves more highlight detail, suitable for high dynamic range scenes
- **ACES**: cinematic tone mapping, richer colors but more complex implementation
```glsl
// Method 1: Simple exposure mapping (most common)
vec3 tonemapExposure(vec3 color) {
return 1.0 - exp(-color); // Natural saturation, never overexposes
}
// Method 2: Reinhard (preserves more highlight detail)
vec3 tonemapReinhard(vec3 color) {
float l = dot(color, vec3(0.2126, 0.7152, 0.0722));
vec3 tc = color / (color + 1.0);
return mix(color / (l + 1.0), tc, tc);
}
// Gamma correction
vec3 gammaCorrect(vec3 color) {
return pow(color, vec3(1.0 / 2.2));
}
```
Reinhard implementation detail: uses a blend of luminance `l` (perceptually weighted) and per-channel mapping `tc`, balancing color fidelity and highlight detail.
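A scalar check of the exposure curve's two advertised properties, staying below 1.0 for bright HDR input and increasing monotonically:

```python
import math

def tonemap_exposure(x):
    return 1.0 - math.exp(-x)   # scalar version of the per-channel GLSL mapping

assert tonemap_exposure(0.0) == 0.0
assert tonemap_exposure(5.0) < 1.0                      # bright input saturates, never clips
vals = [tonemap_exposure(0.5 * i) for i in range(20)]
assert all(a < b for a, b in zip(vals, vals[1:]))       # strictly increasing
```

Monotonicity is why hue relationships between neighboring pixels survive the mapping, even though absolute contrast is compressed.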
## Variant Details
### Variant 1: Non-Physical Analytical Approximation (No Ray March)
**Difference from the base version**: No ray marching at all — uses analytical functions to simulate sky color with extremely high performance. Not based on physical scattering equations, but uses empirical formulas to simulate visual effects.
**Use cases**: Mobile platforms, backgrounds, scenes with low physical accuracy requirements.
**How it works**:
- `zenithDensity` simulates atmospheric density variation with viewing angle (denser looking toward the horizon)
- `getSkyAbsorption` uses `exp2` to simulate atmospheric absorption (similar to Beer-Lambert)
- `getMie` uses distance falloff + smoothstep to simulate the sun halo
- The final blend considers the sun altitude's effect on the overall sky color tone
**Performance comparison**: No loops, no ray march — only a small amount of math per pixel, 10-50x faster than the base version.
### Variant 2: With Ozone Absorption Layer
**Difference from the base version**: Adds ozone absorption as a third component, making the zenith deeper blue and introducing subtle purple tones at sunset.
**Use cases**: Pursuing more physically accurate sky colors.
**Physical principle**: Ozone primarily absorbs in the Chappuis band (500-700nm, i.e., green and red), which makes the zenith direction (short light path, remaining light after Rayleigh scattering is filtered by ozone) appear deeper blue. At sunset, the long light path makes ozone absorption more significant — after red is Rayleigh-scattered and green is ozone-absorbed, only blue-purple tones remain.
**Key modification**: Set `BETA_OZONE` to a non-zero value in the complete template to enable — already built-in.
### Variant 3: Subsurface Scattering (SSS)
**Difference from the base version**: Scatters inside a semi-transparent object rather than in the atmosphere. Estimates object thickness via SDF and controls light transmission with thickness.
**Use cases**: Candles, skin, jelly, leaves, and other translucent materials.
**How it works**:
1. Use Snell's law (`refract`) to calculate the refracted direction after light enters the object
2. March along the refracted direction in the SDF, accumulating negative distance values (SDF is negative inside the object)
3. Greater accumulated negative value means a thicker object, less light transmission
4. Use a power function to control the attenuation curve (`pow` parameter is tunable)
**Tunable parameters**:
- IOR (index of refraction): 1.3 (water) ~ 1.5 (glass) ~ 2.0 (gemstone), affects refraction angle
- `MAX_SCATTER`: maximum scatter march distance, affects SSS penetration depth
- `SCATTER_STRENGTH`: scattering intensity multiplier
- Step size 0.2: smaller = more accurate but slower
**Usage**:
```glsl
float ss = max(0.0, subsurface(hitPos, viewDir, normal));
vec3 sssColor = albedo * smoothstep(0.0, 2.0, pow(ss, 0.6));
finalColor = mix(lambertian, sssColor, 0.7) + specular;
```
### Variant 4: LUT Precomputation Pipeline (Production-Grade)
**Difference from the base version**: Precomputes Transmittance, Multiple Scattering, and Sky-View into separate LUT textures; at runtime only performs lookups, with extremely high frame rates.
**Use cases**: Production-grade sky rendering in game engines and real-time applications requiring high frame rates.
**Architecture details**:
- **Buffer A (Transmittance LUT)**: 256x64 texture, parameterized by (sunCosZenith, height), storing transmittance from a certain height along a direction to the atmosphere edge. This is the most fundamental LUT; all other LUTs depend on it.
- **Buffer B (Multiple Scattering LUT)**: 32x32 texture, precomputing multiple scattering contributions. Single scattering is not accurate enough — in the real atmosphere, light is scattered multiple times. This LUT uses an iterative method to approximate the cumulative effect of multiple scattering.
- **Buffer C (Sky-View LUT)**: 200x200 texture, storing sky colors for all directions. Uses nonlinear height mapping to allocate more precision to the horizon region (where color changes are most dramatic).
- **Image Pass**: Only looks up the Sky-View LUT + overlays the sun disk; each pixel requires only one texture query.
```glsl
// Transmittance LUT query (from Hillaire 2020 implementation)
vec3 getValFromTLUT(sampler2D tex, vec2 bufferRes, vec3 pos, vec3 sunDir) {
float height = length(pos);
vec3 up = pos / height;
float sunCosZenithAngle = dot(sunDir, up);
vec2 uv = vec2(
256.0 * clamp(0.5 + 0.5 * sunCosZenithAngle, 0.0, 1.0),
64.0 * max(0.0, min(1.0, (height - groundRadiusMM) / (atmosphereRadiusMM - groundRadiusMM)))
);
uv /= bufferRes;
return texture(tex, uv).rgb;
}
```
**Performance**: The Image Pass is nearly O(1); all heavy computation is done in low-resolution LUTs. LUTs can be incrementally updated as the sun angle changes.
### Variant 5: Analytical Fast Atmosphere (No Ray March but Supports Aerial Perspective)
**Difference from the base version**: Uses analytical exponential approximations instead of ray marching, while supporting distance-attenuated aerial perspective effects.
**Use cases**: Game scenes requiring atmospheric perspective without per-pixel ray marching.
**How it works**:
- `getRayleighMie` uses `1 - exp(-x)` form to approximate the scattering integral (analytical solution based on Beer-Lambert)
- `getLightTransmittance` uses multiple exponential term superposition to approximate optical depth at different sun altitudes
- No loops required — only a fixed number of math operations per pixel
```glsl
// Based on Felix Westin's Fast Atmosphere
void getRayleighMie(float opticalDepth, float densityR, float densityM, out vec3 R, out vec3 M) {
vec3 C_RAYLEIGH = vec3(5.802, 13.558, 33.100) * 1e-6;
vec3 C_MIE = vec3(3.996e-6);
R = (1.0 - exp(-opticalDepth * densityR * C_RAYLEIGH / 2.5)) * 2.5;
M = (1.0 - exp(-opticalDepth * densityM * C_MIE / 0.5)) * 0.5;
}
// Analytical approximation of light transmittance (replaces ray march)
vec3 getLightTransmittance(vec3 lightDir) {
vec3 C_RAYLEIGH = vec3(5.802, 13.558, 33.100) * 1e-6;
vec3 C_MIE = vec3(3.996e-6);
vec3 C_OZONE = vec3(0.650, 1.881, 0.085) * 1e-6;
float extinction = exp(-clamp(lightDir.y + 0.05, 0.0, 1.0) * 40.0)
+ exp(-clamp(lightDir.y + 0.5, 0.0, 1.0) * 5.0) * 0.4
+ pow(clamp(1.0 - lightDir.y, 0.0, 1.0), 2.0) * 0.02
+ 0.002;
return exp(-(C_RAYLEIGH + C_MIE + C_OZONE) * extinction * 1e6);
}
```
**Mathematical basis of the analytical approximation**: Treating the atmosphere as a single uniform layer, the scattering integral `∫ e^(-σx) dx` has the analytical solution `(1 - e^(-σL)) / σ`. The `2.5` and `0.5` in the code are empirical scaling factors to make the analytical result visually approximate a full ray march.
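The closed form can be compared directly against a brute-force midpoint sum. Here σ is the green-channel Rayleigh coefficient and L a 50 km path, both arbitrary test values:

```python
import math

sigma, L, n = 13.0e-6, 50000.0, 100000
exact = (1.0 - math.exp(-sigma * L)) / sigma    # analytic integral of e^(-sigma*x) on [0, L]

h = L / n
numeric = sum(math.exp(-sigma * (i + 0.5) * h) for i in range(n)) * h

assert abs(numeric - exact) / exact < 1e-6      # closed form matches the Riemann sum
```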
## Performance Optimization Details
### Bottleneck 1: Nested Ray March (O(N×M) Samples)
N primary ray steps × M light direction steps per step = N×M density calculations.
**Optimization approaches**:
- **Reduce step counts**: Use `PRIMARY_STEPS=12, LIGHT_STEPS=4` on mobile; visual difference is small but performance improvement is significant
- **Analytical approximation**: Replace the light direction ray march with the Fast Atmosphere approach, reducing complexity from O(N×M) to O(N)
- **Transmittance LUT**: After precomputation, runtime only performs lookups, reducing complexity to O(N) or even O(1)
### Bottleneck 2: Dense exp() and pow() Calls
Multiple exponential function calls at each sample point — these are relatively expensive operations on GPUs.
**Optimization approaches**:
- Replace Henyey-Greenstein phase function with Schlick approximation:
```glsl
// Schlick approximation: one division, no pow
float k = 1.55 * g - 0.55 * g * g * g;
float den = 1.0 + k * cosTheta;
float phaseSchlick = (1.0 - k * k) / (4.0 * PI * den * den);
```
- Combine multiple exp calls: `exp(a) * exp(b) = exp(a+b)`, reducing exp call count
- Use `exp2` instead of `exp` in scenarios with lower precision requirements (exp2 is faster on some GPUs)
### Bottleneck 3: Full-Screen Per-Pixel Computation
Each pixel independently computes the full scattering.
**Optimization approaches**:
- **Sky-View LUT**: Render the sky to a low-resolution LUT (e.g., 200x200), then look up at full resolution. Allocate more resolution near the horizon (nonlinear mapping)
- **Half-resolution rendering**: Compute scattering at half resolution, then bilinearly upsample. For sky — a low-frequency signal — quality loss is minimal
### Bottleneck 4: High Sample Count Needed to Avoid Banding
Low step counts lead to visible banding artifacts.
**Optimization approaches**:
- **Non-uniform stepping**: `newT = ((i + 0.3) / numSteps) * tMax`, offset by 0.3 instead of 0.5 to reduce visual artifacts
- **Jittered start offset**: `startOffset += hash(fragCoord) * stepSize`, randomly offsetting the march start per pixel
- **Temporal blue noise dithering**: Use temporal blue noise to jitter sample positions across frames; combined with TAA, banding is nearly eliminated
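A jittered start combined with non-uniform stepping might look like this (a sketch; `hash` is any per-pixel random function, not defined in this document):

```glsl
float stepSize = tMax / float(PRIMARY_STEPS);
// Jittered start: each pixel begins its march at a random fraction of one step
float startOffset = hash(fragCoord) * stepSize;
for (int i = 0; i < PRIMARY_STEPS; i++) {
    // Non-uniform stepping: the 0.3 offset reduces visible banding versus 0.5
    float t = ((float(i) + 0.3) / float(PRIMARY_STEPS)) * tMax + startOffset;
    vec3 samplePos = rayOrigin + rayDir * t;
    // ... density sampling and accumulation ...
}
```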
## Combination Suggestions
### 1. Atmospheric Scattering + Volumetric Clouds
Atmospheric scattering provides sky background color and light source color; volumetric cloud lighting uses the atmospheric transmittance to determine the sun light color reaching the cloud layer.
Key integration points:
- Setting the `maxDist` parameter of the atmospheric scattering function to the cloud layer distance achieves correct pre-cloud atmospheric effects
- During cloud layer rendering, use the transmittance LUT to get the sun light color upon reaching the cloud layer
- Sky color behind clouds should be the full atmospheric scattering result
```glsl
// Pseudo-code example
float cloudDist = rayMarchClouds(rayOrigin, rayDir);
vec3 cloudColor = calculateCloudLighting(cloudPos, sunDir, transmittance);
vec3 skyBehind = calculateScattering(rayOrigin, rayDir, 1e12, sunDir, sunIntensity);
vec3 skyBeforeCloud = calculateScattering(rayOrigin, rayDir, cloudDist, sunDir, sunIntensity);
// Compositing: pre-cloud atmosphere + cloud × cloud opacity + post-cloud sky × transmittance
vec3 final = skyBeforeCloud + cloudColor * cloudAlpha + skyBehind * (1.0 - cloudAlpha) * atmosphereTransmittance;
```
### 2. Atmospheric Scattering + SDF Scene
Pass the SDF ray march hit distance as the `maxDist` parameter to `calculateScattering()`, and the scene color as `sceneColor`, to automatically get aerial perspective effects.
```glsl
// SDF ray march yields hit information
float hitDist = sdfRayMarch(rayOrigin, rayDir);
vec3 sceneColor = shadeSurface(hitPos, normal, lightDir);
// Atmospheric scattering automatically handles perspective
vec3 final = calculateScattering(
rayOrigin, rayDir, hitDist,
sceneColor, sunDir, SUN_INTENSITY
);
```
### 3. Atmospheric Scattering + God Rays
Adding an occlusion parameter in the scattering integral (via shadow map or additional ray march for occlusion detection) can produce volumetric light beam effects.
```glsl
// Add occlusion detection in the main loop
for (int i = 0; i < PRIMARY_STEPS; i++) {
// ... density sampling ...
// God rays: check if sample point is occluded
float occlusion = 1.0;
if (sdfScene(samplePos + sunDir * 0.1) < 0.0) {
occlusion = 0.0; // Occluded by scene object, no in-scattering
}
totalRay += density.x * attenuation * occlusion;
totalMie += density.y * attenuation * occlusion;
}
```
The Fast Atmosphere example implements this functionality through the `occlusion` parameter.
### 4. Atmospheric Scattering + Terrain Rendering
Use aerial perspective: distant terrain colors blend into atmospheric scattering color based on distance.
Key formula:
```glsl
// Basic aerial perspective
vec3 finalColor = terrainColor * transmittance + inscattering;
// transmittance: atmospheric transmittance from camera to terrain point
// inscattering: scattered light between camera and terrain point
// Distant objects: transmittance → 0, inscattering dominates → appears blue/gray
```
### 5. SSS + PBR Materials
Combine subsurface scattering with GGX microsurface specular and Fresnel reflection. SSS contribution replaces part of the diffuse (via mix), with the specular layer added on top:
```glsl
// Complete PBR + SSS shading
float fresnel = pow(max(0.0, 1.0 + dot(normal, viewDir)), 5.0); // viewDir points from camera into the scene (dot < 0 head-on)
vec3 diffuse = mix(lambert, sssContribution, 0.7); // SSS replaces part of diffuse
vec3 final = ambient + albedo * diffuse + specular + fresnel * envColor;
```
Layering logic:
1. Bottom layer: ambient light
2. Diffuse layer: blend of Lambert and SSS (SSS allows light to pass through dark sides)
3. Specular layer: GGX microsurface reflection
4. Fresnel layer: enhanced environment reflection at grazing angles

# Camera Effects Detailed Reference
## Prerequisites
- Ray marching fundamentals (ray origin, ray direction)
- Multipass buffers (for accumulation-based DoF)
- Hash functions for stochastic sampling
## Thin Lens Model Derivation
A real camera lens focuses light from a focal plane onto the sensor. Points not on the focal plane project to a **circle of confusion (CoC)** on the sensor.
### Circle of Confusion Formula
```
CoC = A × f × |S2 - S1| / (S2 × (S1 - f))
```
Where:
- `S1` = focal distance (distance to in-focus plane)
- `S2` = object distance
- `A` = aperture diameter
- `f` = focal length
### Simplified for Shaders
```
CoC ≈ apertureSize × |depth - focalDistance| / depth
```
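Translated to GLSL, the simplified formula is a one-liner (parameter names are illustrative):

```glsl
// CoC radius from the simplified formula; depth and focalDistance in scene units
float circleOfConfusion(float depth, float focalDistance, float apertureSize) {
    return apertureSize * abs(depth - focalDistance) / max(depth, 1e-4);
}
```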
### Ray-Based Implementation
Instead of computing CoC per pixel, we model the physical process:
1. Choose a random point on the aperture disk → new ray origin
2. The focal point (where the original ray hits the focal plane) stays fixed
3. New ray direction = `normalize(focalPoint - newOrigin)`
4. Average many such samples → natural bokeh with correct occlusion
### Aperture Shape
- Circular: `vec2 p = sqrt(r) * vec2(cos(a), sin(a))` — uniform disk
- Polygonal: reject samples outside polygon for hexagonal/octagonal bokeh
- The `sqrt(r)` is critical for uniform distribution (area-preserving)
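Steps 1–3 plus the area-preserving disk sample combine into per-sample ray generation (a sketch; `camRight`/`camUp` are assumed camera basis vectors and `rnd` a per-sample random pair):

```glsl
// Thin-lens ray: jitter the origin on the aperture disk, keep the focal point fixed
vec3 thinLensRay(vec3 camPos, vec3 rayDir, float focalDist, float aperture,
                 vec3 camRight, vec3 camUp, vec2 rnd, out vec3 newOrigin) {
    vec3 focalPoint = camPos + rayDir * focalDist;
    float a = rnd.x * 6.2831853;
    vec2 p = sqrt(rnd.y) * vec2(cos(a), sin(a)) * aperture; // sqrt(r): uniform disk
    newOrigin = camPos + camRight * p.x + camUp * p.y;
    return normalize(focalPoint - newOrigin); // re-aim at the fixed focal point
}
```

Averaging many such rays converges to physically plausible bokeh, including correct partial occlusion at object edges.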
## Poisson Disk Sampling
Pre-computed 16-point Poisson disk for blur kernels:
```glsl
const vec2 poissonDisk[16] = vec2[](
vec2(-0.94201624, -0.39906216), vec2(0.94558609, -0.76890725),
vec2(-0.09418410, -0.92938870), vec2(0.34495938, 0.29387760),
vec2(-0.91588581, 0.45771432), vec2(-0.81544232, -0.87912464),
vec2(-0.38277543, 0.27676845), vec2(0.97484398, 0.75648379),
vec2(0.44323325, -0.97511554), vec2(0.53742981, -0.47373420),
vec2(-0.26496911, -0.41893023), vec2(0.79197514, 0.19090188),
vec2(-0.24188840, 0.99706507), vec2(-0.81409955, 0.91437590),
vec2(0.19984126, 0.78641367), vec2(0.14383161, -0.14100790)
);
```
Advantages over regular grid: no structured aliasing patterns, better coverage per sample count.
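A typical use is a gather-style blur whose taps are scaled by the pixel's CoC (a sketch; `coc`, the blur radius in pixels, is an assumed input):

```glsl
vec3 poissonBlur(sampler2D tex, vec2 uv, float coc) {
    vec2 px = 1.0 / iResolution.xy;
    vec3 sum = vec3(0.0);
    for (int i = 0; i < 16; i++) {
        // Scale the pre-computed unit-disk offsets by the blur radius
        sum += texture(tex, uv + poissonDisk[i] * coc * px).rgb;
    }
    return sum / 16.0;
}
```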
## Motion Blur Approaches
### Stochastic Time Sampling (Ray Marching)
For each pixel, pick a random time within the shutter interval:
```
t_sample = iTime + (rand - 0.5) * shutterDuration
```
Use `t_sample` for all scene animation. Accumulate multiple frames for convergence.
### Velocity Buffer (Post-Process)
1. Render scene + store per-pixel velocity vectors
2. For each pixel, sample along the velocity direction
3. Weight samples by distance from center (triangle filter)
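The three steps above might be sketched as a post-process pass (hedged; `velocityTex` is assumed to store screen-space velocity in UV units):

```glsl
vec3 velocityMotionBlur(sampler2D colorTex, sampler2D velocityTex, vec2 uv) {
    vec2 vel = texture(velocityTex, uv).xy;
    const int N = 8;
    vec3 sum = vec3(0.0);
    float wSum = 0.0;
    for (int i = 0; i < N; i++) {
        float t = float(i) / float(N - 1) - 0.5; // [-0.5, 0.5] along the velocity
        float w = 1.0 - abs(t) * 2.0;            // triangle filter: center-weighted
        sum  += texture(colorTex, uv + vel * t).rgb * w;
        wSum += w;
    }
    return sum / wSum;
}
```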
### Hybrid
Use temporal accumulation (TAA-style) with per-frame time jitter — converges over frames with no per-frame cost increase.
## Film Grain Characteristics
Real film grain properties:
- **Luminance-dependent**: More visible in shadows, less in highlights
- **Temporally varying**: Different pattern each frame (use `fract(iTime)` in hash seed)
- **Spatially uncorrelated**: Use pixel coordinates in hash, not UV (grain should be screen-resolution)
- **Intensity**: 0.02-0.05 for subtle, 0.1+ for stylized/vintage look
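The four properties above can be folded into one small function (a sketch; `hash12` is an assumed 2D→1D hash):

```glsl
vec3 applyFilmGrain(vec3 color, vec2 fragCoord, float intensity) {
    // Pixel coordinates (not UV) keep the grain at screen resolution;
    // fract(iTime) in the seed gives a different pattern each frame
    float n = hash12(fragCoord + fract(iTime) * 1234.5) - 0.5;
    // Luminance-dependent: stronger in shadows, weaker in highlights
    float lum = dot(color, vec3(0.299, 0.587, 0.114));
    return color + n * intensity * (1.0 - lum);
}
```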

# Cellular Automata & Reaction-Diffusion — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing prerequisites, step-by-step explanations, variant details, performance analysis, and complete code examples for combination suggestions.
---
## Prerequisites
### GLSL Basics
- **Uniform variables**: `iResolution` (viewport resolution), `iFrame` (current frame number), `iTime` (elapsed time), `iMouse` (mouse position)
- **Texture sampling**: `texture(iChannel0, uv)` samples using UV coordinates (with filtering), `texelFetch(iChannel0, ivec2(px), 0)` samples at exact integer pixel coordinates
- **Multi-buffer feedback architecture**: ShaderToy supports Buffer A~D, each buffer can bind itself or other buffers as iChannel input
### ShaderToy Multi-Pass Mechanism
Data written by Buffer A → next frame Buffer A reads via iChannel0 self-feedback. This is the core mechanism for inter-frame state persistence. The Image pass handles final visual output.
### 2D Grid Sampling
- Pixel coordinates `fragCoord` are floating point, range `[0.5, resolution - 0.5]`
- UV coordinates = `fragCoord / iResolution.xy`, range `[0, 1]`
- `texelFetch(iChannel0, ivec2(px), 0)` reads the specified pixel exactly (no filtering), suitable for discrete CA
- `texture(iChannel0, uv)` uses hardware bilinear interpolation, suitable for continuous RD
### Basic Vector Math
- `normalize(v)`: normalize a vector
- `dot(a, b)`: dot product
- `cross(a, b)`: cross product
- `length(v)`: vector length
### Convolution Kernel Concepts
A 3x3 stencil performs a weighted sum of the center pixel and its 8 neighbors. Different weights produce different effects:
- **Laplacian kernel**: Detects deviation of the current value from the neighborhood mean (diffusion)
- **Gaussian kernel**: Blur/smoothing
- **Sobel kernel**: Edge detection/gradient computation
---
## Implementation Steps in Detail
### Step 1: Grid State Storage and Self-Feedback
**What**: Use ShaderToy's Buffer self-read mechanism to persistently store simulation state in a buffer texture. Each frame reads the previous frame's state, computes new state, and writes it back.
**Why**: GPU shaders are inherently stateless; buffer inter-frame feedback is required for time-step iteration. State is stored in RGBA channels — CA can use a single channel for alive/dead, while RD uses two channels for u and v respectively.
**Code**:
```glsl
// Buffer A: read previous frame's own state
// iChannel0 is bound to Buffer A itself (self-feedback)
vec4 prevState = texelFetch(iChannel0, ivec2(fragCoord), 0);
// Can also sample with UV coordinates (supports texture filtering)
vec2 uv = fragCoord / iResolution.xy;
vec4 prevSmooth = texture(iChannel0, uv);
```
**Key points**:
- `texelFetch` performs no filtering, reads a single pixel exactly, suitable for discrete CA
- `texture` uses hardware bilinear interpolation, blending adjacent pixel values near pixel boundaries, suitable for continuous RD
- The four RGBA channels can store different state variables (e.g., u, v, velocity field components, etc.)
### Step 2: Initialization (Noise Seeding)
**What**: Initialize the grid with pseudo-random noise on the first frame (or first few frames) to provide seeds for the simulation.
**Why**: Both CA and RD need initial perturbation to start evolution. Different initial conditions produce different final patterns. In practice, seeding is often repeated for the first 2~10 frames, since ShaderToy occasionally skips the first frame.
**Code**:
```glsl
// Simple hash noise function
float hash1(float n) {
return fract(sin(n) * 138.5453123);
}
vec3 hash33(in vec2 p) {
float n = sin(dot(p, vec2(41, 289)));
return fract(vec3(2097152, 262144, 32768) * n);
}
// Initialization branch in mainImage
if (iFrame < 2) {
// CA: random binary initialization
float f = step(0.9, hash1(fragCoord.x * 13.0 + hash1(fragCoord.y * 71.1)));
fragColor = vec4(f, 0.0, 0.0, 0.0);
} else if (iFrame < 10) {
// RD: random continuous value initialization
vec3 noise = hash33(fragCoord / iResolution.xy + vec2(53, 43) * float(iFrame));
fragColor = vec4(noise, 1.0);
}
```
**Key points**:
- `hash1` is a simple pseudo-random number generator based on `sin`, producing values in [0, 1)
- `hash33` generates a 3D random vector from 2D coordinates, used for multi-channel RD initialization
- CA initialization uses `step(0.9, ...)` to produce approximately 10% density of living cells
- RD initialization uses continuous random values, with `iFrame` added so each frame seeds differently
- Multi-frame seeding (`iFrame < 10`) ensures sufficiently rich initial perturbation
### Step 3: Neighbor Sampling and Laplacian Computation
**What**: Perform weighted sampling of the current pixel's 8 (or 4) neighbors, computing the Laplacian or neighbor count.
**Why**: This is the core of CA/RD — local rules drive state updates through neighbor information. The Laplacian describes how much a point's value deviates from the surrounding average, physically corresponding to diffusion. The nine-point stencil is more accurate and isotropic than a simple cross stencil.
**Three Sampling Methods Compared**:
| Method | Use Case | Advantages | Disadvantages |
|------|----------|------|------|
| Method A: Discrete neighbor counting | CA | Exact integer coordinates, no filtering error | Can only handle discrete states |
| Method B: Nine-point Laplacian | RD | Good isotropy, high accuracy | 9 texture samples |
| Method C: 3x3 Gaussian blur | Simplified RD | Good smoothing effect | Not a true Laplacian |
**Method A Code Details**:
```glsl
// Discrete CA neighbor counting using texelFetch for exact reads
int cell(in ivec2 p) {
ivec2 r = ivec2(textureSize(iChannel0, 0));
p = (p + r) % r; // Wrap-around boundary (toroidal topology), left overflow appears on right
return (texelFetch(iChannel0, p, 0).x > 0.5) ? 1 : 0;
}
ivec2 px = ivec2(fragCoord);
// Moore neighborhood: sum of 8 neighbors
int k = cell(px + ivec2(-1,-1)) + cell(px + ivec2(0,-1)) + cell(px + ivec2(1,-1))
+ cell(px + ivec2(-1, 0)) + cell(px + ivec2(1, 0))
+ cell(px + ivec2(-1, 1)) + cell(px + ivec2(0, 1)) + cell(px + ivec2(1, 1));
```
**Method B Code Details**:
```glsl
// Nine-point Laplacian stencil (for RD)
// Weights: diagonal 0.5, cross 1.0, center -6.0 (sum = 0, ensuring Laplacian of a constant field is zero)
vec2 laplacian(vec2 uv) {
vec2 px = 1.0 / iResolution.xy;
vec4 P = vec4(px, 0.0, -px.x);
return
0.5 * texture(iChannel0, uv - P.xy).xy // bottom-left
+ texture(iChannel0, uv - P.zy).xy // bottom
+ 0.5 * texture(iChannel0, uv - P.wy).xy // bottom-right
+ texture(iChannel0, uv - P.xz).xy // left
- 6.0 * texture(iChannel0, uv).xy // center
+ texture(iChannel0, uv + P.xz).xy // right
+ 0.5 * texture(iChannel0, uv + P.wy).xy // top-left
+ texture(iChannel0, uv + P.zy).xy // top
+ 0.5 * texture(iChannel0, uv + P.xy).xy; // top-right
}
```
**Method C Code Details**:
```glsl
// 3x3 weighted blur (Gaussian approximation)
// Weights: diagonal 1, cross 2, center 4, total 16
// Uses vec3 swizzle to cleverly encode 9 offset directions
float blur3x3(vec2 uv) {
vec3 e = vec3(1, 0, -1); // e.x=1, e.y=0, e.z=-1
vec2 px = 1.0 / iResolution.xy;
float res = 0.0;
// e.xx=(1,1), e.xz=(1,-1), e.zx=(-1,1), e.zz=(-1,-1) → four diagonals
res += texture(iChannel0, uv + e.xx * px).x + texture(iChannel0, uv + e.xz * px).x
+ texture(iChannel0, uv + e.zx * px).x + texture(iChannel0, uv + e.zz * px).x; // ×1
// e.xy=(1,0), e.yx=(0,1), e.yz=(0,-1), e.zy=(-1,0) → four edges
res += (texture(iChannel0, uv + e.xy * px).x + texture(iChannel0, uv + e.yx * px).x
+ texture(iChannel0, uv + e.yz * px).x + texture(iChannel0, uv + e.zy * px).x) * 2.; // ×2
// e.yy=(0,0) → center
res += texture(iChannel0, uv + e.yy * px).x * 4.; // ×4
return res / 16.0;
}
```
### Step 4: State Update Rules
**What**: Apply CA rules or RD differential equations based on neighbor information to compute new state values.
**Why**: This is the core simulation logic. CA uses discrete decisions (birth/survival/death), RD uses continuous differential equations with Euler integration.
**CA Rule Details**:
Conway's Game of Life B3/S23 means:
- B3 = Birth when 3 neighbors
- S23 = Survive when 2 or 3 neighbors
```glsl
int e = cell(px); // current state (0 or 1)
// Equivalent to: if (k==3) born/survive; else if (k==2 && alive) survive; else die
float f = (((k == 2) && (e == 1)) || (k == 3)) ? 1.0 : 0.0;
```
**Generic Bitmask Rules**: Bitmasks can encode arbitrary CA rule sets without modifying logic code. For example:
- B3/S23 → bornset=8 (binary 1000, bit 3), stayset=12 (binary 1100, bits 2,3)
- B36/S23 → bornset=72 (bits 3,6), stayset=12
```glsl
// stayset/bornset are bitmasks; bit n=1 means triggered when neighbor count is n
float ff = 0.0;
if (currentAlive) {
    ff = ((stayset & (1 << k)) > 0) ? float(k) : 0.0; // survive when bit k is set
} else {
    ff = ((bornset & (1 << k)) > 0) ? 1.0 : 0.0; // birth when bit k is set
}
```
**RD Gray-Scott Update Details**:
Physical meaning of the Gray-Scott equations:
- `Du·∇²u`: diffusion of u (spatial smoothing)
- `-u·v²`: reaction consumption (u decreases when u and v meet)
- `F·(1-u)`: replenishment of u (feed, pulling u back toward 1.0)
- `Dv·∇²v`: diffusion of v
- `+u·v²`: reaction production (v increases when u and v meet)
- `-(F+k)·v`: removal of v (combined decay from kill + feed)
```glsl
float u = prevState.x;
float v = prevState.y;
vec2 Duv = laplacian(uv) * DIFFUSION; // DIFFUSION = vec2(Du, Dv)
float du = Duv.x - u * v * v + F * (1.0 - u);
float dv = Duv.y + u * v * v - (F + k) * v;
// Forward Euler integration, clamp to prevent numerical instability
fragColor.xy = clamp(vec2(u + du * DT, v + dv * DT), 0.0, 1.0);
```
**Simplified RD Details**:
This approach doesn't use the standard Gray-Scott equations, but instead uses gradient-driven displacement and random decay to approximate reaction-diffusion behavior. The results are more organic but less controllable.
```glsl
float avgRD = blur3x3(uv);
vec2 pwr = (1.0 / iResolution.xy) * 1.5;
// Compute gradient (similar to Sobel)
vec2 lap = vec2(
texture(iChannel0, uv + vec2(pwr.x, 0)).y - texture(iChannel0, uv - vec2(pwr.x, 0)).y,
texture(iChannel0, uv + vec2(0, pwr.y)).y - texture(iChannel0, uv - vec2(0, pwr.y)).y
);
uv = uv + lap * (1.0 / iResolution.xy) * 3.0; // Displace sampling point along gradient (diffusion)
float newRD = texture(iChannel0, uv).x + (noise.z - 0.5) * 0.0025 - 0.002; // Random decay
newRD += dot(texture(iChannel0, uv + (noise.xy - 0.5) / iResolution.xy).xy, vec2(1, -1)) * 0.145; // Reaction term
```
### Step 5: Visualization and Coloring
**What**: Map simulation buffer data to visual effects — color mapping, gradient lighting, bump mapping, etc.
**Why**: Raw simulation data consists of scalar/vector values in 0~1 range, requiring artistic processing to produce appealing visuals. The most common technique is computing the gradient of buffer values to obtain normal information for bump lighting.
**Color mapping techniques**:
```glsl
// Basic: nonlinear color separation
// c is a [0,1] value; different pow exponents make RGB channels respond at different rates
float c = 1.0 - texture(iChannel0, uv).y;
vec3 col = pow(vec3(1.5, 1, 1) * c, vec3(1, 4, 12));
// R channel responds linearly, G channel with 4th power (rapid decay in dark areas), B channel with 12th power (blue only at brightest spots)
```
**Gradient normal computation**:
```glsl
// Compute surface normals from scalar field (for bump map lighting)
vec3 normal(vec2 uv) {
vec3 delta = vec3(1.0 / iResolution.xy, 0.0);
// Central difference for x and y gradients
float du = texture(iChannel0, uv + delta.xz).x - texture(iChannel0, uv - delta.xz).x;
float dv = texture(iChannel0, uv + delta.zy).x - texture(iChannel0, uv - delta.zy).x;
// z component controls bump intensity (smaller = stronger bumps)
return normalize(vec3(du, dv, 1.0));
}
```
**Specular highlight effect**:
```glsl
// Produce specular edges via sampling offset
float c2 = 1.0 - texture(iChannel0, uv + 0.5 / iResolution.xy).y;
// c2*c2 - c*c is positive at gradient changes, producing edge highlights
col += vec3(0.36, 0.73, 1.0) * max(c2 * c2 - c * c, 0.0) * 12.0;
```
**Vignette + gamma correction**:
```glsl
// Vignette: darken edges
col *= pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.125) * 1.15;
// Fade-in effect
col *= smoothstep(0.0, 1.0, iTime / 2.0);
// Gamma correction (approximately 2.0)
fragColor = vec4(sqrt(min(col, 1.0)), 1.0);
```
---
## Variant Details
### Variant 1: Conway's Game of Life (Discrete CA)
**Difference from base version**: Uses discrete binary state and neighbor counting rules instead of continuous RD equations. This is the most classic cellular automaton, with simple rules that can give rise to extremely complex behavior (gliders, oscillators, still lifes, etc.).
**Complete Buffer A code**:
```glsl
int cell(in ivec2 p) {
ivec2 r = ivec2(textureSize(iChannel0, 0));
p = (p + r) % r; // wrap-around boundary
return (texelFetch(iChannel0, p, 0).x > 0.5) ? 1 : 0;
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
ivec2 px = ivec2(fragCoord);
// Moore neighborhood counting
int k = cell(px+ivec2(-1,-1)) + cell(px+ivec2(0,-1)) + cell(px+ivec2(1,-1))
+ cell(px+ivec2(-1, 0)) + cell(px+ivec2(1, 0))
+ cell(px+ivec2(-1, 1)) + cell(px+ivec2(0, 1)) + cell(px+ivec2(1, 1));
int e = cell(px);
// B3/S23 rule
float f = (((k == 2) && (e == 1)) || (k == 3)) ? 1.0 : 0.0;
// Initialization: approximately 10% random living cells
if (iFrame < 2) {
f = step(0.9, fract(sin(fragCoord.x * 13.0 + sin(fragCoord.y * 71.1)) * 138.5));
}
fragColor = vec4(f, 0.0, 0.0, 1.0);
}
```
**Adjustment directions**:
- Modifying B/S rule numbers can produce completely different behavior
- Increasing initial density (changing the 0.9 in `step(0.9, ...)`) alters the evolution result
- The .y channel can store "age" for color mapping during visualization
### Variant 2: Configurable Rule Set CA (Birth/Survival Bitmask)
**Difference from base version**: Uses bitmasks to encode arbitrary CA rules, supporting Moore/von Neumann/extended neighborhoods, capable of producing worms, sponges, explosions, and other patterns.
**Bitmask encoding explanation**:
- `BORN_SET = 8` is binary `0b1000`, meaning bit 3 is set → B3 (birth when 3 neighbors)
- `STAY_SET = 12` is binary `0b1100`, meaning bits 2,3 are set → S23 (survive when 2 or 3 neighbors)
- `LIVEVAL` controls the living cell's state value; when greater than 1, combined with `DECIMATE` it can produce gradient decay effects
- `DECIMATE` is the per-frame decay amount, producing a "trailing" effect
**Key code**:
```glsl
#define BORN_SET 8 // birth bitmask, 8 = B3 (bit 3 set)
#define STAY_SET 12 // survival bitmask, 12 = S23 (bits 2,3 set)
#define LIVEVAL 2.0 // living cell state value
#define DECIMATE 1.0 // decay value (0=no decay)
// Rule evaluation
float ff = 0.0;
float ev = texelFetch(iChannel0, px, 0).w;
if (ev > 0.5) {
// Living cell: decay first, then check if survival rule is met
if (DECIMATE > 0.0) ff = ev - DECIMATE;
    if ((STAY_SET & (1 << k)) > 0) ff = LIVEVAL;
} else {
    // Dead cell: check if birth rule is met
    ff = ((BORN_SET & (1 << k)) > 0) ? LIVEVAL : 0.0;
}
```
**Notable rule sets**:
- B3/S23 (Conway Life): BORN=8, STAY=12
- B36/S23 (HighLife): BORN=72, STAY=12 — has self-replicators
- B1/S1 (Gnarl): BORN=2, STAY=2 — fractal growth
- B3/S012345678 (Life without death): BORN=8, STAY=511 — only grows, never dies
### Variant 3: Separable Gaussian Blur RD (Multi-Buffer Architecture)
**Difference from base version**: Replaces the single 3x3 Laplacian with separable horizontal/vertical Gaussian blur for the diffusion step, achieving a larger effective diffusion radius with smoother patterns.
**Architecture**:
- Buffer A: Reaction step (reads Buffer C's blur result as diffusion term)
- Buffer B: Horizontal Gaussian blur (reads Buffer A)
- Buffer C: Vertical Gaussian blur (reads Buffer B)
**Why separate**:
- A direct NxN kernel requires N² samples
- Separating into horizontal + vertical passes requires N samples each, 2N total
- A 9-tap separable blur = 18 samples ≈ equivalent to an 81-point 9x9 kernel
**Buffer B complete code (horizontal blur)**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
float h = 1.0 / iResolution.x;
vec4 sum = vec4(0.0);
// 9-tap Gaussian weights (approximate normal distribution)
sum += texture(iChannel0, fract(vec2(uv.x - 4.0*h, uv.y))) * 0.05;
sum += texture(iChannel0, fract(vec2(uv.x - 3.0*h, uv.y))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x - 2.0*h, uv.y))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x - 1.0*h, uv.y))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y))) * 0.16;
sum += texture(iChannel0, fract(vec2(uv.x + 1.0*h, uv.y))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x + 2.0*h, uv.y))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x + 3.0*h, uv.y))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x + 4.0*h, uv.y))) * 0.05;
fragColor = vec4(sum.xyz / 0.98, 1.0); // 0.98 = weight sum, normalized
}
```
Buffer C has identical structure but blurs along the y-axis (replace `uv.x ± n*h` with `uv.y ± n*v`, where `v = 1.0/iResolution.y`).
### Variant 4: Continuous Differential Operator CA (Vein/Fluid Style)
**Difference from base version**: Computes curl, divergence, and Laplacian on the grid, combined with multi-step advection loops, producing vein/fluid-like organic patterns that sit between CA and PDE fluid simulation.
**Core concepts**:
- **Curl**: Describes the rotational tendency of a field, used to produce vortex effects
- **Divergence**: Describes the spreading/converging tendency of a field
- **Advection**: Propagates field values along the velocity field direction
**Parameter tuning guide**:
- `STEPS (10~60)`: Advection steps; more = smoother but slower
- `ts (0.1~0.5)`: Advection rotation strength, controls vortex intensity
- `cs (-3~-1)`: Curl scaling; negative values produce counter-clockwise rotation
- `ls (0.01~0.1)`: Laplacian scaling, controls diffusion strength
- `amp (0.5~2.0)`: Self-amplification coefficient
- `upd (0.2~0.6)`: Update smoothing coefficient, controls old/new state blend ratio
**Key code**:
```glsl
#define STEPS 40
#define ts 0.2
#define cs -2.0
#define ls 0.05
#define amp 1.0
#define upd 0.4
// Discrete curl and divergence on a 3x3 stencil
// Standard weights: _K0=-20/6 (center), _K1=4/6 (edge), _K2=1/6 (corner)
curl = uv_n.x - uv_s.x - uv_e.y + uv_w.y
+ _D * (uv_nw.x + uv_nw.y + uv_ne.x - uv_ne.y
+ uv_sw.y - uv_sw.x - uv_se.y - uv_se.x);
div = uv_s.y - uv_n.y - uv_e.x + uv_w.x
+ _D * (uv_nw.x - uv_nw.y - uv_ne.x - uv_ne.y
+ uv_sw.x + uv_sw.y + uv_se.y - uv_se.x);
// Multi-step advection loop
for (int i = 0; i < STEPS; i++) {
advect(off, vUv, texel, curl, div, lapl, blur);
offd = rot(offd, ts * curl); // rotate offset direction
off += offd; // accumulate offset
ab += blur / float(STEPS); // accumulate blurred value
}
```
### Variant 5: RD-Driven 3D Surface (Raymarched RD)
**Difference from base version**: 2D RD results serve as a texture mapped onto a 3D sphere, driving surface displacement and color; the Image pass becomes a full raymarcher.
**Implementation points**:
1. Buffer A maintains the standard RD simulation unchanged
2. Image pass becomes a raymarching renderer
3. The SDF function maps 3D points to spherical UV, then samples the RD buffer
4. RD values drive surface displacement
**Key code**:
```glsl
// Image pass: use RD texture for displacement in the SDF
vec2 map(in vec3 pos) {
vec3 p = normalize(pos);
vec2 uv;
// Spherical parameterization: 3D point → 2D UV
uv.x = 0.5 + atan(p.z, p.x) / (2.0 * 3.14159); // longitude [0, 1]
uv.y = 0.5 - asin(p.y) / 3.14159; // latitude [0, 1]
float y = texture(iChannel0, uv).y; // read v component from RD buffer
float displacement = 0.1 * y; // displacement amount (adjustable scale factor)
float sd = length(pos) - (2.0 + displacement); // base sphere SDF + displacement
return vec2(sd, y); // return distance and material parameter
}
```
**Extension directions**:
- Replace the sphere with a torus, plane, or other base shapes
- Use the two RD channels to separately drive displacement and color
- Add normal perturbation for finer surface detail
- Combine with environment maps for reflection/refraction
---
## Performance Optimization In-Depth Analysis
### 1. texelFetch vs texture
**Discrete CA** should use `texelFetch(iChannel0, ivec2(px), 0)` instead of `texture()`:
- Avoids unnecessary texture filtering overhead
- Guarantees pixel-precise reads without floating-point precision causing sampling of adjacent pixels
- For binary states (0/1), any interpolation introduces errors
**Continuous RD** can use `texture()` with linear filtering:
- Hardware automatically performs bilinear interpolation
- The interpolation effect is equivalent to additional smoothing/diffusion, which can be advantageous in some cases
- Hardware-accelerated, faster than manual interpolation
### 2. Separable Blur Instead of Large-Kernel Laplacian
If a large diffusion radius is needed:
- **Don't** use a larger NxN Laplacian kernel → O(N²) samples
- **Do** use separable two-pass Gaussian blur (horizontal + vertical) → O(2N) samples
- Implemented through additional buffer passes
**Numerical comparison**:
| Method | Equivalent Kernel Size | Sample Count |
|------|-----------|---------|
| 3x3 Laplacian | 3×3 | 9 |
| 5x5 Laplacian | 5×5 | 25 |
| 9x9 Laplacian | 9×9 | 81 |
| Separable 9-tap Gaussian | ≈9×9 | 18 |
| Separable 13-tap Gaussian | ≈13×13 | 26 |
### 3. Multi-Step Sub-Iteration
For RD, you can loop multiple sub-iterations within a single frame using smaller DT, improving convergence speed while maintaining stability:
```glsl
#define SUBSTEPS 4 // sub-iteration count
#define SUB_DT 0.25 // = DT / SUBSTEPS
for (int i = 0; i < SUBSTEPS; i++) {
vec2 lap = laplacian9(uv);
float uvv = u * v * v;
u += (DU * lap.x - uvv + F * (1.0 - u)) * SUB_DT;
v += (DV * lap.y + uvv - (F + K) * v) * SUB_DT;
}
```
**Note**: In sub-iterations, the Laplacian is only correct when read from the texture on the first step; subsequent steps should recompute the Laplacian based on updated values. However, in practice, the approximation of single-read multi-step integration is often good enough.
### 4. Reduced-Resolution Simulation
If the target display resolution is high but the pattern's spatial frequency doesn't require 1:1 pixel precision:
- Run the simulation at lower resolution in the buffer (not directly configurable in ShaderToy, but possible in custom engines)
- Use bilinear interpolation upsampling in the Image pass
- Can save 4x~16x computation
### 5. Avoiding Branches and Conditionals
Use `step()`, `mix()`, `clamp()` instead of `if/else` for CA rule evaluation to reduce GPU warp divergence:
```glsl
// Original if/else version:
// if (k==3) f=1.0; else if (k==2 && e==1) f=1.0; else f=0.0;
// Branch-free version:
float f = max(step(abs(float(k) - 3.0), 0.5),
step(abs(float(k) - 2.0), 0.5) * step(0.5, float(e)));
```
**Explanation**:
- `step(abs(float(k) - 3.0), 0.5)` is 1.0 when k=3, otherwise 0.0
- `step(abs(float(k) - 2.0), 0.5) * step(0.5, float(e))` is 1.0 when k=2 and e=1
- `max()` combines the two conditions
---
## Combination Suggestions — Full Details
### 1. RD + Raymarching (3D Displacement/Shaping)
Map RD results as a heightmap onto 3D surfaces (sphere, plane, torus) and create organic bumpy surfaces through SDF displacement. Suitable for biological organisms, alien terrain, and similar effects.
**Complete Image pass example** (sphere + RD displacement):
```glsl
vec2 map(in vec3 pos) {
vec3 p = normalize(pos);
vec2 uv;
uv.x = 0.5 + atan(p.z, p.x) / (2.0 * 3.14159);
uv.y = 0.5 - asin(p.y) / 3.14159;
float y = texture(iChannel0, uv).y;
float displacement = 0.1 * y;
float sd = length(pos) - (2.0 + displacement);
return vec2(sd, y);
}
// Use map() in the raymarch loop
// Normals computed via central difference of map()
// Material color based on y value returned by map() for color mapping
```
### 2. CA/RD + Particle Systems
Use CA/RD fields as velocity fields or spawn probability fields for particles:
- Particles flow along RD gradients
- New particles spawn at living CA cells
- Produces "living" particle effects
**Implementation approach**:
- Buffer A: RD/CA simulation
- Buffer B: Particle position storage (each pixel stores one particle's position)
- Image: Visualize particles and/or fields
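A minimal Buffer B update under that architecture might look like this (a sketch; one particle per pixel, with `rdGradient` an assumed helper returning the central-difference gradient of Buffer A):

```glsl
// Buffer B: iChannel0 = Buffer A (RD field), iChannel1 = Buffer B (self-feedback)
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec4 particle = texelFetch(iChannel1, ivec2(fragCoord), 0); // xy = position in [0,1]
    vec2 grad = rdGradient(particle.xy);             // flow direction from the RD field
    particle.xy = fract(particle.xy + grad * 0.002); // drift along gradient, wrap edges
    fragColor = particle;
}
```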
### 3. RD + Post-Processing Lighting
In the Image pass, compute normals from RD values → bump mapping → lighting/reflection/refraction. Combined with environment maps (cubemaps), this can produce etched metal surfaces, liquid ripples, and similar effects.
**Key techniques**:
- Compute gradients from RD scalar field to get normals
- Use Phong/Blinn-Phong lighting model
- Normals used to sample cubemaps for environment reflections
- Multiple color mapping schemes increase visual richness
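The gradient-to-normal and lighting steps above can be sketched in a minimal Image pass (the `.y` field channel, bump strength, and light direction are illustrative assumptions):

```glsl
// Treat the RD field's .y channel (iChannel0) as a heightmap
vec3 fieldNormal(vec2 uv, vec2 px, float bump) {
    float hC = texture(iChannel0, uv).y;
    float hX = texture(iChannel0, uv + vec2(px.x, 0.0)).y;
    float hY = texture(iChannel0, uv + vec2(0.0, px.y)).y;
    return normalize(vec3((hC - hX) * bump, (hC - hY) * bump, 1.0));
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = fragCoord / iResolution.xy;
    vec3 n = fieldNormal(uv, 1.0 / iResolution.xy, 4.0);
    vec3 lightDir = normalize(vec3(0.5, 0.5, 1.0));
    vec3 viewDir = vec3(0.0, 0.0, 1.0);          // camera looking down -Z
    vec3 halfV = normalize(lightDir + viewDir);
    float diff = max(dot(n, lightDir), 0.0);
    float spec = pow(max(dot(n, halfV), 0.0), 32.0); // Blinn-Phong highlight
    vec3 base = vec3(0.2, 0.5, 0.6);
    fragColor = vec4(base * diff + vec3(spec), 1.0);
}
```

The same normal can be fed into `texture(iChannelN, reflect(-viewDir, n))`-style cubemap lookups for environment reflections.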
### 4. CA + Color Decay Trails
Living cells use high values; after death, values decay each frame (instead of immediately dropping to zero), with different decay rates in RGB channels producing colorful trailing effects. This is the core technique of the Automata X Showcase.
**Implementation code example**:
```glsl
// Add decay logic after CA update
vec4 prev = texelFetch(iChannel0, px, 0);
if (f > 0.5) {
// Living cell: set to high value
fragColor = vec4(1.0, 1.0, 1.0, 1.0);
} else {
// Dead cell: different decay rates per channel
fragColor = vec4(
prev.x * 0.99, // R decays slowly → longest red trail
prev.y * 0.95, // G decays moderately
prev.z * 0.90, // B decays fast → shortest blue trail
1.0
);
}
```
### 5. RD + Domain Warping
Apply vortex warp or spiral zoom domain transforms to the RD sampling UV before computing, causing the diffusion field itself to be distorted, producing spiral and vortex-like organic patterns. Flexi's Expansive RD uses this technique.
**Implementation code example**:
```glsl
// Apply domain transform to UV before RD update
vec2 warpedUV = uv;
// Vortex warp
float angle = length(uv - 0.5) * 3.14159 * 2.0;
float s = sin(angle * 0.1);
float c = cos(angle * 0.1);
warpedUV = (warpedUV - 0.5) * mat2(c, -s, s, c) + 0.5;
// Sample state using transformed UV
vec2 state = texture(iChannel0, warpedUV).xy;
// Then proceed with normal RD computation...
```
# Color Palette & Color Space Techniques - Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
- GLSL basic syntax: `vec3`, `mix`, `clamp`, `smoothstep`, `fract`, `mod`
- Basic properties of trigonometric functions `cos`/`sin` (periodicity, range [-1, 1])
- Color space fundamentals: RGB is a cube, HSV/HSL is cylindrical coordinates, Lab/Lch is a perceptually uniform space
- Gamma correction concept: monitors store sRGB (nonlinear), shading computations should be performed in linear space
## Step-by-Step Tutorial
### Step 1: Cosine Palette Function
**What**: Implement the most fundamental and commonly used procedural palette function
**Why**: Only 4 vec3 parameters are needed to generate infinite smooth color ramps, with extremely low computational cost (a single cos operation). This function is widely used in the ShaderToy community and is the cornerstone of procedural coloring.
**Mathematical Derivation**:
```
color(t) = a + b * cos(2pi * (c * t + d))
```
- **a** = brightness offset (center luminance of the color ramp), typically ~0.5
- **b** = amplitude (color contrast), typically ~0.5
- **c** = frequency (how many times each channel oscillates), vec3(1,1,1) means R/G/B each oscillate once
- **d** = phase offset (hue starting position per channel), this is the key parameter controlling color style
When a=b=0.5, c=(1,1,1), changing d alone generates completely different color ramps like rainbow, warm tones, cool tones, etc.
**Code**:
```glsl
// Cosine Palette
// a: offset/center color, b: amplitude, c: frequency, d: phase
// t: input scalar, typically [0,1] but can exceed this range
vec3 palette(float t, vec3 a, vec3 b, vec3 c, vec3 d) {
return a + b * cos(6.28318 * (c * t + d));
}
```
### Step 2: Classic Parameter Presets
**What**: Provide ready-to-use palette parameters
**Why**: The original demo showcases 7 classic parameter combinations, covering common needs like rainbow, warm, cool, and duotone schemes. Memorizing a few parameter sets enables rapid color adjustment.
**Code**:
```glsl
// Rainbow color ramp (classic)
// a=(.5,.5,.5) b=(.5,.5,.5) c=(1,1,1) d=(0.0, 0.33, 0.67)
// Warm gradient
// a=(.5,.5,.5) b=(.5,.5,.5) c=(1,1,1) d=(0.0, 0.10, 0.20)
// Blue-purple to orange tones
// a=(.5,.5,.5) b=(.5,.5,.5) c=(1,0.7,0.4) d=(0.0, 0.15, 0.20)
// Custom warm-cool mix
// a=(.8,.5,.4) b=(.2,.4,.2) c=(2,1,1) d=(0.0, 0.25, 0.25)
// Simplified version: fix a/b/c, just adjust d
vec3 palette(float t) {
vec3 a = vec3(0.5, 0.5, 0.5);
vec3 b = vec3(0.5, 0.5, 0.5);
vec3 c = vec3(1.0, 1.0, 1.0);
vec3 d = vec3(0.263, 0.416, 0.557);
return a + b * cos(6.28318 * (c * t + d));
}
```
### Step 3: HSV to RGB Conversion (Standard + Smooth)
**What**: Implement branchless HSV to RGB conversion and its cubic smooth variant
**Why**: HSV space is ideal for rotating by hue and scaling by saturation/value. The standard implementation is only C0 continuous (piecewise linear, with kinks in the derivative); the smooth version achieves C1 continuity through Hermite interpolation, producing smoother hue animation.
**Principle**: Using vectorized `mod` + `abs` + `clamp` operations avoids if/else branching:
```
rgb = clamp(abs(mod(H*6 + vec3(0,4,2), 6) - 3) - 1, 0, 1)
```
This essentially uses piecewise linear functions to model R/G/B channel variation with hue H. The derivative discontinuities can be eliminated via cubic smoothing `rgb*rgb*(3-2*rgb)`.
**Code**:
```glsl
// Standard HSV -> RGB (branchless)
// c.x = Hue [0,1], c.y = Saturation [0,1], c.z = Value [0,1]
vec3 hsv2rgb(vec3 c) {
vec3 rgb = clamp(abs(mod(c.x * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
return c.z * mix(vec3(1.0), rgb, c.y);
}
// Smooth HSV -> RGB (C1 continuous)
vec3 hsv2rgb_smooth(vec3 c) {
vec3 rgb = clamp(abs(mod(c.x * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
rgb = rgb * rgb * (3.0 - 2.0 * rgb); // Cubic Hermite smoothing
return c.z * mix(vec3(1.0), rgb, c.y);
}
```
### Step 4: HSL to RGB Conversion
**What**: Implement HSL color space conversion
**Why**: HSL is more intuitive than HSV — L=0 is black, L=1 is white, L=0.5 is pure color. Suitable for scenarios requiring control over "lightness" rather than "value" (e.g., mapping iteration counts to hue in data visualization).
**Code**:
```glsl
// Hue -> RGB base color (branchless)
vec3 hue2rgb(float h) {
return clamp(abs(mod(h * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
}
// HSL -> RGB
// h: Hue [0,1], s: Saturation [0,1], l: Lightness [0,1]
vec3 hsl2rgb(float h, float s, float l) {
vec3 rgb = hue2rgb(h);
return l + s * (rgb - 0.5) * (1.0 - abs(2.0 * l - 1.0));
}
```
### Step 5: Bidirectional RGB <-> HSV Conversion
**What**: Implement the reverse conversion from RGB back to HSV
**Why**: When blending colors in HSV space, you need to first convert both endpoint colors from RGB to HSV, interpolate, then convert back. RGB to HSV uses a classic branchless implementation.
**Code**:
```glsl
// RGB -> HSV (branchless method)
vec3 rgb2hsv(vec3 c) {
vec4 K = vec4(0.0, -1.0 / 3.0, 2.0 / 3.0, -1.0);
vec4 p = mix(vec4(c.bg, K.wz), vec4(c.gb, K.xy), step(c.b, c.g));
vec4 q = mix(vec4(p.xyw, c.r), vec4(c.r, p.yzx), step(p.x, c.r));
float d = q.x - min(q.w, q.y);
float e = 1.0e-10;
return vec3(abs(q.z + (q.w - q.y) / (6.0 * d + e)), d / (q.x + e), q.x);
}
```
### Step 6: CIE Lab/Lch Perceptually Uniform Interpolation
**What**: Implement the complete RGB <-> Lab <-> Lch conversion pipeline
**Why**: Linear interpolation in RGB and HSV spaces is not perceptually uniform — the human eye is more sensitive to green than red. Interpolation in Lch (Lightness-Chroma-Hue) space produces the most visually natural gradients, especially suitable for UI color schemes and artistic gradients.
**Mathematical Derivation**: The conversion pipeline is RGB -> XYZ (via sRGB D65 matrix) -> Lab (via nonlinear mapping) -> Lch (via converting a,b to polar coordinates: Chroma, Hue). The inverse process reverses each step.
**Code**:
```glsl
// Helper function: XYZ nonlinear mapping
float xyzF(float t) { return mix(pow(t, 1.0/3.0), 7.787037 * t + 0.139731, step(t, 0.00885645)); }
float xyzR(float t) { return mix(t * t * t, 0.1284185 * (t - 0.139731), step(t, 0.20689655)); }
// RGB -> Lch (via XYZ -> Lab -> polar coordinates)
vec3 rgb2lch(vec3 c) {
// RGB -> XYZ (sRGB D65 matrix)
c *= mat3(0.4124, 0.3576, 0.1805,
0.2126, 0.7152, 0.0722,
0.0193, 0.1192, 0.9505);
// XYZ -> Lab
c = vec3(xyzF(c.x), xyzF(c.y), xyzF(c.z));
vec3 lab = vec3(max(0.0, 116.0 * c.y - 16.0),
500.0 * (c.x - c.y),
200.0 * (c.y - c.z));
// Lab -> Lch (convert a,b to polar: Chroma, Hue)
return vec3(lab.x, length(lab.yz), atan(lab.z, lab.y));
}
// Lch -> RGB (inverse process)
vec3 lch2rgb(vec3 c) {
// Lch -> Lab
c = vec3(c.x, cos(c.z) * c.y, sin(c.z) * c.y);
// Lab -> XYZ
float lg = (1.0 / 116.0) * (c.x + 16.0);
vec3 xyz = vec3(xyzR(lg + 0.002 * c.y),
xyzR(lg),
xyzR(lg - 0.005 * c.z));
// XYZ -> RGB (inverse matrix)
return xyz * mat3( 3.2406, -1.5372, -0.4986,
-0.9689, 1.8758, 0.0415,
0.0557, -0.2040, 1.0570);
}
// Circular hue interpolation (avoids 0/360 degree wraparound jump)
float lerpAngle(float a, float b, float x) {
float ang = mod(mod((a - b), 6.28318) + 9.42477, 6.28318) - 3.14159;
return ang * x + b;
}
// Lch space linear interpolation
vec3 lerpLch(vec3 a, vec3 b, float x) {
return vec3(mix(b.xy, a.xy, x), lerpAngle(a.z, b.z, x));
}
```
### Step 7: sRGB Gamma and Linear Space Workflow
**What**: Implement correct sRGB encode/decode functions and a complete linear-space pipeline
**Why**: All lighting/blending computations must be performed in linear space. sRGB textures need to be decoded first (pow 2.2 or the exact piecewise function), then encoded back to sRGB after computation. Skipping this step makes colors appear too dark and blending look unnatural.
**Complete Pipeline**: sRGB texture decode -> linear space shading/blending -> Reinhard tonemap -> sRGB encode
**Code**:
```glsl
// Exact sRGB encode (linear -> sRGB)
float sRGB_encode(float t) {
return mix(1.055 * pow(t, 1.0/2.4) - 0.055, 12.92 * t, step(t, 0.0031308));
}
vec3 sRGB_encode(vec3 c) {
return vec3(sRGB_encode(c.x), sRGB_encode(c.y), sRGB_encode(c.z));
}
// Fast approximation (sufficient for most scenarios)
// Decode: pow(color, vec3(2.2))
// Encode: pow(color, vec3(1.0/2.2))
// Reinhard tone mapping (maps HDR values to [0,1])
vec3 tonemap_reinhard(vec3 col) {
return col / (1.0 + col);
}
```
### Step 8: Blackbody Radiation Palette
**What**: Implement a physics-based temperature-to-color mapping
**Why**: Used for fire, lava, stars, hot metal, and other scenarios requiring physically realistic emission colors. More believable than manual color tuning, with intuitive parameterization (input is just temperature).
**Mathematical Derivation**: Maps temperature T to CIE chromaticity coordinates (cx, cy) via Planck locus approximation, then converts to XYZ -> RGB, combined with Stefan-Boltzmann law (T^4) brightness scaling to produce physically realistic emission colors.
**Code**:
```glsl
// Blackbody radiation palette
// t: normalized temperature [0,1], internally mapped to [0, TEMP_MAX] Kelvin
#define TEMP_MAX 4000.0 // Tunable: maximum temperature (K), affects color gamut width
vec3 blackbodyPalette(float t) {
t *= TEMP_MAX;
// Krystek-style Planck locus approximation in CIE 1960 UCS (u, v) coordinates
float cx = (0.860117757 + 1.54118254e-4 * t + 1.28641212e-7 * t * t)
/ (1.0 + 8.42420235e-4 * t + 7.08145163e-7 * t * t);
float cy = (0.317398726 + 4.22806245e-5 * t + 4.20481691e-8 * t * t)
/ (1.0 - 2.89741816e-5 * t + 1.61456053e-7 * t * t);
// (u, v) -> CIE xyz chromaticity (x, y, z = 1 - x - y)
float d = 2.0 * cx - 8.0 * cy + 4.0;
vec3 XYZ = vec3(3.0 * cx / d, 2.0 * cy / d, 1.0 - (3.0 * cx + 2.0 * cy) / d);
// XYZ -> sRGB matrix
vec3 RGB = mat3(3.240479, -0.969256, 0.055648,
-1.537150, 1.875992, -0.204043,
-0.498535, 0.041556, 1.057311) * vec3(XYZ.x / XYZ.y, 1.0, XYZ.z / XYZ.y);
// Stefan-Boltzmann brightness scaling (T^4)
return max(RGB, 0.0) * pow(t * 0.0004, 4.0);
}
```
## Variant Detailed Descriptions
### Variant 1: Multi-Harmonic Cosine Palette (Anti-Aliased)
**Difference from base version**: Extends the single cos to 9 layers of different frequencies for richer color detail; uses `fwidth()` for band-limited filtering to prevent high-frequency aliasing.
**Principle**: `fwidth()` returns the variation across adjacent pixels. When oscillation frequency exceeds pixel resolution (i.e., w approaches or exceeds one full TAU period), `smoothstep` attenuates the cos contribution to 0, achieving approximate sinc filtering.
**Complete code**:
```glsl
// Band-limited cos: automatically attenuates when oscillation frequency exceeds pixel resolution
#define TAU 6.28318530718
vec3 fcos(vec3 x) {
vec3 w = fwidth(x);
return cos(x) * smoothstep(TAU, 0.0, w); // Approximate sinc filtering (reversed edges rely on common GPU smoothstep behavior)
}
// 9-layer stacked palette
vec3 getColor(float t) {
vec3 col = vec3(0.4);
col += 0.12 * fcos(TAU * t * 1.0 + vec3(0.0, 0.8, 1.1));
col += 0.11 * fcos(TAU * t * 3.1 + vec3(0.3, 0.4, 0.1));
col += 0.10 * fcos(TAU * t * 5.1 + vec3(0.1, 0.7, 1.1));
col += 0.09 * fcos(TAU * t * 9.1 + vec3(0.2, 0.8, 1.4));
col += 0.08 * fcos(TAU * t * 17.1 + vec3(0.2, 0.6, 0.7));
col += 0.07 * fcos(TAU * t * 31.1 + vec3(0.1, 0.6, 0.7));
col += 0.06 * fcos(TAU * t * 65.1 + vec3(0.0, 0.5, 0.8));
col += 0.06 * fcos(TAU * t * 115.1 + vec3(0.1, 0.4, 0.7));
col += 0.09 * fcos(TAU * t * 265.1 + vec3(1.1, 1.4, 2.7));
return col;
}
```
### Variant 2: Hash-Driven Per-Tile Color Variation
**Difference from base version**: Uses a hash function to generate a unique ID for each grid/tile, feeding the ID as the palette's t value to achieve "same palette but different color per tile".
**Use cases**: Procedural tiles/brickwork/mosaics, Voronoi cell coloring, building facades.
**Complete code**:
```glsl
// Hash function (sin-free version, avoids precision issues)
float hash12(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * 0.1031);
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.x + p3.y) * p3.z);
}
// Usage in tile coloring
vec2 tileId = floor(uv);
vec3 tileColor = palette(hash12(tileId)); // Different color per tile
```
### Variant 3: Saturation-Preserving Improved RGB Interpolation
**Difference from base version**: Detects saturation decay during RGB space interpolation and displaces colors away from the gray diagonal, achieving approximate perceptually uniform interpolation at very low cost (~15 instructions).
**Principle**:
1. Compute RGB linear interpolation result `ic`
2. Compute the difference between expected saturation `mix(getsat(a), getsat(b), x)` and actual saturation `getsat(ic)`
3. Find the direction away from the gray diagonal `dir`
4. Compensate saturation loss along that direction
**Complete code**:
```glsl
float getsat(vec3 c) {
float mi = min(min(c.x, c.y), c.z);
float ma = max(max(c.x, c.y), c.z);
return (ma - mi) / (ma + 1e-7);
}
vec3 iLerp(vec3 a, vec3 b, float x) {
vec3 ic = mix(a, b, x) + vec3(1e-6, 0.0, 0.0);
float sd = abs(getsat(ic) - mix(getsat(a), getsat(b), x));
vec3 dir = normalize(vec3(2.0*ic.x - ic.y - ic.z,
2.0*ic.y - ic.x - ic.z,
2.0*ic.z - ic.y - ic.x));
float lgt = dot(vec3(1.0), ic);
float ff = dot(dir, normalize(ic));
ic += 1.5 * dir * sd * ff * lgt; // 1.5 = DSP_STR, tunable
return clamp(ic, 0.0, 1.0);
}
```
### Variant 4: Circular Hue Interpolation (HSV/Lch Space)
**Difference from base version**: When interpolating in color spaces with a circular hue dimension, the hue wraparound from 0.9 to 0.1 crossing through 1.0/0.0 must be handled, otherwise interpolation takes the "long way" (e.g., red -> magenta -> blue -> cyan -> green -> yellow -> red instead of directly red -> orange -> yellow).
**Complete code**:
```glsl
// HSV space circular hue interpolation (hue range [0,1])
vec3 lerpHSV(vec3 a, vec3 b, float x) {
float hue = (mod(mod((b.x - a.x), 1.0) + 1.5, 1.0) - 0.5) * x + a.x;
return vec3(hue, mix(a.yz, b.yz, x));
}
// Lch space circular hue interpolation (hue range [0, 2pi])
// Assumes: #define PI 3.14159265 and #define TAU (2.0 * PI)
float lerpAngle(float a, float b, float x) {
float ang = mod(mod((a - b), TAU) + PI * 3.0, TAU) - PI;
return ang * x + b;
}
```
### Variant 5: Additive Color Stacking (Glow/HDR Effects)
**Difference from base version**: Instead of selecting a single color, additively stack palette colors from multiple iterations, producing natural HDR glow effects. Requires tone mapping.
**Use cases**: Fractal glow, halos, laser effects, particle systems, volumetric light.
**Complete code**:
```glsl
vec3 finalColor = vec3(0.0);
for (int i = 0; i < 4; i++) {
vec3 col = palette(length(uv) + float(i) * 0.4 + iTime * 0.4);
float glow = pow(0.01 / abs(sdfValue), 1.2); // Inverse-distance glow
finalColor += col * glow; // Additive stacking, naturally produces HDR
}
finalColor = finalColor / (1.0 + finalColor); // Reinhard tonemap
```
## Performance Optimization Details
### 1. Branchless HSV/HSL Conversion
Use vectorized `mod`/`abs`/`clamp` operations instead of if-else. All implementations above are already branchless. Branching is expensive on GPUs (especially divergent branches within a warp/wavefront); branchless versions ensure all threads follow the same execution path.
### 2. Band-Limited Filtering for Multi-Harmonic Palettes
High-frequency cos layers produce moiré patterns at distance or small angles. Using `fwidth()` + `smoothstep` for automatic attenuation costs only ~2 extra instructions to eliminate aliasing. `fwidth()` leverages hardware partial derivative computation at nearly zero cost.
### 3. Lch Pipeline Cost Analysis
The complete RGB -> XYZ -> Lab -> Lch pipeline requires ~57 instructions, including matrix multiplication, pow, atan, etc. If you only need "slightly better than RGB" interpolation, use `iLerp` (improved RGB, ~15 instructions) instead of the full Lch pipeline for an excellent quality/performance ratio.
### 4. sRGB Gamma Approximation
The exact piecewise linear sRGB conversion requires branching. In most visual scenarios, `pow(c, 2.2)` / `pow(c, 1.0/2.2)` is sufficiently accurate (error < 0.4%) and allows better compiler optimization. The exact version uses `mix` + `step` for branchless implementation but costs a few extra instructions.
### 5. Cosine Palette Vectorization
`a + b * cos(TAU*(c*t+d))` compiles to 1 MAD + 1 COS + 1 MAD on the GPU, approximately 3-4 clock cycles, extremely efficient. All three channels (R/G/B) execute in parallel via SIMD.
### 6. Texture sRGB Decoding
If texture data is already stored as sRGB, use `pow(texture(...).rgb, vec3(2.2))` to decode to linear space before computation, avoiding color distortion from lighting in nonlinear space. In OpenGL/Vulkan, you can also use the `GL_SRGB8_ALPHA8` format for automatic hardware decoding.
## Combination Suggestions in Detail
### 1. Cosine Palette + SDF Raymarching
The most classic combination. Use the normal direction, distance, or surface attributes of ray march hit points as palette t input, producing rich surface coloring.
**Example**:
```glsl
// After SDF raymarching hit
vec3 nor = calcNormal(pos);
float t_palette = dot(nor, vec3(0.0, 1.0, 0.0)) * 0.5 + 0.5; // Normal y-component mapped to [0,1]
vec3 col = palette(t_palette + iTime * 0.1);
```
### 2. HSL/HSV + Data Visualization
Map iteration counts, distance values, or gradient directions to hue (H), encoding other dimensions via saturation/lightness. E.g., using different hues to mark each step in SDF trace visualization.
**Example**:
```glsl
// Mandelbrot iteration coloring
float h = float(iterations) / float(maxIterations);
vec3 col = hsl2rgb(h, 0.8, 0.5);
```
### 3. Cosine Palette + Fractals/Noise
Use `length(uv)` or `fbm(p)` output plus `iTime` as t, combined with additive stacking and inverse-distance glow, producing psychedelic dynamic color effects.
**Example**:
```glsl
float n = fbm(uv * 3.0 + iTime * 0.2);
vec3 col = palette(n + length(uv) * 0.5);
```
### 4. Blackbody Palette + Volume Rendering/Fire
Map a temperature field (noise-driven or physically simulated) through `blackbodyPalette()` to color, producing physically plausible fire, lava, and stellar effects.
**Example**:
```glsl
// In fire volume rendering
float temperature = fbm(pos * 2.0 - vec3(0, iTime, 0)); // Noise-driven temperature field
vec3 fireColor = blackbodyPalette(temperature);
fireColor = tonemap_reinhard(fireColor); // HDR -> LDR
```
### 5. Linear Space Workflow + Any Palette Technique
Regardless of which palette method is used, always follow: sRGB texture decode -> linear space shading/blending -> Reinhard tonemap -> sRGB encode as the complete pipeline, ensuring physically correct color computation.
**Complete pipeline example**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// 1. Decode sRGB texture to linear space
vec3 texColor = pow(texture(iChannel0, uv).rgb, vec3(2.2));
// 2. Perform all shading computations in linear space
vec3 col = texColor * lighting;
col += palette(t) * emission;
// 3. Tone mapping (HDR -> LDR)
col = col / (1.0 + col);
// 4. sRGB encode
col = pow(col, vec3(1.0/2.2));
fragColor = vec4(col, 1.0);
}
```
### 6. Hash + Palette + Tiling System
In procedural tiles/brickwork/mosaics, use `hash(tileID)` as palette input so each tile has a different color while maintaining an overall coordinated color scheme.
**Complete example**:
```glsl
vec2 tileUV = fract(uv * 10.0);
vec2 tileID = floor(uv * 10.0);
// Base color per tile
float h = hash12(tileID);
vec3 tileColor = palette(h);
// Internal tile pattern (e.g., circle)
float d = length(tileUV - 0.5);
float mask = smoothstep(0.4, 0.38, d);
vec3 col = mix(vec3(0.05), tileColor, mask);
```
# CSG Boolean Operations — Detailed Reference
This document is a complete reference manual for [SKILL.md](SKILL.md), including step-by-step tutorials, mathematical derivations, variant details, and advanced usage.
## Use Cases
- **Geometric Modeling**: Build complex shapes from simple primitives (spheres, boxes, cylinders) through boolean combinations — nuts, buildings, mechanical parts, organic characters, etc.
- **Ray Marching Scenes**: All SDF-based ray marching rendering relies on CSG to compose scenes
- **Organic Forms**: Use smooth variants (smin/smax) to create natural transitions between shapes, suitable for character modeling (snails, elephants), clouds, terrain, etc.
- **Architectural / Industrial Design**: Use subtraction to carve windows and doorways, intersection to cut shapes
- **2D SDF Compositing**: Equally applicable to 2D scenes (cyberpunk clouds, UI shape compositing, etc.)
## Prerequisites
- GLSL basic syntax (`vec3`, `float`, `mix`, `clamp`, `min`, `max`)
- SDF (Signed Distance Field) concept: the signed distance from each point in space to the nearest surface, with negative values indicating the interior
- Basic SDF primitives: sphere `length(p) - r`, box `length(max(abs(p)-b, 0.0))`
- Ray Marching basics: stepping from the camera along the view direction, using SDF values to determine step size
## Core Principles in Detail
The essence of CSG boolean operations is **per-point value operations on two distance fields**:
| Operation | Math Expression | Meaning |
|-----------|----------------|---------|
| Union | `min(d1, d2)` | Take the nearest surface, keeping both shapes |
| Intersection | `max(d1, d2)` | Take the farthest surface, keeping only the overlap |
| Subtraction | `max(d1, -d2)` | Use d2's interior (negated) to cut d1 |
**Hard booleans** produce sharp edges at the junction. **Smooth booleans** (smooth min/max) introduce a blend band in the transition region, "fusing" the two shapes together. The key parameter `k` controls the blend band width:
- Larger `k` means wider, smoother transitions
- Smaller `k` means closer to hard boolean sharp edges
- `k = 0` degenerates to hard boolean (note: the quadratic form below divides by k, so clamp k to a small positive epsilon in practice)
Three mainstream smooth formulas, each with distinct characteristics:
1. **Polynomial**: Most commonly used, fast to compute, natural transitions
2. **Quadratic optimized**: More compact and mathematically elegant
3. **Exponential**: Smoothest transitions but more expensive to compute
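As a bridge to the steps below, the effect of `k` can be seen in a minimal sketch contrasting hard and smooth union of two spheres (function names and offsets are illustrative):

```glsl
float sdSphere(vec3 p, float r) { return length(p) - r; }
// Quadratic smooth min (see Step 4)
float smin(float a, float b, float k) {
    float h = max(k - abs(a - b), 0.0);
    return min(a, b) - h * h * 0.25 / k;
}
// k <= 0.0 -> hard union (crease at the seam); larger k -> wider fused neck
float mapTwoSpheres(vec3 p, float k) {
    float a = sdSphere(p - vec3(-0.4, 0.0, 0.0), 0.5);
    float b = sdSphere(p - vec3( 0.4, 0.0, 0.0), 0.5);
    return (k <= 0.0) ? min(a, b) : smin(a, b, k);
}
```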
## Implementation Steps in Detail
### Step 1: Hard Boolean Operations
**What**: Implement the three basic boolean operations — union, intersection, subtraction.
**Why**: These are the foundation of all CSG operations. `min` selects the nearest surface to achieve union; `max` selects the farthest surface for intersection; negating the second operand and taking `max` with the first achieves subtraction (keeping the region of d1 that is not inside d2).
```glsl
// Union: keep both shapes
float opUnion(float d1, float d2) {
return min(d1, d2);
}
// Intersection: keep only the overlapping region
float opIntersection(float d1, float d2) {
return max(d1, d2);
}
// Subtraction: carve d2 out of d1
float opSubtraction(float d1, float d2) {
return max(d1, -d2);
}
```
### Step 2: Smooth Union — Polynomial Version
**What**: Implement a union operation with a blend transition, producing rounded junctions between two shapes.
**Why**: Hard `min` produces C0 continuity (sharp creases) at the SDF junction. Polynomial smooth min interpolates within the transition band where `|d1-d2| < k`, producing C1 continuity (smooth transitions). In the formula, `h` is the normalized blend factor, and the `k*h*(1-h)` term ensures the distance field correctly dips in the transition region (producing more accurate distance values than plain `mix`).
```glsl
// Polynomial smooth union
// k: blend radius, typical values 0.05~0.5
float opSmoothUnion(float d1, float d2, float k) {
float h = clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) - k * h * (1.0 - h);
}
```
### Step 3: Smooth Subtraction and Smooth Intersection — Polynomial Version
**What**: Extend the smooth union approach to subtraction and intersection.
**Why**: Subtraction = intersection with an inverted SDF; intersection = inverted union of inverted inputs. The sign changes in the formulas reflect this duality. Note that subtraction uses `d2+d1` (not `d2-d1`), because d1 is negated in the operation.
```glsl
// Smooth subtraction: smoothly carve d1 out of d2 (note: argument order is opposite to opSubtraction in Step 1)
float opSmoothSubtraction(float d1, float d2, float k) {
float h = clamp(0.5 - 0.5 * (d2 + d1) / k, 0.0, 1.0);
return mix(d2, -d1, h) + k * h * (1.0 - h);
}
// Smooth intersection: smoothly keep the overlapping region
float opSmoothIntersection(float d1, float d2, float k) {
float h = clamp(0.5 - 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) + k * h * (1.0 - h);
}
```
### Step 4: Quadratic Optimized Smooth Operations
**What**: Implement smin/smax using a more compact quadratic polynomial formula.
**Why**: This version is mathematically equivalent but more concise with fewer branches. `h = max(k - abs(a-b), 0.0)` directly computes the influence within the transition band, being non-zero only when `|a-b| < k`. `h*h*0.25/k` is the quadratic correction term. smax can be derived directly through smin's duality: `smax(a,b,k) = -smin(-a,-b,k)`.
```glsl
// Quadratic optimized smooth union
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
// Quadratic optimized smooth intersection / smooth max
float smax(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return max(a, b) + h * h * 0.25 / k;
}
// Subtraction via smax
float sSub(float d1, float d2, float k) {
return smax(d1, -d2, k);
}
```
### Step 5: Basic SDF Primitives
**What**: Define the basic shape SDFs used for combination.
**Why**: CSG needs operands. Spheres and boxes are the most common primitives; cylinders are often used for drilling holes.
```glsl
float sdSphere(vec3 p, float r) {
return length(p) - r;
}
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return length(max(d, 0.0)) + min(max(d.x, max(d.y, d.z)), 0.0);
}
float sdCylinder(vec3 p, float h, float r) {
vec2 d = abs(vec2(length(p.xz), p.y)) - vec2(r, h);
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0));
}
```
### Step 6: CSG Combination for Scene Construction
**What**: Combine primitives with boolean operations to build complex geometry.
**Why**: The power of CSG lies in combination. Classic example: intersecting a sphere with a cube yields a rounded cube, then subtracting three cylinders produces a nut shape.
```glsl
float mapScene(vec3 p) {
// Primitives
float cube = sdBox(p, vec3(1.0));
float sphere = sdSphere(p, 1.2);
float cylX = sdCylinder(p.yxz, 2.0, 0.4); // Along X axis (axis coordinate = p.x)
float cylY = sdCylinder(p.xyz, 2.0, 0.4); // Along Y axis
float cylZ = sdCylinder(p.xzy, 2.0, 0.4); // Along Z axis (axis coordinate = p.z)
// CSG combination: (Cube ∩ Sphere) - three cylinders
float shape = opIntersection(cube, sphere);
float holes = opUnion(cylX, opUnion(cylY, cylZ));
return opSubtraction(shape, holes);
}
```
### Step 7: Organic Body Modeling with Smooth CSG
**What**: Use smin/smax with different k values to blend multiple ellipsoids/capsules into organic characters.
**Why**: Different body parts need different blend amounts — large k values for broad connections (torso-legs), small k values for fine details (eyes-head). This is the core technique for organic character modeling with smooth CSG.
```glsl
float mapCreature(vec3 p) {
// Torso
float body = sdSphere(p, 0.5);
// Head — larger blend radius
float head = sdSphere(p - vec3(0.0, 0.6, 0.3), 0.25);
float d = smin(body, head, 0.15);
// Limbs — medium blend radius
float leg = sdCylinder(p - vec3(0.2, -0.5, 0.0), 0.3, 0.08);
d = smin(d, leg, 0.08);
// Eye sockets — small blend radius for smooth subtraction
float eye = sdSphere(p - vec3(0.05, 0.75, 0.4), 0.05);
d = smax(d, -eye, 0.02);
return d;
}
```
### Step 8: Ray Marching Main Loop
**What**: Render the SDF scene using the sphere tracing algorithm.
**Why**: SDF scenes cannot be rendered with traditional rasterization. Ray Marching is needed: cast a ray from each pixel, advance by the current point's distance to the nearest surface (i.e., the SDF value) at each step, until close enough to a surface or out of range.
```glsl
#define MAX_STEPS 128
#define SURF_DIST 0.001
float rayMarch(vec3 ro, vec3 rd, float maxDist) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + rd * t;
float d = mapScene(p);
if (d < SURF_DIST) return t;
t += d;
if (t > maxDist) break;
}
return -1.0; // No hit
}
```
### Step 9: Normal Computation and Lighting
**What**: Compute the surface normal by taking the finite-difference gradient of the SDF, then apply lighting.
**Why**: The gradient direction of the SDF is the surface normal direction. Using tetrahedral sampling only requires 4 SDF samples, which is more efficient than the 6 needed for central differences.
```glsl
vec3 calcNormal(vec3 pos) {
vec2 e = vec2(0.001, -0.001);
return normalize(
e.xyy * mapScene(pos + e.xyy) +
e.yyx * mapScene(pos + e.yyx) +
e.yxy * mapScene(pos + e.yxy) +
e.xxx * mapScene(pos + e.xxx)
);
}
```
## Common Variants in Detail
### Variant 1: Polynomial Smooth Union (Most Universal Version)
Differs from the basic (quadratic optimized) version by using the `clamp + mix` form, which makes the code intent more intuitive. Mathematically approximately equivalent to the quadratic version, but with slight differences in the transition curve in extreme cases.
```glsl
float opSmoothUnion(float d1, float d2, float k) {
float h = clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) - k * h * (1.0 - h);
}
float opSmoothSubtraction(float d1, float d2, float k) {
float h = clamp(0.5 - 0.5 * (d2 + d1) / k, 0.0, 1.0);
return mix(d2, -d1, h) + k * h * (1.0 - h);
}
float opSmoothIntersection(float d1, float d2, float k) {
float h = clamp(0.5 - 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) + k * h * (1.0 - h);
}
```
### Variant 2: Exponential Smooth Union
**Difference from the basic version**: Uses `exp` for implementation, with smoother transitions (C-infinity continuity vs polynomial's C1). However, `exp` is more expensive. Suitable for terrain modeling (e.g., craters). The parameter `k` has a different meaning — in the exponential version, larger `k` produces sharper transitions (opposite to polynomial). Used in RME4-Crater for volcano terrain blending.
```glsl
float sminExp(float a, float b, float k) {
float res = exp(-k * a) + exp(-k * b);
return -log(res) / k;
}
```
### Variant 3: Smooth Operations with Color Blending
**Difference from the basic version**: Blends material colors using the same blend factor during geometric fusion. This way, the material at the junction transitions naturally rather than showing an abrupt color boundary. Useful for color gradients between organic shape junctions (e.g., shell and body).
```glsl
// vec3 overload of smax: component-wise smooth max (can fuse a distance-plus-material vector in one call)
vec3 smax(vec3 a, vec3 b, float k) {
vec3 h = max(k - abs(a - b), 0.0);
return max(a, b) + h * h * 0.25 / k;
}
// Alternatively, a separated version: returns the blend factor to the caller
float sminWithFactor(float a, float b, float k, out float blend) {
float h = clamp(0.5 + 0.5 * (b - a) / k, 0.0, 1.0);
blend = h;
return mix(b, a, h) - k * h * (1.0 - h);
}
// Usage example:
// float blend;
// float d = sminWithFactor(d1, d2, 0.1, blend);
// vec3 color = mix(color2, color1, blend);
```
### Variant 4: Layered CSG Modeling (Architectural / Industrial Scenes)
**Difference from the basic version**: Does not use smooth variants; instead uses multi-level nested hard boolean operations to build precise geometric structures. An additive-then-subtractive pattern — first build the overall form with union, then carve details (windows, doorways) with subtraction. Commonly used for architectural modeling.
```glsl
float sdBuilding(vec3 p) {
// Step 1: Additive phase — build walls
float walls = sdBox(p, vec3(1.0, 0.8, 1.0));
// Step 2: Additive — roof
vec3 roofP = p;
roofP.y -= 0.8;
float roof = sdBox(roofP, vec3(1.2, 0.3, 1.2));
float d = opUnion(walls, roof);
// Step 3: Subtractive phase — carve windows
vec3 winP = abs(p); // Exploit symmetry
winP -= vec3(1.01, 0.3, 0.4);
float window = sdBox(winP, vec3(0.1, 0.15, 0.12));
d = opSubtraction(d, window);
// Step 4: Hollow out the interior
float hollow = sdBox(p, vec3(0.95, 0.75, 0.95));
d = opSubtraction(d, hollow);
return d;
}
```
### Variant 5: Large-Scale Organic Character Modeling
**Difference from the basic version**: Extensively uses smin/smax (100+ calls), with different k values for different body parts to control blend amounts. Large k (0.1~0.3) for torso connections, small k (0.01~0.05) for detail areas. Complex organic characters can use over 100 smooth operations to sculpt a complete form.
```glsl
float mapCharacter(vec3 p) {
// Torso — main ellipsoid
float body = sdEllipsoid(p, vec3(0.5, 0.4, 0.6));
// Head — large blend, natural transition to neck
float head = sdEllipsoid(p - vec3(0.0, 0.5, 0.5), vec3(0.25));
float d = smin(body, head, 0.2); // Large k: wide blend band
// Ears — medium blend
float ear = sdEllipsoid(p - vec3(0.3, 0.6, 0.3), vec3(0.15, 0.2, 0.05));
d = smin(d, ear, 0.08);
// Nostrils — small blend for smooth subtraction
float nostril = sdSphere(p - vec3(0.0, 0.4, 0.7), 0.03);
d = smax(d, -nostril, 0.02); // Small k: fine carving
return d;
}
```
## Performance Optimization in Detail
### 1. Bounding Volume Acceleration
The biggest performance bottleneck in CSG scenes is `mapScene()` being called too many times (up to MAX_STEPS per pixel per frame). Use bounding volumes (spheres or AABBs) to skip distant sub-scenes:
```glsl
float mapScene(vec3 p) {
    float d = MAX_DIST;
    // Only evaluate the complex sub-scene near its bounding sphere;
    // outside it, the sphere distance itself is a safe lower bound
    float bound = length(p - vec3(2.0, 0.0, 0.0)) - 1.5;
    if (bound < 0.01) {
        d = min(d, complexSubScene(p));
    } else {
        d = min(d, bound); // conservative: never overestimates the sub-scene SDF
    }
    return d;
}
```
Using `intersectAABB` to pre-test rays against AABBs can skip regions that cannot be hit.
### 2. Reducing SDF Sample Count
- Use tetrahedral sampling for normal computation (4 calls) instead of central differences (6 calls)
- Use `t += d * 0.9` to slightly reduce step size, preventing overshoot-induced penetration
### 3. smin/smax Selection
| Method | Performance | Accuracy | Recommended Use |
|--------|-------------|----------|----------------|
| Quadratic optimized | Fastest | Good | General first choice |
| Polynomial clamp | Fast | Good | When a separate blend factor is needed |
| Exponential | Slower | Best | Terrain, when extremely smooth transitions are needed |
### 4. Avoiding k=0 with smin
When `k` is zero, the quadratic optimized version divides by zero, producing Inf/NaN artifacts (GLSL raises no runtime error — the garbage values propagate silently). Always ensure `k > 0`, or fall back to the hard boolean when k approaches zero:
```glsl
float safeSmin(float a, float b, float k) {
if (k < 0.0001) return min(a, b);
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
```
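A Python port of `safeSmin` (verification only) confirms the guard's behavior: `k = 0` falls back to the hard minimum, blending occurs only when `|a - b| < k`, and the blended value always dips below `min(a, b)`.

```python
def safe_smin(a, b, k):
    # Quadratic smooth min with a guard against k ~ 0 (mirrors the GLSL above)
    if k < 1e-4:
        return min(a, b)
    h = max(k - abs(a - b), 0.0)
    return min(a, b) - h * h * 0.25 / k

hard = safe_smin(0.3, 0.5, 0.0)    # k = 0: falls back to hard min
far  = safe_smin(0.2, 0.9, 0.1)    # |a-b| > k: no blending region, exact min
soft = safe_smin(0.30, 0.35, 0.2)  # inside the blend band: dips below min
```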
### 5. Symmetry Exploitation
For symmetric shapes, use `abs()` to fold coordinates and only define one side. Useful for symmetric windows, limbs, and other mirrored features:
```glsl
vec3 q = vec3(p.xy, abs(p.z)); // Mirror along Z axis
```
## Combination Suggestions in Detail
### 1. CSG + Domain Repetition
CSG shapes can be infinitely repeated in space via `mod()` or `fract()`, suitable for mechanical arrays, architectural railings, etc.:
```glsl
float mapRepeated(vec3 p) {
vec3 q = p;
q.x = mod(q.x + 1.0, 2.0) - 1.0; // Repeat every 2 units along X axis
return mapSinglePiston(q);
}
```
### 2. CSG + Procedural Displacement
Add noise displacement on top of SDF results to give smooth CSG shapes surface detail textures, adding a flowing or organic appearance:
```glsl
float mapWithDisplacement(vec3 p) {
float base = smin(body, limb, 0.1);
float noise = 0.02 * sin(10.0 * p.x) * sin(10.0 * p.y) * sin(10.0 * p.z);
return base + noise;
}
```
### 3. CSG + Procedural Texturing
Use smin's blend factor to blend not just geometry but also material IDs or colors, achieving cross-shape material gradients:
```glsl
vec2 mapWithMaterial(vec3 p) {
float d1 = sdSphere(p, 0.5);
float d2 = sdBox(p - vec3(0.3), vec3(0.3));
float blend;
float d = sminWithFactor(d1, d2, 0.1, blend);
float matId = mix(1.0, 2.0, blend); // Blend material ID
return vec2(d, matId);
}
```
### 4. CSG + 2D SDF
CSG is not limited to 3D. In 2D scenes, smooth union can similarly create organic shapes, like stylized cloud effects:
```glsl
float sdCloud2D(vec2 p) {
float d = sdBox(p, vec2(0.5, 0.1));
d = opSmoothUnion(d, length(p - vec2(-0.3, 0.1)) - 0.15, 0.1);
d = opSmoothUnion(d, length(p - vec2(0.1, 0.15)) - 0.12, 0.1);
d = opSmoothUnion(d, length(p - vec2(0.3, 0.08)) - 0.1, 0.1);
return d;
}
```
### 5. CSG + Animation
By binding CSG parameters (k values, primitive positions, primitive radii) to `iTime`, you can achieve dynamic shape deformation and blend animations:
```glsl
float mapAnimated(vec3 p) {
float k = 0.1 + 0.15 * sin(iTime); // Dynamic blend radius
float r = 0.3 + 0.1 * sin(iTime * 2.0); // Dynamic radius
float d1 = sdSphere(p, 0.5);
float d2 = sdSphere(p - vec3(0.8 * sin(iTime), 0.0, 0.0), r);
return smin(d1, d2, k);
}
```

# Domain Repetition and Spatial Folding — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, mathematical derivations, and advanced usage.
## Prerequisites
- GLSL basic syntax, `vec2/vec3/mat2` operations
- Behavior of built-in functions like `mod()`, `fract()`, `abs()`, `atan()`
- Signed Distance Field (SDF) concept — a function returning the distance from a point to the nearest surface
- Basic principles of Ray Marching
- 2D rotation matrix `mat2(cos(a), sin(a), -sin(a), cos(a))`
## Core Principles in Detail
The essence of domain repetition is **coordinate transformation**: before computing the SDF, the point `p`'s coordinates are folded/mapped into a finite "fundamental domain," so that every point in infinite space maps to the same cell. The SDF function only needs to evaluate coordinates within this single cell, and the result automatically repeats across all of space.
**Three fundamental operations:**
| Operation | Formula | Effect |
|-----------|---------|--------|
| **mod repetition** | `p = mod(p + period/2, period) - period/2` | Infinite translational repetition along an axis |
| **abs mirroring** | `p = abs(p)` | Mirror symmetry across an axis plane |
| **Rotational folding** | `angle = mod(atan(p.y, p.x), TAU/N); p = rotate(p, -angle)` | N-fold rotational symmetry |
**Key mathematics:**
- `mod(x, c)` maps x to the `[0, c)` range, providing periodicity
- `abs(x)` folds the negative half-space onto the positive half-space, providing reflective symmetry
- `fract(x) = x - floor(x)` is equivalent to `mod(x, 1.0)`, providing normalized periodicity
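These identities are easy to confirm on the CPU. The Python sketch below (helper names introduced here for the test) relies on the fact that Python's `%` matches GLSL's `mod()` for a positive divisor:

```python
import math

def glsl_mod(x, c):
    # GLSL mod(): x - c * floor(x / c); Python's % behaves the same for c > 0
    return x - c * math.floor(x / c)

def fract(x):
    return x - math.floor(x)

samples = [-3.7, -0.2, 0.0, 1.4, 9.9]
# fract(x) == mod(x, 1.0) for every sample
pairs = [(fract(x), glsl_mod(x, 1.0)) for x in samples]
# centered mod maps any x into [-c/2, c/2)
c = 4.0
centered = [glsl_mod(x + c / 2.0, c) - c / 2.0 for x in samples]
```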
## Step-by-Step Details
### Step 1: Basic Cartesian Domain Repetition (mod Repetition)
**What**: Infinitely repeat 3D space along one or more axes via translation.
**Why**: `mod(p, c) - c/2` constrains coordinates to the `[-c/2, c/2)` range, dividing space into an infinite number of cells of size `c`, where each cell has identical coordinates. The SDF only needs to be defined within a single cell.
**Code**:
```glsl
// Standard 3D domain repetition (centered version)
// period is the size of each cell
vec3 domainRepeat(vec3 p, vec3 period) {
return mod(p + period * 0.5, period) - period * 0.5;
}
// Usage example: infinitely repeat a box
float map(vec3 p) {
vec3 q = domainRepeat(p, vec3(4.0)); // Repeat every 4 units
return sdBox(q, vec3(0.5)); // One box per cell
}
```
> The line `pos = mod(pos-2., 4.) -2.;` is exactly this pattern — period = 4, offset 2, perfectly centered. `p1.x = mod(p1.x-5., 10.) - 5.;` follows the same logic (period = 10, centered at the origin).
### Step 2: Symmetric Fold Repetition (abs-mod Hybrid)
**What**: On top of mod repetition, use `abs()` to give each cell mirror symmetry, eliminating seams at cell boundaries.
**Why**: Plain `mod` repetition has coordinate discontinuity at cell boundaries (jumping from `+c/2` to `-c/2`), which can cause visible seams. `abs(tile - mod(p, tile*2))` makes coordinates fold back and forth within each tile from 0 to tile to 0, ensuring continuity at boundaries (equivalent to a "triangle wave").
**Code**:
```glsl
// Symmetric fold (triangle wave mapping)
// tile is the half-period length, full period is tile*2
vec3 symmetricFold(vec3 p, float tile) {
return abs(vec3(tile) - mod(p, vec3(tile * 2.0)));
}
// Usage: classic tiling fold
vec3 p = from + s * dir * 0.5;
p = abs(vec3(tile) - mod(p, vec3(tile * 2.0)));
```
> The core line `p = abs(vec3(tile)-mod(p,vec3(tile*2.)));` is this pattern. `tpos.xz=abs(.5-mod(tpos.xz,1.));` is the 2D version of the same pattern (tile=0.5, period=1).
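The continuity claim can be verified numerically — a Python sketch comparing the triangle-wave fold with plain `mod` on either side of a cell boundary:

```python
def sym_fold(p, tile):
    # abs(tile - mod(p, 2*tile)): triangle wave, continuous at cell boundaries
    return abs(tile - (p % (2.0 * tile)))

tile, eps = 0.5, 1e-6
# the fold approaches the same value from both sides of the boundary p = 2*tile
left  = sym_fold(2.0 * tile - eps, tile)
right = sym_fold(2.0 * tile + eps, tile)
# plain mod repetition jumps at the same boundary
mod_left  = (2.0 * tile - eps) % (2.0 * tile)
mod_right = (2.0 * tile + eps) % (2.0 * tile)
```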
### Step 3: Angular Domain Repetition (Polar Coordinate Folding)
**What**: Divide space into N equal rotational sectors around an axis, achieving a kaleidoscope effect.
**Why**: After converting coordinates to polar form, applying `mod(angle, TAU/N)` folds the full 360 degrees into a single `TAU/N` sector. Rotating the coordinates back makes all sectors share the same SDF.
**Code**:
```glsl
// Angular domain repetition
// p: xz plane coordinates, count: repetition count
// Returns rotated coordinates (folded into the first sector)
vec2 pmod(vec2 p, float count) {
float angle = atan(p.x, p.y) + PI / count;
float sector = TAU / count;
angle = floor(angle / sector) * sector;
return p * rot(-angle); // rot is a 2D rotation matrix
}
// Usage: 5-fold rotational symmetry
vec3 p1 = p;
p1.xy = pmod(p1.xy, 5.0); // 5-fold symmetry in the xy plane
```
> The `pmod()` function implements this pattern. An alternative `amod()` function follows the same idea but uses `inout` parameters to directly modify coordinates and returns the sector index (for coloring variants).
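A polar-form Python re-implementation of `pmod` (an equivalent sketch, not the shader code) makes its two invariants testable: the fold preserves distance from the origin, and it is idempotent — a point already in the first sector is left unchanged.

```python
import math

TAU = 2.0 * math.pi

def pmod(x, y, count):
    # Fold the plane into one of `count` rotational sectors (polar form of the
    # GLSL pmod above: snap the angle to its sector, rotate back)
    r = math.hypot(x, y)
    a = math.atan2(x, y)  # GLSL atan(p.x, p.y) convention
    sector = TAU / count
    a -= sector * math.floor((a + sector / 2.0) / sector)
    return r * math.sin(a), r * math.cos(a)

qx, qy = pmod(-0.7, 0.2, 5.0)
r_in  = math.hypot(-0.7, 0.2)
r_out = math.hypot(qx, qy)
# second fold is a no-op: the point is already inside the first sector
q2x, q2y = pmod(qx, qy, 5.0)
```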
### Step 4: fract Domain Folding (For Fractal Iteration)
**What**: Use `fract()` in fractal iteration loops to repeatedly fold coordinates back into the `[0,1)` range, combined with scaling to achieve self-similar structures.
**Why**: `-1.0 + 2.0*fract(0.5*p+0.5)` maps p to the `[-1, 1)` range (centered fract). Each iteration divides space into 8 sub-cells (in 3D), each recursively undergoing the same operation. Combined with the scaling factor `k = s/dot(p,p)` (spherical inversion), this produces fractal hierarchical structure.
**Code**:
```glsl
// Core loop of an Apollonian fractal
float map(vec3 p, float s) {
float scale = 1.0;
vec4 orb = vec4(1000.0); // Orbit trap for coloring
for (int i = 0; i < 8; i++) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5); // Centered fract fold
float r2 = dot(p, p);
orb = min(orb, vec4(abs(p), r2)); // Orbit capture
float k = s / r2; // Spherical inversion scaling
p *= k;
scale *= k;
}
return 0.25 * abs(p.y) / scale; // Distance must be divided by accumulated scale
}
```
> `-1.0 + 2.0*fract(0.5*p+0.5)` is equivalent to `mod(p+1, 2) - 1`, mapping p to [-1,1).
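This equivalence is a one-liner to check in Python (`fract_fold` and `centered_mod` are names introduced here for the test):

```python
import math

def fract(x):
    return x - math.floor(x)

def fract_fold(p):
    # the fractal folding step from the loop above
    return -1.0 + 2.0 * fract(0.5 * p + 0.5)

def centered_mod(p):
    # claimed equivalent: mod(p + 1, 2) - 1
    return (p + 1.0) % 2.0 - 1.0

samples = [-5.3, -1.0, -0.2, 0.0, 0.7, 1.0, 3.9]
diffs = [abs(fract_fold(p) - centered_mod(p)) for p in samples]
folded = [fract_fold(p) for p in samples]
```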
### Step 5: Iterative abs Folding (IFS / Kali-set)
**What**: Repeatedly execute `p = abs(p) - offset` inside a loop, combined with rotation and scaling, to generate fractal symmetric structures.
**Why**: `abs(p)` folds space into the positive octant, `-offset` translates the origin, then `abs()` folds again... each iteration adds another layer of symmetry. This is one implementation of an Iterated Function System (IFS). Combined with rotation, it produces extremely rich fractal structures.
**Code**:
```glsl
// IFS abs folding fractal
float ifsBox(vec3 p) {
for (int i = 0; i < 5; i++) {
p = abs(p) - 1.0; // Fold + offset
p.xy *= rot(iTime * 0.3); // Rotation adds complexity
p.xz *= rot(iTime * 0.1);
}
return sdBox(p, vec3(0.4, 0.8, 0.3));
}
// Kali-set variant: uses dot(p,p) scaling
vec2 de(vec3 pos) {
vec3 tpos = pos;
tpos.xz = abs(0.5 - mod(tpos.xz, 1.0)); // mod repetition first, then IFS
vec4 p = vec4(tpos, 1.0); // w component tracks scaling
for (int i = 0; i < 7; i++) {
p.xyz = abs(p.xyz) - vec3(-0.02, 1.98, -0.02);
p = p * (2.0) / clamp(dot(p.xyz, p.xyz), 0.4, 1.0)
- vec4(0.5, 1.0, 0.4, 0.0);
p.xz *= rot(0.416); // Intra-iteration rotation
}
return vec2(length(max(abs(p.xyz)-vec3(0.1,5.0,0.1), 0.0)) / p.w, 0.0);
}
```
> Note that the `de()` variant uses the `vec4`'s w component to accumulate the scaling factor (`p.w`), and the final distance is divided by `p.w` to maintain SDF validity.
### Step 6: Reflection Folding (Polyhedral Symmetry)
**What**: Fold space into the fundamental domain of a polyhedron (such as an icosahedron) through a set of reflection planes.
**Why**: Regular polyhedra have multiple symmetry planes. Reflecting along each symmetry plane via `p = p - 2*dot(p,n)*n` folds all of space into a "fundamental domain" (1/60th of the entire polyhedron for an icosahedron). Geometry only needs to be defined within this fundamental domain.
**Code**:
```glsl
// Plane reflection
float pReflect(inout vec3 p, vec3 planeNormal, float offset) {
float t = dot(p, planeNormal) + offset;
if (t < 0.0) {
p = p - (2.0 * t) * planeNormal;
}
return sign(t);
}
// Icosahedral folding
void pModIcosahedron(inout vec3 p) {
// nc is the third fold plane normal (the first two are the xz and yz planes)
vec3 nc = vec3(-0.5, -cos(PI/5.0), sqrt(0.75 - cos(PI/5.0)*cos(PI/5.0)));
p = abs(p); // xz and yz plane reflections
pReflect(p, nc, 0.0);
p.xy = abs(p.xy);
pReflect(p, nc, 0.0);
p.xy = abs(p.xy);
pReflect(p, nc, 0.0);
}
```
> Full icosahedral symmetry group is achieved through alternating `abs()` and `pReflect()`.
### Step 7: Toroidal / Cylindrical Domain Warping (displaceLoop)
**What**: Bend planar space into cylindrical or toroidal topology.
**Why**: `displaceLoop` converts Cartesian coordinates `(x, z)` into `(distance_to_center - R, angle)`, "rolling" a plane into a cylinder/torus of radius R. The angular dimension can then undergo `amod` for angular repetition.
**Code**:
```glsl
// Toroidal domain warp: bend the xz plane into a torus
vec2 displaceLoop(vec2 p, float radius) {
return vec2(length(p) - radius, atan(p.y, p.x));
}
// Usage example: architectural ring corridor
vec3 pDonut = p;
pDonut.x += donutRadius;
pDonut.xz = displaceLoop(pDonut.xz, donutRadius);
pDonut.z *= donutRadius; // Unwrap angle to linear length
// Now pDonut is "flattened" ring coordinates, ready for linear repetition
```
> The `displaceLoop` function bends an architectural scene into a ring structure.
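The geometry of `displaceLoop` can be sanity-checked on the CPU (a Python sketch): every point on the circle of radius R lands on the d = 0 line with its angle preserved, and radial offsets map directly to the first coordinate.

```python
import math

def displace_loop(x, y, radius):
    # (x, y) -> (distance to the ring of radius R, angle):
    # rolls the plane into a cylinder/torus topology
    return math.hypot(x, y) - radius, math.atan2(y, x)

R = 3.0
# a point on the circle of radius R maps onto the d = 0 line, angle intact
a = math.radians(37.0)
d, ang = displace_loop(R * math.cos(a), R * math.sin(a), R)
# a point 0.5 outside the ring maps to d = 0.5, regardless of angle
d2, _ = displace_loop(0.0, R + 0.5, R)
```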
### Step 8: 1D Centered Domain Repetition (with Cell ID)
**What**: Perform centered mod repetition along one axis and return the current cell number.
**Why**: Cell IDs can be used to assign different random properties (color, size, rotation, etc.) to each cell's geometry, breaking the uniformity of perfect repetition.
**Code**:
```glsl
// 1D centered domain repetition, returns cell index
float pMod1(inout float p, float size) {
float halfsize = size * 0.5;
float c = floor((p + halfsize) / size); // Cell index
p = mod(p + halfsize, size) - halfsize; // Centered local coordinate
return c;
}
// Usage: repeat along x axis and get cell ID
float cellID = pMod1(p.x, 2.0);
float salt = fract(sin(cellID * 127.1) * 43758.5453); // Random seed
```
> This is a standard domain repetition library function. A simpler `repeat()` function follows the same pattern (version without returning the index).
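A Python port of `pMod1` (verification sketch) shows the key property: the cell index and local coordinate together reconstruct the original position exactly.

```python
import math

def p_mod1(p, size):
    # 1D centered repetition; returns (local coordinate, cell index)
    halfsize = size * 0.5
    c = math.floor((p + halfsize) / size)
    local = (p + halfsize) % size - halfsize
    return local, c

local, cell = p_mod1(7.3, 2.0)
# cell index * period + local coordinate recovers the original position
recon = cell * 2.0 + local
```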
## Common Variants in Detail
### 1. Volumetric Glow Rendering
Unlike standard ray marching, this does not check for surface hits. Instead, it accumulates a "distance-to-brightness" contribution at each step.
**Difference from the basic version**: No normal computation or traditional shading needed. Each step accumulates glow via `exp(-dist * k)`.
**Key modified code**:
```glsl
// Replace hit detection in raymarch with glow accumulation
float acc = 0.0;
float t = 0.0;
for (int i = 0; i < 99; i++) {
vec3 pos = ro + rd * t;
float dist = map(pos);
dist = max(abs(dist), 0.02); // Prevent division by zero, abs allows passing through surfaces
acc += exp(-dist * 3.0); // Adjustable: decay coefficient controls glow sharpness
t += dist * 0.5; // Adjustable: step scale (<1 means denser sampling)
}
vec3 col = vec3(acc * 0.01, acc * 0.011, acc * 0.012);
```
> This volumetric glow rendering strategy is commonly used in fractal domain repetition shaders.
### 2. Single-Axis / Dual-Axis Selective Repetition
Repeat along only certain axes while keeping others unchanged. Suitable for corridors, columns, and other directional scenes.
**Difference from the basic version**: Does not use `vec3` full-axis repetition; only applies mod to the needed components.
**Key modified code**:
```glsl
// Repeat only along x and z axes, y axis unrepeated
float map(vec3 pos) {
vec3 q = pos;
q.xz = mod(q.xz + 2.0, 4.0) - 2.0; // Only xz repeated
// q.y retains original value, providing finite height
return sdBox(q, vec3(0.3, 0.5, 0.3));
}
```
### 3. Fractal fract Domain Folding (Apollonian Type)
Uses `fract()` instead of `mod()` for iterative folding, combined with scaling and orbit trapping to create fractals.
**Difference from the basic version**: Repeatedly applies fract+scaling in a loop rather than a one-time mod; uses orbit trap coloring.
**Key modified code**:
```glsl
float scale = 1.0;
for (int i = 0; i < 8; i++) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5); // fract fold
float r2 = dot(p, p);
float k = 1.2 / r2; // Adjustable: scaling parameter
p *= k;
scale *= k;
}
return 0.25 * abs(p.y) / scale;
```
### 4. Multi-Level Nested Repetition
Apply angular repetition within a sector, then linear repetition within each sector, or vice versa.
**Difference from the basic version**: Domain repetition operations are nested across multiple levels, each providing a different spatial organization.
**Key modified code**:
```glsl
// Outer level: angular repetition
float indexX = amod(p.xz, segments); // Divide into N sectors
p.x -= radius;
// Inner level: linear repetition
p.y = repeat(p.y, cellSize); // Repeat along y axis
// Random seed for each cell
float salt = rng(vec2(indexX, floor(p.y / cellSize)));
```
> This kind of nesting is commonly used in architectural scene shaders.
### 5. Bounded Domain Repetition (Finite Repetition)
Use `clamp` to limit the mod cell index, achieving a finite number of repetitions.
**Difference from the basic version**: Uses `clamp` to restrict the cell index to `[-N, N]`, repeating only `2N+1` times.
**Key modified code**:
```glsl
// Finite domain repetition: repeat at most N times along each axis
vec3 domainRepeatLimited(vec3 p, float size, vec3 limit) {
return p - size * clamp(floor(p / size + 0.5), -limit, limit);
}
// Usage: repeat 5 times along x, 3 times each along y and z
vec3 q = domainRepeatLimited(p, 2.0, vec3(2.0, 1.0, 1.0));
```
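A scalar Python version of the limited repeat (sketch) shows both regimes: inside the band, coordinates fold into the local cell; beyond the last cell, the offset keeps growing, so the shape is not repeated further.

```python
import math

def clamp(x, lo, hi):
    return max(lo, min(x, hi))

def repeat_limited(p, size, limit):
    # finite repetition: cell index clamped to [-limit, limit]
    return p - size * clamp(math.floor(p / size + 0.5), -limit, limit)

inside  = repeat_limited(1.2, 2.0, 2.0)   # within the band: local cell coords
outside = repeat_limited(10.0, 2.0, 2.0)  # beyond the last cell: offset grows
```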
## Performance Optimization Deep Dive
### Bottleneck 1: High Iteration Count in Fractal Loops
**Problem**: When IFS or fract folding loops iterate too many times, the `map()` function slows down, and `map()` is called at every step during ray marching.
**Optimization**:
- Reduce fractal iteration count (5-8 iterations are usually sufficient)
- Use the `vec4`'s w component to track the scaling factor, avoiding extra scaling variables
- Set upper and lower bounds in `clamp(dot(p,p), min, max)` to prevent numerical blowup
### Bottleneck 2: mod Repetition Causing Inaccurate Distance Fields
**Problem**: The SDF after domain repetition may be inaccurate at cell boundaries (geometry in adjacent cells may be closer), causing ray marching overshoot or extra steps.
**Optimization**:
- Ensure geometry fits entirely within the cell (radius < period/2)
- Use a smaller step factor (`t += d * 0.5` instead of `t += d`)
- For volumetric glow rendering, use `max(abs(d), minDist)` to prevent excessively small step sizes
### Bottleneck 3: Compilation Time from Nested Repetition
**Problem**: Multi-level nested domain repetition and fractal loops can cause very long shader compilation times.
**Optimization**:
- Pre-compute constant expressions in `map()`
- Avoid `normalize()` inside loops (manually divide by length instead)
- Use the loop version for normal computation instead of the unrolled version to reduce compiler inlining
### Bottleneck 4: Sampling Rate for Volumetric Glow Rendering
**Problem**: Volumetric glow rendering requires dense sampling along the ray.
**Optimization**:
- Increase step size with distance: `t += dist * (0.3 + t * 0.02)`
- Reduce sampling density for distant regions; the distance decay `exp(-totdist)` naturally hides precision loss
- Use a `distfading` multiplier to gradually attenuate distant contributions (e.g., `fade *= distfading`)
## Combination Suggestions with Complete Code
### 1. Domain Repetition + Ray Marching
**The most basic and most common combination.** Domain repetition defines the geometric spatial structure; ray marching handles rendering. This is the most fundamental combination in SDF rendering.
### 2. Domain Repetition + Orbit Trap Coloring
Record intermediate values during the fractal iteration loop (e.g., `min(orb, abs(p))`), used to color fractal structures. Avoids the high cost of normal computation + lighting on fractal surfaces.
**Combination approach**:
```glsl
vec4 orb = vec4(1000.0);
for (...) {
p = fold(p);
orb = min(orb, vec4(abs(p), dot(p,p)));
}
// Use orb values for color mapping
vec3 color = mix(vec3(1,0.8,0.2), vec3(1,0.55,0), clamp(orb.y * 6.0, 0.0, 1.0));
```
### 3. Domain Repetition + Toroidal / Polar Coordinate Warping
First use `displaceLoop` to bend space into a toroidal topology, then perform linear and angular repetition in the flattened coordinates. Suitable for creating ring corridors, donut buildings, etc.
**Combination approach**:
```glsl
p.xz = displaceLoop(p.xz, R); // Bend into ring
p.z *= R; // Angle to length
amod(p.xz, N); // Angular repetition
p.y = repeat(p.y, cellSize); // Linear repetition
```
### 4. Domain Repetition + Noise / Random Variants
Generate pseudo-random numbers from cell IDs to inject variation into each repeated cell (size, rotation, color offset), breaking the uniformity.
**Combination approach**:
```glsl
float cellID = pMod1(p.x, size);
float salt = fract(sin(cellID * 127.1) * 43758.5453);
// Use salt to modulate geometric parameters
float boxSize = 0.3 + 0.2 * salt;
```
### 5. Domain Repetition + Polar Coordinate Spiral Transform
Use `cartToPolar` / `polarToCart` coordinate transforms combined with `pMod1` for repetition along spiral paths. Suitable for DNA double helices, springs, threads, etc.
**Combination approach**:
```glsl
p = cartToPolar(p); // Convert to polar coordinates
p.y *= radius; // Unwrap angle to length
// Repeat along spiral line
vec2 closest = closestPointOnRepeatedLine(vec2(lead, radius*TAU), p.xy);
p.xy -= closest; // Local coordinates
```

# Domain Warping — Detailed Reference
This document contains the complete step-by-step tutorial, mathematical derivations, and advanced usage for domain warping techniques. See [SKILL.md](SKILL.md) for the condensed version.
## Prerequisites
- **GLSL Basics**: uniform variables, built-in functions (`mix`, `smoothstep`, `fract`, `floor`, `sin`, `dot`)
- **Vector Math**: dot product, matrix multiplication, 2D rotation matrix
- **Noise Function Concepts**: understanding the basic principle of value noise (lattice interpolation)
- **fBM (Fractal Brownian Motion)**: superposition of multiple noise layers at different frequencies/amplitudes
- **ShaderToy Environment**: meaning of `iTime`, `iResolution`, `fragCoord`
## Implementation Steps in Detail
### Step 1: Hash Function
**What**: Implement a hash function that maps 2D integer coordinates to a pseudo-random float.
**Why**: This is the foundation of noise functions — producing deterministic "random" values at each lattice point. The `sin-dot` trick compresses 2D input to 1D then takes the fractional part, using sin's high-frequency oscillation to produce a chaotic distribution.
**Code**:
```glsl
float hash(vec2 p) {
p = fract(p * 0.6180339887); // Golden ratio pre-perturbation
p *= 25.0;
return fract(p.x * p.y * (p.x + p.y));
}
```
> Note: The classic `fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453)` version can also be used, but the sin-free version above is more stable in precision on some GPUs.
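A direct Python port of the hash (verification only) confirms the two properties that matter: the output stays in [0, 1) and values vary across lattice points. One caveat worth knowing: inputs with a zero component hash to 0, a degeneracy of this particular formula.

```python
import math

def fract(x):
    return x - math.floor(x)

def hash2(px, py):
    # Python port of the sin-free hash above
    px = fract(px * 0.6180339887) * 25.0
    py = fract(py * 0.6180339887) * 25.0
    return fract(px * py * (px + py))

vals = [hash2(float(i), float(j)) for i in range(4) for j in range(4)]
```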
### Step 2: Value Noise
**What**: Implement 2D value noise — take hash values at integer lattice points and interpolate between them with Hermite smoothing.
**Why**: Value noise is the simplest continuous noise, producing smooth, jump-free output suitable as the foundation for fBM. Hermite interpolation `f*f*(3.0-2.0*f)` ensures the derivative is zero at lattice points, avoiding the angular appearance of linear interpolation.
**Code**:
```glsl
float noise(vec2 p) {
vec2 i = floor(p);
vec2 f = fract(p);
f = f * f * (3.0 - 2.0 * f); // Hermite smooth interpolation
return mix(
mix(hash(i + vec2(0.0, 0.0)), hash(i + vec2(1.0, 0.0)), f.x),
mix(hash(i + vec2(0.0, 1.0)), hash(i + vec2(1.0, 1.0)), f.x),
f.y
);
}
```
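Porting the noise to Python (sketch) verifies the interpolation logic: at integer lattice points the noise reproduces the corner hash exactly, and elsewhere the output stays inside [0, 1).

```python
import math

def fract(x):
    return x - math.floor(x)

def hash2(px, py):
    px = fract(px * 0.6180339887) * 25.0
    py = fract(py * 0.6180339887) * 25.0
    return fract(px * py * (px + py))

def mix(a, b, t):
    return a * (1.0 - t) + b * t

def noise(x, y):
    # 2D value noise: hash at the 4 lattice corners + Hermite-smoothed blend
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    fx = fx * fx * (3.0 - 2.0 * fx)
    fy = fy * fy * (3.0 - 2.0 * fy)
    return mix(mix(hash2(ix, iy),     hash2(ix + 1, iy),     fx),
               mix(hash2(ix, iy + 1), hash2(ix + 1, iy + 1), fx),
               fy)

# at an integer lattice point the noise equals the corner hash
lattice_ok = abs(noise(3.0, 5.0) - hash2(3.0, 5.0)) < 1e-12
sample = noise(3.4, 5.7)
```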
### Step 3: fBM (Fractal Brownian Motion)
**What**: Superpose multiple noise layers at different frequencies/amplitudes to create fractal noise with self-similar properties.
**Why**: A single noise layer is too uniform. fBM superimposes multiple "octaves" to simulate nature's fractal structures. Each layer doubles in frequency (lacunarity ~ 2.0), halves in amplitude (persistence = 0.5), and uses a rotation matrix to break lattice alignment.
**Code**:
```glsl
const mat2 mtx = mat2(0.80, 0.60, -0.60, 0.80); // Rotation ~36.87°, for decorrelation
float fbm(vec2 p) {
float f = 0.0;
f += 0.500000 * noise(p); p = mtx * p * 2.02;
f += 0.250000 * noise(p); p = mtx * p * 2.03;
f += 0.125000 * noise(p); p = mtx * p * 2.01;
f += 0.062500 * noise(p); p = mtx * p * 2.04;
f += 0.031250 * noise(p); p = mtx * p * 2.01;
f += 0.015625 * noise(p);
    return f / 0.96875; // Normalize (≈ amplitude sum; the exact six-octave sum is 0.984375)
}
```
> Using lacunarity values of 2.01~2.04 rather than exact 2.0 is to **avoid visual artifacts caused by lattice regularity**. This is a widely adopted trick in classic implementations.
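The normalizers themselves are worth checking with quick arithmetic: the four-octave amplitude sum is exactly 0.9375 (matching `fbm4`'s divisor later in this document), while the six-octave sum is 0.984375 — the widely copied `0.96875` divisor is actually the five-octave sum, a slight overshoot inherited from the classic code.

```python
# amplitude bookkeeping for the fBM normalizers used in this document
amps6 = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]
sum6 = sum(amps6)   # exact six-octave sum: 1 - 2**-6 = 0.984375
sum5 = sum(amps6[:5])  # five-octave sum: the classic 0.96875 divisor
sum4 = sum(amps6[:4])  # four-octave sum: 0.9375, matching fbm4's divisor
```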
### Step 4: Domain Warping (Core)
**What**: Use fBM output as a coordinate offset, recursively nesting to form multi-level warping.
**Why**: This is the core of the entire technique. `fbm(p)` generates a scalar field; adding it to the coordinate `p` is equivalent to "pulling and stretching space according to the noise field's shape." Multi-level nesting makes the deformation more complex and organic — each warping level operates in space already deformed by the previous level.
**Code**:
```glsl
float pattern(vec2 p) {
return fbm(p + fbm(p + fbm(p)));
}
```
This single line is the classic three-level domain warping. It can be decomposed for understanding:
```glsl
float pattern(vec2 p) {
float warp1 = fbm(p); // Level 1: noise in original space
float warp2 = fbm(p + warp1); // Level 2: noise in first-level warped space
float result = fbm(p + warp2); // Level 3: final value in second-level warped space
return result;
}
```
### Step 5: Time Animation
**What**: Inject `iTime` into specific fBM octaves so the warp field evolves over time.
**Why**: Directly offsetting all octaves causes uniform translation, lacking organic feel. The classic approach is to inject time only in the lowest frequency (first layer) and highest frequency (last layer) — low frequency drives overall flow, high frequency adds detail variation.
**Code**:
```glsl
float fbm(vec2 p) {
float f = 0.0;
f += 0.500000 * noise(p + iTime); // Lowest frequency with time: slow overall flow
p = mtx * p * 2.02;
f += 0.250000 * noise(p); p = mtx * p * 2.03;
f += 0.125000 * noise(p); p = mtx * p * 2.01;
f += 0.062500 * noise(p); p = mtx * p * 2.04;
f += 0.031250 * noise(p); p = mtx * p * 2.01;
f += 0.015625 * noise(p + sin(iTime)); // Highest frequency with time: subtle detail motion
return f / 0.96875;
}
```
### Step 6: Coloring
**What**: Map the scalar output of the warp field to colors.
**Why**: Domain warping outputs a scalar field (0~1 range) that needs to be mapped to visually meaningful colors. The classic method uses a `mix` chain — interpolating between multiple preset colors using the warp value.
**Code**:
```glsl
vec3 palette(float t) {
vec3 col = vec3(0.2, 0.1, 0.4); // Deep purple base
col = mix(col, vec3(0.3, 0.05, 0.05), t); // Dark red
col = mix(col, vec3(0.9, 0.9, 0.9), t * t); // White at high values
col = mix(col, vec3(0.0, 0.2, 0.4), smoothstep(0.6, 0.8, t));// Blue highlights
return col * t * 2.0; // Overall brightness modulation
}
```
## Common Variants in Detail
### Variant 1: Multi-Resolution Layered Warping
**Difference from the basic version**: Uses different octave counts for different warping layers — coarse layers use 4 octaves (fast, low frequency), detail layers use 6 octaves (fine, high frequency). Outputs `vec2` for two-dimensional displacement rather than scalar offset. Intermediate variables participate in coloring, producing richer color gradients.
**Key modified code**:
```glsl
// 4-octave fBM (coarse layer)
float fbm4(vec2 p) {
float f = 0.0;
f += 0.5000 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.02;
f += 0.2500 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.03;
f += 0.1250 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.01;
f += 0.0625 * (-1.0 + 2.0 * noise(p));
return f / 0.9375;
}
// 6-octave fBM (fine layer)
float fbm6(vec2 p) {
float f = 0.0;
f += 0.500000 * noise(p); p = mtx * p * 2.02;
f += 0.250000 * noise(p); p = mtx * p * 2.03;
f += 0.125000 * noise(p); p = mtx * p * 2.01;
f += 0.062500 * noise(p); p = mtx * p * 2.04;
f += 0.031250 * noise(p); p = mtx * p * 2.01;
f += 0.015625 * noise(p);
return f / 0.96875;
}
// vec2 output version (independent displacement per axis)
vec2 fbm4_2(vec2 p) {
return vec2(fbm4(p + vec2(1.0)), fbm4(p + vec2(6.2)));
}
vec2 fbm6_2(vec2 p) {
return vec2(fbm6(p + vec2(9.2)), fbm6(p + vec2(5.7)));
}
// Layered warping chain
float func(vec2 q, out vec2 o, out vec2 n) {
q += 0.05 * sin(vec2(0.11, 0.13) * iTime + length(q) * 4.0);
o = 0.5 + 0.5 * fbm4_2(q); // Level 1: coarse displacement
o += 0.02 * sin(vec2(0.13, 0.11) * iTime * length(o));
n = fbm6_2(4.0 * o); // Level 2: fine displacement
vec2 p = q + 2.0 * n + 1.0;
float f = 0.5 + 0.5 * fbm4(2.0 * p); // Level 3: final scalar field
f = mix(f, f * f * f * 3.5, f * abs(n.x)); // Contrast enhancement
return f;
}
// Coloring uses intermediate variables o, n
vec3 col = vec3(0.2, 0.1, 0.4);
col = mix(col, vec3(0.3, 0.05, 0.05), f);
col = mix(col, vec3(0.9, 0.9, 0.9), dot(n, n)); // n magnitude drives white
col = mix(col, vec3(0.5, 0.2, 0.2), 0.5 * o.y * o.y); // o.y drives brown
col = mix(col, vec3(0.0, 0.2, 0.4), 0.5 * smoothstep(1.2, 1.3, abs(n.y) + abs(n.x)));
col *= f * 2.0;
```
### Variant 2: Turbulence / Ridge Warping (Electric Arc / Plasma Effect)
**Difference from the basic version**: Takes the absolute value of noise `abs(noise - 0.5)` inside fBM, producing sharp ridge textures instead of smooth waves. Dual-axis independent fBM displacement (separate x/y offsets) combined with reverse time drift creates turbulence.
**Key modified code**:
```glsl
// Turbulence / ridged fBM
float fbm_ridged(vec2 p) {
float z = 2.0;
float rz = 0.0;
for (float i = 1.0; i < 6.0; i++) {
rz += abs((noise(p) - 0.5) * 2.0) / z; // abs() produces ridge folding
z *= 2.0;
p *= 2.0;
}
return rz;
}
// Dual-axis independent displacement
float dualfbm(vec2 p) {
vec2 p2 = p * 0.7;
// Opposite time drift in two directions creates turbulence
vec2 basis = vec2(
fbm_ridged(p2 - iTime * 0.24), // x axis drifts left
fbm_ridged(p2 + iTime * 0.26) // y axis drifts right
);
basis = (basis - 0.5) * 0.2; // Scale to small displacement
p += basis;
return fbm_ridged(p * makem2(iTime * 0.03)); // Slow overall rotation
}
// Electric arc coloring (division creates high-contrast light/dark)
float rz = dualfbm(p);
vec3 col = vec3(0.2, 0.1, 0.4) / rz;
```
### Variant 3: Domain Warping with Pseudo-3D Lighting
**Difference from the basic version**: Estimates screen-space normals from the warp field using finite differences, then applies directional lighting, giving the 2D warp field a 3D relief appearance. Combined with color inversion and square compression to produce a characteristic dark tone.
**Key modified code**:
```glsl
// Screen-space normal estimation (finite differences)
float e = 2.0 / iResolution.y; // Sample spacing = 1 pixel
vec3 nor = normalize(vec3(
pattern(p + vec2(e, 0.0)) - shade, // df/dx
2.0 * e, // Constant y (controls normal tilt)
pattern(p + vec2(0.0, e)) - shade // df/dy
));
// Dual-component lighting
vec3 lig = normalize(vec3(0.9, 0.2, -0.4));
float dif = clamp(0.3 + 0.7 * dot(nor, lig), 0.0, 1.0);
vec3 lin = vec3(0.70, 0.90, 0.95) * (nor.y * 0.5 + 0.5); // Hemisphere ambient light
lin += vec3(0.15, 0.10, 0.05) * dif; // Warm diffuse
col *= 1.2 * lin;
col = 1.0 - col; // Color inversion
col = 1.1 * col * col; // Square compression, increases dark contrast
```
### Variant 4: Flow Field Iterative Warping (Gas Giant Planet Effect)
**Difference from the basic version**: Instead of directly nesting fBM, computes the fBM gradient field and iteratively advances coordinates via Euler integration. Simulates fluid advection, producing vortex-like planetary atmospheric banding.
**Key modified code**:
```glsl
#define ADVECT_ITERATIONS 5 // Adjustable: iteration count, more = more pronounced vortices
// Compute fBM gradient (finite differences)
vec2 field(vec2 p) {
float t = 0.2 * iTime;
p.x += t;
float n = fbm(p, t);
float e = 0.25;
float nx = fbm(p + vec2(e, 0.0), t);
float ny = fbm(p + vec2(0.0, e), t);
return vec2(n - ny, nx - n) / e; // 90° rotated gradient = streamline direction
}
// Iterative flow field advection
vec3 distort(vec2 p) {
for (float i = 0.0; i < float(ADVECT_ITERATIONS); i++) {
p += field(p) / float(ADVECT_ITERATIONS);
}
return vec3(fbm(p, 0.0)); // Sample at the advected coordinates
}
```
### Variant 5: 3D Volumetric Domain Warping (Explosion / Fireball Effect)
**Difference from the basic version**: Extends domain warping from 2D to 3D, using 3D fBM to displace a sphere's distance field, then rendering via sphere tracing or volumetric ray marching. Produces volcanic eruptions, solar surface, and other volumetric effects.
**Key modified code**:
```glsl
#define NOISE_FREQ 4.0 // Adjustable: noise frequency
#define NOISE_AMP -0.5 // Adjustable: displacement amplitude (negative = inward bulging feel)
// 3D rotation matrix (for decorrelation)
mat3 m3 = mat3(0.00, 0.80, 0.60,
-0.80, 0.36,-0.48,
-0.60,-0.48, 0.64);
// 3D value noise
float noise3D(vec3 p) {
vec3 fl = floor(p);
vec3 fr = fract(p);
fr = fr * fr * (3.0 - 2.0 * fr);
float n = fl.x + fl.y * 157.0 + 113.0 * fl.z;
return mix(mix(mix(hash(n+0.0), hash(n+1.0), fr.x),
mix(hash(n+157.0), hash(n+158.0), fr.x), fr.y),
mix(mix(hash(n+113.0), hash(n+114.0), fr.x),
mix(hash(n+270.0), hash(n+271.0), fr.x), fr.y), fr.z);
}
// 3D fBM
float fbm3D(vec3 p) {
float f = 0.0;
f += 0.5000 * noise3D(p); p = m3 * p * 2.02;
f += 0.2500 * noise3D(p); p = m3 * p * 2.03;
f += 0.1250 * noise3D(p); p = m3 * p * 2.01;
f += 0.0625 * noise3D(p); p = m3 * p * 2.02;
f += 0.03125 * abs(noise3D(p)); // Last layer uses abs for added detail
return f / 0.9375;
}
// Sphere distance field + domain warping displacement
float distanceFunc(vec3 p, out float displace) {
float d = length(p) - 0.5; // Sphere SDF
displace = fbm3D(p * NOISE_FREQ + vec3(0, -1, 0) * iTime);
d += displace * NOISE_AMP; // fBM displaces the surface
return d;
}
```
## Performance Optimization Deep Dive
### Bottleneck Analysis
The main performance bottleneck of domain warping is **repeated noise sampling**. Three warping levels times 6 octaves = 18 noise samples per pixel, plus finite differences for lighting (2 additional full warping computations), totaling up to **54 noise samples/pixel**.
### Optimization Techniques
1. **Reduce octave count**: Using 4 octaves instead of 6 shows little visual difference but improves performance by ~33%
```glsl
// Coarse layers use 4 octaves; keep 6 octaves only for the fine detail layer
o = 0.5 + 0.5 * fbm4_2(q);  // level 1: fbm4 is sufficient for coarse displacement
n = fbm6_2(4.0 * o);        // level 2: fine detail, keep fbm6
```
2. **Reduce warping depth**: Two-level warping `fbm(p + fbm(p))` already produces organic results, saving ~33% performance over three levels
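   A minimal two-level sketch (the vec2 offsets here are arbitrary decorrelation constants, not values taken from the original three-level chain):
   ```glsl
   // Two-level warping: one full fbm evaluation fewer per pixel than the three-level chain
   float pattern2(vec2 p) {
       vec2 q = vec2(fbm4(p + vec2(0.0, 0.0)),
                     fbm4(p + vec2(5.2, 1.3)));  // level 1: coarse displacement
       return fbm4(p + 2.0 * q);                 // level 2: final scalar field
   }
   ```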
3. **Use sin-product noise instead of value noise**: `sin(p.x)*sin(p.y)` is completely branch-free with no memory access, suitable for mobile
```glsl
float noise(vec2 p) {
return sin(p.x) * sin(p.y); // Minimal version, no hash needed
}
```
4. **GPU built-in derivatives instead of finite differences**: Saves 2 extra full warping computations
```glsl
// Use dFdx/dFdy instead of manual finite differences (slightly lower quality but 3x faster)
vec3 nor = normalize(vec3(dFdx(shade) * iResolution.x, 6.0, dFdy(shade) * iResolution.y));
```
5. **Texture noise**: Pre-bake noise textures and use `texture()` instead of procedural noise, converting computation to memory reads
```glsl
float noise(vec2 x) {
return texture(iChannel0, x * 0.01).x;
}
```
6. **LOD adaptation**: Reduce octave count for distant pixels
```glsl
int octaves = int(mix(float(NUM_OCTAVES), 2.0, clamp(length(uv) / 5.0, 0.0, 1.0)));
```
7. **Supersampling strategy**: Only use 2x2 supersampling when anti-aliasing is needed (4x performance cost)
```glsl
#if HW_PERFORMANCE == 0
#define AA 1
#else
#define AA 2
#endif
```
## Combination Suggestions with Complete Code Examples
### Combining with Ray Marching
The scalar field generated by domain warping can serve directly as an SDF displacement function, deforming smooth geometry into organic forms. Used for flames, explosions, alien creatures, etc.
```glsl
float sdf(vec3 p) {
return length(p) - 1.0 + fbm3D(p * 4.0) * 0.3;
}
```
### Combining with Polar Coordinate Transform
Perform domain warping in polar coordinate space to produce vortices, nebulae, spirals, and other effects.
```glsl
vec2 polar = vec2(length(uv), atan(uv.y, uv.x));
float shade = pattern(polar);
```
### Combining with Cosine Color Palette
The cosine palette `a + b*cos(2*pi*(c*t+d))` is more flexible than a fixed mix chain. By adjusting four vec3 parameters, you can quickly switch color schemes.
```glsl
vec3 palette(float t) {
vec3 a = vec3(0.5); vec3 b = vec3(0.5);
vec3 c = vec3(1.0); vec3 d = vec3(0.0, 0.33, 0.67);
return a + b * cos(6.28318 * (c * t + d));
}
```
### Combining with Post-Processing Effects
- **Bloom/Glow**: Blur and overlay high-brightness areas to enhance glow effects
- **Tone Mapping**: `col = col / (1.0 + col)` to compress HDR range
- **Chromatic Aberration**: Sample the warp field at offset positions for R/G/B channels separately
```glsl
float r = pattern(uv + vec2(0.003, 0.0));
float g = pattern(uv);
float b = pattern(uv - vec2(0.003, 0.0));
```
### Combining with Particle Systems / Geometry
The domain warping scalar field can drive particle velocity fields, mesh vertex displacement, or UV animation deformation — not limited to pure fragment shader usage.
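As a sketch of the mesh-displacement case, here is a vertex shader reusing the same `fbm4` (the attribute/uniform names `aPosition`, `aNormal`, `uMVP`, `uTime` are assumptions — adapt them to your engine's conventions):
```glsl
// Vertex shader sketch: push vertices along their normals by the warp field
attribute vec3 aPosition;
attribute vec3 aNormal;
uniform mat4 uMVP;
uniform float uTime;

void main() {
    // Sample the 2D warp field using two coordinates of the vertex position
    float h = fbm4(aPosition.xy * 3.0 + uTime * 0.1);
    vec3 displaced = aPosition + aNormal * 0.15 * h;  // 0.15 = displacement amplitude
    gl_Position = uMVP * vec4(displaced, 1.0);
}
```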
# Fluid Simulation — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing prerequisite knowledge, step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
### GLSL Basics
- `texture`/`texelFetch` sampling, `iChannel0` buffer feedback, multi-pass rendering
- ShaderToy multi-buffer architecture: data flow between Buffer A/B/C/D
### Vector Calculus Basics
- Gradient: the spatial rate of change of a scalar field, pointing in the direction of greatest increase
- Divergence: the "source/sink" strength of a vector field
- Curl: the local rotational strength of a vector field
- Laplacian: the second derivative of a scalar field, measuring deviation from the neighborhood mean
### Data Encoding Paradigm
Understanding the paradigm of "encoding physical quantities into texture RGBA channels":
- `.xy` = velocity
- `.z` = pressure / density
- `.w` = passive scalar, e.g., ink concentration
## Implementation Steps in Detail
### Step 1: Data Encoding and Buffer Layout
**What**: Encode fluid physical quantities into the RGBA channels of a texture.
**Why**: GPU textures serve as the storage medium for fluid state. Each pixel is a grid cell, with channels storing different physical quantities, enabling full fluid state persistence.
**Code**:
```glsl
// Data layout convention:
// .xy = velocity field
// .z = pressure / density
// .w = passive scalar, e.g., ink concentration
// Sampling macro — simplify neighborhood access
#define T(p) texture(iChannel0, (p) / iResolution.xy)
// Get current pixel and its four neighbors
vec4 c = T(p); // center
vec4 n = T(p + vec2(0, 1)); // north
vec4 e = T(p + vec2(1, 0)); // east
vec4 s = T(p - vec2(0, 1)); // south
vec4 w = T(p - vec2(1, 0)); // west
```
### Step 2: Discrete Differential Operators
**What**: Compute gradient, Laplacian, divergence, and curl over a 3x3 pixel neighborhood.
**Why**: These operators are the foundation for discretizing the Navier-Stokes equations. A 3x3 stencil is more isotropic than a simple cross stencil, reducing grid-direction artifacts.
**Code**:
```glsl
// ===== Laplacian =====
// Weighted 3x3 stencil: center weight _K0, edge weight _K1, corner weight _K2
const float _K0 = -20.0 / 6.0; // adjustable: center weight
const float _K1 = 4.0 / 6.0; // adjustable: edge weight
const float _K2 = 1.0 / 6.0; // adjustable: corner weight
vec4 laplacian = _K0 * c
+ _K1 * (n + e + s + w)
+ _K2 * (T(p+vec2(1,1)) + T(p+vec2(-1,1)) + T(p+vec2(1,-1)) + T(p+vec2(-1,-1)));
// ===== Gradient =====
// Central difference with diagonal correction
vec4 dx = (e - w) / 2.0;
vec4 dy = (n - s) / 2.0;
// ===== Divergence =====
float div = dx.x + dy.y; // ∂vx/∂x + ∂vy/∂y
// ===== Curl / Vorticity =====
float curl = dx.y - dy.x; // ∂vy/∂x - ∂vx/∂y
```
### Step 3: Initial Frame and Noise
**What**: Initialize the fluid state and inject a small amount of noise to avoid symmetry lock.
**Why**: If the initial state is entirely zero (zero velocity), the fluid equations will maintain this symmetric state and never move. Adding a small amount of random noise breaks the symmetry, allowing turbulence to develop naturally.
**Code**:
```glsl
if (iFrame < 10) {
vec2 uv = p / iResolution.xy;
// Position-based pseudo-random noise
float noise = fract(sin(dot(uv, vec2(12.9898, 78.233))) * 43758.5453);
// velocity.xy = small noise, pressure.z = 1.0, ink.w = small amount
fragColor = vec4(noise * 1e-4, noise * 1e-4, 1.0, noise * 0.1);
return;
}
```
### Step 4: Semi-Lagrangian Advection
**What**: Trace backward along the velocity field and sample from the upstream position to update the current pixel.
**Why**: This is the standard method for handling the `-(v·∇)v` advection term. Direct forward advection on an Eulerian grid leads to instability, while the semi-Lagrangian method is unconditionally stable — it won't blow up regardless of time step size.
**Code**:
```glsl
#define DT 0.15 // adjustable: time step, larger = faster fluid motion but may reduce accuracy
// Core: backward tracing — find the "upstream" position by tracing backward along velocity
// Then sample from the upstream position, effectively "transporting" the upstream state here
vec4 advected = T(p - DT * c.xy);
// Only advect velocity and passive scalar (ink), preserve local pressure
c.xyw = advected.xyw;
```
### Step 5: Viscous Diffusion
**What**: Apply Laplacian diffusion to the velocity field to simulate viscosity.
**Why**: Corresponds to the `ν∇²v` term. Viscosity smooths the velocity field, dissipating small-scale vortices. The parameter `ν` controls whether the fluid behaves like "water" (low viscosity) or "honey" (high viscosity).
**Code**:
```glsl
#define NU 0.5 // adjustable: kinematic viscosity coefficient. 0.01=water, 1.0=syrup
#define KAPPA 0.1 // adjustable: passive scalar (ink) diffusion coefficient
c.xy += DT * NU * laplacian.xy; // velocity diffusion
c.w += DT * KAPPA * laplacian.w; // ink diffusion
```
### Step 6: Pressure Projection
**What**: Compute the gradient of the pressure field and subtract it from the velocity field to enforce the incompressibility constraint.
**Why**: This is the core of Helmholtz-Hodge decomposition — decomposing the velocity field into a divergence-free part (what we want) and a curl-free part. By projecting out the divergence component via `v = v - K·∇p`, we ensure `∇·v ≈ 0`. In ShaderToy, the per-frame buffer feedback itself constitutes an implicit Jacobi iteration.
**Code**:
```glsl
#define K 0.2 // adjustable: pressure correction strength. Too large causes oscillation, too small yields poor incompressibility
// Pressure is stored in the .z channel
// Use pressure gradient to correct velocity, eliminating divergence
c.xy -= K * vec2(dx.z, dy.z);
// Mass conservation: update density/pressure based on divergence (Euler method)
c.z -= DT * (dx.z * c.x + dy.z * c.y + div * c.z);
```
### Step 7: External Forces and Mouse Interaction
**What**: Inject velocity and ink into the fluid based on mouse input.
**Why**: The external force term `f` is the entry point for user interaction. The typical approach is to apply a Gaussian-decaying velocity impulse and ink injection near the mouse position.
**Code**:
```glsl
// Mouse interaction — drag to inject velocity and ink
if (iMouse.z > 0.0) {
vec2 mousePos = iMouse.xy;
vec2 mouseDelta = iMouse.xy - iMouse.zw; // drag direction
float dist = length(p - mousePos);
float influence = exp(-dist * dist / 50.0); // adjustable: 50.0 controls influence radius
c.xy += DT * influence * mouseDelta; // inject velocity
c.w += DT * influence; // inject ink
}
```
### Step 8: Boundary Conditions and Numerical Stability
**What**: Handle boundary pixels, clamp numerical ranges, and apply dissipation.
**Why**: Without boundary conditions, the fluid "leaks" off-screen; without dissipation, fluid energy accumulates indefinitely, causing numerical explosion.
**Code**:
```glsl
// Boundary condition: zero velocity at edge pixels (no-slip)
if (p.x < 1.0 || p.y < 1.0 ||
iResolution.x - p.x < 1.0 || iResolution.y - p.y < 1.0) {
c.xyw *= 0.0;
}
// IMPORTANT: Ink decay: must use multiplicative decay; subtractive decay causes saturation in high-concentration areas and overly fast decay in low-concentration areas
c.w *= 0.995; // 0.5% decay per frame, adjustable [0.99=fast dissipation, 0.999=persistent]
// Numerical clamping (prevent explosion)
c = clamp(c, vec4(-5, -5, 0.5, 0), vec4(5, 5, 3, 5));
```
### Step 9: Visualization Rendering (Image Pass)
**What**: Map physical quantities from the buffer to visible colors.
**Why**: Raw physical data (velocity, pressure) needs artistic color mapping to produce visual effects. Common techniques include: mapping velocity direction to hue, pressure to brightness, and overlaying ink concentration.
**Code**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec4 c = texture(iChannel0, uv);
// IMPORTANT: Color base must be bright enough! 0.5+0.5*cos produces bright colors in [0,1] range
// Never use extremely dark base colors like vec3(0.02, 0.01, 0.08) — multiplied by ink, they become invisible
vec3 col = 0.5 + 0.5 * cos(atan(c.y, c.x) + vec3(0.0, 2.1, 4.2));
// IMPORTANT: Use smoothstep instead of linear division to preserve gradient variation
float ink = smoothstep(0.0, 2.0, c.w);
col *= ink;
// IMPORTANT: Background color must be visible to the eye (RGB at least > 5/255 ≈ 0.02), otherwise users think the page is all black
col = max(col, vec3(0.02, 0.012, 0.035));
fragColor = vec4(col, 1.0);
}
```
## Variant Details
### Variant 1: Rotational Self-Advection
**Difference from base version**: Instead of pressure projection, uses multi-scale rotational sampling to achieve natural divergence-free advection. Simpler computation, suitable for purely decorative fluid effects.
**Core idea**: Compute local rotation (curl) at different scales, then use rotationally offset sampling positions for advection.
**Key code**:
```glsl
#define RotNum 3 // adjustable: rotational sample count [3-7], more = more precise
#define angRnd 1.0 // adjustable: rotational randomness [0-1]
const float ang = 2.0 * 3.14159 / float(RotNum);
mat2 m = mat2(cos(ang), sin(ang), -sin(ang), cos(ang));
// Compute rotation amount at a given scale
float getRot(vec2 uv, float sc) {
float ang2 = angRnd * randS(uv).x * ang;
vec2 p = vec2(cos(ang2), sin(ang2));
float rot = 0.0;
for (int i = 0; i < RotNum; i++) {
vec2 p2 = p * sc;
vec2 v = texture(iChannel0, fract(uv + p2)).xy - vec2(0.5);
rot += cross(vec3(v, 0.0), vec3(p2, 0.0)).z / dot(p2, p2);
p = m * p;
}
return rot / float(RotNum);
}
// Main loop: multi-scale advection accumulation
vec2 v = vec2(0);
float sc = 1.0 / max(iResolution.x, iResolution.y);
for (int level = 0; level < 20; level++) {
    if (sc > 0.7) break;
    float ang2 = angRnd * randS(uv).x * ang;  // per-pixel random phase (same as in getRot)
    vec2 p = vec2(cos(ang2), sin(ang2));
for (int i = 0; i < RotNum; i++) {
vec2 p2 = p * sc;
float rot = getRot(uv + p2, sc);
v += p2.yx * rot * vec2(-1, 1);
p = m * p;
}
sc *= 2.0; // next scale
}
fragColor = texture(iChannel0, fract(uv + v * 3.0 / iResolution.x));
```
### Variant 2: Vorticity Confinement
**Difference from base version**: Adds vorticity confinement force on top of the base solver to prevent small vortices from dissipating too quickly due to numerical diffusion. Suitable for smoke, fire, and other scenes that need rich detail.
**Core idea**: Compute the gradient direction of the vorticity field (the direction where vorticity concentrates), then apply a restoring force along that direction.
**Key code**:
```glsl
#define VORT_STRENGTH 0.01 // adjustable: vorticity confinement strength [0.001 - 0.1]
// Compute gradient of vorticity magnitude (points toward increasing vorticity)
float curl_c = curl_at(uv); // current vorticity
float curl_n = abs(curl_at(uv + vec2(0, texel.y)));
float curl_s = abs(curl_at(uv - vec2(0, texel.y)));
float curl_e = abs(curl_at(uv + vec2(texel.x, 0)));
float curl_w = abs(curl_at(uv - vec2(texel.x, 0)));
vec2 eta = normalize(vec2(curl_e - curl_w, curl_n - curl_s) + 1e-5);
// Vorticity confinement force = ε * (η × ω)
vec2 conf = VORT_STRENGTH * vec2(eta.y, -eta.x) * curl_c;
c.xy += DT * conf;
```
### Variant 3: Viscous Fingering / Reaction-Diffusion Style
**Difference from base version**: No advection; instead uses rotation-driven self-amplification and Laplacian diffusion to produce organic patterns resembling reaction-diffusion. Suitable for abstract art generation.
**Core idea**: Compute a rotation angle from curl, apply 2D rotation to velocity components, and combine with Laplacian diffusion and divergence feedback.
**Key code**:
```glsl
const float cs = 0.25; // adjustable: curl → rotation angle scaling
const float ls = 0.24; // adjustable: Laplacian diffusion strength
const float ps = -0.06; // adjustable: divergence-pressure feedback strength
const float amp = 1.0; // adjustable: self-amplification coefficient (>1 enhances patterns)
const float pwr = 0.2; // adjustable: curl exponent (controls rotation sensitivity)
// Compute rotation angle from curl
float sc = cs * sign(curl) * pow(abs(curl), pwr);
// Temporary velocity (with diffusion and divergence feedback;
// sp and sd are pressure- and divergence-derived feedback terms built from ps and div, omitted in this excerpt)
float ta = amp * uv.x + ls * lapl.x + norm.x * sp + uv.x * sd;
float tb = amp * uv.y + ls * lapl.y + norm.y * sp + uv.y * sd;
// Rotate velocity components
float a = ta * cos(sc) - tb * sin(sc);
float b = ta * sin(sc) + tb * cos(sc);
fragColor = clamp(vec4(a, b, div, 1), -1.0, 1.0);
```
### Variant 4: Gaussian Kernel SPH Particle Fluid
**Difference from base version**: Completely abandons grid advection, instead using Gaussian kernel functions to estimate density and velocity at each grid point. Minimal (about 20 lines of core code), suitable for rapid prototyping and teaching.
**Core idea**: For all pixels in the neighborhood, perform mass-weighted velocity blending using Gaussian weights based on velocity + displacement. This is essentially a grid-based approximation of SPH.
**Key code**:
```glsl
#define RADIUS 7 // adjustable: search radius [3-10], larger = slower but smoother
vec4 r = vec4(0);
for (int x = -RADIUS + 1; x < RADIUS; x++)
for (int y = -RADIUS + 1; y < RADIUS; y++) {
    vec2 i = vec2(x, y);
    vec4 cell = texelFetch(iChannel0, ivec2(i + fragCoord), 0); // single fetch per neighbor
    vec2 v = cell.xy;     // neighbor velocity
    float mass = cell.z;  // neighbor mass
    float w = exp(-dot(v + i, v + i)) / 3.14159; // Gaussian kernel weight (1/pi normalization)
    r += mass * w * vec4(mix(v + v + i, v, mass), 1, 1);
}
r.xy /= r.z + 1e-6; // mass-weighted average velocity
### Variant 5: Lagrangian Vortex Particle Method
**Difference from base version**: Instead of solving on a grid, tracks discrete vortex particles with their positions and vorticities. Uses the Biot-Savart law to compute the velocity field directly from the vorticity distribution. Suitable for precise simulation of a small number of vortices.
**Core idea**: Each particle carries a position and vorticity. Induced velocity is computed through N-body summation. Uses Heun (semi-implicit) time integration for improved accuracy.
**Key code**:
```glsl
#define N 20 // adjustable: N×N particles
#define STRENGTH (1e3 * 0.25) // adjustable: vorticity strength scaling (parenthesized so the macro expands safely inside expressions)
// Biot-Savart velocity computation (similar to 2D vortex 1/r decay)
vec2 F = vec2(0);
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++) {
float w = vorticity(i, j); // particle vorticity
vec2 d = particle_pos(i, j) - my_pos;
float l = dot(d, d);
if (l > 1e-5)
F += vec2(-d.y, d.x) * w / l; // Biot-Savart: v = ω × r / |r|²
}
velocity = STRENGTH * F;
position += velocity * dt;
```
## Performance Optimization Details
### Bottleneck 1: Neighborhood Sample Count
- The basic 5-point stencil (cross) is fastest but has poor isotropy
- A 3x3 stencil (9 samples) is the best balance between accuracy and performance
- The `N×N` search radius in the SPH variant is extremely expensive; anything above 7 becomes slow
- **Optimization**: Use `texelFetch` instead of `texture` (skips filtering), or use `textureLod` to lock the mip level
### Bottleneck 2: Multi-Pass Overhead
- Classic solvers need 2-4 buffer passes (velocity, pressure, vorticity, visualization)
- **Optimization**: Merge multiple steps into a single pass. Pressure projection can leverage inter-frame feedback as implicit Jacobi iteration, eliminating the need for dedicated iteration passes
- For decorative effects that don't require strict incompressibility, rotational self-advection (Variant 1) can completely eliminate pressure projection
### Bottleneck 3: Advection Accuracy vs. Performance
- Single-step advection loses detail in high-velocity regions
- **Optimization**: Multi-step advection (`ADVECTION_STEPS = 3`) uses 3 small steps instead of 1 large step, at the cost of 3x the sampling
- Compromise: pre-compute offsets then uniformly subdivide sampling (avoid recalculating offsets at each step)
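A sketch of the multi-step variant, subdividing the backward trace into equal Euler sub-steps (reusing the `T` macro and `DT` from the step-by-step tutorial above):
```glsl
#define ADVECTION_STEPS 3 // adjustable: more sub-steps = less detail loss at high velocity
vec2 pos = p;
for (int i = 0; i < ADVECTION_STEPS; i++) {
    // Re-sample velocity at each intermediate position along the backward trace
    pos -= (DT / float(ADVECTION_STEPS)) * T(pos).xy;
}
vec4 advected = T(pos);
c.xyw = advected.xyw;
```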
### Bottleneck 4: Mipmap as Alternative to Multi-Scale Traversal
- Multi-scale fluid requires computation at different spatial scales. The brute-force approach is multiple large-radius samples
- **Optimization**: Leverage GPU-generated mipmaps for O(1) multi-scale reads, using `textureLod(channel, uv, mip)` to directly read at different scales
### General Tips
- Add tiny noise on the initial frame (`1e-6 * noise`) to avoid symmetry lock caused by numerical precision issues
- Use `fract(uv + offset)` for periodic boundaries (torus topology), eliminating boundary check branches
- Multiply the pressure field by a near-1 decay factor (e.g., `0.9999`) to prevent pressure drift
## Combination Suggestions
### 1. Fluid + Normal Map Lighting
Treat the fluid velocity/density field as a height map, compute normals, and apply Phong/GGX lighting to produce a liquid metal visual effect.
```glsl
// Compute normals from the density field
vec2 dxy = vec2(
texture(buf, uv + vec2(tx, 0)).z - texture(buf, uv - vec2(tx, 0)).z,
texture(buf, uv + vec2(0, ty)).z - texture(buf, uv - vec2(0, ty)).z
);
vec3 normal = normalize(vec3(-BUMP * dxy, 1.0));
// Then plug into Phong/GGX lighting calculation
```
### 2. Fluid + Particle Tracing
Scatter passive particles in the fluid velocity field, updating particle positions each frame according to the flow velocity. Suitable for visualizing streamlines and creating ink diffusion effects.
```glsl
// Particle position update (in a separate buffer)
vec2 pos = texture(particleBuf, id).xy;
vec2 vel = texture(fluidBuf, pos / iResolution.xy).xy;
pos += vel * dt;
pos = mod(pos, iResolution.xy); // periodic boundary
```
### 3. Fluid + Color Advection
Store RGB colors in extra channels or buffers and perform semi-Lagrangian advection synchronized with the velocity field, producing colorful ink mixing effects.
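A minimal sketch, assuming `iChannel0` stores the velocity field as laid out in Step 1 and a second buffer `iChannel1` stores the RGB ink:
```glsl
// Color advection: backward-trace along velocity, then sample the color buffer upstream
vec2 uv  = fragCoord / iResolution.xy;
vec2 vel = texture(iChannel0, uv).xy;
vec3 col = texture(iChannel1, uv - DT * vel / iResolution.xy).rgb;
col *= 0.998; // slight multiplicative decay, same reasoning as the ink channel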
### 4. Fluid + Audio Response
Map audio spectrum low-frequency energy to force intensity and high frequencies to vorticity injection, creating music-driven fluid visualization.
```glsl
float bass = texture(iChannel1, vec2(0.05, 0.0)).x; // low frequency
float treble = texture(iChannel1, vec2(0.8, 0.0)).x; // high frequency
// Low frequency → thrust, high frequency → vortex disturbance
c.xy += bass * radialForce + treble * randomVortex;
```
### 5. Fluid + 3D Volume Rendering
Extend 2D fluid to 3D (using 2D texture slice packing to store 3D voxels) and render semi-transparent volumes via ray marching. Suitable for clouds and explosion effects.
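A sketch of the slice-atlas read, assuming `SLICES` Z-slices are tiled horizontally in one 2D buffer; linearly interpolating between adjacent slices stands in for the missing hardware Z filtering:
```glsl
#define SLICES 32.0 // adjustable: Z resolution of the packed volume
vec4 sampleVolume(sampler2D atlas, vec3 p) { // p in [0,1]^3
    float z  = p.z * (SLICES - 1.0);
    float z0 = floor(z);
    // Each slice occupies a 1/SLICES-wide column of the atlas
    vec2 uv0 = vec2((p.x + z0)       / SLICES, p.y);
    vec2 uv1 = vec2((p.x + z0 + 1.0) / SLICES, p.y);
    return mix(texture(atlas, uv0), texture(atlas, uv1), fract(z));
}
```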
# Fractal Rendering — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing prerequisites, step-by-step explanations, mathematical derivations, variant descriptions, in-depth performance analysis, and complete combination example code.
## Prerequisites
- **GLSL Basics**: uniform, varying, built-in functions (`dot`, `length`, `normalize`, `abs`, `fract`)
- **Complex Number Arithmetic**: representing complex numbers as `vec2`, multiplication `(a+bi)(c+di) = (ac-bd, ad+bc)`
- **Vector Math**: dot product, cross product, matrix transforms
- **Ray Marching Basics** (required for 3D fractals): stepping along a ray, using distance fields for collision detection
- **Coordinate Normalization**: mapping pixel coordinates to the `[-1, 1]` range
## Core Principles in Detail
The essence of fractal rendering is **visualization of iterative systems**. Core algorithm patterns fall into three categories:
### 1. Escape-Time Algorithm
For each point `c` on the complex plane, repeatedly iterate `Z <- Z^2 + c`, counting the number of steps needed for Z to escape (`|Z| > R`). More steps means closer to the fractal boundary.
**Distance Estimation** computes the precise distance from a point to the fractal by simultaneously tracking the derivative `Z'`:
```
Z <- Z^2 + c (value iteration)
Z' <- 2*Z*Z' + 1 (derivative iteration)
d(c) = |Z|*log|Z| / |Z'| (Hubbard-Douady potential function)
```
Distance estimation produces smoother coloring than pure escape-time step counting, and is a prerequisite for ray marching in 3D fractals.
### 2. Iterated Function System (IFS)
Apply a set of transforms (folding `abs()`, scaling `Scale`, offset `Offset`) to points in space, iterating repeatedly to produce self-similar structures. Core steps of KIFS (Kaleidoscopic IFS) commonly used in 3D:
```
p = abs(p) // Fold (symmetrize)
sort p.xyz descending // Sort (select symmetry axis)
p = Scale * p - Offset * (Scale-1) // Scale and offset
```
### 3. Spherical Inversion Fractal
Apollonian-type fractals use `fract()` for space folding + spherical inversion `p *= s/dot(p,p)`:
```
p = -1.0 + 2.0 * fract(0.5*p + 0.5) // Fold space to [-1,1]
r^2 = dot(p, p)
k = s / r^2 // Inversion factor
p *= k; scale *= k // Spherical inversion
```
All 3D fractals are rendered using **Sphere Tracing (Ray Marching)**: stepping along the view ray by the distance field value at each step, until close enough to the surface.
## Implementation Steps in Detail
### Step 1: Coordinate Normalization
**What**: Map pixel coordinates to standard coordinates centered on the screen with aspect ratio correction.
**Why**: All fractal calculations must be performed in mathematical space, independent of pixel resolution.
```glsl
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// p now has y range [-1,1], x scaled by aspect ratio
```
### Step 2: 2D Fractal — Mandelbrot Escape-Time Iteration
**What**: For each pixel point as complex number `c`, iterate `Z <- Z^2 + c` while tracking the derivative.
**Why**: Escape time produces fractal structure; derivative tracking enables distance estimation coloring.
```glsl
float distanceToMandelbrot(in vec2 c) {
vec2 z = vec2(0.0);
vec2 dz = vec2(0.0); // Derivative
float m2 = 0.0;
for (int i = 0; i < MAX_ITER; i++) {
if (m2 > BAILOUT * BAILOUT) break;
// Z' -> 2*Z*Z' + 1 (complex derivative chain rule)
dz = 2.0 * vec2(z.x*dz.x - z.y*dz.y,
z.x*dz.y + z.y*dz.x) + vec2(1.0, 0.0);
// Z -> Z^2 + c (complex squaring)
z = vec2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c;
m2 = dot(z, z);
}
// Distance estimation: d(c) = |Z|*log|Z| / |Z'|
return 0.5 * sqrt(dot(z,z) / dot(dz,dz)) * log(dot(z,z));
}
```
### Step 3: 3D Fractal — Distance Field Function (Mandelbulb Example)
**What**: Implement the Mandelbulb power-N iteration using spherical coordinates, returning a distance estimate.
**Why**: 3D fractals cannot be directly colored via escape-time on pixels; they require distance fields for ray marching.
```glsl
float mandelbulb(vec3 p) {
vec3 z = p;
float dr = 1.0; // Derivative (distance scaling factor)
float r;
for (int i = 0; i < FRACTAL_ITER; i++) {
r = length(z);
if (r > BAILOUT) break;
// Convert to spherical coordinates
float theta = atan(z.y, z.x);
float phi = asin(z.z / r);
// Derivative: dr -> power * r^(power-1) * dr + 1
dr = pow(r, POWER - 1.0) * dr * POWER + 1.0;
// z -> z^power + p (spherical coordinate exponentiation)
r = pow(r, POWER);
theta *= POWER;
phi *= POWER;
z = r * vec3(cos(theta)*cos(phi),
sin(theta)*cos(phi),
sin(phi)) + p;
}
// Distance estimation
return 0.5 * log(r) * r / dr;
}
```
### Step 4: 3D Fractal — IFS Distance Field (Menger Sponge Example)
**What**: Construct a KIFS fractal distance field through fold-sort-scale-offset iteration.
**Why**: IFS fractals produce self-similar structures through spatial transforms rather than numerical iteration; distance is tracked via `Scale^(-n)` scaling.
```glsl
float mengerDE(vec3 z) {
z = abs(1.0 - mod(z, 2.0)); // Infinite tiling
float d = 1000.0;
for (int n = 0; n < IFS_ITER; n++) {
z = abs(z); // Fold
if (z.x < z.y) z.xy = z.yx; // Sort
if (z.x < z.z) z.xz = z.zx;
if (z.y < z.z) z.yz = z.zy;
z = SCALE * z - OFFSET * (SCALE - 1.0); // Scale + offset
if (z.z < -0.5 * OFFSET.z * (SCALE - 1.0))
z.z += OFFSET.z * (SCALE - 1.0);
d = min(d, length(z) * pow(SCALE, float(-n) - 1.0));
}
return d - 0.001;
}
```
### Step 5: 3D Fractal — Spherical Inversion Distance Field (Apollonian Type)
**What**: Construct an Apollonian fractal using fract folding + spherical inversion iteration, while recording orbit traps.
**Why**: Spherical inversion `p *= s/dot(p,p)` produces sphere packing structures; orbit traps provide color and AO information.
```glsl
vec4 orb; // Global orbit trap
float apollonianDE(vec3 p, float s) {
float scale = 1.0;
orb = vec4(1000.0);
for (int i = 0; i < INVERSION_ITER; i++) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5); // Fold space to [-1,1]
float r2 = dot(p, p);
orb = min(orb, vec4(abs(p), r2)); // Record orbit trap
float k = s / r2; // Inversion factor
p *= k;
scale *= k;
}
return 0.25 * abs(p.y) / scale;
}
```
### Step 6: Ray Marching (Sphere Tracing)
**What**: Step along the ray direction, advancing by the distance field value at each step, until hitting the surface.
**Why**: The distance field guarantees safe stepping (won't pass through the surface), and is the standard method for rendering implicit 3D fractals.
```glsl
float rayMarch(vec3 ro, vec3 rd) {
float t = 0.01;
for (int i = 0; i < MAX_STEPS; i++) {
float precis = PRECISION * t; // Relax precision with distance
float h = map(ro + rd * t);
if (h < precis || t > MAX_DIST) break;
t += h * FUDGE_FACTOR; // fudge < 1.0 improves safety
}
return (t > MAX_DIST) ? -1.0 : t;
}
```
### Step 7: Normal Calculation (Finite Differences)
**What**: Sample the distance field gradient around the hit point as the surface normal.
**Why**: Implicit surfaces have no analytical normals and require numerical approximation. Tetrahedral sampling (4-tap) saves 1/3 of the cost compared to central differences (6-tap).
```glsl
// 6-tap central difference method (more intuitive)
vec3 calcNormal_6tap(vec3 pos) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(pos + e.xyy) - map(pos - e.xyy),
map(pos + e.yxy) - map(pos - e.yxy),
map(pos + e.yyx) - map(pos - e.yyx)));
}
// 4-tap tetrahedral method (more efficient, recommended)
vec3 calcNormal_4tap(vec3 pos, float t) {
float precis = 0.001 * t;
vec2 e = vec2(1.0, -1.0) * precis;
return normalize(
e.xyy * map(pos + e.xyy) +
e.yyx * map(pos + e.yyx) +
e.yxy * map(pos + e.yxy) +
e.xxx * map(pos + e.xxx));
}
```
### Step 8: Shading and Lighting
**What**: Compute Lambertian diffuse + ambient + AO for hit surfaces.
**Why**: Lighting gives 3D fractals depth and material quality. Orbit trap values (`orb`) can serve both as color mapping and as simple AO.
```glsl
vec3 shade(vec3 pos, vec3 nor, vec3 rd, vec4 trap) {
vec3 light1 = normalize(LIGHT_DIR);
float diff = clamp(dot(light1, nor), 0.0, 1.0);
float amb = 0.7 + 0.3 * nor.y;
float ao = pow(clamp(trap.w * 2.0, 0.0, 1.0), 1.2); // Orbit trap AO
vec3 brdf = vec3(0.4) * amb * ao // Ambient
+ vec3(1.0) * diff * ao; // Diffuse
// Map material color from orbit trap
vec3 rgb = vec3(1.0);
rgb = mix(rgb, vec3(1.0, 0.8, 0.2), clamp(6.0*trap.y, 0.0, 1.0));
rgb = mix(rgb, vec3(1.0, 0.55, 0.0), pow(clamp(1.0-2.0*trap.z, 0.0, 1.0), 8.0));
return rgb * brdf;
}
```
### Step 9: Camera Setup
**What**: Build a look-at camera matrix, converting pixel coordinates to 3D ray directions.
**Why**: All 3D fractal ray marching requires a unified camera framework to generate rays.
```glsl
void setupCamera(vec2 uv, vec3 ro, vec3 ta, float cr,
out vec3 rd) {
vec3 cw = normalize(ta - ro); // forward
vec3 cp = vec3(sin(cr), cos(cr), 0.0); // roll
vec3 cu = normalize(cross(cw, cp)); // right
vec3 cv = normalize(cross(cu, cw)); // up
    rd = normalize(uv.x * cu + uv.y * cv + 2.0 * cw); // focal length 2.0 (~53° vertical FOV)
}
```
## Common Variants in Detail
### 1. 2D Mandelbrot (Distance Estimation Coloring)
Difference from base version (3D Apollonian): pure 2D computation, no ray marching needed, uses complex iteration + distance coloring.
```glsl
// Replace entire mainImage
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 p = (2.0*fragCoord - iResolution.xy) / iResolution.y;
// Animated zoom
float tz = 0.5 - 0.5*cos(0.225*iTime);
float zoo = pow(0.5, 13.0*tz);
vec2 c = vec2(-0.05, 0.6805) + p * zoo; // Tunable: zoom center point
// Iteration
vec2 z = vec2(0.0), dz = vec2(0.0);
for (int i = 0; i < 300; i++) { // Tunable: iteration count
if (dot(z,z) > 1024.0) break;
dz = 2.0*vec2(z.x*dz.x-z.y*dz.y, z.x*dz.y+z.y*dz.x) + vec2(1.0,0.0);
z = vec2(z.x*z.x-z.y*z.y, 2.0*z.x*z.y) + c;
}
float d = 0.5*sqrt(dot(z,z)/dot(dz,dz))*log(dot(z,z));
d = clamp(pow(4.0*d/zoo, 0.2), 0.0, 1.0); // Tunable: 0.2 controls contrast
fragColor = vec4(vec3(d), 1.0);
}
```
### 2. Mandelbulb Power-N (3D Spherical Coordinate Fractal)
Difference from base version: uses spherical coordinate trigonometric functions instead of spherical inversion, with a tunable `POWER` parameter controlling the fractal shape.
```glsl
#define POWER 8.0 // Tunable: 2-16, higher = more complex structure
#define FRACTAL_ITER 4 // Tunable: 2-8, more = more detail
float mandelbulbDE(vec3 p) {
vec3 z = p;
float dr = 1.0;
float r;
for (int i = 0; i < FRACTAL_ITER; i++) {
r = length(z);
if (r > 2.0) break;
float theta = atan(z.y, z.x);
float phi = asin(z.z / r);
dr = pow(r, POWER - 1.0) * dr * POWER + 1.0;
r = pow(r, POWER);
theta *= POWER;
phi *= POWER;
z = r * vec3(cos(theta)*cos(phi), sin(theta)*cos(phi), sin(phi)) + p;
}
return 0.5 * log(r) * r / dr;
}
```
### 3. Menger Sponge (KIFS Folding Type)
Difference from base version: uses abs() folding + conditional sorting instead of spherical inversion, producing regular geometric fractals.
```glsl
#define SCALE 3.0 // Tunable: scaling factor, 2.0-4.0
#define OFFSET vec3(0.92858,0.92858,0.32858) // Tunable: offset vector, changes shape
#define IFS_ITER 7 // Tunable: iteration count
float mengerDE(vec3 z) {
z = abs(1.0 - mod(z, 2.0)); // Infinite tiling
float d = 1000.0;
for (int n = 0; n < IFS_ITER; n++) {
z = abs(z);
if (z.x < z.y) z.xy = z.yx; // Conditional sorting
if (z.x < z.z) z.xz = z.zx;
if (z.y < z.z) z.yz = z.zy;
z = SCALE * z - OFFSET * (SCALE - 1.0);
if (z.z < -0.5*OFFSET.z*(SCALE-1.0))
z.z += OFFSET.z*(SCALE-1.0);
d = min(d, length(z) * pow(SCALE, float(-n)-1.0));
}
return d - 0.001;
}
```
### 4. Quaternion Julia Set
Difference from base version: uses quaternion algebra `Z <- Z^2 + c` (4D), Julia sets use a fixed `c` parameter instead of per-point `c`, visualized by taking 3D cross-sections.
```glsl
// Quaternion squaring
vec4 qsqr(vec4 a) {
return vec4(a.x*a.x - a.y*a.y - a.z*a.z - a.w*a.w,
2.0*a.x*a.y, 2.0*a.x*a.z, 2.0*a.x*a.w);
}
float juliaDE(vec3 p, vec4 c) {
vec4 z = vec4(p, 0.0);
float md2 = 1.0;
float mz2 = dot(z, z);
for (int i = 0; i < 11; i++) { // Tunable: iteration count
md2 *= 4.0 * mz2; // |dz| -> 2*|z|*|dz|
z = qsqr(z) + c; // z -> z^2 + c
mz2 = dot(z, z);
if (mz2 > 4.0) break;
}
return 0.25 * sqrt(mz2 / md2) * log(mz2);
}
// Animated Julia parameter c:
// vec4 c = 0.45*cos(vec4(0.5,3.9,1.4,1.1) + time*vec4(1.2,1.7,1.3,2.5)) - vec4(0.3,0,0,0);
```
### 5. Minimal IFS Field (2D, No Ray Marching)
Difference from base version: pure 2D implementation, only ~20 lines of code, using `abs(p)/dot(p,p) + offset` for iteration, producing a density field through weighted accumulation.
```glsl
float field(vec3 p) {
float strength = 7.0 + 0.03 * log(1.e-6 + fract(sin(iTime) * 4373.11));
float accum = 0.0, prev = 0.0, tw = 0.0;
for (int i = 0; i < 32; ++i) { // Tunable: iteration count
float mag = dot(p, p);
p = abs(p) / mag + vec3(-0.5, -0.4, -1.5); // Tunable: offset values change shape
float w = exp(-float(i) / 7.0); // Tunable: 7.0 controls decay
accum += w * exp(-strength * pow(abs(mag - prev), 2.3));
tw += w;
prev = mag;
}
return max(0.0, 5.0 * accum / tw - 0.7);
}
// Sample field() directly on fragCoord as brightness/color
```
## Performance Optimization Details
### Bottleneck Analysis
The core bottleneck in fractal rendering is **nested loops**: outer ray marching steps × inner fractal iterations. A single pixel may execute `200 steps × 8 iterations = 1600` distance field evaluations.
### Optimization Techniques
#### 1. Reduce Ray Marching Steps
Lower `MAX_STEPS` from 200 to 60-100, compensating precision loss with a fudge factor (0.7-0.9).
```glsl
t += h * 0.7; // Fudge factor < 1.0, allows larger steps but reduces penetration risk
```
#### 2. Adaptive Precision
Relax the collision threshold as distance increases; far objects don't need pixel-level precision.
```glsl
float precis = 0.001 * t; // Precision grows linearly with distance
```
#### 3. Early Exit
In fractal iteration, break immediately once `|z|^2 > bailout`.
```glsl
if (m2 > 4.0) break; // Don't continue useless iterations
```
#### 4. Reduce Iteration Count
Fractal iteration counts (`INVERSION_ITER`, `IFS_ITER`) reduced from 8 to 4-5 have minimal visual impact but significant performance gains.
#### 5. Use 4-Tap Instead of 6-Tap for Normals
The tetrahedral method requires only 4 `map()` calls instead of 6, saving 33% normal computation cost.
#### 6. AA Downgrade
Use `#define AA 1` during development, switch to `AA 2` for release. `AA 3` has massive performance impact (9x overhead).
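As a sketch of how such an `AA` macro is typically wired (assuming a hypothetical `render()` helper that wraps camera setup, ray marching, and shading):
```glsl
// Supersampling wrapper — cost grows as AA*AA (AA=3 → 9 rays per pixel)
vec3 tot = vec3(0.0);
for (int m = 0; m < AA; m++)
for (int n = 0; n < AA; n++) {
    vec2 o  = (vec2(float(m), float(n)) + 0.5) / float(AA) - 0.5; // Subpixel offset
    vec2 uv = (2.0 * (fragCoord + o) - iResolution.xy) / iResolution.y;
    tot += render(uv); // Hypothetical: camera + rayMarch + shade
}
fragColor = vec4(tot / float(AA * AA), 1.0);
```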
#### 7. Distance Field Scaling
For non-unit-sized fractals, scale the space first then scale the distance value to avoid precision issues.
```glsl
float z1 = 2.0;
return mandelbulb(p / z1) * z1;
```
#### 8. Avoid `pow()` Inside Loops
`pow(r, power)` in Mandelbulb is expensive; low powers (e.g., 2, 3) can be manually expanded instead.
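For the common case `POWER == 8.0`, the two `pow()` calls in the Mandelbulb loop can be replaced by repeated squaring (a sketch; the trigonometric calls remain — fully trig-free power-8 variants exist but are much longer):
```glsl
// Replaces pow(r, 7.0) and pow(r, 8.0) with three squarings
float r2 = r * r;
float r4 = r2 * r2;
float r8 = r4 * r4;
dr = r8 / r * dr * 8.0 + 1.0; // r^7 = r^8 / r (assumes r > 0)
r  = r8;
```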
## Combination Suggestions
### 1. Fractal + Volumetric Lighting
Accumulate scattered light passing through fractal gaps during ray marching, producing "god rays" effects.
```glsl
// Accumulate additionally in the ray march loop
float glow = 0.0;
float t = 0.01;
for (int i = 0; i < MAX_STEPS; i++) {
    float h = map(ro + rd * t);
    glow += exp(-10.0 * h); // Closer to surface = larger contribution
    if (h < 0.001 || t > MAX_DIST) break;
    t += h;
}
col += glowColor * glow * 0.01;
```
### 2. Fractal + Post-Processing (Tone Mapping / FXAA)
3D fractals have rich high-frequency detail, prone to aliasing. Use ACES Tone Mapping + sRGB correction + FXAA post-processing.
```glsl
// ACES tone mapping
vec3 aces_approx(vec3 v) {
v = max(v, 0.0) * 0.6;
float a=2.51, b=0.03, c=2.43, d=0.59, e=0.14;
return clamp((v*(a*v+b))/(v*(c*v+d)+e), 0.0, 1.0);
}
col = aces_approx(col);
col = pow(col, vec3(1.0/2.4)); // sRGB gamma
```
### 3. Fractal + Transparent Refraction (Multi-Bounce Refraction)
Used for "crystal ball" effects on volumetric fractals like Mandelbulb. Uses negative distance fields for reverse ray marching inside, combined with Beer's law absorption.
```glsl
// Invert distance field for interior stepping
float dfactor = isInside ? -1.0 : 1.0;
float d = dfactor * map(ro + rd * t);
// Beer's law light absorption
ragg *= exp(-st * beer); // beer = negative color vector
// Refraction direction
vec3 refr = refract(rd, sn, isInside ? 1.0/ior : ior);
```
### 4. Fractal + Orbit Trap Texture Mapping
Orbit trap values can be mapped to HSV color space for rich coloring, or mapped as self-emission for glowing fractal effects.
```glsl
vec3 hsv2rgb(vec3 c) {
vec4 K = vec4(1.0, 2.0/3.0, 1.0/3.0, 3.0);
vec3 p = abs(fract(c.xxx + K.xyz) * 6.0 - K.www);
return c.z * mix(K.xxx, clamp(p - K.xxx, 0.0, 1.0), c.y);
}
// Map orbit trap to HSV
vec3 col = hsv2rgb(vec3(trap.x * 0.5, 0.9, 0.8));
```
### 5. Fractal + Soft Shadow
Perform an additional ray march from the fractal surface toward the light source, accumulating the minimum `h/t` ratio to generate soft shadows.
```glsl
float softshadow(vec3 ro, vec3 rd, float mint, float k) {
float res = 1.0;
float t = mint;
for (int i = 0; i < 64; i++) {
float h = map(ro + rd*t);
res = min(res, k * h / t); // Larger k = harder shadows
if (res < 0.001) break;
t += clamp(h, 0.01, 0.5);
}
return clamp(res, 0.0, 1.0);
}
```

# Lighting Models Detailed Reference
This document is a detailed supplementary reference to [SKILL.md](SKILL.md), covering prerequisite knowledge, in-depth explanations for each step, complete descriptions of variants, performance optimization analysis, and full code examples for combination suggestions.
---
## Prerequisites
### Vector Math Fundamentals
- **Dot product**: `dot(A, B) = |A||B|cos(θ)`, used to compute the angular relationship between two vectors. Lighting models heavily use dot products such as N·L, N·V, N·H, V·H
- **Cross product**: `cross(A, B)` returns a vector perpendicular to both A and B, used to build camera coordinate systems and tangent spaces
- **normalize**: Scales a vector to unit length; lighting calculations require all direction vectors to be normalized
- **reflect**: `reflect(I, N) = I - 2.0 * dot(N, I) * N`, computes the reflection of incident vector I about normal N
### GLSL Fundamentals
- **uniform / varying**: uniforms are global constants (e.g., iTime, iResolution); varyings are interpolated from vertex to fragment
- **Key built-in functions**:
- `clamp(x, min, max)` — clamp to range
- `mix(a, b, t)` — linear interpolation `a*(1-t) + b*t`
- `pow(base, exp)` — exponentiation, used for specular falloff
- `exp(x)` / `exp2(x)` — exponential functions, used for attenuation and Beer's Law
- `smoothstep(edge0, edge1, x)` — Hermite smooth interpolation
### Basic Computer Graphics Concepts
- **Normal (N)**: Unit vector pointing outward from the surface, determines lighting intensity
- **View Direction (V)**: Unit vector from the surface point toward the camera
- **Light Direction (L)**: Unit vector from the surface point toward the light source
- **Half Vector (H)**: `normalize(V + L)`, the core of the Blinn-Phong model
- **Reflect Vector (R)**: `reflect(-L, N)`, used in the classic Phong model
### Raymarching Basics (Recommended)
- **SDF (Signed Distance Function)**: Returns the signed distance from a point to the nearest surface
- **Normal computation (finite differences)**: Approximates the gradient (i.e., normal direction) by computing small-offset differences of the SDF along the x, y, and z axes
- **March**: Advances along the ray direction by the distance returned by the SDF until hitting a surface or exceeding the range
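The three concepts above combine into the standard sphere-tracing loop — a minimal sketch (step count and thresholds are illustrative):
```glsl
// Minimal ray march: advance by the SDF value until near a surface or out of range
float rayMarch(vec3 ro, vec3 rd) {
    float t = 0.0;
    for (int i = 0; i < 128; i++) {
        float d = map(ro + rd * t);      // Distance to nearest surface
        if (d < 0.001 || t > 50.0) break; // Hit, or exceeded range
        t += d;
    }
    return t;
}
```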
---
## Implementation Steps in Detail
### Step 1: Scene Foundation (UV, Camera, Raymarching)
**What**: Establish the standard ShaderToy framework — UV coordinates, camera ray, SDF scene, normal computation.
**Why**: Lighting calculations require normal N, view direction V, and light direction L as inputs, all of which depend on scene geometry. Without correct normals and direction vectors, no lighting model can work.
**Details**:
- UV coordinates are typically normalized as `(2.0 * fragCoord - iResolution.xy) / iResolution.y` to ensure correct aspect ratio
- The camera uses a look-at matrix: forward direction `ww`, right direction `uu`, up direction `vv`
- SDF normals use six-point central difference, which is more accurate than forward difference
- The epsilon value in `e = vec2(0.001, 0.0)` affects normal accuracy: too large blurs details, too small introduces noise
**Code**:
```glsl
// Compute normal from SDF scene (finite differences) — standard technique
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
// Prepare basic vectors needed for lighting
vec3 N = calcNormal(pos); // Surface normal
vec3 V = -rd; // View direction (reverse of ray)
vec3 L = normalize(lightPos - pos); // Light direction (point light)
// Or directional light: vec3 L = normalize(vec3(0.6, 0.8, -0.5));
```
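The look-at camera described in the Details above, as a minimal sketch (camera position, target, and focal length are illustrative):
```glsl
// Look-at camera basis (ww/uu/vv) and per-pixel ray
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
vec3 ro = vec3(0.0, 1.0, 3.0);                    // Camera position
vec3 ta = vec3(0.0, 0.5, 0.0);                    // Look-at target
vec3 ww = normalize(ta - ro);                     // Forward
vec3 uu = normalize(cross(ww, vec3(0.0, 1.0, 0.0))); // Right
vec3 vv = normalize(cross(uu, ww));               // Up
vec3 rd = normalize(uv.x * uu + uv.y * vv + 1.5 * ww); // 1.5 = focal length
```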
### Step 2: Lambert Diffuse
**What**: Compute basic diffuse lighting — the foundation of all lighting models.
**Why**: Lambert's law describes the ideal diffuse behavior of rough surfaces — brightness is proportional to cos(angle of incidence). This is the most fundamental physically-based lighting model, assuming light enters the surface and is scattered uniformly.
**Details**:
- `max(0.0, dot(N, L))` uses `max(0,...)` to avoid negative values (backface lighting)
- Energy-conserving Lambertian diffuse requires dividing by PI: the Lambert BRDF is albedo/PI, because the cosine-weighted integral over the hemisphere equals PI and the surface must not reflect more energy than it receives
- Half-Lambert (`NdotL * 0.5 + 0.5`) is a technique invented by Valve that maps [-1,1] to [0,1], giving backlit areas some brightness; commonly used for character rendering and SSS approximation
- Many ocean shaders use a similar wrapped diffuse pattern
**Code**:
```glsl
// Basic Lambert diffuse
float NdotL = max(0.0, dot(N, L));
vec3 diffuse = albedo * lightColor * NdotL;
// Energy-conserving version (albedo/PI)
vec3 diffuse_conserved = albedo / PI * lightColor * NdotL;
// Half-Lambert variant (wrapped dot product)
// Reduces over-darkening on backlit faces, commonly used for SSS approximation
float halfLambert = NdotL * 0.5 + 0.5;
vec3 diffuse_wrapped = albedo * lightColor * halfLambert;
```
### Step 3: Blinn-Phong Specular
**What**: Add specular highlights based on the half vector.
**Why**: Blinn-Phong is more computationally efficient and physically plausible than classic Phong. The half vector H is the average direction of V and L; the highlight is brightest when H aligns with N. Blinn-Phong also behaves more realistically at grazing angles compared to Phong.
**Details**:
- Half vector H = normalize(V + L), which avoids the reflect computation needed by Phong's reflect(-L, N)
- Shininess controls highlight concentration: 4.0 gives a very rough surface feel, 256.0 approaches a mirror
- The normalization factor `(shininess + 8.0) / (8.0 * PI)` ensures total reflected energy remains constant when changing shininess (energy conservation)
- Based on the standard half vector method used in many raymarching shaders
**Code**:
```glsl
// Blinn-Phong specular (standard half vector method)
vec3 H = normalize(V + L);
float NdotH = max(0.0, dot(N, H));
// Empirical model: directly use shininess exponent
float SHININESS = 32.0; // Adjustable: 4.0 (rough) ~ 256.0 (mirror-like)
float spec = pow(NdotH, SHININESS);
// With energy-conserving normalization factor
// Normalization factor (s+8)/(8*PI) ensures total energy is preserved when changing shininess
float normFactor = (SHININESS + 8.0) / (8.0 * PI);
float spec_normalized = normFactor * pow(NdotH, SHININESS);
vec3 specular = lightColor * spec_normalized;
```
### Step 4: Fresnel-Schlick Approximation
**What**: Compute reflectance based on viewing angle — reflectance increases at grazing angles ("edge brightening" effect).
**Why**: All real materials approach 100% reflectance at grazing angles. This is a fundamental physical phenomenon (Fresnel effect). The Schlick approximation uses a fifth-power curve to simulate this, and is a core component of all PBR pipelines. This is a ubiquitous formula in real-time rendering.
**Details**:
- F0 is the reflectance at normal incidence (looking straight at the surface)
- Dielectrics (plastic, water, etc.): F0 is approximately 0.02~0.04; most light is scattered (diffuse)
- Metals: F0 uses the material's baseColor, since metals have virtually no diffuse reflection
- `mix(vec3(0.04), baseColor, metallic)` is the unified metallic workflow, interpolating between dielectrics and metals
- Using V·H for the Cook-Torrance BRDF specular term
- Using N·V for environment reflections, rim lighting, etc.
- A widely used approximation in both real-time and offline rendering pipelines.
**Code**:
```glsl
// Fresnel-Schlick approximation (standard formulation)
vec3 fresnelSchlick(vec3 F0, float cosTheta) {
return F0 + (1.0 - F0) * pow(1.0 - cosTheta, 5.0);
}
// Dielectrics (plastic, water, etc.): F0 approximately 0.02~0.04
vec3 F0_dielectric = vec3(0.04);
// Metals: F0 uses the material's baseColor
vec3 F0_metal = baseColor;
// Unified metallic workflow
vec3 F0 = mix(vec3(0.04), baseColor, metallic);
// Compute Fresnel using V·H (for specular BRDF)
float VdotH = max(0.0, dot(V, H));
vec3 F = fresnelSchlick(F0, VdotH);
// Alternatively, compute Fresnel using N·V (for environment reflections, rim light)
// Optional: pow(fGloss, 20.0) factor for gloss adjustment
float NdotV = max(0.0, dot(N, V));
vec3 F_env = F0 + (1.0 - F0) * pow(1.0 - NdotV, 5.0);
```
### Step 5: GGX Normal Distribution Function (D Term)
**What**: Compute the probability distribution of microfacet normals aligning with the half vector.
**Why**: The GGX (Trowbridge-Reitz) distribution has a wider "long tail" highlight, closer to real materials than the Beckmann distribution. This is the core term in PBR pipelines that determines highlight shape and size. This is the standard GGX formula used across PBR implementations.
**Details**:
- Roughness must be squared first (`a = roughness * roughness`); this is Disney's mapping from perceptual roughness to alpha
- `a2 = a * a` is the alpha^2 term in the GGX formula
- When roughness = 0.0, D approaches a delta function (perfect mirror); when roughness = 1.0, it approaches a uniform distribution
- The denominator `PI * denom * denom` ensures the distribution function integrates to 1 over the hemisphere
**Code**:
```glsl
// GGX/Trowbridge-Reitz normal distribution function (standard formulation)
float distributionGGX(float NdotH, float roughness) {
float a = roughness * roughness; // Note: roughness must be squared first!
float a2 = a * a;
float denom = NdotH * NdotH * (a2 - 1.0) + 1.0;
return a2 / (PI * denom * denom);
}
// Roughness parameter guide:
// roughness = 0.0 → perfect mirror (D approaches delta function)
// roughness = 0.5 → medium roughness
// roughness = 1.0 → fully rough (D approaches uniform distribution)
```
### Step 6: Geometric Occlusion Function (G Term)
**What**: Compute the mutual shadowing and masking between microfacets.
**Why**: Not all correctly-oriented microfacets can be "seen" by both the light and the view simultaneously — the G term corrects for this occlusion loss. The microfacet model assumes the surface is composed of countless tiny flat surfaces that can occlude each other (shadowing and masking).
**Details**:
- The Smith method decomposes G into two independent terms for the light direction (G1_L) and view direction (G1_V)
- **Schlick-GGX**: `k = (roughness+1)^2 / 8` for direct lighting, `k = roughness^2 / 2` for IBL
- **Height-Correlated Smith**: More physically accurate, accounts for height correlation of microfacets; directly returns the visibility term `G/(4*NdotV*NdotL)`
- **Simplified approximation** (G1V): Most compact implementation, suitable for code golf or extremely performance-constrained scenarios
- Three common implementations with different accuracy/performance tradeoffs
**Code**:
```glsl
// Smith method: decompose G into two independent G1 terms for light and view directions
// Method 1: Schlick-GGX (separated implementation)
// The clearest pedagogical implementation
float geometrySchlickGGX(float NdotV, float roughness) {
float r = roughness + 1.0;
float k = (r * r) / 8.0; // For direct lighting: k = (r+1)^2/8
return NdotV / (NdotV * (1.0 - k) + k);
}
float geometrySmith(float NdotV, float NdotL, float roughness) {
float ggx1 = geometrySchlickGGX(NdotV, roughness);
float ggx2 = geometrySchlickGGX(NdotL, roughness);
return ggx1 * ggx2;
}
// Method 2: Height-Correlated Smith (visibility term form)
// More physically accurate, directly returns G/(4*NdotV*NdotL), i.e., the "visibility term"
float visibilitySmith(float NdotV, float NdotL, float roughness) {
float a2 = roughness * roughness;
float gv = NdotL * sqrt(NdotV * (NdotV - NdotV * a2) + a2);
float gl = NdotV * sqrt(NdotL * (NdotL - NdotL * a2) + a2);
return 0.5 / max(gv + gl, 0.00001);
}
// Method 3: Simplified approximation (compact G1V helper)
// Most compact implementation
float G1V(float dotNV, float k) {
return 1.0 / (dotNV * (1.0 - k) + k);
}
// Usage: float vis = G1V(NdotL, k) * G1V(NdotV, k); where k = a / 2 (a = roughness * roughness)
```
### Step 7: Assembling the Cook-Torrance BRDF
**What**: Combine the D, F, and G terms into a complete specular reflection BRDF.
**Why**: The Cook-Torrance microfacet model is currently the most widely used physically-based specular reflection model in real-time rendering. It is based on microfacet theory, modeling the surface as countless tiny perfect mirrors.
**Details**:
- Full formula: `f_specular = D * F * G / (4 * NdotV * NdotL)`
- When using `visibilitySmith` (which returns `G/(4*NdotV*NdotL)`), there is no need to manually divide by the denominator
- When using the standard `geometrySmith` (which returns G), you must explicitly divide by `4 * NdotV * NdotL`
- `max(4.0 * NdotV * NdotL, 0.001)` prevents division by zero
- Based on the standard Cook-Torrance BRDF formulation
**Code**:
```glsl
// Complete Cook-Torrance BRDF assembly
vec3 cookTorranceBRDF(vec3 N, vec3 V, vec3 L, float roughness, vec3 F0) {
vec3 H = normalize(V + L);
float NdotL = max(0.0, dot(N, L));
float NdotV = max(0.0, dot(N, V));
float NdotH = max(0.0, dot(N, H));
float VdotH = max(0.0, dot(V, H));
// D: Normal distribution
float D = distributionGGX(NdotH, roughness);
// F: Fresnel
vec3 F = fresnelSchlick(F0, VdotH);
// G: Geometric occlusion (using visibility term form, which includes the 4*NdotV*NdotL denominator)
float Vis = visibilitySmith(NdotV, NdotL, roughness);
// Assembly (Vis version already divides by 4*NdotV*NdotL)
vec3 specular = D * F * Vis;
// Or using the standard G term form:
// float G = geometrySmith(NdotV, NdotL, roughness);
// vec3 specular = (D * F * G) / max(4.0 * NdotV * NdotL, 0.001);
return specular * NdotL;
}
```
### Step 8: Multi-Light Accumulation and Final Compositing
**What**: Blend diffuse and specular reflections with energy conservation, and accumulate contributions from multiple lights.
**Why**: Real scenes contain multiple light sources (sun, sky, ground bounce, etc.). Energy conservation must be maintained between diffuse and specular: energy that has been reflected (F) should not participate in diffuse reflection.
**Details**:
- `kD = (1.0 - F) * (1.0 - metallic)` implements energy conservation:
- `(1.0 - F)` ensures already-reflected light does not participate in diffuse
- `(1.0 - metallic)` ensures metals have no diffuse (metals' free electrons absorb all refracted light)
- Sky light uses `0.5 + 0.5 * N.y` to approximate hemisphere integration — the more upward the normal, the brighter
- Back/rim light uses wrapped diffuse from the opposite direction of the sun to provide fill lighting
- Based on multi-light architecture patterns common in PBR raymarching shaders
**Code**:
```glsl
// Complete multi-light PBR lighting accumulation
vec3 shade(vec3 pos, vec3 N, vec3 V, vec3 albedo, float roughness, float metallic) {
vec3 F0 = mix(vec3(0.04), albedo, metallic);
vec3 diffuseColor = albedo * (1.0 - metallic); // Metals have no diffuse
vec3 color = vec3(0.0);
// --- Main light (sun) ---
vec3 sunDir = normalize(vec3(0.6, 0.8, -0.5));
vec3 sunColor = vec3(1.0, 0.95, 0.85) * 2.0;
vec3 H = normalize(V + sunDir);
float NdotL = max(0.0, dot(N, sunDir));
float NdotV = max(0.0, dot(N, V));
float VdotH = max(0.0, dot(V, H));
vec3 F = fresnelSchlick(F0, VdotH);
vec3 kD = (1.0 - F) * (1.0 - metallic); // Energy conservation
// Diffuse contribution
color += kD * diffuseColor / PI * sunColor * NdotL;
// Specular contribution
color += cookTorranceBRDF(N, V, sunDir, roughness, F0) * sunColor;
    // --- Sky light (hemisphere light approximation) ---
vec3 skyColor = vec3(0.2, 0.5, 1.0) * 0.3;
float skyDiffuse = 0.5 + 0.5 * N.y; // Simple hemisphere integration approximation
color += diffuseColor * skyColor * skyDiffuse;
    // --- Back light / rim light (fill light) ---
vec3 backDir = normalize(vec3(-sunDir.x, 0.0, -sunDir.z));
float backDiffuse = clamp(dot(N, backDir) * 0.5 + 0.5, 0.0, 1.0);
color += diffuseColor * vec3(0.25, 0.15, 0.1) * backDiffuse;
return color;
}
```
### Step 9: Ambient Occlusion (AO)
**What**: Approximate the reduction of indirect lighting in surface crevices due to geometric occlusion.
**Why**: Scenes without AO appear overly "flat" and lack spatial depth. In raymarching scenes, the SDF can be used to efficiently compute AO — sample several points along the normal direction and compare the SDF distance with the ideal distance.
**Details**:
- Principle: Step gradually away from the surface along the normal, querying the SDF value at each sample point. If the SDF value is less than the sample distance h, nearby occluding geometry is present
- `sca *= 0.95` gradually decreases the weight of farther sample points
- The multiplier in `3.0 * occ` controls AO intensity (adjustable)
- AO affects both diffuse and specular, but in different ways:
- Diffuse: multiply directly by the AO value
- Specular: use `pow(NdotV + ao, roughness^2) - 1 + ao` for more subtle attenuation
- Based on the standard SDF ambient occlusion technique
**Code**:
```glsl
// AO computation for raymarching scenes (standard SDF-based technique)
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0;
float d = map(pos + h * nor);
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
// Using AO (AO affects both diffuse and specular)
float ao = calcAO(pos, N);
diffuseLight *= ao;
// More subtle specular AO:
specularLight *= clamp(pow(NdotV + ao, roughness * roughness) - 1.0 + ao, 0.0, 1.0);
```
---
## Variant Details
### Variant 1: Classic Phong (Non-PBR)
**Difference from base version**: Uses the reflection vector `R = reflect(-L, N)` instead of the half vector; no D/F/G decomposition.
**Use cases**: Quick prototyping, retro-style rendering, performance-constrained scenarios. The Phong model has the lowest computational cost but does not satisfy energy conservation, and highlights disappear at grazing angles (the opposite of real materials).
**Key code**:
```glsl
// Classic Phong reflection model
vec3 R = reflect(-L, N);
float spec = pow(max(0.0, dot(R, V)), 32.0);
vec3 color = albedo * lightColor * NdotL // diffuse
+ lightColor * spec; // specular
```
### Variant 2: Point Light Attenuation
**Difference from base version**: Adds distance attenuation, suitable for point light / spotlight scenarios. The base version assumes directional light (sun), while point light intensity decreases with distance.
**Use cases**: Indoor scenes, multiple point lights, close-range light effects.
**Details**:
- Physically correct attenuation should be `1/distance²`, but in practice `1/(1 + k1*d + k2*d²)` avoids infinite brightness at close range
- k1 (linear attenuation): 0.01~0.5, k2 (quadratic attenuation): 0.001~0.1
- Alternatively, use physical attenuation with a maximum intensity cap: `min(1.0/(d*d), maxIntensity)`
**Key code**:
```glsl
// Point light attenuation (standard pattern)
float dist = length(lightPos - pos);
float attenuation = 1.0 / (1.0 + dist * 0.1 + dist * dist * 0.01);
// k1: linear attenuation coefficient (adjustable 0.01~0.5)
// k2: quadratic attenuation coefficient (adjustable 0.001~0.1)
color *= attenuation;
```
### Variant 3: IBL (Image-Based Lighting)
**Difference from base version**: Uses environment maps instead of analytic light sources, split into diffuse SH (spherical harmonics) and specular split-sum parts.
**Use cases**: Scenes requiring realistic environmental lighting reflections. IBL can capture complex lighting environments (e.g., HDRI panoramas), producing very natural lighting effects.
**Details**:
- Diffuse IBL uses spherical harmonics (SH) to precompute the low-frequency component of environmental lighting
- Specular IBL uses Epic Games' split-sum approximation: splits the BRDF integral into environment map LOD lookup + precomputed BRDF integration lookup table
- `EnvBRDFApprox` is Unreal Engine 4's approximation, avoiding the need for a precomputed LUT texture
- `textureLod(envMap, R, roughness * 7.0)` uses mipmap levels to simulate blurred reflections on rough surfaces
- Based on the SH + EnvBRDFApprox method common in PBR pipelines
**Key code**:
```glsl
// IBL approximation (SH + EnvBRDFApprox method)
// Diffuse IBL: spherical harmonics
vec3 diffuseIBL = diffuseColor * SHIrradiance(N);
// Specular IBL: Unreal's EnvBRDFApprox approximation
vec3 EnvBRDFApprox(vec3 specColor, float roughness, float NdotV) {
vec4 c0 = vec4(-1, -0.0275, -0.572, 0.022);
vec4 c1 = vec4(1, 0.0425, 1.04, -0.04);
vec4 r = roughness * c0 + c1;
float a004 = min(r.x * r.x, exp2(-9.28 * NdotV)) * r.x + r.y;
vec2 AB = vec2(-1.04, 1.04) * a004 + r.zw;
return specColor * AB.x + AB.y;
}
vec3 R = reflect(-V, N);
vec3 envColor = textureLod(envMap, R, roughness * 7.0).rgb;
vec3 specularIBL = EnvBRDFApprox(F0, roughness, NdotV) * envColor;
```
### Variant 4: Subsurface Scattering Approximation (SSS)
**Difference from base version**: Simulates light passing through translucent materials (e.g., skin, wax, water surfaces).
**Use cases**: Water surfaces, skin, candles, leaves, and other translucent materials. SSS makes thin parts appear brighter and more translucent.
**Details**:
- **Method 1 (SDF probing)**: Probes the SDF value along the light direction into the material interior. If the SDF value is much smaller than the probe distance, the material is thicker at that point and transmits less light; otherwise it transmits more
- **Method 2 (Henyey-Greenstein phase function)**: Describes the directional distribution of light scattering in a medium. Parameter g controls forward/backward scattering: g > 0 for forward scattering (e.g., skin), g < 0 for backward scattering
- Combines SDF-based interior probing with Henyey-Greenstein phase function
**Key code**:
```glsl
// SSS approximation (SDF-based interior probing)
// Method 1: SDF-based interior probing
float subsurface(vec3 pos, vec3 L) {
float sss = 0.0;
for (int i = 0; i < 5; i++) {
float h = 0.05 + float(i) * 0.1;
float d = map(pos + L * h); // Probe along light direction into interior
sss += max(0.0, h - d); // Thinner areas transmit more light
}
return clamp(1.0 - sss * 4.0, 0.0, 1.0);
}
// Method 2: Henyey-Greenstein phase function
float HenyeyGreenstein(float cosTheta, float g) {
float g2 = g * g;
return (1.0 - g2) / (pow(1.0 + g2 - 2.0 * g * cosTheta, 1.5) * 4.0 * PI);
}
float sssAmount = HenyeyGreenstein(dot(V, L), 0.5);
color += sssColor * sssAmount * NdotL;
```
### Variant 5: Beer's Law Water Lighting
**Difference from base version**: Simulates the exponential attenuation of light in water/transparent media.
**Use cases**: Water surfaces, underwater scenes, glass, juice, and other transparent/translucent media. The Beer-Lambert law describes the exponential decay of light intensity as it travels through a medium.
**Details**:
- `exp2(-opticalDepth * extinctColor)` implements wavelength-dependent exponential attenuation
- Different color channels have different attenuation coefficients, producing the characteristic color of water (blue/green transmits the most)
- In `extinctColor = 1.0 - vec3(0.5, 0.4, 0.1)`, the vec3 controls the absorption rate per channel
- Inscattering simulates multiple scattering of light inside the water body, giving deep water its inherent color
- `1.0 - exp(-depth * 0.1)` is a simplified inscattering model
- Based on the Beer-Lambert law for wavelength-dependent attenuation
**Key code**:
```glsl
// Beer's Law light attenuation
vec3 waterExtinction(float depth) {
float opticalDepth = depth * 6.0; // Adjustable: controls attenuation rate
vec3 extinctColor = 1.0 - vec3(0.5, 0.4, 0.1); // Adjustable: water absorption color
return exp2(-opticalDepth * extinctColor);
}
// Usage: underwater object color multiplied by attenuation
vec3 underwaterColor = objectColor * waterExtinction(depth);
// Add water inscattering
vec3 inscatter = waterDiffuse * (1.0 - exp(-depth * 0.1));
underwaterColor += inscatter;
```
---
## Performance Optimization In-Depth Analysis
### 1. Avoiding the Cost of pow(x, 5.0)
The `pow` function on some GPUs is implemented as `exp2(5.0 * log2(x))`, involving two transcendental functions. Manually unrolling into a multiplication chain is more efficient:
```glsl
// Efficient implementation of Schlick Fresnel
float x = 1.0 - cosTheta;
float x2 = x * x;
float x5 = x2 * x2 * x; // Faster than pow(x, 5.0)
vec3 F = F0 + (1.0 - F0) * x5;
```
### 2. Merging G and the Denominator (Visibility Term)
Using `V_SmithGGX` to directly return `G / (4 * NdotV * NdotL)` avoids computing G separately and then dividing. This not only eliminates one division but also avoids numerical instability when `4 * NdotV * NdotL` is near zero. The Height-Correlated Smith version is also more physically accurate.
### 3. AO Sample Count
- 5 samples are sufficient for most scenes
- Distant objects can use as few as 3 (since details are not visible)
- The upper bound of sample step h (`0.12 * i / 4.0`) controls the AO influence range: increasing it detects larger-scale occlusion but requires more samples
- The decay rate `sca *= 0.95` is also adjustable: smaller values make AO more concentrated near the surface
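The parameters above fit together in a minimal `calcAO` sketch (assuming the usual `map(p)` scene SDF; names and constants follow this document's conventions rather than a specific implementation):

```glsl
// Sketch: SDF ambient occlusion with 5 samples along the normal
float calcAO(vec3 pos, vec3 N) {
    float occ = 0.0;
    float sca = 1.0;                             // per-sample decay weight
    for (int i = 0; i < 5; i++) {
        float h = 0.01 + 0.12 * float(i) / 4.0;  // sample distance (upper bound 0.12)
        float d = map(pos + N * h);              // actual SDF value
        occ += (h - d) * sca;                    // expected minus actual = occlusion
        sca *= 0.95;                             // smaller -> AO concentrated near surface
    }
    return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
```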
### 4. Soft Shadow Optimization
- Using `clamp(h, 0.02, 0.2)` to limit step size: minimum step 0.02 prevents getting stuck near the surface, maximum step 0.2 prevents skipping thin geometry
- Shadow ray maxSteps can be lower than the primary ray (14~24 steps is usually enough), since shadows don't need precise hit points
- The 8.0 in `8.0 * h / t` controls shadow softness: higher values produce harder shadows, lower values softer ones. This is an intuitive penumbra size control
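These clamps and the penumbra factor combine as follows (a sketch assuming the usual `map(p)` SDF):

```glsl
// Sketch: SDF soft shadow with bounded steps and penumbra control
float softShadow(vec3 ro, vec3 rd, float mint, float maxt) {
    float res = 1.0;
    float t = mint;
    for (int i = 0; i < 24; i++) {      // fewer steps than the primary ray
        if (t > maxt) break;
        float h = map(ro + rd * t);
        res = min(res, 8.0 * h / t);    // 8.0: higher = harder shadow edge
        t += clamp(h, 0.02, 0.2);       // bounded step size
        if (res < 0.001) break;         // fully shadowed, stop early
    }
    return clamp(res, 0.0, 1.0);
}
```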
### 5. Simplified IBL
- Without a cubemap, use a simple sky color gradient as a substitute for environment mapping
- `mix(groundColor, skyColor, R.y * 0.5 + 0.5)` is the cheapest "environment reflection"
- A `pow(max(0, dot(R, sunDir)), 64.0)` in the sun direction can be added to simulate the sun's specular reflection
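Put together, the cheap environment substitute looks like this (the colors are illustrative assumptions):

```glsl
// Sketch: gradient-sky "IBL" with a fake sun highlight
vec3 cheapEnv(vec3 R, vec3 sunDir) {
    vec3 groundColor = vec3(0.30, 0.25, 0.20);              // assumed palette
    vec3 skyColor    = vec3(0.50, 0.70, 1.00);
    vec3 env = mix(groundColor, skyColor, R.y * 0.5 + 0.5); // cheapest env reflection
    env += vec3(1.0, 0.9, 0.7) * pow(max(0.0, dot(R, sunDir)), 64.0); // sun specular
    return env;
}
```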
### 6. Branch Culling
When NdotL <= 0, the surface faces away from the light source, and all specular calculations (D, F, G) can be skipped:
```glsl
// Skip entire specular computation when NdotL <= 0
if (NdotL > 0.0) {
// ... D, F, G computation ...
}
```
Note: Branch efficiency on GPUs depends on the coherence of pixels within the same warp/wavefront. If large areas face away from the light, this branch is effective; if the branch condition switches frequently between adjacent pixels, it may actually be slower.
---
## Combination Suggestions in Detail
### Lighting + Raymarching
Raymarching scenes are the most common host for lighting models. Normals are obtained via SDF finite differences, and AO and shadows directly leverage SDF queries.
Key integration points:
- `calcNormal` provides normal N
- `calcAO` leverages SDF for ambient occlusion
- `softShadow` leverages SDF for soft shadows
- Material IDs can be passed through the return value of the `map` function
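The integration points above can be sketched as a single shading routine (assumes `calcNormal`, `calcAO`, `softShadow`, and a `getMaterial` lookup exist as described; the light direction and ambient weight are illustrative):

```glsl
// Sketch: tying the SDF helpers into one shade() call
vec3 shade(vec3 pos, vec3 rd, float matID) {
    vec3 N = calcNormal(pos);
    vec3 L = normalize(vec3(0.6, 0.7, 0.3));                 // assumed light direction
    float NdotL = max(dot(N, L), 0.0);
    float ao  = calcAO(pos, N);                              // SDF-based AO
    float sha = softShadow(pos + N * 0.01, L, 0.02, 10.0);   // SDF-based shadow
    vec3 albedo = getMaterial(matID);                        // material via map() ID
    vec3 col = albedo * NdotL * sha;                         // direct light
    col += albedo * 0.2 * ao;                                // ambient scaled by AO
    return col;
}
```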
### Lighting + Volumetric Rendering
Volumetric effects like clouds, smoke, and fog require Beer's Law attenuation and phase functions (e.g., Henyey-Greenstein). PBR surface lighting integrates naturally with volumetric cloud lighting.
Key integration points:
- Volumetric rendering uses ray marching to step through the volume
- Each step accumulates density and applies Beer's Law attenuation
- Lighting uses the Henyey-Greenstein phase function instead of a BRDF
- The final result is alpha-blended with the surface rendering output
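A minimal volume march combining these points might look like this (`sampleDensity` and the step constants are illustrative assumptions; `HenyeyGreenstein` is defined in Variant 4 above):

```glsl
// Sketch: volume march with Beer's Law extinction and a HG phase function
vec4 marchVolume(vec3 ro, vec3 rd, vec3 L) {
    vec3 acc = vec3(0.0);
    float transmittance = 1.0;
    float t = 0.0;
    for (int i = 0; i < 64; i++) {
        vec3 p = ro + rd * t;
        float dens = sampleDensity(p);                   // e.g. FBM-based density
        if (dens > 0.0) {
            float phase = HenyeyGreenstein(dot(rd, L), 0.3);
            acc += transmittance * phase * dens * 0.1;   // inscattering this step
            transmittance *= exp(-dens * 0.1);           // Beer's Law attenuation
        }
        if (transmittance < 0.01) break;                 // early out when opaque
        t += 0.1;
    }
    return vec4(acc, 1.0 - transmittance);               // rgb + alpha for blending
}
```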
### Lighting + Normal Maps / Procedural Normals
Normals don't have to come from the SDF. Procedural normals generated by FBM noise (e.g., ocean wave normals, water surface normals) can be passed directly to lighting functions, producing rich surface detail.
Key integration points:
- Procedural normals work by perturbing the base normal: `N = normalize(N + perturbation)`
- FBM noise frequency and amplitude control the coarseness and strength of detail
- SDF normals and procedural normals can be combined for macro shape + micro detail
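The perturbation pattern in the first point, sketched with an assumed scalar `fbm` noise function:

```glsl
// Sketch: perturbing an SDF normal with FBM-based detail
vec3 detailNormal(vec3 N, vec3 pos) {
    float freq = 8.0;                      // adjustable: detail coarseness
    float amp  = 0.15;                     // adjustable: detail strength
    vec2 e = vec2(0.01, 0.0);
    // finite-difference gradient of the noise field
    vec3 grad = vec3(
        fbm(pos * freq + e.xyy) - fbm(pos * freq - e.xyy),
        fbm(pos * freq + e.yxy) - fbm(pos * freq - e.yxy),
        fbm(pos * freq + e.yyx) - fbm(pos * freq - e.yyx));
    return normalize(N + amp * grad);      // macro shape + micro detail
}
```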
### Lighting + Post-Processing
Tone mapping and gamma correction are essential parts of a PBR pipeline. HDR lighting values must be mapped to the [0,1] LDR range for correct display:
```glsl
// ACES — currently the most popular tone mapping
col = (col * (2.51 * col + 0.03)) / (col * (2.43 * col + 0.59) + 0.14);
// Reinhard — simplest tone mapping
col = col / (col + 1.0);
// Gamma correction — convert from linear space to sRGB
col = pow(col, vec3(1.0 / 2.2));
```
Note: All lighting calculations must be performed in linear space; gamma correction is only applied at final output.
### Lighting + Reflections
Multi-layer reflections or environment reflections query the scene again in the `reflect(rd, N)` direction, blending the reflected color into the final result weighted by Fresnel.
```glsl
// Basic reflection pattern
vec3 R = reflect(rd, N);
vec3 reflColor = traceScene(pos + N * 0.01, R); // Offset to avoid self-intersection
vec3 F = fresnelSchlick(F0, NdotV);
color = mix(color, reflColor, F);
```
A common water surface rendering approach combines refraction + reflection + Fresnel blending:
- Reflection direction `reflect(rd, N)` queries the sky/scene
- Refraction direction `refract(rd, N, 1.0/1.33)` queries the underwater scene
- Fresnel coefficient blends between reflection and refraction
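These three queries combine into the classic water-surface pattern. A sketch, where `traceSky` and `traceUnderwater` stand in for whatever scene queries the shader provides, and `waterExtinction` is the Beer's Law function from Variant 5:

```glsl
// Sketch: reflection + refraction + Fresnel blend for a water surface
vec3 shadeWater(vec3 pos, vec3 rd, vec3 N, float depth) {
    vec3 R  = reflect(rd, N);
    vec3 Rf = refract(rd, N, 1.0 / 1.33);            // air -> water
    vec3 reflCol = traceSky(pos + N * 0.01, R);      // offset avoids self-hit
    vec3 refrCol = traceUnderwater(pos, Rf) * waterExtinction(depth);
    float NdotV = max(dot(N, -rd), 0.0);
    float x = 1.0 - NdotV;
    float x2 = x * x;
    float fres = 0.02 + 0.98 * x2 * x2 * x;          // Schlick, F0 = 0.02 for water
    return mix(refrCol, reflCol, fres);
}
```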

# Matrix Transforms & Camera — Detailed Reference
This document is the complete detailed version of [SKILL.md](SKILL.md), covering step-by-step tutorials, mathematical derivations, detailed explanations, and advanced usage.
## Prerequisites
- **Vector Fundamentals**: Meaning of `vec2/vec3/vec4`, dot product `dot()`, cross product `cross()`, `normalize()`
- **Matrix Fundamentals**: Column-major storage of `mat2/mat3/mat4` in GLSL, semantics of matrix multiplication `m * v`
- **Coordinate Systems**: NDC (Normalized Device Coordinates), screen-space to world-space mapping, aspect ratio correction
- **Trigonometry**: Relationship between `sin()`/`cos()` and rotation
- **ShaderToy Built-in Variables**: `iResolution`, `iTime`, `iMouse`, `fragCoord`
## Core Principles
The essence of matrix transforms is **coordinate system transformation**. In ShaderToy's ray marching pipeline, transformation matrices serve two key roles:
1. **Camera Matrix**: Converts screen pixel coordinates to ray directions in world space (view-to-world)
2. **Object Transform Matrix**: Converts sampling points from world space to the object's local space (world-to-local, i.e., "domain transform")
### Key Mathematical Formulas
**2D Rotation Matrix** (rotation by angle θ around the origin):
```
R(θ) = | cos θ -sin θ |
| sin θ cos θ |
```
**3D Single-Axis Rotation** (rotation around Y axis as example):
```
Ry(θ) = | cos θ 0 sin θ |
| 0 1 0 |
| -sin θ 0 cos θ |
```
**Rodrigues' Rotation Formula** (rotation by angle θ around arbitrary axis **k**):
```
R = cos θ · I + (1 - cos θ) · k⊗k + sin θ · K
```
where K is the skew-symmetric matrix of axis vector k.
**LookAt Camera** (looking from eye toward target):
```
forward = normalize(target - eye)
right = normalize(cross(forward, worldUp))
up = cross(right, forward)
viewMatrix = mat3(right, up, forward)
```
**Perspective Ray Generation**:
```
rayDir = normalize(camMatrix * vec3(uv, focalLength))
```
where `uv` is the aspect-ratio-corrected screen coordinate, and `focalLength` controls the field of view (larger values produce smaller FOV).
## Implementation Steps
### Step 1: Screen Coordinate Normalization and Aspect Ratio Correction
**What**: Convert pixel coordinates `fragCoord` to normalized UV coordinates centered at the screen center, with Y-axis pointing up and correct aspect ratio.
**Why**: All subsequent ray generation depends on correctly normalized screen coordinates. Without aspect ratio correction, circles would become ellipses.
**Code**:
```glsl
// Method A: range [-aspect, aspect] x [-1, 1] (most common)
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Method B: step-by-step approach (equivalent)
vec2 uv = fragCoord / iResolution.xy * 2.0 - 1.0;
uv.x *= iResolution.x / iResolution.y;
```
### Step 2: Building Rotation Matrices
**What**: Choose the appropriate rotation matrix construction method based on requirements.
**Why**: Rotation is the core of all 3D transforms. Different scenarios suit different rotation representations.
**Method A: 2D Rotation (mat2)**
The simplest form, commonly used for two-plane rotations in camera orbits:
```glsl
mat2 rot2D(float a) {
float c = cos(a), s = sin(a);
return mat2(c, s, -s, c); // Note GLSL column-major order
}
```
**Method B: 3D Single-Axis Rotation (mat3)**
Separate X/Y/Z axis rotation functions that can be freely combined:
```glsl
mat3 rotX(float a) {
float s = sin(a), c = cos(a);
return mat3(1, 0, 0, 0, c, s, 0, -s, c);
}
mat3 rotY(float a) {
float s = sin(a), c = cos(a);
return mat3(c, 0, s, 0, 1, 0, -s, 0, c);
}
mat3 rotZ(float a) {
float s = sin(a), c = cos(a);
return mat3(c, s, 0, -s, c, 0, 0, 0, 1);
}
```
**Method C: Euler Angles to mat3**
Build a complete rotation matrix from three angles (yaw/pitch/roll) in one step:
```glsl
mat3 fromEuler(vec3 ang) {
vec2 a1 = vec2(sin(ang.x), cos(ang.x));
vec2 a2 = vec2(sin(ang.y), cos(ang.y));
vec2 a3 = vec2(sin(ang.z), cos(ang.z));
mat3 m;
m[0] = vec3( a1.y*a3.y + a1.x*a2.x*a3.x,
a1.y*a2.x*a3.x + a3.y*a1.x,
-a2.y*a3.x);
m[1] = vec3(-a2.y*a1.x, a1.y*a2.y, a2.x);
m[2] = vec3( a3.y*a1.x*a2.x + a1.y*a3.x,
a1.x*a3.x - a1.y*a3.y*a2.x,
a2.y*a3.y);
return m;
}
```
**Method D: Rodrigues Arbitrary-Axis Rotation (mat3)**
Rotation around any normalized axis, based on Rodrigues' formula:
```glsl
mat3 rotationMatrix(vec3 axis, float angle) {
axis = normalize(axis);
float s = sin(angle);
float c = cos(angle);
float oc = 1.0 - c;
return mat3(
oc*axis.x*axis.x + c, oc*axis.x*axis.y - axis.z*s, oc*axis.z*axis.x + axis.y*s,
oc*axis.x*axis.y + axis.z*s, oc*axis.y*axis.y + c, oc*axis.y*axis.z - axis.x*s,
oc*axis.z*axis.x - axis.y*s, oc*axis.y*axis.z + axis.x*s, oc*axis.z*axis.z + c
);
}
```
### Step 3: Building a LookAt Camera
**What**: Construct a view-to-world matrix from the camera position (eye) and look-at target (target).
**Why**: LookAt is the most intuitive camera definition — just specify "where to stand" and "where to look", and the matrix automatically computes three orthogonal basis vectors.
**Classic setCamera (mat3)**:
```glsl
// cr = camera roll, usually pass 0.0
// Returns mat3 that transforms local ray direction to world space
mat3 setCamera(in vec3 ro, in vec3 ta, float cr) {
vec3 cw = normalize(ta - ro); // forward
vec3 cp = vec3(sin(cr), cos(cr), 0.0); // world up with roll
vec3 cu = normalize(cross(cw, cp)); // right
vec3 cv = normalize(cross(cu, cw)); // up
return mat3(cu, cv, cw);
}
```
**Gram-Schmidt Orthogonalization Version (mat3)**:
Projects out the component of camUp along camDir to ensure strict orthogonality:
```glsl
vec3 camDir = normalize(target - camPos);
vec3 camUp = normalize(camUp - dot(camDir, camUp) * camDir); // Gram-Schmidt
vec3 camRight = normalize(cross(camDir, camUp));
```
**mat4 LookAt (with translation)**:
Returns a 4x4 matrix with the camera world position stored in the 4th column. Suitable for scenarios requiring homogeneous coordinates:
```glsl
mat4 LookAt(vec3 pos, vec3 target, vec3 up) {
vec3 dir = normalize(target - pos);
vec3 x = normalize(cross(dir, up));
vec3 y = cross(x, dir);
return mat4(vec4(x, 0), vec4(y, 0), vec4(dir, 0), vec4(pos, 1));
}
```
### Step 4: Generating Perspective Rays
**What**: Transform normalized screen coordinates through the camera matrix into world-space ray directions.
**Why**: Perspective projection simulates the near-large far-small effect by appending a fixed Z component (the focal length) to the UV coordinates. A larger focal length means a smaller FOV.
**Method A: mat3 Camera + normalize**:
```glsl
// focalLength controls FOV: 1.0 ≈ 90°, 2.0 ≈ 53°, 4.0 ≈ 28°
#define FOCAL_LENGTH 2.0 // Adjustable: focal length, larger = narrower FOV
mat3 cam = setCamera(ro, ta, 0.0);
vec3 rd = cam * normalize(vec3(uv, FOCAL_LENGTH));
```
**Method B: Manual Basis Vector Combination**:
```glsl
// FieldOfView controls ray divergence
#define FOV 1.0 // Adjustable: field of view scale factor
vec3 rd = normalize(camDir + (uv.x * camRight + uv.y * camUp) * FOV);
```
**Method C: mat4 Camera + Homogeneous Coordinates**:
```glsl
// Direction vectors use w=0, positions use w=1
mat4 viewToWorld = LookAt(camPos, camTarget, camUp);
vec3 rd = (viewToWorld * normalize(vec4(uv, 1.0, 0.0))).xyz;
```
### Step 5: Mouse-Interactive Camera
**What**: Map `iMouse` input to camera orbit angles.
**Why**: An interactive camera is a fundamental need for debugging and showcasing 3D shaders. Mapping mouse X to horizontal rotation and Y to pitch angle is the most universal pattern.
**Spherical Coordinate Orbit Camera**:
```glsl
#define CAM_DIST 5.0 // Adjustable: camera-to-origin distance
#define CAM_HEIGHT 1.0 // Adjustable: default height offset
vec2 mouse = iMouse.xy / iResolution.xy;
float angleH = mouse.x * 6.2832; // Horizontal: 0 ~ 2π
float angleV = mouse.y * 3.1416 - 1.5708; // Vertical: -π/2 ~ π/2
// Use auto-rotation when mouse is not clicked
if (iMouse.z <= 0.0) {
angleH = iTime * 0.5;
angleV = 0.3;
}
vec3 ro = vec3(
CAM_DIST * cos(angleH) * cos(angleV),
CAM_DIST * sin(angleV) + CAM_HEIGHT,
CAM_DIST * sin(angleH) * cos(angleV)
);
vec3 ta = vec3(0.0, 0.0, 0.0); // Look-at target
```
**Euler Angle Driven Camera**:
```glsl
vec3 ang = vec3(0.0, 0.2, iTime * 0.3); // Default animation
if (iMouse.z > 0.0) {
ang = vec3(0.0, clamp(2.0 - iMouse.y * 0.01, 0.0, 3.1416), iMouse.x * 0.01);
}
mat3 rot = fromEuler(ang);
vec3 ori = vec3(0.0, 0.0, 2.8) * rot;
vec3 dir = normalize(vec3(uv, -2.0)) * rot;
```
### Step 6: SDF Object Domain Transforms (Translation, Rotation, Scaling)
**What**: In the ray marching distance function, apply inverse transforms to sampling points to achieve object translation/rotation/scaling.
**Why**: The SDF domain transform principle is "transform the space, not the object" — inversely transforming the sampling point into the object's local coordinate system to evaluate distance is equivalent to transforming the object itself.
**Basic Transforms**:
```glsl
// ===== Translation: offset the sampling point =====
float sdTranslated = sdSphere(p - vec3(2.0, 0.0, 0.0), 1.0);
// ===== Rotation: transform sampling point with rotation matrix =====
// Note: for orthogonal matrices (rotations), inverse = transpose
float sdRotated = sdBox(rotY(0.5) * p, vec3(1.0));
// ===== Scaling: divide by scale factor, multiply back into distance =====
#define SCALE 2.0 // Adjustable: object scale factor
float sdScaled = sdSphere(p / SCALE, 1.0) * SCALE;
```
**SRT Combination (Scale → Rotate → Translate)**:
mat4 version, using opTx for domain transform:
```glsl
mat4 Loc4(vec3 d) {
d *= -1.0;
return mat4(1,0,0,d.x, 0,1,0,d.y, 0,0,1,d.z, 0,0,0,1);
}
mat4 transposeM4(in mat4 m) {
return mat4(
vec4(m[0].x, m[1].x, m[2].x, m[3].x),
vec4(m[0].y, m[1].y, m[2].y, m[3].y),
vec4(m[0].z, m[1].z, m[2].z, m[3].z),
vec4(m[0].w, m[1].w, m[2].w, m[3].w)
);
}
vec3 opTx(vec3 p, mat4 m) {
return (transposeM4(m) * vec4(p, 1.0)).xyz;
}
// Usage example: translate to (3,0,0), then rotate 45° around Y axis
mat4 xform = Rot4Y(0.785) * Loc4(vec3(3.0, 0.0, 0.0));
float d = sdBox(opTx(p, xform), vec3(1.0));
```
### Step 7: Quaternion Rotation (Advanced)
**What**: Use quaternions for rotation around arbitrary axes, suitable for joint animation and other scenarios requiring frequent rotation composition.
**Why**: Quaternions avoid gimbal lock, and interpolation (slerp) is more natural than matrices. The double cross product formula `p + 2·cross(q.xyz, cross(q.xyz, p) + q.w·p)` is the most computationally efficient quaternion rotation implementation.
```glsl
// Axis-angle → quaternion
vec4 axisAngleToQuat(vec3 axis, float angleDeg) {
float half_angle = angleDeg * 3.14159265 / 360.0; // degrees to half-radians
vec2 sc = sin(vec2(half_angle, half_angle + 1.5707963));
return vec4(normalize(axis) * sc.x, sc.y);
}
// Quaternion rotation (double cross product form)
vec3 quatRotate(vec3 pos, vec3 axis, float angleDeg) {
vec4 q = axisAngleToQuat(axis, angleDeg);
return pos + 2.0 * cross(q.xyz, cross(q.xyz, pos) + q.w * pos);
}
// Usage example: hierarchical rotation in joint animation
vec3 limbPos = quatRotate(p - shoulderOffset, vec3(1,0,0), swingAngle);
float d = sdEllipsoid(limbPos, limbSize);
```
## Variant Details
### Variant 1: Orthographic Projection Camera
**Difference from basic version**: Ray direction is fixed (parallel rays); different pixel sampling is achieved by changing the ray origin position. Suitable for 2D-style rendering, engineering drawings, isometric views.
**Key modified code**:
```glsl
// Replace the perspective ray generation section
#define ORTHO_SIZE 5.0 // Adjustable: orthographic view size
mat3 cam = setCamera(ro, ta, 0.0);
// Orthographic: offset origin, fixed direction
vec3 rd = cam * vec3(0.0, 0.0, 1.0); // Fixed direction
ro += cam * vec3(uv * ORTHO_SIZE, 0.0); // Offset origin
```
### Variant 2: Full Euler Angle Rotation Camera
**Difference from basic version**: Does not use LookAt; instead builds the rotation matrix directly from three Euler angles. Suitable for first-person perspective or scenarios requiring roll.
**Key modified code**:
```glsl
mat3 fromEuler(vec3 ang) {
vec2 a1 = vec2(sin(ang.x), cos(ang.x));
vec2 a2 = vec2(sin(ang.y), cos(ang.y));
vec2 a3 = vec2(sin(ang.z), cos(ang.z));
mat3 m;
m[0] = vec3(a1.y*a3.y+a1.x*a2.x*a3.x, a1.y*a2.x*a3.x+a3.y*a1.x, -a2.y*a3.x);
m[1] = vec3(-a2.y*a1.x, a1.y*a2.y, a2.x);
m[2] = vec3(a3.y*a1.x*a2.x+a1.y*a3.x, a1.x*a3.x-a1.y*a3.y*a2.x, a2.y*a3.y);
return m;
}
// In mainImage:
vec3 ang = vec3(pitch, yaw, roll);
mat3 rot = fromEuler(ang);
vec3 ori = vec3(0.0, 0.0, 3.0) * rot;
vec3 rd = normalize(vec3(uv, -2.0)) * rot;
```
### Variant 3: Quaternion Joint Rotation
**Difference from basic version**: Uses quaternions instead of matrices for rotation in domain transforms, suitable for hierarchical joint animation (multi-limbed biological systems).
**Key modified code**:
```glsl
vec4 axisAngleToQuat(vec3 axis, float angleDeg) {
float ha = angleDeg * 3.14159265 / 360.0;
vec2 sc = sin(vec2(ha, ha + 1.5707963));
return vec4(normalize(axis) * sc.x, sc.y);
}
vec3 quatRotate(vec3 p, vec3 axis, float angleDeg) {
vec4 q = axisAngleToQuat(axis, angleDeg);
return p + 2.0 * cross(q.xyz, cross(q.xyz, p) + q.w * p);
}
// Usage in scene:
vec3 legP = quatRotate(p - hipOffset, vec3(1,0,0), legAngle);
float dLeg = sdEllipsoid(legP, vec3(0.2, 0.6, 0.25));
```
### Variant 4: mat4 SRT Pipeline (Full 4x4 Transform)
**Difference from basic version**: Uses `mat4` homogeneous coordinates to combine scale-rotate-translate into a single matrix, applying `opTx()` domain transform to sampling points. Suitable for complex scenes requiring management of many object transforms.
**Key modified code**:
```glsl
mat4 Rot4Y(float a) {
float c = cos(a), s = sin(a);
return mat4(c,0,s,0, 0,1,0,0, -s,0,c,0, 0,0,0,1);
}
mat4 Loc4(vec3 d) {
d *= -1.0;
return mat4(1,0,0,d.x, 0,1,0,d.y, 0,0,1,d.z, 0,0,0,1);
}
mat4 transposeM4(mat4 m) {
return mat4(
vec4(m[0].x,m[1].x,m[2].x,m[3].x),
vec4(m[0].y,m[1].y,m[2].y,m[3].y),
vec4(m[0].z,m[1].z,m[2].z,m[3].z),
vec4(m[0].w,m[1].w,m[2].w,m[3].w));
}
vec3 opTx(vec3 p, mat4 m) {
return (transposeM4(m) * vec4(p, 1.0)).xyz;
}
// Usage: translate then rotate (note matrix multiplication order is right-to-left)
mat4 xform = Rot4Y(angle) * Loc4(vec3(3.0, 0.0, 0.0));
float d = sdBox(opTx(p, xform), boxSize);
```
### Variant 5: Path Camera (Animated Flight)
**Difference from basic version**: The camera moves along a predefined path (e.g., tunnel, racetrack), using `LookAt` to track a forward target point. Common in tunnel-type shaders.
**Key modified code**:
```glsl
// Path function (can be replaced with any curve)
vec2 pathCenter(float z) {
return vec2(sin(z * 0.17) * 3.0, sin(z * 0.1 + 4.0) * 2.0);
}
// In mainImage:
float z_offset = iTime * 10.0; // Speed
vec3 camPos = vec3(pathCenter(z_offset), 0.0);
vec3 camTarget = vec3(pathCenter(z_offset + 5.0), 5.0);
vec3 camUp = vec3(sin(iTime * 0.3), cos(iTime * 0.3), 0.0);
mat4 viewToWorld = LookAt(camPos, camTarget, camUp);
vec3 rd = (viewToWorld * normalize(vec4(uv, 1.0, 0.0))).xyz;
```
## Performance Optimization Details
### 1. Precompute Trigonometric Functions
Compute `sin/cos` of the same angle only once, store in `vec2`:
```glsl
// Bad: sin and cos are each evaluated twice (unless the compiler merges them)
mat2(cos(a), sin(a), -sin(a), cos(a));
// Good: a single vec2 sin call yields both values
vec2 sc = sin(vec2(a, a + 1.5707963)); // sc = (sin(a), cos(a))
mat2(sc.y, sc.x, -sc.x, sc.y);
```
### 2. Prefer mat3 Over mat4
If translation is not needed (pure rotation), always use `mat3` instead of `mat4`. `mat3*vec3` requires 7 fewer multiply-add operations than `mat4*vec4`.
### 3. Inverse of Rotation Matrix = Transpose
Orthogonal rotation matrix R satisfies `R⁻¹ = Rᵀ`. When the inverse transform is needed, directly use `transpose(m)` or swap the multiplication order `v * m` (equivalent to `transpose(m) * v`), avoiding general matrix inversion.
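In GLSL both spellings are available (assuming `m` is a pure rotation):

```glsl
// Two equivalent ways to apply the inverse of a rotation matrix m to p
vec3 localP1 = transpose(m) * p;  // explicit transpose
vec3 localP2 = p * m;             // row-vector form: same result, no transpose call
```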
### 4. Avoid Rebuilding Matrices Inside the SDF
If the rotation angle does not depend on the sampling point `p`, move matrix construction outside the `map()` function or cache it in a global variable:
```glsl
// Bad: rebuild matrix on every map() call
float map(vec3 p) {
mat3 r = rotY(iTime); // Recomputed per pixel × per step
return sdBox(r * p, vec3(1.0));
}
// Good: precompute in mainImage
mat3 g_rot; // Global
void mainImage(...) {
g_rot = rotY(iTime); // Computed only once
// ... rayMarch ...
}
float map(vec3 p) {
return sdBox(g_rot * p, vec3(1.0));
}
```
### 5. Merge Consecutive Rotations
The product of multiple rotation matrices is still a rotation matrix. Pre-multiply and store as a single matrix:
```glsl
// Bad: two matrix multiplications per sample
p = rotX(a) * (rotY(b) * p);
// Good: pre-multiply
mat3 combined = rotX(a) * rotY(b);
p = combined * p;
```
## Combination Suggestions
### Combining with Ray Marching / SDF (Most Common)
Matrix transforms are almost always used together with SDF ray marching. The camera matrix generates rays, and domain transform matrices place objects. This is the foundational pipeline for all 3D ShaderToy shaders.
### Combining with Noise / fBm
Use rotation matrices to apply domain warping to noise sampling coordinates, breaking axis-aligned regularity:
```glsl
mat3 rot = rotAxis(vec3(0,0,1), 0.5 * iTime);
float n = fbm(rot * p); // Rotate noise sampling direction
```
Using time-varying rotation matrices makes water surface noise look more natural.
### Combining with Fractals / IFS
Add rotation transforms within each iteration of a fractal to create more complex geometric patterns:
```glsl
for (int i = 0; i < Iterations; i++) {
z.xy = rot2D(angle) * z.xy; // Rotate each iteration
z = abs(z);
z = Scale * z - Offset * (Scale - 1.0);
}
```
Embedding `mat2` rotation within IFS iterations produces more complex fractal geometry.
### Combining with Lighting / Materials
After normal computation, transform matrices can be used to convert normals from local space back to world space (for lighting calculations). For pure rotation matrices, the normal transform is identical to the vertex transform.
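For a pure rotation this is a single matrix multiply, with no inverse-transpose needed:

```glsl
// Normal transform for a rotation-only object matrix (objRot is orthogonal)
vec3 worldN = objRot * localN;
// With non-uniform scale, the inverse-transpose of the matrix would be required
```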
### Combining with Post-Processing
Camera parameters (such as FOV) can be used for depth of field calculations; `mat2` rotation can be used for screen-space chromatic aberration or motion blur direction.

# Multi-Pass Buffer Techniques — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, in-depth explanations of each step, complete variant descriptions, performance optimization analysis, and full combination code examples.
## Prerequisites
### GLSL Fundamentals
- GLSL basic syntax: `uniform`, `varying`, `sampler2D`
- ShaderToy execution model: `iChannel0-3` texture inputs, `iResolution`, `iTime`, `iFrame`, `iMouse`
- Difference between `texture()` and `texelFetch()`:
- `texture()` performs interpolated sampling (bilinear filtering), suitable for continuous field sampling
- `texelFetch()` reads a specific texel exactly, without interpolation, suitable for data storage reads
- `textureLod()` is used for explicit MIP level sampling, avoiding the blur caused by automatic MIP selection
- Buffer A/B/C/D concept in ShaderToy: each buffer is an independent render pass that outputs to a corresponding texture, which can be read by other passes or itself via iChannel
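The three sampling functions differ as follows, one line each:

```glsl
// Sketch: the three ways to read iChannel0
vec4 a = texture(iChannel0, uv);                      // filtered sample, normalized coords
vec4 b = texelFetch(iChannel0, ivec2(fragCoord), 0);  // exact texel, integer coords, LOD 0
vec4 c = textureLod(iChannel0, uv, 0.0);              // explicit LOD 0, no auto-MIP blur
```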
### Basic Math
- Basic vector math and matrix transforms
- Finite difference method: using neighboring pixels to approximate gradients and the Laplacian operator
- Iterative mapping: the concept of `x(n+1) = f(x(n))`, the mathematical basis for self-feedback
## Implementation Steps
### Step 1: Establish a Minimal Self-Feedback Loop
**What**: Create a Buffer that reads its own previous frame output, adds new content, and outputs the result. The Image pass simply displays the Buffer result.
**Why**: This is the cornerstone of all multi-pass techniques. Once you understand self-feedback loops, fluid simulation, temporal accumulation, etc. are all extensions of this foundation. An initialization guard (`iFrame == 0` or `iFrame < N`) prevents reading uninitialized data.
**iChannel Binding**: Buffer A's iChannel0 → Buffer A (self-feedback); Image's iChannel0 → Buffer A
**Key Points**:
- `exp(-33.0 / iResolution.y)` controls the decay rate; a larger coefficient (here 33.0) produces faster decay
- The `fragCoord + vec2(1.0, sin(iTime))` offset creates motion effects
- The `iFrame < 4` guard ensures stable initial values for the first few frames
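The three key points combine into a Buffer A sketch like this (the injected content, a moving dot, is illustrative):

```glsl
// Buffer A — iChannel0 bound to Buffer A itself; the Image pass just displays it
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 offset = vec2(1.0, sin(iTime));                  // motion effect
    vec2 uv = (fragCoord + offset) / iResolution.xy;
    vec4 prev = texture(iChannel0, uv) * exp(-33.0 / iResolution.y); // decay
    // new content: a dot orbiting the screen center
    vec2 p = iResolution.xy * (0.5 + 0.3 * vec2(cos(iTime), sin(iTime * 1.3)));
    float ink = smoothstep(8.0, 0.0, length(fragCoord - p));
    vec4 col = prev + vec4(ink);
    if (iFrame < 4) col = vec4(0.0);                      // initialization guard
    fragColor = col;
}
```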
### Step 2: Implement Self-Advection
**What**: Building on self-feedback, interpret the buffer values as a velocity field and implement self-advection — each pixel offsets its sampling position based on the local velocity.
**Why**: Self-advection is the core of all Eulerian grid fluid simulations. By accumulating rotational information across multiple scales through rotational sampling, rich vortex structures can be produced without a complete Navier-Stokes solver.
**Parameter Tuning**:
- `ROT_NUM` (rotation sample count): Affects the sampling accuracy of the rotation field; 5 is a good balance
- `SCALE_NUM` (number of scale levels): Affects the detail level of vortices; 20 levels produce rich multi-scale structures
- `bbMax = 0.7 * iResolution.y`: Adaptive loop termination threshold
**Mathematical Principles**:
- The `getRot` function samples the velocity field at ROT_NUM equally spaced angular directions around a given position
- Computes the rotational component via `dot(velocity - 0.5, perpendicular)`
- The multi-scale loop `b *= 2.0` progressively enlarges the sampling radius, capturing vortices at different scales
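One way the described sampling can be written (a hedged sketch: the rotation measure and the displacement applied per scale are illustrative, not the exact original):

```glsl
#define ROT_NUM 5
// Sketch: measure the local rotation of the velocity field at radius b
float getRot(vec2 pos, float b) {
    float rot = 0.0;
    for (int i = 0; i < ROT_NUM; i++) {
        float a = 6.2832 * float(i) / float(ROT_NUM);
        vec2 dir = vec2(cos(a), sin(a));
        vec2 v = texture(iChannel0, (pos + b * dir) / iResolution.xy).xy - 0.5;
        rot += dot(v, vec2(-dir.y, dir.x));   // component along the perpendicular
    }
    return rot / float(ROT_NUM) / b;
}
// Multi-scale loop: each scale nudges the sampling position by its rotation
vec2 p = fragCoord;
float b = 1.0;
for (int i = 0; i < 20; i++) {                // SCALE_NUM = 20 levels
    if (b > 0.7 * iResolution.y) break;       // bbMax adaptive termination
    float ang = getRot(p, b);
    p += ang * b * vec2(cos(ang), sin(ang));  // illustrative displacement
    b *= 2.0;                                 // enlarge sampling radius
}
vec4 advected = texture(iChannel0, p / iResolution.xy);
```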
### Step 3: Navier-Stokes Fluid Solver
**What**: Implement velocity field solving based on the paper "Simple and fast fluids" (Guay, Colin, Egli, 2011), including advection, viscous forces, and vorticity confinement.
**Why**: More physically accurate than pure rotational self-advection, supporting low-viscosity fluid simulation (e.g., smoke, fire). Vorticity is stored in the alpha channel to avoid extra buffer overhead.
**Complete `solveFluid` Function Breakdown**:
```glsl
vec4 solveFluid(sampler2D smp, vec2 uv, vec2 w, float time, vec3 mouse, vec3 lastMouse) {
const float K = 0.2; // Pressure coefficient: controls the strength of the incompressibility constraint
const float v = 0.55; // Viscosity coefficient: high value = viscous fluid, low value = thin fluid
// Read four neighboring pixels (basis for central differencing)
vec4 data = textureLod(smp, uv, 0.0);
vec4 tr = textureLod(smp, uv + vec2(w.x, 0), 0.0);
vec4 tl = textureLod(smp, uv - vec2(w.x, 0), 0.0);
vec4 tu = textureLod(smp, uv + vec2(0, w.y), 0.0);
vec4 td = textureLod(smp, uv - vec2(0, w.y), 0.0);
// Density and velocity gradients (central differencing)
vec3 dx = (tr.xyz - tl.xyz) * 0.5; // x-direction gradient
vec3 dy = (tu.xyz - td.xyz) * 0.5; // y-direction gradient
vec2 densDif = vec2(dx.z, dy.z); // Density gradient
// Density update: continuity equation ∂ρ/∂t + ∇·(ρv) = 0
data.z -= DT * dot(vec3(densDif, dx.x + dy.y), data.xyz);
// Viscous force (Laplacian operator): μ∇²v
// Discrete Laplacian = up + down + left + right - 4*center
vec2 laplacian = tu.xy + td.xy + tr.xy + tl.xy - 4.0 * data.xy;
vec2 viscForce = vec2(v) * laplacian;
// Advection: Semi-Lagrangian backtrace method
// Trace backward from the current position along the reverse velocity direction, sample previous step's value
data.xyw = textureLod(smp, uv - DT * data.xy * w, 0.0).xyw;
// External forces (mouse interaction)
vec2 newForce = vec2(0);
if (mouse.z > 1.0 && lastMouse.z > 1.0) {
// Mouse movement velocity as force direction
vec2 vv = clamp((mouse.xy * w - lastMouse.xy * w) * 400.0, -6.0, 6.0);
// Force magnitude inversely proportional to distance from mouse (similar to a point charge field)
newForce += 0.001 / (dot(uv - mouse.xy * w, uv - mouse.xy * w) + 0.001) * vv;
}
// Velocity update: v += dt * (viscous force - pressure gradient + external forces)
data.xy += DT * (viscForce - K / DT * densDif + newForce);
// Linear decay: simulates energy dissipation
data.xy = max(vec2(0), abs(data.xy) - 1e-4) * sign(data.xy);
// Vorticity Confinement
// Compute curl = ∂vy/∂x - ∂vx/∂y
data.w = (tr.y - tl.y - tu.x + td.x);
// Vorticity gradient direction
vec2 vort = vec2(abs(tu.w) - abs(td.w), abs(tl.w) - abs(tr.w));
// Normalize then multiply by vorticity value to produce a force that enhances vortices
vort *= VORTICITY_AMOUNT / length(vort + 1e-9) * data.w;
data.xy += vort;
// Top/bottom boundaries: soft decay to avoid hard edges
data.y *= smoothstep(0.5, 0.48, abs(uv.y - 0.5));
// Numerical stability: clamp extreme values
data = clamp(data, vec4(vec2(-10), 0.5, -10.0), vec4(vec2(10), 3.0, 10.0));
return data;
}
```
**RGBA Channel Packing Strategy**:
- `xy` = velocity components (vx, vy)
- `z` = density
- `w` = vorticity (curl)
A single vec4 carries the complete fluid state without needing extra buffers.
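The semi-Lagrangian backtrace used in the advection line above can be checked on the CPU. This is a minimal 1D NumPy sketch (illustrative, not ShaderToy code) that advects a density bump one step by sampling upstream, with wrap addressing standing in for `fract()`:

```python
import numpy as np

def advect_1d(field, vel, dt):
    """Semi-Lagrangian step: each cell samples its value from x - dt*v."""
    n = len(field)
    x = np.arange(n, dtype=float)
    src = (x - dt * vel) % n                    # backtrace, wrap addressing
    i0 = np.floor(src).astype(int) % n
    i1 = (i0 + 1) % n
    f = src - np.floor(src)
    return (1 - f) * field[i0] + f * field[i1]  # linear interpolation, like texture()

field = np.zeros(64)
field[10] = 1.0                                 # a density bump at cell 10
moved = advect_1d(field, vel=2.0, dt=1.0)       # bump moves downstream to cell 12
```

Because the backtrace samples the *previous* field rather than pushing values forward, the scheme is unconditionally stable, which is why it tolerates the large `DT` values shaders typically use.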
### Step 4: Chained Buffers for Accelerated Simulation
**What**: Execute the same simulation code in a chain through Buffer A → B → C, completing multiple simulation sub-steps per frame.
**Why**: Each ShaderToy buffer executes only once per frame. By chaining identical code (A reads C's output from the previous frame → B reads A → C reads B), three iterations are completed in a single frame, significantly increasing simulation speed without adding buffer count. Use the Common tab to avoid code duplication.
**iChannel Binding**:
- Buffer A: iChannel0 → Buffer C (reads previous frame's final result)
- Buffer B: iChannel0 → Buffer A (reads current frame's first step result)
- Buffer C: iChannel0 → Buffer B (reads current frame's second step result)
**Mouse State Inter-Frame Transfer**:
- `if (fragCoord.y < 1.0) data = iMouse;` writes the current frame's mouse state into the first row of pixels
- `texelFetch(iChannel0, ivec2(0, 0), 0)` reads the previous frame's mouse state in the next frame
- The delta between two frames' mouse positions gives mouse velocity, used to calculate the direction and magnitude of applied forces
### Step 5: Separable Gaussian Blur Pipeline
**What**: Use two Buffers to implement horizontal and vertical separable Gaussian blur.
**Why**: A 2D Gaussian kernel can be separated into the product of two 1D kernels. An NxN kernel drops from N² samples to 2N. This is the standard implementation for Bloom, the diffusion term in reaction-diffusion, and various post-processing blurs.
**iChannel Binding**: Buffer B: iChannel0 → Buffer A (source); Buffer C: iChannel0 → Buffer B (horizontal blur result)
**Vertical blur complete code** (horizontal version in SKILL.md; vertical version symmetrically replaces the y-axis):
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 pixelSize = 1.0 / iResolution.xy;
vec2 uv = fragCoord * pixelSize;
float v = pixelSize.y;
vec4 sum = vec4(0.0);
sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 4.0*v))) * 0.05;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 3.0*v))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 2.0*v))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 1.0*v))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y ))) * 0.16;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 1.0*v))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 2.0*v))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 3.0*v))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 4.0*v))) * 0.05;
fragColor = vec4(sum.xyz / 0.98, 1.0);
}
```
**9-tap Weight Explanation**:
- Weights [0.05, 0.09, 0.12, 0.15, 0.16, 0.15, 0.12, 0.09, 0.05] approximate a Gaussian distribution with sigma≈2.0
- Total sum is 0.98, divided by 0.98 for normalization
- `fract()` implements wrap addressing
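The separability claim and the 0.98 normalization can both be verified numerically. A NumPy sketch using the 9-tap weights above (zero-padded borders instead of `fract()` wrap, purely for the comparison):

```python
import numpy as np

w1 = np.array([0.05, 0.09, 0.12, 0.15, 0.16, 0.15, 0.12, 0.09, 0.05])
assert abs(w1.sum() - 0.98) < 1e-9            # hence the /0.98 normalization

# 2D kernel = outer product of the 1D kernel with itself
k2 = np.outer(w1, w1)

rng = np.random.default_rng(0)
img = rng.random((32, 32))
pad = np.pad(img, 4)

# Full 2D convolution: 81 taps per pixel
full = sum(k2[i, j] * pad[i:i + 32, j:j + 32]
           for i in range(9) for j in range(9))

# Separable: horizontal pass (9 taps), then vertical pass (9 taps)
h = sum(w1[j] * pad[4:36, j:j + 32] for j in range(9))
hpad = np.pad(h, ((4, 4), (0, 0)))
sep = sum(w1[i] * hpad[i:i + 32, :] for i in range(9))
```

`full` and `sep` agree to machine precision, confirming that the two 1D passes reproduce the 81-tap 2D kernel with 18 taps.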
### Step 6: Structured State Storage (Texel-Addressed Registers)
**What**: Use specific pixels in a Buffer as named registers to store non-image data (positions, velocities, scores, etc.).
**Why**: GPUs have no global variables. By assigning semantic meaning to specific texel positions, arbitrary structured state can be persisted in a buffer. This enables complete game logic, particle system state, etc. to be implemented in shaders.
**Design Pattern Details**:
1. **Address Constants**: Use `const ivec2` to define the texel address for each state variable
2. **Load Function**: `texelFetch(iChannel0, addr, 0)` for exact reads (no interpolation)
3. **Store Function**: Use conditional assignment `fragColor = (px == addr) ? val : fragColor`, ensuring each pixel only writes data belonging to its own address
4. **Region Storage**: `ivec4 rect` defines rectangular regions for grid-like data (e.g., brick matrices)
5. **Discard outside data region**: `if (fragCoord.x > 14.0 || fragCoord.y > 14.0) discard;` skips unnecessary computation
**Notes**:
- `ivec2(fragCoord - 0.5)` ensures correct integer texel coordinates (fragCoord's center offset)
- Initialization must set all state values when `iFrame == 0`
- Default behavior `fragColor = loadValue(px)` keeps unmodified state unchanged
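The register pattern maps cleanly onto a CPU model. A Python sketch treating a 2D array as the state buffer — the addresses and names here are illustrative, not taken from any particular shader:

```python
import numpy as np

BALL_ADDR = (0, 0)     # hypothetical register addresses
SCORE_ADDR = (1, 0)

def load(buf, addr):
    return buf[addr]                       # texelFetch: exact read, no filtering

def store_pass(prev, writes):
    """One shader pass: every texel keeps its old value unless it owns a write."""
    out = prev.copy()                      # default: fragColor = loadValue(px)
    for addr, val in writes.items():       # conditional assignment per address
        out[addr] = val
    return out

state = np.zeros((4, 4))
state = store_pass(state, {BALL_ADDR: 0.25, SCORE_ADDR: 3.0})  # init frame
state = store_pass(state, {BALL_ADDR: 0.5})                    # next frame: only ball moves
```

Note that every pass rewrites the whole buffer; unmodified registers survive only because the default branch copies them forward, which is exactly what `fragColor = loadValue(px)` does in the shader.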
### Step 7: Inter-Frame Mouse State Tracking
**What**: Store the mouse position in specific pixels of a Buffer, and compute mouse movement delta by reading the previous frame's value.
**Why**: ShaderToy does not directly provide mouse velocity. Storing the current frame's `iMouse` in a fixed pixel allows calculating the delta in the next frame. This is critical for fluid interaction — mouse velocity is needed to apply forces.
**Comparison of Two Methods**:
| Feature | Method 1 (First Row Pixel) | Method 2 (Fixed UV Region) |
|---------|---------------------------|---------------------------|
| Source | Chimera's Breath | Reaction-Diffusion |
| Storage Location | `fragCoord.y < 1.0` | Fixed UV coordinate |
| Read Method | `texelFetch(ch, ivec2(0,0), 0)` | `texture(ch, vec2(7.5/8, 2.5/8))` |
| Advantage | Simple, suitable for fluids | Resolution-independent |
| Disadvantage | Occupies the first row of pixels | Requires extra buffer channel |
## Variant Details
### Variant 1: Temporal Accumulation Anti-Aliasing (TAA)
**Difference from basic version**: The Buffer does not perform physics simulation, but instead renders a jittered image and blends it with history frames to achieve supersampling. Uses YCoCg color space neighborhood clamping to prevent ghosting.
**How It Works**:
1. Buffer A renders the scene with sub-pixel level random jitter
2. New frames are blended with history frames at a 10:90 ratio, accumulating supersampling over time
3. The TAA buffer performs YCoCg neighborhood clamping: constraining the history frame color to the statistical range of the current frame's 3x3 neighborhood
4. A 0.75 sigma clamping range balances ghost removal and detail preservation
**Complete TAA Flow**:
```
Buffer A (render+jitter) → Buffer B (motion vectors, optional) → Buffer C (TAA blend) → Image
```
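The neighborhood clamping in step 3 can be sketched in NumPy. This simplified version clamps history to the 3×3 min/max of the current frame (a stand-in for the sigma-based YCoCg bounds described above) before the 10:90 blend:

```python
import numpy as np

def taa_blend(history, current, alpha=0.1):
    """Clamp history to the current frame's 3x3 neighborhood, then blend 10:90."""
    h, w = current.shape
    pad = np.pad(current, 1, mode='edge')
    stack = np.stack([pad[i:i + h, j:j + w]
                      for i in range(3) for j in range(3)])
    lo, hi = stack.min(axis=0), stack.max(axis=0)
    clamped = np.clip(history, lo, hi)     # reject ghosting outliers
    return alpha * current + (1 - alpha) * clamped

cur = np.full((8, 8), 0.5)
hist = np.full((8, 8), 5.0)               # stale, far-too-bright history
out = taa_blend(hist, cur)                # clamping snaps history back to 0.5
```

Without the clamp, the 90% history weight would need dozens of frames to flush the stale value; with it, an outlier is rejected in a single frame.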
### Variant 2: Deferred Rendering G-Buffer Pipeline
**Difference from basic version**: Buffers do not use self-feedback, but instead process in stages within a single frame: geometry → edge detection → post-processing.
**G-Buffer Encoding Scheme**:
- `col.xy`: View-space normal xy components (multiplied by camMat to convert to screen space)
- `col.z`: Linear depth (normalized to [0,1])
- `col.w`: Diffuse lighting + shadow information
**Edge Detection Principle**:
- The `checkSame` function compares normal and depth differences between adjacent pixels
- `Sensitivity.x` controls normal edge sensitivity
- `Sensitivity.y` controls depth edge sensitivity
- Threshold 0.1 determines the edge detection criterion
### Variant 3: HDR Bloom Post-Processing Pipeline
**Difference from basic version**: Uses Buffers to build a MIP pyramid, achieving wide-range glow through multiple levels of downsampling and blur.
**MIP Pyramid Packing Strategy**:
- All MIP levels are packed into a single texture
- `CalcOffset` computes the offset position of each level within the texture
- Each level is half the size, with padding to prevent inter-level leakage
**Complete Bloom Pipeline**:
```
Buffer A (scene render) → Buffer B (MIP pyramid) → Buffer C (horizontal blur) → Buffer D (vertical blur) → Image (compositing)
```
**Tone Mapping**:
```glsl
// Reinhard tone mapping
color = pow(color, vec3(1.5)); // Gamma preprocessing
color = color / (1.0 + color); // Reinhard compression
```
### Variant 4: Reaction-Diffusion System
**Difference from basic version**: Simulates chemical reaction-diffusion (e.g., Gray-Scott model). Diffusion is implemented via separable blur, and the reaction term is computed in the main buffer.
**Gray-Scott Equations**:
- `∂u/∂t = Du∇²u - uv² + F(1-u)` — Diffusion and reaction of chemical substance u
- `∂v/∂t = Dv∇²v + uv² - (F+k)v` — Diffusion and reaction of chemical substance v
- `Du`, `Dv` are diffusion coefficients, `F` is the feed rate, `k` is the kill rate
**Implementation Strategy**:
- The diffusion term is implemented via separable blur buffers (reusing the blur pipeline from Step 5)
- The reaction term is computed in the main buffer
- The offset of `uv_red` implements diffusion expansion
- Random noise decay prevents pattern stagnation
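A single explicit-Euler step of the Gray-Scott equations as a NumPy reference — the parameter values (`F=0.037`, `k=0.06`, `Du=0.16`, `Dv=0.08`) are common textbook choices, not taken from the original shader:

```python
import numpy as np

def laplacian(f):
    """5-point stencil with wrap addressing (matches fract() sampling)."""
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
            np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f)

def gray_scott_step(u, v, Du=0.16, Dv=0.08, F=0.037, k=0.06, dt=1.0):
    uvv = u * v * v                               # the shared reaction term
    u2 = u + dt * (Du * laplacian(u) - uvv + F * (1.0 - u))
    v2 = v + dt * (Dv * laplacian(v) + uvv - (F + k) * v)
    return u2, v2

u = np.ones((32, 32))
v = np.zeros((32, 32))
v[14:18, 14:18] = 0.5                             # seed a square of chemical v
for _ in range(10):
    u, v = gray_scott_step(u, v)                  # u is consumed where v lives
```

In the buffer version, the `laplacian` call is what gets replaced by the separable blur passes; the reaction lines stay in the main buffer.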
### Variant 5: Multi-Scale MIP Fluid
**Difference from basic version**: Uses `textureLod` to explicitly sample different MIP levels, achieving O(n) complexity multi-scale computation (turbulence, vorticity confinement, Poisson solving), with each physical quantity in its own buffer.
**Core Advantage**:
- Traditional multi-scale computation requires O(N²) samples (sampling N neighbors at each scale)
- MIP sampling leverages hardware automatic averaging; a single `textureLod` at high MIP levels is equivalent to a large-range mean
- Total complexity drops to O(NUM_SCALES × 9) (3x3 neighborhood per scale)
**Weight Function Choices**:
- `1.0/float(i+1)`: Logarithmic decay, reduces large-scale influence
- `1.0/float(1<<i)`: Exponential decay, rapidly suppresses large scales
- Constant: Equal weight for all scales
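The claim that one high-level `textureLod` tap equals a large-area mean can be checked with a NumPy mip chain (box-filter downsampling, which is what successive 2× averaging produces):

```python
import numpy as np

def mip_chain(img, levels):
    """Each level averages 2x2 blocks of the previous one."""
    mips = [img]
    for _ in range(levels):
        a = mips[-1]
        mips.append(0.25 * (a[0::2, 0::2] + a[1::2, 0::2] +
                            a[0::2, 1::2] + a[1::2, 1::2]))
    return mips

rng = np.random.default_rng(1)
img = rng.random((64, 64))
mips = mip_chain(img, 3)

# One texel at MIP level 3 == mean of the corresponding 8x8 block
texel = mips[3][2, 5]
block_mean = img[16:24, 40:48].mean()
```

One level-3 fetch thus replaces 64 base-level samples exactly; actual GPU mip filtering may differ slightly (rounding, non-box filters), but the scaling argument is the same.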
## In-Depth Performance Optimization
### 1. Reduce Texture Samples
**Separable Blur**:
- Principle: The 2D Gaussian function G(x,y) = G(x) × G(y) can be separated into two 1D convolutions
- An NxN kernel drops from N² to 2N samples
- 9-tap example: 81 → 18 samples
**Bilinear Tap Trick**:
```glsl
// Standard 9-tap: requires 9 samples
// Bilinear optimization: achieves equivalent results with 5 samples using hardware interpolation
// Key: place sample points between two texels, GPU hardware automatically computes weighted average
float offset1 = 1.0 + weight2 / (weight1 + weight2); // Offset encodes weight ratio
vec4 s1 = texture(smp, uv + vec2(offset1, 0) * texelSize);
// s1 is automatically the weighted average of texel[1] and texel[2]
```
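The bilinear-tap trick relies on hardware linear interpolation reproducing a two-texel weighted sum. A NumPy check with illustrative weights:

```python
import numpy as np

texels = np.array([0.3, 0.8, 0.1])         # texel[0..2] along one axis
w1, w2 = 0.15, 0.12                        # desired weights for texel[1], texel[2]

# Single bilinear tap placed between texel 1 and texel 2
offset = 1.0 + w2 / (w1 + w2)              # fractional position encodes the ratio
f = offset - np.floor(offset)
tap = (1 - f) * texels[1] + f * texels[2]  # what the GPU's bilinear filter returns

direct = w1 * texels[1] + w2 * texels[2]   # the two explicit taps it replaces
# (w1 + w2) * tap == direct, so one filtered sample does the work of two
```

The caller scales the single tap by the combined weight `w1 + w2`, so a 9-tap kernel collapses to 4 offset taps plus the center.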
**MIP Sampling Replaces Large Kernels**:
- `textureLod(smp, uv, 3.0)` samples MIP level 3, equivalent to an 8×8 area mean
- A single sample replaces 64 samples
- Suitable for coarse-scale approximation in multi-scale computation
### 2. Limit Computation Region
**Data Region Discard**:
```glsl
// In a state storage shader, only the first 14×14 pixels store data
// Remaining pixels are discarded, GPU skips subsequent computation
if (fragCoord.x > 14.0 || fragCoord.y > 14.0) discard;
```
**Soft Boundaries**:
```glsl
// Use smoothstep instead of if-statements
// Avoids branch divergence (warp divergence), more efficient on GPU
data.y *= smoothstep(0.5, 0.48, abs(uv.y - 0.5));
// Smoothly decays to 0 in the y=0.48~0.52 range
```
### 3. Reduce Buffer Count
**RGBA Channel Packing**:
| Channel | Fluid Simulation | G-Buffer | Particle System |
|---------|-----------------|----------|----------------|
| R | Velocity x | Normal x | Position x |
| G | Velocity y | Normal y | Position y |
| B | Density | Depth | Lifetime |
| A | Vorticity | Diffuse | Type ID |
**Chained Sub-Steps**:
- 3 buffers running identical code = 3 iterations per frame
- Equivalent to 3x time step, but more stable (each step is still a small step)
- Code is shared via the Common tab, zero maintenance cost
### 4. Reduce Iteration/Sample Count
**Adaptive Loop Termination**:
```glsl
// In multi-scale sampling, exit early when the sampling radius exceeds the effective range
float bbMax = 0.7 * iResolution.y;
bbMax *= bbMax;
for (int l = 0; l < SCALE_NUM; l++) {
if (dot(b, b) > bbMax) break; // Beyond screen range, no need to continue
// ...
b *= 2.0;
}
```
**MIP Level Count Adjustment**:
- `TURBULENCE_SCALES = 11`: Full multi-scale, highest quality
- `TURBULENCE_SCALES = 7`: Removes the largest scales, minimal quality loss
- `TURBULENCE_SCALES = 5`: Noticeable speedup, suitable for mobile
### 5. Initialization Strategy
**Progressive Initialization**:
```glsl
// Output stable initial values for the first 20 frames
if (iFrame < 20) data = vec4(0.5, 0, 0, 0);
```
- Why not `iFrame == 0`? Because some buffers depend on the output of other buffers
- 20 frames ensures all buffers complete initialization propagation
**Tiny Noise Initialization**:
```glsl
if (iFrame == 0) fragColor = 1e-6 * noise;
```
- Avoids exact zero values causing `0/0` or `normalize(vec2(0))` problems
- Tiny noise breaks symmetry, allowing vortices to develop naturally
## Combination Examples with Complete Code
### 1. Fluid Simulation + Lighting
```glsl
// Image: Compute gradient from fluid buffer as normal, apply Phong lighting
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
float delta = 1.0 / iResolution.y;
// Compute fluid surface gradient
float valC = getVal(uv);
vec2 grad = vec2(
getVal(uv + vec2(delta, 0)) - getVal(uv - vec2(delta, 0)),
getVal(uv + vec2(0, delta)) - getVal(uv - vec2(0, delta))
) / delta;
// Build normal (z=150 controls surface flatness)
vec3 normal = normalize(vec3(grad, 150.0));
// Lighting
vec3 lightDir = normalize(vec3(-1.0, -1.0, 2.0));
vec3 viewDir = vec3(0, 0, 1);
float diff = clamp(dot(normal, lightDir), 0.5, 1.0);
float spec = pow(clamp(dot(reflect(lightDir, normal), viewDir), 0.0, 1.0), 36.0);
vec3 baseColor = vec3(0.2, 0.4, 0.8); // Water surface color
fragColor = vec4(baseColor * diff + vec3(1.0) * spec * 0.5, 1.0);
}
```
### 2. Fluid Simulation + Color Advection
```glsl
// Color Buffer: Track a color field, advected by the velocity field
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 w = 1.0 / iResolution.xy;
float dt = 0.15;
float scale = 3.0;
// Read velocity field
vec2 velocity = textureLod(iChannel0, uv, 0.0).xy;
// Color advection: sample own previous frame in the reverse velocity direction
vec4 col = textureLod(iChannel1, uv - dt * velocity * w * scale, 0.0);
// Inject color at the emission point
vec2 emitPos = vec2(0.5, 0.5);
float dist = length(uv - emitPos);
float emitterStrength = 0.0025;
float epsilon = 0.0005;
col += emitterStrength / (epsilon + pow(dist, 1.75)) * dt * 0.12 * palette(iTime * 0.05);
// Color decay
float decay = 0.004;
col = max(col - (0.0001 + col * decay) * 0.5, 0.0);
col = clamp(col, 0.0, 5.0);
fragColor = col;
}
```
### 3. Scene Rendering + Bloom + TAA Post-Processing Chain
Four-Buffer pipeline:
- **Buffer A**: Scene rendering (with sub-pixel jitter for TAA)
- **Buffer B**: Brightness extraction + downsampling to build bloom pyramid
- **Buffer C/D**: Separable Gaussian blur
- **Image**: Bloom compositing + tone mapping + chromatic aberration + vignette
```glsl
// Image: Final compositing
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
// Original scene
vec3 scene = texture(iChannel0, uv).rgb;
// Multi-level bloom compositing
vec3 bloom = vec3(0);
bloom += Grab(uv, 1.0, CalcOffset(0.0)).rgb * 1.0;
bloom += Grab(uv, 2.0, CalcOffset(1.0)).rgb * 1.5;
bloom += Grab(uv, 4.0, CalcOffset(2.0)).rgb * 2.0;
bloom += Grab(uv, 8.0, CalcOffset(3.0)).rgb * 3.0;
// Compositing
vec3 color = scene + bloom * 0.08;
// Filmic tone mapping
color = pow(color, vec3(1.5));
color = color / (1.0 + color);
// Chromatic Aberration
float ca = 0.002;
color.r = texture(iChannel0, uv + vec2(ca, 0)).r;
color.b = texture(iChannel0, uv - vec2(ca, 0)).b;
// Vignette
float vignette = 1.0 - dot(uv - 0.5, uv - 0.5) * 0.5;
color *= vignette;
fragColor = vec4(color, 1.0);
}
```
### 4. G-Buffer + Screen-Space Effects
Two-Buffer pipeline, no temporal feedback:
- **Buffer A**: Output normals + depth + diffuse to G-Buffer
- **Buffer B**: Screen-space edge detection / SSAO / SSR
- **Image**: Stylized compositing (e.g., hand-drawn style, noise distortion)
```glsl
// Buffer B: Screen-space edge detection
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 offset = 1.0 / iResolution.xy;
vec4 center = texture(iChannel0, uv);
// Roberts Cross edge detection
vec4 tl = texture(iChannel0, uv + vec2(-offset.x, offset.y));
vec4 tr = texture(iChannel0, uv + vec2(offset.x, offset.y));
vec4 bl = texture(iChannel0, uv + vec2(-offset.x, -offset.y));
vec4 br = texture(iChannel0, uv + vec2(offset.x, -offset.y));
float edge = checkSame(center, tl) * checkSame(center, tr) *
checkSame(center, bl) * checkSame(center, br);
fragColor = vec4(edge, center.w, center.z, 1.0);
}
```
### 5. State Storage + Visualization Separation
Standard pattern for games/particle systems. Logic and rendering are fully separated:
- **Buffer A**: Pure logic computation, state stored in fixed texel positions
- **Image**: Pure rendering, reads state via `texelFetch`, draws visuals using distance fields/rasterization
```glsl
// Image: Read game state from Buffer A and render
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 aspect = vec2(iResolution.x / iResolution.y, 1.0);
// Read ball state
vec4 ballPV = texelFetch(iChannel0, ivec2(0, 0), 0);
vec2 ballPos = ballPV.xy;
// Read paddle position
float paddleX = texelFetch(iChannel0, ivec2(1, 0), 0).x;
// Draw ball (distance field)
float ballDist = length((uv - ballPos * 0.5 - 0.5) * aspect);
vec3 ballColor = vec3(1.0, 0.8, 0.2) * smoothstep(0.02, 0.015, ballDist);
// Draw paddle
vec2 paddleCenter = vec2(paddleX * 0.5 + 0.5, 0.05);
vec2 paddleSize = vec2(0.08, 0.01);
vec2 d = abs((uv - paddleCenter) * aspect) - paddleSize;
float paddleDist = length(max(d, 0.0));
vec3 paddleColor = vec3(0.2, 0.6, 1.0) * smoothstep(0.005, 0.0, paddleDist);
// Read and draw brick grid
vec3 brickColor = vec3(0);
for (int y = 1; y <= 12; y++) {
for (int x = 0; x <= 13; x++) {
float alive = texelFetch(iChannel0, ivec2(x, y), 0).x;
if (alive > 0.5) {
vec2 brickCenter = vec2(float(x) / 14.0 + 0.036, float(y) / 14.0 + 0.036);
vec2 bd = abs((uv - brickCenter) * aspect) - vec2(0.03, 0.015);
float brickDist = length(max(bd, 0.0));
brickColor += vec3(0.8, 0.3, 0.5) * smoothstep(0.003, 0.0, brickDist);
}
}
}
fragColor = vec4(ballColor + paddleColor + brickColor, 1.0);
}
```

View File

@@ -0,0 +1,418 @@
# SDF Normal Estimation — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing prerequisite knowledge, step-by-step explanations, mathematical derivations, variant analysis, and complete combination code examples.
---
## Prerequisites
### GLSL Fundamentals
- **Vector types**: `vec2`/`vec3` operations, swizzle syntax (`.xyy`, `.yxy`, `.yyx`)
- Swizzle is used in normal estimation to quickly construct three-axis offset vectors from `vec2(h, 0.0)`
### Vector Calculus
- **Gradient concept**: The gradient `∇f` of a scalar field `f(x, y, z)` is a vector pointing in the direction of the fastest increase of the function value
- For an SDF, the gradient direction is the **outward surface normal direction**
- Mathematical definition of gradient: `∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z)`
### SDF Concepts
- `map(p)` returns the signed distance from point `p` to the nearest surface
- Positive = outside the surface, negative = inside, zero = exactly on the surface
- An ideal SDF has a gradient magnitude of exactly 1 (Eikonal equation), but in practice this may deviate after boolean operations or deformations
### Numerical Differentiation
- **Finite differences** to approximate derivatives: `f'(x) ≈ (f(x+h) - f(x-h)) / 2h` (central difference)
- Or `f'(x) ≈ (f(x+h) - f(x)) / h` (forward difference)
- Forward difference accuracy is O(h), central difference accuracy is O(h²)
---
## Implementation Steps in Detail
### Step 1: Define the SDF Scene Function
**What**: Create a `map(vec3 p) -> float` function that returns the signed distance from any point in space to the scene surface.
**Why**: All normal estimation methods need to repeatedly call this function to sample the distance field. The normal function itself does not care about the SDF shape — it only needs to query distance values at different positions in space.
```glsl
float map(vec3 p) {
float d = length(p) - 1.0; // Unit sphere
// Can compose more SDF primitives
return d;
}
```
### Step 2: Choose a Difference Method and Implement the Normal Function
#### Method A: Forward Differences — 4 Samples
**What**: Sample the SDF at point `p` and at three axis-aligned offsets, using the differences to build the gradient.
**Why**: The simplest and most intuitive approach. Requires 4 samples (`map(p)` once + three offsets once each), suitable for beginners and performance-sensitive scenarios with lower accuracy requirements.
**Mathematical derivation**:
- `∂f/∂x ≈ (f(x+ε, y, z) - f(x, y, z)) / ε`
- Since we `normalize()` at the end, the constant denominator `ε` can be omitted
- Thus `n = normalize(vec3(map(p+εx̂), map(p+εŷ), map(p+εẑ)) - map(p))`
```glsl
// Classic forward difference
const float EPSILON = 1e-3;
vec3 getNormal(vec3 p) {
vec3 n;
n.x = map(vec3(p.x + EPSILON, p.y, p.z));
n.y = map(vec3(p.x, p.y + EPSILON, p.z));
n.z = map(vec3(p.x, p.y, p.z + EPSILON));
return normalize(n - map(p));
}
```
#### Method B: Central Differences — 6 Samples
**What**: Sample once in each positive and negative direction per axis, taking the difference.
**Why**: Symmetric sampling eliminates the first-order error term, improving accuracy from O(ε) to O(ε²). The cost is 6 SDF calls.
**Mathematical derivation**:
- Taylor expansion: `f(x+ε) = f(x) + εf'(x) + ε²f''(x)/2 + ...`
- `f(x-ε) = f(x) - εf'(x) + ε²f''(x)/2 - ...`
- Subtraction: `f(x+ε) - f(x-ε) = 2εf'(x) + O(ε³)`
- The first-order error term is eliminated, improving accuracy by one order
```glsl
// Compact swizzle notation
vec3 getNormal(vec3 p) {
vec2 o = vec2(0.001, 0.0);
return normalize(vec3(
map(p + o.xyy) - map(p - o.xyy),
map(p + o.yxy) - map(p - o.yxy),
map(p + o.yyx) - map(p - o.yyx)
));
}
```
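The accuracy orders quoted above can be demonstrated numerically: halving `h` should roughly halve the forward-difference error (first order) but quarter the central-difference error (second order). A small check on `sin`:

```python
import math

f, df = math.sin, math.cos                 # test function and exact derivative
x = 0.7

def fwd_err(h):
    return abs((f(x + h) - f(x)) / h - df(x))

def cen_err(h):
    return abs((f(x + h) - f(x - h)) / (2 * h) - df(x))

r_fwd = fwd_err(1e-3) / fwd_err(5e-4)      # ~2: error scales like h
r_cen = cen_err(1e-3) / cen_err(5e-4)      # ~4: error scales like h^2
```

The same scaling governs SDF normals: central differences tolerate a larger epsilon for the same quality, which matters once floating-point noise sets a lower bound on usable offsets.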
#### Method C: Tetrahedron Technique — 4 Samples (Recommended)
**What**: Sample the SDF along the 4 vertices of a regular tetrahedron, computing the weighted sum to obtain the gradient.
**Why**: Requires only 4 samples (2 fewer than central difference), yet is more accurate and symmetric than forward difference.
**Mathematical derivation**:
- The 4 vertices of a regular tetrahedron: `(+,+,+)`, `(+,-,-)`, `(-,+,-)`, `(-,-,+)`
- The coefficient `0.5773 ≈ 1/√3` normalizes the vertices onto the unit sphere
- The weighted sum `Σ eᵢ·map(p + eᵢ·ε)` is equivalent to a gradient estimate in 4 symmetric directions
- Due to the perfect symmetry of the tetrahedron, error distribution is more uniform than forward difference
- Actual accuracy falls between forward and central difference, but only requires 4 samples
```glsl
// Classic tetrahedron technique
vec3 calcNormal(vec3 pos) {
float eps = 0.0005; // Adjustable: sample offset
vec2 e = vec2(1.0, -1.0) * 0.5773;
return normalize(
e.xyy * map(pos + e.xyy * eps) +
e.yyx * map(pos + e.yyx * eps) +
e.yxy * map(pos + e.yxy * eps) +
e.xxx * map(pos + e.xxx * eps)
);
}
```
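A NumPy port of `calcNormal` can be checked against the analytic normal of a unit sphere, where the true normal is simply the normalized position:

```python
import numpy as np

def sdf(p):
    return np.linalg.norm(p) - 1.0          # unit sphere, as in map() above

def calc_normal(p, eps=0.0005):
    k = 0.5773                               # ~1/sqrt(3), vertices on unit sphere
    dirs = k * np.array([[1, -1, -1],
                         [-1, -1, 1],
                         [-1, 1, -1],
                         [1, 1, 1.0]])       # the four tetrahedron vertices
    n = sum(e * sdf(p + e * eps) for e in dirs)
    return n / np.linalg.norm(n)

p = np.array([0.6, 0.48, 0.64])
p = p / np.linalg.norm(p)                   # a point exactly on the sphere
n = calc_normal(p)                          # should align with p
```

The four symmetric directions sum to zero, so the constant part of the SDF cancels and only the gradient contribution survives — the same cancellation that makes the GLSL weighted sum work.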
### Step 3: Normalize and Apply to Lighting
**What**: Call `normalize()` on the gradient vector to obtain the unit normal for subsequent lighting calculations.
**Why**: The gradient obtained from finite differences has a length that depends on the local gradient magnitude of the SDF, while lighting calculations require unit vectors. For an ideal SDF (gradient magnitude of 1) the finite-difference gradient is already nearly unit length, but for SDFs that have undergone boolean operations or deformations the magnitude may deviate from 1, and `normalize()` ensures correct results in both cases.
```glsl
// After a raymarching hit
vec3 pos = ro + rd * t; // Hit point
vec3 nor = calcNormal(pos); // Surface normal
// Basic Lambertian diffuse
vec3 lightDir = normalize(vec3(1.0, 4.0, -4.0));
float diff = max(dot(nor, lightDir), 0.0);
vec3 col = vec3(0.8) * diff;
```
---
## Variant Details
### Variant 1: Reverse Offset Forward Difference
**Difference from base version**: Uses center point minus three negative-direction offset samples, rather than positive-direction offsets minus center. Functionally equivalent to forward difference, but with a more compact code structure.
**Principle**: `map(p) - map(p - εx̂)` is equivalent to the mirror version of `map(p + εx̂) - map(p)`. Since we normalize at the end, the direction is unchanged.
```glsl
// Reverse offset variant
vec2 noff = vec2(0.001, 0.0);
vec3 normal = normalize(
map(pos) - vec3(
map(pos - noff.xyy),
map(pos - noff.yxy),
map(pos - noff.yyx)
)
);
```
### Variant 2: Adaptive Epsilon (Distance Scaling)
**Difference from base version**: Epsilon is multiplied by the ray travel distance `t`, using larger offsets for distant surfaces (avoiding floating-point noise) and smaller offsets for nearby surfaces (preserving detail).
**Principle**: The farther the ray distance, the lower the floating-point precision (since absolute error is proportional to the magnitude of the value). Meanwhile, distant pixels cover a larger world-space area and don't need high-precision normals. Adaptive epsilon naturally matches both requirements.
**Typical coefficient**: `0.001 * t`, where `0.001` can be adjusted based on scene complexity.
```glsl
// Adaptive epsilon with tetrahedron technique
vec3 calcNormal(vec3 pos, float t) {
float precis = 0.001 * t; // Adjustable: base coefficient 0.001
vec2 e = vec2(1.0, -1.0) * precis;
return normalize(
e.xyy * map(pos + e.xyy) +
e.yyx * map(pos + e.yyx) +
e.yxy * map(pos + e.yxy) +
e.xxx * map(pos + e.xxx)
);
}
// Usage: vec3 nor = calcNormal(pos, t);
```
### Variant 3: Large Epsilon Rounding / Anti-Aliasing Trick
**Difference from base version**: Intentionally uses a large epsilon (e.g., `0.015`), causing normals to "blur" at geometric edges, producing a visual rounding and anti-aliasing effect.
**Principle**: A large epsilon means the normal sampling spans a larger spatial range. At sharp edges of geometry, the SDF value changes on both sides are averaged out, causing normals to transition smoothly at edges, similar to a chamfer/fillet effect.
**Use cases**: Procedural architecture, mechanical parts, and other scenarios needing visual rounding without modifying the SDF geometry.
```glsl
// Large epsilon rounding technique
vec3 getNormal(vec3 p) {
vec2 e = vec2(0.015, -0.015); // Intentionally enlarged epsilon
return normalize(
e.xyy * map(p + e.xyy) +
e.yyx * map(p + e.yyx) +
e.yxy * map(p + e.yxy) +
e.xxx * map(p + e.xxx)
);
}
```
### Variant 4: Anti-Inlining Loop Trick
**Difference from base version**: Writes the tetrahedron's 4 samples as a `for` loop with bit operations to generate vertex directions, preventing the GLSL compiler from inlining `map()` 4 times, significantly reducing compile times for complex scenes.
**Principle**:
- GLSL compilers typically unroll small loops and inline function calls
- For complex `map()` functions (e.g., hundreds of lines), being inlined 4 times causes code bloat
- `#define ZERO (min(iFrame, 0))` makes the loop bound a runtime value (though it is always 0 in practice), preventing the compiler from unrolling at compile time
- Bit operations `(((i+3)>>1)&1)` etc. generate the 4 tetrahedron vertex directions at runtime, equivalent to hand-written `e.xyy`, `e.yyx`, `e.yxy`, `e.xxx`
**Bit operation correspondence**:
| i | `(((i+3)>>1)&1)` | `((i>>1)&1)` | `(i&1)` | Direction |
|---|---|---|---|---|
| 0 | 1 | 0 | 0 | (+,-,-) |
| 1 | 0 | 0 | 1 | (-,-,+) |
| 2 | 0 | 1 | 0 | (-,+,-) |
| 3 | 1 | 1 | 1 | (+,+,+) |
```glsl
// Anti-inlining loop trick
#define ZERO (min(iFrame, 0)) // Prevent compile-time constant folding
vec3 calcNormal(vec3 p, float t) {
vec3 n = vec3(0.0);
for (int i = ZERO; i < 4; i++) {
vec3 e = 0.5773 * (2.0 * vec3(
(((i + 3) >> 1) & 1),
((i >> 1) & 1),
(i & 1)
) - 1.0);
n += e * map(p + e * 0.001 * t);
}
return normalize(n);
}
```
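The bit-operation table can be verified directly; a short Python check that the loop generates exactly the four alternating-sign tetrahedron vertices, in the order listed:

```python
dirs = []
for i in range(4):
    bits = (((i + 3) >> 1) & 1, (i >> 1) & 1, i & 1)
    dirs.append(tuple(2 * b - 1 for b in bits))   # map bit {0,1} to sign {-1,+1}

# The four tetrahedron vertices (before the 0.5773 scale)
expected = {(1, -1, -1), (-1, -1, 1), (-1, 1, -1), (1, 1, 1)}
```

Each bit selects the sign of one axis, so the runtime loop reproduces `e.xyy`, `e.yyx`, `e.yxy`, `e.xxx` without giving the compiler anything it can fold at compile time.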
### Variant 5: Normal + Edge Detection (Dual-Purpose Sampling)
**Difference from base version**: On top of the 6+1 samples from central difference, additionally computes a Laplacian approximation (deviation of per-axis sample averages from the center value) for detecting surface discontinuities (edges).
**Principle**:
- The Laplacian operator `∇²f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z²` measures local curvature
- Numerical approximation: `∂²f/∂x² ≈ (f(x+h) + f(x-h) - 2f(x)) / h²`
- At surface discontinuities (edges, cracks), the Laplacian value spikes
- In the code, `abs(d - 0.5*(d2+d1))` is the Laplacian approximation on the x axis (omitting constant factors)
- `pow(edge, 0.55) * 15.0` is an empirical contrast adjustment
```glsl
// Normal + edge detection (dual-purpose sampling)
float edge = 0.0;
vec3 normal(vec3 p) {
vec3 e = vec3(0.0, det * 5.0, 0.0); // det = detail level
float d1 = de(p - e.yxx), d2 = de(p + e.yxx);
float d3 = de(p - e.xyx), d4 = de(p + e.xyx);
float d5 = de(p - e.xxy), d6 = de(p + e.xxy);
float d = de(p);
// Laplacian edge detection: deviation of center value from per-axis averages
edge = abs(d - 0.5 * (d2 + d1))
+ abs(d - 0.5 * (d4 + d3))
+ abs(d - 0.5 * (d6 + d5));
edge = min(1.0, pow(edge, 0.55) * 15.0);
return normalize(vec3(d1 - d2, d3 - d4, d5 - d6));
}
```
---
## Performance Optimization In-Depth Analysis
### Bottleneck 1: SDF Sample Count
Normal estimation is the **second-largest SDF call hotspot** in the raymarching pipeline, after the marching loop itself. Every pixel calls the normal function once upon hitting a surface, and the normal function internally calls `map()` 4~7 times.
| Method | Samples | Accuracy | Recommendation |
|--------|---------|----------|----------------|
| Forward difference | 4 | O(ε) | Simple scenes |
| Reverse offset difference | 4 | O(ε) | Same as forward, more compact code |
| Tetrahedron technique | 4 | Between forward and central | **Preferred** |
| Central difference | 6 | O(ε²) | When symmetry is needed |
| Central difference + edge | 7 | O(ε²) + extra info | When edge detection is needed |
**Recommendation**: Default to the tetrahedron technique; only switch to central difference when visual artifacts (e.g., jagged normals) appear.
### Bottleneck 2: Compile Time Explosion
Complex SDFs (e.g., `map()` functions with hundreds of lines) inlined 4~6 times by the normal function can cause compile times to grow from seconds to minutes.
**Root cause**: GLSL compilers attempt to unroll small loops and inline function calls, duplicating the `map()` code 4~6 times.
**Solution**: Use the anti-inlining loop trick (Variant 4), combined with `#define ZERO (min(iFrame, 0))` to prevent the compiler from unrolling at compile time. This keeps only one copy of the `map()` code, called in a runtime loop.
### Bottleneck 3: Epsilon Selection
| Epsilon Range | Effect |
|---------------|--------|
| < 1e-5 | Insufficient floating-point precision, normals show noise spots |
| 0.0005 ~ 0.001 | **Recommended default** |
| 0.01 ~ 0.02 | Slight smoothing / rounding effect |
| > 0.05 | Detail loss, geometric edges overly smoothed |
**Best practice**: Use adaptive epsilon `eps * t`, where `eps ≈ 0.001` and `t` is the ray distance. This preserves detail up close and avoids floating-point noise at distance.
### Bottleneck 4: Avoiding Redundant Sampling
If the same position needs both normals and other information (e.g., edge detection, AO pre-estimation), reuse SDF sampling results whenever possible. Variant 5 is a good example: on top of the 6 samples for normal computation, only 1 additional center sample is needed for edge detection, saving nearly half compared to computing normals and edge detection separately (13 samples total).
---
## Combination Suggestions with Full Code
### 1. Normal + Soft Shadow
After the normal determines surface orientation, a secondary raymarch from the hit point toward the light source computes the soft shadow. The normal is used to offset the starting point to avoid self-intersection:
```glsl
float shadow = calcSoftShadow(pos + nor * 0.01, sunDir, 16.0);
```
A complete soft shadow function typically looks like this:
```glsl
float calcSoftShadow(vec3 ro, vec3 rd, float k) {
float res = 1.0;
float t = 0.01;
for (int i = 0; i < 64; i++) {
float h = map(ro + rd * t);
res = min(res, k * h / t);
if (res < 0.001) break;
t += clamp(h, 0.01, 0.2);
}
return clamp(res, 0.0, 1.0);
}
```
### 2. Normal + Ambient Occlusion (AO)
The normal direction defines the sampling hemisphere for AO. Sampling the SDF along the normal with increasing step sizes — if the actual distance is less than the expected distance (i.e., nearby geometry is occluding), the AO value decreases:
```glsl
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0;
float d = map(pos + nor * h);
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
```
**Parameter notes**:
- `0.01 + 0.12 * float(i) / 4.0`: Sample step from 0.01 to 0.13, covering near-distance occlusion
- `sca *= 0.95`: Decreasing weight for farther samples
- `3.0 * occ`: Contrast adjustment coefficient
### 3. Normal + Fresnel Effect
The angle between the normal and view direction controls Fresnel reflection intensity. At grazing angles (normal nearly perpendicular to view), reflection is strongest:
```glsl
float fresnel = pow(clamp(1.0 + dot(nor, rd), 0.0, 1.0), 5.0);
col = mix(col, envColor, fresnel);
```
**Principle**: `dot(nor, rd)` is close to -1 when the surface directly faces the viewer (`rd` points in the view direction, normal points outward) and close to 0 at grazing angles. Adding 1 shifts the range to [0, 1]; taking the 5th power enhances contrast.
### 4. Normal + Bump Mapping
Procedural perturbation layered on top of SDF normals adds surface detail without modifying the geometry:
```glsl
vec3 doBumpMap(vec3 pos, vec3 nor) {
vec2 e = vec2(0.001, 0.0);
float bump = texture(iChannel0, pos.xz * 0.5).x;
float bx = texture(iChannel0, (pos.xz + e.xy) * 0.5).x;
float bz = texture(iChannel0, (pos.xz + e.yx) * 0.5).x;
vec3 grad = vec3(bx - bump, 0.0, bz - bump) / e.x;
return normalize(nor + grad * 0.1); // 0.1 controls bump intensity
}
```
**Principle**: Computes the height map gradient in texture space and adds it to the geometric normal. `0.1` controls the visual bump strength — larger values make the surface appear rougher.
### 5. Normal + Triplanar Mapping
The absolute values of the normal components serve as blending weights for triplanar texturing, achieving UV-free texturing:
```glsl
vec3 triplanar(sampler2D tex, vec3 pos, vec3 nor) {
vec3 w = pow(abs(nor), vec3(4.0));
w /= (w.x + w.y + w.z);
return texture(tex, pos.yz).rgb * w.x
+ texture(tex, pos.zx).rgb * w.y
+ texture(tex, pos.xy).rgb * w.z;
}
```
**Principle**:
- Faces with normals pointing along the X axis use YZ plane projection
- Faces with normals pointing along the Y axis use ZX plane projection
- Faces with normals pointing along the Z axis use XY plane projection
- `pow(abs(nor), vec3(4.0))` makes blending sharper, reducing blurring in transition regions
- Normalized weights `w /= (w.x + w.y + w.z)` ensure total weight sums to 1

---
# Particle Systems — Detailed Reference
This document is a detailed supplement to SKILL.md, containing prerequisites, in-depth explanations of each step, variant details, performance optimization analysis, and complete code for combination suggestions.
> **NOTE:** Code examples in this document primarily target the ShaderToy environment. **For standalone HTML deployment, refer to the WebGL2 single-file template in SKILL.md**, which includes complete HTML + JS + GLSL code.
## Prerequisites
- GLSL basic syntax (uniforms, varyings, built-in functions)
- 2D/3D vector math (dot product, cross product, normalization, matrix rotation)
- ShaderToy architecture (`mainImage`, `iTime`, `iResolution`, `iChannel`, multi-Buffer passes)
- Basic physics concepts: velocity = derivative of position, acceleration = force/mass
- Usage of `texelFetch` (precise pixel data reading from Buffers)
## Implementation Steps in Detail
### Step 1: Hash Random Functions
**What**: Define a function that generates pseudo-random numbers from a float (particle ID). This is the foundational infrastructure for all particle systems.
**Why**: Each particle needs unique but deterministic attributes (color, size, initial direction, etc.); hash functions provide repeatable "randomness".
Three hash function dimensions are provided:
- `hash11`: 1D → 1D, for scalar randomness (lifetime, brightness, etc.)
- `hash12`: 1D → 2D, for 2D randomness (initial position, etc.)
- `hash33`: 3D → 3D, for 3D velocity perturbation
```glsl
// Standard 1D -> 1D hash, returns [0, 1)
float hash11(float p) {
return fract(sin(p * 127.1) * 43758.5453);
}
// 1D -> 2D hash, for 2D randomness
vec2 hash12(float p) {
vec3 p3 = fract(vec3(p) * vec3(0.1031, 0.1030, 0.0973));
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.xx + p3.yz) * p3.zy);
}
// 3D -> 3D hash, for 3D velocity perturbation
vec3 hash33(vec3 p) {
p = fract(p * vec3(443.897, 397.297, 491.187));
p += dot(p.zxy, p.yxz + 19.19);
return fract(vec3(p.x * p.y, p.z * p.x, p.y * p.z)) - 0.5;
}
```
### Step 2: Particle Lifecycle Management
**What**: Compute birth time, lifespan, current age for each particle, and auto-respawn after death.
**Why**: Lifecycle is the core mechanism of particle systems — the cycle of birth, motion, fade-out, death, and respawn. `fract` or `mod` implements infinite cycling without additional state.
Key design:
- `spawnTime`: Each particle's birth time differs, generated by `hash11` from the ID, spread across the `[0, START_TIME]` interval
- `lifetime`: Each particle's lifespan differs, random within the `[LIFETIME_MIN, LIFETIME_MAX]` interval
- `mod(time - spawnTime, lifetime)`: Automatic cycling; the particle respawns immediately after death
- `floor(...)` computes the current life cycle number, used to generate different random attributes each cycle
```glsl
#define NUM_PARTICLES 100 // adjustable: particle count
#define LIFETIME_MIN 1.0 // adjustable: minimum lifespan (seconds)
#define LIFETIME_MAX 3.0 // adjustable: maximum lifespan (seconds)
#define START_TIME 2.0 // adjustable: time for all particles to be born
// Returns: x = current normalized age [0,1], y = current life cycle number
vec2 particleAge(int id, float time) {
float spawnTime = START_TIME * hash11(float(id) * 2.0);
float lifetime = mix(LIFETIME_MIN, LIFETIME_MAX, hash11(float(id) * 3.0 - 35.0));
float age = mod(time - spawnTime, lifetime);
float run = floor((time - spawnTime) / lifetime);
return vec2(age / lifetime, run);
}
```
### Step 3: Stateless Particle Position Computation
**What**: Compute 2D/3D position solely from particle ID and time, without relying on any Buffer.
**Why**: For decorative effects (starfields, fireworks, orbiting light points), the stateless approach is simplest and most efficient. Define the main trajectory via parametric curves (e.g., Lissajous curves), then add random offset and gravity.
Position is composed of three components:
1. **Main trajectory** (harmonic oscillator): Multiple cosine waves superimposed to form smooth Lissajous curves, controlling the overall motion path of the particle group
2. **Random drift**: Each particle linearly diffuses from the main trajectory position over time; `DRIFT_MAX` controls the diffusion range
3. **Gravity**: Parabolic descent via `0.5 * g * t²`; `age²` is the normalized form of time
```glsl
#define GRAVITY vec2(0.0, -4.5) // adjustable: gravity direction and strength
#define DRIFT_MAX vec2(0.28, 0.28) // adjustable: maximum random drift amplitude
// Harmonic superposition for smooth main trajectory
float harmonics(vec3 freq, vec3 amp, vec3 phase, float t) {
float val = 0.0;
for (int h = 0; h < 3; h++)
val += amp[h] * cos(t * freq[h] * 6.2832 + phase[h] / 360.0 * 6.2832);
return (1.0 + val) / 2.0;
}
vec2 particlePosition(int id, float time) {
vec2 ageInfo = particleAge(id, time);
float age = ageInfo.x;
float run = ageInfo.y;
// Main trajectory (harmonic oscillator)
float slowTime = time * 0.1; // time along main trajectory
vec2 mainPos = vec2(
harmonics(vec3(0.4, 0.66, 0.78), vec3(0.8, 0.24, 0.18), vec3(0.0, 45.0, 55.0), slowTime),
harmonics(vec3(0.415, 0.61, 0.82), vec3(0.72, 0.28, 0.15), vec3(90.0, 120.0, 10.0), slowTime)
);
// Random drift (grows linearly with time)
vec2 drift = DRIFT_MAX * (vec2(hash11(float(id) * 3.0 + run * 4.0),
hash11(float(id) * 7.0 - run * 2.5)) - 0.5) * age;
// Gravity effect
vec2 grav = GRAVITY * age * age * 0.5;
return mainPos + drift + grav;
}
```
### Step 4: Buffer-Stored Particle State (Stateful System)
**What**: Use one row of pixels in a Buffer texture to store all particles, with each pixel = one particle's (pos.x, pos.y, vel.x, vel.y).
**Why**: When inter-frame persistent state is needed (physics collisions, force field interactions, N-body simulations), particle data must be written to a Buffer and read back the next frame. In ShaderToy, each pixel is a storage cell.
Design points:
- `fragCoord.y > 0.5`: Only the first row of pixels stores particles; remaining pixels are discarded
- `fragCoord.x` corresponds to particle ID; each pixel's RGBA stores (pos.x, pos.y, vel.x, vel.y)
- `iFrame < 5`: First few frames are initialization, randomly distributing particle positions
- Force accumulation: boundary repulsion + inter-particle attraction/repulsion + friction
- Clamp velocity and acceleration after integration to prevent numerical explosion
```glsl
// === Buffer A: Particle physics update ===
#define NUM_PARTICLES 40 // adjustable: particle count
#define MAX_VEL 0.5 // adjustable: maximum velocity
#define MAX_ACC 3.0 // adjustable: maximum acceleration
#define RESIST 0.2 // adjustable: drag coefficient
#define DT 0.03 // adjustable: time step
// Read the i-th particle's data
vec4 loadParticle(float i) {
return texelFetch(iChannel0, ivec2(i, 0), 0);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
if (fragCoord.y > 0.5 || fragCoord.x > float(NUM_PARTICLES)) discard;
float id = floor(fragCoord.x);
vec2 res = iResolution.xy / iResolution.y;
// Initialization
if (iFrame < 5) {
vec2 rng = hash12(id);
fragColor = vec4(0.1 + 0.8 * rng * res, 0.0, 0.0);
return;
}
// Read current state
vec4 particle = loadParticle(id); // xy = pos, zw = vel
vec2 pos = particle.xy;
vec2 vel = particle.zw;
// === Force accumulation ===
vec2 force = vec2(0.0);
// Boundary repulsion force
force += 0.8 * (1.0 / abs(pos) - 1.0 / abs(res - pos));
// Inter-particle interaction (attraction/repulsion)
for (float i = 0.0; i < float(NUM_PARTICLES); i++) {
if (i == id) continue;
vec4 other = loadParticle(i);
vec2 w = pos - other.xy;
float d = length(w);
if (d > 0.0)
force -= w * (6.3 + log(d * d * 0.02)) / exp(d * d * 2.4) / d;
}
// Friction force
force -= vel * RESIST / DT;
// === Integration ===
vec2 acc = force;
float a = length(acc);
acc *= a > MAX_ACC ? MAX_ACC / a : 1.0; // limit acceleration
vel += acc * DT;
float v = length(vel);
vel *= v > MAX_VEL ? MAX_VEL / v : 1.0; // limit velocity
pos += vel * DT;
fragColor = vec4(pos, vel);
}
```
### Step 5: Particle Rendering — Point Light / Metaball Style
**What**: Iterate over all particles in the Image pass and render each as a soft glowing point.
**Why**: `1/dot(p,p)` produces natural inverse-square distance falloff; when multiple particles overlap, the result resembles metaball fusion. This is the most classic particle rendering method.
Rendering principle:
- `dot(p, p)` is `dist²`; using it as the denominator produces inverse-square distance falloff
- `BRIGHTNESS` controls point size — larger values produce bigger glow points
- `totalWeight` accumulates the metaball contribution of all particles
- Color interpolates between `COLOR_START` and `COLOR_END` based on particle velocity
- `mix(col, pcol, mb / totalWeight)` implements contribution-weighted color blending, with nearby particles having higher color weight
- Final normalize + clamp prevents overexposure
```glsl
#define BRIGHTNESS 0.002 // adjustable: particle brightness
#define COLOR_START vec3(0.0, 0.64, 0.2) // adjustable: start color
#define COLOR_END vec3(0.06, 0.35, 0.85) // adjustable: end color
vec3 renderParticles(vec2 uv) {
vec3 col = vec3(0.0);
float totalWeight = 0.0;
for (int i = 0; i < NUM_PARTICLES; i++) {
vec4 particle = loadParticle(float(i));
vec2 p = uv - particle.xy;
// Metaball-style falloff: radius / distance²
float mb = BRIGHTNESS / dot(p, p);
totalWeight += mb;
// Interpolate color based on particle attributes
float ratio = length(particle.zw) / MAX_VEL;
vec3 pcol = mix(COLOR_START, COLOR_END, ratio);
col = mix(col, pcol, mb / totalWeight);
}
totalWeight /= float(NUM_PARTICLES);
col = normalize(col + 1e-5) * clamp(totalWeight, 0.0, 0.4); // small offset guards normalize(vec3(0.0))
return col;
}
```
### Step 6: Frame Feedback Motion Blur
**What**: Blend the current frame with the previous frame's Buffer to produce motion trails.
**Why**: Single-frame particles are just discrete dots; through temporal accumulation (feedback blending), continuous trails/afterimage effects are produced. The blend coefficient controls trail length.
Design points:
- `TRAIL_DECAY` closer to 1 produces longer trails (0.99 = very long trail, 0.9 = short trail)
- Requires an extra Buffer pass: Buffer B handles trail accumulation, Image pass reads from Buffer B
- `prev * TRAIL_DECAY + current`: Decay old frame + overlay new frame
- This method can also simulate high particle density with few particles + long trails
```glsl
#define TRAIL_DECAY 0.95 // adjustable: trail decay rate, closer to 1 = longer trail
// In the rendering Buffer's mainImage:
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
// Read previous frame
vec3 prev = texture(iChannel0, uv).rgb * TRAIL_DECAY;
// Draw current frame particles
vec3 current = renderParticles(fragCoord / iResolution.y);
// Overlay
fragColor = vec4(prev + current, 1.0);
}
```
### Step 7: HSV Coloring and Star Glare Effects
**What**: Color particles using HSV color space and add cross/star diffraction spike lines.
**Why**: HSV makes it easy to rotate hue for rainbow effects; star glare (diffraction spikes) simulates real lens optical effects, giving light points more visual quality.
HSV coloring principle:
- `hsv.x` = Hue, 0-1 maps to one revolution of the color wheel
- `hsv.y` = Saturation, 0 = gray, 1 = pure color
- `hsv.z` = Value, 0 = black, 1 = brightest
- Cosine waves approximate the RGB channel hue response curves
Star glare principle:
- Star glare is caused by diffraction from lens aperture blades in real photography
- Implemented by stretching the distance field in specific directions: one horizontal, one vertical, one at each 45-degree diagonal
- `stretch` parameter controls the stretch ratio; larger values produce thinner, longer lines
- `0.707` is the approximation of `cos(45°)` = `sin(45°)`, used to rotate to diagonal directions
```glsl
// HSV -> RGB conversion
vec3 hsv2rgb(vec3 c) {
vec4 K = vec4(1.0, 2.0 / 3.0, 1.0 / 3.0, 3.0);
vec3 p = abs(fract(c.xxx + K.xyz) * 6.0 - K.www);
return c.z * mix(K.xxx, clamp(p - K.xxx, 0.0, 1.0), c.y);
}
// Star glare effect: produces elongated light rays in horizontal/vertical/diagonal directions
float starGlare(vec2 relPos, float intensity) {
// Horizontal/vertical branches
vec2 stretch = vec2(9.0, 0.32); // adjustable: stretch ratio
float dh = length(relPos * stretch);
float dv = length(relPos * stretch.yx);
// Diagonal branches
vec2 diagPos = 0.707 * vec2(dot(relPos, vec2(1, 1)), dot(relPos, vec2(1, -1)));
float dd1 = length(diagPos * vec2(13.0, 0.61));
float dd2 = length(diagPos * vec2(0.61, 13.0));
float glare = 0.25 / (dh * 3.0 + 0.01)
+ 0.25 / (dv * 3.0 + 0.01)
+ 0.19 / (dd1 * 3.0 + 0.01)
+ 0.19 / (dd2 * 3.0 + 0.01);
return glare * intensity;
}
```
## Common Variant Details
### Variant 1: Metaball Polar Coordinate Particles
**Difference from base version**: Particles are uniformly distributed in polar coordinates and expand outward, using `1/dot(p,p)` metaball fusion instead of point lights, producing organic "blob-like" effects.
Design points:
- Particle positions change from Cartesian to polar coordinates: angles uniformly distributed around the circle, distance cycles with `fract` over time
- `fract(time * speed + hash)` produces particles expanding from center outward then respawning
- Metaball rendering: `0.84 / dot(p, p)` values naturally accumulate where particles overlap, forming fused organic shapes
- Color interpolates between start and end colors based on distance `d`
- `mb / totalSum` ensures color blending is weighted by contribution
```glsl
// Particle position changed to polar coordinate expansion
float d = fract(time * 0.51 + 48934.4238 * sin(float(i) * 692.7398));
float angle = TAU * float(i) / float(NUM_PARTICLES);
vec2 particlePos = d * vec2(cos(angle), sin(angle)) * 4.0;
// Metaball rendering replaces point light
vec2 p = uv - particlePos;
float mb = 0.84 / dot(p, p); // adjustable: 0.84 = metaball radius
col = mix(col, mix(startColor, endColor, d), mb / totalSum);
```
### Variant 2: Buffer Storage + Boids Flocking Behavior
**Difference from base version**: Changes from stateless to stateful, with each particle stored in a Buffer pixel, enabling N-body attraction/repulsion interaction and Boids emergent behavior.
Design points:
- Each particle iterates over all other particles, computing the net attraction/repulsion force
- The force formula `(6.3 + log(d² * 0.02)) / exp(d² * 2.4)` produces:
- Short-range repulsion (exponential decay dominates)
- Medium-range attraction (logarithmic term dominates)
- Long-range no effect (exponential decay approaches zero)
- Friction `vel * 0.2 / dt` prevents infinite acceleration
- Overall effect: particles self-organize into group motion patterns, exhibiting fish-school/bird-flock emergent behavior
```glsl
// Buffer A: force accumulation
vec2 sumForce = vec2(0.0);
for (float j = 0.0; j < NUM_PARTICLES; j++) {
if (j == id) continue;
vec4 other = texelFetch(iChannel0, ivec2(j, 0), 0);
vec2 w = pos - other.xy;
float d = length(w);
// Combined attraction+repulsion: short-range repulsion, long-range attraction
sumForce -= w * (6.3 + log(d * d * 0.02)) / exp(d * d * 2.4) / d;
}
sumForce -= vel * 0.2 / dt; // friction
```
### Variant 3: Verlet Integration Cloth Simulation
**Difference from base version**: Particles are connected through spring constraints (grid topology), using Verlet integration instead of Euler method — no need to explicitly store velocity.
Design points:
- Verlet integration: `newPos = 2 * current - previous + acc * dt²`
- Velocity is implicit in `current - previous`
- No separate velocity storage needed; RGBA can store (current.xy, previous.xy)
- More stable than Euler in constraint solving (won't blow up from high-frequency oscillation)
- Spring constraints: each pair of adjacent particles has a "rest length"
- Compute the difference between current distance and rest length
- Move particles toward the rest length by a small step (0.1 is the relaxation coefficient)
- Multiple constraint-solving iterations converge to a stable state
- Grid topology: particle IDs arranged in rows and columns, each connected to its up/down/left/right neighbors
```glsl
// Verlet integration: velocity is implicit in (current position - previous position)
// particle.xy = current position, particle.zw = previous position
vec2 newPos = 2.0 * particle.xy - particle.zw + vec2(0.0, -0.6) * dt * dt;
particle.zw = particle.xy;
particle.xy = newPos;
// Spring constraint solving
vec4 neighbor = texelFetch(iChannel0, neighborId, 0);
vec2 delta = neighbor.xy - particle.xy;
float dist = length(delta);
float restLength = 0.1; // adjustable: rest length
if (dist > 0.0) particle.xy += 0.1 * (dist - restLength) * (delta / dist); // guard: dist == 0 for coincident particles
```
### Variant 4: 3D Particles + Ray Rendering
**Difference from base version**: Particles are stored in 3D space; rendering uses rays cast from the camera, computing the closest distance from each ray to each particle for coloring.
Design points:
- Camera at `(0, 0, 2.5)`, ray direction determined by screen UV
- Point-to-line distance formula: `|cross(P-O, D)|`, where O is ray origin, D is ray direction, P is particle position
- `dot(cross(...), cross(...))` computes the squared distance (avoiding sqrt)
- `× 1000.0` is the distance scaling factor controlling visual particle size
- Difference from 2D rendering: 2D uses length of `uv - pos`, 3D uses closest distance from ray to point
```glsl
// 3D rendering: closest distance from ray to particle
vec3 ro = vec3(0.0, 0.0, 2.5);
vec3 rd = normalize(vec3(uv, -0.5));
for (int i = 0; i < numParticles; i++) {
vec3 pos = texture(iChannel0, vec2(float(i), 100.0) * w).rgb; // w: texel size, e.g. 1.0 / iChannelResolution[0].xy
// Squared distance from point to line
float d = dot(cross(pos - ro, rd), cross(pos - ro, rd));
d *= 1000.0;
float glow = 0.14 / (pow(d, 1.1) + 0.03);
col += glow * particleColor;
}
```
### Variant 5: Raindrop Particles (3D Scene Integration)
**Difference from base version**: Particles move in 3D world space (gravity + wind + jitter), rendered as screen-space water drops with normal mapping to simulate refraction. Respawned randomly after lifecycle ends.
Design points:
- `speedScale` includes `sin(π/2 * pow(age/lifetime, 2))` to accelerate the fall
- Wind force is projected onto the camera's right/up directions via dot product
- Jitter `randVec2 * jitterSpeed` simulates air turbulence
- Death and respawn: `particle.z` accumulates age; when it exceeds `particle.a` (lifespan), position and lifespan are reset
- Rendering can overlay raindrop SDF + refraction normal mapping to simulate realistic raindrop optics
```glsl
// 3D force accumulation
float speedScale = 0.0015 * (0.1 + 1.9 * sin(PI * 0.5 * pow(age / lifetime, 2.0)));
particle.x += (windShieldOffset.x + windIntensity * dot(rayRight, windDir)) * fallSpeed * speedScale * dt;
particle.y += (windShieldOffset.y + windIntensity * dot(rayUp, windDir)) * fallSpeed * speedScale * dt;
// Jitter
particle.xy += 0.001 * (randVec2(particle.xy + iTime) - 0.5) * jitterSpeed * dt;
// Death and respawn
if (particle.z > particle.a) {
particle.xy = vec2(rand(seedX), rand(seedY)) * iResolution.xy;
particle.a = lifetimeMin + rand(pid) * (lifetimeMax - lifetimeMin);
particle.z = 0.0;
}
```
### Variant 6: Vortex/Storm Particle System
**Difference from base version**: Particles move along spiral trajectories, forming storm/sandstorm/blizzard effects. Stateless single pass.
Design points:
- Differential rotation: inner circles rotate faster than outer circles (`angularSpeed = k / (offset + radius)`), producing natural vortices
- Particle color must be significantly brighter than the background (2–3×); otherwise the particles disappear against similarly colored backgrounds
- Brightness budget: with 150 particles, `numerator=0.002, epsilon=0.008` (peak=0.25) is safe
- Vortex center dark zone implemented with `smoothstep(innerR, outerR, dist)` mask
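A minimal stateless sketch of these points (all constants are illustrative; `hash11` is the hash from Step 1):

```glsl
// Vortex particle sketch: differential rotation + center dark zone.
vec3 renderVortex(vec2 uv, float time) {
    vec3 col = vec3(0.0);
    for (int i = 0; i < 150; i++) {
        float id = float(i);
        float radius = 0.1 + 0.9 * hash11(id);   // orbit radius per particle
        float angSpeed = 1.5 / (0.2 + radius);   // inner circles rotate faster
        float ang = 6.2832 * hash11(id + 7.0) + time * angSpeed;
        vec2 p = uv - radius * vec2(cos(ang), sin(ang));
        col += vec3(0.9, 0.8, 0.6) * 0.002 / (dot(p, p) + 0.008); // budget-safe glow
    }
    col *= smoothstep(0.05, 0.25, length(uv));   // dark zone at the vortex center
    return col / (1.0 + col);                    // Reinhard
}
```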
### Variant 7: Meteor/Trailing Line Rendering
**Difference from base version**: Particles are rendered as elongated glow lines rather than circular light points.
Design points:
- **Must have a clearly visible static starfield background**: Call `starField()` function; stars rendered as sharp Gaussian points using `exp(-dist²*k)`, with peak brightness >= 0.3
- Meteor trail must be bright enough: `core` multiplier >= 0.15; after dividing by sample count, each step still needs >= 0.005 visible contribution
- **Do not use `1/(distPerp² + tiny_epsilon)`** for lines — use `exp(-distPerp / width)` for safe glow
- Meteor head `headGlow = 0.005 / (dist² + 0.0008)` ensures bright visibility
- trailLen range 0.15-0.35 ensures sufficient trail length
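A hedged sketch of the safe-glow trail formula (`headPos` and `dir` are illustrative parameters; `dir` is a unit vector pointing from the head back along the trail):

```glsl
// Meteor trail: exp falloff across the line, linear fade along it, no 1/d^2 on the line.
vec3 meteor(vec2 uv, vec2 headPos, vec2 dir) {
    vec2 rel = uv - headPos;
    float along = dot(rel, dir);                      // distance behind the head
    float perp = abs(rel.x * dir.y - rel.y * dir.x);  // perpendicular distance to the line
    float trailLen = 0.25;                            // within the 0.15-0.35 range
    float core = 0.0;
    if (along > 0.0 && along < trailLen)
        core = 0.2 * (1.0 - along / trailLen) * exp(-perp / 0.004); // safe glow
    float headGlow = 0.005 / (dot(rel, rel) + 0.0008);              // bright head
    return vec3(0.8, 0.9, 1.0) * (core + headGlow);
}
```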
### Variant 8: Fountain/Upward Jet Particle System
**Difference from base version**: Particles jet upward from a single point, follow parabolic arcs, then fall back. Stateless single pass.
Design points:
- **Must include three layers**: (1) Main water column particles (upward jet + parabola) (2) Splash particles (flying sideways upon hitting water) (3) Water surface/pool visuals
- **Particles must be sharp, visible individual points**: Use small epsilon (<= 0.002) with small numerator; must not only produce diffuse glow
- Parabolic motion: `pos = base + vel0 * t + 0.5 * gravity * t²`
- Ground clipping: `if (pos.y < waterLevel) continue;`
- Brightness budget: 60 main particles + 40 splash particles, each with epsilon in the 0.001-0.002 range
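A minimal sketch of the main water-column layer under these constraints (constants are illustrative; `hash11` is the hash from Step 1):

```glsl
// Fountain main layer: parabolic arcs from a single emitter, clipped at the water level.
vec3 fountain(vec2 uv, float time) {
    vec3 col = vec3(0.0);
    vec2 base = vec2(0.0, -0.4);
    float waterLevel = -0.4;
    for (int i = 0; i < 60; i++) {
        float id = float(i);
        float life = 1.2 + 0.6 * hash11(id);
        float t = mod(time + life * hash11(id + 3.0), life);   // staggered respawn
        vec2 vel0 = vec2(0.35 * (hash11(id + 9.0) - 0.5),      // sideways spread
                         1.4 + 0.3 * hash11(id + 5.0));        // upward jet speed
        vec2 pos = base + vel0 * t + 0.5 * vec2(0.0, -2.0) * t * t; // gravity
        if (pos.y < waterLevel) continue;                      // ground clipping
        vec2 p = uv - pos;
        col += vec3(0.6, 0.8, 1.0) * 0.0015 / (dot(p, p) + 0.0015); // sharp point
    }
    return col / (1.0 + col);
}
```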
### Variant 10: Spiral Array/Magic Particle System
**Difference from base version**: Particles arranged in spiral or geometric arrays, producing magic circles, magic dust, and similar effects. Particles feature iridescent color variation. Stateless single pass.
Design points:
- **Must have discrete visible particles**: Each particle must be an individually visible small light point, not just blurry glow blobs. Use small epsilon (0.0004-0.0006) for sufficient sharpness
- Spiral trajectory: `angle = baseAngle + norm * spiralTurns + time * rotSpeed`, `radius` increases with norm
- Magic circles use independent ring particle layers with uniformly distributed angles + time rotation, using elliptical projection to simulate 3D perspective
- Iridescent effect: `hue = fract(particleId / total + time * speed + norm * shift)`, hue varies continuously with ID and time, covering the full color wheel
- Starlight shimmer: `shimmer = 0.7 + 0.3 * sin(time * freq + particleId * phase)` controls each particle's brightness pulsation
- Two-layer structure: (1) Spiral ascending particle stream (2) Horizontally rotating magic circle light point rings
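A minimal sketch of the spiral layer with iridescent coloring (constants are illustrative; `hsv2rgb` is the conversion from Step 7):

```glsl
// Spiral magic-dust layer: ascending spiral, full-wheel hue, per-particle shimmer.
vec3 magicSpiral(vec2 uv, float time) {
    vec3 col = vec3(0.0);
    const int N = 80;
    for (int i = 0; i < N; i++) {
        float id = float(i);
        float norm = id / float(N);                   // 0..1 along the spiral
        float ang = norm * 3.0 * 6.2832 + time * 0.8; // 3 turns + rotation
        float radius = 0.1 + 0.5 * norm;
        vec2 p = uv - radius * vec2(cos(ang), sin(ang));
        float hue = fract(norm + time * 0.1);         // iridescent hue shift
        float shimmer = 0.7 + 0.3 * sin(time * 5.0 + id * 1.7);
        col += hsv2rgb(vec3(hue, 0.8, 1.0)) * shimmer
             * 0.0008 / (dot(p, p) + 0.0005);         // sharp epsilon in 0.0004-0.0006
    }
    return col / (1.0 + col);
}
```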
## Brightness Budget Quick Reference
Single pass system: **N × (numerator / epsilon) < 5.0** (a conservative raw-brightness bound; as the table shows, higher totals remain usable once Reinhard tone mapping compresses the peaks)
| Particle Count | Recommended numerator | Recommended epsilon | Single Particle Peak | Total Peak (no fade) |
|--------|---------------|-------------|-----------|------------------|
| 40 | 0.015 | 0.03 | 0.5 | 20 → Reinhard OK |
| 80 | 0.008 | 0.015 | 0.53 | 42 → Reinhard OK |
| 150 | 0.002 | 0.008 | 0.25 | 37 → Reinhard OK |
| 200 | 0.001 | 0.005 | 0.2 | 40 → Reinhard OK |
Multi-pass ping-pong system: **N × (numerator / epsilon) × 1/(1-decay) < 10.0**
| decay | Amplification Factor | 20 Particle Peak Limit | 50 Particle Peak Limit | 100 Particle Peak Limit |
|-------|---------|---------------|---------------|----------------|
| 0.88 | 8.3x | 0.06 | 0.024 | 0.012 |
| 0.92 | 12.5x | 0.04 | 0.016 | 0.008 |
| 0.95 | 20x | 0.025 | 0.01 | 0.005 |
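The ping-pong rule can be inverted to solve for the largest safe per-particle peak (numerator / epsilon). A small helper consistent with the table above:

```glsl
// Solve N * (num/eps) * 1/(1-decay) < 10 for the peak (num/eps).
float maxPeakPingPong(float n, float decay) {
    return 10.0 * (1.0 - decay) / n;
}
// e.g. maxPeakPingPong(20.0, 0.95) = 0.025, matching the table row.
```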
## Performance Optimization In-Depth Analysis
### 1. Particle Count and Loop Overhead
- **Bottleneck**: Every pixel iterates over all particles (O(W×H×N)); particle count is the biggest performance lever.
- **Optimization**: Reducing particle count from 200 to 80 may have little visual difference but doubles performance. Early exit optimization can also help:
```glsl
float dist = length(uv - ppos);
if (dist > 0.1) continue; // adjustable: skip particles beyond influence range
```
- Note: The early exit threshold must be tuned based on particle brightness / influence radius; too small causes abrupt particle edge cutoff
### 2. Frame Feedback as Substitute for High Particle Count
- **Technique**: Few particles + frame feedback trails (`prev * 0.95 + current`) visually equals many more particles. Drawing 50 particles per frame + accumulation = visual density far exceeding 50.
- This approach has the additional benefit of producing natural motion blur
- Requires an additional Buffer pass for accumulated frames
### 3. N-body Interaction Complexity
- **Bottleneck**: Each particle interacts with all others = O(N²). Becomes very slow when N > 100.
- **Optimization A**: Only interact with K nearest neighbors (using Voronoi tracking acceleration structure, see "Combining with Voronoi Spatial Acceleration Structure" below).
- **Optimization B**: Divide space into grid cells, only check particles in adjacent cells. Implementing the grid on GPU requires additional Buffer passes to maintain grid data.
### 4. Sub-frame Stepping
- **Problem**: High-speed particles move multiple pixels per frame, leaving discontinuous trajectories.
- **Optimization**: Perform multiple small steps per frame for each particle, accumulating rendering along the way:
```glsl
const int stepsPerFrame = 7; // adjustable
for (int j = 0; j < stepsPerFrame; j++) {
// Render particle contribution at this position
pos += vel * 0.002 * 0.2;
}
col /= float(stepsPerFrame);
```
- More sub-frames produce more continuous trajectories but linearly increase computational cost
- Suitable for firework explosions, high-speed bullet curtains, etc.
### 5. Precision and Numerical Stability
- Velocity and acceleration need clamping to prevent numerical explosion:
```glsl
float v = length(vel);
vel *= v > MAX_VEL ? MAX_VEL / v : 1.0;
```
- Verlet integration is more stable than Euler in constraint solving, especially for cloth and spring networks
- For long-running simulations, be aware of floating-point precision errors accumulating over time
## Combination Suggestions with Complete Code
### Combining with Raymarching Scenes
Particle systems are often embedded in Raymarching scenes (e.g., rain, sparks, dust). Method: During the Raymarching step loop, simultaneously sample particle density/positions and overlay onto scene color. Or render particles to a separate Buffer and blend during final compositing.
### Combining with Noise / Flow Fields
Use Simplex/Perlin noise to generate a velocity field; particles move along the noise gradient:
```glsl
// Use noise to drive particle velocity
vel += hash33(vel + time) * 7.0; // random perturbation
vel = mix(vel, -pos * pow(length(pos), 0.75), 0.5 + 0.5 * sin(time)); // center attraction
```
This combination is suitable for "neural synapse", "smoke flow", and other organic effects.
### Combining with Post-Processing
- **Bloom**: Apply Gaussian blur to particle rendering output and overlay, enhancing the glow.
- **Chromatic Aberration**: Offset-sample R/G/B channels separately, adding a lens effect.
- **Tone Mapping**: Apply Reinhard mapping `col = col / (1.0 + col)` to HDR particle brightness.
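A sketch combining two of these effects in a final Image pass (assumes the particle pass was rendered into iChannel0; the 0.006 aberration strength is illustrative):

```glsl
// Post pass: chromatic aberration + Reinhard tone mapping.
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = fragCoord / iResolution.xy;
    vec2 dir = (uv - 0.5) * 0.006;           // aberration grows toward screen edges
    vec3 col;
    col.r = texture(iChannel0, uv + dir).r;  // offset-sample each channel
    col.g = texture(iChannel0, uv).g;
    col.b = texture(iChannel0, uv - dir).b;
    col = col / (1.0 + col);                 // Reinhard
    fragColor = vec4(col, 1.0);
}
```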
### Combining with SDF Shape Rendering
Render particles as specific SDF shapes (fish, water drops, sparks) instead of abstract light points. Method: Rotate local coordinates based on particle velocity direction, then compute SDF distance in that coordinate system:
```glsl
float sdFish(vec2 p, float angle) {
float c = cos(angle), s = sin(angle);
p *= 20.0 * mat2(c, s, -s, c);
return max(min(length(p), length(p - vec2(0.56, 0.0))) - 0.3, -min(length(p - vec2(0.8, 0.0)) - 0.45, length(p + vec2(0.14, 0.0)) - 0.12)) * 0.05;
}
```
### Combining with Voronoi Spatial Acceleration Structure
For large-scale particles (thousands), use Voronoi tracking acceleration structure instead of brute-force traversal. Each pixel maintains the IDs of the 4 nearest particles, updated through neighborhood propagation. This reduces rendering and physics queries from O(N) to O(1) (fixed neighborhood query per pixel). Suitable for fluid simulation and large-scale swarm behavior.

---
# Path Tracing & Global Illumination - Detailed Reference
This document is a complete reference for [SKILL.md](SKILL.md), covering prerequisite knowledge, step-by-step detailed explanations, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL basic syntax**: ShaderToy multi-pass (Buffer A/B/Image) architecture
- **Vector math**: Dot product, cross product, reflection/refraction vector computation
- **Probability fundamentals**: PDF (probability density function), Monte Carlo integration, importance sampling
- **Rendering equation** basic form: $L_o = L_e + \int f_r \cdot L_i \cdot \cos\theta \, d\omega$
- **Ray-geometry intersection** methods (spheres, planes, SDF)
## Core Principles in Detail
Path tracing solves the rendering equation via Monte Carlo methods. For each pixel, a ray is emitted from the camera and bounces through the scene. At each bounce:
1. **Intersection**: Find the nearest intersection of the ray with the scene
2. **Shading**: Compute the lighting contribution at the current node based on material type (diffuse/specular/refractive)
3. **Sample next direction**: Generate the next bounce ray according to the BRDF/BSDF
4. **Accumulate**: Add the weighted lighting contributions from all nodes along the path
### Core Mathematics
- **Rendering equation**: $L_o(x, \omega_o) = L_e(x, \omega_o) + \int_\Omega f_r(x, \omega_i, \omega_o) L_i(x, \omega_i) (\omega_i \cdot n) d\omega_i$
- **Monte Carlo estimate**: $L \approx \frac{1}{N} \sum \frac{f_r \cdot L_i \cdot \cos\theta}{p(\omega)}$
- **Schlick Fresnel**: $F = F_0 + (1 - F_0)(1 - \cos\theta)^5$
- **Cosine-weighted sampling PDF**: $p(\omega) = \frac{\cos\theta}{\pi}$
### Key Design
An **iterative loop** replaces recursion, using two variables — `acc` (accumulated radiance) and `mask/throughput` (path attenuation) — to track path contributions. At each bounce, the material color is multiplied into the throughput, and self-emission and direct lighting are added to acc.
## Implementation Steps in Detail
### Step 1: Pseudorandom Number Generator
**What**: Provide a different random number sequence per pixel per frame, driving all Monte Carlo sampling.
**Why**: All random decisions in path tracing (direction sampling, Russian roulette, Fresnel selection) depend on random numbers. The seed must be sufficiently decorrelated between pixels and frames; otherwise structured noise will appear.
**Method 1: sin-hash (simple, good for getting started)**
```glsl
float seed;
float rand() { return fract(sin(seed++) * 43758.5453123); }
// Initialization: seed = iTime + iResolution.y * fragCoord.x / iResolution.x + fragCoord.y / iResolution.y;
```
**Method 2: Integer hash (better quality, recommended)**
```glsl
int iSeed;
int irand() { iSeed = iSeed * 0x343fd + 0x269ec3; return (iSeed >> 16) & 32767; }
float frand() { return float(irand()) / 32767.0; }
void srand(ivec2 p, int frame) {
int n = frame;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
n += p.y;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
n += p.x;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
iSeed = n;
}
```
The sin-hash may produce periodic artifacts on some GPUs (due to inconsistent sin precision across hardware). The integer hash is more reliable and uniform. The Visual Studio LCG (`0x343fd`) is a commonly used linear congruential generator.
### Step 2: Ray-Scene Intersection
**What**: Given a ray origin and direction, find the nearest intersection along with normal and material information at the intersection point.
**Why**: This is the fundamental operation of path tracing. Either analytic geometry (spheres, planes) or SDF ray marching can be used.
**Analytic sphere intersection (classic smallpt approach)**
```glsl
struct Ray { vec3 o, d; };
struct Sphere { float r; vec3 p, e, c; int refl; };
float intersectSphere(Sphere s, Ray r) {
vec3 op = s.p - r.o;
float b = dot(op, r.d);
float det = b * b - dot(op, op) + s.r * s.r;
if (det < 0.) return 0.;
det = sqrt(det);
float t = b - det;
if (t > 1e-3) return t;
t = b + det;
return t > 1e-3 ? t : 0.;
}
```
Derivation: Ray $r(t) = o + td$, sphere $|p - c|^2 = R^2$. Substituting and letting $op = c - o$, $b = op \cdot d$ yields the quadratic $t^2 - 2bt + (|op|^2 - R^2) = 0$, with discriminant $\Delta = b^2 - |op|^2 + R^2$. The epsilon of `1e-3` prevents self-intersection.
**SDF ray marching (for complex geometry)**
```glsl
float map(vec3 p) { /* returns distance to nearest surface */ }
float raymarch(vec3 ro, vec3 rd, float tmax) {
float t = 0.01;
for (int i = 0; i < 256; i++) {
float h = map(ro + rd * t);
if (abs(h) < 0.0001 || t > tmax) break;
t += h;
}
return t;
}
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.0001, 0.);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)));
}
```
The principle of SDF marching: each step safely advances by the "distance to the nearest surface," ensuring no surface is crossed. The step count (128-256) and threshold (0.0001) represent a tradeoff between accuracy and performance.
### Step 3: Cosine-Weighted Hemisphere Sampling
**What**: Generate a random direction distributed according to cosine weighting on the hemisphere above the surface normal, used for diffuse bounces.
**Why**: Cosine-weighted sampling (Malley's method) matches the Lambertian BRDF distribution with PDF $\cos\theta / \pi$, simplifying BRDF/PDF to just the albedo and greatly reducing variance.
With uniform hemisphere sampling (PDF = $1/2\pi$), each bounce would need an extra multiplication by $\cos\theta \cdot 2$, and variance would be higher since many sample directions contribute very little to the integral.
**Method 1: fizzer method (most concise)**
```glsl
vec3 cosineDirection(vec3 nor) {
float u = frand();
float v = frand();
float a = 6.2831853 * v;
float b = 2.0 * u - 1.0;
vec3 dir = vec3(sqrt(1.0 - b * b) * vec2(cos(a), sin(a)), b);
return normalize(nor + dir); // fizzer method
}
```
Principle: Adding a uniformly sampled point on the unit sphere to the unit normal and normalizing yields a cosine-weighted direction. Geometrically, the sum lies on a unit sphere tangent to the surface plane at the shading point, and directions toward uniformly distributed points on that tangent sphere follow exactly the $\cos\theta / \pi$ density.
**Method 2: Classic ONB construction (more intuitive)**
```glsl
vec3 cosineDirectionONB(vec3 n) {
vec2 r = vec2(frand(), frand());
vec3 u = normalize(cross(n, vec3(0., 1., 1.)));
vec3 v = cross(u, n);
float ra = sqrt(r.y);
float rx = ra * cos(6.2831853 * r.x);
float ry = ra * sin(6.2831853 * r.x);
float rz = sqrt(1.0 - r.y);
return normalize(rx * u + ry * v + rz * n);
}
```
Principle: First build an orthonormal basis (ONB) with n as the z-axis, then sample in local coordinates using Malley's method: map uniform random numbers onto the unit disk ($r = \sqrt{\xi_2}$, $\phi = 2\pi\xi_1$), with z-component $\sqrt{1 - r^2}$.
### Step 4: Material System and BRDF Evaluation
**What**: Based on the material type at the intersection (diffuse, specular, refractive), determine the ray's next direction and energy attenuation.
**Why**: Different materials respond to light completely differently. Diffuse scatters randomly, specular reflects perfectly, and refractive materials follow Snell's law. The Fresnel effect determines the reflection/refraction ratio.
```glsl
#define MAT_DIFFUSE 0
#define MAT_SPECULAR 1
#define MAT_DIELECTRIC 2
```
**Diffuse**:
- New direction = `cosineDirection(normal)`
- `throughput *= albedo`
- Because cosine-weighted sampling is used, BRDF($1/\pi$) * $\cos\theta$ / PDF($\cos\theta/\pi$) = 1, so throughput only needs to be multiplied by albedo
**Specular**:
- New direction = `reflect(rd, normal)`
- `throughput *= albedo`
- A perfect mirror's BRDF is a delta function; only one direction contributes
**Refractive (glass)**:
```glsl
void handleDielectric(inout Ray r, vec3 n, vec3 x, float ior,
vec3 albedo, inout vec3 mask) {
float cosi = dot(n, r.d);
float eta = cosi > 0. ? ior : 1.0 / ior; // Entering/leaving medium
vec3 nl = cosi > 0. ? -n : n; // Outward-facing normal
cosi = abs(cosi);
float cos2t = 1.0 - eta * eta * (1.0 - cosi * cosi);
r = Ray(x, reflect(r.d, n)); // Default to reflection
if (cos2t > 0.) {
vec3 tdir = normalize(r.d / eta + nl * (cosi / eta - sqrt(cos2t)));
// Schlick Fresnel
float R0 = ((ior - 1.) * (ior - 1.)) / ((ior + 1.) * (ior + 1.));
float c = 1.0 - (cosi > 0. ? dot(tdir, n) : cosi);
float Re = R0 + (1.0 - R0) * c * c * c * c * c;
float P = 0.25 + 0.5 * Re;
if (frand() < P) {
mask *= Re / P; // Reflection
} else {
mask *= albedo * (1.0 - Re) / (1.0 - P); // Refraction
r = Ray(x, tdir);
}
}
}
```
Key points:
- **Snell's law**: $n_1 \sin\theta_1 = n_2 \sin\theta_2$; total internal reflection occurs when $\sin\theta_2 > 1$
- **Schlick approximation**: $R(\theta) = R_0 + (1-R_0)(1-\cos\theta)^5$, where $R_0 = ((n_1-n_2)/(n_1+n_2))^2$
- **Russian Roulette selection**: Instead of selecting directly by `Re`, an adjusted probability `P = 0.25 + 0.5 * Re` is used, then compensated through the mask. This keeps both branch probabilities bounded away from zero, avoiding the variance spike that occurs when a rarely chosen branch (e.g. reflection at low `Re`) still carries noticeable energy
### Step 5: Direct Light Sampling (Next Event Estimation)
**What**: At each diffuse intersection, directly cast a shadow ray toward the light source to compute direct lighting contribution.
**Why**: Purely random paths are unlikely to hit small-area light sources. Directly sampling light sources greatly reduces variance and accelerates convergence.
```glsl
// Solid angle sampling of spherical light source
vec3 directLighting(vec3 x, vec3 n, vec3 albedo,
vec3 lightPos, float lightRadius, vec3 lightEmission,
int selfId) {
vec3 l0 = lightPos - x;
float cos_a_max = sqrt(1.0 - clamp(lightRadius * lightRadius / dot(l0, l0), 0., 1.));
float cosa = mix(cos_a_max, 1.0, frand());
float sina = sqrt(1.0 - cosa * cosa);
float phi = 6.2831853 * frand();
// Sample within the cone toward the light source
vec3 w = normalize(l0);
vec3 u = normalize(cross(w.yzx, w));
vec3 v = cross(w, u);
vec3 l = (u * cos(phi) + v * sin(phi)) * sina + w * cosa;
// Shadow test
if (shadowTest(Ray(x, l), selfId, lightId)) {
float omega = 6.2831853 * (1.0 - cos_a_max); // Solid angle
return albedo * lightEmission * clamp(dot(l, n), 0., 1.) * omega / 3.14159265;
}
return vec3(0.);
}
```
Mathematical derivation:
- Solid angle subtended by spherical light at the shading point: $\omega = 2\pi(1 - \cos\alpha_{max})$, where $\cos\alpha_{max} = \sqrt{1 - R^2/d^2}$
- PDF for uniform sampling within the cone: $p = 1/\omega$
- Direct lighting contribution: $L_{direct} = \frac{f_r \cdot L_e \cdot \cos\theta_{light}}{p} = albedo \cdot L_e \cdot \cos\theta \cdot \omega / \pi$
Note: With NEE enabled, indirect bounces that hit the light source should **not** accumulate its emission again (to avoid double-counting). However, in smallpt-style implementations where the light source is large, this double-counting has negligible impact. The strict approach is to skip the indirect hit light emission when NEE is active.
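The strict approach can be expressed as a small modification of the main loop (Step 6 below), tracked with a single flag. A hedged sketch:

```glsl
// Add emission only when the previous bounce was specular/dielectric (or for
// the primary ray), since NEE already collected direct light at diffuse vertices.
bool prevSpecular = true;            // the primary ray may see the light directly
for (int depth = 0; depth < MAX_BOUNCES; depth++) {
    // ... intersect the scene as in Step 6 ...
    if (!ENABLE_NEE || prevSpecular)
        acc += throughput * emission;
    // ... NEE + material sampling as in Step 6 ...
    prevSpecular = (matType != MAT_DIFFUSE);
}
```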
### Step 6: Path Tracing Main Loop
**What**: Combine all the above modules into a complete path tracer.
**Why**: The iterative structure avoids GLSL's lack of recursion support, while the throughput/acc pattern is the standard path tracing implementation paradigm.
```glsl
#define MAX_BOUNCES 8 // Adjustable: max bounce count; more = more accurate indirect lighting
#define ENABLE_NEE true // Adjustable: whether to enable direct light sampling
vec3 pathtrace(Ray r) {
vec3 acc = vec3(0.); // Accumulated radiance
vec3 throughput = vec3(1.); // Path attenuation (throughput)
for (int depth = 0; depth < MAX_BOUNCES; depth++) {
// 1. Intersection
float t;
vec3 n, albedo, emission;
int matType;
if (!intersectScene(r, t, n, albedo, emission, matType))
break; // Shot into the sky
vec3 x = r.o + r.d * t;
vec3 nl = dot(n, r.d) < 0. ? n : -n; // Outward-facing normal
// 2. Accumulate emission
acc += throughput * emission;
// 3. Russian roulette (starting from bounce 3)
if (depth > 2) {
float p = max(throughput.r, max(throughput.g, throughput.b));
if (frand() > p) break;
throughput /= p;
}
// 4. Sample based on material
if (matType == MAT_DIFFUSE) {
// Direct light sampling (NEE)
if (ENABLE_NEE)
acc += throughput * directLighting(x, nl, albedo, ...);
// Indirect bounce
throughput *= albedo;
r = Ray(x + nl * 1e-3, cosineDirection(nl));
} else if (matType == MAT_SPECULAR) {
throughput *= albedo;
r = Ray(x + nl * 1e-3, reflect(r.d, n));
} else if (matType == MAT_DIELECTRIC) {
handleDielectric(r, n, x, 1.5, albedo, throughput);
}
}
return acc;
}
```
Key design points:
- `acc` accumulates the final color, `throughput` records the attenuation from all materials along the path
- Russian roulette maintains **unbiasedness**: termination probability is $1-p$, surviving paths divide throughput by $p$, so the expected value is unchanged
- Normal offset (`x + nl * 1e-3`) prevents self-intersection due to floating-point precision
### Step 7: Progressive Accumulation and Display
**What**: Perform weighted averaging of multi-frame results, progressively converging to a noise-free image. Apply tone mapping and gamma correction for display.
**Why**: A single frame of path tracing is extremely noisy. Through multi-frame accumulation, sample count grows linearly and noise decreases as $1/\sqrt{N}$.
**Buffer A (path tracing + accumulation)**
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
srand(ivec2(fragCoord), iFrame);
// ... camera setup, ray generation ...
vec3 color = pathtrace(ray);
// Progressive accumulation
vec4 prev = texelFetch(iChannel0, ivec2(fragCoord), 0);
if (iFrame == 0) prev = vec4(0.);
fragColor = prev + vec4(color, 1.0);
}
```
Accumulation strategy: Store each frame's color and sample count in RGBA (RGB = color accumulation, A = sample count accumulation). Divide by A when displaying to get the average. Clear to zero when `iFrame == 0` to handle ShaderToy's edit reset.
**Image Pass (tone mapping + gamma)**
```glsl
vec3 ACES(vec3 x) {
float a = 2.51, b = 0.03, c = 2.43, d = 0.59, e = 0.14;
return (x * (a * x + b)) / (x * (c * x + d) + e);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec4 data = texelFetch(iChannel0, ivec2(fragCoord), 0);
vec3 col = data.rgb / max(data.a, 1.0);
col = ACES(col); // Tone mapping
col = pow(col, vec3(1.0 / 2.2)); // Gamma correction
// Optional: vignette
vec2 uv = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.1);
fragColor = vec4(col, 1.0);
}
```
ACES tone mapping compresses HDR radiance values into the [0,1] LDR range while preserving detail in highlights and shadows. Gamma correction (2.2) converts linear color space to sRGB display space.
## Common Variants in Detail
### 1. SDF Scene Path Tracing
**Difference from base version**: Replaces analytic sphere intersection with SDF ray marching, supporting arbitrarily complex geometry (fractals, boolean operations, etc.).
Challenges of SDF path tracing:
- SDF marching is much slower than analytic intersection (each step requires 128+ iterations)
- Numerical normals (central difference) are needed at each bounce, adding 6 extra `map()` calls
- Self-intersection issues are more severe, requiring larger epsilon offsets
```glsl
float map(vec3 p) {
float d = p.y + 0.5; // Ground
d = min(d, length(p - vec3(0., 0.4, 0.)) - 0.4); // Sphere
return d;
}
float intersectScene(vec3 ro, vec3 rd, float tmax) {
float t = 0.01;
for (int i = 0; i < 128; i++) {
float h = map(ro + rd * t);
if (h < 0.0001 || t > tmax) break;
t += h;
}
return t < tmax ? t : -1.0;
}
// Normal via central difference: calcNormal()
// Materials distinguished by ID returned from map()
```
### 2. Disney BRDF Path Tracing
**Difference from base version**: Replaces simple Lambert + perfect mirror with the Disney principled BRDF, supporting metallic/roughness parameterized PBR materials.
Core components of the Disney BRDF:
- **GGX normal distribution (D)**: Describes the statistical distribution of microsurface normals; higher roughness = wider distribution
- **Smith occlusion function (G)**: Accounts for self-shadowing between microsurfaces
- **Fresnel term (F)**: Schlick approximation; metallic controls F0 (metals: F0 = albedo, dielectrics: F0 = 0.04)
- **VNDF sampling**: Visible Normal Distribution Function sampling, more efficient than traditional GGX sampling
```glsl
struct Material {
vec3 albedo;
float metallic; // 0=dielectric, 1=metal
float roughness; // 0=smooth, 1=rough
};
// GGX normal distribution
float D_GGX(float a2, float NoH) {
float d = NoH * NoH * (a2 - 1.0) + 1.0;
return a2 / (PI * d * d);
}
// Smith occlusion function
float G_Smith(float NoV, float NoL, float a2) {
float g1 = (2.0 * NoV) / (NoV + sqrt(a2 + (1.0 - a2) * NoV * NoV));
float g2 = (2.0 * NoL) / (NoL + sqrt(a2 + (1.0 - a2) * NoL * NoL));
return g1 * g2;
}
// VNDF sampling for importance sampling GGX
vec3 SampleGGXVNDF(vec3 V, float ax, float ay, float r1, float r2) {
vec3 Vh = normalize(vec3(ax * V.x, ay * V.y, V.z));
float lensq = Vh.x * Vh.x + Vh.y * Vh.y;
vec3 T1 = lensq > 0. ? vec3(-Vh.y, Vh.x, 0) * inversesqrt(lensq) : vec3(1, 0, 0);
vec3 T2 = cross(Vh, T1);
float r = sqrt(r1);
float phi = 2.0 * PI * r2;
float t1 = r * cos(phi), t2 = r * sin(phi);
float s = 0.5 * (1.0 + Vh.z);
t2 = (1.0 - s) * sqrt(1.0 - t1 * t1) + s * t2;
vec3 Nh = t1 * T1 + t2 * T2 + sqrt(max(0., 1. - t1*t1 - t2*t2)) * Vh;
return normalize(vec3(ax * Nh.x, ay * Nh.y, max(0., Nh.z)));
}
```
When using the Disney BRDF in path tracing, the sampling strategy typically is:
- Use metallic as the probability to choose between diffuse and specular
- Diffuse uses cosine-weighted sampling
- Specular uses VNDF sampling for GGX
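The lobe-selection strategy above can be sketched as follows. This is a hedged sketch: `specProb` is one simple heuristic (production tracers usually weight by Fresnel), `toLocal`/`toWorld` are assumed helpers that rotate between world space and the normal-aligned frame, and the full VNDF throughput weight is abbreviated.

```glsl
// Choose a lobe, sample a direction, and update the path throughput.
vec3 sampleDisney(Material mat, vec3 V, vec3 N, inout vec3 throughput) {
    float specProb = mix(0.04, 1.0, mat.metallic);   // heuristic lobe probability
    float a = mat.roughness * mat.roughness;
    if (frand() < specProb) {
        // Specular: sample a visible GGX microfacet normal in local space
        vec3 H = toWorld(SampleGGXVNDF(toLocal(V, N), a, a, frand(), frand()), N);
        vec3 L = reflect(-V, H);
        // Full weight for VNDF sampling is F * G2/G1; omitted here for brevity
        throughput /= specProb;
        return L;
    }
    // Diffuse: cosine-weighted, energy scaled by the non-metallic fraction
    throughput *= mat.albedo * (1.0 - mat.metallic) / (1.0 - specProb);
    return cosineDirection(N);
}
```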
### 3. Depth of Field
**Difference from base version**: Uses a thin lens model to simulate the bokeh effect of real cameras.
Principle of the thin lens model: All rays passing through the focal point converge to the same point. By randomly offsetting the ray origin within the aperture while keeping the target point on the focal plane unchanged, the depth of field effect can be simulated.
```glsl
#define APERTURE 0.12 // Adjustable: aperture size; larger = stronger bokeh
#define FOCUS_DIST 8.0 // Adjustable: focus distance
vec2 uniformDisk() {
    vec2 r = vec2(frand(), frand());
    float a = 6.2831853 * r.x;
    return sqrt(r.y) * vec2(cos(a), sin(a));
}
// In mainImage, after generating the ray (ca is the camera basis matrix):
vec3 focalPoint = ro + rd * FOCUS_DIST;
vec3 offset = ca * vec3(uniformDisk() * APERTURE, 0.);
ro += offset;
rd = normalize(focalPoint - ro);
```
Parameter tuning suggestions:
- `APERTURE`: 0.01 (almost no bokeh) to 0.5 (strong bokeh)
- `FOCUS_DIST`: Set to the distance from the camera to the object you want in sharp focus
- Bokeh effects require more samples to converge (since an extra random dimension is added)
### 4. Multiple Importance Sampling (MIS)
**Difference from base version**: Uses both BRDF sampling and light source sampling simultaneously, combining them with the power heuristic, achieving low variance across all scene configurations.
Core idea of MIS: A single sampling strategy may have high variance in certain scene configurations (e.g., NEE performs poorly on glossy surfaces, BRDF sampling performs poorly with small light sources). MIS combines multiple strategies to compensate for each other's weaknesses.
```glsl
// Power heuristic (beta=2)
float misWeight(float pdfA, float pdfB) {
float a2 = pdfA * pdfA;
float b2 = pdfB * pdfB;
return a2 / (a2 + b2);
}
// During shading, compute both:
// 1. BRDF sampled direction -> if it hits a light, weight with misWeight(brdfPdf, lightPdf)
// 2. Light sampled direction -> weight with misWeight(lightPdf, brdfPdf)
// Sum of both replaces the single sampling strategy
```
The power heuristic ($\beta=2$) formula: $w_A = p_A^2 / (p_A^2 + p_B^2)$. Veach proved in his thesis that this is nearly optimal.
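A concrete combination on a diffuse surface might look like the hedged sketch below. `sampleLightDir`, `visible`, `hitsLight`, `lightPdf`, and `lightRadiance` are assumed helpers (solid-angle PDFs and light queries, as in Step 5):

```glsl
// Evaluate direct lighting with both strategies and weight each by the
// power heuristic; the two terms sum to a low-variance estimate.
vec3 misDiffuse(vec3 x, vec3 n, vec3 albedo) {
    vec3 L = vec3(0.);
    // Strategy A: sample the light
    vec3 wl = sampleLightDir(x);
    if (visible(x, wl)) {
        float pL = lightPdf(x, wl);
        float pB = max(dot(n, wl), 0.) / 3.14159265;   // cosine-hemisphere PDF
        L += (albedo / 3.14159265) * lightRadiance(x, wl)
             * max(dot(n, wl), 0.) * misWeight(pL, pB) / pL;
    }
    // Strategy B: sample the BRDF; contributes only if it hits the light
    vec3 wb = cosineDirection(n);
    if (hitsLight(x, wb)) {
        float pB = max(dot(n, wb), 0.) / 3.14159265;
        float pL = lightPdf(x, wb);
        L += (albedo / 3.14159265) * lightRadiance(x, wb)
             * max(dot(n, wb), 0.) * misWeight(pB, pL) / pB;
    }
    return L;
}
```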
### 5. Volumetric Path Tracing (Participating Media)
**Difference from base version**: Performs random walks inside the medium, simulating translucent/subsurface scattering effects via Beer-Lambert attenuation and scattering events.
Core concepts of volumetric rendering:
- **Extinction coefficient** = absorption + scattering
- **Beer-Lambert law**: Transmittance $T = e^{-\sigma_t \cdot d}$
- **Scattering event**: Scattering occurs with probability $\sigma_s / \sigma_t$ (vs. absorption)
- **Phase function**: Determines the distribution of scattering directions. Uniform sphere sampling = isotropic scattering, Henyey-Greenstein = controllable forward/backward scattering
```glsl
// Beer-Lambert transmittance attenuation
vec3 transmittance = exp(-extinction * distance);
// Random walk scattering
float scatterDist = -log(frand()) / extinctionMajorant;
if (scatterDist < hitDist) {
// Scattering event occurs
pos += ray.d * scatterDist;
// Sample new direction with phase function (e.g., uniform or Henyey-Greenstein)
ray.d = uniformSphereSample();
throughput *= albedo; // scattering / extinction
}
```
Henyey-Greenstein phase function:
- Parameter g in [-1, 1]: g > 0 forward scattering, g < 0 backward scattering, g = 0 isotropic
- $p(\cos\theta) = \frac{1-g^2}{4\pi(1+g^2-2g\cos\theta)^{3/2}}$
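Sampling a direction from the Henyey-Greenstein phase function uses the standard analytic inversion of its CDF. A hedged sketch, built around the incoming direction `wi`:

```glsl
// Invert the HG CDF for cos(theta), pick a uniform azimuth, then rotate the
// local direction into an orthonormal basis around wi.
vec3 sampleHG(vec3 wi, float g) {
    float u1 = frand(), u2 = frand();
    float cosTheta;
    if (abs(g) < 1e-3) {
        cosTheta = 1.0 - 2.0 * u1;                      // isotropic limit
    } else {
        float sq = (1.0 - g * g) / (1.0 - g + 2.0 * g * u1);
        cosTheta = (1.0 + g * g - sq * sq) / (2.0 * g);
    }
    float sinTheta = sqrt(max(0., 1.0 - cosTheta * cosTheta));
    float phi = 6.2831853 * u2;
    vec3 t = normalize(cross(abs(wi.x) < 0.9 ? vec3(1, 0, 0) : vec3(0, 1, 0), wi));
    vec3 b = cross(wi, t);
    return sinTheta * cos(phi) * t + sinTheta * sin(phi) * b + cosTheta * wi;
}
```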
## Performance Optimization Details
### 1. Sampling Strategy
1-4 samples per pixel per frame, relying on inter-frame accumulation for convergence. This maintains real-time frame rates while eventually reaching high quality. For ShaderToy, `SAMPLES_PER_FRAME = 1` or `2` is usually the best choice, since more samples per frame lower the frame rate without accelerating visual convergence.
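A multi-sample frame can be sketched as a small loop around the tracer from Step 6. This is a hedged sketch; `cameraRay` is an assumed helper that builds the primary ray for a (jittered) pixel coordinate:

```glsl
// Average a few jittered samples per frame, then feed the result into the
// accumulation buffer from Step 7. Jitter doubles as sub-pixel anti-aliasing.
#define SAMPLES_PER_FRAME 2
vec3 col = vec3(0.);
for (int s = 0; s < SAMPLES_PER_FRAME; s++) {
    vec2 jitter = vec2(frand(), frand()) - 0.5;
    col += pathtrace(cameraRay(fragCoord + jitter));
}
col /= float(SAMPLES_PER_FRAME);
```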
### 2. Russian Roulette
Starting from bounce 3-4, use the maximum throughput component as the survival probability. This terminates low-energy paths early while maintaining unbiasedness.
```glsl
float p = max(throughput.r, max(throughput.g, throughput.b));
if (frand() > p) break;
throughput /= p;
```
Mathematical guarantee: Termination probability $q = 1 - p$, surviving path throughput multiplied by $1/p$, so the expected value $E[L] = p \cdot L/p + (1-p) \cdot 0 = L$, unbiased.
### 3. Direct Light Sampling (NEE)
Always explicitly sample the light source on diffuse surfaces, avoiding dependence on random paths hitting the light. Particularly significant for small-area light sources. When the light source subtends a very small fraction of the hemisphere's solid angle, pure BRDF sampling can almost never hit the light; NEE is essential.
### 4. Avoiding Self-Intersection
Offset the intersection point along the normal direction (epsilon = 1e-3 ~ 1e-4), or record the last-hit object ID and skip self-intersection. Both approaches have tradeoffs:
- Normal offset: Simple and universal, but may penetrate thin objects
- ID skipping: Precise, but not suitable for concave objects (which may need self-intersection)
### 5. Firefly Suppression
Clamp extreme brightness with `min(color, 10.)` to prevent firefly noise spots. ACES tone mapping also helps compress high dynamic range. The root cause of fireflies is that certain paths find high-energy but low-probability light transport paths, resulting in extremely large Monte Carlo estimate values.
### 6. SDF Scene Optimization
- Limit the maximum marching steps (128-256); treat exceeding the limit as a miss
- Set a reasonable maximum trace distance (tmax) to cull distant objects
- Use larger epsilon during bounces (SDF numerical precision is typically worse than analytic geometry)
- "Relaxed sphere tracing" can be used to increase step size when safe
### 7. High-Quality PRNG
Use integer hashes (such as Visual Studio LCG or Wang hash) instead of sin-hash to avoid periodic artifacts on some GPUs. The problem with sin-hash is that sin precision differs across GPUs (some use only mediump), which can produce visible structured noise.
## Combination Suggestions in Detail
### 1. Path Tracing + SDF Modeling
Use SDF to define complex scene geometry (fractals, smooth boolean operations) while path tracing handles lighting computation. This is the most common combination on ShaderToy. SDF's advantage is the ability to easily create shapes difficult to express with traditional meshes (Mandelbulb, Menger sponge, etc.), while path tracing provides physically accurate lighting for these complex geometries.
### 2. Path Tracing + Environment Maps
Use an HDR cubemap as an infinitely distant environment light source. When a path shoots into the sky, sample the environment map for incident radiance. Can be combined with atmospheric scattering models for a more physically accurate sky.
```glsl
// When path misses the scene:
if (!hit) {
acc += throughput * texture(iChannel1, rd).rgb; // HDR environment map
break;
}
```
### 3. Path Tracing + PBR Materials
The Disney BRDF/BSDF provides metallic/roughness parameterized material models, combined with GGX microsurface distribution and VNDF importance sampling for production-quality results. In ShaderToy, material parameters can be generated procedurally (based on position, noise, etc.).
### 4. Path Tracing + Volumetric Rendering
Add participating media to the path tracing framework, using Beer-Lambert law for transmittance and random walks for scattering, to achieve clouds, smoke, subsurface scattering, and other effects.
```glsl
// Add volume check in the path tracing loop:
if (insideVolume) {
float scatterDist = -log(frand()) / sigma_t;
if (scatterDist < surfaceDist) {
// Volume scattering event
x = r.o + r.d * scatterDist;
r.d = samplePhaseFunction(r.d, g);
throughput *= sigma_s / sigma_t; // albedo
continue;
}
}
```
### 5. Path Tracing + Spectral Rendering
Each path samples a single wavelength instead of RGB, using Sellmeier/Cauchy equations to compute wavelength-dependent index of refraction, and finally converts to sRGB through CIE XYZ color matching functions. This correctly simulates dispersion and rainbow caustics.
Basic spectral rendering workflow:
1. Each path randomly selects a wavelength λ in [380, 780] nm
2. Compute the index of refraction for that wavelength using the Sellmeier equation: $n^2 = 1 + \sum B_i \lambda^2 / (\lambda^2 - C_i)$
3. All color computations in path tracing become single-channel (spectral power at that wavelength)
4. Finally convert spectral radiance to XYZ via CIE XYZ color matching functions, then to sRGB
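Steps 1-2 can be sketched as below. The Sellmeier coefficients used are the commonly quoted values for BK7 glass (an assumption about the material, not taken from this document); `cieXYZ` is an assumed helper.

```glsl
// Wavelength-dependent index of refraction via the Sellmeier equation.
// lambdaNm is in nanometers; the coefficients expect micrometers squared.
float iorSellmeier(float lambdaNm) {
    float l2 = (lambdaNm * 1e-3) * (lambdaNm * 1e-3);
    float n2 = 1.0
        + 1.03961212  * l2 / (l2 - 0.00600069867)
        + 0.231792344 * l2 / (l2 - 0.0200179144)
        + 1.01046945  * l2 / (l2 - 103.560653);
    return sqrt(n2);
}
// Per path: pick one wavelength and trace single-channel radiance,
//   float lambda = mix(380.0, 780.0, frand());
//   float ior = iorSellmeier(lambda);
// then accumulate radiance * cieXYZ(lambda) and convert XYZ -> sRGB.
```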
### 6. Path Tracing + Temporal Accumulation / TAA
Leverage ShaderToy's inter-frame buffer feedback mechanism for progressive rendering. Can be further extended to temporal reprojection, reusing historical frame data during camera movement to accelerate convergence.
Basic temporal reprojection:
1. Store the previous frame's camera matrix
2. Reproject the current pixel into the previous frame's screen space
3. If the position is valid and geometrically consistent, blend the historical frame with the current frame
4. Otherwise discard historical data and restart accumulation
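The four steps above can be sketched as follows. This is a hedged sketch: it assumes the previous frame's view-projection matrix `prevVP` is stored somewhere readable (e.g. a reserved buffer row), that `iChannel0` holds the history buffer with hit distance in its alpha channel, and that `ro`, `rd`, `t`, and `r` come from the current frame's primary ray.

```glsl
// Reproject the current hit point into last frame's screen space, validate,
// and blend history with the fresh path-traced sample.
vec3 worldPos = ro + rd * t;
vec4 clipPrev = prevVP * vec4(worldPos, 1.0);
vec2 uvPrev = (clipPrev.xy / clipPrev.w) * 0.5 + 0.5;
bool valid = all(greaterThan(uvPrev, vec2(0.))) && all(lessThan(uvPrev, vec2(1.)));
vec4 hist = texture(iChannel0, uvPrev);
valid = valid && abs(hist.a - t) < 0.05 * t;       // geometric consistency test
vec3 col = valid ? mix(pathtrace(r), hist.rgb, 0.9) // reuse history
                 : pathtrace(r);                    // restart accumulation
```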

# Polar Coordinates & UV Manipulation — Detailed Reference
> This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, variant details, in-depth performance analysis, and complete combination code examples.
## Prerequisites
### GLSL Fundamentals
- **uniform / varying**: Global variable passing mechanisms
- **Built-in functions**: `sin`, `cos`, `atan`, `length`, `fract`, `mod`, `smoothstep`, `mix`, `clamp`, `pow`, `exp`, `log`, `abs`, `max`, `min`, `floor`, `ceil`, `dot`
- **Vector types**: `vec2`, `vec3`, `vec4`, with swizzle support (e.g., `.xy`, `.rgb`)
- **Matrix types**: `mat2` for 2D rotation
### Vector Math
- 2D vector operations: addition, subtraction, multiplication, division, length (`length`), normalization (`normalize`)
- Dot product (`dot`): projection and angle relationships
- 2D rotation matrix:
```glsl
mat2 rotate(float a) {
float c = cos(a), s = sin(a);
return mat2(c, s, -s, c);
}
```
### Coordinate Systems
- Cartesian coordinates (x, y): standard rectangular coordinate system
- Screen coordinates: bottom-left (0,0), top-right (iResolution.x, iResolution.y)
- Normalized coordinates: typically mapped to [-1, 1] or [0, 1] range
### ShaderToy Framework
- `mainImage(out vec4 fragColor, in vec2 fragCoord)`: entry function
- `fragCoord`: current pixel's screen coordinates
- `iResolution`: viewport resolution (pixels)
- `iTime`: time since launch (seconds)
- `iMouse`: mouse position
## Implementation Steps
### Step 1: UV Normalization and Centering
**What**: Convert screen pixel coordinates to normalized coordinates centered at the screen center with uniform scaling.
**Why**: All subsequent polar coordinate operations depend on a correct center point and uniform scale. Without this step, effects would be offset or stretched.
**Three approaches compared**:
| Approach | Range | Use Case |
|----------|-------|----------|
| `/ min(iResolution.x, iResolution.y)` | [-1, 1] square region | Most universal, ensures circles stay circular |
| `/ iResolution.y` | [-aspect, aspect] × [-1, 1] | When full screen width is needed |
| Pixel quantization | Depends on PIXEL_FILTER | Pixelated/retro style |
```glsl
// Approach 1: range [-1, 1], most common
vec2 uv = (2.0 * fragCoord - iResolution.xy) / min(iResolution.x, iResolution.y);
// Approach 2: range [-aspect, aspect] x [-1, 1]
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Approach 3: precise pixel size control
float pixel_size = length(iResolution.xy) / PIXEL_FILTER; // PIXEL_FILTER adjustable: pixelation level
vec2 uv = (floor(fragCoord * (1.0/pixel_size)) * pixel_size - 0.5*iResolution.xy) / length(iResolution.xy);
```
### Step 2: Cartesian to Polar Coordinate Transform
**What**: Convert (x, y) coordinates to (r, θ) polar coordinates.
**Why**: This is the fundamental transform of the entire paradigm, mapping the linear xy space to a radial space centered at the origin. In polar coordinates:
- A circle is simply r = constant
- A ray is simply θ = constant
- This makes creating ring/spiral/radial effects very straightforward
**About the `atan` function**:
- `atan(y, x)` (two-argument version) is equivalent to atan2 in math, returning [-π, π]
- `atan(y/x)` (single-argument version) only returns [-π/2, π/2], losing quadrant information
- Always use the two-argument version
```glsl
// Basic transform
float r = length(uv); // Radius
float theta = atan(uv.y, uv.x); // Angle, range [-PI, PI]
// Wrapped as a reusable function
vec2 toPolar(vec2 p) {
return vec2(length(p), atan(p.y, p.x));
}
// Normalize angle to [0, 1] range
vec2 polar = vec2(atan(uv.y, uv.x) / 6.283 + 0.5, length(uv));
// polar.x in [0,1], polar.y is radius
```
### Step 3: Operations in Polar Coordinate Space
**What**: Perform various transforms in (r, θ) space to create effects.
**Why**: The unique property of polar coordinate space is that rotation, spirals, radial repetition, and other effects that are extremely difficult in Cartesian coordinates become simple addition, subtraction, and multiplication operations here.
#### 3a. Radial Distortion (Swirl) — Angle Offset by Radius
**Principle**: `θ_new = θ - k × r` causes points farther from the center to rotate more, naturally forming a vortex. `k` controls how "tight" the vortex is.
```glsl
// Greater radius = more rotation → vortex effect
float spin_amount = 0.25; // Adjustable: vortex strength, 0=no rotation, 1=maximum rotation
float new_theta = theta - spin_amount * 20.0 * r;
```
#### 3b. Angular Twist — Angle Plus Time Offset
**Principle**: Adding functions of time and the angle itself to the angle produces distorted rings that change over time. The `sin(theta)` term makes the distortion non-uniform, creating an organic feel.
```glsl
// Angle varies with time and position → twisted rings
float twist_angle = theta + 2.0 * iTime + sin(theta) * sin(iTime) * 3.14159;
```
#### 3c. Archimedean Spiral — Radius Minus Angle
**Principle**: The Archimedean spiral r = a + bθ has the property of equal spacing. In UV space, `y -= x` (i.e., r -= θ) "unfolds" concentric rings into equally-spaced spiral bands.
```glsl
// Unfold into spiral bands
vec2 spiral_uv = vec2(theta_normalized, r);
spiral_uv.y -= spiral_uv.x; // Key: "unfold" radial space into spirals
```
#### 3d. Logarithmic Spiral — Angle Plus log(r) Shear
**Principle**: The logarithmic spiral (equiangular spiral) r = ae^(bθ) has the property of self-similarity — it looks exactly the same when magnified. The `log(r)` shear makes rotation amount grow logarithmically at different radii, commonly seen in nature (nautilus shells, galaxy arms).
```glsl
// Logarithmic spiral stretch
float shear = 2.0 * log(r); // Adjustable: coefficient controls spiral tightness
float c = cos(shear), s = sin(shear);
mat2 spiral_mat = mat2(c, -s, s, c); // Rotation matrix implements shear
```
#### 3e. Kaleidoscope — Angle Modulo and Mirroring
**Principle**: Divides the 2π angular range into N equal sectors, then maps all pixels to a single sector. Mirroring makes adjacent sectors symmetric, avoiding seams.
**Mathematical Derivation**:
1. `sector = 2π / N`: Angular width of each sector
2. `c_idx = floor((θ + sector/2) / sector)`: Current sector index
3. `θ' = mod(θ + sector/2, sector) - sector/2`: Fold to [-sector/2, sector/2]
4. `θ' *= (2 × (c_idx mod 2) - 1)`: Flip odd sectors
```glsl
// Angular subdivision + mirroring for kaleidoscope
float rep = 12.0; // Adjustable: number of symmetry axes
float sector = TAU / rep; // Angle per sector
float a = polar.y; // Angle component
// Modulo to single sector
float c_idx = floor((a + sector * 0.5) / sector);
a = mod(a + sector * 0.5, sector) - sector * 0.5;
// Mirror: flip adjacent sectors
a *= mod(c_idx, 2.0) * 2.0 - 1.0;
```
#### 3f. Spiral Arm Compression — Periodic Modulation in Angular Domain
**Principle**: Galaxy spiral arms are not simple lines but regions of higher matter density. `cos(N × (θ - shear))` creates periodic compression in the angular domain, causing matter (color/brightness) to accumulate along N arms. The `COMPR` parameter controls arm "sharpness".
**Density Compensation**: Compression changes local density (like an accordion effect); `arm_density` compensates for this non-uniformity, preventing the arms from being too bright or too dark.
```glsl
// Galaxy spiral arm effect
float NB_ARMS = 5.0; // Adjustable: number of spiral arms
float COMPR = 0.1; // Adjustable: intra-arm compression strength
float phase = NB_ARMS * (theta - shear);
theta = theta - COMPR * cos(phase); // Compress angular domain to form arm structures
float arm_density = 1.0 + NB_ARMS * COMPR * sin(phase); // Density compensation
```
### Step 4: Polar to Cartesian Reconstruction (Round Trip)
**What**: Convert modified polar coordinates back to Cartesian coordinates.
**Why**: Some effects need to transform in polar space and then return to xy space for further processing (e.g., overlaying texture noise, Truchet patterns, etc.). This forms the complete Cartesian→Polar→Cartesian "round trip".
**Notes**:
- After inverse transform, the coordinate origin may need adjustment (e.g., a `mid` offset to screen center)
- If you only need to color in polar space (e.g., ring gradients), no inverse transform is needed
```glsl
// Basic inverse transform
vec2 new_uv = vec2(r * cos(new_theta), r * sin(new_theta));
// Wrapped as reusable function
vec2 toRect(vec2 p) {
return vec2(p.x * cos(p.y), p.x * sin(p.y));
}
// Complete round trip: offset to screen center after transform
vec2 mid = (iResolution.xy / length(iResolution.xy)) / 2.0;
vec2 warped_uv = vec2(
r * cos(new_theta) + mid.x,
r * sin(new_theta) + mid.y
) - mid;
```
### Step 5: Polar Coordinate Shape Definition (SDF)
**What**: Define signed distance fields of shapes via r(θ) functions in polar coordinates.
**Why**: Many classic curves (cardioid, rose curves, star shapes) have elegant analytical expressions in polar coordinates that would be extremely complex in Cartesian coordinates.
**Advantages of SDF**:
- Negative value = inside, positive value = outside, zero = boundary
- Convenient boolean operations (`max` = intersection, `min` = union)
- `smoothstep` directly produces anti-aliased edges
- `abs(d)` produces outlines, `1/abs(d)` produces glow
```glsl
// Cardioid
float a = atan(p.x, p.y) / 3.141593; // Note: atan(x,y) not atan(y,x), so heart points up
float h = abs(a);
float heart_r = (13.0*h - 22.0*h*h + 10.0*h*h*h) / (6.0 - 5.0*h);
float dist = r - heart_r; // Negative = inside, positive = outside
// Rose curve / petals
float PETAL_FREQ = 3.0; // Adjustable: petal frequency (K.x/K.y controls integer/fractional petals)
float A_coeff = 0.2; // Adjustable: petal amplitude
float rose_dist = abs(r - A_coeff * sin(PETAL_FREQ * theta) - 0.5); // Distance to curve
// Render SDF as visible shape
float shape = smoothstep(0.01, -0.01, dist); // Anti-aliased edge
```
### Step 6: Coloring and Anti-Aliasing
**What**: Color based on polar coordinate information and handle edge anti-aliasing.
**Why**: Polar coordinate coloring naturally produces radial gradients and ring patterns. Anti-aliasing is especially important in polar coordinates because pixel density varies significantly away from the center due to angular subdivision.
**Anti-aliasing method comparison**:
| Method | Pros | Cons |
|--------|------|------|
| `fwidth` | Adaptive, precise | Requires GPU derivative support |
| Fixed resolution width | Simple, reliable | Not adaptive to scaling |
| `smoothstep` + fixed offset | Simplest | Average results |
```glsl
// Adaptive anti-aliasing based on fwidth
float aa = smoothstep(-1.0, 1.0, value / fwidth(value));
// Resolution-based anti-aliasing
float aa_size = 2.0 / iResolution.y;
float edge = smoothstep(0.5 - aa_size, 0.5 + aa_size, value);
// General SDF anti-aliasing using smoothstep
float d = some_sdf_value;
float col = smoothstep(aa_size, -aa_size, d); // aa_size ≈ 1~3 pixels
// Radial gradient coloring
vec3 color = vec3(1.0, 0.4 * r, 0.3); // Color varies with radius
color *= 1.0 - 0.4 * r; // Darken at edges
// Inter-spiral-band anti-aliasing
float inter_spiral_aa = 1.0 - pow(abs(2.0 * fract(spiral_uv.y) - 1.0), 10.0);
```
## Variant Details
### Variant 1: Dynamic Vortex/Swirl Background
**Difference from basic version**: Complete Cartesian→Polar→Cartesian round trip + iterative domain warping to generate complex textures.
**Technical Points**:
1. First apply vortex distortion in polar coordinates
2. Convert back to Cartesian coordinates
3. Perform 5 iterations of domain warping in the transformed space, each iteration nonlinearly offsetting coordinates
4. The iterative sin/cos combination produces complex organic textures
**Parameter Descriptions**:
- `SPIN_AMOUNT`: Vortex strength, controls polar distortion magnitude
- `SPIN_EASE`: Vortex easing, makes rotation speed differ between center and edges
- `speed`: Animation speed, driven by `iTime`
```glsl
// Polar coordinate vortex transform
float new_angle = atan(uv.y, uv.x) + speed
- SPIN_EASE * 20.0 * (SPIN_AMOUNT * uv_len + (1.0 - SPIN_AMOUNT));
vec2 mid = (screenSize.xy / length(screenSize.xy)) / 2.0;
uv = vec2(uv_len * cos(new_angle) + mid.x,
uv_len * sin(new_angle) + mid.y) - mid;
// Iterative domain warping for organic textures
uv *= 30.0;
vec2 uv2 = vec2(uv.x + uv.y); // Warp accumulator; must be declared before the loop
for (int i = 0; i < 5; i++) {
    uv2 += sin(max(uv.x, uv.y)) + uv;
uv += 0.5 * vec2(cos(5.1123 + 0.353*uv2.y + speed*0.131),
sin(uv2.x - 0.113*speed));
uv -= cos(uv.x + uv.y) - sin(uv.x*0.711 - uv.y);
}
```
### Variant 2: Polar Torus Twist
**Difference from basic version**: Renders geometry directly in polar coordinate space (without returning to Cartesian), simulating a 3D torus through angular slicing.
**Technical Points**:
1. Offset the r dimension to the ring's centerline (`r -= OUT_RADIUS`) to center the ring region
2. "Slice" along the ring in the angular dimension, with each slice being one edge of a regular polygon
3. The `twist` variable makes the polygon twist along the ring, producing a Möbius strip-like effect
4. The `sin(uvr.y)*sin(iTime)` term varies the twist speed with angle, creating organic squeezing/stretching
```glsl
// Geometric slicing in polar coordinates
vec2 uvr = vec2(length(uv), atan(uv.y, uv.x) + PI);
uvr.x -= OUT_RADIUS; // Offset to ring centerline
float twist = uvr.y + 2.0*iTime + sin(uvr.y)*sin(iTime)*PI;
for (int i = 0; i < NUM_FACES; i++) {
float x0 = IN_RADIUS * sin(twist + TAU * float(i) / float(NUM_FACES));
float x1 = IN_RADIUS * sin(twist + TAU * float(i+1) / float(NUM_FACES));
// Define face start/end positions in the polar r direction
vec4 face = slice(x0, x1, uvr);
col = mix(col, face.rgb, face.a);
}
```
### Variant 3: Galaxy / Logarithmic Spiral (Galaxy Style)
**Difference from basic version**: Uses `log(r)` for equiangular spirals, combined with FBM noise and spiral arm compression.
**Technical Points**:
1. The `log(r)` shear is the core — it maps concentric circles to logarithmic spirals
2. Rotation matrix R rotates the noise sampling coordinates by the shear angle, aligning noise along the spiral arms
3. `NB_ARMS` and `COMPR` control the number and sharpness of arms
4. FBM noise is sampled in the rotated space, producing galactic dust texture
```glsl
float rho = length(uv);
float ang = atan(uv.y, uv.x);
float shear = 2.0 * log(rho); // Logarithmic spiral core
mat2 R = mat2(cos(shear), -sin(shear), sin(shear), cos(shear));
// Spiral arms
float phase = NB_ARMS * (ang - shear);
ang = ang - COMPR * cos(phase) + SPEED * t; // Inter-arm compression
uv = rho * vec2(cos(ang), sin(ang)); // Reconstruct Cartesian
float gaz = fbm_noise(0.09 * R * uv); // Sample noise in spiral space
```
### Variant 4: Archimedean Spiral Band + Vortices
**Difference from basic version**: Unfolds polar coordinates into spiral bands, creates independent vortex animations within bands, with arc-length parameterization.
**Technical Points**:
1. `U.y -= U.x` is the core of Archimedean unfolding — converts concentric rings to equally-spaced spiral bands
2. Arc-length parameterization `arc_length()` ensures uniform cell area within the spiral band
3. Each cell uses `dot` + `cos` to create a small vortex, strong at center, weak at edges
4. `cell_id.x` gives different cells different vortex phases, avoiding monotonous repetition
```glsl
U = vec2(atan(U.y, U.x)/TAU + 0.5, length(U)); // Reassign U (the incoming Cartesian coordinate)
U.y -= U.x; // Archimedean unfolding
U.x = arc_length(ceil(U.y) + U.x) - iTime; // Arc-length parameterization
// Vortex within each cell of the spiral band
vec2 cell_id = floor(U); // Per-cell index; x varies along the band
vec2 cell_uv = fract(U) - 0.5;
float vortex = dot(cell_uv,
cos(vec2(-33.0, 0.0) // Rotation matrix angle offset
+ 0.3 * (iTime + cell_id.x) // Time + spatial rotation amount
* max(0.0, 0.5 - length(cell_uv)))); // Strong at center, weak at edges
```
### Variant 5: Complex Number / Polar Duality (Jeweled Vortex Style)
**Difference from basic version**: Uses complex number operations (multiplication = rotation + scaling, power = spiral mapping) instead of explicit trigonometric functions to implement conformal mappings.
**Technical Points**:
1. Complex power `z^(1/e)` is equivalent to `(r^(1/e), θ/e)` in polar coordinates — simultaneously scaling radius and compressing angle
2. `exp(log(length(u)) / e)` implements `r^(1/e)` without explicitly computing the power
3. `ceil(r - a/TAU)` produces spiral contour lines — corresponding to different sheets of the Riemann surface in the complex plane
4. Multi-layered `sin`/`cos` combinations produce jewel-like interference colors
```glsl
float e = n * 2.0; // Complex power exponent, controls spiral curvature
float a = atan(u.y, u.x) - PI/2.0; // Angle
float r = exp(log(length(u)) / e); // r^(1/e) — complex root
float sc = ceil(r - a/TAU); // Spiral contour lines
float s = pow(sc + a/TAU, 2.0); // Spiral gradient
// Multi-layer spiral compositing
col += sin(cr + s/n * TAU / 2.0); // Spiral color layer 1
col *= cos(cr + s/n * TAU); // Spiral color layer 2
col *= pow(abs(sin((r - a/TAU) * PI)), abs(e) + 5.0); // Smooth edges
```
## In-Depth Performance Analysis
### 1. Avoiding Numerical Issues at the Pole
`atan(0,0)` and `length(0)` may produce numerical instability near the origin. While GLSL's `atan` won't crash at the origin, the return value is undefined and may cause flickering.
```glsl
// Safe polar coordinate conversion
float r = max(length(uv), 1e-6); // Avoid division by zero
float theta = atan(uv.y, uv.x); // atan2 is not well-defined at origin but won't crash
```
**When needed**: Protection is required when subsequent calculations include `1.0/r`, `log(r)`, or `normalize(uv)`. If only `r * something`, r=0 at the origin is naturally safe.
### 2. Trigonometric Function Optimization
Frequent sin/cos calls are the main cost of polar coordinate shaders. Although GPU sin/cos is hardware-accelerated, heavy use in loops can still become a bottleneck.
```glsl
// If both sin and cos are needed, replace with a single matrix multiplication
mat2 ROT(float a) { float c=cos(a), s=sin(a); return mat2(c,s,-s,c); }
vec2 rotated = ROT(angle) * uv; // Cleaner than computing sin, cos separately and manually constructing
// Use vector dot product instead of explicit trig
// Instead of U.y = cos(rot)*U.x + sin(rot)*U.y
// Use U.y = dot(U, cos(vec2(-33,0) + angle))
```
**Principle**: `cos(vec2(a, b))` in GLSL is a single SIMD instruction that computes two cos values simultaneously. Combined with `dot`, rotation can be achieved with only one `cos` call (leveraging the identity `cos(x - π/2) = sin(x)`).
### 3. Leveraging Kaleidoscope Symmetry
A kaleidoscope inherently reduces computation by a factor of N (N = number of symmetry segments), serving as a natural optimization. All expensive pattern calculations are done in just one sector:
```glsl
// Do kaleidoscope folding first, then expensive pattern computation
vec2 kp = kaleidoscope(polar, segments); // Cheap
vec2 rect = toRect(kp);
// All subsequent computation only applies to one sector
float expensive_pattern = some_costly_function(rect); // Same cost but N× visual complexity
```
**Note**: The cost of kaleidoscope folding itself (a few `floor`, `mod`, and multiplication operations) is far less than the visual complexity it "saves". A 12-segment kaleidoscope means you get 12x visual richness for 1/12 the pattern computation cost.
### 4. Loop Optimization in Spiral Bands
For effects like rose curves that require multi-loop computation, keep loop counts reasonable:
```glsl
// Rose curves only need ceil(K.y) loops
for (int i = 0; i < 7; i++) { // 7 loops are enough to cover most fractional frequencies
v = max(v, ribbon_value);
a += 6.28; // Next loop
}
// Don't use excessively large loop counts; 4~8 loops suffice for most cases
```
**Why 4~8 loops**: The rose curve r = cos(p/q × θ) has a period of q loops (when p/q is fractional). For most practical petal frequencies, 7 loops provide full coverage. Excessive loops not only waste computation but may also produce artifacts from floating-point accumulation errors.
### 5. Pixel Filter Downsampling
For stylized effects, downsampling can dramatically reduce computation:
```glsl
float pixel_size = length(iResolution.xy) / 745.0; // Adjustable: smaller = more pixelated
vec2 uv = floor(fragCoord / pixel_size) * pixel_size; // Quantize coordinates
// All subsequent computation uses quantized uv, adjacent pixels share results
```
**Performance benefit**: If pixel_size makes each "virtual pixel" cover 4×4 actual pixels, the GPU only needs to compute 1/16 of unique values (remaining adjacent pixels produce identical results and may benefit from cache optimization).
## Complete Combination Code Examples
### Polar Coordinates + FBM Noise
Sample FBM noise in polar coordinate space to produce organic spiral textures (galactic dust, flame vortices):
```glsl
vec2 polar_uv = rho * vec2(cos(modified_ang), sin(modified_ang));
float organic = fbm(polar_uv * frequency); // Sample in transformed space
```
### Polar Coordinates + Truchet Patterns
Lay Truchet tiles in kaleidoscope-folded space to produce kaleidoscopic geometric tunnel effects. The kaleidoscope provides symmetry; Truchet provides detail patterns.
```glsl
// Kaleidoscope folding
vec2 kp = kaleidoscope(polar, segments);
vec2 rect = toRect(kp);
// Truchet grid
rect *= 4.0;
vec2 cell_id = floor(rect + 0.5);
vec2 cell_uv = fract(rect + 0.5) - 0.5;
float cell_hash = fract(sin(dot(cell_id, vec2(127.1, 311.7))) * 43758.5453);
// Arc Truchet
float d = length(cell_uv);
float truchet = abs(d - 0.35);
if (cell_hash > 0.5) {
truchet = min(truchet, abs(length(cell_uv - 0.5) - 0.5));
} else {
truchet = min(truchet, abs(length(cell_uv + 0.5) - 0.5));
}
```
### Polar Coordinates + SDF Shapes
Define shape contours with polar equations r(θ), combined with SDF techniques for boolean operations, rounded corners, and glow:
```glsl
float heart_sdf = r - heart_r_theta;
float glow = 0.02 / abs(heart_sdf); // Glow effect
float solid = smoothstep(0.01, -0.01, heart_sdf); // Solid fill
```
### Polar Coordinates + Checkerboard/Grid
Lay a checkerboard pattern in polar coordinate space, naturally forming ring/spiral checkerboards:
```glsl
// Create checkerboard in polar UV
float checker = sign(sin(u * PI * 4.0) * cos(uvr.y * 16.0));
col *= checker * (1.0/16.0) + 0.7; // Low contrast checkerboard texture
```
### Polar Coordinates + Post-Processing
Polar coordinate effects combined with gamma correction, vignette, and color mapping can greatly enhance visual quality:
```glsl
col = pow(col, vec3(1.0/2.2)); // Gamma
col = col*0.6 + 0.4*col*col*(3.0-2.0*col); // Contrast enhancement
col *= 0.5 + 0.5*pow(19.0*q.x*q.y*(1.0-q.x)*(1.0-q.y), 0.7); // Vignette
```

# Post-Processing Effects Detailed Reference
This file is a complete supplement to [SKILL.md](SKILL.md), covering prerequisites, detailed explanations of each step (what and why), variant details, in-depth performance optimization analysis, and complete combination suggestions.
## Prerequisites
- GLSL fundamentals and the ShaderToy environment (iResolution, iTime, iChannel, textureLod, etc.)
- Basic vector and matrix operations
- Difference between linear color space and gamma correction
- Texture sampling and UV coordinate systems
- Basic concepts of convolution (kernel, weights, normalization)
- Multi-pass rendering concepts (Buffer A/B/C/D and Image pass in ShaderToy)
## Applicable Scenarios
Use this technique when you have completed the primary rendering of a scene and need screen-space image enhancement on the result. Typical applications include:
- **HDR to LDR Conversion**: After using linear HDR lighting in a scene, tone mapping is needed to compress values into the displayable range
- **Atmosphere Enhancement**: Effects like vignette, color grading, and film grain to enhance a cinematic look
- **Glow and Bloom**: Simulating lens bloom to produce soft light diffusion around bright areas
- **Motion and Defocus Blur**: Simulating physical camera characteristics through motion blur and depth of field
- **Anti-Aliasing**: Post-processing AA solutions such as FXAA and TAA
- **Chromatic Aberration and Lens Effects**: Optical simulations like chromatic aberration and lens flare
## Core Principles
### Tone Mapping
Maps HDR linear color values from [0, ∞) to the LDR display range [0, 1]. Core mathematical models:
- **Reinhard**: `color = color / (1.0 + color)`, a simple S-curve compression
- **Filmic Reinhard**: with `q = (T²+1)·x²`, maps `x → q / (q + x + T²)`, with white point (W) and shoulder (T) parameters
- **ACES**: Industry standard, converts colors to the ACES color space via a 3×3 matrix, then applies a rational polynomial `(ax+b)/(cx+d)+e` for nonlinear mapping
- **General Rational Polynomial**: `(a·x²+b·x) / (c·x²+d·x+e)`, can fit various tone curves
### Gaussian Blur
2D Gaussian kernel `G(x,y) = exp(-(x²+y²)/(2σ²))`. Due to separability, it can be split into two 1D passes (horizontal + vertical), reducing O(n²) to O(2n).
### Bloom
Extracts bright pixels (bright-pass threshold), then applies multi-level Gaussian blur and adds the result back to the original image. Multi-octave approach: progressively downsample + blur, then progressively composite, producing bloom layers from narrow to wide.
### Vignette
Attenuates brightness based on the pixel's distance to the screen center. Common formulas:
- **Multiplicative**: Power of `16·u·v·(1-u)·(1-v)`
- **Radial**: `1 - pow(dist * scale, exponent)` mixed with strength
### Chromatic Aberration
Simulates the difference in lens refraction for different wavelengths. Samples the same texture with different scale factors for R/G/B channels, with offset increasing from center to edges.
## Implementation Steps
### Step 1: Tone Mapping — Map HDR to Displayable Range
**What**: Compress HDR linear color values from the render output into the [0,1] range.
**Why**: Physically correct lighting calculations produce brightness values far exceeding the display range. Direct clamping would lose highlight detail. Tone mapping uses a nonlinear curve to preserve shadow detail and highlight transitions.
Comparison of four approaches:
- **Reinhard**: Simplest, good for beginners. A single line `color / (1.0 + color)` achieves S-curve compression, but the highlight region is compressed too aggressively, lacking a smooth "shoulder" transition.
- **Filmic Reinhard**: The white point (W) parameter controls the mapping position of the brightest value, and the shoulder parameter (T2) controls how gently highlights are compressed. Higher T2 values produce softer highlight transitions.
- **ACES**: Industry standard approach. First converts linear sRGB to the ACES AP1 color space via an input matrix, applies a rational polynomial nonlinear mapping, then converts back to sRGB via an output matrix. Most accurate color representation, but slightly more computationally expensive.
- **General Rational Polynomial**: A general curve with 5 adjustable parameters that can manually fit any tone curve. Maximum flexibility, but requires manual parameter tuning.
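The curves above can be sketched as follows. The ACES fit uses the widely published fitted matrices and rational polynomial (an approximation of the full RRT+ODT, not the reference implementation); note that this fit outputs linear values and still needs gamma encoding afterwards:

```glsl
// Reinhard: simplest S-curve compression
vec3 reinhard(vec3 c) { return c / (1.0 + c); }

// Extended Reinhard with white point W (inputs >= W map to 1.0)
vec3 reinhardExtended(vec3 c, float W) {
    return c * (1.0 + c / (W * W)) / (1.0 + c);
}

// ACES fit: input matrix → rational polynomial → output matrix
// (GLSL mat3 constructors are column-major)
const mat3 ACESIn = mat3(
    0.59719, 0.07600, 0.02840,
    0.35458, 0.90834, 0.13383,
    0.04823, 0.01566, 0.83777);
const mat3 ACESOut = mat3(
     1.60475, -0.10208, -0.00327,
    -0.53108,  1.10813, -0.07276,
    -0.07367, -0.00605,  1.07602);
vec3 acesFitted(vec3 c) {
    c = ACESIn * c;                                    // linear sRGB → ACES AP1
    vec3 a = c * (c + 0.0245786) - 0.000090537;        // rational polynomial (RRT fit)
    vec3 b = c * (0.983729 * c + 0.4329510) + 0.238081;
    c = a / b;
    return clamp(ACESOut * c, 0.0, 1.0);               // AP1 → linear sRGB
}
```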
### Step 2: Gamma Correction — Linear Space to Display Space
**What**: Convert linear color values to sRGB gamma space for correct display on monitors.
**Why**: Monitor brightness response is nonlinear (approximately γ=2.2). Directly outputting linear values would appear too dark. Gamma correction compensates with `pow(1/2.2)`.
Notes:
- The ACES approach already includes gamma correction, so no additional step is needed
- Some pipelines use 0.4545 (≈1/2.2) as the gamma value
- Gamma correction must be performed after tone mapping
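A minimal sketch of the required ordering:

```glsl
// Tone map first (HDR → [0,1]), then encode for the display
color = color / (1.0 + color);        // tone mapping (Reinhard shown)
color = pow(color, vec3(1.0 / 2.2));  // gamma encode; ≈ pow(color, vec3(0.4545))
```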
### Step 3: Contrast Enhancement — Hermite S-Curve
**What**: Apply an S-curve to the tone-mapped colors to enhance midtone contrast.
**Why**: After tone mapping, the image may appear flat. An S-curve makes darks darker and brights brighter, increasing visual impact. The cubic Hermite basis function `3x² - 2x³` of `smoothstep` is a natural S-curve.
Implementation details:
- Must be performed after gamma correction, when the value range is [0,1]
- Use `clamp` to ensure input is within valid range
- The `contrast_strength` parameter controls effect intensity via `mix`, 0 for no effect, 1 for full effect
- The `smoothstep(-0.025, 1.0, color)` version provides a slight toe lift in the darks, avoiding pure black
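Putting the notes above together (the `contrast_strength` parameter name follows the text):

```glsl
// Hermite S-curve contrast, applied after gamma correction
float contrast_strength = 0.5;            // Adjustable: 0 = no effect, 1 = full S-curve
vec3 c = clamp(color, 0.0, 1.0);          // Ensure valid [0,1] input
vec3 s = c * c * (3.0 - 2.0 * c);         // smoothstep(0,1,c) expanded: 3x² - 2x³
color = mix(c, s, contrast_strength);
// Variant with a slight toe lift to avoid crushing pure black:
// color = mix(c, smoothstep(-0.025, 1.0, c), contrast_strength);
```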
### Step 4: Color Grading
**What**: Apply channel-level adjustments to shift the overall color tone.
**Why**: Different color temperatures and tones convey different moods. Warm tones (yellow/orange bias) give a cozy feeling, while cool tones (blue/cyan bias) give a sense of detachment.
Four approaches in detail:
- **Per-Channel Multiplication**: Simplest and most direct. `vec3(1.11, 0.89, 0.79)` boosts the red channel while reducing blue/green, producing warm tones. Swap the coefficients for cool tones.
- **Power Color Grading**: Adjusts color by changing each channel's gamma curve. Values <1 brighten that channel, >1 darken it. Gentler than multiplication, with greater impact on midtones.
- **HSV Hue Shift**: After converting to HSV, you can directly rotate the hue and adjust saturation. Suitable for scenarios requiring precise hue control.
- **Desaturation Blend**: Mixes the original color with its luminance value (grayscale). Higher blend ratios produce a more washed-out look, creating a "cinematic" or "faded" effect.
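Sketches of the multiplication, power, and desaturation approaches (coefficients are illustrative starting points, not canonical values):

```glsl
// A: per-channel multiplication (warm bias; swap coefficients for cool tones)
color *= vec3(1.11, 0.89, 0.79);
// B: per-channel gamma (<1 brightens that channel, >1 darkens it)
color = pow(color, vec3(0.95, 1.0, 1.05));
// C: desaturation blend toward luminance
float luma = dot(color, vec3(0.2126, 0.7152, 0.0722));
color = mix(color, vec3(luma), 0.2);  // Adjustable: 0 = full color, 1 = grayscale
```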
### Step 5: Vignette
**What**: Darken the edges of the image to guide the viewer's focus toward the center.
**Why**: Simulates the optical vignetting of real lenses and is a classic film composition technique.
Comparison of three approaches:
- **Approach A (Multiplicative, classic)**: `16·u·v·(1-u)·(1-v)` constructs a parabolic surface in UV space that equals 1 at the center and 0 at the corners. The power parameter controls falloff speed, 0.25 is commonly used. Advantage: minimal computation. Disadvantage: fixed rectangular gradient shape.
- **Approach B (Radial distance)**: Based on the Euclidean distance from pixel to screen center. Accounts for aspect ratio correction, producing an elliptical vignette. Three parameters control intensity, starting radius, and falloff steepness.
- **Approach C (Inverse quadratic falloff)**: `1/(1 + dot(p,p))` produces very natural optical vignetting. Squaring twice makes the falloff more pronounced. Smoothstep blending controls effect intensity.
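The three approaches side by side (radius, scale, and exponent values are illustrative):

```glsl
vec2 q = fragCoord / iResolution.xy;
// A: multiplicative parabolic surface — 1 at center, 0 at corners
color *= pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.25);
// B: radial distance with aspect correction (elliptical vignette)
vec2 p = (q - 0.5) * vec2(iResolution.x / iResolution.y, 1.0);
color *= mix(1.0, 1.0 - pow(max(length(p) - 0.3, 0.0) * 1.2, 2.0), 0.8);
// C: inverse quadratic falloff, squared twice for a stronger edge
float v = 1.0 / (1.0 + dot(p, p));
color *= smoothstep(0.0, 1.0, v * v);
```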
### Step 6: Gaussian Blur — Basic Blur
**What**: Apply Gaussian convolution blur to the image. This is the fundamental building block for Bloom.
**Why**: The Gaussian kernel is the only smoothing kernel that is both isotropic and separable, producing a naturally soft blur.
Implementation details:
- `normpdf` computes the Gaussian probability density, where 0.39894 ≈ 1/√(2π)
- KERNEL_SIZE must be odd to ensure center symmetry
- First build a 1D kernel and exploit symmetry (`kernel[HALF+j] = kernel[HALF-j]`)
- Z is the normalization factor, ensuring all weights sum to 1
- 2D convolution is implemented via two nested loops, with the outer product `kernel[j] * kernel[i]` constructing 2D weights
- In production, use a separable approach (two 1D passes) instead for better performance
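A direct (non-separated) implementation following the notes above; in production the same kernel would be split into two 1D passes:

```glsl
#define KERNEL_SIZE 15            // Must be odd
#define HALF (KERNEL_SIZE / 2)

float normpdf(float x, float sigma) {
    return 0.39894 * exp(-0.5 * x * x / (sigma * sigma)) / sigma; // 0.39894 ≈ 1/√(2π)
}

vec3 gaussianBlur(sampler2D tex, vec2 uv, vec2 texel) {
    float kernel[KERNEL_SIZE];
    float sigma = 7.0, Z = 0.0;
    for (int j = 0; j <= HALF; j++) {       // Build 1D kernel, exploiting symmetry
        float w = normpdf(float(j), sigma);
        kernel[HALF + j] = w;
        kernel[HALF - j] = w;
    }
    for (int i = 0; i < KERNEL_SIZE; i++) Z += kernel[i]; // Normalization factor
    vec3 acc = vec3(0.0);
    for (int i = -HALF; i <= HALF; i++)     // 2D convolution; outer product weights
        for (int j = -HALF; j <= HALF; j++)
            acc += kernel[HALF + j] * kernel[HALF + i]
                 * texture(tex, uv + vec2(float(i), float(j)) * texel).rgb;
    return acc / (Z * Z);                   // 2D weights sum to Z²
}
```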
### Step 7: Bloom — HDR Glow
**What**: Extract bright areas from the image, apply multi-level blur, and add the result back to create a glow diffusion effect.
**Why**: Both the human eye and camera lenses see glow around strong light sources. Bloom is the most impactful post-processing effect for enhancing the "HDR feel" of an image.
Implementation details:
- Uses `textureLod` to sample from high LOD levels of the mipmap; the GPU hardware automatically handles downsampled blur
- Sampling from LOD 5/6/7 corresponds to approximately 32x/64x/128x downsampling, producing different blur radii from narrow to wide
- 2x2 neighborhood supersampling (loop from -1 to 1) reduces blockiness
- `maxBloom` cap prevents extremely bright pixels from producing excessive bloom
- `pow(bloom, vec3(1.5))` applies gamma adjustment to concentrate bloom in bright areas
- Note: ShaderToy Buffers do not generate mipmaps by default; this must be enabled in the channel settings
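A sketch of the mipmap approach (`maxBloom` and `bloomStrength` are assumed tuning constants; `iChannel0` must have mipmaps enabled in the channel settings):

```glsl
const float maxBloom = 1.5;       // Assumed cap on bloom brightness
const float bloomStrength = 0.3;  // Assumed composite strength

vec3 bloom = vec3(0.0);
for (float lod = 5.0; lod <= 7.0; lod += 1.0) {   // ~32x/64x/128x downsampling
    vec3 tap = vec3(0.0);
    for (int i = -1; i <= 1; i++)                 // Neighborhood supersampling vs. blockiness
        for (int j = -1; j <= 1; j++)
            tap += textureLod(iChannel0,
                              uv + vec2(i, j) * exp2(lod) / iResolution.xy, lod).rgb;
    bloom += tap / 9.0;
}
bloom = min(bloom / 3.0, vec3(maxBloom));         // Cap extreme highlights
bloom = pow(bloom, vec3(1.5));                    // Concentrate glow in bright areas
color += bloomStrength * bloom;
```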
### Step 8: Chromatic Aberration
**What**: Sample R/G/B channels with different UV scales to simulate lens dispersion.
**Why**: Real lenses cannot focus all wavelengths of light onto the same focal plane. This "imperfection" actually adds realism and visual interest to the image.
Implementation details:
- Offset direction is calculated from the screen center
- In each iteration, R/G/B channels are sampled with different scale factors
- The red channel contracts (rf decreasing), blue channel expands (bf increasing), green channel remains nearly unchanged
- The difference in contraction/expansion rates produces the dispersion effect, increasing from center to edges
- The iterative implementation accumulates samples at different scale factors to simulate a continuous spectrum
- CA_SAMPLES: more samples produce smoother results; 4-8 is usually sufficient
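A sketch of the iterative version (the 0.02 dispersion amount is an assumed strength):

```glsl
#define CA_SAMPLES 6
vec2 centered = uv - 0.5;                       // Offset direction from screen center
vec3 ca = vec3(0.0);
for (int i = 0; i < CA_SAMPLES; i++) {
    float t = float(i) / float(CA_SAMPLES - 1); // 0..1 across the simulated spectrum
    float rf = 1.0 - 0.02 * t;                  // Red channel contracts
    float bf = 1.0 + 0.02 * t;                  // Blue channel expands
    ca.r += texture(iChannel0, centered * rf + 0.5).r;
    ca.g += texture(iChannel0, centered + 0.5).g; // Green nearly unchanged
    ca.b += texture(iChannel0, centered * bf + 0.5).b;
}
color = ca / float(CA_SAMPLES);
```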
### Step 9: Film Grain
**What**: Overlay pseudo-random noise to simulate film grain texture.
**Why**: Subtle random noise breaks the "perfect" feel of digital images, adds organic texture, and helps reduce color banding.
Two implementation approaches:
- **Hash Noise**: A simple `fract(sin(...) * 43758.5453)` pseudo-random function. Multiplied by iTime to ensure different noise each frame. An intensity of around 0.012 looks natural.
- **Bayer Matrix Ordered Dithering**: A 4x4 Bayer matrix provides 17 levels of ordered dithering. More uniform than random noise, particularly suitable for eliminating 8-bit color banding. `(dither - 0.5) * 4.0 / 255.0` limits the dither amount to approximately ±2 color levels.
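Both approaches as snippets:

```glsl
// Hash noise grain, re-randomized each frame via iTime
float grain = fract(sin(dot(fragCoord + iTime, vec2(12.9898, 78.233))) * 43758.5453);
color += (grain - 0.5) * 0.012;

// 4x4 Bayer ordered dithering, limited to ~±2 levels out of 255
const mat4 bayer = mat4( 0.0,  8.0,  2.0, 10.0,
                        12.0,  4.0, 14.0,  6.0,
                         3.0, 11.0,  1.0,  9.0,
                        15.0,  7.0, 13.0,  5.0) / 16.0;
ivec2 px = ivec2(fragCoord) % 4;
float dither = bayer[px.x][px.y];
color += (dither - 0.5) * 4.0 / 255.0;
```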
### Step 10: Motion Blur
**What**: Apply directional blur along each pixel's motion direction.
**Why**: Static frames lack a sense of motion. Motion blur simulates the effect of object movement during shutter exposure, making animation smoother and more natural.
Implementation details:
- Motion direction is determined from a velocity buffer
- Samples uniformly along the motion direction with linearly decreasing weights (lower weight at greater distances)
- MB_STRENGTH controls the blur radius (in UV space); 0.25 means sampling up to 25% screen distance from the pixel
- 32 samples are usually sufficient; random jittering can achieve similar results with fewer samples
Camera reprojection approach:
- Requires a depth buffer and previous frame's camera matrix
- Projects the current pixel's world coordinate to the previous frame's UV to obtain the motion vector
- The shutterAngle parameter (0~1) controls the blur amount
- Randomized sample positions avoid regular stripe artifacts
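A velocity-buffer sketch (`velocityBuf` and `sceneBuf` are assumed samplers holding per-pixel UV velocity and the scene color):

```glsl
#define MB_SAMPLES 32
const float MB_STRENGTH = 0.25;                 // Max blur distance in UV space
vec2 vel = texture(velocityBuf, uv).xy;         // Per-pixel motion direction
vec3 acc = vec3(0.0);
float wsum = 0.0;
for (int i = 0; i < MB_SAMPLES; i++) {
    float t = float(i) / float(MB_SAMPLES - 1); // 0..1 along the motion direction
    float w = 1.0 - t;                          // Linearly decreasing weight with distance
    acc += w * texture(sceneBuf, uv - vel * MB_STRENGTH * t).rgb;
    wsum += w;
}
vec3 color = acc / wsum;
```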
### Step 11: Depth of Field
**What**: Calculate the Circle of Confusion (CoC) based on pixel depth and focal plane distance, and use disk sampling with defocus to simulate out-of-focus blur.
**Why**: Simulates a real thin lens model, producing soft bokeh for objects outside the focal plane, enhancing depth perception.
Implementation details:
- **CoC Model**: Based on the thin lens formula, CoC size is proportional to how much the pixel depth deviates from the focal plane. The aperture parameter controls the aperture size, affecting the depth of field range.
- **Fibonacci Spiral Sampling**: The golden angle (≈ 2.3998 radians) ensures sampling points are uniformly distributed on the disk. `sqrt(i)` radius increment produces uniform area density.
- **Weight Strategy**: Uses each sample point's own CoC as weight, ensuring in-focus sharp regions are not "contaminated" by out-of-focus blur.
- 64 samples produce high-quality bokeh; 32 are sufficient for most needs.
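A sketch of the sampling loop (`readDepth` is an assumed helper returning linear depth; the CoC model is a simplified thin-lens approximation):

```glsl
#define DOF_SAMPLES 64
const float GOLDEN_ANGLE = 2.3998;              // Golden angle in radians

float coc(float depth, float focusDist, float aperture) {
    // CoC grows with deviation from the focal plane; aperture scales the range
    return aperture * abs(depth - focusDist) / max(depth, 1e-4);
}

vec3 depthOfField(vec2 uv, float focusDist, float aperture) {
    float centerCoC = coc(readDepth(uv), focusDist, aperture);
    vec3 acc = vec3(0.0);
    float wsum = 0.0;
    for (int i = 0; i < DOF_SAMPLES; i++) {
        float r = sqrt(float(i) / float(DOF_SAMPLES)); // Uniform area density on the disk
        float a = float(i) * GOLDEN_ANGLE;             // Fibonacci spiral
        vec2 offs = r * vec2(cos(a), sin(a)) * centerCoC;
        // Sample's own CoC as weight, so sharp regions don't bleed into focus
        float w = coc(readDepth(uv + offs), focusDist, aperture);
        acc += w * texture(iChannel0, uv + offs).rgb;
        wsum += w;
    }
    return acc / max(wsum, 1e-4);
}
```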
### Step 12: FXAA — Fast Approximate Anti-Aliasing
**What**: Detect aliased edges in the image and apply directional blur along edges to eliminate aliasing.
**Why**: Post-processing AA does not require modifying the rendering pipeline and has extremely low cost. FXAA detects edge direction through luminance gradients and uses a small number of texture samples for directional blurring.
Implementation details:
- Sample luminance from 4 diagonal neighbors (NW, NE, SW, SE) and the center
- Calculate the luminance range (lumaMin/lumaMax) for final quality assessment
- Edge direction is computed from horizontal/vertical luminance differences
- `dirReduce` and `rcpDirMin` control the scaling of the direction vector to prevent excessive blurring
- Two-level sampling strategy: rgbA samples at 1/3 and 2/3 positions, rgbB adds samples at -0.5 and 0.5 positions on top of that
- Final decision: if rgbB's luminance exceeds the neighborhood range (indicating an edge crossing), fall back to rgbA
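The classic single-pass FXAA kernel implementing the steps above:

```glsl
vec3 fxaa(sampler2D tex, vec2 uv, vec2 texel) {
    const float FXAA_SPAN_MAX = 8.0;
    const float FXAA_REDUCE_MUL = 1.0 / 8.0;
    const float FXAA_REDUCE_MIN = 1.0 / 128.0;
    vec3 rgbNW = texture(tex, uv + vec2(-1.0, -1.0) * texel).rgb;
    vec3 rgbNE = texture(tex, uv + vec2( 1.0, -1.0) * texel).rgb;
    vec3 rgbSW = texture(tex, uv + vec2(-1.0,  1.0) * texel).rgb;
    vec3 rgbSE = texture(tex, uv + vec2( 1.0,  1.0) * texel).rgb;
    vec3 rgbM  = texture(tex, uv).rgb;
    const vec3 luma = vec3(0.299, 0.587, 0.114);
    float lumaNW = dot(rgbNW, luma), lumaNE = dot(rgbNE, luma);
    float lumaSW = dot(rgbSW, luma), lumaSE = dot(rgbSE, luma);
    float lumaM  = dot(rgbM, luma);
    float lumaMin = min(lumaM, min(min(lumaNW, lumaNE), min(lumaSW, lumaSE)));
    float lumaMax = max(lumaM, max(max(lumaNW, lumaNE), max(lumaSW, lumaSE)));
    // Edge direction from horizontal/vertical luminance differences
    vec2 dir = vec2(-((lumaNW + lumaNE) - (lumaSW + lumaSE)),
                     ((lumaNW + lumaSW) - (lumaNE + lumaSE)));
    float dirReduce = max((lumaNW + lumaNE + lumaSW + lumaSE) * 0.25 * FXAA_REDUCE_MUL,
                          FXAA_REDUCE_MIN);
    float rcpDirMin = 1.0 / (min(abs(dir.x), abs(dir.y)) + dirReduce);
    dir = clamp(dir * rcpDirMin, vec2(-FXAA_SPAN_MAX), vec2(FXAA_SPAN_MAX)) * texel;
    // Two-level sampling: 1/3 & 2/3 positions, then ±0.5
    vec3 rgbA = 0.5 * (texture(tex, uv + dir * (1.0/3.0 - 0.5)).rgb
                     + texture(tex, uv + dir * (2.0/3.0 - 0.5)).rgb);
    vec3 rgbB = rgbA * 0.5 + 0.25 * (texture(tex, uv + dir * -0.5).rgb
                                   + texture(tex, uv + dir *  0.5).rgb);
    float lumaB = dot(rgbB, luma);
    // Edge crossing: fall back to the narrower sample set
    return (lumaB < lumaMin || lumaB > lumaMax) ? rgbA : rgbB;
}
```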
## Variant Details
### Variant 1: Multi-Pass Separable Bloom
Differs from the basic single-pass mipmap bloom: uses independent Buffers for separable Gaussian blur (horizontal pass + vertical pass), providing higher bloom quality and greater control.
**Buffer A Details (Horizontal Blur + Downsampling)**:
- `BLOOM_THRESHOLD`: Brightness threshold; only pixels exceeding this value enter bloom. Lower values mean more pixels participate.
- `BLOOM_DOWNSAMPLE`: Downsampling factor; 3 means computing at 1/3 resolution. Reduces computation while expanding the effective blur radius.
- `BLUR_RADIUS`: Blur radius (in pixels); 16 means sampling 16 pixels in each direction.
- The `-8.0` in the Gaussian weight `exp(-8.0 * d * d)` controls the falloff speed; adjust to change the "softness" of the blur.
- Boundary check `xy.x >= int(iResolution.x) / BLOOM_DOWNSAMPLE` ensures computation only within the downsampled region.
**Buffer B Details (Vertical Blur)**:
- Identical structure to Buffer A, except the sampling direction changes from horizontal `ivec2(k, 0)` to vertical `ivec2(0, k)`
- Input is Buffer A's output (iChannel0 bound to Buffer A)
- The combination of two separable blur passes is equivalent to a full 2D Gaussian blur
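A Buffer A sketch matching the parameters described (Buffer B is identical with the sample offset transposed to `ivec2(0, k)`):

```glsl
// Buffer A: bright-pass + horizontal blur at 1/BLOOM_DOWNSAMPLE resolution
const float BLOOM_THRESHOLD = 0.8;
const int   BLOOM_DOWNSAMPLE = 3;
const int   BLUR_RADIUS = 16;

vec3 brightPass(vec3 c) {
    return max(c - BLOOM_THRESHOLD, 0.0);   // Only pixels above threshold enter bloom
}

vec3 horizontalBlur(ivec2 xy) {
    vec3 acc = vec3(0.0);
    float wsum = 0.0;
    for (int k = -BLUR_RADIUS; k <= BLUR_RADIUS; k++) {
        float d = float(k) / float(BLUR_RADIUS);
        float w = exp(-8.0 * d * d);        // -8.0 controls the falloff "softness"
        acc += w * brightPass(
            texelFetch(iChannel0, (xy + ivec2(k, 0)) * BLOOM_DOWNSAMPLE, 0).rgb);
        wsum += w;
    }
    return acc / wsum;
}
// Image pass: scene color + Buffer B output, sampled at the downsampled UV
```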
### Variant 2: ACES + Complete Color Pipeline
Differs from the basic version: uses the complete ACES RRT+ODT pipeline, including color space matrix conversion and built-in sRGB gamma, suitable for projects pursuing cinema-grade color.
Key differences:
- Input matrix m1 converts linear sRGB to the ACES AP1 color space
- Rational polynomial `(v*(v+a)-b) / (v*(c*v+d)+e)` simulates the ACES RRT (Reference Rendering Transform)
- Output matrix m2 converts ACES AP1 back to linear sRGB
- The final `pow(..., 1/2.2)` performs sRGB gamma encoding, so a separate gamma correction step is not needed when using this approach
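A sketch of this pipeline using the widely circulated Stephen Hill ACES fit constants (GLSL column-major matrices):

```glsl
// Linear sRGB -> ACES AP1 (fitted)
const mat3 ACESInputMat = mat3(
    0.59719, 0.07600, 0.02840,
    0.35458, 0.90834, 0.13383,
    0.04823, 0.01566, 0.83777);
// ACES AP1 -> linear sRGB (fitted)
const mat3 ACESOutputMat = mat3(
     1.60475, -0.10208, -0.00327,
    -0.53108,  1.10813, -0.07276,
    -0.07367, -0.00605,  1.07602);

// Rational polynomial approximating RRT + ODT
vec3 RRTAndODTFit(vec3 v) {
    vec3 a = v * (v + 0.0245786) - 0.000090537;
    vec3 b = v * (0.983729 * v + 0.4329510) + 0.238081;
    return a / b;
}

vec3 acesFitted(vec3 color) {
    color = ACESInputMat * color;
    color = RRTAndODTFit(color);
    color = clamp(ACESOutputMat * color, 0.0, 1.0);
    return pow(color, vec3(1.0 / 2.2)); // built-in sRGB gamma, no separate step needed
}
```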
### Variant 3: Physical DoF + Motion Blur Combination
Differs from the basic version: uses depth buffer and previous frame camera matrix for physically correct depth of field + motion blur, sharing the same sampling loop.
Key design:
- DoF and motion blur are processed in the same for loop, avoiding two independent sampling passes
- `randomT` hash randomizes each sample point's time position, reducing regular stripe artifacts
- Motion blur: interpolates between current and previous frame UV by `shutterAngle`
- DoF: Fibonacci spiral offset, with offset amount controlled by CoC
- Both effects share the same `textureLod` sample after stacking, saving half the bandwidth
### Variant 4: TAA Temporal Anti-Aliasing
Differs from basic FXAA: leverages multi-frame history for temporal domain supersampling. Each frame uses sub-pixel jittering, blends with the previous frame, and uses neighborhood color clamping to prevent ghosting.
Key steps explained:
1. **De-jittered Sampling**: The current frame is rendered with sub-pixel jitter; during sampling, the jitter offset must be subtracted to restore the correct UV
2. **Neighborhood Clamping**: The min/max of colors in the 3x3 neighborhood defines the "reasonable color range". History frame colors outside this range indicate scene changes (occlusion/reveal)
3. **Reprojection**: Uses the current pixel's world coordinates and the previous frame's view-projection matrix to calculate the corresponding UV position in the previous frame
4. **Blend Strategy**: When the history frame is within the reasonable range, use a high weight (0.9) for temporal stability; when outside the range, use 0 weight to fully use the current frame and avoid ghosting
5. `blend = 0.9` is adjustable: higher values are smoother but more prone to trailing artifacts
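The steps above can be sketched as a resolve function (assumed inputs: `cur` is the jittered current frame, `hist` is the history buffer, `prevUV` comes from the reprojection in step 3, `jitter` is the current frame's sub-pixel offset in pixels):

```glsl
vec3 taaResolve(sampler2D cur, sampler2D hist, vec2 uv, vec2 prevUV,
                vec2 texel, vec2 jitter) {
    // 1. De-jittered sample of the current frame
    vec3 current = texture(cur, uv - jitter * texel).rgb;
    // 2. 3x3 neighborhood min/max defines the "reasonable color range"
    vec3 cmin = vec3(1e9), cmax = vec3(-1e9);
    for (int y = -1; y <= 1; y++)
    for (int x = -1; x <= 1; x++) {
        vec3 c = texture(cur, uv + vec2(x, y) * texel).rgb;
        cmin = min(cmin, c);
        cmax = max(cmax, c);
    }
    // 3. Reprojected history sample
    vec3 history = texture(hist, prevUV).rgb;
    // 4. High weight inside the range, zero weight outside (avoids ghosting)
    float blend = (all(greaterThanEqual(history, cmin)) &&
                   all(lessThanEqual(history, cmax))) ? 0.9 : 0.0;
    return mix(current, history, blend);
}
```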
### Variant 5: Lens Flare + Starburst
Differs from the basic version: overlays lens flare simulation on top of bloom, including starburst and chromatic ghosts.
Key techniques explained:
- **Starburst Pattern**: `cos(angle * NUM_APERTURE_BLADES)` creates a periodic pattern in the angular domain, simulating diffraction from aperture blades. `NUM_APERTURE_BLADES` controls the number of starburst points; `pow` controls the sharpness of the rays, and the effect fades with distance from the light source.
- **Octagonal Ghosts**: Multiple ghosts placed at reflected positions along the optical axis (the line from the sun to the screen center). `smoothstep` produces soft-edged disk shapes.
- **Spectral Color**: `wavelengthToRGB` converts wavelength (nm) to RGB; `fract(ghostDist * 5.0)` produces rainbow bands within the ghost, simulating the dispersion effect of real lens ghosts.
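A minimal sketch of the starburst component described above (function name and falloff constants are illustrative; `sunUV` is the light position in UV space):

```glsl
#define NUM_APERTURE_BLADES 6.0  // Tunable: number of starburst points

float starburst(vec2 uv, vec2 sunUV) {
    vec2  d     = uv - sunUV;
    float angle = atan(d.y, d.x);
    float dist  = length(d);
    // Periodic angular pattern from the aperture blades; pow sharpens the rays
    float rays = pow(abs(cos(angle * NUM_APERTURE_BLADES)), 8.0); // Tunable: 8.0 sharpness
    return rays * exp(-4.0 * dist); // fades away from the light source
}
```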
## In-Depth Performance Optimization
### 1. Separable Blur Instead of 2D Convolution
An 11×11 2D Gaussian convolution requires 121 samples; splitting into two 1D passes requires only 22. This is the primary optimization for all blur operations.
Mathematical basis: The separability of the Gaussian kernel `G(x,y) = G(x) · G(y)`, meaning a 2D Gaussian kernel equals the outer product of two 1D Gaussian kernels. This means performing horizontal blur followed by vertical blur (or vice versa) produces identical results to direct 2D convolution.
### 2. Hardware Mipmap Instead of Manual Downsampling
`textureLod(tex, uv, lod)` leverages the GPU hardware's mipmap chain for free downsampled blur, suitable for fast bloom. Note that ShaderToy Buffers do not generate mipmaps by default (you need to enable `mipmap` in the channel settings).
Each mipmap level halves the resolution, equivalent to a 2x2 box filter. LOD 5 corresponds to 32x downsampling, LOD 6 to 64x, LOD 7 to 128x. Although a box filter is not a Gaussian kernel, the results approach Gaussian when multiple levels are combined.
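For example, a fast bloom built from a few mipmap levels (assumes the channel has mipmaps enabled):

```glsl
vec3 mipBloom(sampler2D tex, vec2 uv) {
    vec3 bloom = vec3(0.0);
    bloom += textureLod(tex, uv, 5.0).rgb; // ~32x downsample
    bloom += textureLod(tex, uv, 6.0).rgb; // ~64x
    bloom += textureLod(tex, uv, 7.0).rgb; // ~128x
    return bloom / 3.0; // average the levels to approximate a Gaussian
}
```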
### 3. Downsample Before Blurring
Bloom does not need to be computed at full resolution. Downsample the image by 2-4x first, blur at low resolution, then bilinearly upsample back.
Advantages:
- Computation reduced by 4-16x (area ratio)
- The same blur kernel size covers a larger screen area at lower resolution
- Bilinear interpolation during upsampling automatically smooths the result
### 4. Reduce Sample Count
Recommended sample counts:
- Motion blur: 16-32 samples are usually sufficient; use random jittering (temporal jitter) instead of regular intervals to hide stripe artifacts from insufficient sampling
- DoF: 32-64 Fibonacci spiral samples produce high-quality bokeh. Fibonacci spirals are more uniform than random distribution, avoiding clustering
- Chromatic Aberration: 4-8 samples produce good results, since chromatic aberration is inherently a low-frequency variation
### 5. Leverage Bilinear Interpolation for Free Blur
Sampling between two texels causes the GPU hardware to automatically perform bilinear blending, equivalent to a 2-tap average. A single sample effectively obtains weighted information from 4 texels.
Application: Optimize a 5-tap Gaussian blur to 3 texture samples (1 at center + 1 on each side, with the side sample points placed between two texels).
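Concretely, for the 5-tap kernel [1, 4, 6, 4, 1]/16: the two outer taps on each side (weights 4 and 1) are replaced by one sample at 1.2 texels from the center, so the hardware's bilinear blend recovers the 4:1 ratio. A sketch:

```glsl
// 5-tap Gaussian with 3 fetches. The side sample at offset 1.2 texels makes
// bilinear filtering return 0.8*tap1 + 0.2*tap2, i.e. weights 4/16 and 1/16.
vec3 gauss5_3tap(sampler2D tex, vec2 uv, vec2 dir, vec2 texel) {
    vec3 c = texture(tex, uv).rgb * (6.0 / 16.0);             // center tap
    c += texture(tex, uv + dir * texel * 1.2).rgb * (5.0 / 16.0);
    c += texture(tex, uv - dir * texel * 1.2).rgb * (5.0 / 16.0);
    return c;
}
```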
### 6. Conditional Compilation
Use `#define` switches to control each post-processing module. Disabling unneeded effects has zero cost — the preprocessor completely removes the code, generating no instructions.
```glsl
#define ENABLE_BLOOM 0 // Disable bloom; the branch code is completely absent after compilation
```
### 7. Avoid Branching
if/else statements in post-processing should be converted to mathematical forms like `mix`/`step`/`smoothstep` whenever possible, avoiding GPU warp divergence.
Example:
```glsl
// Bad: if/else branching
if (brightness > threshold) color = bright_path; else color = dark_path;
// Good: mathematical form
float t = step(threshold, brightness);
color = mix(dark_path, bright_path, t);
```
## Combination Suggestions
### 1. Bloom + Tone Mapping (Most Basic Combination)
Bloom is computed in linear HDR space, added to the scene, then tone mapping is applied. **The order must not be reversed** — doing bloom in LDR space means highlights have already been clamped, and bloom cannot correctly extract super-bright pixels.
```glsl
// Correct order
color += bloom; // Add bloom in HDR space
color = tonemap(color); // Then tone map
color = pow(color, vec3(1.0/2.2)); // Finally gamma
```
### 2. TAA + Motion Blur + DoF (Physical Camera Simulation)
TAA removes aliasing first, then DoF and motion blur can share a sampling loop. TAA's sub-pixel jitter can also complement motion blur's temporal jitter.
Suggested pipeline order:
1. TAA (Buffer D): Blend current frame + history frame
2. DoF + Motion Blur (Image pass): Shared sampling loop
3. Other subsequent effects
### 3. Chromatic Aberration + Vignette + Film Grain (Lens Simulation Trio)
These three effects all simulate physical lens imperfections; when combined, the image has a strong "real footage" feel.
Execution order:
1. Chromatic Aberration (CA) is done during the sampling stage — directly replaces the normal `texture()` call
2. Vignette is applied after all color processing, multiplicatively
3. Grain is applied last, additively
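A sketch applying the trio in the stated order (function name and strength constants are illustrative):

```glsl
vec3 lensTrio(sampler2D tex, vec2 uv, vec2 fragCoord, float time) {
    // 1. Chromatic aberration at the sampling stage (replaces the plain texture() call)
    vec2 d = (uv - 0.5) * 0.01;                  // Tunable: 0.01 CA strength
    vec3 col = vec3(texture(tex, uv + d).r,
                    texture(tex, uv).g,
                    texture(tex, uv - d).b);
    // 2. Multiplicative vignette after color processing
    col *= 0.5 + 0.5 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.4);
    // 3. Additive film grain last
    float g = fract(sin(dot(fragCoord + time, vec2(12.9898, 78.233))) * 43758.5453);
    col += (g - 0.5) * 0.04;                     // Tunable: 0.04 grain amount
    return col;
}
```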
### 4. Color Grading + Tone Mapping + Contrast (Color Pipeline)
Color grading (multiplication/power adjustments) is done in linear space, tone mapping handles HDR compression, and the S-curve contrast is applied in gamma space. The order of these three steps determines the final color style.
Key point: Color grading in linear space produces the most natural results, because the perceived brightness relationships are correct in linear space.
### 5. Bloom + Lens Flare (Cinematic Light Effects)
Bloom provides soft highlight diffusion; lens flare provides starburst and ghosts. Both share the same bright-pass extraction result, but flare computes directional patterns while bloom is isotropic blur.
### 6. Multi-Pass Complete Pipeline (Production-Grade)
Recommended production-grade pipeline:
- **Buffer A**: Scene rendering + velocity/depth encoding (pack motion vectors and depth into alpha channel or additional textures)
- **Buffer B**: Bloom downsampling + horizontal blur (horizontal Gaussian on Buffer A's bright-pass output)
- **Buffer C**: Bloom vertical blur (vertical Gaussian on Buffer B, completing separable bloom)
- **Buffer D**: TAA (current frame + history frame blending, needs to read Buffer D's own historical output)
- **Image**: Final compositing — DoF + Motion Blur + Bloom compositing + Tone Mapping + Color Grading + Vignette + Grain + Dithering

# 2D Procedural Patterns — Detailed Reference
This document is a complete supplement to [SKILL.md](SKILL.md), containing prerequisites, detailed explanations for each step, variant descriptions, in-depth performance analysis, and combination example code.
---
## Prerequisites
- **GLSL Basic Syntax**: uniform, varying, built-in functions
- **Vector Math**: `dot`, `length`, `normalize`, `atan`
- **Coordinate Space Concepts**: UV normalization, aspect ratio correction
- **Basic Math Functions**: `sin`/`cos`, `fract`/`floor`/`mod`, `smoothstep`, `pow`
- **Polar Coordinates**: `atan(y,x)` returns angle, `length` returns radial distance
---
## Core Principles in Detail
The essence of 2D procedural patterns is the combination of **domain transforms + distance fields + color mapping**:
1. **Domain Repetition**: use `fract()`/`mod()` to fold an infinite plane into finite cells, each cell independently rendering the same (or variant) pattern
2. **Cell Identification**: use `floor()` to extract the integer coordinates of the current cell as a hash seed to generate pseudo-random numbers, driving independent variations per cell
3. **Distance Fields (SDF)**: use mathematical functions to compute the distance from a pixel to geometric shapes (circles, hexagons, line segments, arcs), converting to crisp or soft edges via `smoothstep`
4. **Color Mapping**: Cosine palette `a + b*cos(2pi(c*t+d))` or HSV mapping, converting scalar values to rich colors
5. **Layered Compositing**: results from multiple loops or multi-layer passes are combined through addition, multiplication, or `mix` to build visual complexity
---
## Implementation Steps in Detail
### Step 1: UV Coordinate Normalization and Aspect Ratio Correction
**What**: Convert pixel coordinates to normalized coordinates centered on the screen with Y-axis range [-1, 1]
**Why**: A unified coordinate system ensures patterns don't distort with resolution changes; using Y-axis as reference maintains square pixels
```glsl
vec2 uv = (fragCoord * 2.0 - iResolution.xy) / iResolution.y;
```
### Step 2: Domain Repetition — Dividing Space into Repeating Cells
**What**: Scale UV coordinates and take the fractional part to generate repeating local coordinates; simultaneously extract cell IDs using `floor`
**Why**: `fract()` folds an infinite plane into a repeating [0,1) space, `floor()` provides a unique cell identifier for subsequent randomization. Subtracting 0.5 centers the origin
```glsl
#define SCALE 4.0 // Tunable: repetition density, higher = more cells
vec2 cell_uv = fract(uv * SCALE) - 0.5;
vec2 cell_id = floor(uv * SCALE);
```
For hexagonal grids, domain repetition requires special handling (two offset rectangular grids, taking the nearest):
```glsl
const vec2 s = vec2(1, 1.7320508); // 1 and sqrt(3)
vec4 hC = floor(vec4(p, p - vec2(0.5, 1.0)) / s.xyxy) + 0.5;
vec4 h = vec4(p - hC.xy * s, p - (hC.zw + 0.5) * s);
// Take the nearest hexagonal center
vec4 hex_data = dot(h.xy, h.xy) < dot(h.zw, h.zw)
? vec4(h.xy, hC.xy)
: vec4(h.zw, hC.zw + vec2(0.5, 1.0));
```
### Step 3: Cell Randomization
**What**: Use cell IDs to generate pseudo-random numbers, giving each cell different attributes (size, position, color offset)
**Why**: Pure repetition looks mechanical; randomization gives patterns a "procedural yet lively" quality
```glsl
float hash21(vec2 p) {
return fract(sin(dot(p, vec2(141.173, 289.927))) * 43758.5453);
}
float rnd = hash21(cell_id);
float radius = 0.15 + 0.1 * rnd; // Tunable: base radius and random range
```
### Step 4: Distance Field Shape Rendering
**What**: Compute the distance from the pixel to the target shape, then convert to visualization using `smoothstep`
**Why**: SDF is the cornerstone of procedural graphics — a single scalar value simultaneously encodes shape, edges, and glow effects
```glsl
// Circle SDF
float d = length(cell_uv) - radius;
// Hexagon SDF
float hex_sdf(vec2 p) {
p = abs(p);
return max(dot(p, vec2(0.5, 0.866025)), p.x);
}
// Line segment SDF (for networks/grid lines)
float line_sdf(vec2 a, vec2 b, vec2 p) {
vec2 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return length(pa - ba * h);
}
// Anti-aliased rendering with smoothstep
float shape = 1.0 - smoothstep(radius - 0.008, radius + 0.008, length(cell_uv));
```
### Step 5: Polar Coordinate Conversion and Ring/Arc Patterns
**What**: Convert Cartesian coordinates to polar coordinates, using radial distance to draw concentric rings and angle to draw sectors/arc segments
**Why**: Polar coordinates are naturally suited for radar sweeps, concentric circles, spirals, and other radially symmetric patterns
```glsl
vec2 polar = vec2(length(uv), atan(uv.y, uv.x));
float ring_id = floor(polar.x * NUM_RINGS + 0.5) / NUM_RINGS; // Tunable: NUM_RINGS ring count
// Concentric rings
float ring = 1.0 - pow(abs(sin(polar.x * 3.14159 * NUM_RINGS)) * 1.25, 2.5);
// Arc segment clipping
float arc_end = polar.y + sin(iTime + ring_id * 5.5) * 1.52 - 1.5;
ring *= smoothstep(0.0, 0.05, arc_end);
```
### Step 6: Cosine Palette
**What**: Generate a continuous rainbow color mapping function using four vec3 parameters
**Why**: A single line of code generates infinite smooth color schemes, more flexible and GPU-friendly than lookup tables
```glsl
vec3 palette(float t) {
// Tunable: modify a/b/c/d to change color scheme
vec3 a = vec3(0.5, 0.5, 0.5); // Brightness offset
vec3 b = vec3(0.5, 0.5, 0.5); // Amplitude
vec3 c = vec3(1.0, 1.0, 1.0); // Frequency
vec3 d = vec3(0.263, 0.416, 0.557); // Phase offset
return a + b * cos(6.28318 * (c * t + d));
}
```
### Step 7: Iterative Stacking and Glow Effects
**What**: Repeatedly perform domain repetition + distance field calculation in a loop, accumulating color; use `pow(1/d)` to produce glow
**Why**: A single layer pattern is too simple; multi-layer iterative stacking produces fractal-like visual complexity with minimal code. Exponentially decaying glow gives patterns a neon light feel
```glsl
#define NUM_LAYERS 4.0 // Tunable: number of iteration layers, more = more complex
vec3 finalColor = vec3(0.0);
vec2 uv0 = uv; // Preserve original UV for global coloring
for (float i = 0.0; i < NUM_LAYERS; i++) {
uv = fract(uv * 1.5) - 0.5; // Tunable: 1.5 is the scale factor
float d = length(uv) * exp(-length(uv0));
vec3 col = palette(length(uv0) + i * 0.4 + iTime * 0.4);
d = sin(d * 8.0 + iTime) / 8.0; // Tunable: 8.0 is the ripple frequency
d = abs(d);
d = pow(0.01 / d, 1.2); // Tunable: 0.01 is glow width, 1.2 is decay exponent
finalColor += col * d;
}
```
### Step 8: Trigonometric Interference Patterns
**What**: Use `sin`/`cos` to mutually perturb coordinates in iterations, generating water caustic-like interference patterns
**Why**: Superposition of trigonometric functions produces complex Moire-like interference patterns; a few iterations yield highly organic visual effects
```glsl
#define MAX_ITER 5 // Tunable: iteration count, more = richer detail
vec2 p = mod(uv * TAU, TAU) - 250.0; // TAU period ensures tileability
vec2 i = p;
float c = 1.0;
float inten = 0.005; // Tunable: intensity coefficient
for (int n = 0; n < MAX_ITER; n++) {
float t = iTime * (1.0 - 3.5 / float(n + 1));
i = p + vec2(cos(t - i.x) + sin(t + i.y),
sin(t - i.y) + cos(t + i.x));
c += 1.0 / length(vec2(p.x / (sin(i.x + t) / inten),
p.y / (cos(i.y + t) / inten)));
}
c /= float(MAX_ITER);
c = 1.17 - pow(c, 1.4); // Tunable: 1.4 is the contrast exponent
vec3 colour = vec3(pow(abs(c), 8.0));
```
### Step 9: Multi-Layer Depth Compositing
**What**: Render the same pattern at different zoom levels, using depth fade-in/out to simulate parallax
**Why**: Multi-scale stacking breaks the mechanical feel of a single scale, producing a pseudo-3D depth effect
```glsl
#define NUM_DEPTH_LAYERS 4.0 // Tunable: number of depth layers
float m = 0.0;
for (float i = 0.0; i < 1.0; i += 1.0 / NUM_DEPTH_LAYERS) {
float z = fract(iTime * 0.1 + i);
float size = mix(15.0, 1.0, z); // Dense far away, sparse up close
float fade = smoothstep(0.0, 0.6, z) * smoothstep(1.0, 0.8, z); // Fade at both ends
m += fade * patternLayer(uv * size, i, iTime);
}
```
### Step 10: Post-Processing Pipeline
**What**: Apply gamma correction, contrast enhancement, saturation adjustment, and vignette in sequence
**Why**: Post-processing transforms "technically correct" output into "visually pleasing" final results
```glsl
// Gamma correction
col = pow(clamp(col, 0.0, 1.0), vec3(1.0 / 2.2));
// Contrast enhancement (S-curve)
col = col * 0.6 + 0.4 * col * col * (3.0 - 2.0 * col);
// Saturation adjustment
col = mix(col, vec3(dot(col, vec3(0.33))), -0.4); // Tunable: -0.4 increases saturation, positive reduces it
// Vignette
vec2 q = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.7);
```
---
## Common Variants in Detail
### Variant 1: Hexagonal Grid + Truchet Arcs
**Difference from base version**: Replaces the square grid with a hexagonal grid coordinate system, drawing three randomly oriented arcs within each hexagonal cell; arcs form maze-like continuous paths between cells
**Key modified code**:
```glsl
// Hexagon distance field
float hex(vec2 p) {
p = abs(p);
return max(dot(p, vec2(0.5, 0.866025)), p.x);
}
// Hexagonal grid coordinates (returns xy=cell-local coords, zw=cell ID)
const vec2 s = vec2(1.0, 1.7320508);
vec4 getHex(vec2 p) {
vec4 hC = floor(vec4(p, p - vec2(0.5, 1.0)) / s.xyxy) + 0.5;
vec4 h = vec4(p - hC.xy * s, p - (hC.zw + 0.5) * s);
return dot(h.xy, h.xy) < dot(h.zw, h.zw)
? vec4(h.xy, hC.xy)
: vec4(h.zw, hC.zw + vec2(0.5, 1.0));
}
// Truchet three-arc: one arc for each of three directions
float r = 1.0;
vec2 q1 = p - vec2(0.0, r) / s;
vec2 q2 = rot2(6.28318 / 3.0) * p - vec2(0.0, r) / s;
vec2 q3 = rot2(6.28318 * 2.0 / 3.0) * p - vec2(0.0, r) / s;
// Take nearest arc
float d = min(min(length(q1), length(q2)), length(q3));
d = abs(d - 0.288675) - 0.1; // 0.288675 = sqrt(3)/6, arc radius
```
### Variant 2: Water Caustic Interference Pattern
**Difference from base version**: Does not use domain repetition grids; instead generates full-screen interference textures through trigonometric iteration, seamlessly tileable
**Key modified code**:
```glsl
#define TAU 6.28318530718
#define MAX_ITER 5 // Tunable: iteration count
vec2 p = mod(uv * TAU, TAU) - 250.0;
vec2 i = p;
float c = 1.0;
float inten = 0.005;
for (int n = 0; n < MAX_ITER; n++) {
float t = iTime * (1.0 - 3.5 / float(n + 1));
i = p + vec2(cos(t - i.x) + sin(t + i.y),
sin(t - i.y) + cos(t + i.x));
c += 1.0 / length(vec2(p.x / (sin(i.x + t) / inten),
p.y / (cos(i.y + t) / inten)));
}
c /= float(MAX_ITER);
c = 1.17 - pow(c, 1.4);
vec3 colour = vec3(pow(abs(c), 8.0));
colour = clamp(colour + vec3(0.0, 0.35, 0.5), 0.0, 1.0); // Aquatic color shift
```
### Variant 3: Polar Concentric Rings + Animated Arc Segments
**Difference from base version**: Uses polar coordinates instead of Cartesian grids, drawing concentric ring arc segments with independent animation, suitable for radar/HUD style
**Key modified code**:
```glsl
#define NUM_RINGS 20.0 // Tunable: ring count
#define PALETTE vec3(0.0, 1.4, 2.0) + 1.5
vec2 plr = vec2(length(p), atan(p.y, p.x));
float id = floor(plr.x * NUM_RINGS + 0.5) / NUM_RINGS;
// Each ring rotates independently
p *= rot2(id * 11.0);
p.y = abs(p.y); // Mirror symmetry
// Concentric ring SDF
float rz = 1.0 - pow(abs(sin(plr.x * 3.14159 * NUM_RINGS)) * 1.25, 2.5);
// Arc segment animation
float arc = plr.y + sin(iTime + id * 5.5) * 1.52 - 1.5;
rz *= smoothstep(0.0, 0.05, arc);
// Per-ring coloring
vec3 col = (sin(PALETTE + id * 5.0 + iTime) * 0.5 + 0.5) * rz;
```
### Variant 4: Multi-Layer Depth Parallax Network
**Difference from base version**: Renders grid nodes and connections at multiple zoom levels, using depth fade-in/out to produce a pseudo-3D effect
**Key modified code**:
```glsl
#define NUM_DEPTH_LAYERS 4.0 // Tunable: number of depth layers
// Random vertex position within each cell
vec2 GetPos(vec2 id, vec2 offs, float t) {
float n = hash21(id + offs);
return offs + vec2(sin(t + n * 6.28), cos(t + fract(n * 100.0) * 6.28)) * 0.4;
}
// Line segment SDF
float df_line(vec2 a, vec2 b, vec2 p) {
vec2 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return length(pa - ba * h);
}
// Multi-layer compositing
float m = 0.0;
for (float i = 0.0; i < 1.0; i += 1.0 / NUM_DEPTH_LAYERS) {
float z = fract(iTime * 0.1 + i);
float size = mix(15.0, 1.0, z);
float fade = smoothstep(0.0, 0.6, z) * smoothstep(1.0, 0.8, z);
m += fade * NetLayer(uv * size, i, iTime);
}
```
### Variant 5: Fractal Apollian Pattern
**Difference from base version**: Uses iterative fold-and-invert transforms to generate infinitely detailed aperiodic fractal patterns, combined with HSV coloring
**Key modified code**:
```glsl
float apollian(vec4 p, float s) {
float scale = 1.0;
for (int i = 0; i < 7; ++i) { // Tunable: iteration count (5~12)
p = -1.0 + 2.0 * fract(0.5 * p + 0.5); // Space folding
float r2 = dot(p, p);
float k = s / r2; // Tunable: s is scaling factor (1.0~1.5)
p *= k; // Inversion mapping
scale *= k;
}
return abs(p.y) / scale;
}
// 4D slice animation for smooth morphing
vec4 pp = vec4(p.x, p.y, 0.0, 0.0) + offset;
pp.w = 0.125 * (1.0 - tanh(length(pp.xyz)));
float d = apollian(pp / 4.0, 1.2) * 4.0;
// HSV coloring
float hue = fract(0.75 * length(p) - 0.3 * iTime) + 0.3;
float sat = 0.75 * tanh(2.0 * length(p));
vec3 col = hsv2rgb(vec3(hue, sat, 1.0));
```
---
## In-Depth Performance Optimization
### 1. Control Iteration Count
The iteration loop is the biggest performance bottleneck. Increasing `NUM_LAYERS` from 4 to 8 halves performance. On mobile, keep it at 3 or fewer layers.
### 2. Avoid Branching
Replace `if/else` with branchless `step()`/`smoothstep()`/`mix()` alternatives:
```glsl
// Bad: if(rnd > 0.5) p.y = -p.y;
// Good: p.y *= sign(rnd - 0.5); // or use mix
```
### 3. Merge Distance Field Calculations
Combine multiple shape SDFs using `min()`/`max()` and apply a single `smoothstep`, rather than rendering each shape separately.
### 4. Precompute Constants
Compute `sin`/`cos` pairs (e.g., rotation matrices) once outside the loop; write irrational numbers like `1.7320508` (sqrt(3)) as direct constants.
### 5. Minimize `atan` Calls
`atan` is an expensive function. If you only need periodic angular variation, consider approximating with `dot`.
### 6. LOD Strategy
Reduce iteration count at distance/when zoomed out:
```glsl
int iters = int(mix(3.0, float(MAX_ITER), smoothstep(0.0, 1.0, 1.0 / scale)));
```
### 7. Use `smoothstep` Instead of `pow`
`pow(x, n)` is slower than `smoothstep` on some GPUs, and `smoothstep` naturally clamps to [0,1].
---
## Complete Combination Suggestion Examples
### 1. + Noise Texture
Overlay Perlin/Simplex noise perturbation on distance fields to give geometric patterns an organic/eroded feel. Triangle noise (as used in "Overly Satisfying") is an efficient low-cost alternative:
```glsl
d += triangleNoise(uv * 10.0) * 0.05; // Noise perturbation amount is tunable
```
### 2. + Post-Processing Cross-Hatch
Overlay cross-hatching effects on patterns to simulate hand-drawn/printmaking style (as used in "Hexagonal Maze Flow"):
```glsl
float gr = dot(col, vec3(0.299, 0.587, 0.114)); // Grayscale
float hatch = (gr < 0.45) ? clamp(sin((uv.x - uv.y) * 125.6) * 2.0 + 1.5, 0.0, 1.0) : 1.0;
col *= hatch * 0.5 + 0.5;
```
### 3. + SDF Boolean Operations
Combine multiple base patterns through `min` (union), `max` (intersection), and subtraction into complex geometry:
```glsl
float d = max(hexSDF, -circleSDF); // Hexagon minus circle = hexagonal ring
```
### 4. + Domain Warping
Apply sin/cos distortion to UVs before domain repetition, producing flowing/swirling effects:
```glsl
uv += 0.05 * vec2(sin(uv.y * 5.0 + iTime), sin(uv.x * 3.0 + iTime));
```
### 5. + Radial Blur / Motion Blur
Average multiple samples in the polar coordinate direction on the final color, producing rotational motion blur to enhance dynamism.
### 6. + Pseudo-3D Lighting
Use SDF gradients as normals and add simple diffuse/specular lighting to give 2D patterns a relief/embossed appearance (as in "Apollian with a twist" shadow casting method).

# Procedural Noise — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL Basics**: uniform, varying, built-in functions (`fract`, `floor`, `mix`, `smoothstep`, `dot`, `sin`/`cos`)
- **Vector Math**: dot product, cross product, matrix multiplication (`mat2` rotation matrix)
- **Coordinate Spaces**: UV coordinate normalization, screen aspect ratio correction
- **Interpolation Theory**: linear interpolation, Hermite interpolation `3t^2-2t^3` (smoothstep)
- **ShaderToy Environment**: `iTime`, `iResolution`, `fragCoord`, `mainImage` signature
## Use Cases in Detail
Procedural noise is the most fundamental and versatile technique in real-time GPU graphics, applicable to:
- **Natural phenomena simulation**: fire, clouds, water surfaces, lava, lightning, smoke, etc.
- **Terrain generation**: mountains, canyons, erosion landscapes, snowline distribution
- **Texture synthesis**: marble textures, wood grain, organic patterns, abstract art
- **Volume rendering**: volumetric clouds, volumetric fog, light scattering
- **Motion effects**: fluid simulation approximation, particle trajectory perturbation, domain warping animation
Core idea: instead of using pre-made textures, generate pseudo-random, spatially continuous signals in real-time on the GPU through mathematical functions, then produce rich multi-scale detail through fractal summation (FBM) and Domain Warping.
## Core Principles in Detail
### 1. Noise Functions — Building Continuous Pseudo-Random Signals
The essence of a noise function is: **generate random values at integer lattice points, then smoothly interpolate between them**.
Two mainstream implementations:
**Value Noise**: each lattice point stores a random scalar, bilinear interpolation yields a continuous field.
- Formula: `N(p) = mix(mix(h00, h10, u), mix(h01, h11, u), v)`, where `u,v` are the fractional parts after Hermite smoothing
**Simplex Noise**: uses gradient dot products + radial falloff kernels on a triangular lattice (2D) or tetrahedral lattice (3D).
- Advantages: fewer lattice lookups (2D: 3 vs 4), no axis-aligned artifacts, lower computational cost
- Core: skew transform maps square grid to triangular grid, using `K1=(sqrt(3)-1)/2` for skewing, `K2=(3-sqrt(3))/6` for unskewing
### 2. Hash Functions — Source of Lattice Random Values
Hash functions map integer coordinates to pseudo-random values in [0,1] or [-1,1]:
- **sin-based hash** (classic but has precision risks): `fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453)`
- **sin-free hash** (cross-platform stable): pure arithmetic `fract(p * 0.1031)` + `dot` mixing + `fract` output
### 3. FBM (Fractional Brownian Motion) — Multi-Scale Detail Summation
Sum multiple noise "octaves" at different frequencies and amplitudes:
```
FBM(p) = sum( amplitude_i * noise(frequency_i * p) )
```
Standard parameters:
- **Lacunarity (frequency multiplier)**: each octave's frequency multiplied by ~2.0
- **Persistence/Gain (amplitude decay)**: each octave's amplitude multiplied by ~0.5
- **Inter-octave rotation**: use a rotation matrix to eliminate axis-aligned artifacts
### 4. Domain Warping — Organic Distortion
Feed the output of noise back as coordinate offsets, producing distorted organic patterns:
- **Single-layer warping**: `fbm(p + fbm(p))`
- **Multi-layer cascade**: `fbm(p + fbm(p + fbm(p)))` — classic three-layer domain warping
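The cascade can be sketched as follows (using the `fbm` from Step 4; the offset vectors are arbitrary decorrelation constants, tune freely):

```glsl
float warped(vec2 p) {
    // First warp layer: a 2D offset field built from two decorrelated fbm calls
    vec2 q = vec2(fbm(p + vec2(0.0, 0.0)),
                  fbm(p + vec2(5.2, 1.3)));
    // Second warp layer, fed by the first
    vec2 r = vec2(fbm(p + 4.0 * q + vec2(1.7, 9.2)),
                  fbm(p + 4.0 * q + vec2(8.3, 2.8)));
    // Final evaluation: fbm(p + fbm(p + fbm(p))) structure
    return fbm(p + 4.0 * r); // Tunable: 4.0 is the warp strength
}
```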
### 5. FBM Variants — Different Visual Characteristics
| Variant | Formula | Visual Effect |
|---------|---------|---------------|
| Standard FBM | `sum( a*noise(p) )` | Smooth, soft (cloud interiors) |
| Ridged FBM | `sum( a*abs(noise(p)) )` | Sharp creases (ridges, lightning) |
| Sinusoidal ridged | `sum( a*sin(noise(p)*k) )` | Periodic ridges (lava) |
| Erosion FBM | `sum( a*noise(p)/(1+dot(d,d)) )` | Smooth ridges, fine valleys (terrain) |
| Sea wave FBM | `sum( a*octave_fn(p) )` | Sharp wave crests (ocean surface) |
## Step-by-Step Implementation Details
### Step 1: Hash Function
**What**: Implement a hash function that maps 2D integer coordinates to pseudo-random values.
**Why**: Hashing is the fundamental building block of all noise. The sin-free version is stable across GPUs; the sin version is more concise.
**Code (sin-free version)**:
```glsl
// 2D -> 1D hash, sin-free, cross-platform stable
float hash12(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * .1031);
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.x + p3.y) * p3.z);
}
// 2D -> 2D hash (for gradient noise)
vec2 hash22(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * vec3(.1031, .1030, .0973));
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.xx + p3.yz) * p3.zy);
}
```
**Code (classic sin version)**:
```glsl
float hash(vec2 p) {
float h = dot(p, vec2(127.1, 311.7));
return fract(sin(h) * 43758.5453123);
}
// Gradient version, output [-1, 1]
vec2 hash2(vec2 p) {
p = vec2(dot(p, vec2(127.1, 311.7)),
dot(p, vec2(269.5, 183.3)));
return -1.0 + 2.0 * fract(sin(p) * 43758.5453123);
}
```
### Step 2: Value Noise
**What**: Perform Hermite-smoothed interpolation between hashed values at integer lattice points to obtain a continuous 2D noise field.
**Why**: Value noise is the simplest noise implementation with minimal code, suitable as a foundation for FBM and domain warping. Using the `smoothstep` polynomial `3t^2-2t^3` directly guarantees C1 continuity (no seam discontinuities).
**Code**:
```glsl
float noise(in vec2 x) {
vec2 p = floor(x); // Integer lattice point
vec2 f = fract(x); // Fractional part within cell
f = f * f * (3.0 - 2.0 * f); // Hermite smoothing (can substitute quintic: 6t^5-15t^4+10t^3)
float a = hash(p + vec2(0.0, 0.0));
float b = hash(p + vec2(1.0, 0.0));
float c = hash(p + vec2(0.0, 1.0));
float d = hash(p + vec2(1.0, 1.0));
return mix(mix(a, b, f.x), mix(c, d, f.x), f.y); // Bilinear interpolation
}
```
### Step 3: Simplex Noise
**What**: Use gradient dot products and radial falloff kernels on a triangular grid to generate isotropic 2D noise.
**Why**: Compared to value noise, Simplex Noise has no axis-aligned artifacts, lower computational cost (2D requires only 3 lattice points instead of 4), and higher visual quality. Suitable for scenarios requiring high-quality noise (fire, clouds).
**Code**:
```glsl
float noise(in vec2 p) {
const float K1 = 0.366025404; // (sqrt(3)-1)/2 — skew factor
const float K2 = 0.211324865; // (3-sqrt(3))/6 — unskew factor
vec2 i = floor(p + (p.x + p.y) * K1); // Skew to triangular grid
vec2 a = p - i + (i.x + i.y) * K2; // Vertex 0 offset
vec2 o = (a.x > a.y) ? vec2(1.0, 0.0) : vec2(0.0, 1.0); // Determine which triangle
vec2 b = a - o + K2; // Vertex 1 offset
vec2 c = a - 1.0 + 2.0 * K2; // Vertex 2 offset
vec3 h = max(0.5 - vec3(dot(a, a), dot(b, b), dot(c, c)), 0.0); // Radial falloff
vec3 n = h * h * h * h * vec3( // h^4 kernel * gradient dot product
dot(a, hash2(i + 0.0)),
dot(b, hash2(i + o)),
dot(c, hash2(i + 1.0))
);
return dot(n, vec3(70.0)); // Normalize to ~[-1, 1]
}
```
### Step 4: Standard FBM (Fractional Brownian Motion)
**What**: Sum multiple octaves of noise with decreasing amplitudes to obtain a multi-scale fractal signal.
**Why**: A single noise octave has a single frequency and cannot produce the multi-scale detail found in nature. FBM simulates fractal self-similarity by summing noise at different frequencies. **The inter-octave rotation matrix is a key technique** that breaks axis-aligned artifacts.
**Code (4-octave loop version)**:
```glsl
#define OCTAVES 4 // Tunable: number of octaves (1-8), more = richer detail but more expensive
#define GAIN 0.5 // Tunable: amplitude decay (0.3-0.7), higher = more prominent high frequencies
#define LACUNARITY 2.0 // Tunable: frequency multiplier (1.5-3.0), higher = larger gap between octaves
float fbm(vec2 p) {
// Encodes both rotation and scaling, eliminates axis-aligned artifacts
// |m| = sqrt(1.6^2+1.2^2) = 2.0, rotation angle ~ 36.87 degrees
mat2 m = mat2(1.6, 1.2, -1.2, 1.6);
float f = 0.0;
float a = 0.5; // Initial amplitude
for (int i = 0; i < OCTAVES; i++) {
f += a * noise(p);
p = m * p; // Rotation + frequency scaling
a *= GAIN; // Amplitude decay
}
return f;
}
```
**Manually unrolled version (with slightly varying lacunarity)**:
```glsl
// Slightly varying lacunarity (2.01, 2.02, 2.03...) breaks exact self-similarity
const mat2 mtx = mat2(0.80, 0.60, -0.60, 0.80); // Pure rotation ~36.87 degrees
float fbm4(vec2 p) {
float f = 0.0;
f += 0.5000 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.02;
f += 0.2500 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.03;
f += 0.1250 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.01;
f += 0.0625 * (-1.0 + 2.0 * noise(p));
return f / 0.9375; // Normalization
}
```
### Step 5: Ridged FBM
**What**: Take the absolute value of noise before summation, producing sharp "ridges" at zero crossings.
**Why**: Standard FBM produces overly smooth patterns and cannot represent sharp structures like lightning, mountain ridges, or cracks. The `abs()` operation folds the noise's zero crossings into sharp V-shaped ridge lines.
**Code**:
```glsl
float fbm_ridged(in vec2 p) {
float z = 2.0;
float rz = 0.0;
for (float i = 1.0; i < 6.0; i++) {
// abs((noise-0.5)*2) maps [0,1] to a V-shape in [0,1]
rz += abs((noise(p) - 0.5) * 2.0) / z;
z *= 2.0; // Amplitude decay (1/z)
p *= 2.0; // Frequency scaling
}
return rz;
}
```
**Sinusoidal ridged variant**:
```glsl
// sin(noise*7) produces smoother periodic ridges, suitable for lava textures
rz += (sin(noise(p) * 7.0) * 0.5 + 0.5) / z;
```
### Step 6: Domain Warping
**What**: Use the output of noise/FBM to distort the input coordinates of subsequent noise, producing organic distortion patterns.
**Why**: Domain warping is the core technique for producing "painterly", "ink wash", "geological" and other organic patterns. The number of nested warping layers controls complexity.
**Basic domain warping**:
```glsl
// Low-frequency FBM as offset to distort subsequent sampling
float q = fbm(uv * 0.5); // Low-frequency domain warping field
uv -= q - time; // Use q to offset sampling coordinates
float f = fbm(uv); // Sample at warped coordinates
```
**Classic three-layer cascaded domain warping**:
```glsl
// Two independent FBMs produce decorrelated vec2 offsets
// (fbm6_2, used below, is the 6-octave analogue of fbm4_2)
vec2 fbm4_2(vec2 p) {
return vec2(fbm4(p + vec2(1.0)), fbm4(p + vec2(6.2))); // Different offsets for decorrelation
}
float func(vec2 q, out vec2 o, out vec2 n) {
// Layer 1: q -> 4-octave FBM -> 2D offset field o
o = 0.5 + 0.5 * fbm4_2(q);
// Layer 2: o -> 6-octave FBM -> 2D offset field n (higher frequency)
n = fbm6_2(4.0 * o);
// Layer 3: original coordinates + offsets -> final FBM sampling
vec2 p = q + 2.0 * n + 1.0;
float f = 0.5 + 0.5 * fbm4(2.0 * p);
// Contrast enhancement: boost contrast in heavily warped areas
f = mix(f, f * f * f * 3.5, f * abs(n.x));
return f;
}
```
**Dual-axis FBM domain warping**:
```glsl
mat2 makem2(float theta) { // 2D rotation matrix (also used by the flow noise below)
float c = cos(theta), s = sin(theta);
return mat2(c, -s, s, c);
}
float dualfbm(in vec2 p) {
vec2 p2 = p * 0.7;
// Two independent FBMs offset X/Y axes separately, different time offsets avoid symmetry
vec2 basis = vec2(fbm(p2 - time * 1.6), fbm(p2 + time * 1.7));
basis = (basis - 0.5) * 0.2; // Center + scale
p += basis;
return fbm(p * makem2(time * 0.2)); // Final sampling after rotation
}
```
### Step 7: Flow Noise
**What**: Apply independent gradient field displacement within each FBM octave, simulating fluid transport effects.
**Why**: Ordinary domain warping is "global" (distorting before or after FBM), while flow noise is "per-octave" — each frequency layer has its own flow direction and speed, producing extremely realistic lava and fluid effects.
**Code**:
```glsl
#define FLOW_SPEED 0.6 // Tunable: main flow speed
#define BASE_SPEED 1.9 // Tunable: base point flow speed
#define ADVECTION 0.77 // Tunable: advection factor (0.5=stable, 0.95=turbulent)
#define GRAD_SCALE 0.5 // Tunable: gradient displacement strength
// Noise gradient (central differences)
vec2 gradn(vec2 p) {
float ep = 0.09;
float gradx = noise(vec2(p.x + ep, p.y)) - noise(vec2(p.x - ep, p.y));
float grady = noise(vec2(p.x, p.y + ep)) - noise(vec2(p.x, p.y - ep));
return vec2(gradx, grady);
}
float flow(in vec2 p) {
float z = 2.0;
float rz = 0.0;
vec2 bp = p; // Base point (prevents advection divergence)
for (float i = 1.0; i < 7.0; i++) {
p += time * FLOW_SPEED; // Main flow displacement
bp += time * BASE_SPEED; // Base flow displacement
vec2 gr = gradn(i * p * 0.34 + time * 1.0); // Noise gradient field
gr *= makem2(time * 6.0 - (0.05 * p.x + 0.03 * p.y) * 40.0); // Spatially varying rotation
p += gr * GRAD_SCALE; // Gradient displacement
rz += (sin(noise(p) * 7.0) * 0.5 + 0.5) / z; // Sinusoidal ridged accumulation
p = mix(bp, p, ADVECTION); // Mix back to base (prevent divergence)
z *= 1.4; // Amplitude decay
p *= 2.0; // Frequency scaling
bp *= 1.9; // Base frequency scaling (slightly different)
}
return rz;
}
```
### Step 8: Derivative FBM
**What**: Track the analytical gradient of noise during FBM accumulation, using the accumulated gradient magnitude to suppress high-frequency detail in steep areas.
**Why**: This is a signature technique for terrain rendering. Standard FBM adds detail uniformly across all areas, but natural terrain has smooth ridges due to hydraulic erosion while valleys retain fine detail. Derivative FBM automatically simulates this erosion effect through the `1/(1+|gradient|^2)` factor.
**Code**:
```glsl
// Value noise with analytical derivative: returns vec3(value, d/dx, d/dy)
vec3 noised(in vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
vec2 u = f * f * (3.0 - 2.0 * f); // Hermite interpolation
vec2 du = 6.0 * f * (1.0 - f); // Hermite derivative (analytical)
float a = hash(p + vec2(0, 0));
float b = hash(p + vec2(1, 0));
float c = hash(p + vec2(0, 1));
float d = hash(p + vec2(1, 1));
return vec3(
a + (b - a) * u.x + (c - a) * u.y + (a - b - c + d) * u.x * u.y, // Value
du * (vec2(b - a, c - a) + (a - b - c + d) * u.yx) // Gradient
);
}
#define TERRAIN_OCTAVES 16 // Tunable: terrain octave count (5-16), more = finer detail
#define TERRAIN_GAIN 0.5 // Tunable: amplitude decay
float terrainFBM(in vec2 x) {
const mat2 m2 = mat2(0.8, -0.6, 0.6, 0.8); // Pure rotation ~36.87 degrees
float a = 0.0; // Accumulated value
float b = 1.0; // Current amplitude
vec2 d = vec2(0.0); // Accumulated gradient
for (int i = 0; i < TERRAIN_OCTAVES; i++) {
vec3 n = noised(x); // (value, dx, dy)
d += n.yz; // Accumulate gradient
a += b * n.x / (1.0 + dot(d, d)); // Key: larger gradient = smaller contribution (erosion effect)
b *= TERRAIN_GAIN;
x = m2 * x * 2.0; // Rotation + frequency scaling
}
return a;
}
```
## Common Variants in Detail
### Variant 1: Ridged FBM (Ridged/Turbulent FBM)
- **Difference from base version**: applies `abs()` to noise values, producing sharp ridge lines at zero crossings
- **Use cases**: lightning, mountain ridges, cracks, veins, electric arcs
- **Key modified code**:
```glsl
// Standard FBM line:
f += a * noise(p);
// Changed to ridged:
f += a * abs(noise(p));
// Or sinusoidal ridged (smoother periodic ridges, suitable for lava):
f += a * (sin(noise(p) * 7.0) * 0.5 + 0.5);
```
### Variant 2: Domain Warped FBM
- **Difference from base version**: FBM output is fed back as coordinate offsets, producing organic distortion
- **Use cases**: cloud deformation, geological textures, ink wash style, abstract art
- **Key modified code**:
```glsl
// Classic three-layer domain warping
vec2 o = 0.5 + 0.5 * vec2(fbm(q + vec2(1.0)), fbm(q + vec2(6.2)));
vec2 n = vec2(fbm(4.0 * o + vec2(9.2)), fbm(4.0 * o + vec2(5.7)));
float f = 0.5 + 0.5 * fbm(q + 2.0 * n + 1.0);
```
### Variant 3: Derivative Erosion FBM
- **Difference from base version**: tracks analytical gradient, suppresses high frequencies in steep areas (simulates hydraulic erosion)
- **Use cases**: realistic terrain, mountains, canyons
- **Key modified code**:
```glsl
vec2 d = vec2(0.0); // Accumulated gradient
for (int i = 0; i < N; i++) {
vec3 n = noised(p); // (value, dx, dy)
d += n.yz; // Accumulate gradient
a += b * n.x / (1.0 + dot(d, d)); // Key: divide by gradient magnitude
b *= 0.5;
p = m2 * p * 2.0;
}
```
### Variant 4: Flow Noise
- **Difference from base version**: applies independent gradient field displacement within each octave, simulating fluid transport
- **Use cases**: lava, liquid metal, flowing magma
- **Key modified code**:
```glsl
for (float i = 1.0; i < 7.0; i++) {
vec2 gr = gradn(i * p * 0.34 + time); // Gradient field
gr *= makem2(time * 6.0 - (0.05 * p.x + 0.03 * p.y) * 40.0); // Spatially varying rotation
p += gr * 0.5; // Displacement
rz += (sin(noise(p) * 7.0) * 0.5 + 0.5) / z; // Accumulation
p = mix(bp, p, 0.77); // Mix back to base
}
```
### Variant 5: Custom Sea Octave FBM
- **Difference from base version**: uses `1-abs(sin(uv))` to construct peaked waveforms, combined with bidirectional propagation and choppy decay
- **Use cases**: ocean water surface, waves
- **Key modified code**:
```glsl
float sea_octave(vec2 uv, float choppy) {
uv += noise(uv); // Noise domain perturbation
vec2 wv = 1.0 - abs(sin(uv)); // Peaked waveform
vec2 swv = abs(cos(uv)); // Smooth waveform
wv = mix(wv, swv, wv); // Adaptive blending
return pow(1.0 - pow(wv.x * wv.y, 0.65), choppy);
}
// Bidirectional propagation in FBM loop:
d = sea_octave((uv + SEA_TIME) * freq, choppy);
d += sea_octave((uv - SEA_TIME) * freq, choppy);
choppy = mix(choppy, 1.0, 0.2); // Higher octaves are smoother
```
## Performance Optimization Details
### 1. Reduce Octave Count (Most Direct)
Each additional octave doubles the noise sampling cost. Distant objects can use fewer octaves:
```glsl
// LOD-aware octave count
int oct = max(2, 5 - int(log2(1.0 + t * 0.5))); // Fewer octaves at greater distances (clamped so at least 2 remain)
```
### 2. Multi-Level LOD Strategy
Provide functions at different precision levels for different purposes:
```glsl
float terrainL(vec2 x) { /* 3 octaves — for camera height */ }
float terrainM(vec2 x) { /* 9 octaves — for ray marching */ }
float terrainH(vec2 x) { /* 16 octaves — for normal calculation */ }
```
### 3. Use Texture Sampling Instead of Math
Store precomputed noise in textures, using hardware texture filtering instead of arithmetic hashing:
```glsl
float noise(in vec2 x) { return texture(iChannel0, x * 0.01).x; }
// Or use texelFetch for an exact lattice lookup (p is an ivec2 lattice coordinate):
float a = texelFetch(iChannel0, (p + ivec2(0, 0)) & 255, 0).x;
```
### 4. Manually Unroll Loops
GLSL compilers typically optimize manually unrolled small loops (4-6 iterations) better than `for` loops, and allow slightly varying lacunarity per octave.
### 5. Adaptive Step Size (Volume Rendering)
```glsl
// Step size grows linearly with distance
float dt = max(0.05, 0.02 * t);
```
### 6. Directional Derivative Instead of Full Gradient (Volumetric Lighting)
```glsl
// 1 extra sample vs 3
float dif = clamp((den - map(pos + 0.3 * sundir)) / 0.25, 0.0, 1.0);
```
### 7. Early Termination
```glsl
if (sum.a > 0.99) break; // Volume is already opaque, stop marching
```
## Combination Suggestions in Detail
### 1. FBM + Ray Marching
Noise drives a height field or density field, ray marching finds intersections. This is the standard combination for terrain and ocean surface rendering:
- Height field: `height = terrainFBM(pos.xz)`, ray march to find the intersection where `pos.y == height`
- Volume field: `density = fbm(pos)`, forward-accumulate transmittance and color
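As a sketch, the height-field branch might look like the following, assuming the `terrainFBM` from Step 8 and a camera ray `ro`/`rd` as in the Ray Marching reference (the step factor and thresholds are illustrative, not canonical):

```glsl
// Height-field ray march: advance until the ray drops below the terrain
float raymarchTerrain(vec3 ro, vec3 rd) {
    float t = 0.1;
    for (int i = 0; i < 128; i++) {
        vec3 pos = ro + t * rd;
        float h = pos.y - terrainFBM(pos.xz); // Signed height above the terrain
        if (h < 0.002 * t) return t;          // Hit (threshold grows with distance)
        t += 0.4 * h;                         // Conservative step: FBM is not a true SDF
        if (t > 200.0) break;
    }
    return -1.0; // No hit
}
```

The 0.4 factor trades speed for safety; steeper terrains need a smaller factor to avoid stepping through ridges.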
### 2. FBM + Finite Difference Normals + Lighting
Use finite differences on a 2D noise field to estimate normals, adding pseudo-3D lighting effects:
```glsl
vec3 nor = normalize(vec3(f(p) - f(p+ex), epsilon, f(p) - f(p+ey))); // Height-field normal points opposite the gradient: (-df/dx, e, -df/dy)
float dif = dot(nor, lightDir);
```
### 3. FBM + Color Mapping
Map the same scalar at different power exponents to RGB channels, producing natural color gradients:
```glsl
vec3 col = vec3(1.5*c, 1.5*c*c*c, c*c*c*c*c*c); // Fire: red -> orange -> yellow -> white
```
Or inverse color mapping:
```glsl
vec3 col = vec3(0.2, 0.07, 0.01) / rz; // Areas with small ridge values are brightest
```
### 4. FBM + Fresnel Water Surface Coloring
Noise drives water surface waveforms, Fresnel equations blend reflected sky and refracted water color:
```glsl
float fresnel = pow(1.0 - dot(n, -eye), 3.0);
vec3 color = mix(refracted, reflected, fresnel);
```
### 5. Multi-Layer FBM Compositing
Different FBM layers with different parameters control different properties:
- **Shape layer**: low-frequency standard FBM controls cloud shape
- **Ridged layer**: mid-frequency ridged FBM adds edge detail
- **Color layer**: high-frequency FBM controls cloud interior color variation
- **Combination**: `f *= r + f;` shape * ridged produces sharp edges
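A minimal sketch of the layering, assuming the `fbm` and `fbm_ridged` from Steps 4-5 (the frequencies, tint colors, and thresholds are illustrative):

```glsl
// Three-layer compositing: shape, ridged edges, color variation
vec3 cloudColor(vec2 uv, float time) {
    float shape = fbm(uv + time * 0.05);        // Low-frequency silhouette
    float ridge = 1.0 - fbm_ridged(uv * 3.0);   // Mid-frequency edge detail
    float f = shape * (ridge + shape);          // shape * ridged sharpens edges
    float tint = fbm(uv * 8.0);                 // High-frequency interior variation
    vec3 col = mix(vec3(0.6, 0.7, 0.8), vec3(1.0), tint);
    return col * smoothstep(0.4, 0.9, f);       // Soft density-to-opacity mapping
}
```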
### 6. FBM + Volumetric Lighting (Directional Derivative)
In volume rendering, the density difference along the light direction approximates lighting:
```glsl
float shadow = clamp((density_here - density_toward_sun) / scale, 0.0, 1.0);
vec3 lit_color = mix(shadow_color, light_color, shadow);
```
View File
@@ -0,0 +1,396 @@
# Ray Marching Detailed Reference
This document serves as a detailed reference for the Ray Marching Skill, covering prerequisites, step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL Basics**: uniforms, varyings, built-in functions (`mix`, `clamp`, `smoothstep`, `normalize`, `dot`, `cross`, `reflect`, `refract`)
- **Vector Math**: dot product, cross product, vector normalization, matrix multiplication
- **Coordinate Systems**: transformations from screen space to NDC to view space to world space
- **Basic Lighting Models**: diffuse (Lambertian), specular (Phong/Blinn-Phong)
## Implementation Steps in Detail
### Step 1: UV Coordinate Normalization and Ray Direction Computation
**What**: Convert pixel coordinates to normalized coordinates in the [-1,1] range, and compute the ray direction from the camera.
**Why**: This establishes the mapping from screen pixels to the 3D world. Dividing by `iResolution.y` preserves the aspect ratio; the z component controls the field of view.
```glsl
// Method A: Concise version (common for quick prototyping)
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
vec3 ro = vec3(0.0, 0.0, -3.0); // Ray origin (camera position)
vec3 rd = normalize(vec3(uv, 1.0)); // Ray direction, z=1.0 gives ~90° FOV
// Method B: Precise FOV control
vec2 xy = fragCoord - iResolution.xy / 2.0;
float z = iResolution.y / tan(radians(FOV) / 2.0); // FOV is adjustable: field of view in degrees
vec3 rd = normalize(vec3(xy, -z));
```
### Step 2: Building the Camera Matrix (Look-At)
**What**: Construct a view matrix from the camera position, target point, and up direction, then transform the view-space ray direction into world space.
**Why**: Without a camera matrix, the ray direction is fixed along -Z. With a Look-At matrix, the camera can be freely positioned and rotated.
```glsl
mat3 setCamera(vec3 ro, vec3 ta, float cr) {
vec3 cw = normalize(ta - ro); // Forward direction
vec3 cp = vec3(sin(cr), cos(cr), 0.0); // Up reference (cr controls roll)
vec3 cu = normalize(cross(cw, cp)); // Right direction
vec3 cv = cross(cu, cw); // Up direction
return mat3(cu, cv, cw);
}
// Usage:
mat3 ca = setCamera(ro, ta, 0.0);
vec3 rd = ca * normalize(vec3(uv, FOCAL_LENGTH)); // FOCAL_LENGTH adjustable: 1.0~3.0, larger = narrower FOV
```
### Step 3: Defining the Scene SDF
**What**: Write a function that returns the signed distance from any point in space to the nearest surface.
**Why**: The SDF is the core of Ray Marching — it simultaneously defines geometry and step distance.
```glsl
// --- Basic SDF Primitives ---
float sdSphere(vec3 p, float r) {
return length(p) - r;
}
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return min(max(d.x, max(d.y, d.z)), 0.0) + length(max(d, 0.0));
}
float sdTorus(vec3 p, vec2 t) {
return length(vec2(length(p.xz) - t.x, p.y)) - t.y;
}
// --- CSG Boolean Operations ---
float opUnion(float a, float b) { return min(a, b); }
float opSubtraction(float a, float b) { return max(a, -b); }
float opIntersection(float a, float b) { return max(a, b); }
// --- Smooth Boolean Operations (organic blending) ---
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k; // k adjustable: blend radius, 0.1~0.5
}
// --- Spatial Transforms ---
// Translation: apply inverse translation to the sample point
// Rotation: multiply the sample point by a rotation matrix
// Scaling: p /= s, result *= s
// --- Scene Composition Example ---
float map(vec3 p) {
float d = sdSphere(p - vec3(0.0, 0.5, 0.0), 0.5); // Sphere
d = opUnion(d, p.y); // Add ground plane
d = smin(d, sdBox(p - vec3(1.0, 0.3, 0.0), vec3(0.3)), 0.2); // Smooth blend with box
return d;
}
```
### Step 4: Core Ray Marching Loop
**What**: Iteratively step along the ray direction, using the SDF value at each step to determine the advance distance, and check whether the ray has hit a surface or exceeded the maximum range.
**Why**: Sphere Tracing guarantees that each step advances the maximum safe distance (without penetrating surfaces), taking large steps in open areas and automatically slowing down near surfaces.
```glsl
#define MAX_STEPS 128 // Adjustable: max step count, 64~256, more = more precise but slower
#define MAX_DIST 100.0 // Adjustable: max travel distance
#define SURF_DIST 0.001 // Adjustable: surface hit threshold, 0.0001~0.01
float rayMarch(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + t * rd;
float d = map(p);
if (d < SURF_DIST) return t; // Surface hit
t += d;
if (t > MAX_DIST) break; // Out of range
}
return -1.0; // No hit
}
```
### Step 5: Normal Estimation
**What**: Compute the surface normal at the hit point using the numerical gradient of the SDF.
**Why**: Normals are the foundation of lighting calculations. The gradient direction of the SDF is the surface normal direction.
```glsl
// Method A: Central differences (6 SDF calls, straightforward)
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0); // e.x adjustable: differentiation step size
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
// Method B: Tetrahedron trick (4 SDF calls, prevents compiler inline bloat, recommended)
vec3 calcNormal(vec3 pos) {
vec3 n = vec3(0.0);
for (int i = 0; i < 4; i++) {
vec3 e = 0.5773 * (2.0 * vec3((((i+3)>>1)&1), ((i>>1)&1), (i&1)) - 1.0);
n += e * map(pos + 0.001 * e);
}
return normalize(n);
}
```
### Step 6: Lighting and Shading
**What**: Compute Phong lighting (ambient + diffuse + specular) at the hit point.
**Why**: Give SDF surfaces realistic shading with highlights and shadow gradients.
```glsl
vec3 shade(vec3 p, vec3 rd) {
vec3 nor = calcNormal(p);
vec3 lightDir = normalize(vec3(0.6, 0.35, 0.5)); // Light direction (adjustable)
vec3 viewDir = -rd;
vec3 halfDir = normalize(lightDir + viewDir);
// Diffuse
float diff = clamp(dot(nor, lightDir), 0.0, 1.0);
// Specular
float spec = pow(clamp(dot(nor, halfDir), 0.0, 1.0), SHININESS); // SHININESS adjustable: 8~64
// Ambient + sky light
float sky = sqrt(clamp(0.5 + 0.5 * nor.y, 0.0, 1.0));
vec3 col = vec3(0.2, 0.2, 0.25); // Material base color (adjustable)
vec3 lin = vec3(0.0);
lin += diff * vec3(1.3, 1.0, 0.7) * 2.2; // Main light
lin += sky * vec3(0.4, 0.6, 1.15) * 0.6; // Sky light
lin += vec3(0.25) * 0.55; // Fill light
col *= lin;
col += spec * vec3(1.3, 1.0, 0.7) * 5.0; // Specular highlight
return col;
}
```
### Step 7: Post-Processing (Gamma Correction and Tone Mapping)
**What**: Convert linear lighting results to sRGB space and apply tone mapping to prevent overexposure.
**Why**: GPU computations are done in linear space, but displays require gamma-corrected values. Tone mapping compresses HDR values into the [0,1] range.
```glsl
// Gamma correction
col = pow(col, vec3(0.4545)); // i.e., 1/2.2
// Optional: Reinhard tone mapping (before gamma)
col = col / (1.0 + col);
// Optional: Vignette
vec2 q = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.25);
```
## Common Variants in Detail
### 1. Volumetric Ray Marching
**Difference from the basic version**: Instead of finding a surface intersection, the ray advances in **fixed steps**, accumulating density/color at each step. Used for flames, smoke, and clouds.
**Key modified code**:
```glsl
#define VOL_STEPS 150 // Adjustable: volume sample count
#define VOL_STEP_SIZE 0.05 // Adjustable: step size
// Density field (built with FBM noise)
float fbmDensity(vec3 p) {
float den = 0.2 - p.y; // Base height falloff
vec3 q = p - vec3(0.0, 1.0, 0.0) * iTime;
float f = 0.5000 * noise(q); q = q * 2.02 - vec3(0.0, 1.0, 0.0) * iTime;
f += 0.2500 * noise(q); q = q * 2.03 - vec3(0.0, 1.0, 0.0) * iTime;
f += 0.1250 * noise(q); q = q * 2.01 - vec3(0.0, 1.0, 0.0) * iTime;
f += 0.0625 * noise(q);
return den + 4.0 * f;
}
// Volumetric marching main function
vec3 volumetricMarch(vec3 ro, vec3 rd) {
vec4 sum = vec4(0.0);
float t = 0.05;
for (int i = 0; i < VOL_STEPS; i++) {
vec3 pos = ro + t * rd;
float den = fbmDensity(pos);
if (den > 0.0) {
den = min(den, 1.0);
// col must be vec4 so it carries an alpha channel for compositing
vec4 col = vec4(mix(vec3(1.0, 0.5, 0.05), vec3(0.48, 0.53, 0.5),
clamp(pos.y * 0.5, 0.0, 1.0)), // Fire-to-smoke color gradient
den * 0.6); // Alpha from density
col.rgb *= den;
col.rgb *= col.a; // Premultiply alpha
sum += col * (1.0 - sum.a); // Front-to-back compositing
if (sum.a > 0.99) break; // Early exit
}
t += VOL_STEP_SIZE;
}
return clamp(sum.rgb, 0.0, 1.0);
}
```
### 2. CSG Scene Construction (Constructive Solid Geometry)
**Difference from the basic version**: Combines multiple SDF primitives using `min` (union), `max` (intersection), and `max(a,-b)` (subtraction), along with rotation/translation transforms to create complex mechanical parts.
**Key modified code**:
```glsl
float sceneSDF(vec3 p) {
p = rotateY(iTime * 0.5) * p; // Rotate entire scene
float sphere = sdSphere(p, 1.2);
float cube = sdBox(p, vec3(0.9));
float cyl = sdCylinder(p, vec2(0.4, 2.0)); // Vertical cylinder
float cylX = sdCylinder(p.yzx, vec2(0.4, 2.0)); // X-axis cylinder (swizzled)
float cylZ = sdCylinder(p.xzy, vec2(0.4, 2.0)); // Z-axis cylinder
// Sphere ∩ Cube - three-axis cylinders = nut shape
return opSubtraction(
opIntersection(sphere, cube),
opUnion(cyl, opUnion(cylX, cylZ))
);
}
```
### 3. Physically-Based Volumetric Scattering
**Difference from the basic version**: Uses physically correct extinction coefficients, scattering coefficients, and transmittance formulas, with volumetric shadows (marching toward the light source to compute transmittance). Based on Frostbite engine's energy-conserving integration formula.
**Key modified code**:
```glsl
void getParticipatingMedia(out float sigmaS, out float sigmaE, vec3 pos) {
float heightFog = 0.3 * clamp((7.0 - pos.y), 0.0, 1.0); // Height fog
sigmaS = 0.02 + heightFog; // Scattering coefficient
sigmaE = max(0.000001, sigmaS); // Extinction coefficient (includes absorption)
}
// Energy-conserving scattering integral (Frostbite improved version)
vec3 S = lightColor * sigmaS * phaseFunction() * volShadow; // Incoming light
vec3 Sint = (S - S * exp(-sigmaE * stepLen)) / sigmaE; // Integrate current step
scatteredLight += transmittance * Sint; // Accumulate
transmittance *= exp(-sigmaE * stepLen); // Update transmittance
```
### 4. Glow Accumulation
**Difference from the basic version**: During the Ray March loop, additionally tracks the closest distance from the ray to the surface (`dMin` below). Even without a hit, this produces a glow effect. Commonly used for glowing spheres and plasma.
**Key modified code**:
```glsl
vec2 rayMarchWithGlow(vec3 ro, vec3 rd) {
float t = 0.0;
float dMin = MAX_DIST; // Track minimum distance
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + t * rd;
float d = map(p);
if (d < dMin) dMin = d; // Update closest distance
if (d < SURF_DIST) break;
t += d;
if (t > MAX_DIST) break;
}
return vec2(t, dMin);
}
// Add glow based on dMin during shading
float glow = 0.02 / max(dMin, 0.001); // Closer = brighter
col += glow * vec3(1.0, 0.8, 0.9);
```
### 5. Refraction and Bidirectional Marching (Interior Marching)
**Difference from the basic version**: After hitting a surface, computes the refraction direction and marches **inside the object in reverse** (negating the SDF) to find the exit point. Can achieve glass, water, and liquid metal effects.
**Key modified code**:
```glsl
// Bidirectional marching: determine SDF sign based on whether the origin is inside or outside
float castRay(vec3 ro, vec3 rd) {
float sgn = (map(ro) < 0.0) ? -1.0 : 1.0; // Negate distance if inside ("sign" would shadow the GLSL built-in)
float t = 0.0;
for (int i = 0; i < 120; i++) {
float h = sgn * map(ro + rd * t);
if (abs(h) < 0.0001 || t > 12.0) break;
t += h;
}
return t;
}
// Refraction: after hitting the outer surface, march inside along the refracted direction
vec3 refDir = refract(rd, nor, IOR); // IOR adjustable: eta ratio n1/n2 of refractive indices, e.g., 0.9
float t2 = 2.0;
for (int i = 0; i < 50; i++) {
float h = map(hitPos + refDir * t2);
t2 -= h; // Reverse marching (from inside outward)
if (abs(h) > 3.0) break;
}
vec3 nor2 = calcNormal(hitPos + refDir * t2); // Exit point normal
```
## Performance Optimization in Detail
### 1. Reducing SDF Call Count
- Use the tetrahedron trick for normal computation (4 calls instead of 6 with central differences)
- Use `min(iFrame,0)` as the loop start value to prevent the compiler from unrolling and inlining map() multiple times
### 2. Bounding Box Acceleration
Perform AABB ray intersection before marching to skip empty regions:
```glsl
vec2 tb = iBox(ro - center, rd, halfSize);
if (tb.x < tb.y && tb.y > 0.0) { /* Only march inside the box */ }
```
### 3. Adaptive Precision
- Scale the hit threshold with distance: `SURF_DIST * (1.0 + t * 0.1)` — distant surfaces don't need high precision
- Clamp step size: `t += clamp(h, 0.01, 0.2)` — prevent individual steps from being too large or too small
### 4. Early Exit
- In volume rendering: `if (sum.a > 0.99) break;` — stop immediately when opaque
- In shadow computation: `if (res < 0.004) break;` — stop when fully occluded
### 5. Reducing map() Complexity
- Use simplified SDFs for distant objects
- First test with a cheap bounding SDF; only compute the expensive precise SDF when `sdBox(p, bound) < currentMin`
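A sketch of the bounding-test pattern (the statue names are hypothetical placeholders for any expensive sub-SDF):

```glsl
// Cheap bound gates the expensive SDF: the box runs every call,
// the detailed model only when the box could beat the current minimum
float map(vec3 p) {
    float d = p.y;                                              // Cheap ground plane
    float bound = sdBox(p - statueCenter, statueHalfSize);      // Cheap bounding box
    if (bound < d) d = min(d, sdStatueDetailed(p));             // Expensive SDF, gated
    return d;
}
```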
### 6. Anti-Aliasing
- Supersampling (AA=2 means 2x2 sampling, 4 rays per pixel), but at 4x performance cost
- In volume rendering, use dithering instead of supersampling to reduce banding artifacts
## Combination Suggestions in Detail
### 1. Ray Marching + FBM Noise
Use fractal noise to perturb SDF surfaces for terrain and rock textures, or build volumetric density fields to render clouds/smoke.
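For example, a hedged sketch of surface perturbation, assuming a 3D `fbm` is available (note the displaced field is no longer a true distance field, so the march step should be scaled down):

```glsl
// FBM-displaced sphere: rocky surface detail
float map(vec3 p) {
    float d = sdSphere(p, 1.0);
    d += 0.15 * fbm(p * 4.0); // Keep the amplitude small relative to the radius
    return d;                 // Use e.g. t += 0.7 * d in the march loop to compensate
}
```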
### 2. Ray Marching + Domain Warping
Apply spatial distortions (twist, bend, repeat) to sample points to create infinitely repeating corridors or twisted surreal geometry.
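A sketch of two standard domain operators applied to the sample point before evaluating the SDF (the spacing `c` and twist rate `k` are tunable; constants below are illustrative):

```glsl
// Infinite repetition: tile space with period c
vec3 opRep(vec3 p, vec3 c) {
    return mod(p + 0.5 * c, c) - 0.5 * c;
}
// Twist: rotate the xz plane by an angle proportional to height
vec3 opTwist(vec3 p, float k) {
    float c = cos(k * p.y), s = sin(k * p.y);
    return vec3(mat2(c, -s, s, c) * p.xz, p.y).xzy;
}
float map(vec3 p) {
    return sdBox(opTwist(opRep(p, vec3(4.0)), 0.8), vec3(0.5));
}
```

Both operators distort distances, so strong warps may need a reduced step size in the march loop.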
### 3. Ray Marching + PBR Materials
SDF provides geometry; combine with Cook-Torrance BRDF, environment map reflections, and Fresnel terms for realistic metal/dielectric materials.
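A minimal sketch of the Fresnel blend at the hit point (`envColor` and the F0 constant are illustrative assumptions; a full Cook-Torrance BRDF adds distribution and geometry terms on top):

```glsl
// Fresnel-Schlick: grazing angles reflect more environment light
vec3 shadeFresnel(vec3 nor, vec3 rd, vec3 baseCol, vec3 envColor) {
    vec3 F0 = vec3(0.04);                                  // Dielectric base reflectance
    float cosTheta = clamp(dot(nor, -rd), 0.0, 1.0);
    vec3 F = F0 + (1.0 - F0) * pow(1.0 - cosTheta, 5.0);   // Schlick approximation
    return mix(baseCol, envColor, F);
}
```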
### 4. Ray Marching + Post-Processing
Multi-pass architecture: the first Buffer performs Ray Marching and outputs color + depth (stored in the alpha channel); the second pass applies depth of field (DOF), motion blur, and tone mapping.
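A hedged sketch of the two-pass layout (the channel binding, focus depth, and blur scale are illustrative assumptions):

```glsl
// Pass 1 (Buffer A): march the scene, pack normalized depth into alpha
//   fragColor = vec4(col, t / MAX_DIST);
// Pass 2 (Image): read Buffer A via iChannel0, blur by circle of confusion
vec3 dof(vec2 uv) {
    vec4 data = texture(iChannel0, uv);
    float coc = abs(data.a - 0.3) * 0.02; // 0.3 = focus depth, 0.02 = blur scale (tunable)
    vec3 sum = vec3(0.0);
    for (int i = 0; i < 8; i++) {
        float a = 6.2832 * float(i) / 8.0; // 8 taps on a circle
        sum += texture(iChannel0, uv + coc * vec2(cos(a), sin(a))).rgb;
    }
    return mix(data.rgb, sum / 8.0, clamp(coc * 400.0, 0.0, 1.0));
}
```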
### 5. Ray Marching + Procedural Animation
Drive SDF primitive positions/sizes/blend coefficients with time parameters, combined with easing functions (smoothstep, parabolic) to create character animations without a skeletal system.
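A sketch of time-driven animation with easing and a parabolic jump arc (all constants are illustrative):

```glsl
// Easing-driven SDF animation: a sphere that jumps and morphs into a box
float map(vec3 p, float time) {
    float cycle = fract(time * 0.5);              // Looping phase in [0, 1)
    float jump = 4.0 * cycle * (1.0 - cycle);     // Parabola: 0 -> 1 -> 0
    float blend = smoothstep(0.0, 0.3, cycle);    // Eased morph coefficient
    vec3 q = p - vec3(0.0, jump, 0.0);            // Animate height
    float a = sdSphere(q, 0.5);
    float b = sdBox(q, vec3(0.4));
    return mix(a, b, blend); // Note: a mix of two SDFs is only approximately a distance
}
```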
View File
@@ -0,0 +1,724 @@
# 2D SDF Detailed Reference
This file contains the complete step-by-step tutorial, mathematical derivations, detailed explanations, and advanced usage for [SKILL.md](SKILL.md).
## Prerequisites
- **GLSL Basics**: uniforms, varyings, built-in functions (length, dot, clamp, mix, smoothstep, step, sign, abs, max, min)
- **Vector Math**: 2D vector operations, geometric meaning of dot and cross products
- **Coordinate Systems**: conversion from screen coordinates to normalized device coordinates (NDC), aspect ratio correction
- **Signed Distance Field Concept**: the function returns the signed distance to the shape boundary — negative inside, zero on the boundary, positive outside
## Core Principles in Detail
The core idea of 2D SDF: **for each pixel on screen, compute its shortest signed distance `d` to the target shape boundary**.
- `d < 0`: pixel is inside the shape
- `d = 0`: pixel is exactly on the boundary
- `d > 0`: pixel is outside the shape
Once you have the distance value `d`, use functions like `smoothstep` and `clamp` to map it to color/opacity, enabling:
- **Fill**: color when `d < 0`
- **Anti-aliased edges**: `smoothstep(-aa, aa, d)` for sub-pixel smoothing at the boundary
- **Stroke**: apply smoothstep again on `abs(d) - strokeWidth`
- **Boolean operations**: `min(d1, d2)` = union, `max(d1, d2)` = intersection, `max(-d1, d2)` = subtraction
Key mathematical formulas:
```
Circle: d = length(p - center) - radius
Rectangle: d = length(max(abs(p) - halfSize, 0.0)) + min(max(abs(p).x - halfSize.x, abs(p).y - halfSize.y), 0.0)
Line segment: d = length(p - a - clamp(dot(p-a, b-a)/dot(b-a, b-a), 0, 1) * (b-a)) - width/2
Union: d = min(d1, d2)
Intersection: d = max(d1, d2)
Subtraction: d = max(-d1, d2)
Smooth union: d = mix(d2, d1, h) - k*h*(1-h), h = clamp(0.5 + 0.5*(d2-d1)/k, 0, 1)
```
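The fill, anti-aliasing, and stroke mappings above can be sketched in one render function (using the `sdCircle` defined in Step 2; the stroke width and colors are illustrative):

```glsl
// Distance-to-color mapping: fill, AA edge, and outline from one SDF value
vec3 render(vec2 p) {
    float d = sdCircle(p, 0.5);
    float aa = fwidth(d);                   // One pixel's width in distance units
    vec3 col = mix(vec3(0.2, 0.5, 0.9),     // Fill color inside (d < 0)
                   vec3(1.0),               // Background outside
                   smoothstep(-aa, aa, d));
    float stroke = abs(d) - 0.01;           // 0.02-wide outline centered on d = 0
    col = mix(vec3(0.0), col, smoothstep(-aa, aa, stroke));
    return col;
}
```

`fwidth(d)` keeps the edge exactly one pixel soft at any zoom level, which is why it is preferred over a hard-coded epsilon.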
## Implementation Steps in Detail
### Step 1: Coordinate Normalization and Aspect Ratio Correction
**What**: Convert screen pixel coordinates to normalized coordinates centered at the screen center, with the y range of [-1, 1].
**Why**: Pixel coordinates depend on resolution. After normalization, SDF parameters (such as radius) have resolution-independent physical meaning. Dividing by `iResolution.y` (not `.x`) ensures correct aspect ratio so circles don't become ellipses.
**Code**:
```glsl
// Method 1: Origin at center, y range [-1, 1] (most common, standard practice)
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Method 2: If you need to work in pixel space (suitable for fixed pixel-size UI)
vec2 p = fragCoord.xy;
vec2 center = iResolution.xy * 0.5;
// Method 3: [0, 1] range normalization (requires manual aspect ratio handling)
vec2 uv = fragCoord.xy / iResolution.xy;
```
### Step 2: Defining SDF Primitive Functions
**What**: Write basic primitive functions that return signed distances. Each function takes the current point `p` and shape parameters, and returns a `float` distance value.
**Why**: These are the atomic building blocks for all 2D SDF graphics. Encapsulating them as independent functions allows free combination, transformation, and reuse.
**Code**:
```glsl
// ---- Circle ----
// The most basic SDF: distance from point to center minus radius
float sdCircle(vec2 p, float radius) {
return length(p) - radius;
}
// ---- Rectangle (optional rounded corners) ----
// halfSize is half-width and half-height, radius is the corner radius
float sdBox(vec2 p, vec2 halfSize, float radius) {
halfSize -= vec2(radius);
vec2 d = abs(p) - halfSize;
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0)) - radius;
}
// ---- Line Segment ----
// Line segment from start to end, with width
float sdLine(vec2 p, vec2 start, vec2 end, float width) {
vec2 dir = end - start;
float h = clamp(dot(p - start, dir) / dot(dir, dir), 0.0, 1.0);
return length(p - start - dir * h) - width * 0.5;
}
// ---- Triangle (exact signed distance) ----
// Three vertices p0, p1, p2, only one sqrt needed
float sdTriangle(vec2 p, vec2 p0, vec2 p1, vec2 p2) {
vec2 e0 = p1 - p0, v0 = p - p0;
vec2 e1 = p2 - p1, v1 = p - p1;
vec2 e2 = p0 - p2, v2 = p - p2;
// Squared distance to each edge (projection + clamp)
float d0 = dot(v0 - e0 * clamp(dot(v0, e0) / dot(e0, e0), 0.0, 1.0),
v0 - e0 * clamp(dot(v0, e0) / dot(e0, e0), 0.0, 1.0));
float d1 = dot(v1 - e1 * clamp(dot(v1, e1) / dot(e1, e1), 0.0, 1.0),
v1 - e1 * clamp(dot(v1, e1) / dot(e1, e1), 0.0, 1.0));
float d2 = dot(v2 - e2 * clamp(dot(v2, e2) / dot(e2, e2), 0.0, 1.0),
v2 - e2 * clamp(dot(v2, e2) / dot(e2, e2), 0.0, 1.0));
// Determine inside/outside using cross product sign
float o = e0.x * e2.y - e0.y * e2.x;
vec2 d = min(min(vec2(d0, o * (v0.x * e0.y - v0.y * e0.x)),
vec2(d1, o * (v1.x * e1.y - v1.y * e1.x))),
vec2(d2, o * (v2.x * e2.y - v2.y * e2.x)));
return -sqrt(d.x) * sign(d.y);
}
// ---- Ellipse (approximate) ----
// Cheap implicit-value approximation: the sign is correct (negative inside),
// but the magnitude is NOT a Euclidean distance, so AA and stroke widths based
// on it vary around the ellipse. Use an iterative exact SDF if that matters.
float sdEllipse(vec2 p, vec2 center, float a, float b) {
float a2 = a * a, b2 = b * b;
vec2 d = p - center;
return (b2 * d.x * d.x + a2 * d.y * d.y - a2 * b2) / (a2 * b2);
}
```
### Step 3: CSG Boolean Operations
**What**: Combine two SDF distance values using min/max operations to achieve union, subtraction, and intersection of shapes.
**Why**: This is the most powerful capability of SDFs — building arbitrarily complex shapes from simple primitives. `min` takes the smaller of the two field values to produce a union (since smaller distance means "closer" to the shape interior); `max` takes the larger value for intersection; `max(a, -b)` inverts b's inside/outside and intersects for subtraction.
**Code**:
```glsl
// Union: take the nearest shape
float opUnion(float d1, float d2) {
return min(d1, d2);
}
// Intersection: overlapping region of both shapes
float opIntersect(float d1, float d2) {
return max(d1, d2);
}
// Subtraction: carve d1 out of d2
float opSubtract(float d1, float d2) {
return max(-d1, d2);
}
// Smooth union: produces a rounded transition at the junction, k controls transition width
float opSmoothUnion(float d1, float d2, float k) {
float h = clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) - k * h * (1.0 - h);
}
// XOR: non-overlapping region of both shapes
float opXor(float d1, float d2) {
return min(max(-d1, d2), max(-d2, d1));
}
```
### Step 4: Coordinate Transforms
**What**: Transform coordinates before computing the SDF so that shapes appear at desired positions and angles.
**Why**: SDF functions define shapes centered at the origin by default. By transforming the input coordinates (rather than the shape itself), you can freely place and rotate multiple primitives in the scene without affecting the mathematical properties of the distance field.
**Code**:
```glsl
// Translation: move the coordinate origin to position t
vec2 translate(vec2 p, vec2 t) {
return p - t;
}
// Counter-clockwise rotation
vec2 rotateCCW(vec2 p, float angle) {
mat2 m = mat2(cos(angle), sin(angle), -sin(angle), cos(angle));
return p * m;
}
// Usage example: translate then rotate
float d = sdBox(rotateCCW(translate(p, vec2(0.5, 0.3)), iTime), vec2(0.2), 0.05);
```
### Step 5: Distance Field Visualization and Rendering
**What**: Convert the SDF distance value to final color output. Includes fill, anti-aliasing, stroke, contour lines, and other visualization methods.
**Why**: The distance value itself is just a scalar that needs a mapping strategy to become a visual effect. `smoothstep` creates sub-pixel smooth transitions at the boundary, avoiding aliasing from hard edges. The `fwidth` function uses screen-space derivatives to automatically calculate pixel width, achieving resolution-independent anti-aliasing.
**Code**:
```glsl
// ---- Method 1: clamp for simple alpha (most basic) ----
float t = clamp(d, 0.0, 1.0);
vec4 shapeColor = vec4(color, 1.0 - t);
// ---- Method 2: smoothstep anti-aliasing (recommended general approach) ----
// aa controls edge softness, typical value is pixel size px = 2.0/iResolution.y
float px = 2.0 / iResolution.y; // Adjustable: anti-aliasing width
float mask = 1.0 - smoothstep(-px, px, d); // 1.0 inside, 0.0 outside (edge0 < edge1 keeps smoothstep defined per the GLSL spec)
vec3 col = mix(backgroundColor, shapeColor, mask);
// ---- Method 3: fwidth adaptive anti-aliasing (suitable for zooming scenes) ----
float anti = fwidth(d) * 1.0; // Adjustable: multiplier, larger = softer edges
float mask = 1.0 - smoothstep(-anti, anti, d);
// ---- Method 4: Classic distance field debug visualization ----
vec3 col = (d > 0.0) ? vec3(0.9, 0.6, 0.3) // Outside: orange
: vec3(0.65, 0.85, 1.0); // Inside: blue
col *= 1.0 - exp(-12.0 * abs(d)); // Distance falloff
col *= 0.8 + 0.2 * cos(120.0 * d); // Contour lines, 120.0 adjustable: line density
col = mix(col, vec3(1.0), smoothstep(1.5*px, 0.0, abs(d) - 0.002)); // Zero contour highlight
```
### Step 6: Stroke and Border Rendering
**What**: Use the absolute value of the distance field to extract the shape's outline, or render inner/outer borders separately.
**Why**: Strokes are a natural byproduct of SDFs — `abs(d)` gives unsigned distance, and subtracting the stroke width yields the "stroke shape" SDF. Unlike rasterized strokes that require geometry expansion, SDF strokes need only one line of math.
**Code**:
```glsl
// ---- Fill mask ----
float fillMask(float d) {
return clamp(-d, 0.0, 1.0);
}
// ---- Stroke rendering (fwidth adaptive) ----
// stroke is the stroke width (in distance field units)
vec4 renderShape(float d, vec3 color, float stroke) {
float anti = fwidth(d) * 1.0;
vec4 strokeLayer = vec4(vec3(0.05), 1.0 - smoothstep(-anti, anti, d - stroke));
vec4 colorLayer = vec4(color, 1.0 - smoothstep(-anti, anti, d));
if (stroke < 0.0001) return colorLayer;
return vec4(mix(strokeLayer.rgb, colorLayer.rgb, colorLayer.a), strokeLayer.a);
}
// ---- Inner border mask ----
float innerBorderMask(float d, float width) {
return clamp(d + width, 0.0, 1.0) - clamp(d, 0.0, 1.0);
}
// ---- Outer border mask ----
float outerBorderMask(float d, float width) {
return clamp(d, 0.0, 1.0) - clamp(d - width, 0.0, 1.0);
}
```
### Step 7: Multi-Layer Compositing
**What**: Render multiple SDF shapes as layers with alpha channels, then blend them back-to-front using `mix`.
**Why**: Complex 2D scenes typically contain backgrounds, multiple shapes, strokes, and other visual layers. Rendering each SDF as an independent RGBA layer and compositing them layer by layer with standard alpha blending (`mix(bottom, top, top.a)`) is both intuitive and gives precise control over stacking order.
**Code**:
```glsl
// Background layer
vec3 bgColor = vec3(1.0, 0.8, 0.7 - 0.07 * p.y) * (1.0 - 0.25 * length(p));
// Shape layer 1
float d1 = sdCircle(translate(p, pos1), 0.3);
vec4 layer1 = renderShape(d1, vec3(0.9, 0.3, 0.2), 0.02);
// Shape layer 2
float d2 = sdBox(translate(p, pos2), vec2(0.2), 0.05);
vec4 layer2 = renderShape(d2, vec3(0.2, 0.5, 0.8), 0.0);
// Composite back-to-front
vec3 col = bgColor;
col = mix(col, layer1.rgb, layer1.a); // Overlay shape 1
col = mix(col, layer2.rgb, layer2.a); // Overlay shape 2
fragColor = vec4(col, 1.0);
```
## Variant Detailed Descriptions
### Variant 1: Solid Fill + Stroke Mode
**Difference from the basic version**: Instead of showing distance field debug colors, renders solid shapes with clean strokes, suitable for UI and icons.
**Key modified code**:
```glsl
// Replace the distance field visualization section
vec3 shapeColor = vec3(0.32, 0.56, 0.53);
float strokeW = 0.015; // Adjustable: stroke width
vec4 shape = renderShape(d, shapeColor, strokeW); // renderShape from Step 6
vec3 col = bgCol;
col = mix(col, shape.rgb, shape.a);
```
### Variant 2: Multi-Layer CSG Illustration
**Difference from the basic version**: Combines multiple SDF primitives through boolean operations into complex patterns (e.g., an umbrella, a logo), with each layer independently colored and composited layer by layer. Suitable for 2D illustrations and icon construction.
**Key modified code**:
```glsl
// Build the body (ellipse intersection)
float a = sdEllipse(p, vec2(0.0, 0.16), 0.25, 0.25);
float b = sdEllipse(p, vec2(0.0, -0.03), 0.8, 0.35);
float body = opIntersect(a, b);
vec4 layer1 = renderShape(body, vec3(0.32, 0.56, 0.53), fwidth(body) * 2.0); // renderShape from Step 6
// Build the handle (line segment + arc subtraction)
float handle = sdLine(p, vec2(0.0, 0.05), vec2(0.0, -0.42), 0.01);
float arc = sdCircle(translate(p, vec2(-0.04, -0.42)), 0.04);
float arcInner = sdCircle(translate(p, vec2(-0.04, -0.42)), 0.03);
handle = opUnion(handle, opSubtract(arcInner, arc));
vec4 layer0 = renderShape(handle, vec3(0.4, 0.3, 0.28), STROKE_WIDTH); // STROKE_WIDTH: user-defined constant, e.g. 0.015
// Composite
vec3 col = bgCol;
col = mix(col, layer0.rgb, layer0.a);
col = mix(col, layer1.rgb, layer1.a);
```
### Variant 3: Hexagonal Grid Tiling
**Difference from the basic version**: Uses non-orthogonal coordinate system domain repetition to tile SDFs across the screen, with each cell having an independent ID for differentiated coloring. Suitable for background textures and geometric patterns.
**Key modified code**:
```glsl
// Hexagonal grid function: returns (cellID.xy, edge distance, center distance)
vec4 hexagon(vec2 p) {
vec2 q = vec2(p.x * 2.0 * 0.5773503, p.y + p.x * 0.5773503);
vec2 pi = floor(q);
vec2 pf = fract(q);
float v = mod(pi.x + pi.y, 3.0);
float ca = step(1.0, v);
float cb = step(2.0, v);
vec2 ma = step(pf.xy, pf.yx);
float e = dot(ma, 1.0 - pf.yx + ca*(pf.x+pf.y-1.0) + cb*(pf.yx-2.0*pf.xy));
p = vec2(q.x + floor(0.5 + p.y / 1.5), 4.0 * p.y / 3.0) * 0.5 + 0.5;
float f = length((fract(p) - 0.5) * vec2(1.0, 0.85));
return vec4(pi + ca - cb * ma, e, f);
}
// Usage
#define HEX_SCALE 8.0 // Adjustable: grid density
vec4 h = hexagon(HEX_SCALE * p + 0.5 * iTime);
vec3 col = vec3(0.15 + 0.15 * hash1(h.xy + 1.2)); // hash1: any vec2->float hash in [0,1]; different gray per cell
col *= smoothstep(0.10, 0.11, h.z); // Edge lines
col *= smoothstep(0.10, 0.11, h.w); // Center falloff
```
### Variant 4: Organic Shapes (Polar Coordinate SDF)
**Difference from the basic version**: Uses polar coordinates `(atan, length)` to define shape boundary functions, enabling creation of hearts, petals, stars, and other non-polygonal organic shapes. Supports pulsing animations.
**Key modified code**:
```glsl
// Heart SDF (polar coordinate algebraic curve)
p.y -= 0.25;
float a = atan(p.x, p.y) / 3.141593;
float r = length(p);
float h = abs(a);
float d = (13.0*h - 22.0*h*h + 10.0*h*h*h) / (6.0 - 5.0*h);
// Pulse animation
float tt = mod(iTime, 1.5) / 1.5;
float ss = pow(tt, 0.2) * 0.5 + 0.5;
ss = 1.0 + ss * 0.5 * sin(tt * 6.2831 * 3.0) * exp(-tt * 4.0); // Adjustable: sin frequency controls pulse count
// Rendering: divide r by the pulse factor so the beat actually scales the heart
vec3 col = mix(bgCol, heartCol, smoothstep(-0.01, 0.01, d - r / ss));
```
### Variant 5: Bezier Curve SDF
**Difference from the basic version**: Computes the exact signed distance from a point to a quadratic Bezier curve by solving a cubic equation (Cardano's formula). Suitable for curved text, path rendering, and similar scenarios.
**Key modified code**:
```glsl
// Cubic equation solver (Cardano's formula)
vec3 solveCubic(float a, float b, float c) {
float p = b - a*a/3.0, p3 = p*p*p;
float q = a*(2.0*a*a - 9.0*b)/27.0 + c;
float d = q*q + 4.0*p3/27.0;
float offset = -a/3.0;
if (d >= 0.0) {
float z = sqrt(d);
vec2 x = (vec2(z,-z) - q) / 2.0;
vec2 uv = sign(x) * pow(abs(x), vec2(1.0/3.0));
return vec3(offset + uv.x + uv.y);
}
float v = acos(-sqrt(-27.0/p3)*q/2.0) / 3.0;
float m = cos(v), n = sin(v) * 1.732050808;
return vec3(m+m, -n-m, n-m) * sqrt(-p/3.0) + offset;
}
// Bezier SDF (three control points A, B, C)
float sdBezier(vec2 A, vec2 B, vec2 C, vec2 p) {
B = mix(B + vec2(1e-4), B, step(1e-6, abs(B*2.0-A-C)));
vec2 a = B-A, b = A-B*2.0+C, c = a*2.0, d = A-p;
vec3 k = vec3(3.*dot(a,b), 2.*dot(a,a)+dot(d,b), dot(d,a)) / dot(b,b);
vec3 t = clamp(solveCubic(k.x, k.y, k.z), 0.0, 1.0);
vec2 pos = A+(c+b*t.x)*t.x; float dis = length(pos-p);
pos = A+(c+b*t.y)*t.y; dis = min(dis, length(pos-p));
pos = A+(c+b*t.z)*t.z; dis = min(dis, length(pos-p));
return dis * signBezier(A, B, C, p); // signBezier uses barycentric coordinates to determine sign
}
```
## Performance Optimization in Detail
### 1. Reducing sqrt Calls
In polygon SDFs (such as triangles), by comparing squared distance values first and only taking `sqrt` on the minimum distance at the end, multiple `sqrt` calls are reduced to one. This is the core optimization idea behind the triangle SDF implementation.
```glsl
// Bad: sqrt on every edge
float d0 = length(v0 - e0 * h0);
float d1 = length(v1 - e1 * h1);
// Good: compare dot(v,v) squares, one sqrt at the end
float d0 = dot(proj0, proj0);
float d1 = dot(proj1, proj1);
return -sqrt(min(d0, d1)) * sign(...);
```
### 2. fwidth vs Fixed Pixel Width
`fwidth(d)` invokes screen-space partial derivatives. In simple scenes, a fixed `px = 2.0/iResolution.y` can replace it to reduce GPU derivative computation overhead. However, in scenes with coordinate scaling/distortion (such as the hexagonal grid's `pos *= 1.2 + 0.15*length(pos)`), `fwidth` must be used to ensure correct anti-aliasing width.
### 3. Avoiding Excessive Boolean Operation Nesting
Large amounts of `min`/`max` nesting are correct but computing distances for all primitives per pixel per frame can be expensive. You can skip distant primitives by checking rough bounding boxes:
```glsl
// Only compute precisely when near the shape
if (length(p - shapeCenter) < shapeRadius + margin) {
d = opUnion(d, sdComplexShape(p));
}
```
### 4. Supersampling AA Trade-off
Multiple samples (e.g., 2x2 supersampling) yield higher quality anti-aliasing but multiply the fragment shader computation by 4:
```glsl
#define AA 2 // Adjustable: 1 = no supersampling, 2 = 4x, 3 = 9x
for (int m = 0; m < AA; m++)
for (int n = 0; n < AA; n++) {
vec2 off = vec2(m, n) / float(AA);
// ... computation ...
tot += col;
}
tot /= float(AA * AA);
```
For most real-time scenes, single-pixel AA with `smoothstep` or `fwidth` is sufficient. Supersampling is mainly for offline rendering or showcase scenes.
### 5. Step Size Optimization for 2D Soft Shadows
In cone marching 2D soft shadows, use `max(1.0, abs(sd))` instead of a fixed step size to take large leaps in open areas and small precise steps near shapes. Typically 64 steps can cover a large scene:
```glsl
dt += max(1.0, abs(sd)); // Adaptive step size
if (dt > dl) break; // Early exit after reaching the light source
```
## Combination Suggestions in Detail
### 1. SDF + Noise Textures
Adding noise values to the distance field creates dissolve, erosion, and organic edge effects:
```glsl
float d = sdCircle(p, 0.4);
d += noise(p * 10.0 + iTime) * 0.05; // Organic jittery edges
```
### 2. SDF + 2D Lighting and Shadows
Cone marching based on the distance field implements real-time soft shadows and multi-light lighting for 2D scenes. The distance field provides "scene query" capability, using `sceneDist()` during ray marching to check occlusion:
```glsl
// 2D soft shadow (see 4dfXDn for full implementation)
float shadow(vec2 p, vec2 lightPos, float radius) {
vec2 dir = normalize(lightPos - p);
float dl = length(p - lightPos);
float lf = radius * dl;
float dt = 0.01;
for (int i = 0; i < 64; i++) {
float sd = sceneDist(p + dir * dt);
if (sd < -radius) return 0.0;
lf = min(lf, sd / dt);
dt += max(1.0, abs(sd));
if (dt > dl) break;
}
lf = clamp((lf*dl + radius) / (2.0*radius), 0.0, 1.0);
return smoothstep(0.0, 1.0, lf);
}
```
### 3. SDF + Normal Mapping / Bump Mapping
By computing normals via finite differences on the distance field, then applying standard lighting models, you can simulate 3D bump/highlight effects on 2D SDFs (as done in the DVD Bounce shader):
```glsl
vec2 e = vec2(0.8, 0.0) / iResolution.y;
float fx = sceneDist(p) - sceneDist(p + e);
float fy = sceneDist(p) - sceneDist(p + e.yx);
vec3 nor = normalize(vec3(fx, fy, e.x / 0.1)); // 0.1 = bump factor, adjustable
// Standard Blinn-Phong lighting
vec3 lig = normalize(vec3(1.0, 2.0, 2.0));
float dif = clamp(dot(lig, nor), 0.0, 1.0);
```
### 4. SDF + Domain Repetition (Spatial Tiling)
Use `fract` or `mod` on coordinates for infinite repetition; use `floor` to get cell IDs for differentiated coloring. Suitable for background patterns, particle arrays, etc.:
```glsl
vec2 cellSize = vec2(0.5);
vec2 cellID = floor(p / cellSize);
vec2 cellP = fract(p / cellSize) - 0.5; // Local coordinate within cell
float d = sdCircle(cellP, 0.15 + 0.05 * sin(iTime + cellID.x * 3.0));
```
### 5. SDF + Animation
Distance field parameters (position, radius, rotation angle) naturally support continuous animation. Combine with `sin/cos` periodic motion, `exp` decay, `mod` looping, and other time functions:
```glsl
// Bouncing
float y = abs(sin(iTime * 3.0)) * 0.5;
float d = sdCircle(translate(p, vec2(0.0, y)), 0.2);
// Pulse scaling
float pulse = 1.0 + 0.1 * sin(iTime * 6.28 * 2.0) * exp(-mod(iTime, 1.0) * 4.0);
float d = sdCircle(p / pulse, 0.3) * pulse;
// Rotation
float d = sdBox(rotateCCW(p, iTime), vec2(0.2), 0.03);
```
## Extended 2D SDF Primitives Reference
### sdRoundedBox — Rounded Box with Independent Corner Radii
**Signature**: `float sdRoundedBox(vec2 p, vec2 b, vec4 r)`
- `p`: query point
- `b`: half-size of the box
- `r`: corner radii as `vec4(top-right, bottom-right, top-left, bottom-left)`
Selects the appropriate corner radius based on the quadrant of `p`, then computes a standard rounded box distance. Useful for UI elements where each corner needs a different rounding.
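A sketch of the widely used formulation (after Inigo Quilez's 2D distance functions; verify against your own conventions):

```glsl
float sdRoundedBox(vec2 p, vec2 b, vec4 r) {
    r.xy = (p.x > 0.0) ? r.xy : r.zw;  // pick right vs left column of radii
    r.x  = (p.y > 0.0) ? r.x  : r.y;   // then top vs bottom corner
    vec2 q = abs(p) - b + r.x;
    return min(max(q.x, q.y), 0.0) + length(max(q, 0.0)) - r.x;
}
```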
### sdOrientedBox — Oriented Box
**Signature**: `float sdOrientedBox(vec2 p, vec2 a, vec2 b, float th)`
- `p`: query point
- `a`, `b`: endpoints defining the box's center axis
- `th`: thickness (full width perpendicular to the axis)
Constructs a local coordinate frame aligned with segment `a`-to-`b`, then evaluates a standard box SDF. Useful for drawing thick line-like rectangles at arbitrary angles without manual rotation.
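A sketch of the commonly used body (following the standard Quilez formulation; included for reference):

```glsl
float sdOrientedBox(vec2 p, vec2 a, vec2 b, float th) {
    float l = length(b - a);
    vec2  d = (b - a) / l;               // axis direction
    vec2  q = p - (a + b) * 0.5;         // recenter on the segment midpoint
    q = mat2(d.x, -d.y, d.y, d.x) * q;   // rotate into the box's local frame
    q = abs(q) - vec2(l, th) * 0.5;      // standard box from here on
    return length(max(q, 0.0)) + min(max(q.x, q.y), 0.0);
}
```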
### sdArc — Arc
**Signature**: `float sdArc(vec2 p, vec2 sc, float ra, float rb)`
- `p`: query point
- `sc`: `vec2(sin, cos)` of the half-aperture angle
- `ra`: arc radius
- `rb`: arc thickness
Computes distance to an arc segment. The aperture is symmetric about the y-axis. Combines angular clamping with radial distance.
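The standard formulation for this signature, shown as a reference sketch:

```glsl
// sc = vec2(sin, cos) of the half-aperture angle
float sdArc(vec2 p, vec2 sc, float ra, float rb) {
    p.x = abs(p.x);                          // symmetric about the y-axis
    return ((sc.y * p.x > sc.x * p.y)
            ? length(p - sc * ra)            // past the aperture: distance to endpoint
            : abs(length(p) - ra))           // inside the aperture: radial distance
           - rb;                             // inflate by the arc thickness
}
```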
### sdPie — Pie / Sector
**Signature**: `float sdPie(vec2 p, vec2 c, float r)`
- `p`: query point
- `c`: `vec2(sin, cos)` of the half-aperture angle
- `r`: radius
Returns the signed distance to a filled pie-slice (sector) shape. The sector is symmetric about the y-axis.
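A reference sketch of the usual implementation:

```glsl
float sdPie(vec2 p, vec2 c, float r) {
    p.x = abs(p.x);
    float l = length(p) - r;                            // distance to the outer radius
    float m = length(p - c * clamp(dot(p, c), 0.0, r)); // distance to the straight edge
    return max(l, m * sign(c.y * p.x - c.x * p.y));
}
```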
### sdRing — Ring
**Signature**: `float sdRing(vec2 p, vec2 n, float r, float th)`
- `p`: query point
- `n`: `vec2(sin, cos)` of the half-aperture angle
- `r`: ring radius
- `th`: ring thickness
Similar to `sdArc` but with capped endpoints and full ring behavior within the aperture.
### sdMoon — Moon Shape
**Signature**: `float sdMoon(vec2 p, float d, float ra, float rb)`
- `p`: query point
- `d`: distance between circle centers
- `ra`: radius of outer circle
- `rb`: radius of inner (subtracted) circle
Creates a crescent/moon shape by subtracting one circle from another. The two circles are offset by distance `d` along the x-axis.
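A sketch of the commonly used exact formulation (treat as a reference to verify, not a drop-in guarantee):

```glsl
float sdMoon(vec2 p, float d, float ra, float rb) {
    p.y = abs(p.y);
    float a = (ra * ra - rb * rb + d * d) / (2.0 * d); // x of the circles' intersection
    float b = sqrt(max(ra * ra - a * a, 0.0));         // y of the intersection
    if (d * (p.x * b - p.y * a) > d * d * max(b - p.y, 0.0))
        return length(p - vec2(a, b));                 // nearest feature is the cusp
    return max(length(p) - ra,                         // outer circle...
               -(length(p - vec2(d, 0.0)) - rb));      // ...minus the inner one
}
```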
### sdHeart — Heart (Approximate)
**Signature**: `float sdHeart(vec2 p)`
- `p`: query point (centered at origin, roughly unit scale)
An approximate heart SDF composed of two geometric regions stitched together. The shape extends roughly from (0,0) to (0,1) vertically.
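A sketch of the well-known two-region construction (needs a small `dot2` helper; included here for reference):

```glsl
float dot2(vec2 v) { return dot(v, v); } // squared length helper

float sdHeart(vec2 p) {
    p.x = abs(p.x);
    if (p.y + p.x > 1.0)   // upper lobe: circle of radius sqrt(2)/4 at (0.25, 0.75)
        return sqrt(dot2(p - vec2(0.25, 0.75))) - sqrt(2.0) / 4.0;
    return sqrt(min(dot2(p - vec2(0.0, 1.0)),                // lower region: nearest of
                    dot2(p - 0.5 * max(p.x + p.y, 0.0))))    // two candidate features
           * sign(p.x - p.y);
}
```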
### sdVesica — Vesica / Lens Shape
**Signature**: `float sdVesica(vec2 p, float w, float h)`
- `p`: query point
- `w`: width of the vesica
- `h`: height of the vesica
A lens-shaped figure (vesica piscis) formed by the intersection of two circles. Symmetric about both axes.
### sdEgg — Egg Shape
**Signature**: `float sdEgg(vec2 p, float he, float ra, float rb)`
- `p`: query point
- `he`: half-height of the straight section
- `ra`: radius at bottom
- `rb`: radius at top
Produces an egg-like shape with different radii at top and bottom, connected by a straight vertical section.
### sdEquilateralTriangle — Equilateral Triangle
**Signature**: `float sdEquilateralTriangle(vec2 p, float r)`
- `p`: query point
- `r`: half the side length (the base edge spans x in [-r, r])
An exact SDF for an equilateral triangle centered at the origin using symmetry folding.
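The standard symmetry-folding body, as a reference sketch:

```glsl
float sdEquilateralTriangle(vec2 p, float r) {
    const float k = sqrt(3.0);
    p.x = abs(p.x) - r;          // fold across the y-axis
    p.y = p.y + r / k;
    if (p.x + k * p.y > 0.0)     // fold across the slanted edge
        p = vec2(p.x - k * p.y, -k * p.x - p.y) / 2.0;
    p.x -= clamp(p.x, -2.0 * r, 0.0);
    return -length(p) * sign(p.y);
}
```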
### sdPentagon — Pentagon
**Signature**: `float sdPentagon(vec2 p, float r)`
- `p`: query point
- `r`: inscribed radius (apothem, i.e. center-to-edge distance)
Regular pentagon SDF using mirror-fold operations along pentagon edge normals. The constants encode the cosine, sine, and tangent of 36 degrees (half of the pentagon's 72-degree rotational step).
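The folding body in full, as a reference sketch:

```glsl
float sdPentagon(vec2 p, float r) {
    const vec3 k = vec3(0.809016994, 0.587785252, 0.726542528); // cos36, sin36, tan36
    p.x = abs(p.x);
    p -= 2.0 * min(dot(vec2(-k.x, k.y), p), 0.0) * vec2(-k.x, k.y); // fold 1
    p -= 2.0 * min(dot(vec2( k.x, k.y), p), 0.0) * vec2( k.x, k.y); // fold 2
    p -= vec2(clamp(p.x, -r * k.z, r * k.z), r);                    // top edge at y = r
    return length(p) * sign(p.y);
}
```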
### sdHexagon — Hexagon
**Signature**: `float sdHexagon(vec2 p, float r)`
- `p`: query point
- `r`: inscribed radius (apothem, i.e. center-to-edge distance)
Regular hexagon SDF. Constants encode cos(30), sin(30), and tan(30). Uses a single mirror fold.
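The single-fold body, as a reference sketch:

```glsl
float sdHexagon(vec2 p, float r) {
    const vec3 k = vec3(-0.866025404, 0.5, 0.577350269); // -cos30, sin30, tan30
    p = abs(p);
    p -= 2.0 * min(dot(k.xy, p), 0.0) * k.xy;    // single mirror fold
    p -= vec2(clamp(p.x, -k.z * r, k.z * r), r); // edge at y = r
    return length(p) * sign(p.y);
}
```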
### sdOctagon — Octagon
**Signature**: `float sdOctagon(vec2 p, float r)`
- `p`: query point
- `r`: inscribed radius (apothem, i.e. center-to-edge distance)
Regular octagon SDF. Uses two mirror folds at 22.5-degree and 67.5-degree angles.
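The two-fold body, as a reference sketch:

```glsl
float sdOctagon(vec2 p, float r) {
    const vec3 k = vec3(-0.9238795325, 0.3826834323, 0.4142135623); // -cos22.5, sin22.5, tan22.5
    p = abs(p);
    p -= 2.0 * min(dot(vec2( k.x, k.y), p), 0.0) * vec2( k.x, k.y);
    p -= 2.0 * min(dot(vec2(-k.x, k.y), p), 0.0) * vec2(-k.x, k.y);
    p -= vec2(clamp(p.x, -k.z * r, k.z * r), r);
    return length(p) * sign(p.y);
}
```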
### sdStar — N-Pointed Star
**Signature**: `float sdStar(vec2 p, float r, int n, float m)`
- `p`: query point
- `r`: outer radius
- `n`: number of points
- `m`: angle-based pointiness control; valid range roughly 2.0 (convex polygon) to `float(n)` (maximally sharp)
A general n-pointed star using angular repetition (`mod(atan(...))`) and edge projection. Higher `m` values produce sharper, thinner points.
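A sketch of the usual angular-repetition body (verify the `m` range against your use case):

```glsl
float sdStar(vec2 p, float r, int n, float m) { // m in [2, n]
    float an = 3.141593 / float(n);
    float en = 3.141593 / m;
    vec2 acs = vec2(cos(an), sin(an));
    vec2 ecs = vec2(cos(en), sin(en)); // ecs = vec2(0,1) would give a regular polygon
    float bn = mod(atan(p.x, p.y), 2.0 * an) - an; // fold into one angular wedge
    p = length(p) * vec2(cos(bn), abs(sin(bn)));
    p -= r * acs;                                  // move to the outer vertex
    p += ecs * clamp(-dot(p, ecs), 0.0, r * acs.y / ecs.y); // project onto the edge
    return length(p) * sign(p.x);
}
```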
### sdBezier (Extended) — Quadratic Bezier Curve SDF
**Signature**: `float sdBezier(vec2 pos, vec2 A, vec2 B, vec2 C)`
- `pos`: query point
- `A`, `B`, `C`: control points of the quadratic Bezier
An alternative Bezier SDF formulation that solves for the closest point on the curve using the cubic formula. Returns unsigned distance (no sign). Note the different parameter order from the Variant 5 version.
### sdParabola — Parabola
**Signature**: `float sdParabola(vec2 pos, float k)`
- `pos`: query point
- `k`: curvature coefficient (y = k * x^2)
Signed distance to a parabola. Uses a cubic root solution to find the closest point on the curve.
### sdCross — Cross Shape
**Signature**: `float sdCross(vec2 p, vec2 b, float r)`
- `p`: query point
- `b`: half-extents of each arm (b.x = length, b.y = width)
- `r`: corner rounding offset
A plus/cross shape formed by the union of two perpendicular rectangles, with an optional rounding parameter.
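A sketch of the standard symmetric formulation:

```glsl
float sdCross(vec2 p, vec2 b, float r) {
    p = abs(p);
    p = (p.y > p.x) ? p.yx : p.xy;       // exploit the diagonal symmetry
    vec2 q = p - b;
    float k = max(q.y, q.x);
    vec2 w = (k > 0.0) ? q : vec2(b.y - p.x, -k);
    return sign(k) * length(max(w, 0.0)) + r;
}
```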
## 2D SDF Modifiers Reference
### opRound2D — Rounding Modifier
**Signature**: `float opRound2D(float d, float r)`
Subtracts `r` from any SDF, effectively expanding the shape boundary outward by `r` and rounding all corners/edges. Apply to any existing SDF to add uniform rounding.
### opAnnular2D — Annular (Hollowing) Modifier
**Signature**: `float opAnnular2D(float d, float r)`
Takes the absolute value of the distance and subtracts thickness `r`, converting any filled shape into a ring/outline version with wall thickness `2*r`. Stackable: applying twice creates concentric rings.
### opRepeat2D — Grid Repetition
**Signature**: `vec2 opRepeat2D(vec2 p, float s)`
Applies `mod` to fold coordinates into a repeating grid cell of size `s`. Apply to `p` before passing to any SDF to create infinite tiling. Use `floor(p / s)` to obtain cell IDs for per-cell variation.
### opMirror2D — Arbitrary Mirror
**Signature**: `vec2 opMirror2D(vec2 p, vec2 dir)`
Mirrors coordinates across a line through the origin with direction `dir` (should be normalized). Any point on the negative side of the line is reflected to the positive side, effectively creating bilateral symmetry along any arbitrary axis.
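The four modifiers above are short enough to sketch in one block. Note an assumption: the text leaves `opMirror2D`'s convention ambiguous, so here `dir` is treated as the normalized fold normal rather than the line's direction:

```glsl
float opRound2D(float d, float r)   { return d - r; }      // grow outward and round by r
float opAnnular2D(float d, float r) { return abs(d) - r; } // hollow ring, wall thickness 2r
vec2  opRepeat2D(vec2 p, float s)   { return mod(p + 0.5 * s, s) - 0.5 * s; } // cell-centered tiling
// Assumption: dir is the fold NORMAL; points with dot(p, dir) < 0 are reflected across the line
vec2  opMirror2D(vec2 p, vec2 dir)  { return p - 2.0 * min(dot(p, dir), 0.0) * dir; }
```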

# 3D Signed Distance Fields (3D SDF) — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL Basics**: uniform variables (`iTime`, `iResolution`, `iMouse`), `fragCoord` coordinate system
- **Vector Math**: built-in functions like `dot`, `cross`, `normalize`, `length`, `reflect`
- **Rays and Cameras**: understanding how to generate rays from screen pixels (ray origin + ray direction)
- **Implicit Surface Concept**: f(p) = 0 defines the surface, f(p) > 0 is outside, f(p) < 0 is inside
## Step-by-Step Detailed Explanation
### Step 1: SDF Primitive Library
**What**: Define basic geometric distance functions.
**Why**: All SDF scenes are composed of basic primitives. Each primitive is a pure function that takes a point in space and returns the shortest distance to that primitive's surface. The accuracy of these primitives directly determines the efficiency of sphere tracing — accurate SDFs allow larger step sizes.
**Code**:
```glsl
// Sphere: p=sample point, r=radius
float sdSphere(vec3 p, float r) {
return length(p) - r;
}
// Box: p=sample point, b=half-size (xyz dimensions)
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return min(max(d.x, max(d.y, d.z)), 0.0) + length(max(d, 0.0));
}
// Ellipsoid (approximate): p=sample point, r=three-axis radii
float sdEllipsoid(vec3 p, vec3 r) {
float k0 = length(p / r);
float k1 = length(p / (r * r));
return k0 * (k0 - 1.0) / k1;
}
// Torus: p=sample point, t.x=major radius, t.y=tube radius
float sdTorus(vec3 p, vec2 t) {
return length(vec2(length(p.xz) - t.x, p.y)) - t.y;
}
// Capsule (two endpoints + radius): useful for skeleton/limb modeling
float sdCapsule(vec3 p, vec3 a, vec3 b, float r) {
vec3 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return length(pa - ba * h) - r;
}
// Cylinder (vertical): h.x=radius, h.y=half-height
float sdCylinder(vec3 p, vec2 h) {
vec2 d = abs(vec2(length(p.xz), p.y)) - h;
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0));
}
// Plane (y=0)
float sdPlane(vec3 p) {
return p.y;
}
```
### Step 2: Boolean Operations and Smooth Blending
**What**: Define combination operations between primitives — union, subtraction, intersection, and their smooth variants.
**Why**: Union merges multiple primitives into one scene; subtraction carves one object out of another; intersection keeps the overlapping region. Smooth variants (`smin`/`smax`) use a control parameter `k` to produce smooth blend transitions — one of SDF's most powerful capabilities over traditional modeling, achieving organic forms without additional geometry.
**Code**:
```glsl
// === Hard Boolean Operations ===
// Union: take the nearer surface
float opUnion(float d1, float d2) { return min(d1, d2); }
// Subtraction: subtract d2 from d1
float opSubtraction(float d1, float d2) { return max(d1, -d2); }
// Intersection: keep the overlapping region
float opIntersection(float d1, float d2) { return max(d1, d2); }
// Union with material ID (vec2.x stores distance, vec2.y stores material ID)
vec2 opU(vec2 d1, vec2 d2) { return (d1.x < d2.x) ? d1 : d2; }
// === Smooth Boolean Operations ===
// Smooth union: k=blend radius (larger = smoother, typical values 0.1~0.5)
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
// vec2 version of smin: for smooth blending of vec2(distance, materialID)
vec2 smin(vec2 a, vec2 b, float k) {
float h = max(k - abs(a.x - b.x), 0.0);
float d = min(a.x, b.x) - h * h * 0.25 / k;
float m = (a.x < b.x) ? a.y : b.y;
return vec2(d, m);
}
// Smooth subtraction / smooth max
float smax(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return max(a, b) + h * h * 0.25 / k;
}
```
### Step 3: Scene Definition (map Function)
**What**: Write the `map()` function that combines the above primitives and operations into a complete 3D scene.
**Why**: `map(p)` is the core of the SDF rendering pipeline — it returns the distance from any point p in space to the nearest scene surface (plus optional material information). Ray marching, normal computation, shadows, and AO all depend on this function. All geometric complexity of the scene is encapsulated here.
**Code**:
```glsl
// Returns vec2(distance, materialID)
vec2 map(vec3 p) {
// Ground
vec2 res = vec2(p.y, 0.0); // Material 0: ground
// Sphere (displaced to y=0.5)
float d1 = sdSphere(p - vec3(0.0, 0.5, 0.0), 0.4);
res = opU(res, vec2(d1, 1.0)); // Material 1: sphere
// Box
float d2 = sdBox(p - vec3(1.5, 0.4, 0.0), vec3(0.3, 0.4, 0.3));
res = opU(res, vec2(d2, 2.0)); // Material 2: box
// Blend two spheres with smin for organic blob effect
float d3 = sdSphere(p - vec3(-1.2, 0.5, 0.0), 0.3);
float d4 = sdSphere(p - vec3(-1.5, 0.8, 0.2), 0.25);
float dBlob = smin(d3, d4, 0.3);
res = opU(res, vec2(dBlob, 3.0)); // Material 3: blob
return res;
}
```
### Step 4: Raymarching
**What**: Implement the sphere tracing loop — cast a ray from the camera and step along the ray direction until hitting a surface or exceeding the maximum distance.
**Why**: Sphere tracing exploits the "safe distance" property of SDFs — the current SDF value tells us there is absolutely no surface within that radius, so we can safely advance that far. This is much more efficient than fixed-step volumetric ray marching, typically achieving precise results in 64-128 steps.
**Code**:
```glsl
#define MAX_STEPS 128 // Adjustable: step count, 64=fast/coarse, 256=precise/slow
#define MAX_DIST 40.0 // Adjustable: max trace distance
#define SURF_DIST 0.0001 // Adjustable: surface detection threshold
vec2 raycast(vec3 ro, vec3 rd) {
vec2 res = vec2(-1.0, -1.0);
float t = 0.01;
for (int i = 0; i < MAX_STEPS && t < MAX_DIST; i++) {
vec2 h = map(ro + rd * t);
if (abs(h.x) < SURF_DIST * t) {
res = vec2(t, h.y);
break;
}
t += h.x; // Key: step distance = SDF value
}
return res; // .x=hit distance, .y=materialID; -1 means no hit
}
```
### Step 5: Normal Computation
**What**: Compute the surface normal at the hit point by taking the finite-difference gradient of the SDF.
**Why**: The gradient direction of the SDF is the surface normal direction. We use the tetrahedron trick (4 `map` calls) instead of central differences (6 calls), saving performance and avoiding compiler inline bloat from inlining `map()` multiple times.
**Code**:
```glsl
// Tetrahedron normal computation (recommended, only 4 map calls)
vec3 calcNormal(vec3 pos) {
vec2 e = vec2(1.0, -1.0) * 0.5773 * 0.0005; // Adjustable: epsilon
return normalize(
e.xyy * map(pos + e.xyy).x +
e.yyx * map(pos + e.yyx).x +
e.yxy * map(pos + e.yxy).x +
e.xxx * map(pos + e.xxx).x
);
}
// Anti-compiler-inline version (suitable for complex map functions)
// Uses a loop with a runtime-opaque bound (ZERO) so the compiler cannot unroll it and inline map() four times
#define ZERO (min(iFrame, 0))
vec3 calcNormalLoop(vec3 pos) {
vec3 n = vec3(0.0);
for (int i = ZERO; i < 4; i++) {
vec3 e = 0.5773 * (2.0 * vec3((((i+3)>>1)&1), ((i>>1)&1), (i&1)) - 1.0);
n += e * map(pos + 0.0005 * e).x;
}
return normalize(n);
}
```
### Step 6: Soft Shadows
**What**: Cast a secondary ray from the surface point toward the light source, and estimate shadow softness based on the minimum distance encountered along the way.
**Why**: Hard shadows only determine "occluded or not" (0/1), while SDF soft shadows use intermediate distance information to estimate "how close to being occluded." In the formula `k*h/t`, `k` controls shadow softness — larger `k` produces sharper shadows, smaller `k` produces softer shadows. This is one of SDF rendering's killer features.
**Code**:
```glsl
// k=shadow sharpness (2=very soft, 32=near hard), mint=start offset, tmax=max distance
float calcSoftshadow(vec3 ro, vec3 rd, float mint, float tmax, float k) {
float res = 1.0;
float t = mint;
for (int i = 0; i < 24; i++) { // Adjustable: shadow step count
float h = map(ro + rd * t).x;
float s = clamp(k * h / t, 0.0, 1.0);
res = min(res, s);
t += clamp(h, 0.01, 0.2);
if (res < 0.004 || t > tmax) break;
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res); // Smooth Hermite interpolation
}
```
### Step 7: Ambient Occlusion (AO)
**What**: Sample several points along the normal direction and compare actual SDF values with expected distances to estimate occlusion.
**Why**: SDFs naturally provide distance information for a cheap AO approximation: if the SDF value at a sample point along the normal is much smaller than that point's distance to the surface, nearby occluding geometry exists. Unlike screen-space SSAO, this queries the true world-space distance field directly, and requires only 5 `map` calls.
**Code**:
```glsl
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < 5; i++) { // Adjustable: number of sample layers
float h = 0.01 + 0.12 * float(i) / 4.0; // Adjustable: sample spacing
float d = map(pos + h * nor).x;
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
```
### Step 8: Camera and Rendering Pipeline
**What**: Build a look-at camera matrix, generate screen rays, and chain together the entire rendering pipeline.
**Why**: Mapping screen pixels to 3D rays is the starting point of raymarching. The look-at matrix builds an orthonormal basis from the camera position, target point, and up direction, making camera control intuitive. The final pipeline chains all steps: ray generation, ray marching, normals, lighting/shadows/AO, and post-processing.
**Code**:
```glsl
// Camera look-at matrix
mat3 setCamera(vec3 ro, vec3 ta, float cr) {
vec3 cw = normalize(ta - ro);
vec3 cp = vec3(sin(cr), cos(cr), 0.0);
vec3 cu = normalize(cross(cw, cp));
vec3 cv = cross(cu, cw);
return mat3(cu, cv, cw);
}
// Render: input ray, output color
vec3 render(vec3 ro, vec3 rd) {
// Background color (sky gradient)
vec3 col = vec3(0.7, 0.7, 0.9) - max(rd.y, 0.0) * 0.3;
// Raycast intersection
vec2 res = raycast(ro, rd);
float t = res.x;
float m = res.y; // Material ID
if (m > -0.5) {
vec3 pos = ro + t * rd;
vec3 nor = calcNormal(pos);
// Material color (varies by ID)
vec3 mate = 0.2 + 0.2 * sin(m * 2.0 + vec3(0.0, 1.0, 2.0));
// Lighting
vec3 lig = normalize(vec3(-0.5, 0.4, -0.6));
float dif = clamp(dot(nor, lig), 0.0, 1.0);
dif *= calcSoftshadow(pos, lig, 0.02, 2.5, 8.0);
float amb = 0.5 + 0.5 * nor.y;
float occ = calcAO(pos, nor);
col = mate * (dif * vec3(1.3, 1.0, 0.7) + amb * occ * vec3(0.4, 0.6, 1.0) * 0.6);
// Fog (exponential decay)
col = mix(col, vec3(0.7, 0.7, 0.9), 1.0 - exp(-0.0001 * t * t * t));
}
return clamp(col, 0.0, 1.0);
}
```
## Variant Detailed Descriptions
### Variant 1: Dynamic Organic Body (Smooth Blob Animation)
**Difference from the basic version**: Replaces static primitives with multiple animated spheres blended via `smin`, a common technique for lava-lamp-style organic fluid effects.
**Key modified code**:
```glsl
// Replace scene definition in map()
vec2 map(vec3 p) {
float d = 2.0;
for (int i = 0; i < 16; i++) { // Adjustable: number of spheres
float fi = float(i);
float t = iTime * (fract(fi * 412.531 + 0.513) - 0.5) * 2.0;
d = smin(
sdSphere(p + sin(t + fi * vec3(52.5126, 64.627, 632.25)) * vec3(2.0, 2.0, 0.8),
mix(0.5, 1.0, fract(fi * 412.531 + 0.5124))),
d,
0.4 // Adjustable: blend radius
);
}
return vec2(d, 1.0);
}
```
### Variant 2: Infinite Repeating Corridor (Domain Repetition)
**Difference from the basic version**: Uses `mod()` to repeat spatial coordinates infinitely. A common domain repetition technique. Can layer `hash()` to introduce random variation per repeating cell.
**Key modified code**:
```glsl
// Linear domain repetition
float repeat(float v, float c) {
return mod(v, c) - c * 0.5;
}
// Angular domain repetition (repeat count times in polar coordinate direction)
float amod(inout vec2 p, float count) {
float an = 6.283185 / count;
float a = atan(p.y, p.x) + an * 0.5;
float c = floor(a / an);
a = mod(a, an) - an * 0.5;
p = vec2(cos(a), sin(a)) * length(p);
return c; // Returns sector index
}
vec2 map(vec3 p) {
// Repeat every 4 units along the z axis
p.z = repeat(p.z, 4.0);
// Add bending offset along x axis
p.x += 2.0 * sin(p.z * 0.1);
float d = -sdBox(p, vec3(2.0, 2.0, 20.0)); // Invert = corridor interior
d = max(d, -sdBox(p, vec3(1.8, 1.8, 1.9))); // Subtract interior space
d = min(d, sdCylinder(p - vec3(1.5, -2.0, 0.0), vec2(0.1, 2.0))); // Add pillars
return vec2(d, 1.0);
}
```
### Variant 3: Character/Creature Modeling (Organic Character Modeling)
**Difference from the basic version**: Uses `sdEllipsoid` + `sdCapsule` (sdStick) to compose body parts, `smin` to connect with smooth transitions, and `smax` to carve indentations (mouth). Combined with procedural animation to drive joints. A standard approach for character SDF modeling.
**Key modified code**:
```glsl
// Stick primitive (different radii at each end, suitable for limbs)
vec2 sdStick(vec3 p, vec3 a, vec3 b, float r1, float r2) {
vec3 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return vec2(length(pa - ba * h) - mix(r1, r2, h * h * (3.0 - 2.0 * h)), h);
}
vec2 map(vec3 pos) {
// Body (ellipsoid)
float d = sdEllipsoid(pos, vec3(0.25, 0.3, 0.25));
// Head (sphere, connected with smin)
float dHead = sdEllipsoid(pos - vec3(0.0, 0.35, 0.02), vec3(0.12, 0.15, 0.13));
d = smin(d, dHead, 0.1);
// Arms (sdStick)
    vec2 arm = sdStick(vec3(abs(pos.x), pos.yz), // Mirror: model one arm, get both
                       vec3(0.18, 0.2, -0.05),
                       vec3(0.35, -0.1, -0.15), 0.03, 0.05);
d = smin(d, arm.x, 0.04);
// Mouth (carved with smax)
float dMouth = sdEllipsoid(pos - vec3(0.0, 0.3, 0.15), vec3(0.08, 0.03, 0.1));
d = smax(d, -dMouth, 0.03);
return vec2(d, 1.0);
}
```
### Variant 4: Symmetry Exploitation
**Difference from the basic version**: Leverages geometric symmetry (mirror/rotational invariance) to reduce N repeated elements' SDF evaluations to N/k. For example, octahedral symmetry can reduce 18 elements to 4 evaluations. The key is mapping the input point to the symmetry's fundamental domain.
**Key modified code**:
```glsl
// Fold a point into the octahedral fundamental domain
vec2 rot45(vec2 v) {
return vec2(v.x - v.y, v.y + v.x) * 0.707107;
}
vec2 map(vec3 p) {
float d = sdSphere(p, 0.12); // Center sphere
// Exploit symmetry: original 18 gears reduced to 4 evaluations
vec3 qx = vec3(rot45(p.zy), p.x);
if (abs(qx.x) > abs(qx.y)) qx = qx.zxy;
vec3 qy = vec3(rot45(p.xz), p.y);
if (abs(qy.x) > abs(qy.y)) qy = qy.zxy;
vec3 qz = vec3(rot45(p.yx), p.z);
if (abs(qz.x) > abs(qz.y)) qz = qz.zxy;
vec3 qa = abs(p);
qa = (qa.x > qa.y && qa.x > qa.z) ? p.zxy :
(qa.z > qa.y) ? p.yzx : p.xyz;
// Only 4 gear() evaluations needed instead of 18
d = min(d, gear(qa, 0.0));
d = min(d, gear(qx, 1.0));
d = min(d, gear(qy, 1.0));
d = min(d, gear(qz, 1.0));
return vec2(d, 1.0);
}
```
### Variant 5: PBR Material Rendering Pipeline
**Difference from the basic version**: Replaces simplified Blinn-Phong with GGX microfacet BRDF, combined with a material ID system to assign different roughness/metalness to each primitive. A standard approach for PBR raymarching.
**Key modified code**:
```glsl
// GGX/Trowbridge-Reitz NDF
float D_GGX(float NoH, float roughness) {
float a = roughness * roughness;
float a2 = a * a;
float d = NoH * NoH * (a2 - 1.0) + 1.0;
return a2 / (3.14159 * d * d);
}
// Schlick Fresnel approximation
vec3 F_Schlick(float VoH, vec3 f0) {
return f0 + (1.0 - f0) * pow(1.0 - VoH, 5.0);
}
// Replace lighting section in render()
vec3 pbrLighting(vec3 pos, vec3 nor, vec3 rd, vec3 albedo, float roughness, float metallic) {
vec3 lig = normalize(vec3(-0.5, 0.4, -0.6));
vec3 hal = normalize(lig - rd);
vec3 f0 = mix(vec3(0.04), albedo, metallic);
float NoL = max(dot(nor, lig), 0.0);
float NoH = max(dot(nor, hal), 0.0);
float VoH = max(dot(-rd, hal), 0.0);
float D = D_GGX(NoH, roughness);
vec3 F = F_Schlick(VoH, f0);
vec3 spec = D * F * 0.25; // Simplified specular term
vec3 diff = albedo * (1.0 - metallic) / 3.14159;
    float shadow = calcSoftshadow(pos, lig, 0.02, 2.5, 8.0); // k=8, matching the 5-arg signature
return (diff + spec) * NoL * shadow * vec3(1.3, 1.0, 0.7) * 3.0;
}
```
## Performance Optimization in Detail
### 1. Bounding Volume Acceleration
Use an overall AABB or bounding sphere to constrain the search range. Perform analytical ray intersection first to narrow the `tmin`/`tmax` range, avoiding wasted steps in empty regions. A common optimization in advanced raymarching shaders.
```glsl
// Ray-AABB intersection (call before raycast)
vec2 iBox(vec3 ro, vec3 rd, vec3 rad) {
vec3 m = 1.0 / rd;
vec3 n = m * ro;
vec3 k = abs(m) * rad;
vec3 t1 = -n - k;
vec3 t2 = -n + k;
return vec2(max(max(t1.x, t1.y), t1.z),
min(min(t2.x, t2.y), t2.z));
}
```
### 2. Per-Object Bounding
In `map()`, first check with a cheap sdBox whether the current point is near a primitive. Only compute the precise SDF when close. A standard per-object culling technique.
```glsl
// Inside map():
if (sdBox(pos - objectCenter, boundingSize) < res.x) {
// Only compute precise SDF when bounding box distance is closer than current nearest
res = opU(res, vec2(sdComplexShape(pos), matID));
}
```
### 3. Adaptive Step Size
Allow a larger precision tolerance at distance and a stricter one up close: terminate the march when `abs(h.x) < 0.0001 * t`, so the hit tolerance scales with the distance traveled (and thus with the pixel's world-space footprint). This proportional-tolerance check appears in nearly all advanced raymarching shaders.
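The check can be sketched as a raycast loop; the iteration cap and the `0.0001` factor are tunables, not fixed values:

```glsl
// Raymarch with a distance-proportional hit tolerance.
// Far away, one pixel covers more world space, so a looser
// tolerance (0.0001 * t) saves iterations without visible error.
float raycastAdaptive(vec3 ro, vec3 rd, float tmin, float tmax) {
    float t = tmin;
    for (int i = 0; i < 128; i++) {
        float h = map(ro + rd * t).x;
        if (abs(h) < 0.0001 * t) return t; // Hit: tolerance grows with distance
        t += h;
        if (t > tmax) break;
    }
    return -1.0; // Miss
}
```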
### 4. Preventing Compiler Inlining
Complex `map()` functions get inlined 4 times inside `calcNormal`, causing compilation time to explode. Use a loop + `ZERO` macro to prevent inlining. A well-known technique to prevent excessive compiler inlining.
```glsl
#define ZERO (min(iFrame, 0)) // Compiler cannot prove this is 0 at compile time, so it won't unroll the loop
```
### 5. Symmetry Exploitation
If the scene has rotational/mirror symmetry, fold the point into the fundamental domain and evaluate the SDF only once per fold, as in Variant 4's 18-to-4 gear reduction.
## Combination Suggestions in Detail
### 1. SDF + Noise Displacement
Add noise on top of the `map()` return value to add organic details to smooth surfaces (terrain, skin textures).
```glsl
float d = sdSphere(p, 1.0);
d += 0.05 * (sin(p.x * 10.0) * sin(p.y * 10.0) * sin(p.z * 10.0)); // Simple displacement
// Or use fbm noise: d += 0.1 * fbm(p * 4.0);
```
**Note**: Noise displacement breaks the SDF's Lipschitz condition (|grad f| <= 1). You need to multiply the step size by a safety factor (e.g., 0.5~0.7) to avoid penetration.
### 2. SDF + Bump Mapping
Instead of modifying the SDF itself, add detail perturbation only in the normal computation. Better performance than noise displacement since it doesn't affect ray marching. A common technique in SDF rendering.
```glsl
vec3 calcNormalBumped(vec3 pos) {
vec3 n = calcNormal(pos);
// Add high-frequency detail to the normal
n += 0.1 * vec3(fbm(pos.yz * 20.0) - 0.5, 0.0, fbm(pos.xy * 20.0) - 0.5);
return normalize(n);
}
```
### 3. SDF + Domain Warping
Warp spatial coordinates before entering `map()` to achieve bending, twisting, polar coordinate transforms, and other effects. A common spatial warping technique.
```glsl
// Cartesian to polar ring space: straight corridor becomes a ring structure
vec2 displaceLoop(vec2 p, float r) {
return vec2(length(p) - r, atan(p.y, p.x));
}
```
### 4. SDF + Procedural Animation
Bone/joint angles vary with time, driving SDF primitive positions. `smin` ensures smooth transitions at joints. Common techniques for procedural character animation (squash & stretch, bone chain IK).
```glsl
// Squash and stretch deformation
float p = 4.0 * t1 * (1.0 - t1); // Parabolic bounce
float sy = 0.5 + 0.5 * p; // Stretch in y direction
float sz = 1.0 / sy; // Compress in z direction (preserve volume)
vec3 q = pos - center;
float d = sdEllipsoid(q, vec3(0.25, 0.25 * sy, 0.25 * sz));
```
### 5. SDF + Motion Blur
Average multiple samples whose time values are jittered within the frame's shutter interval. A standard temporal supersampling technique.
```glsl
// Randomly offset time in mainImage
float time = iTime;
#if AA > 1
time += 0.5 * float(m * AA + n) / float(AA * AA) / 24.0; // Intra-frame time jitter
#endif
```
## Extended SDF Primitives Reference
### Rounded Box — `sdRoundBox(vec3 p, vec3 b, float r)`
- `p`: sample point
- `b`: half-size dimensions (before rounding)
- `r`: rounding radius — edges and corners are rounded by this amount
### Box Frame — `sdBoxFrame(vec3 p, vec3 b, float e)`
- `p`: sample point
- `b`: outer half-size dimensions
- `e`: edge thickness — the wireframe thickness of the box edges
### Cone — `sdCone(vec3 p, vec2 c, float h)`
- `p`: sample point
- `c`: vec2(sin, cos) of the cone's opening angle
- `h`: height of the cone
### Capped Cone — `sdCappedCone(vec3 p, float h, float r1, float r2)`
- `p`: sample point
- `h`: half-height
- `r1`: bottom radius
- `r2`: top radius
### Round Cone — `sdRoundCone(vec3 p, float r1, float r2, float h)`
- `p`: sample point
- `r1`: bottom sphere radius
- `r2`: top sphere radius
- `h`: height between sphere centers
### Solid Angle — `sdSolidAngle(vec3 p, vec2 c, float ra)`
- `p`: sample point
- `c`: vec2(sin, cos) of the solid angle
- `ra`: radius
### Octahedron — `sdOctahedron(vec3 p, float s)`
- `p`: sample point
- `s`: size (distance from center to vertex)
### Pyramid — `sdPyramid(vec3 p, float h)`
- `p`: sample point
- `h`: height of the pyramid (base is a unit square centered at origin)
### Hex Prism — `sdHexPrism(vec3 p, vec2 h)`
- `p`: sample point
- `h.x`: hexagonal radius (circumradius)
- `h.y`: half-height along z axis
### Cut Sphere — `sdCutSphere(vec3 p, float r, float h)`
- `p`: sample point
- `r`: sphere radius
- `h`: cut plane height (cuts sphere at y=h)
### Capped Torus — `sdCappedTorus(vec3 p, vec2 sc, float ra, float rb)`
- `p`: sample point
- `sc`: vec2(sin, cos) of the cap angle
- `ra`: major radius
- `rb`: tube radius
### Link — `sdLink(vec3 p, float le, float r1, float r2)`
- `p`: sample point
- `le`: half-length of the elongation
- `r1`: major radius of the torus cross-section
- `r2`: tube radius
### Plane (arbitrary) — `sdPlane(vec3 p, vec3 n, float h)`
- `p`: sample point
- `n`: plane normal (must be normalized)
- `h`: offset from origin along the normal
### Rhombus — `sdRhombus(vec3 p, float la, float lb, float h, float ra)`
- `p`: sample point
- `la`, `lb`: half-diagonals of the rhombus in XZ plane
- `h`: half-height (extrusion in Y)
- `ra`: rounding radius
### Triangle (unsigned) — `udTriangle(vec3 p, vec3 a, vec3 b, vec3 c)`
- `p`: sample point
- `a`, `b`, `c`: triangle vertex positions
- Returns unsigned (non-negative) distance
## Deformation Operators Reference
### Round — `opRound(float d, float r)`
Softens edges of any SDF by subtracting a radius. Apply to the result of any SDF.
```glsl
// Round a box with radius 0.1
float d = opRound(sdBox(p, vec3(1.0)), 0.1);
```
### Onion — `opOnion(float d, float t)`
Hollows out any SDF into a shell of thickness `t`. Can be stacked for concentric shells.
```glsl
// Hollow sphere shell, 0.1 thick
float d = opOnion(sdSphere(p, 1.0), 0.1);
// Double shell
float d = opOnion(opOnion(sdSphere(p, 1.0), 0.1), 0.05);
```
### Elongate — `opElongate(vec3 p, vec3 h)`
Stretches a shape along one or more axes by `h` without distortion — it inserts a straight segment. The example below inlines the exact form of the operator.
```glsl
// Elongate along Y to stretch a box
vec3 q = abs(p) - vec3(0.0, 0.5, 0.0);
float d = sdBox(max(q, 0.0), vec3(0.3)) + min(max(q.x, max(q.y, q.z)), 0.0);
```
### Twist — `opTwist(vec3 p, float k)`
Rotates the XZ cross-section around the Y axis proportionally to height. Returns transformed coordinates to pass into any SDF.
```glsl
// Twisted box: k controls twist rate (radians per unit height)
vec3 q = opTwist(p, 3.0);
float d = sdBox(q, vec3(0.5));
```
### Cheap Bend — `opCheapBend(vec3 p, float k)`
Bends geometry along the X axis. Returns transformed coordinates.
```glsl
// Bent box
vec3 q = opCheapBend(p, 2.0);
float d = sdBox(q, vec3(0.5, 0.3, 0.5));
```
### Displacement — `opDisplace(float d, vec3 p)`
Adds procedural sinusoidal surface detail. Breaks Lipschitz bound, so reduce ray march step size by 0.5-0.7.
```glsl
float d = sdSphere(p, 1.0);
d = opDisplace(d, p); // Adds bumpy surface detail
```
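The entries above list signatures and usage only; as a sketch, the conventional one-line bodies of these operators (the displacement amplitude of 0.05 is a tunable added here) look like:

```glsl
float opRound(float d, float r) { return d - r; }      // Grow and round by r
float opOnion(float d, float t) { return abs(d) - t; } // Shell of thickness t
vec3 opTwist(vec3 p, float k) {                        // Rotate XZ slice by k * p.y
    float c = cos(k * p.y), s = sin(k * p.y);
    return vec3(mat2(c, -s, s, c) * p.xz, p.y);
}
vec3 opCheapBend(vec3 p, float k) {                    // Rotate XY slice by k * p.x
    float c = cos(k * p.x), s = sin(k * p.x);
    return vec3(mat2(c, -s, s, c) * p.xy, p.z);
}
float opDisplace(float d, vec3 p) {                    // Sinusoidal surface detail
    return d + 0.05 * sin(20.0 * p.x) * sin(20.0 * p.y) * sin(20.0 * p.z);
}
```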
## 2D-to-3D Constructors Reference
### Revolution — `opRevolution(vec3 p, float o)`
Creates a 3D solid of revolution by rotating a 2D SDF around the Y axis: evaluate the 2D SDF at `vec2(length(p.xz) - o, p.y)`, where `o` is the revolution offset (major radius).
```glsl
// Create a torus by revolving a 2D circle around the Y axis
vec2 q = vec2(length(p.xz) - 1.0, p.y); // offset o = 1.0
float d = length(q) - 0.3;              // 2D circle (radius 0.3) at q => 3D torus
```
### Extrusion — `opExtrusion(vec3 p, float d2d, float h)`
Extends any 2D SDF along the Z axis with finite height `h`. The 2D SDF is evaluated in the XY plane and capped at `+/- h` along Z.
```glsl
// Extrude a 2D shape 0.2 units in both directions
float d2d = sdCircle2D(p.xy, 0.5); // any 2D SDF
float d3d = opExtrusion(p, d2d, 0.2); // finite extrusion
```
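A sketch of the constructor bodies implied above; `opExtrusion` follows the signature given, while the `opRevolve` helper name is illustrative:

```glsl
// Extrusion: cap a 2D SDF (evaluated in the XY plane) at +/- h along Z
float opExtrusion(vec3 p, float d2d, float h) {
    vec2 w = vec2(d2d, abs(p.z) - h);
    return min(max(w.x, w.y), 0.0) + length(max(w, 0.0));
}
// Revolution: map to the 2D half-plane, then apply any 2D SDF to the result
vec2 opRevolve(vec3 p, float o) {
    return vec2(length(p.xz) - o, p.y); // Feed this into a 2D SDF
}
```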
## Symmetry Operators Reference
### Mirror X — `opSymX(vec3 p)`
Mirrors across the x = 0 plane using `abs(p.x)`. Model only one half and get bilateral symmetry for free. Place at the start of `map()`.
```glsl
vec2 map(vec3 p) {
p = opSymX(p); // Mirror: only model x >= 0 side
float d = sdSphere(p - vec3(1.0, 0.5, 0.0), 0.3);
// Automatically appears at both x=+1 and x=-1
return vec2(d, 1.0);
}
```
### Mirror XZ — `opSymXZ(vec3 p)`
Four-fold symmetry across the x = 0 and z = 0 planes. Model one quadrant, get four copies.
```glsl
vec2 map(vec3 p) {
p = opSymXZ(p); // Four-fold symmetry
float d = sdBox(p - vec3(2.0, 0.5, 2.0), vec3(0.3));
// Appears in all four quadrants
return vec2(d, 1.0);
}
```
### Arbitrary Mirror — `opMirror(vec3 p, vec3 dir)`
Mirrors across an arbitrary plane defined by its normal `dir` (must be normalized). Reflects any point on the negative side to the positive side.
```glsl
// Mirror across a 45-degree plane
vec3 q = opMirror(p, normalize(vec3(1.0, 0.0, 1.0)));
float d = sdSphere(q - vec3(1.0, 0.5, 0.0), 0.3);
```
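Sketches of the three symmetry operators, following the conventions above:

```glsl
vec3 opSymX(vec3 p)  { p.x  = abs(p.x);  return p; } // Mirror across x = 0
vec3 opSymXZ(vec3 p) { p.xz = abs(p.xz); return p; } // Four-fold symmetry
vec3 opMirror(vec3 p, vec3 dir) {                    // dir must be normalized
    // Reflect points on the negative side of the plane through the origin
    return p - 2.0 * min(dot(p, dir), 0.0) * dir;
}
```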
# SDF Tricks Detailed Reference
## Prerequisites
- Understanding of signed distance fields and ray marching
- Basic SDF primitives and boolean operations
- FBM / procedural noise fundamentals
## Lipschitz Condition and FBM Detail
An SDF must satisfy the **Lipschitz condition**: `|f(a) - f(b)| ≤ |a - b|` (gradient magnitude ≤ 1). This guarantees that stepping by the SDF value is always safe — no surface exists within that radius.
When adding FBM noise to an SDF, the noise derivatives can violate Lipschitz:
- Raw noise amplitude of 0.1 with frequency 20 has gradient ~2.0, breaking the condition
- This causes ray marching to overshoot, creating holes and artifacts
**Solutions**:
1. **Amplitude limiting**: Keep `amplitude × frequency < 1.0` across all octaves
2. **Distance fade**: `d += amp * fbm(p * freq) * smoothstep(fadeStart, 0.0, d)` — detail only appears near the surface where overshoot distance is small
3. **Step size reduction**: Multiply ray step by 0.5-0.7, trading speed for stability
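The distance-fade pattern (solution 2) can be sketched inside `map()`; `sdSphere`, `fbm`, and the 0.35 fade range are illustrative:

```glsl
float mapDetailed(vec3 p) {
    float d = sdSphere(p, 1.0); // Base shape (valid SDF)
    // Detail fades in only near the surface, where a Lipschitz
    // violation can overshoot by at most the small remaining distance
    float fade = 1.0 - smoothstep(0.0, 0.35, d);
    d += 0.08 * fbm(p * 6.0) * fade;
    return d;
}
```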
## Bounding Volume Strategies
### Hierarchical Bounding
For scenes with N objects, test bounding volumes in order of increasing cost:
```
Level 1: Scene bounding sphere (1 evaluation)
Level 2: Object group bounds (few evaluations)
Level 3: Individual object SDF (full cost)
```
### Spatial Partitioning
For repeating structures, combine domain repetition with bounds:
```glsl
float map(vec3 p) {
vec3 q = mod(p + 2.0, 4.0) - 2.0; // repeat every 4 units
// Only evaluate detail if within local bounding sphere
float bound = length(q) - 1.5;
if (bound > 0.2) return bound;
return detailedSDF(q);
}
```
## Binary Search Convergence
After N iterations of binary search, the position error is `initialStep / 2^N`:
- 4 iterations: 1/16 of initial step size
- 6 iterations: 1/64 of initial step size (sub-pixel at typical resolutions)
- 8 iterations: 1/256 (overkill for most uses)
6 iterations is the practical sweet spot — gives sub-pixel precision without wasting GPU cycles.
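A sketch of the refinement loop, assuming the main march has bracketed a surface crossing between an outside and an inside sample (`map` here returns a plain float, as elsewhere in this file):

```glsl
// Bisect between tOut (map > 0) and tIn (map < 0).
// Each iteration halves the bracket, so 6 iterations => 1/64 of the initial step.
float refineHit(vec3 ro, vec3 rd, float tOut, float tIn) {
    for (int i = 0; i < 6; i++) {
        float tMid = 0.5 * (tOut + tIn);
        if (map(ro + rd * tMid) > 0.0) tOut = tMid; // Still outside
        else                           tIn  = tMid; // Inside
    }
    return 0.5 * (tOut + tIn);
}
```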
## XOR Operation Mathematics
`opXor(a, b) = max(min(a, b), -max(a, b))`
This is equivalent to: `union(a, b) AND NOT intersection(a, b)` — the symmetric difference. Geometry exists where exactly one shape is present but not both. Useful for creating lattice structures and interlocking patterns.
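As a GLSL sketch (assuming an `sdSphere` primitive as in the main reference):

```glsl
float opXor(float a, float b) {
    // Symmetric difference: union minus intersection
    return max(min(a, b), -max(a, b));
}
// Example: two overlapping spheres leave only the non-shared lens regions
float d = opXor(sdSphere(p - vec3(0.3, 0.0, 0.0), 0.5),
                sdSphere(p + vec3(0.3, 0.0, 0.0), 0.5));
```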
## Interior SDF Pattern Techniques
When the camera is inside an SDF (d < 0), the negative distance still gives useful information:
- `abs(d)` gives distance to nearest surface from inside
- Combine with repeating patterns using `fract()` to create infinite interior structures
- Use `max(outerSDF, innerSDF)` to confine interior patterns within the outer shell
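A sketch combining these ideas (constants illustrative, `sdSphere` as in the main reference):

```glsl
// Infinite repeating rods visible only from inside an outer sphere
float mapInterior(vec3 p) {
    float outer = sdSphere(p, 4.0);    // Camera sits where outer < 0
    vec3 q = fract(p) - 0.5;           // Repeating unit cells
    float inner = length(q.xy) - 0.05; // Thin rods along z in each cell
    return max(outer, inner);          // Confine rods to the sphere interior
}
```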
# SDF Soft Shadow Techniques - Detailed Reference
This document is a complete supplement to [SKILL.md](SKILL.md), covering prerequisite knowledge, step-by-step detailed explanations, mathematical derivations, variant descriptions, and full code examples for combinations.
## Use Cases
- **Shadow computation in SDF raymarching scenes**: When using signed distance fields (SDF) for ray marching rendering and you need to add soft shadow effects to the scene
- **Real-time soft shadow / penumbra effects**: Simulating the penumbra gradient produced by an area light source, rather than the binary result of simple hard shadows
- **Terrain / heightfield shadows**: Shadow computation for procedural terrain and height maps
- **Multi-layer shadow compositing**: Combining ground shadows, vegetation shadows, cloud shadows, and other shadow sources into a final result
- **Volumetric light / God Ray effects**: Reusing the shadow function to sample along the view ray to generate volumetric light scattering effects
- **Analytical shadows**: Using O(1) analytical shadows for simple geometry like spheres instead of ray marching
## Prerequisites
- **GLSL fundamentals**: uniforms, varyings, built-in functions (`clamp`, `mix`, `smoothstep`, `normalize`, `dot`, `reflect`)
- **Raymarching**: Understanding SDF scene representation and the basic sphere tracing workflow
- **SDF basics**: Understanding signed distance fields — `map(p)` returns the distance from point p to the nearest surface
- **Basic lighting models**: Diffuse (N·L), specular (Blinn-Phong), ambient light
- **Vector math**: Dot product, cross product, vector normalization, ray parametric equation `ro + rd * t`
## Core Principles in Detail
The core idea of SDF soft shadows is: **march from a surface point toward the light source, using the ratio of "nearest distance to march distance" to estimate penumbra width**.
### Classic Formula (2013)
```
shadow = min(shadow, k * h / t)
```
Where:
- `h` = SDF value at the current march position (distance to nearest surface)
- `t` = distance already traveled along the shadow ray
- `k` = constant controlling penumbra softness (larger = harder, smaller = softer)
**Geometric intuition**: The ratio `h/t` approximates the angle subtended, as seen from the shaded surface point, by the nearest occluder encountered at distance `t` along the shadow ray. When the ray grazes an object's surface, `h` is small while `t` is large, making `h/t` small and producing a penumbra region; when the ray is far from all objects, `h/t` is large and the area is fully lit.
Taking the minimum `min(res, k*h/t)` across all sample points along the ray yields "the darkest point," which is the final shadow factor.
### Improved Formula (2018)
The classic formula produces overly dark artifacts near sharp edges. The improved version uses SDF values from adjacent steps to perform geometric triangulation, estimating a more accurate nearest point:
```
y = h² / (2 * ph) // ph = SDF value from previous step
d = sqrt(h² - y²) // true nearest distance perpendicular to ray direction
shadow = min(shadow, d / (w * max(0, t - y)))
```
**Mathematical derivation**: Assume the previous step at ray position `t-h_step` had SDF value `ph`, and the current step at position `t` has SDF value `h`. The intersection region of these two SDF spheres (with radii `ph` and `h` respectively) provides a more accurate estimate of the nearest surface point. Through simple triangle geometry:
- `y` is the distance to step back along the ray from the current sample point to the nearest point projection
- `d` is the perpendicular distance from the nearest surface point to the ray
- The corrected effective distance is `t - y` rather than `t`
### Negative Extension (2020)
Allows `res` to drop to negative values (minimum -1), then remaps to [0,1] with a custom smooth mapping:
```
res = max(res, -1.0)
shadow = 0.25 * (1 + res)² * (2 - res)
```
This eliminates the hard crease produced by the classic `clamp(0,1)`, achieving a smoother penumbra transition.
**Why it works**: The classic method produces a C0 continuous (non-smooth) crease at `res=0` due to clamping. By allowing `res` to enter the negative domain [-1, 0], then remapping with the C1 continuous function `0.25*(1+res)²*(2-res)`, a completely smooth penumbra gradient is obtained. This function evaluates to 0 at `res=-1` and 1 at `res=1`, with smooth derivative transitions at both ends.
## Implementation Steps in Detail
### Step 1: Scene SDF Definition
**What**: Define the scene's signed distance function, returning the distance from any point in space to the nearest surface.
**Why**: Shadow ray marching needs `map(p)` queries to determine step size and penumbra estimation.
```glsl
float sdSphere(vec3 p, float r) {
return length(p) - r;
}
float sdPlane(vec3 p) {
return p.y;
}
float sdRoundBox(vec3 p, vec3 b, float r) {
vec3 q = abs(p) - b;
return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0) - r;
}
float map(vec3 p) {
float d = sdPlane(p);
d = min(d, sdSphere(p - vec3(0.0, 0.5, 0.0), 0.5));
d = min(d, sdRoundBox(p - vec3(-1.2, 0.3, 0.5), vec3(0.3), 0.05));
return d;
}
```
### Step 2: Classic Soft Shadow Function
**What**: March from a surface point toward the light source, progressively accumulating the minimum `k*h/t` ratio as the shadow factor.
**Why**: This is the foundational framework for all SDF soft shadows. At each step, `h/t` approximates the angular width of occlusion at that point; the minimum across the entire ray serves as the final penumbra estimate. The k value controls penumbra softness.
```glsl
// Classic SDF soft shadow
// ro: shadow ray origin (surface position)
// rd: light direction (normalized)
// mint: starting offset (to avoid self-shadowing)
// tmax: maximum march distance
float calcSoftShadow(vec3 ro, vec3 rd, float mint, float tmax) {
float res = 1.0;
float t = mint;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
float s = clamp(SHADOW_K * h / t, 0.0, 1.0);
res = min(res, s);
t += clamp(h, MIN_STEP, MAX_STEP); // Step size clamping
if (res < 0.004 || t > tmax) break; // Early exit
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res); // Smoothstep smoothing
}
```
### Step 3: Improved Soft Shadow (Geometric Triangulation)
**What**: Use SDF values from the current and previous steps to estimate a more accurate nearest point position via geometric triangulation, eliminating penumbra artifacts near sharp edges.
**Why**: The classic `h/t` formula assumes the nearest surface point is directly below the current sample position, but the actual nearest point may lie between two steps. Using the intersection relationship of SDF spheres from two adjacent steps provides a more accurate estimate of perpendicular distance `d` and corrected depth `t-y` along the ray.
```glsl
// Improved SDF soft shadow
float calcSoftShadowImproved(vec3 ro, vec3 rd, float mint, float tmax, float w) {
float res = 1.0;
float t = mint;
float ph = 1e10; // Previous step SDF value, initialized large so first step y≈0
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
// Geometric triangulation: estimate corrected nearest distance
float y = h * h / (2.0 * ph); // Step-back distance along ray
float d = sqrt(h * h - y * y); // True nearest distance perpendicular to ray
res = min(res, d / (w * max(0.0, t - y)));
ph = h; // Save current h for next step
t += h;
if (res < 0.0001 || t > tmax) break;
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res);
}
```
### Step 4: Negative Extension Version (Smoothest Penumbra)
**What**: Allow the shadow factor to drop into the negative range [-1, 0], then remap to [0, 1] with a custom quadratic smooth function, eliminating hard creases.
**Why**: The classic method produces a C0 continuous (non-smooth) crease at `clamp(0,1)`. By allowing `res` to enter the negative domain and remapping with the C1 continuous function `0.25*(1+res)²*(2-res)`, a completely smooth penumbra gradient is achieved.
```glsl
// Negative extension soft shadow
float calcSoftShadowSmooth(vec3 ro, vec3 rd, float mint, float tmax, float w) {
float res = 1.0;
float t = mint;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
res = min(res, h / (w * t));
t += clamp(h, MIN_STEP, MAX_STEP);
if (res < -1.0 || t > tmax) break; // Allow res to drop to -1
}
res = max(res, -1.0); // Clamp to [-1, 1]
return 0.25 * (1.0 + res) * (1.0 + res) * (2.0 - res); // Smooth remapping
}
```
### Step 5: Bounding Volume Optimization
**What**: Before starting the march, use simple geometric tests (plane clipping or AABB ray intersection) to narrow the shadow ray's effective range.
**Why**: If the shadow ray cannot possibly hit any object outside a bounded region (e.g., above the scene is empty), `tmax` can be shortened early or 1.0 returned immediately, saving many march iterations.
```glsl
// Method A: Plane clipping — clip ray to scene upper bound plane
float tp = (SCENE_Y_MAX - ro.y) / rd.y;
if (tp > 0.0) tmax = min(tmax, tp);
// Method B: AABB bounding box clipping
vec2 iBox(vec3 ro, vec3 rd, vec3 rad) {
vec3 m = 1.0 / rd;
vec3 n = m * ro;
vec3 k = abs(m) * rad;
vec3 t1 = -n - k;
vec3 t2 = -n + k;
float tN = max(max(t1.x, t1.y), t1.z);
float tF = min(min(t2.x, t2.y), t2.z);
if (tN > tF || tF < 0.0) return vec2(-1.0);
return vec2(tN, tF);
}
// Usage in shadow function
vec2 dis = iBox(ro, rd, BOUND_SIZE);
if (dis.y < 0.0) return 1.0; // Ray completely misses bounding box
tmin = max(tmin, dis.x);
tmax = min(tmax, dis.y);
```
### Step 6: Shadow Color Rendering (Color Bleeding)
**What**: Instead of using a uniform scalar shadow value, apply different shadow attenuation curves to the RGB channels.
**Why**: In the real world, penumbra regions exhibit a warm color shift due to subsurface scattering and atmospheric effects — red light penetrates the most while blue light is blocked first. By applying per-channel power operations on the shadow value, this physical phenomenon can be approximated at low cost.
```glsl
// Method A: Classic color shadow
// sha is a [0,1] shadow factor
vec3 shadowColor = vec3(sha, sha * sha * 0.5 + 0.5 * sha, sha * sha);
// R = sha (linear), G = softer quadratic blend, B = sha² (darkest)
// Method B: Per-channel power operation (Woods style)
vec3 shadowColor = pow(vec3(sha), vec3(1.0, 1.2, 1.5));
// R = sha^1.0, G = sha^1.2, B = sha^1.5 → penumbra region shifts warm
```
### Step 7: Integration into the Lighting Model
**What**: Multiply the shadow value into the diffuse and specular lighting contributions.
**Why**: Shadows are essentially an estimate of "light source visibility" and should act as a multiplicative factor on all lighting terms that depend on that light source. Shadows are typically only computed when N·L > 0 (surface faces the light) to avoid wasting GPU cycles on backlit faces.
```glsl
// Lighting integration
vec3 sunDir = normalize(vec3(-0.5, 0.4, -0.6));
vec3 hal = normalize(sunDir - rd);
// Diffuse × shadow
float dif = clamp(dot(nor, sunDir), 0.0, 1.0);
if (dif > 0.0001)
dif *= calcSoftShadow(pos + nor * 0.01, sunDir, 0.02, 8.0);
// Specular is also modulated by shadow
float spe = pow(clamp(dot(nor, hal), 0.0, 1.0), 16.0);
spe *= dif; // dif already includes shadow
// Final color compositing
vec3 col = vec3(0.0);
col += albedo * 2.0 * dif * vec3(1.0, 0.9, 0.8); // Sun diffuse
col += 5.0 * spe * vec3(1.0, 0.9, 0.8); // Sun specular
col += albedo * 0.5 * clamp(0.5 + 0.5 * nor.y, 0.0, 1.0)
* vec3(0.4, 0.6, 1.0); // Sky ambient (no shadow)
```
## Variant Details
### Variant 1: Analytical Sphere Shadow
**Difference from base version**: Does not use ray marching; instead performs an O(1) analytical closest-distance computation for spheres. Suitable for scenes containing only spheres or objects that can be approximated by spheres.
**Principle**: For a ray and a sphere, the closest distance from the ray to the sphere surface and the parameter `t` at that closest point along the ray can be computed analytically. These two values directly form the `d/t` ratio without iterative marching.
```glsl
// Sphere analytical soft shadow
vec2 sphDistances(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
float d = sqrt(max(0.0, sph.w * sph.w - h)) - sph.w;
return vec2(d, -b - sqrt(max(h, 0.0)));
}
float sphSoftShadow(vec3 ro, vec3 rd, vec4 sph, float k) {
vec2 r = sphDistances(ro, rd, sph);
if (r.y > 0.0)
return clamp(k * max(r.x, 0.0) / r.y, 0.0, 1.0);
return 1.0;
}
// Multi-sphere aggregation: res = min(res, sphSoftShadow(ro, rd, sphere[i], k))
```
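For readers who want to sanity-check the geometry off-GPU, here is a direct Python transcription of the two functions above (same `xyz = center, w = radius` convention; GLSL `clamp` replaced by `min`/`max`):

```python
import math

def sph_distances(ro, rd, sph):
    """Returns (d, t): closest distance from the ray line to the sphere
    surface, and the ray parameter of the closest approach / entry point."""
    cx, cy, cz, r = sph
    oc = (ro[0] - cx, ro[1] - cy, ro[2] - cz)
    b = sum(o * d for o, d in zip(oc, rd))
    c = sum(o * o for o in oc) - r * r
    h = b * b - c
    d = math.sqrt(max(0.0, r * r - h)) - r
    t = -b - math.sqrt(max(h, 0.0))
    return d, t

def sph_soft_shadow(ro, rd, sph, k):
    d, t = sph_distances(ro, rd, sph)
    if t > 0.0:
        return min(1.0, max(0.0, k * max(d, 0.0) / t))
    return 1.0
```

A ray grazing a unit sphere at distance 1 from its surface, closest at `t = 5`, yields the partial shadow `k * 1 / 5`; a ray through the center yields 0 (full shadow); a ray pointing away from the sphere yields 1 (unshadowed).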
### Variant 2: Terrain Heightfield Shadow
**Difference from base version**: `h` is not obtained from a generic SDF `map()`, but computed as `p.y - terrain(p.xz)`, the height difference between the ray and the terrain. Step size adapts to camera distance.
**Use cases**: Procedural terrain rendering (using FBM noise-generated height maps). Terrain SDF is difficult to define precisely, but height difference serves as an approximate distance estimate.
```glsl
float terrainShadow(vec3 ro, vec3 rd, float dis) {
float minStep = clamp(dis * 0.01, 0.5, 50.0); // Distance-adaptive minimum step
float res = 1.0;
float t = 0.01;
for (int i = 0; i < 80; i++) { // Terrain needs more iterations
vec3 p = ro + t * rd;
float h = p.y - terrainMap(p.xz); // Height difference replaces SDF
res = min(res, 16.0 * h / t); // k=16
t += max(minStep, h);
if (res < 0.001 || p.y > MAX_TERRAIN_HEIGHT) break;
}
return clamp(res, 0.0, 1.0);
}
```
### Variant 3: Per-Material Hard/Soft Blend
**Difference from base version**: Uses a global variable or extra parameter to control each object's shadow hardness, blending via `mix(1.0, k*h/t, hardness)`. When `hardness=0`, it produces hard shadows; when `hardness=1`, fully soft shadows.
**Use cases**: Characters need sharp hard shadows (to enhance silhouette), while environment objects use softer shadows.
```glsl
float hsha = 1.0; // Global variable, set per material in map()
float mapWithShadowHardness(vec3 p) {
float d = sdPlane(p);
hsha = 1.0; // Ground: fully soft shadow
float dChar = sdCharacter(p);
if (dChar < d) { d = dChar; hsha = 0.0; } // Character: hard shadow
return d;
}
// Inside shadow loop:
res = min(res, mix(1.0, SHADOW_K * h / t, hsha));
```
### Variant 4: Multi-Layer Shadow Composition
**Difference from base version**: Different types of occlusion sources are computed separately, then composed multiplicatively. Typical scenario: ground shadow × vegetation shadow × cloud shadow.
**Design rationale**: Different shadow sources have very different characteristics — terrain shadows need high-precision marching, vegetation shadows can use probability/density field approximation, cloud shadows are large-scale planar projections. Layered computation allows using the optimal algorithm for each type.
```glsl
// Layered computation
float sha_terrain = terrainShadow(pos, sunDir, 0.02);
float sha_trees = treesShadow(pos, sunDir);
float sha_clouds = cloudShadow(pos, sunDir); // Single planar projection + FBM sample
// Multiplicative composition
float sha = sha_terrain * sha_trees;
sha *= smoothstep(-0.3, -0.1, sha_clouds); // Cloud shadow softened with smoothstep
// Apply to lighting
dif *= sha;
```
### Variant 5: Volumetric Light / God Ray Reusing Shadow Function
**Difference from base version**: Marches uniformly along the view ray direction, calling the shadow function toward the light at each step, accumulating light energy. Essentially a secondary sampling of the shadow function to produce volumetric scattering effects.
**Principle**: Volumetric light effects come from the scattering of light by airborne particles. At each point along the view ray, if that point is illuminated by the sun (high shadow value), it contributes some scattered light to the final color. Summing the lighting contributions from all sample points along the view ray produces the volumetric light effect.
```glsl
// Volumetric light (God Rays)
float godRays(vec3 ro, vec3 rd, float tmax, vec3 sunDir, vec2 fragCoord) {
float v = 0.0;
float dt = 0.15; // View ray step size
float t = dt * fract(texelFetch(iChannel0, ivec2(fragCoord) & 255, 0).x); // Per-pixel noise jitter (fragCoord passed in; iChannel0 = noise texture) to hide banding
for (int i = 0; i < 32; i++) { // Number of samples
if (t > tmax) break;
vec3 p = ro + rd * t;
float sha = calcSoftShadow(p, sunDir, 0.02, 8.0); // Reuse shadow function
v += sha * exp(-0.2 * t); // Exponential distance falloff
t += dt;
}
v /= 32.0;
return v * v; // Square to enhance contrast
}
// Usage: col += godRayIntensity * godRays(...) * vec3(1.0, 0.75, 0.4);
```
## Performance Optimization Details
### Bottleneck Analysis
The main cost of SDF soft shadows is the **shadow ray marching per pixel**, which involves multiple `map()` calls. For complex scenes, a single `map()` call may contain dozens of SDF combination operations.
### Optimization Techniques
#### 1. Bounding Volume Culling (Most Significant)
- Plane clipping: `tmax = min(tmax, (yMax - ro.y) / rd.y)` restricts the ray within the scene height range
- AABB clipping: Use `iBox()` to restrict `tmin`/`tmax` within the bounding box; return 1.0 immediately when the ray completely misses
- Can reduce 30-70% of wasted iterations
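The AABB case can be sketched as the standard slab test (a Python CPU reference, assuming non-zero ray direction components; a GLSL `iBox()` computes the same clipped interval):

```python
def ray_box(ro, rd, bmin, bmax):
    """Slab test: returns the (tmin, tmax) interval of the ray/box overlap,
    or None on a miss (including boxes entirely behind the ray origin).
    Assumes all rd components are non-zero."""
    tmin, tmax = -float("inf"), float("inf")
    for o, d, lo, hi in zip(ro, rd, bmin, bmax):
        inv = 1.0 / d
        t0, t1 = (lo - o) * inv, (hi - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        tmin, tmax = max(tmin, t0), min(tmax, t1)
    return (tmin, tmax) if tmax >= max(tmin, 0.0) else None
```

In the shadow loop, a hit tightens `tmin`/`tmax` before marching, and a `None` result means the whole march can be skipped (return 1.0 immediately).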
#### 2. Step Size Clamping
- `t += clamp(h, minStep, maxStep)` prevents extremely small steps (getting stuck near surface) and extremely large steps (skipping thin objects)
- Typical `minStep` values: 0.005~0.05, `maxStep`: 0.2~0.5
- Distance-adaptive: `minStep = clamp(dis * 0.01, 0.5, 50.0)` uses larger steps for distant shadows
#### 3. Early Exit
- Classic version: `res < 0.004` is already dark enough, no need to continue
- Negative extension: `res < -1.0` is saturated
- Height upper bound: `pos.y > yMax` means the ray has left the scene
#### 4. Reduced Shadow SDF Precision
- Use a simplified `map2()` that omits material computation and only returns distance
- For terrain scenes, use a low-resolution `terrainM()` (fewer FBM octaves) instead of full-precision `terrainH()`
#### 5. Conditional Computation
- `if (dif > 0.0001) dif *= shadow(...)` only computes shadow when facing the light
- Backlit faces are directly 0, no shadow needed
#### 6. Iteration Count Adjustment
- Simple scenes (a few primitives): 16~32 iterations suffice
- Complex FBM surfaces: Need 64~128 iterations
- Terrain scenes: With distance-adaptive step sizes, around 80 iterations
#### 7. Loop Unrolling Control
- `#define ZERO (min(iFrame,0))` prevents the compiler from unrolling loops at compile time, reducing instruction cache pressure
## Combination Suggestions with Full Code
### With Ambient Occlusion (AO)
Shadows handle direct light occlusion; AO handles indirect light occlusion. They complement each other:
```glsl
float sha = calcSoftShadow(pos, sunDir, 0.02, 8.0);
float occ = calcAO(pos, nor);
col += albedo * dif * sha * sunColor; // Direct light × shadow
col += albedo * sky * occ * skyColor; // Ambient light × AO
```
### With Subsurface Scattering (SSS)
Shadow values can modulate SSS intensity, simulating the translucent light-through effect at shadow edges:
```glsl
float sss = pow(clamp(dot(rd, sunDir), 0.0, 1.0), 4.0);
sss *= 0.25 + 0.75 * sha; // SSS reduced but not eliminated in shadow
col += albedo * sss * vec3(1.0, 0.4, 0.2);
```
### With Fog / Atmospheric Scattering
Shadows should be "washed out" by fog at distance. The common approach is to complete shadow lighting before applying fog, which naturally blends:
```glsl
// First complete lighting with shadows
vec3 col = albedo * lighting_with_shadow;
// Then apply fog (distance fog naturally weakens shadow contrast)
col = mix(col, fogColor, 1.0 - exp(-0.001 * t * t));
```
### With Normal Maps / Bump Mapping
Shadows use the geometric normal (not the perturbed normal) to compute N·L for determining light-facing, but shadow rays are still cast from the actual surface point. Normal maps only affect lighting calculations, not shadows:
```glsl
vec3 geoNor = calcNormal(pos); // Geometric normal
vec3 nor = perturbNormal(geoNor, ...); // Perturbed normal
float dif = clamp(dot(nor, sunDir), 0.0, 1.0); // Use perturbed normal for diffuse
if (dot(geoNor, sunDir) > 0.0) // Use geometric normal to decide shadow
dif *= calcSoftShadow(pos + geoNor * 0.01, sunDir, 0.02, 8.0);
```
### With Reflections
The shadow function can be reused for the reflection direction, occluding specular highlights that should not be visible:
```glsl
vec3 ref = reflect(rd, nor);
float refSha = calcSoftShadow(pos + nor * 0.01, ref, 0.02, 8.0);
col += specular * envColor * refSha * occ;
```
# GPU Physics Simulation — Detailed Reference
This document is the complete reference material for [SKILL.md](SKILL.md), containing step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL Basics**: uniforms, texture sampling (`texture`/`texelFetch`), `fragCoord`/`iResolution` coordinate system
- **ShaderToy Multi-Pass Mechanism**: Buffer A/B/C/D read/write between each other, `iChannel0~3` binding, Common pass for shared code
- **Vector Calculus Basics**: gradient, divergence, curl, Laplacian
- **Numerical Integration**: Forward Euler, semi-implicit methods (Semi-implicit / Verlet)
- **Textures as Data Storage**: Encoding physical quantities such as position/velocity/density into RGBA channels of texture pixels
## Core Principles in Detail
The core paradigm of GPU physics simulation is **Buffer Feedback**: leveraging ShaderToy's multi-pass architecture to store physical state (position, velocity, density, pressure, etc.) in texture buffers. Each frame reads the previous frame's state, computes new state, and writes it back. Each pixel computes independently in parallel, achieving GPU-level massively parallel physics solving.
### Key Mathematical Tools in Detail
**1. Discrete Laplacian Operator** (used for wave equation, viscous force, diffusion):
```
∇²f ≈ f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4·f(x,y)
```
The Laplacian measures the difference between a point's value and the average of its neighbors. In the wave equation, it drives wave propagation; in fluid simulation, it provides viscous force (velocity diffusion); in the heat equation, it drives temperature equalization.
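As a sanity check, the 5-point stencil can be evaluated on the CPU; for a quadratic field the stencil is exact, since each axis contributes `(x+1)² + (x-1)² - 2x² = 2` (minimal Python sketch):

```python
def laplacian(f, x, y):
    """5-point discrete Laplacian on a unit-spacing grid."""
    return f(x + 1, y) + f(x - 1, y) + f(x, y + 1) + f(x, y - 1) - 4 * f(x, y)

quad = lambda x, y: x * x + y * y   # continuous Laplacian = 4 everywhere
```

A linear field has zero Laplacian, which is why the stencil leaves smooth gradients untouched and only reacts to curvature.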
**2. Semi-Lagrangian Advection** (used for fluid solving):
```
f_new(x) = f_old(x - v·dt) // backward tracing along the velocity field
```
Advection is the most critical step in fluid simulation. The semi-Lagrangian method achieves unconditionally stable advection through "backward tracing" — starting from the target position, tracing backward along the velocity field to find the source position, then sampling the value at the source. This avoids the CFL condition limitation of forward Euler advection.
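A minimal 1-D Python sketch of backward tracing with linear interpolation (clamped boundary sampling standing in for the texture's clamp-to-edge behavior) shows the idea:

```python
def advect(field, vel, dt):
    """1-D semi-Lagrangian advection: f_new[i] = f_old(i - vel[i]*dt),
    sampled with linear interpolation and clamped indices."""
    n = len(field)
    out = []
    for i in range(n):
        src = i - vel[i] * dt          # backward trace to the source position
        j = int(src // 1.0)            # floor
        frac = src - j
        a = field[min(max(j, 0), n - 1)]
        b = field[min(max(j + 1, 0), n - 1)]
        out.append(a * (1.0 - frac) + b * frac)
    return out
```

Because the step only ever samples existing values (never extrapolates), the result is bounded by the old field regardless of `dt`, which is the source of the unconditional stability (at the cost of interpolation smearing).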
**3. Spring-Damper Force** (used for cloth, soft bodies):
```
F_spring = k · (|Δx| - L₀) · normalize(Δx)
F_damper = c · dot(normalize(Δx), Δv) · normalize(Δx)
```
Spring force pulls two mass points back to the rest length L₀; stiffness k determines the restoring force strength. Damper force attenuates relative velocity along the connection direction; coefficient c determines the energy dissipation rate. Combined, they produce stable elastic motion.
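The two forces can be checked with a direct Python transcription (the zero-length guard mirrors the `normalize(vec3(0))` NaN pitfall that also applies in GLSL):

```python
def spring_damper_force(xa, xb, va, vb, rest_len, k, c):
    """Force on particle a from a spring-damper connecting it to particle b.
    Positions/velocities are 3-component tuples; returns the zero vector
    for near-coincident points to avoid a division by zero."""
    dx = tuple(b - a for a, b in zip(xa, xb))
    dv = tuple(b - a for a, b in zip(va, vb))
    length = sum(d * d for d in dx) ** 0.5
    if length < 1e-6:
        return (0.0, 0.0, 0.0)
    n = tuple(d / length for d in dx)                       # unit direction a -> b
    f_spring = k * (length - rest_len)                      # Hooke restoring force
    f_damp = c * sum(ni * dvi for ni, dvi in zip(n, dv))    # relative velocity along axis
    return tuple((f_spring + f_damp) * ni for ni in n)
```

At rest length with no relative motion the force vanishes; a stretched spring pulls a toward b; and when a moves toward b, the damping term opposes that motion, dissipating energy.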
**4. Vorticity Confinement** (used for preserving fluid detail):
```
curl = ∂v_y/∂x - ∂v_x/∂y
vorticity_force = ε · (∇|curl| × curl) / |∇|curl||
```
Numerical viscosity over-smooths small-scale vortices. Vorticity confinement compensates for this artificial dissipation by applying an additional force in high-vorticity regions, pushing small vortices into more concentrated rotational structures and preserving the visual richness of the fluid.
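The discrete curl can be validated on a rigid-rotation field v = (-y, x), whose analytic curl ∂v_y/∂x - ∂v_x/∂y is 2 everywhere (Python sketch, central differences on a unit grid):

```python
def curl(vx, vy, x, y):
    """2-D discrete curl (z-component) via central differences, unit spacing."""
    dvy_dx = (vy(x + 1, y) - vy(x - 1, y)) * 0.5
    dvx_dy = (vx(x, y + 1) - vx(x, y - 1)) * 0.5
    return dvy_dx - dvx_dy

rot_vx = lambda x, y: -y   # rigid rotation about the origin
rot_vy = lambda x, y: x
```

A uniform translation field has zero curl, so vorticity confinement leaves bulk flow alone and only amplifies rotational structure.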
## Implementation Steps in Detail
### Step 1: Ping-Pong Double Buffer Structure
**What**: Create two Buffers (A and B) that alternate read/write to achieve state persistence.
**Why**: GPU shaders cannot simultaneously read and write the same buffer. The ping-pong strategy reads from one buffer (previous frame's data) and writes to the other each frame, then swaps on the next frame.
**IMPORTANT: Key Difference Between ShaderToy and WebGL2**: In ShaderToy, Buffer A/B are two independent passes with separate write targets, so `iChannel0=self, iChannel1=other` doesn't conflict. However, in WebGL2 there's only one shader program doing ping-pong, and the write target texture cannot be simultaneously read. The solution is **dual-channel encoding** (R=current height, G=previous frame height).
**Code** (WebGL2-safe version, reads only from iChannel0, with RGBA8-compatible encoding):
```glsl
// IMPORTANT: Only use iChannel0 (read currentBuf), write to nextBuf (must be different!)
// IMPORTANT: encode/decode ensure signed values aren't clipped on RGBA8 (no float textures/SwiftShader)
uniform int useFloatTex;
float decode(float v) { return useFloatTex == 1 ? v : v * 2.0 - 1.0; }
float encode(float v) { return useFloatTex == 1 ? v : v * 0.5 + 0.5; }
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
vec2 uv = fragCoord / iResolution.xy;
vec2 texel = 1.0 / iResolution.xy;
float current = decode(texture(iChannel0, uv).x);
float previous = decode(texture(iChannel0, uv).y);
float left = decode(texture(iChannel0, uv - vec2(texel.x, 0.0)).x);
float right = decode(texture(iChannel0, uv + vec2(texel.x, 0.0)).x);
float down = decode(texture(iChannel0, uv - vec2(0.0, texel.y)).x);
float up = decode(texture(iChannel0, uv + vec2(0.0, texel.y)).x);
float laplacian = left + right + down + up - 4.0 * current;
float next = 2.0 * current - previous + 0.25 * laplacian;
next *= 0.995; // damping decay
next *= min(1.0, float(iFrame)); // zero on frame 0
fragColor = vec4(encode(next), encode(current), 0.0, 0.0);
}
```
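A CPU transcription of this update (Python, with clamped-edge neighbor sampling standing in for the texture's clamp-to-edge behavior) makes the step easy to probe for symmetry and stability:

```python
def wave_step(curr, prev, damping=0.995):
    """One Verlet step of the 2-D wave equation on a grid:
    next = 2*curr - prev + 0.25*laplacian, then damped."""
    h, w = len(curr), len(curr[0])
    nxt = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = curr[y][x]
            left = curr[y][max(x - 1, 0)]
            right = curr[y][min(x + 1, w - 1)]
            down = curr[max(y - 1, 0)][x]
            up = curr[min(y + 1, h - 1)][x]
            lap = left + right + down + up - 4.0 * c
            nxt[y][x] = (2.0 * c - prev[y][x] + 0.25 * lap) * damping
    return nxt
```

A centered spike spreads symmetrically into its four neighbors on the first step, and a flat field remains flat, which is the behavior the shader version must reproduce.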
### Step 2: Interaction-Driven (External Force Injection)
**What**: Inject energy into the simulation through mouse clicks or programmatic generation.
**Why**: Physics simulations need external excitation to start and sustain. Mouse interaction is the most intuitive driving method; programmatic methods can simulate raindrops, explosions, etc.
**Code** (insert before wave equation computation):
```glsl
float d = 0.0;
if (iMouse.z > 0.0)
{
// Mouse click: create ripple at mouse position
d = smoothstep(4.5, 0.5, length(iMouse.xy - fragCoord));
}
else
{
// Programmatic raindrop: pseudo-random position + impulse
float t = iTime * 2.0;
vec2 pos = fract(floor(t) * vec2(0.456665, 0.708618)) * iResolution.xy;
float amp = 1.0 - step(0.05, fract(t));
d = -amp * smoothstep(2.5, 0.5, length(pos - fragCoord));
}
```
### Step 3: Rendering Layer (Height Field Visualization)
**What**: Read simulation results in the Image Pass, compute normals via gradient calculation, and render lighting effects.
**Why**: The simulation result is a height field texture that needs to be transformed into a visible surface effect. Computing gradients via finite differences as normals enables refraction, diffuse reflection, specular highlights, and other water surface effects.
**Code** (Image Pass):
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
vec2 uv = fragCoord / iResolution.xy;
vec3 e = vec3(vec2(1.0) / iResolution.xy, 0.0);
// Read four-neighbor height values from Buffer A
float left = texture(iChannel0, uv - e.xz).x;
float right = texture(iChannel0, uv + e.xz).x;
float down = texture(iChannel0, uv - e.zy).x;
float up = texture(iChannel0, uv + e.zy).x;
// Construct normal from gradient
vec3 normal = normalize(vec3(right - left, up - down, 1.0));
// Lighting computation
vec3 light = normalize(vec3(0.2, -0.5, 0.7));
float diffuse = max(dot(normal, light), 0.0);
float spec = pow(max(-reflect(light, normal).z, 0.0), 32.0);
// Refraction-offset background texture sampling
vec4 bg = texture(iChannel1, uv + normal.xy * 0.35);
vec3 waterTint = vec3(0.7, 0.8, 1.0);
fragColor = mix(bg, vec4(waterTint, 1.0), 0.25) * diffuse + spec;
}
```
### Step 4: Chained Multi-Buffer Iteration (Improving Accuracy)
**What**: Chain multiple Buffers together to execute the same solver multiple times per frame.
**Why**: Many physics solvers (fluid pressure projection, constraint solving) require multiple iterations to converge. In ShaderToy, you can chain Buffer A → B → C to execute the same code, equivalent to 3 iterations per frame. This is critical for Eulerian fluid (pressure-divergence elimination) and rigid bodies (impulse constraint solving).
**Full Euler fluid solver code** (Buffer A/B/C share Common pass):
```glsl
// === Common Pass ===
#define dt 0.15 // adjustable: time step
#define viscosityThreshold 0.64 // adjustable: viscosity coefficient (larger = thinner)
#define vorticityThreshold 0.25 // adjustable: vorticity confinement strength
vec4 fluidSolver(sampler2D field, vec2 uv, vec2 step,
vec4 mouse, vec4 prevMouse)
{
float k = 0.2, s = k / dt;
// Sample center and four neighbors
vec4 c = textureLod(field, uv, 0.0);
vec4 fr = textureLod(field, uv + vec2(step.x, 0.0), 0.0);
vec4 fl = textureLod(field, uv - vec2(step.x, 0.0), 0.0);
vec4 ft = textureLod(field, uv + vec2(0.0, step.y), 0.0);
vec4 fd = textureLod(field, uv - vec2(0.0, step.y), 0.0);
// Divergence and density gradient
vec3 ddx = (fr - fl).xyz * 0.5;
vec3 ddy = (ft - fd).xyz * 0.5;
float divergence = ddx.x + ddy.y;
vec2 densityDiff = vec2(ddx.z, ddy.z);
// Density solve
c.z -= dt * dot(vec3(densityDiff, divergence), c.xyz);
// Viscous force (Laplacian)
vec2 laplacian = fr.xy + fl.xy + ft.xy + fd.xy - 4.0 * c.xy;
vec2 viscosity = viscosityThreshold * laplacian;
// Semi-Lagrangian advection
vec2 densityInv = s * densityDiff;
vec2 uvHistory = uv - dt * c.xy * step;
c.xyw = textureLod(field, uvHistory, 0.0).xyw;
// Mouse external force
vec2 extForce = vec2(0.0);
if (mouse.z > 1.0 && prevMouse.z > 1.0)
{
vec2 drag = clamp((mouse.xy - prevMouse.xy) * step * 600.0,
-10.0, 10.0);
vec2 p = uv - mouse.xy * step;
extForce += 0.001 / dot(p, p) * drag;
}
c.xy += dt * (viscosity - densityInv + extForce);
// Velocity decay
c.xy = max(vec2(0.0), abs(c.xy) - 5e-6) * sign(c.xy);
// Vorticity confinement
c.w = (fd.x - ft.x + fr.y - fl.y); // curl
vec2 vorticity = vec2(abs(ft.w) - abs(fd.w),
abs(fl.w) - abs(fr.w));
vorticity *= vorticityThreshold / (length(vorticity) + 1e-5) * c.w;
c.xy += vorticity;
// Boundary conditions
c.y *= smoothstep(0.5, 0.48, abs(uv.y - 0.5));
c.x *= smoothstep(0.5, 0.49, abs(uv.x - 0.5));
// Stability clamping
c = clamp(c, vec4(-24.0, -24.0, 0.5, -0.25),
vec4( 24.0, 24.0, 3.0, 0.25));
return c;
}
// === Buffer A / B / C (identical code) ===
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
vec2 uv = fragCoord / iResolution.xy;
vec2 stepSize = 1.0 / iResolution.xy;
vec4 prevMouse = textureLod(iChannel0, vec2(0.0), 0.0);
fragColor = fluidSolver(iChannel0, uv, stepSize, iMouse, prevMouse);
// Bottom row stores mouse state
if (fragCoord.y < 1.0) fragColor = iMouse;
}
```
### Step 5: Texture Data Layout for Particle/Mass-Point Systems
**What**: Encode particle positions, velocities, and other attributes at specific pixel locations in a texture.
**Why**: In GPU physics simulation, each particle/mass point needs to store multiple attributes (position, velocity, force, etc.). By partitioning the texture into regions (e.g., left half for positions, right half for velocities), or encoding different attributes into different RGBA channels, a compact data layout is achieved.
**Code** (cloth simulation data layout example):
```glsl
#define SIZX 128.0 // adjustable: cloth width (particle count)
#define SIZY 64.0 // adjustable: cloth height (particle count)
// Left half [0, SIZX) stores positions, right half [SIZX, 2*SIZX) stores velocities
// IMPORTANT: In WebGL2, getpos/getvel both read from iChannel0 (currentBuf, read-only),
// write target is nextBuf (separate buffer), avoiding read-write conflict
vec3 getpos(vec2 id)
{
return texture(iChannel0, (id + 0.5) / iResolution.xy).xyz;
}
vec3 getvel(vec2 id)
{
return texture(iChannel0, (id + 0.5 + vec2(SIZX, 0.0)) / iResolution.xy).xyz;
}
// In mainImage, decide whether to output position or velocity based on fragCoord
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
vec2 fc = floor(fragCoord);
vec2 c = fc;
c.x = fract(c.x / SIZX) * SIZX; // mass point ID
vec3 pos = getpos(c);
vec3 vel = getvel(c);
// ... physics computation ...
// Output: left half stores position, right half stores velocity
fragColor = vec4(fc.x >= SIZX ? vel : pos, 0.0);
}
```
### Step 6: Spring-Damper Constraint System
**What**: Implement spring forces and damping forces between mass points.
**Why**: Spring-dampers are the core of cloth and soft body simulation. Each mass point is connected to neighbors via springs — spring force maintains structural shape, damping force dissipates oscillation energy. Using near-neighbors (structural springs) + diagonals (shear springs) + skip-connections (bending springs) provides complete constraints.
**Full code**:
```glsl
const float SPRING_K = 0.15; // adjustable: spring stiffness
const float DAMPER_C = 0.10; // adjustable: damping coefficient
const float GRAVITY = 0.0022; // adjustable: gravitational acceleration
vec3 pos, vel, ovel;
vec2 c; // current mass point ID
void edge(vec2 dif)
{
// Boundary check
if ((dif + c).x < 0.0 || (dif + c).x >= SIZX ||
(dif + c).y < 0.0 || (dif + c).y >= SIZY) return;
float restLen = length(dif); // rest length = initial distance
vec3 posdif = getpos(dif + c) - pos;
vec3 veldif = getvel(dif + c) - ovel;
// IMPORTANT: Must check for zero length, otherwise normalize(vec3(0)) produces NaN
float plen = length(posdif);
if (plen < 0.0001) return;
vec3 dir = posdif / plen;
// Spring force: restore to rest length
vel += dir
* clamp(plen - restLen, -1.0, 1.0)
* SPRING_K;
// Damping force: attenuate relative velocity along connection direction
vel += dir
* dot(dir, veldif)
* DAMPER_C;
}
// In mainImage, call 12 edges (near-neighbors + diagonals + skip-connections)
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
// ... initialize fc = floor(fragCoord), mass point ID c, and pos/vel ...
ovel = vel;
// Structural springs (4 near-neighbors)
edge(vec2( 0.0, 1.0));
edge(vec2( 0.0,-1.0));
edge(vec2( 1.0, 0.0));
edge(vec2(-1.0, 0.0));
// Shear/bending springs (diagonals + skip-connections)
edge(vec2( 1.0, 1.0));
edge(vec2(-1.0,-1.0));
edge(vec2( 0.0, 2.0));
edge(vec2( 0.0,-2.0));
edge(vec2( 2.0, 0.0));
edge(vec2(-2.0, 0.0));
edge(vec2( 2.0,-2.0));
edge(vec2(-2.0, 2.0));
// Collision detection (sphere)
// ... ballcollis() ...
// Integration
pos += vel;
vel.y += GRAVITY;
// Air resistance (normal wind force)
vec3 norm = findnormal(c);
vec3 windvel = vec3(0.01, 0.0, -0.005); // adjustable: wind direction and speed
vel -= norm * (dot(norm, vel - windvel) * 0.05);
// Fixed boundary (top row pinned as curtain rod)
if (c.y == 0.0)
{
pos = vec3(fc.x * 0.85, fc.y, fc.y * 0.01);
vel = vec3(0.0);
}
fragColor = vec4(fc.x >= SIZX ? vel : pos, 0.0);
}
```
### Step 7: N-Body Particle Interaction (Biot-Savart Vortex Method)
**What**: Implement all-pairs interaction forces between all particles.
**Why**: Certain physical systems (such as vortex dynamics, gravitational N-body problems) require each particle to interact with all other particles. The Biot-Savart law gives the velocity field generated by vorticity, which is the core of 2D vortex simulation. Uses semi-Newton (Verlet-type) two-step integration for improved accuracy.
**Full code**:
```glsl
#define N 20 // adjustable: N×N total particles
#define Nf float(N)
#define MARKERS 0.90 // adjustable: passive marker particle ratio
// STRENGTH automatically scales with particle count and marker ratio
float STRENGTH = 1e3 * 0.25 / (1.0 - MARKERS) * sqrt(30.0 / Nf);
#define tex(i,j) texture(iChannel1, (vec2(i,j) + 0.5) / iResolution.xy)
#define W(i,j) tex(i, j + N).z // vorticity stored in tile(0,1) z channel
void mainImage(out vec4 O, vec2 U)
{
vec2 T = floor(U / Nf); // tile index
U = mod(U, Nf); // particle ID
// Pass 1 (Buffer A): half-step integration dt*0.5
// Pass 2 (Buffer B): full-step integration using Pass 1 velocity
vec2 F = vec2(0.0);
// N×N all-pairs Biot-Savart summation
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
{
float w = W(i, j);
vec2 d = tex(i, j).xy - O.xy;
// Periodic boundary: take nearest image
d = (fract(0.5 + d / iResolution.xy) - 0.5) * iResolution.xy;
float l = dot(d, d);
if (l > 1e-5)
F += vec2(-d.y, d.x) * w / l; // Biot-Savart kernel
}
O.zw = STRENGTH * F; // velocity
O.xy += O.zw * dt; // integrate position
O.xy = mod(O.xy, iResolution.xy); // periodic boundary
}
```
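The Biot-Savart kernel itself is small enough to verify on the CPU (Python sketch, non-periodic for clarity; the `1e-5` cutoff skips a particle's self-interaction exactly as in the loop above):

```python
def biot_savart_velocity(pos, vortices):
    """Velocity induced at `pos` by 2-D point vortices [(x, y, w), ...]:
    v = sum_i w_i * (-dy, dx) / |d|^2, with d = vortex_i - pos.
    Near-coincident points (self-interaction) are skipped."""
    vx = vy = 0.0
    for x, y, w in vortices:
        dx, dy = x - pos[0], y - pos[1]
        l = dx * dx + dy * dy
        if l > 1e-5:
            vx += -dy * w / l
            vy += dx * w / l
    return vx, vy
```

A counter-rotating vortex pair induces a net translation on the midpoint between them, which is why such pairs propel themselves through the fluid.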
### Step 8: State Storage in Specific Pixels (Global Variable Trick)
**What**: Store global state (current position, time, mouse history) at fixed pixel locations in the texture.
**Why**: GPU shaders have no global variables. By storing state at agreed-upon pixel coordinates (usually `(0,0)` or the bottom row), the next frame can read these "global variables". This is indispensable for ODE integration (e.g., Lorenz attractor) and interactions that need to track mouse history.
**Full code**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
// Pixel (0,0) stores global state (e.g., Lorenz attractor's current 3D position)
if (floor(fragCoord) == vec2(0, 0))
{
if (iFrame == 0)
{
fragColor = vec4(0.1, 0.001, 0.0, 0.0); // initial conditions
}
else
{
vec3 state = texture(iChannel0, vec2(0.0)).xyz;
// Execute multi-step ODE integration
for (float i = 0.0; i < 96.0; i++)
{
// Lorenz system: dx/dt = σ(y-x), dy/dt = x(ρ-z)-y, dz/dt = xy-βz
vec3 deriv;
deriv.x = 10.0 * (state.y - state.x); // σ = 10
deriv.y = state.x * (28.0 - state.z) - state.y; // ρ = 28
deriv.z = state.x * state.y - 8.0/3.0 * state.z; // β = 8/3
state += deriv * 0.016 * 0.2;
}
fragColor = vec4(state, 0.0);
}
return;
}
// Other pixels: accumulate trajectory distance field
// (Integrate() applies one Euler step of the Lorenz ODE; dfLine() returns point-to-segment distance; uv is this pixel's plot coordinate)
vec3 last = texture(iChannel0, vec2(0.0)).xyz;
float d = 1e6;
for (float i = 0.0; i < 96.0; i++)
{
vec3 next = Integrate(last, 0.016 * 0.2);
d = min(d, dfLine(last.xz * 0.015, next.xz * 0.015, uv));
last = next;
}
float c = 0.5 * smoothstep(1.0 / iResolution.y, 0.0, d);
vec3 prev = texture(iChannel0, fragCoord / iResolution.xy).rgb;
fragColor = vec4(vec3(c) + prev * 0.99, 0.0); // decaying accumulation
}
```
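The Lorenz derivative and the forward-Euler inner loop can be transcribed to Python for a quick correctness check (the origin is a fixed point of the system, so it must remain unchanged under integration):

```python
def lorenz_deriv(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """dx/dt = sigma*(y-x), dy/dt = x*(rho-z)-y, dz/dt = x*y-beta*z."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def integrate(s, dt, steps):
    """Forward-Euler integration, mirroring the shader's inner loop."""
    for _ in range(steps):
        d = lorenz_deriv(s)
        s = tuple(si + di * dt for si, di in zip(s, d))
    return s
```

With the shader's step size (0.016 * 0.2) forward Euler stays stable on the attractor; much larger steps would need a higher-order integrator.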
## Common Variant Details
### Variant 1: Eulerian Fluid Simulation (Smoke / Ink)
**Difference from base version**: Extends from scalar wave equation to full 2D velocity field solving — including advection, viscosity, vorticity confinement, and density tracking. Requires 3+ chained buffer iterations for enhanced convergence.
**Key code**:
```glsl
// Buffer storage: xy = velocity, z = density, w = curl
// Key difference: semi-Lagrangian advection replaces simple neighborhood update
vec2 uvHistory = uv - dt * velocity.xy * stepSize;
vec4 advected = textureLod(field, uvHistory, 0.0);
// Vorticity confinement (preserve fluid detail)
float curl = (fd.x - ft.x + fr.y - fl.y);
vec2 vortGrad = vec2(abs(ft.w) - abs(fd.w), abs(fl.w) - abs(fr.w));
vec2 vortForce = vorticityThreshold / (length(vortGrad) + 1e-5) * curl * vortGrad;
velocity.xy += vortForce;
```
### Variant 2: Cloth Simulation (Mass-Spring-Damper)
**Difference from base version**: Changes from grid-based field equations to a discrete particle system. Each pixel represents a mass point storing 3D position and velocity. Connected to neighbors via spring-dampers, plus gravity, wind force, and collision. Multi-buffer chained iteration (4 passes) implements multiple sub-steps.
**Key code**:
```glsl
// Data layout: left half of texture = position, right half = velocity
// Spring force core
vec3 posdif = getpos(neighbor) - pos;
vec3 veldif = getvel(neighbor) - vel;
float restLen = length(neighborOffset);
force += normalize(posdif) * clamp(length(posdif) - restLen, -1.0, 1.0) * 0.15;
force += normalize(posdif) * dot(normalize(posdif), veldif) * 0.10;
// Sphere collision response
if (length(pos - ballPos) < ballRadius) {
vel -= normalize(pos - ballPos) * dot(normalize(pos - ballPos), vel);
pos = ballPos + normalize(pos - ballPos) * ballRadius;
}
```
> **IMPORTANT: Common Pitfalls**:
> - **Cloth Image Pass must project world coordinates to screen**: You cannot use `uv * vec2(SIZX, SIZY)` to map screen UV to grid ID, because particles have moved from their initial positions, producing scattered fragments. You must iterate over mesh faces, projecting vertex world coordinates to screen space for triangle rasterization
> - GLSL is strictly typed about assignment: `length(dif) / vec2(SIZX, SIZY)` evaluates to a `vec2`, so assigning the result to a `float` is a compile error; write `length(dif) / SIZX` when a scalar is intended
> - `normalize(vec3(0))` produces NaN; all `normalize()` calls must include a length check beforehand
> - In the Image Pass, `getpos`/`getvel` must use the simulation resolution (`iSimResolution`) for UV calculation, not the screen resolution `iResolution`
> - Texel center sampling should use `+0.5` offset (not `+0.01`)
### Variant 3: Rigid Body Physics Engine (Box2D-lite on GPU)
**Difference from base version**: The most complex variant. Uses structured pixel addressing (ECS data layout) to serialize rigid body attributes, joints, contact points, etc., into textures. Buffer A handles integration + collision detection, Buffer B/C/D handle impulse constraint iteration. Requires Common pass to encapsulate a complete physics library.
**Key code**:
```glsl
// Structured memory addressing: map structs to consecutive pixels
int bodyAddress(int b_id) {
return pixel_count_of_Globals + pixel_count_of_Body * b_id;
}
Body loadBody(sampler2D buff, int b_id) {
int addr = bodyAddress(b_id);
vec4 d0 = texelFetch(buff, address2D(res, addr), 0);
vec4 d1 = texelFetch(buff, address2D(res, addr+1), 0);
b.pos = d0.xy; b.vel = d0.zw;
b.ang = d1.x; b.ang_vel = d1.y; // ...
}
// Contact impulse solving
float v_n = dot(dv, contact.normal);
float dp_n = contact.mass_n * (-v_n + contact.bias);
dp_n = max(0.0, dp_n);
body.vel += body.inv_mass * dp_n * contact.normal;
```
### Variant 4: N-Body Vortex Particle Simulation
**Difference from base version**: Changes from field (Eulerian) method to particle (Lagrangian) method. Each particle carries vorticity, and the Biot-Savart law computes the full-field velocity. Uses semi-Newton two-step integration (Buffer A half-step → Buffer B full-step). O(N²) all-pairs interaction.
**Key code**:
```glsl
// Biot-Savart kernel: velocity induced by vorticity w at distance d
// v = w * (-dy, dx) / |d|²
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++) {
float w = W(i, j);
vec2 d = tex(i, j).xy - pos;
d = (fract(0.5 + d / res) - 0.5) * res; // periodic boundary
float l = dot(d, d);
if (l > 1e-5) F += vec2(-d.y, d.x) * w / l;
}
```
### Variant 5: 3D SPH Particle Fluid
**Difference from base version**: Extends to 3D. Uses Particle Cluster Grid (PCG) for spatial neighborhood management, custom bit packing (5-bit exponent + 9-bit component) to compress particle data into 4 floats. Buffer A handles advection + clustering, Buffer B computes density, Buffer C computes forces + integration, Buffer D computes shadows.
**Key code**:
```glsl
// Map 3D grid to 2D texture
vec2 dim2from3(vec3 p3d) {
float ny = floor(p3d.z / SCALE.x);
float nx = floor(p3d.z) - ny * SCALE.x;
return vec2(nx, ny) * size3d.xy + p3d.xy;
}
// SPH pressure force
float pressure = max(rho / rest_density - 1.0, 0.0);
float SPH_F = force_coef_a * GD(d, 1.5) * pressure;
// Friction + surface tension
float Friction = 0.45 * dot(dir, dvel) * GD(d, 1.5);
float F = surface_tension * GD(d, surface_tension_rad);
p.force += force_k * dir * (F + SPH_F + Friction) * irho / rest_density;
```
## Performance Optimization Details
### 1. Neighborhood Sampling Optimization
- **Bottleneck**: Each pixel samples 4~12 neighbors; texture bandwidth is the main bottleneck
- **Optimization**: Use `texelFetch` instead of `texture` (skips filtering), pre-compute `1.0/iResolution.xy` to avoid repeated division
### 2. N-Body O(N²) Loop Optimization
- **Bottleneck**: All-pairs interaction has O(N²) complexity; N=20 means 400 iterations per frame, N=50 means 2500
- **Optimization**:
- Limit N value (20~30 is enough for good visual results)
- Use "cheap" periodic boundary mode (`fract` instead of 3×3 loop traversal)
- Passive marker particles (90%) don't participate in force computation, only flow passively
### 3. Iteration Count vs. Accuracy Balance
- **Bottleneck**: Fluid/rigid body solvers need multiple iterations, but each buffer can only execute once
- **Optimization**:
- Use 3 chained buffers (A→B→C) for 3 iterations/frame
- 4 chained buffers for cloth (4 sub-steps/frame, time step = (1/60)/4 s)
- More buffers consume more GPU memory; balance accuracy against resources
### 4. Adaptive Precision
- **Optimization**: Use larger step sizes for screen edges or distant regions
```glsl
// Kelvin wave example: distant pixels use 8× step size
if (abs(U.y * R.y) > 100.0) dx *= 8.0 * abs(U.y);
```
### 5. Data Packing Compression
- **Optimization**: When each particle has more than 4 float attributes, use bit operations for packing
```glsl
// 3D SPH example: 3 floats compressed into 1 uint (5-bit exponent + 3×9-bit components)
uint packvec3(vec3 v) {
int exp = clamp(int(ceil(log2(max(...)))), -15, 15);
float scale = exp2(-float(exp));
uvec3 sv = uvec3(round(clamp(v*scale, -1.0, 1.0) * 255.0) + 255.0);
return uint(exp + 15) | (sv.x << 5) | (sv.y << 14) | (sv.z << 23);
}
```
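The bit layout above (5-bit biased exponent in bits 0-4, three 9-bit offset-binary components in bits 5-13, 14-22, 23-31) can be verified off-GPU. A Python sketch of both directions — the unpack function is an assumed inverse, not shown in the original:

```python
import math

def pack_vec3(v):
    # 5-bit biased exponent + three 9-bit offset-binary components in one 32-bit word
    m = max(abs(c) for c in v)
    e = min(max(math.ceil(math.log2(m)) if m > 0 else -15, -15), 15)
    scale = 2.0 ** -e
    word = (e + 15) & 0x1F
    for i, c in enumerate(v):
        q = int(round(min(max(c * scale, -1.0), 1.0) * 255.0) + 255.0)  # 0..510
        word |= q << (5 + 9 * i)
    return word

def unpack_vec3(word):
    e = (word & 0x1F) - 15
    s = 2.0 ** e
    return [(((word >> (5 + 9 * i)) & 0x1FF) - 255.0) / 255.0 * s for i in range(3)]

v = [0.5, -3.25, 7.0]
rt = unpack_vec3(pack_vec3(v))
```

Round-trip error is bounded by one quantization step, `2^e / 255` (here `e = 3` because the largest component is 7.0).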
### 6. Stability Safeguards
- Apply `clamp` to velocity/density to prevent numerical explosion
- Use `smoothstep` for soft boundary decay instead of hard cutoff
- Keep damping coefficients in the 0.95~0.999 range
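The soft-boundary idea from the list above, as a CPU-side Python sketch (the boundary width 0.1 is an illustrative value): a `smoothstep` ramp takes the velocity multiplier smoothly from 0 at the wall to 1 at `width` away, instead of a hard on/off cutoff.

```python
def smoothstep(e0, e1, x):
    # GLSL smoothstep: clamp then Hermite 3t^2 - 2t^3
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def boundary_damp(dist_to_wall, width=0.1):
    # Velocity multiplier: 0 at the wall, smoothly 1 at `width` away
    return smoothstep(0.0, width, dist_to_wall)
```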
## Combination Suggestions in Detail
### 1. Physics Simulation + Post-Processing Rendering
The most common combination. Buffer passes handle physics computation, Image pass handles visualization:
- **Waves + Refraction/Caustics**: Height field gradient drives refraction-offset sampling
- **Fluid + Ink Coloring**: Velocity field advects colored ink particles (Buffer D), with HSV random coloring
- **Cloth + Ray Tracing**: Voxelized spatial tree accelerates cloth surface ray intersection
### 2. Physics Simulation + SDF Rendering
Rigid body/particle position data is passed to the Image pass, rendered as geometry using SDF functions:
- `sdBox(p - bodyPos, bodySize)` renders rigid bodies
- `length(p - particlePos) - radius` renders particles
- Suitable for Box2D-lite rigid body engine visualization
### 3. Physics Simulation + Volume Rendering
3D simulations (e.g., SPH) require a volume rendering pipeline:
- Density field trilinear interpolation → ray marching → normal computation → lighting
- Shadows via a separate buffer accumulating optical density along light rays
- Environment map reflections + Fresnel blending
### 4. Multiple Physics System Coupling
- **Fluid + Rigid Bodies**: Fluid velocity field drives rigid body motion; rigid body occupancy modifies fluid boundaries
- **Cloth + Colliders**: Sphere/box shapes for collision detection, cloth elastic response
- **Particles + Fields**: Particles generate fields (density/vorticity), fields in turn drive particles (SPH / Biot-Savart)
### 5. Physics Simulation + Audio Visualization
- Bind audio texture via `iChannel`, mapping spectrum energy to external forces or parameters
- Low frequencies drive large-scale motion, high frequencies drive small-scale vortices/ripples

# Sound Synthesis — Detailed Reference
This document is a complete reference supplement to [SKILL.md](SKILL.md), covering prerequisites, detailed explanations of each step, in-depth variant descriptions, performance optimization analysis, and complete combination code examples.
## Prerequisites
- **GLSL Fundamentals**: Functions, vector operations, `float`/`vec2` types, math functions like `sin()`/`exp()`/`fract()`
- **Audio Fundamentals**: Sample rate (typically 44100Hz), frequency-to-pitch relationship, waveform concepts (sine, sawtooth, square)
- **Music Theory Basics**: MIDI note numbers, equal temperament, octave relationship (frequency doubles), chord construction
- **ShaderToy Sound Mode**: `vec2 mainSound(int samp, float time)` returns a `vec2` stereo sample value in the range `[-1, 1]`
## Implementation Steps
### Step 1: mainSound Entry Point and Basic Framework
**What**: Establish the standard entry function for a sound shader, outputting a stereo signal.
**Why**: ShaderToy requires the fixed signature `vec2 mainSound(int samp, float time)`, where the return value's `.x` and `.y` are the left and right channels respectively, with a range of `[-1, 1]`. `samp` is the sample index, and `time` is the corresponding time (in seconds).
```glsl
// ShaderToy sound shader basic framework
#define TAU 6.28318530718
#define BPM 120.0 // Adjustable: tempo
#define SPB (60.0 / BPM) // Seconds per beat
vec2 mainSound(int samp, float time) {
vec2 audio = vec2(0.0);
// Layer instruments/tracks here
// audio += instrument(time);
// Master volume control + anti-click fade-in
audio *= 0.5 * smoothstep(0.0, 0.5, time);
return clamp(audio, -1.0, 1.0);
}
```
### Step 2: MIDI Note to Frequency Conversion
**What**: Convert a MIDI note number to its corresponding frequency value.
**Why**: In equal temperament, each semitone up multiplies the frequency by `2^(1/12)`. MIDI 69 = A4 = 440Hz is the standard reference point. This is the foundation of all melodic synthesis.
```glsl
// MIDI note number to frequency
// 69 = A4 = 440Hz, every +12 is one octave (frequency doubles)
float noteFreq(float note) {
return 440.0 * pow(2.0, (note - 69.0) / 12.0);
}
```
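A quick CPU check of the equal-temperament formula confirms the reference points: A4 (MIDI 69) is exactly 440 Hz, one octave up doubles the frequency, and middle C (MIDI 60) lands at about 261.63 Hz.

```python
def note_freq(note):
    # MIDI note number to frequency: 69 = A4 = 440 Hz, +12 doubles the frequency
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)
```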
### Step 3: Basic Oscillators
**What**: Implement four standard waveform generators — sine, sawtooth, square, and triangle waves.
**Why**: Different waveforms have different harmonic characteristics. Sine waves are pure (fundamental only), sawtooth waves are rich in all harmonics (bright), square waves contain only odd harmonics (hollow), and triangle waves have faster harmonic decay (soft). These four are the building blocks of all timbre synthesis.
```glsl
// Sine wave - pure tone, fundamental only
float osc_sin(float t) {
return sin(TAU * t);
}
// Sawtooth wave - contains all harmonics, bright and sharp
float osc_saw(float t) {
return fract(t) * 2.0 - 1.0;
}
// Square wave - odd harmonics only, hollow texture
float osc_sqr(float t) {
return step(fract(t), 0.5) * 2.0 - 1.0;
}
// Triangle wave - fast harmonic decay, soft and warm
float osc_tri(float t) {
return abs(fract(t) - 0.5) * 4.0 - 1.0;
}
```
### Step 4: Additive Synthesis Instrument
**What**: Build a timbre by layering multiple harmonics (integer multiples of the fundamental), each with independent amplitude and decay rate.
**Why**: The timbre of real instruments is determined by their harmonic content (spectrum). Layering 3-8 harmonics with faster decay for higher harmonics can simulate piano, bell, and other timbres. This is the core technique for additive timbre synthesis.
```glsl
// Additive synthesis instrument
// freq: fundamental frequency, t: time within note
float instrument_additive(float freq, float t) {
float y = 0.0;
// Layer harmonics: fundamental × 1, 2, 4
// Decreasing amplitude + frequency-dependent decay (higher harmonics decay faster)
y += 0.50 * sin(TAU * 1.00 * freq * t) * exp(-0.0015 * 1.0 * freq * t);
y += 0.30 * sin(TAU * 2.01 * freq * t) * exp(-0.0015 * 2.0 * freq * t);
y += 0.20 * sin(TAU * 4.01 * freq * t) * exp(-0.0015 * 4.0 * freq * t);
// Nonlinear waveshaping to enrich harmonics
y += 0.1 * y * y * y; // Adjustable: 0.0-0.35, higher = more distortion
// Tremolo
y *= 0.9 + 0.1 * cos(40.0 * t); // Adjustable: 40.0 = tremolo frequency
// Smooth attack to avoid clicks
y *= smoothstep(0.0, 0.01, t); // Adjustable: 0.01 = attack time
return y;
}
```
### Step 5: FM Synthesis Instrument
**What**: Use one oscillator's (modulator) output as the phase offset of another oscillator (carrier) to produce rich harmonics.
**Why**: FM synthesis can generate extremely rich timbres with very few oscillators. Varying modulation depth over time can simulate the "bright→dark" decay characteristic of instruments. Electric pianos and sitar-like timbres are both based on this principle.
```glsl
// FM synthesis electric piano
vec2 fm_epiano(float freq, float t) {
// Stereo micro-detuning for chorus effect
vec2 f0 = vec2(freq * 0.998, freq * 1.002); // Adjustable: detune amount
// "Glass" layer - high-frequency FM, fast decay → metallic attack quality
vec2 glass = sin(TAU * (f0 + 3.0) * t
+ sin(TAU * 14.0 * f0 * t) * exp(-30.0 * t) // Adjustable: 14.0=mod ratio, -30.0=mod decay
) * exp(-4.0 * t); // Adjustable: -4.0 = glass layer decay
glass = sin(glass); // Second-order nonlinearity
// "Body" layer - low-frequency FM, slow decay → sustained warm tone
vec2 body = sin(TAU * f0 * t
+ sin(TAU * f0 * t) * exp(-0.5 * t) * pow(440.0 / f0.x, 0.5) // Low-frequency compensation
) * exp(-t); // Adjustable: -1.0 = body decay
return (glass + body) * smoothstep(0.0, 0.001, t) * 0.1;
}
// FM synthesis generic instrument (struct-parameterized)
struct Instr {
float att; // Attack speed (higher = faster)
float fo; // Decay rate
float vibe; // Vibrato speed
float vphas; // Vibrato phase
float phas; // FM modulation depth
float dtun; // Detune amount
};
float fm_instrument(float freq, float t, float beatTime, Instr ins) {
float f = freq - beatTime * ins.dtun;
float phase = f * t * TAU;
float vibrato = cos(beatTime * ins.vibe * 3.14159 / 8.0 + ins.vphas * 1.5708);
float fm = sin(phase + vibrato * sin(phase * ins.phas));
float env = exp(-beatTime * ins.fo) * (1.0 - exp(-beatTime * ins.att));
return fm * env * (1.0 - beatTime * 0.125);
}
```
### Step 6: Percussion Synthesis
**What**: Synthesize kick drum, snare/clap, and hi-hat percussion instruments.
**Why**: Percussion is typically composed of pitch sweeps (kick) or noise pulses (hi-hat/clap) with fast envelopes. The kick's core is a sine sweep from high to low frequency; hi-hats are noise with exponential decay. Nearly all complete music shaders require these.
```glsl
// Pseudo-random hash (replaces noise texture)
float hash(float p) {
p = fract(p * 0.1031);
p *= p + 33.33;
p *= p + p;
return fract(p);
}
// 909-style kick drum
float kick(float t) {
float df = 512.0; // Adjustable: frequency sweep depth
float dftime = 0.01; // Adjustable: sweep time constant
float freq = 60.0; // Adjustable: base frequency
// Exponential frequency sweep: rapidly slides from high to base frequency
float phase = TAU * (freq * t - df * dftime * exp(-t / dftime));
float body = sin(phase) * smoothstep(0.3, 0.0, t) * 1.5;
// Transient noise click
float click = sin(TAU * 8000.0 * fract(t)) * hash(t * 2000.0)
* smoothstep(0.007, 0.0, t);
return body + click;
}
// Hi-hat (open / closed)
float hihat(float t, float decay) {
// decay: 5.0 = open hat (long decay), 15.0 = closed hat (short decay)
float noise = hash(floor(t * 44100.0)) * 2.0 - 1.0;
return noise * exp(-decay * t) * smoothstep(0.0, 0.02, t);
}
// Clap / snare
float clap(float t) {
float noise = hash(floor(t * 44100.0)) * 2.0 - 1.0;
return noise * smoothstep(0.1, 0.0, t);
}
```
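The kick's pitch sweep can be derived from its phase: the instantaneous frequency is the phase derivative divided by TAU, i.e. `f(t) = freq + df * exp(-t / dftime)`, sweeping from 60 + 512 = 572 Hz at the attack down to 60 Hz. A Python check against a numerical derivative, using the same constants as the block above:

```python
import math

TAU = 2.0 * math.pi
DF, DFTIME, FREQ = 512.0, 0.01, 60.0

def kick_phase(t):
    # Same phase expression as the GLSL kick body
    return TAU * (FREQ * t - DF * DFTIME * math.exp(-t / DFTIME))

def inst_freq(t, h=1e-7):
    # Numerical derivative of phase / TAU = instantaneous frequency in Hz
    return (kick_phase(t + h) - kick_phase(t - h)) / (2.0 * h * TAU)
```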
### Step 7: Note Sequence Arrangement
**What**: Implement melody/chord temporal arrangement, determining which note should play at each moment.
**Why**: Music = timbre × timing. ShaderToy has three mainstream arrangement approaches: (A) D() macro accumulation for handwritten melodies, (B) array lookup for complex arrangements, (C) hash pseudo-random for algorithmic composition.
```glsl
// === Approach A: D() Macro Accumulation ===
// Usage: D(duration, MIDI note number) arranged sequentially
// b = accumulated time, x = current note start time, n = current note
#define D(duration, note) b += float(duration); if(t > b) { x = b; n = float(note); }
float melody_macro(float time) {
float t = time / 0.18; // Adjustable: 0.18 = seconds per unit duration
float n = 0.0, b = 0.0, x = 0.0;
D(10,71) D(2,76) D(3,79) D(1,78) D(2,76) D(4,83) D(2,81) D(6,78)
// ... continue arranging notes ...
float freq = noteFreq(n);
float noteTime = 0.18 * (t - x);
return instrument_additive(freq, noteTime);
}
// === Approach B: Array Lookup ===
const float NOTES[16] = float[16](
60., 62., 64., 65., 67., 69., 71., 72., // Adjustable: note sequence
60., 64., 67., 72., 65., 69., 64., 60.
);
float melody_array(float time, float bpm) {
float beat = time * bpm / 60.0;
int idx = int(mod(beat, 16.0));
float noteTime = fract(beat);
float freq = noteFreq(NOTES[idx]);
return instrument_additive(freq, noteTime * 60.0 / bpm);
}
// === Approach C: Hash Pseudo-Random ===
float nse(float x) {
return fract(sin(x * 110.082) * 19871.8972);
}
// Scale quantization: filter out dissonant notes
float scale_filter(float note) {
float n2 = mod(note, 12.0);
// Major scale: filter out semitones 1,3,6,8,10
if (n2==1.||n2==3.||n2==6.||n2==8.||n2==10.) return -100.0;
return note;
}
float melody_random(float time, float bpm) {
float beat = time * bpm / 60.0;
float seqn = nse(floor(beat));
float note = 48.0 + floor(seqn * 24.0); // Adjustable: 48.0=lowest note, 24.0=range
note = scale_filter(note);
float freq = noteFreq(note);
float noteTime = fract(beat) * 60.0 / bpm;
return instrument_additive(freq, noteTime);
}
```
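The `D()` macro's accumulate-and-latch behavior is easier to follow in plain code. A Python sketch of the same lookup (`melody_lookup` is a name invented here): note that a note only latches once `t` passes its accumulated start time, so before the first duration elapses the state keeps its initial values — sound shaders typically account for this with a lead-in or an initial zero-duration entry.

```python
def melody_lookup(t, steps):
    # steps: list of (duration, midi_note). Returns (current_note, note_start),
    # mirroring the macro: b += duration; if t > b: x = b; n = note
    b = n = x = 0.0
    for dur, note in steps:
        b += dur
        if t > b:
            x, n = b, float(note)
    return n, x

steps = [(10, 71), (2, 76), (3, 79)]
```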
### Step 8: Chord Construction
**What**: Layer multiple notes according to chord relationships to form harmony.
**Why**: A chord is a combination of multiple pitches sounding simultaneously. The common structure is root + third + fifth (triad), with added seventh and ninth degrees for jazz chords. Jazz chord progressions can be built this way.
```glsl
// Chord construction
// NOTE: assumes an fm_epiano(freq, t, decay) variant — extend the two-argument
// version from Step 5 with a third parameter scaling the envelope decay
vec2 chord(float time, float root, float isMinor) {
vec2 result = vec2(0.0);
float bass = root - 24.0; // Root two octaves lower
// Root (bass)
result += fm_epiano(noteFreq(bass), time, 2.0);
// Root
result += fm_epiano(noteFreq(root), time - SPB * 0.5, 1.25);
// Third (major third = 4 semitones, minor third = 3 semitones)
result += fm_epiano(noteFreq(root + 4.0 - isMinor), time - SPB, 1.5);
// Fifth
result += fm_epiano(noteFreq(root + 7.0), time - SPB * 0.5, 1.25);
// Seventh
result += fm_epiano(noteFreq(root + 11.0 - isMinor), time - SPB, 1.5);
// Ninth
result += fm_epiano(noteFreq(root + 14.0), time - SPB, 1.5);
return result;
}
```
### Step 9: Delay and Reverb Effects
**What**: Simulate spatial echo and reverb effects by layering time-offset copies of the audio signal.
**Why**: Dry audio sounds "flat". Multi-tap delay creates spatial depth by layering signal copies at different delays and decay amounts. Ping-pong delay bounces alternately between left and right channels, enhancing stereo width.
```glsl
// Multi-tap echo/reverb
// NOTE: in GLSL ES 3.00, "sample" is a reserved word — use "samp" instead
vec2 echo_reverb(float time) {
vec2 tot = vec2(0.0);
float hh = 1.0;
for (int i = 0; i < 6; i++) { // Adjustable: 6 = echo count
float h = float(i) / 5.0;
float delayedTime = time - 0.7 * h; // Adjustable: 0.7 = echo interval
// Call your instrument function to get audio at that time point
float samp = get_instrument_sample(delayedTime);
// Stereo spread: each echo has different L/R ratio
tot += samp * vec2(0.5 + 0.1 * h, 0.5 - 0.1 * h) * hh;
hh *= 0.5; // Adjustable: 0.5 = decay per echo
}
return tot;
}
// Ping-pong stereo delay
vec2 pingpong_delay(float time) {
vec2 mx = get_stereo_sample(time) * 0.5;
float ec = 0.4; // Adjustable: initial echo volume
float fb = 0.6; // Adjustable: feedback decay coefficient
float delay_time = 0.222; // Adjustable: delay time (seconds)
float et = delay_time;
// 4 alternating left/right ping-pong taps
mx += get_stereo_sample(time - et) * ec * vec2(1.0, 0.5); ec *= fb; et += delay_time;
mx += get_stereo_sample(time - et) * ec * vec2(0.5, 1.0); ec *= fb; et += delay_time;
mx += get_stereo_sample(time - et) * ec * vec2(1.0, 0.5); ec *= fb; et += delay_time;
mx += get_stereo_sample(time - et) * ec * vec2(0.5, 1.0); ec *= fb; et += delay_time;
return mx;
}
```
### Step 10: Beat and Arrangement Structure
**What**: Define a time grid using BPM, arrange different instruments at different beat positions, and control the overall song structure (intro, verse, interlude, etc.).
**Why**: The rhythmic skeleton of music is built on a uniform beat grid. Using `floor(time * BPM / 60)` gets the current beat number, and `fract()` gets the position within the beat. `smoothstep` gating controls instrument entry and exit at specific sections.
```glsl
vec2 mainSound(int samp, float time) {
vec2 audio = vec2(0.0);
float beat = time * BPM / 60.0; // Current beat count
float bar = beat / 4.0; // Current bar (4/4 time)
float beatInBar = mod(beat, 4.0); // Beat position within bar
// --- Rhythm layer ---
// Kick: trigger every beat
float kickTime = mod(time, SPB);
audio += vec2(kick(kickTime) * 0.5);
// Hi-hat: trigger every half beat
float hatTime = mod(time, SPB * 0.5);
audio += vec2(hihat(hatTime, 15.0) * 0.15);
// --- Melody layer ---
audio += vec2(melody_array(time, BPM)) * 0.3;
// --- Arrangement automation ---
// Use smoothstep to control instrument entry/exit
float introFade = smoothstep(0.0, 4.0, bar); // Fade in over first 4 bars
float dropGate = smoothstep(16.0, 16.1, bar); // Drop at bar 16 — multiply a bass/lead layer by this gate
audio *= introFade;
// Master volume + anti-click
audio *= 0.35 * smoothstep(0.0, 0.5, time);
return clamp(audio, -1.0, 1.0);
}
```
## Variant Details
### Variant 1: Subtractive Synthesis / TB-303 Acid Synthesizer
**Difference from basic version**: Instead of building timbre by layering harmonics, generates a harmonic-rich waveform (sawtooth) and then sculpts it with a resonant low-pass filter to remove high frequencies. The filter cutoff frequency is modulated by an envelope, producing the classic "wah" sound.
**Key modified code**:
```glsl
#define NSPC 128 // Adjustable: synthesis harmonic count (higher = better quality)
// Resonant low-pass frequency response
float lpf_response(float h, float cutoff, float reso) {
cutoff -= 20.0;
float df = max(h - cutoff, 0.0);
float df2 = abs(h - cutoff);
return exp(-0.005 * df * df) * 0.5 // Adjustable: -0.005 = rolloff slope
+ exp(df2 * df2 * -0.1) * reso; // Adjustable: resonance peak
}
// TB-303 acid synthesizer
vec2 acid_synth(float freq, float noteTime) {
vec2 v = vec2(0.0);
// Envelope-driven filter cutoff frequency
float cutoff = exp(noteTime * -1.5) * 50.0 // Adjustable: -1.5=envelope speed, 50.0=sweep range
+ 10.0; // Adjustable: minimum cutoff
float sqr = step(0.5, fract(noteTime * 4.5)); // Sawtooth/square switching
for (int i = 0; i < NSPC; i++) {
float h = float(i + 1);
float inten = 1.0 / h; // Sawtooth spectrum
inten = mix(inten, inten * mod(h, 2.0), sqr); // Square wave variant
inten *= lpf_response(h, cutoff, 2.2);
v.x += inten * sin((TAU + 0.01) * noteTime * freq * h);
v.y += inten * sin(TAU * noteTime * freq * h);
}
float amp = smoothstep(0.05, 0.0, abs(noteTime - 0.31) - 0.26)
* exp(noteTime * -1.0);
return clamp(v * amp * 2.0, -1.0, 1.0);
}
```
### Variant 2: IIR Biquad Filter
**Difference from basic version**: Uses a time-domain IIR filter based on the Audio EQ Cookbook instead of frequency-domain methods. Supports 7 filter types including low-pass, high-pass, band-pass, notch, peak, and shelf — closer to real hardware. Requires maintaining past sample state.
**Key modified code**:
```glsl
// Sawtooth oscillator (sample-domain, anti-aliasing friendly)
float waveSaw(float freq, int samp) {
return fract(freq * float(samp) / iSampleRate) * 2.0 - 1.0;
}
// Stereo widening
vec2 widerSaw(float freq, int samp) {
int offset = int(freq) * 64; // Adjustable: 64 = width factor
return vec2(waveSaw(freq, samp - offset), waveSaw(freq, samp + offset));
}
// Biquad low-pass filter coefficient calculation
void biquadLPF(float freq, float Q, float sr,
out float b0, out float b1, out float b2,
out float a0, out float a1, out float a2) {
float omega = TAU * freq / sr;
float sn = sin(omega), cs = cos(omega);
float alpha = sn / (2.0 * Q); // Adjustable: Q = resonance (0.5-20)
b0 = (1.0 - cs) * 0.5;
b1 = 1.0 - cs;
b2 = (1.0 - cs) * 0.5;
a0 = 1.0 + alpha;
a1 = -2.0 * cs;
a2 = 1.0 - alpha;
}
```
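A useful property check for the cookbook low-pass: the DC gain `(b0+b1+b2)/(a0+a1+a2)` is exactly 1 and the Nyquist-frequency gain numerator `b0-b1+b2` is exactly 0, independent of cutoff and Q. A Python sketch of the same coefficient math:

```python
import math

def biquad_lpf(freq, q, sr):
    # Audio EQ Cookbook low-pass coefficients (unnormalized)
    omega = 2.0 * math.pi * freq / sr
    sn, cs = math.sin(omega), math.cos(omega)
    alpha = sn / (2.0 * q)
    b0 = (1.0 - cs) * 0.5
    b1 = 1.0 - cs
    b2 = (1.0 - cs) * 0.5
    a0 = 1.0 + alpha
    a1 = -2.0 * cs
    a2 = 1.0 - alpha
    return b0, b1, b2, a0, a1, a2

b0, b1, b2, a0, a1, a2 = biquad_lpf(1000.0, 0.707, 44100.0)
```

These two invariants make a quick regression test when porting the filter into a shader.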
### Variant 3: Vocal / Formant Synthesis
**Difference from basic version**: Uses a sinusoidal tract model to simulate the human voice. By setting formants at different frequencies with their bandwidths, vowels can be synthesized. Consonants are implemented through fricative noise.
**Key modified code**:
```glsl
// Vocal tract formant model
float tract(float x, float formantFreq, float bandwidth) {
return sin(TAU * formantFreq * x)
* exp(-bandwidth * 3.14159 * x);
}
// "Ah" vowel synthesis
float vowel_aah(float t, float pitch) {
float period = 1.0 / pitch;
float x = mod(t, period);
// Formant frequencies and bandwidths (Hz) — adjustable to simulate different vowels
float aud = tract(x, 710.0, 70.0) * 0.5 // F1: 710Hz ('a' vowel)
+ tract(x, 1000.0, 90.0) * 0.6 // F2: 1000Hz
+ tract(x, 2450.0, 140.0) * 0.4; // F3: 2450Hz
return aud;
}
// Fricative consonant noise (hash11: any 1D hash, e.g. the hash() from Step 6)
float fricative(float t, float formantFreq) {
return (hash11(floor(formantFreq * t) * 20.0) - 0.5) * 3.0;
}
```
### Variant 4: Algorithmic Composition (Generative Music)
**Difference from basic version**: Does not use handwritten note sequences; instead uses hash functions to generate pseudo-random melodies, with scale quantization to ensure harmonic consistency. Multi-level rhythmic subdivision (1-beat/2-beat/4-beat) produces fractal-like musical structure.
**Key modified code**:
```glsl
// 8-note pseudo-random loop
vec2 noteRing(float n) {
float r = 0.5 + 0.5 * fract(sin(mod(floor(n), 32.123) * 32.123) * 41.123);
n = mod(n, 8.0);
// Adjustable: modify these intervals to change the melodic character
float note = n<1.?0. : n<2.?5. : n<3.?-2. : n<4.?4. : n<5.?7. : n<6.?4. : n<7.?2. : 0.;
return vec2(note, r); // (interval, volume)
}
// FBM-style layered note generation
vec2 generativeNote(float beat) {
float b2 = floor(beat * 0.25);
// Large-scale + medium-scale + small-scale layering
vec2 note = noteRing(b2 * 0.0625)
+ noteRing(b2 * 0.25)
+ noteRing(b2);
return note;
}
```
### Variant 5: Chord Progression System (Circle of Fifths)
**Difference from basic version**: Automatically generates harmonic progressions based on the circle of fifths interval. Every 4 beats advances one fifth (+7 semitones), automatically alternating major/minor chords with jazz chord extensions (seventh, ninth).
**Key modified code**:
```glsl
vec2 mainSound(int samp, float time) {
float id = floor(time / SPB / 4.0); // Current chord number
float offset = id * 7.0; // Circle of fifths: +7 semitones per step
float minor = mod(id, 4.0) >= 3.0 ? 1.0 : 0.0; // Every 4th chord is minor
float t = mod(time, SPB * 4.0);
float root = 57.0 + mod(offset, 12.0); // Adjustable: 57.0 = starting root (A3)
vec2 result = chord(t, root, minor);
// Two-tap ping-pong delay
result += vec2(0.5, 0.2) * chord(t - SPB * 0.5, root, minor);
result += vec2(0.05, 0.1) * chord(t - SPB, root, minor);
return result;
}
```
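The circle-of-fifths step of +7 semitones visits all 12 pitch classes before repeating, because gcd(7, 12) = 1 — a one-line Python check:

```python
# Stepping +7 semitones mod 12 enumerates every pitch class exactly once
roots = {(i * 7) % 12 for i in range(12)}
```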
## Performance Optimization Details
1. **Reduce Harmonic Count**: In additive synthesis and frequency-domain filters, the harmonic count (`NUM_HARMONICS` / `NSPC`) is the biggest performance bottleneck. Start with 4-8 harmonics and don't add more once the sound is satisfactory. Using 256 harmonics is an extreme case.
2. **Avoid Sample History in Loops**: IIR filters need to process 128 historical samples, meaning each output sample requires 128 loop iterations. Prefer frequency-domain methods or reduce `PAST_SAMPLES`.
3. **Simplify Echo/Delay**: Each delay tap requires recomputing the complete signal chain. 4 taps means 5x computation. Consider reducing the complexity (fewer harmonics) for delayed signals.
4. **Use `fract()` Instead of `mod()`**: When the divisor is 1.0, `fract(x)` is faster than `mod(x, 1.0)`.
5. **Precompute Constants**: Move loop-invariant expressions like `TAU * freq` outside the loop.
6. **Use the Common Pass**: Place constant definitions and shared functions in ShaderToy's Common tab, accessible by both Sound and Image, avoiding redundant computation of BPM/SPB, etc.
## Combination Suggestions
### 1. Combining with Audio Visualization
Sound shader output can be read in the Image shader via `iChannel0` (set to this shader's Sound output). Use `texture(iChannel0, vec2(freq, 0.0))` to get spectrum data to drive visual effects (waveforms, spectrum bar charts, etc.).
### 2. Combining with Raymarching Scenes
Sound-visual synchronization can be achieved by sharing timeline/cue events. Define shared timeline/cue events in the Common Pass, referenced by both Sound and Image shaders simultaneously, ensuring visual-audio synchronization.
### 3. Combining with Particle Systems
Use beat events (kick trigger moments) to drive particle emission. In the Image shader, use the same BPM/SPB to calculate the current beat position, and increase particle count or velocity at the kick trigger moment.
### 4. Combining with Post-Processing Effects
Share Sound shader envelope values (e.g., sidechain compression coefficient) with the Image shader via the Common Pass, driving bloom intensity, color shifting, screen shake, and other effects.
### 5. Combining with Text/Graphic Overlays
Use `message()` functions in the Image shader to render text hints, parameter displays, or interaction instructions to help users understand what is being played.

# Heightfield Ray Marching Terrain Rendering — Detailed Reference
> This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, complete explanations for each step (what/why), variant details, in-depth performance optimization analysis, and complete code examples for combination suggestions.
## Prerequisites
- **GLSL Fundamentals**: uniforms, varyings, built-in functions (mix, smoothstep, clamp, fract, floor)
- **Vector Math**: dot product, cross product, matrix transforms, normal calculation
- **Basic Ray Marching Concepts**: casting rays from the camera, advancing along rays, detecting intersections
- **Noise Functions**: basic principles of Value Noise / Gradient Noise (grid sampling + interpolation)
- **FBM (Fractal Brownian Motion)**: layering multiple noise octaves to build fractal detail
## Implementation Steps
### Step 1: Noise and Hash Functions
**What**: Implement 2D Value Noise, providing the fundamental sampling capability for FBM.
**Why**: Terrain shaders build terrain from noise. Value Noise generates a continuous pseudo-random field through grid-point hashing + bilinear interpolation. A sin-free fract-dot hash avoids precision issues with `sin()` on some GPUs. Interpolation uses the Hermite smoothstep `3t²-2t³` to ensure C¹ continuity.
**Code**:
```glsl
// === Hash Function ===
// High-quality hash without sin
// Uses fract-dot pattern, avoiding sin() precision issues
float hash(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * 0.1031);
p3 += dot(p3, p3.yzx + 19.19);
return fract((p3.x + p3.y) * p3.z);
}
// === 2D Value Noise ===
// Grid sampling + Hermite interpolation, returns [0,1]
float noise(in vec2 p) {
vec2 i = floor(p);
vec2 f = fract(p);
vec2 u = f * f * (3.0 - 2.0 * f); // Hermite smoothstep
float a = hash(i + vec2(0.0, 0.0));
float b = hash(i + vec2(1.0, 0.0));
float c = hash(i + vec2(0.0, 1.0));
float d = hash(i + vec2(1.0, 1.0));
return mix(mix(a, b, u.x), mix(c, d, u.x), u.y);
}
```
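A CPU port makes the interpolation easy to verify: at integer lattice points `u = (0, 0)`, so the noise value collapses exactly to the corner hash, and bilinear blending keeps every sample inside `[0, 1]`. A Python sketch of the same hash + noise (function names are illustrative):

```python
import math

def fract(x):
    return x - math.floor(x)

def hash2(px, py):
    # fract-dot hash, same constants as the GLSL version
    p3 = [fract(c * 0.1031) for c in (px, py, px)]
    h = p3[0] * (p3[1] + 19.19) + p3[1] * (p3[2] + 19.19) + p3[2] * (p3[0] + 19.19)
    p3 = [c + h for c in p3]
    return fract((p3[0] + p3[1]) * p3[2])

def value_noise(x, y):
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    ux = fx * fx * (3.0 - 2.0 * fx)   # Hermite smoothstep
    uy = fy * fy * (3.0 - 2.0 * fy)
    a = hash2(ix, iy);     b = hash2(ix + 1, iy)
    c = hash2(ix, iy + 1); d = hash2(ix + 1, iy + 1)
    return (a * (1 - ux) + b * ux) * (1 - uy) + (c * (1 - ux) + d * ux) * uy
```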
### Step 2: Noise with Analytical Derivatives (Advanced)
**What**: Return the noise value along with its analytical partial derivatives `∂n/∂x` and `∂n/∂y`.
**Why**: Analytical derivatives are key to implementing "eroded terrain" — accumulating derivatives in FBM can suppress detail layering on steep slopes (used in Step 3). This technique is widely used in terrain shaders. The derivative formula comes from chain rule differentiation of Hermite interpolation: `du = 6f(1-f)`.
**Code**:
```glsl
// === 2D Value Noise with Analytical Derivatives ===
// Returns vec3: .x = noise value, .yz = partial derivatives (dn/dx, dn/dy)
vec3 noised(in vec2 p) {
vec2 i = floor(p);
vec2 f = fract(p);
// Hermite interpolation and its derivative
vec2 u = f * f * (3.0 - 2.0 * f);
vec2 du = 6.0 * f * (1.0 - f);
float a = hash(i + vec2(0.0, 0.0));
float b = hash(i + vec2(1.0, 0.0));
float c = hash(i + vec2(0.0, 1.0));
float d = hash(i + vec2(1.0, 1.0));
float value = a + (b - a) * u.x + (c - a) * u.y + (a - b - c + d) * u.x * u.y;
vec2 deriv = du * (vec2(b - a, c - a) + (a - b - c + d) * u.yx);
return vec3(value, deriv);
}
```
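The analytical derivatives can be validated against central finite differences. A self-contained Python port of `noised` (hash and helpers redefined so the block stands alone):

```python
import math

def fract(x):
    return x - math.floor(x)

def hash2(px, py):
    # fract-dot hash, same constants as the GLSL version
    p3 = [fract(c * 0.1031) for c in (px, py, px)]
    h = p3[0] * (p3[1] + 19.19) + p3[1] * (p3[2] + 19.19) + p3[2] * (p3[0] + 19.19)
    p3 = [c + h for c in p3]
    return fract((p3[0] + p3[1]) * p3[2])

def noised(x, y):
    # Value noise with analytical partials: returns (value, d/dx, d/dy)
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    ux, uy = fx * fx * (3.0 - 2.0 * fx), fy * fy * (3.0 - 2.0 * fy)
    dux, duy = 6.0 * fx * (1.0 - fx), 6.0 * fy * (1.0 - fy)   # Hermite derivative
    a, b = hash2(ix, iy), hash2(ix + 1, iy)
    c, d = hash2(ix, iy + 1), hash2(ix + 1, iy + 1)
    k = a - b - c + d
    value = a + (b - a) * ux + (c - a) * uy + k * ux * uy
    return value, dux * ((b - a) + k * uy), duy * ((c - a) + k * ux)

# Compare analytical partials with central differences at an interior point
x0, y0, h = 0.37, 1.61, 1e-6
v, dx, dy = noised(x0, y0)
num_dx = (noised(x0 + h, y0)[0] - noised(x0 - h, y0)[0]) / (2 * h)
num_dy = (noised(x0, y0 + h)[0] - noised(x0, y0 - h)[0]) / (2 * h)
```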
### Step 3: FBM Terrain Heightfield (with Derivative Erosion)
**What**: Layer multiple noise octaves to build a terrain heightfield, using derivative accumulation to simulate erosion effects.
**Why**: FBM is the terrain generation core. The key difference is **whether derivative suppression is used**:
- **Without derivatives**: simple layering, terrain appears more "rough"
- **With derivative suppression**: the `1/(1+dot(d,d))` term suppresses high-frequency detail on steep slopes, producing realistic ridge/valley structures
The rotation matrix `m2` rotates sampling coordinates between each layer, breaking axis-aligned visual banding. `mat2(0.8,-0.6, 0.6,0.8)` rotates approximately 37° with unit determinant (pure rotation, no scaling) — a standard choice for terrain FBM.
**Code**:
```glsl
#define TERRAIN_OCTAVES 9 // Tunable: 3=rough outline, 9=medium detail, 16=highest precision (for normals)
#define TERRAIN_SCALE 0.003 // Tunable: controls terrain spatial frequency, smaller = "wider" terrain
#define TERRAIN_HEIGHT 120.0 // Tunable: terrain elevation scale
// Per-layer rotation matrix: ~37° pure rotation, eliminates axis-aligned banding
const mat2 m2 = mat2(0.8, -0.6, 0.6, 0.8);
// === FBM Terrain Heightfield (Derivative Erosion Version) ===
// Input: 2D world coordinates (xz plane)
// Output: scalar height value
float terrain(in vec2 p) {
p *= TERRAIN_SCALE;
float a = 0.0; // Accumulated height
float b = 1.0; // Current amplitude
vec2 d = vec2(0.0); // Accumulated derivatives
for (int i = 0; i < TERRAIN_OCTAVES; i++) {
vec3 n = noised(p); // .x=value, .yz=derivatives
d += n.yz; // Accumulate gradient
a += b * n.x / (1.0 + dot(d, d)); // Derivative suppression: contribution reduced on steep slopes
b *= 0.5; // Amplitude halved per layer
p = m2 * p * 2.0; // Rotate + double frequency
}
return a * TERRAIN_HEIGHT;
}
```
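The amplitude series bounds the result: each octave contributes at most `b * 1`, so the accumulated value stays below Σ 0.5^i < 2 even before derivative suppression kicks in. A Python sketch of the same loop structure, using an illustrative stand-in noise function into `[0, 1]` and a finite-difference gradient surrogate (both are assumptions for this demo, not from the original):

```python
import math

def noise_stub(x, y):
    # stand-in for value noise: any function mapping into [0, 1] exercises the bound
    return 0.5 + 0.5 * math.sin(12.9898 * x + 78.233 * y)

def fbm_eroded(x, y, octaves=9):
    a, b = 0.0, 1.0
    dx = dy = 0.0
    for _ in range(octaves):
        n = noise_stub(x, y)
        # finite-difference surrogate for the analytical derivatives
        gx = (noise_stub(x + 1e-3, y) - n) / 1e-3
        gy = (noise_stub(x, y + 1e-3) - n) / 1e-3
        dx += gx
        dy += gy
        a += b * n / (1.0 + dx * dx + dy * dy)   # derivative suppression
        b *= 0.5
        x, y = (0.8 * x + 0.6 * y) * 2.0, (-0.6 * x + 0.8 * y) * 2.0  # rotate + double
    return a

samples = [fbm_eroded(0.13 * i, 0.07 * i + 0.3) for i in range(50)]
```

The bound matters when choosing the upper clipping plane in Step 5: peaks can in principle approach 2 × TERRAIN_HEIGHT, though suppression keeps them lower in practice.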
### Step 4: LOD Multi-Resolution Terrain Functions
**What**: Create terrain functions at different precision levels for different purposes.
**Why**: This is a classic optimization — ray marching only needs rough height (fewer FBM layers), normal calculation needs detail (more FBM layers), and camera placement only needs the coarsest estimate. A dual-function scheme (coarse for marching, fine for normals) is standard practice in terrain shaders.
**Code**:
```glsl
#define OCTAVES_LOW 3 // Tunable: for camera placement, fastest
#define OCTAVES_MED 9 // Tunable: for ray marching
#define OCTAVES_HIGH 16 // Tunable: for normal calculation, finest detail
// Shared FBM core with a runtime octave count
// (GLSL ES 3.00 allows dynamic loop bounds; on GLSL ES 1.00 keep three unrolled copies)
float terrainLOD(in vec2 p, int octaves) {
p *= TERRAIN_SCALE;
float a = 0.0, b = 1.0;
vec2 d = vec2(0.0);
for (int i = 0; i < octaves; i++) {
vec3 n = noised(p);
d += n.yz;
a += b * n.x / (1.0 + dot(d, d));
b *= 0.5;
p = m2 * p * 2.0;
}
return a * TERRAIN_HEIGHT;
}
// Low precision (camera height, far distance)
float terrainL(in vec2 p) { return terrainLOD(p, OCTAVES_LOW); }
// Medium precision (ray marching)
float terrainM(in vec2 p) { return terrainLOD(p, OCTAVES_MED); }
// High precision (normal calculation)
float terrainH(in vec2 p) { return terrainLOD(p, OCTAVES_HIGH); }
```
### Step 5: Adaptive Step Size Ray Marching
**What**: Cast rays from the camera and advance along the ray with adaptive steps, finding the intersection with the terrain heightfield.
**Why**: Terrain is a heightfield (not an arbitrary SDF), so `ray.y - terrain(ray.xz)` can be used as a conservative step size estimate. Common terrain shaders employ three strategies:
- **Conservative factor approach**: `step = 0.4 × h` (conservative factor 0.4, prevents overshooting sharp ridges, 300 steps)
- **Relaxation marching**: `step = h × max(t×0.02, 1.0)`, step size automatically increases with distance (90 steps covering greater range)
- **Adaptive marching + binary refinement**: adaptive marching + 5 binary refinement steps (150 steps + precise intersection)
This template uses the conservative factor approach + distance-adaptive precision threshold, balancing accuracy and efficiency.
**Code**:
```glsl
#define MAX_STEPS 300 // Tunable: march steps, 80=fast, 300=high quality
#define MAX_DIST 5000.0 // Tunable: maximum render distance
#define STEP_FACTOR 0.4 // Tunable: march conservative factor, 0.3=safe, 0.8=aggressive
// === Ray Marching ===
// ro: ray origin, rd: ray direction (normalized)
// Returns: intersection distance t (-1.0 means miss)
float raymarch(in vec3 ro, in vec3 rd) {
float t = 0.0;
// Upper bound clipping: skip if ray cannot possibly hit terrain
// Assumes terrain max height is TERRAIN_HEIGHT
if (ro.y > TERRAIN_HEIGHT && rd.y >= 0.0) return -1.0;
if (ro.y > TERRAIN_HEIGHT) {
t = (ro.y - TERRAIN_HEIGHT) / (-rd.y); // Fast jump to terrain height upper bound
}
for (int i = 0; i < MAX_STEPS; i++) {
vec3 pos = ro + t * rd;
float h = pos.y - terrainM(pos.xz); // Height difference = ray y - terrain height
// Adaptive precision: tolerate larger error at distance (screen-space equivalent)
if (abs(h) < 0.0015 * t) break;
if (t > MAX_DIST) return -1.0;
t += STEP_FACTOR * h; // Advance proportionally to height difference
}
return t;
}
```
### Step 6: Binary Refinement (Optional)
**What**: Perform binary search near the rough intersection found by ray marching to precisely locate the terrain surface.
**Why**: Ray marching only guarantees the intersection lies somewhere inside the last step interval; five bisection steps shrink that interval by a factor of 2^5 = 32. This is especially important for sharp ridge silhouettes. A similar "step-back-and-halve" strategy is common in terrain shaders.
**Code**:
```glsl
#define BISECT_STEPS 5 // Tunable: binary search steps, 5 steps = 32x precision improvement
// === Binary Refinement ===
// ro: ray origin, rd: ray direction
// tNear: last t above terrain, tFar: first t below terrain
float bisect(in vec3 ro, in vec3 rd, float tNear, float tFar) {
for (int i = 0; i < BISECT_STEPS; i++) {
float tMid = 0.5 * (tNear + tFar);
vec3 pos = ro + tMid * rd;
float h = pos.y - terrainM(pos.xz);
if (h > 0.0) {
tNear = tMid; // Still above terrain, advance forward
} else {
tFar = tMid; // Below terrain, pull back
}
}
return 0.5 * (tNear + tFar);
}
```
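The base `raymarch()` in Step 5 returns only a single t, while `bisect()` needs the bracketing interval `(tNear, tFar)`. A minimal sketch of a march that records that bracket (the name `raymarchBisect` is mine, not from the source shaders):

```glsl
// Marching variant that brackets the surface crossing for bisect()
float raymarchBisect(in vec3 ro, in vec3 rd) {
    float tPrev = 0.0;
    float t = 1.0;
    for (int i = 0; i < MAX_STEPS; i++) {
        vec3 pos = ro + t * rd;
        float h = pos.y - terrainM(pos.xz);
        if (h < 0.0) return bisect(ro, rd, tPrev, t); // crossed in (tPrev, t)
        if (t > MAX_DIST) return -1.0;
        tPrev = t;              // last t still above the terrain
        t += STEP_FACTOR * h;
    }
    return -1.0;
}
```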
### Step 7: Normal Calculation
**What**: Compute terrain surface normals at the intersection point using finite differences.
**Why**: Normals are the foundation of all lighting calculations. A key optimization is **epsilon increasing with distance** — using coarser epsilon at distance avoids aliasing from high-frequency noise. The high-precision terrain function `terrainH` is used here for normal detail.
**Code**:
```glsl
// === Normal Calculation (Finite Differences) ===
// pos: surface intersection point, t: distance (for adaptive epsilon)
vec3 calcNormal(in vec3 pos, float t) {
// Adaptive epsilon: fine up close, coarse at distance (avoids aliasing)
float eps = 0.02 + 0.00005 * t * t;
float hC = terrainH(pos.xz);
float hR = terrainH(pos.xz + vec2(eps, 0.0));
float hU = terrainH(pos.xz + vec2(0.0, eps));
// Finite difference normal
return normalize(vec3(hC - hR, eps, hC - hU));
}
```
### Step 8: Material and Color Assignment
**What**: Blend different material colors based on height, slope, noise, and other conditions.
**Why**: Natural terrain color layering is key to visual convincingness. Nearly all terrain shaders follow this layering logic:
- **Rock**: steep surfaces (small normal y component) → gray rock
- **Grass**: flat low-altitude surfaces → green
- **Snow**: high-altitude flat surfaces → white
- **Sand**: near water level → sand color
Use `smoothstep` for smooth transitions between layers and FBM noise to break up transition line regularity.
**Code**:
```glsl
#define SNOW_HEIGHT 80.0 // Tunable: snow line altitude
#define TREE_HEIGHT 45.0 // Tunable: tree line altitude
#define BEACH_HEIGHT 1.5 // Tunable: beach height
// === Material Color ===
// pos: world coordinates, nor: normal
vec3 getMaterial(in vec3 pos, in vec3 nor) {
// Slope factor: nor.y=1 means horizontal, nor.y=0 means vertical
float slope = nor.y;
float h = pos.y;
// Noise to break up transition lines
float nz = noise(pos.xz * 0.04) * noise(pos.xz * 0.005);
// Base rock color
vec3 rock = vec3(0.10, 0.09, 0.08);
// Dirt/grass color (flat surfaces)
vec3 grass = mix(vec3(0.10, 0.08, 0.04), vec3(0.05, 0.09, 0.02), nz);
// Snow color
vec3 snow = vec3(0.62, 0.65, 0.70);
// Sand color
vec3 sand = vec3(0.50, 0.45, 0.35);
// --- Layered blending ---
vec3 col = rock;
// Flat areas: rock → grass
col = mix(col, grass, smoothstep(0.5, 0.8, slope));
// High altitude: → snow (slope + height + noise)
float snowMask = smoothstep(SNOW_HEIGHT - 20.0 * nz, SNOW_HEIGHT + 10.0, h)
* smoothstep(0.3, 0.7, slope);
col = mix(col, snow, snowMask);
// Low altitude: → sand
// Reversed ramp written explicitly: smoothstep with edge0 > edge1 is undefined in GLSL
float beachMask = (1.0 - smoothstep(BEACH_HEIGHT - 0.5, BEACH_HEIGHT + 1.0, h))
* smoothstep(0.5, 0.9, slope);
col = mix(col, sand, beachMask);
return col;
}
```
### Step 9: Lighting Model
**What**: Implement multi-component lighting: sun diffuse + hemisphere ambient light + backlight fill + specular.
**Why**: Terrain lighting models share consistent core components:
- **Lambert Diffuse**: `dot(N, L)` — fundamental component
- **Hemisphere Ambient**: `0.5 + 0.5 * N.y` — standard terrain ambient lighting
- **Backlight**: fill light from the horizontal direction opposite the sun
- **Fresnel Rim Light**: `pow(1+dot(rd,N), 2~5)` — edge glow effect
- **Specular**: Phong/Blinn-Phong, power ranging from 3 to 500
**Code**:
```glsl
#define SUN_DIR normalize(vec3(0.8, 0.4, -0.6)) // Tunable: sun direction
#define SUN_COL vec3(8.0, 5.0, 3.0) // Tunable: sun color temperature (warm light)
#define SKY_COL vec3(0.5, 0.7, 1.0) // Tunable: sky color
// === Lighting Calculation ===
vec3 calcLighting(in vec3 pos, in vec3 nor, in vec3 rd, float shadow) {
vec3 sunDir = SUN_DIR;
// Diffuse (Lambert)
float dif = clamp(dot(nor, sunDir), 0.0, 1.0);
// Hemisphere ambient: facing up=full brightness, facing down=half brightness
float amb = 0.5 + 0.5 * nor.y;
// Backlight fill (horizontal direction opposite the sun)
vec3 backDir = normalize(vec3(-sunDir.x, 0.0, -sunDir.z));
float bac = clamp(0.2 + 0.8 * dot(nor, backDir), 0.0, 1.0);
// Fresnel rim light
float fre = pow(clamp(1.0 + dot(rd, nor), 0.0, 1.0), 2.0);
// Specular (Blinn-Phong)
vec3 hal = normalize(sunDir - rd);
float spe = pow(clamp(dot(nor, hal), 0.0, 1.0), 16.0)
* (0.04 + 0.96 * pow(1.0 + dot(hal, rd), 5.0)); // Fresnel term
// Combine
vec3 lin = vec3(0.0);
lin += dif * shadow * SUN_COL * 0.1; // Sun diffuse
lin += amb * SKY_COL * 0.2; // Sky ambient
lin += bac * vec3(0.15, 0.05, 0.04); // Backlight (warm tone)
lin += fre * SKY_COL * 0.3; // Rim light
lin += spe * shadow * SUN_COL * 0.05; // Specular
return lin;
}
```
### Step 10: Soft Shadows
**What**: Cast a shadow ray from the surface intersection point toward the sun, computing soft shadows with penumbra.
**Why**: Soft shadows greatly enhance terrain spatial depth. The classic technique — during shadow ray marching, track `min(k*h/t)`, where h is the height distance from the terrain and t is the march distance. A smaller ratio = the ray grazes the terrain surface = penumbra region. The k parameter controls penumbra softness (k=16 for soft, k=64 for hard).
**Code**:
```glsl
#define SHADOW_STEPS 80 // Tunable: shadow ray steps, 32=fast, 80=high quality
#define SHADOW_K 16.0 // Tunable: penumbra softness, 8=very soft, 64=very hard
// === Soft Shadows ===
// pos: surface point, sunDir: sun direction
float calcShadow(in vec3 pos, in vec3 sunDir) {
float res = 1.0;
float t = 1.0; // Start a short distance along the shadow ray to avoid self-shadowing
for (int i = 0; i < SHADOW_STEPS; i++) {
vec3 p = pos + t * sunDir;
float h = p.y - terrainM(p.xz);
if (h < 0.001) return 0.0; // Full shadow
// Penumbra estimate: smaller h/t = ray closer to occlusion
res = min(res, SHADOW_K * h / t);
t += clamp(h, 2.0, 100.0); // Adaptive step size
}
return clamp(res, 0.0, 1.0);
}
```
### Step 11: Aerial Perspective and Fog
**What**: Blend terrain color toward fog color with increasing distance, achieving an aerial perspective effect.
**Why**: Atmospheric effects are the key visual cue for "pushing" pixels into the distance. Common approaches range from simple to complex:
- **Exponential fog**: `exp(-0.00005 * t^2)` — simplest
- **Exponential + height-decay fog**: `exp(-pow(k*t, 1.5))` — denser at low altitude, thinner at high altitude
- **Wavelength-dependent fog**: `exp(-t * vec3(1,1.5,4) * k)` — blue light attenuates faster, red light travels further, realistic atmospheric dispersion
- **Full Rayleigh+Mie scattering**: physically accurate but expensive
**Code**:
```glsl
#define FOG_DENSITY 0.00025 // Tunable: fog density
#define FOG_HEIGHT 0.001 // Tunable: height decay coefficient (unused in this basic version)
// === Atmospheric Fog ===
// col: original color, t: distance, rd: ray direction
vec3 applyFog(in vec3 col, float t, in vec3 rd) {
// Wavelength-dependent attenuation: blue attenuates 4x faster than red
vec3 extinction = exp(-t * FOG_DENSITY * vec3(1.0, 1.5, 4.0));
// Fog color: base blue-gray + sun direction scattering (warm tones)
float sundot = clamp(dot(rd, SUN_DIR), 0.0, 1.0);
vec3 fogCol = mix(vec3(0.55, 0.55, 0.58), // Base fog color
vec3(1.0, 0.7, 0.3), // Sun scatter color
0.3 * pow(sundot, 8.0));
return col * extinction + fogCol * (1.0 - extinction);
}
```
### Step 12: Sky Rendering
**What**: Draw the background sky, including gradients, sun disk, and horizon glow.
**Why**: The sky is an important component of atmospheric mood. All terrain shaders with 3D viewpoints include sky rendering. Key components:
- Zenith-to-horizon blue→white gradient
- Horizon glow band (`pow(1-rd.y, n)` family)
- Sun disk and halo (`pow(sundot, high power)` family)
**Code**:
```glsl
// === Sky Color ===
vec3 getSky(in vec3 rd) {
// Base sky gradient: zenith blue → horizon white
vec3 col = vec3(0.3, 0.5, 0.85) - rd.y * vec3(0.2, 0.15, 0.0);
// Horizon glow
float horizon = pow(1.0 - max(rd.y, 0.0), 4.0);
col = mix(col, vec3(0.8, 0.75, 0.7), 0.5 * horizon);
// Sun
float sundot = clamp(dot(rd, SUN_DIR), 0.0, 1.0);
col += vec3(1.0, 0.7, 0.3) * 0.3 * pow(sundot, 8.0); // Large halo
col += vec3(1.0, 0.9, 0.7) * 0.5 * pow(sundot, 64.0); // Small halo
col += vec3(1.0, 1.0, 0.9) * min(pow(sundot, 1150.0), 0.3); // Sun disk
return col;
}
```
### Step 13: Camera Setup
**What**: Build a Look-At camera matrix and define a flight path.
**Why**: Terrain flythrough cameras typically follow Lissajous curves or arc paths, with altitude following the terrain. The Look-At matrix maps screen coordinates to world-space ray directions.
**Code**:
```glsl
#define CAM_ALTITUDE 20.0 // Tunable: camera height above ground
#define CAM_SPEED 0.5 // Tunable: flight speed
// === Camera Path ===
vec3 cameraPath(float t) {
return vec3(
100.0 * sin(0.2 * t), // x: sine curve
0.0, // y: determined by terrain height
-100.0 * t // z: forward direction
);
}
// === Camera Matrix ===
mat3 setCamera(in vec3 ro, in vec3 ta) {
vec3 cw = normalize(ta - ro);
vec3 cu = normalize(cross(cw, vec3(0.0, 1.0, 0.0)));
vec3 cv = cross(cu, cw);
return mat3(cu, cv, cw);
}
```
## Common Variants
### Variant 1: Relaxation Marching
**Difference from the base version**: Step size automatically increases with distance, covering greater range but with slightly reduced precision. The conservative factor is replaced with a distance-adaptive relaxation factor, while the height estimate is scaled down to prevent penetration.
**Key code**:
```glsl
#define RELAX_MAX_STEPS 90 // Fewer steps needed to cover greater distance
#define RELAX_FAR 400.0
float raymarchRelax(in vec3 ro, in vec3 rd) {
float t = 0.0;
float d = (ro + rd * t).y - terrainM((ro + rd * t).xz);
for (int i = 0; i < RELAX_MAX_STEPS; i++) {
if (abs(d) < t * 0.0001 || t > RELAX_FAR) break;
float rl = max(t * 0.02, 1.0); // Relaxation factor: larger steps at distance
t += d * rl;
vec3 pos = ro + rd * t;
d = (pos.y - terrainM(pos.xz)) * 0.7; // 0.7 attenuation prevents penetration
}
return t;
}
```
### Variant 2: Sign-Alternating FBM
**Difference from the base version**: Flips the amplitude sign each layer (`w = -w * 0.4`), producing unique alternating ridge/valley patterns. Does not use derivative suppression — the style is distinctly different from the erosion version, producing a more "jagged and twisted" appearance.
**Key code**:
```glsl
float terrainSignFlip(in vec2 p) {
p *= TERRAIN_SCALE;
float a = 0.0;
float w = 1.0; // Initial weight
for (int i = 0; i < TERRAIN_OCTAVES; i++) {
a += w * noise(p);
w = -w * 0.4; // Sign flip + decay: alternating addition and subtraction
p = m2 * p * 2.0;
}
return a * TERRAIN_HEIGHT;
}
```
### Variant 3: Texture-Driven Heightfield + 3D Displacement
**Difference from the base version**: Uses texture sampling as the base heightfield, with 3D FBM displacement layered on top to produce cliffs, caves, and other non-heightfield formations. Requires additional texture channel inputs but can create far more terrain diversity than pure FBM. Marching becomes true SDF sphere tracing.
**Key code**:
```glsl
// 3D Value Noise
float noise3D(in vec3 x) {
vec3 p = floor(x);
vec3 f = fract(x);
f = f * f * (3.0 - 2.0 * f);
// 3D→2D flattening: offset UV by p.z, sample two texture layers and interpolate
vec2 uv = (p.xy + vec2(37.0, 17.0) * p.z) + f.xy;
vec2 rg = textureLod(iChannel0, (uv + 0.5) / 256.0, 0.0).yx;
return mix(rg.x, rg.y, f.z);
}
// 3D FBM Displacement
const mat3 m3 = mat3(0.00, 0.80, 0.60,
-0.80, 0.36,-0.48,
-0.60,-0.48, 0.64);
float displacement(vec3 p) {
float f = 0.5 * noise3D(p); p = m3 * p * 2.02;
f += 0.25 * noise3D(p); p = m3 * p * 2.03;
f += 0.125 * noise3D(p); p = m3 * p * 2.01;
f += 0.0625 * noise3D(p);
return f;
}
// SDF: heightfield + 3D displacement (supports cliffs/caves)
float mapCanyon(vec3 p) {
float h = terrainM(p.xz);
float dis = displacement(0.25 * p * vec3(1.0, 4.0, 1.0)) * 3.0;
return (dis + p.y - h) * 0.25;
}
```
### Variant 4: Directional Erosion Noise
**Difference from the base version**: Uses slope direction as the projection direction for Gabor noise. Each erosion layer adjusts the "water flow direction" based on the previous layer's derivatives, producing realistic dendritic drainage patterns. Requires multi-pass height map precomputation.
**Key code**:
```glsl
#define EROSION_OCTAVES 5
#define EROSION_BRANCH 1.5 // Tunable: branching strength, 0=parallel, 2=strong branching
// Directional Gabor noise
vec3 erosionNoise(vec2 p, vec2 dir) {
vec2 ip = floor(p); vec2 fp = fract(p) - 0.5;
float va = 0.0; float wt = 0.0;
vec2 dva = vec2(0.0);
for (int i = -2; i <= 1; i++)
for (int j = -2; j <= 1; j++) {
vec2 o = vec2(float(i), float(j));
vec2 h = hash2(ip - o) * 0.5; // Grid point random offset
vec2 pp = fp + o + h;
float d = dot(pp, pp);
float w = exp(-d * 2.0); // Gaussian weight
float mag = dot(pp, dir); // Directional projection
va += cos(mag * 6.283) * w; // Directional ripple
dva += -sin(mag * 6.283) * dir * w;
wt += w;
}
return vec3(va, dva) / wt;
}
// Erosion FBM: direction evolves with slope
float terrainErosion(vec2 p, vec2 baseSlope) {
float e = 0.0, a = 0.5;
vec2 dir = normalize(baseSlope + vec2(0.001));
for (int i = 0; i < EROSION_OCTAVES; i++) {
vec3 n = erosionNoise(p * 4.0, dir);
e += a * n.x;
// Branching: curl of previous layer's derivative modifies water flow direction
dir = normalize(dir + n.zy * vec2(1.0, -1.0) * EROSION_BRANCH);
a *= 0.5;
p *= 2.0;
}
return e;
}
```
### Variant 5: Volumetric Clouds + God Rays
**Difference from the base version**: Adds a volumetric cloud layer above the terrain using front-to-back alpha compositing, with god ray factor accumulated during marching. Requires 3D noise and more steps, significantly increasing cost but with excellent visual results.
**Key code**:
```glsl
#define CLOUD_STEPS 64 // Tunable: cloud march steps
#define CLOUD_BASE 200.0 // Tunable: cloud layer base height
#define CLOUD_TOP 300.0 // Tunable: cloud layer top height
vec4 raymarchClouds(vec3 ro, vec3 rd) {
// Calculate intersections with cloud slab
float tmin = (CLOUD_BASE - ro.y) / rd.y;
float tmax = (CLOUD_TOP - ro.y) / rd.y;
if (tmin > tmax) { float tmp = tmin; tmin = tmax; tmax = tmp; } // swap
if (tmin < 0.0) tmin = 0.0;
float t = tmin;
vec4 sum = vec4(0.0); // rgb=color, a=opacity
float rays = 0.0; // God ray accumulation
for (int i = 0; i < CLOUD_STEPS; i++) {
if (sum.a > 0.99 || t > tmax) break;
vec3 pos = ro + t * rd;
// Cloud density: slab shape × FBM carving
float hFrac = (pos.y - CLOUD_BASE) / (CLOUD_TOP - CLOUD_BASE);
float shape = 1.0 - 2.0 * abs(hFrac - 0.5); // Densest in the middle
float den = shape - 1.6 * (1.0 - noise(pos.xz * 0.01)); // Simplified FBM
if (den > 0.0) {
// Cloud lighting: offset sample toward sun direction (self-shadowing)
float shadowDen = shape - 1.6 * (1.0 - noise((pos.xz + SUN_DIR.xz * 30.0) * 0.01));
float shadow = clamp(1.0 - shadowDen * 2.0, 0.0, 1.0);
vec3 cloudCol = mix(vec3(0.4, 0.4, 0.45), vec3(1.0, 0.95, 0.8), shadow);
float alpha = clamp(den * 0.4, 0.0, 1.0);
// God rays: brightness of sunlight passing through thin areas
rays += 0.02 * shadow * (1.0 - sum.a);
// Front-to-back compositing
cloudCol *= alpha;
sum += vec4(cloudCol, alpha) * (1.0 - sum.a);
}
float dt = max(0.5, 0.05 * t);
t += dt;
}
// Add god rays to color
sum.rgb += pow(rays, 3.0) * 0.4 * vec3(1.0, 0.8, 0.7);
return sum;
}
```
## In-Depth Performance Optimization
### 1. LOD Layering (Most Important Optimization)
**Bottleneck**: Each FBM layer requires an independent noise sample; octave count is a direct performance multiplier.
**Optimization**: Use low octaves for ray marching (3-9 layers), high octaves for normal calculation (16 layers), and lowest for camera placement (3 layers). This is standard practice in terrain shaders.
### 2. Upper Bound Clipping (Bounding Plane)
**Bottleneck**: Rays waste iterations stepping through open air.
**Optimization**: Precompute the maximum terrain height and intersect the ray with that plane before starting to march.
```glsl
if (ro.y > maxHeight && rd.y >= 0.0) return -1.0; // Skip entirely
t = (ro.y - maxHeight) / (-rd.y); // Jump to upper bound
```
### 3. Adaptive Precision Threshold
**Bottleneck**: Distant pixels still use near-field precision, wasting iterations.
**Optimization**: Hit threshold grows with distance: `abs(h) < 0.001 * t`. This is common practice, with the coefficient typically ranging from 0.0001 to 0.002.
### 4. Texture Instead of Procedural Noise
**Bottleneck**: Procedural noise requires multiple hash and interpolation operations.
**Optimization**: Pre-bake a 256x256 noise texture and sample with `textureLod`. Provides approximately 2-3x speedup over procedural noise.
### 5. Early Exit
**Bottleneck**: Rays continue iterating after exceeding range.
**Optimization**:
- `t > MAX_DIST` break out
- `alpha > 0.99` break out in volumetric rendering
- `h < 0` immediately return 0 in shadow rays
### 6. Jittered Start
**Bottleneck**: Uniform stepping produces visible banding artifacts.
**Optimization**: Add per-pixel random offset to the starting t: `t += hash(fragCoord) * step_size`. Adds no computational cost but significantly improves visual quality.
## Complete Combination Code Examples
### 1. Terrain + Water Surface
The most common terrain rendering combination. The water surface serves as a fixed y-plane — march the terrain first, and if the ray intersects terrain below the water surface, render underwater effects; otherwise render water surface reflection/refraction.
- Key: Water surface normals use multi-frequency noise perturbation to simulate waves; Fresnel controls reflection/refraction mixing
```glsl
#define WATER_LEVEL 5.0
// Water surface normal (multi-frequency noise perturbation)
vec3 waterNormal(vec2 p, float t) {
float eps = 0.1;
float h0 = noise(p * 0.5 + iTime * 0.3) * 0.5
+ noise(p * 1.5 - iTime * 0.2) * 0.25;
float hx = noise((p + vec2(eps, 0.0)) * 0.5 + iTime * 0.3) * 0.5
+ noise((p + vec2(eps, 0.0)) * 1.5 - iTime * 0.2) * 0.25;
float hz = noise((p + vec2(0.0, eps)) * 0.5 + iTime * 0.3) * 0.5
+ noise((p + vec2(0.0, eps)) * 1.5 - iTime * 0.2) * 0.25;
return normalize(vec3(h0 - hx, eps, h0 - hz));
}
// In the main function:
// 1. Check water surface intersection first
float tWater = (ro.y - WATER_LEVEL) / (-rd.y);
// 2. Compare with terrain intersection
float tTerrain = raymarch(ro, rd);
vec3 col;
if (tWater > 0.0 && (tTerrain < 0.0 || tWater < tTerrain)) {
// Hit water surface
vec3 wpos = ro + tWater * rd;
vec3 wnor = waterNormal(wpos.xz, tWater);
// Fresnel
float fresnel = pow(1.0 - max(dot(-rd, wnor), 0.0), 5.0);
fresnel = 0.02 + 0.98 * fresnel;
// Reflection
vec3 refl = reflect(rd, wnor);
vec3 reflCol = getSky(refl);
// Underwater color
vec3 waterCol = vec3(0.0, 0.04, 0.04);
col = mix(waterCol, reflCol, fresnel);
col = applyFog(col, tWater, rd);
} else if (tTerrain > 0.0) {
// Hit terrain (same as original code)
// ...
}
```
### 2. Terrain + Volumetric Clouds
Render the terrain first to get color and depth, then march the cloud slab along the ray, compositing onto the terrain using front-to-back alpha blending.
- Key: Cloud self-shadowing (offset sampling toward light direction), god ray accumulation
```glsl
// In the main function:
vec3 col;
float t = raymarch(ro, rd);
if (t > 0.0) {
// Render terrain...
vec3 pos = ro + t * rd;
vec3 nor = calcNormal(pos, t);
vec3 mate = getMaterial(pos, nor);
float sha = calcShadow(pos + nor * 0.5, SUN_DIR);
vec3 lin = calcLighting(pos, nor, rd, sha);
col = mate * lin;
col = applyFog(col, t, rd);
} else {
col = getSky(rd);
}
// Overlay volumetric clouds
vec4 clouds = raymarchClouds(ro, rd);
col = col * (1.0 - clouds.a) + clouds.rgb;
```
### 3. Terrain + Volumetric Fog/Dust
Volumetric dust can be added after the main march completes by sampling a 3D FBM density field along the ray with distance-based attenuation. Suitable for desert, volcanic, and similar scenes.
- Key: Step size adapts to density — smaller steps in dense regions
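A minimal sketch of this idea, reusing the 2D `noise()` from earlier steps (a production version would use 3D FBM and density-adaptive steps; all constants here are illustrative):

```glsl
#define DUST_STEPS 24                      // Tunable: dust march steps
#define DUST_COLOR vec3(0.70, 0.60, 0.50)  // Tunable: dust tint
// col: shaded terrain/sky color, tHit: terrain distance (or MAX_DIST on miss)
vec3 applyDust(in vec3 col, in vec3 ro, in vec3 rd, float tHit) {
    float t = 2.0;
    for (int i = 0; i < DUST_STEPS; i++) {
        if (t > tHit) break;               // stop at the terrain surface
        vec3 p = ro + t * rd;
        // Density: thicker near the ground, broken up by noise
        float den = exp(-0.05 * p.y) * noise(p.xz * 0.02);
        float alpha = clamp(den, 0.0, 1.0) * 0.06;
        col = mix(col, DUST_COLOR, alpha); // simple in-scattering accumulation
        t += max(2.0, 0.05 * t);           // larger steps at distance
    }
    return col;
}
```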
### 4. Terrain + SDF Object Placement
SDF ellipsoids can be placed as trees on the terrain. Terrain marching and object marching can be separated or combined. Objects are placed on a 2D grid with hash-based jitter.
- Key: `floor(p.xz/gridSize)` determines the grid cell, `hash(cell)` determines tree position/size
```glsl
#define TREE_GRID 30.0
// Place tree SDFs in a grid
float mapTrees(vec3 p) {
vec2 cell = floor(p.xz / TREE_GRID);
vec2 cellCenter = (cell + 0.5) * TREE_GRID;
// Hash to randomize position
vec2 jitter = (hash2(cell) - 0.5) * TREE_GRID * 0.6;
vec2 treePos = cellCenter + jitter;
// Tree trunk height
float groundH = terrainL(treePos);
// SDF: ellipsoid tree canopy
vec3 treeCenter = vec3(treePos.x, groundH + 8.0, treePos.y);
float treeSize = 4.0 + hash(cell) * 3.0;
vec3 q = (p - treeCenter) / vec3(treeSize, treeSize * 1.5, treeSize);
return (length(q) - 1.0) * treeSize * 0.8;
}
```
### 5. Terrain + Temporal Anti-Aliasing (TAA)
Inter-frame reprojection blending can be used for temporal anti-aliasing. The current frame's camera matrix is stored in buffer pixels, and the next frame uses it to reproject 3D points back to the previous frame's screen coordinates, blending historical colors.
- Key: blend ratio ~10% new frame + 90% history frame, with increased new frame weight in motion areas
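A simplified resolve sketch of this reprojection blend; `prevViewProj` stands in for the previous frame's camera matrix, which in ShaderToy would be reconstructed from matrix rows stored in buffer pixels (hypothetical names, not from the source shaders):

```glsl
#define TAA_BLEND 0.1 // Tunable: new-frame weight (history weight = 0.9)
// currCol: this frame's shaded color, worldPos: primary-ray hit point
// prevViewProj: previous frame's view-projection matrix (read back from buffer)
vec3 taaResolve(in vec3 currCol, in vec3 worldPos, in mat4 prevViewProj) {
    // Reproject the hit point into the previous frame's screen space
    vec4 clip = prevViewProj * vec4(worldPos, 1.0);
    vec2 prevUv = 0.5 + 0.5 * clip.xy / clip.w;
    // Behind camera or off-screen history: fall back to the current frame
    if (clip.w <= 0.0 ||
        any(lessThan(prevUv, vec2(0.0))) || any(greaterThan(prevUv, vec2(1.0))))
        return currCol;
    vec3 hist = texture(iChannel0, prevUv).rgb; // history buffer
    return mix(hist, currCol, TAA_BLEND);
}
```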

# Advanced Texture Mapping Detailed Reference
## Prerequisites
- Screen-space derivatives (`dFdx`, `dFdy`)
- `textureGrad()` function usage
- Basic ray marching
## Triplanar vs Biplanar Cost Analysis
| Aspect | Triplanar | Biplanar |
|--------|-----------|----------|
| Texture fetches | 3 | 2 |
| ALU operations | Lower | Higher (axis selection) |
| Bandwidth | Higher | Lower |
| Visual quality | Baseline | Equivalent (k≥8) |
| Best for | Bandwidth-rich GPUs | Mobile, bandwidth-limited |
Modern GPUs are typically bandwidth-limited rather than ALU-limited, making biplanar the better default choice.
### Weight Remapping Mathematics
The biplanar weight formula `clamp((w - 0.5773) / (1.0 - 0.5773), 0, 1)` ensures:
- At normals aligned with one axis: weight = 1.0 (clean projection)
- At 45° diagonals where 2 axes are equal: smooth transition
- At the cube diagonal (1/√3 ≈ 0.5773): weight = 0.0, but this is the point where the third (discarded) projection would be needed — biplanar's approximation error is maximal here but visually acceptable
### Gradient Propagation
Using `textureGrad()` instead of `texture()` is essential because:
1. Axis selection (`ma`, `me`) creates UV discontinuities at projection boundaries
2. Hardware `texture()` computes mip from implicit derivatives, which spike at discontinuities → visible seams
3. `textureGrad()` with manually propagated `dFdx(p)`, `dFdy(p)` bypasses this, keeping gradients smooth across boundaries
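Putting the weight remapping and gradient propagation together, a biplanar sampler along the lines of Inigo Quilez's published version (treat as a sketch; `k` is the weight-sharpening exponent from the table above):

```glsl
vec4 biplanar(sampler2D sam, in vec3 p, in vec3 n, in float k) {
    // World-space derivatives, propagated manually to textureGrad
    vec3 dpdx = dFdx(p);
    vec3 dpdy = dFdy(p);
    n = abs(n);
    // Major axis (in x; yz are the projection's UV axes)
    ivec3 ma = (n.x > n.y && n.x > n.z) ? ivec3(0, 1, 2) :
               (n.y > n.z)              ? ivec3(1, 2, 0) :
                                          ivec3(2, 0, 1);
    // Minor axis (the projection we discard)
    ivec3 mi = (n.x < n.y && n.x < n.z) ? ivec3(0, 1, 2) :
               (n.y < n.z)              ? ivec3(1, 2, 0) :
                                          ivec3(2, 0, 1);
    // Median axis (second projection we keep)
    ivec3 me = ivec3(3) - mi - ma;
    // Project + fetch with explicit gradients (avoids seams at axis switches)
    vec4 x = textureGrad(sam, vec2(p[ma.y], p[ma.z]),
                              vec2(dpdx[ma.y], dpdx[ma.z]),
                              vec2(dpdy[ma.y], dpdy[ma.z]));
    vec4 y = textureGrad(sam, vec2(p[me.y], p[me.z]),
                              vec2(dpdx[me.y], dpdx[me.z]),
                              vec2(dpdy[me.y], dpdy[me.z]));
    // Blend weights: remap so the discarded axis contributes zero
    vec2 w = vec2(n[ma.x], n[me.x]);
    w = clamp((w - 0.5773) / (1.0 - 0.5773), 0.0, 1.0);
    w = pow(w, vec2(k / 8.0)); // transition sharpness control
    return (x * w.x + y * w.y) / (w.x + w.y);
}
```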
## Ray Differential Mathematics
### Problem Statement
In rasterization, `dFdx`/`dFdy` of texture coordinates work naturally because adjacent pixels map to nearby surface points. In ray marching, adjacent pixels may hit completely different objects → broken mip selection.
### Solution: Tangent Plane Intersection
Given:
- Primary ray hits surface at `pos` with normal `nor`
- Neighbor pixel ray `rd_neighbor` originates from `ro_neighbor`
The neighbor ray's intersection with the tangent plane at `pos`:
```
t_neighbor = dot(pos - ro_neighbor, nor) / dot(rd_neighbor, nor)
pos_neighbor = ro_neighbor + rd_neighbor * t_neighbor
```
The difference `pos_neighbor - pos` gives the world-space footprint of one pixel at the hit point.
### For Perspective Cameras (Common Case)
```
ro is the same for all pixels, only rd varies:
dposdx = t * (rdx * dot(rd, nor) / dot(rdx, nor) - rd)
dposdy = t * (rdy * dot(rd, nor) / dot(rdy, nor) - rd)
```
Where `rdx = rd + dFdx(rd)` and `rdy = rd + dFdy(rd)`.
### Chain Rule for Texture Coordinates
If texture mapping function is `uv = f(pos)`:
```
duvdx = Jacobian(f) × dposdx
duvdy = Jacobian(f) × dposdy
```
For simple planar mapping `uv = pos.xz`:
```
duvdx = dposdx.xz
duvdy = dposdy.xz
```
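The perspective-camera formulas above translate directly to GLSL (a sketch; `rdx`/`rdy` are the neighbor-pixel ray directions and the helper name is mine):

```glsl
// Per-pixel world-space footprint on the tangent plane at the hit point.
// ro is shared by all pixels; rd/rdx/rdy: center and neighbor ray directions;
// t: hit distance; nor: surface normal at the hit.
void calcDpDxy(in vec3 ro, in vec3 rd, in vec3 rdx, in vec3 rdy,
               in float t, in vec3 nor,
               out vec3 dposdx, out vec3 dposdy) {
    dposdx = t * (rdx * dot(rd, nor) / dot(rdx, nor) - rd);
    dposdy = t * (rdy * dot(rd, nor) / dot(rdy, nor) - rd);
}
// Usage with planar mapping uv = pos.xz:
//   vec3 dpx, dpy;
//   calcDpDxy(ro, rd, rdx, rdy, t, nor, dpx, dpy);
//   vec4 col = textureGrad(iChannel0, pos.xz, dpx.xz, dpy.xz);
```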
## Texture Repetition Theory
### Why Tiling is Visible
Human vision excels at detecting:
1. **Periodic patterns**: Regular grid alignment
2. **Unique features**: Distinctive spots/marks that repeat identically
3. **Phase alignment**: All tiles start at the same phase
### Breaking Repetition
Each method targets different cues:
- **Random offset** (Method A): Breaks phase alignment, 4 fetches
- **Voronoi blend**: Breaks grid structure entirely, 9 fetches (expensive)
- **Virtual pattern** (Method B): Breaks unique features cheaply, 2 fetches
Method B is preferred for real-time use — the low-frequency index variation is cache-friendly and the two texture fetches share locality.
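Method B can be sketched as follows (after Inigo Quilez's texture-repetition technique; the per-tile index here comes from a cheap low-frequency noise lookup, assumed bound at iChannel1):

```glsl
float sum3(vec3 v) { return v.x + v.y + v.z; }
vec3 textureNoTile(sampler2D samp, in vec2 uv) {
    // Low-frequency virtual pattern index (cache-friendly lookup)
    float k = texture(iChannel1, 0.005 * uv).x;
    float index = k * 8.0;
    float i = floor(index);
    float f = fract(index);
    // Two virtual tiles: pseudo-random UV offsets derived from the index
    vec2 offa = sin(vec2(3.0, 7.0) * (i + 0.0));
    vec2 offb = sin(vec2(3.0, 7.0) * (i + 1.0));
    // Propagate gradients so the offset jump doesn't break mip selection
    vec2 dx = dFdx(uv), dy = dFdy(uv);
    vec3 cola = textureGrad(samp, uv + offa, dx, dy).xyz;
    vec3 colb = textureGrad(samp, uv + offb, dx, dy).xyz;
    // Blend, steering the threshold by color difference to hide the seam
    return mix(cola, colb, smoothstep(0.2, 0.8, f - 0.1 * sum3(cola - colb)));
}
```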

# Texture Sampling Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, mathematical derivations, variant details, and complete combination code examples.
## Prerequisites
- **GLSL Basic Syntax**: `vec2`/`vec3`/`vec4`, `uniform sampler2D`, and other types and declarations
- **UV Coordinate System**: `fragCoord / iResolution.xy` normalizes to `[0,1]`, with origin at the bottom-left corner
- **Mipmap Concept**: A multi-resolution pyramid of the texture, with each level at half the resolution. The GPU automatically selects the appropriate level based on screen-space derivatives to avoid aliasing
- **ShaderToy Multi-Pass Architecture**: Image pass is the final output, Buffer A/B/C/D are intermediate computation passes, bound to textures or buffers via `iChannel0~3`
## Implementation Steps
### Step 1: Basic Texture Sampling and UV Normalization
**What**: Convert screen pixel coordinates to UV coordinates and read texture data.
**Why**: `texture()` accepts UV coordinates in the `[0,1]` range. ShaderToy provides pixel coordinates `fragCoord`, which need to be normalized by dividing by the resolution.
```glsl
// Normalize UV
vec2 uv = fragCoord / iResolution.xy;
// Basic texture sampling (hardware bilinear filtering)
vec4 col = texture(iChannel0, uv);
```
Hardware bilinear filtering automatically performs linear interpolation between the nearest 4 texels. When the UV lands exactly at a texel center, the exact value is returned; when it falls between texels, a weighted average of the surrounding four points is returned.
### Step 2: Using textureLod to Control Mipmap Level
**What**: Explicitly specify the LOD level to control sampling resolution, achieving blur or avoiding automatic mip selection in ray marching.
**Why**: In ray marching, the GPU cannot correctly estimate screen-space derivatives, which leads to incorrect mip level selection and artifacts. Using `textureLod(..., 0.0)` forces sampling at the highest resolution level; using higher LOD values produces blur effects (e.g., depth of field, bloom).
Physical meaning of LOD values:
- `lod = 0.0`: Original resolution (mip 0)
- `lod = 1.0`: Half resolution (mip 1), equivalent to a 2x2 area average
- `lod = N`: Resolution is 1/2^N of the original
```glsl
// In ray marching: force LOD 0 to avoid artifacts (from Campfire at night)
vec3 groundCol = textureLod(iChannel2, groundUv * 0.05, 0.0).rgb;
// Depth of field blur: LOD varies with distance (from Heartfelt)
float focus = mix(maxBlur - coverage, minBlur, smoothstep(.1, .2, coverage));
vec3 col = textureLod(iChannel0, uv + normal, focus).rgb;
// Bloom: explicitly sample high mip levels (from Campfire at night)
#define BLOOM_LOD_A 4.0 // Adjustable: bloom first layer mip level
#define BLOOM_LOD_B 5.0 // Adjustable: bloom second layer mip level
#define BLOOM_LOD_C 6.0 // Adjustable: bloom third layer mip level
vec3 bloom = vec3(0.0);
bloom += textureLod(iChannel0, uv + off * exp2(BLOOM_LOD_A), BLOOM_LOD_A).rgb;
bloom += textureLod(iChannel0, uv + off * exp2(BLOOM_LOD_B), BLOOM_LOD_B).rgb;
bloom += textureLod(iChannel0, uv + off * exp2(BLOOM_LOD_C), BLOOM_LOD_C).rgb;
bloom /= 3.0;
```
### Step 3: Using texelFetch for Exact Pixel Data Access
**What**: Read the value of a specific texel using integer coordinates, bypassing all filtering.
**Why**: When textures are used as data storage (game state, precomputed LUTs, keyboard input), exact values of specific pixels must be read — hardware filtering would corrupt data integrity. `texelFetch` uses `ivec2` integer coordinates instead of `vec2` float UVs, accessing pixels directly by address, similar to array indexing.
```glsl
// Define data storage addresses (from Bricks Game)
const ivec2 txBallPosVel = ivec2(0, 0);
const ivec2 txPaddlePos = ivec2(1, 0);
const ivec2 txPoints = ivec2(2, 0);
const ivec2 txState = ivec2(3, 0);
// Read stored data
vec4 loadValue(in ivec2 addr) {
return texelFetch(iChannel0, addr, 0);
}
// Write data (in buffer pass)
void storeValue(in ivec2 addr, in vec4 val, inout vec4 fragColor, in ivec2 fragPos) {
fragColor = (fragPos == addr) ? val : fragColor;
}
// Read keyboard input (ShaderToy keyboard texture)
float key = texelFetch(iChannel1, ivec2(KEY_SPACE, 0), 0).x;
```
### Step 4: Manual Bilinear Interpolation + Quintic Hermite Smoothing
**What**: Bypass hardware bilinear filtering by manually sampling 4 texels and interpolating with a quintic Hermite polynomial for C² continuity.
**Why**: Hardware bilinear interpolation is linear (C⁰ continuous), which produces visible grid-like seams when layering noise FBM. Quintic Hermite interpolation has zero first and second derivatives at sample points, eliminating these artifacts.
**Mathematical Derivation**:
Standard bilinear interpolation uses linear weight `u = f` (where `f = fract(x)`), which causes derivative discontinuity at boundaries.
Quintic Hermite polynomial: `u = f³(6f² - 15f + 10)`
Verifying C² continuity:
- `u(0) = 0`, `u(1) = 1` — Correct interpolation boundaries
- `u'(f) = 30f²(f-1)²``u'(0) = 0`, `u'(1) = 0` — First derivative is zero at boundaries
- `u''(f) = 60f(f-1)(2f-1)``u''(0) = 0`, `u''(1) = 0` — Second derivative is zero at boundaries
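The boundary conditions above can be verified numerically. A quick Python check (outside the shader; the function names are just for this sketch):

```python
# Quintic Hermite u(f) and its derivatives, transcribed from the
# derivation above; evaluating at f = 0 and f = 1 confirms C2 continuity.
def u(f):
    return f * f * f * (f * (f * 6.0 - 15.0) + 10.0)  # 6f^5 - 15f^4 + 10f^3

def du(f):
    return 30.0 * f * f * (f - 1.0) * (f - 1.0)       # u'(f) = 30 f^2 (f-1)^2

def ddu(f):
    return 60.0 * f * (f - 1.0) * (2.0 * f - 1.0)     # u''(f) = 60 f (f-1)(2f-1)

boundary_values = (u(0.0), u(1.0), du(0.0), du(1.0), ddu(0.0), ddu(1.0))
```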
```glsl
// Manual four-point sampling + quintic Hermite interpolation (from up in the cloud sea)
float noise(vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
// Quintic Hermite smoothing (C2 continuous)
vec2 u = f * f * f * (f * (f * 6.0 - 15.0) + 10.0);
// Manual sampling of four corner points (divided by texture resolution for normalization)
#define TEX_RES 1024.0 // Adjustable: noise texture resolution
float a = texture(iChannel0, (p + vec2(0.0, 0.0)) / TEX_RES).x;
float b = texture(iChannel0, (p + vec2(1.0, 0.0)) / TEX_RES).x;
float c = texture(iChannel0, (p + vec2(0.0, 1.0)) / TEX_RES).x;
float d = texture(iChannel0, (p + vec2(1.0, 1.0)) / TEX_RES).x;
// Bilinear blending
return a + (b - a) * u.x + (c - a) * u.y + (a - b - c + d) * u.x * u.y;
}
```
### Step 5: FBM (Fractional Brownian Motion) Noise from Textures
**What**: Build multi-scale procedural noise by layering multiple texture samples at different frequencies.
**Why**: A single noise sample lacks the multi-scale detail found in nature. FBM simulates the 1/f spectral characteristics of natural textures by layering at doubling frequencies with halving amplitudes. Most natural textures (terrain, clouds, rocks) exhibit 1/f noise characteristics — low frequencies contain most of the energy, high frequencies add detail.
FBM formula: `fbm(x) = Σ (a₀ × persistence^i × noise(2^i × x))` for i = 0..N-1, where `a₀` is the initial amplitude (0.5 in the code below); dividing by the accumulated weight renormalizes the result
Parameter effects:
- **OCTAVES (number of layers)**: More layers add more detail, but each additional layer adds one complete noise call
- **PERSISTENCE**: Controls the amplitude decay rate at higher frequencies. 0.5 is the classic value; higher values (0.6-0.7) produce rougher textures; lower values (0.3-0.4) produce smoother textures
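The weight series can be sketched outside the shader. This Python snippet mirrors the GLSL loop below (initial amplitude 0.5, persistence 0.5, 5 octaves):

```python
# Octave weight of layer i is a0 * persistence**i; dividing the
# accumulated value by the total weight renormalizes fbm back to the
# output range of a single noise() call.
def fbm_weights(octaves, persistence, a0=0.5):
    w = [a0 * persistence ** i for i in range(octaves)]
    return w, sum(w)

weights, total = fbm_weights(5, 0.5)  # total = 0.96875
```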
```glsl
#define FBM_OCTAVES 5 // Adjustable: number of layers, more = richer detail
#define FBM_PERSISTENCE 0.5 // Adjustable: amplitude decay rate, higher = stronger high-frequency detail
float fbm(vec2 x) {
float v = 0.0;
float a = 0.5; // Initial amplitude
float totalWeight = 0.0;
for (int i = 0; i < FBM_OCTAVES; i++) {
v += a * noise(x);
totalWeight += a;
x *= 2.0; // Double frequency
a *= FBM_PERSISTENCE;
}
return v / totalWeight;
}
```
### Step 6: Separable Gaussian Blur (Multi-Pass Convolution)
**What**: Decompose a 2D Gaussian blur into horizontal and vertical passes, each performing a 1D convolution.
**Why**: A direct NxN 2D convolution requires N² samples; after separation, only 2N are needed. This leverages the separability of the Gaussian kernel — a 2D Gaussian function can be decomposed into the product of two 1D Gaussian functions: `G(x,y) = G(x) × G(y)`. `fract()` wraps coordinates to implement torus boundary conditions, avoiding edge artifacts.
Optimization trick: Leveraging the "free" interpolation of hardware bilinear filtering — sampling between two texels gives a single `texture()` call the weighted average of both texels, achieving an N-tap effect with `(N+1)/2` samples.
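The tap-merging identity can be verified numerically. A Python sketch (outside the shader; `merge_taps` and `lerp_sample` are hypothetical helpers for this check only) collapses the 9-tap kernel below into 5 bilinear taps and confirms both produce the same result on a 1D signal:

```python
# Sampling between texels i and i+1 at fractional offset w2/(w1+w2)
# makes one bilinear fetch return (w1*tex[i] + w2*tex[i+1])/(w1+w2).
weights = [0.05, 0.09, 0.12, 0.15, 0.16, 0.15, 0.12, 0.09, 0.05]
offsets = list(range(-4, 5))

def merge_taps(weights, offsets):
    taps = [(weights[4], 0.0)]            # the center tap stays discrete
    for i in (0, 2, 5, 7):                # pairs (-4,-3), (-2,-1), (1,2), (3,4)
        w = weights[i] + weights[i + 1]
        o = (weights[i] * offsets[i] + weights[i + 1] * offsets[i + 1]) / w
        taps.append((w, o))
    return taps

def lerp_sample(signal, x):
    # Emulates hardware bilinear filtering on a 1D signal
    i = int(x)
    f = x - i
    return signal[i] * (1.0 - f) + signal[i + 1] * f

signal = [float((i * 7) % 5) for i in range(16)]  # arbitrary test data
center = 8
full_9tap = sum(w * signal[center + o] for w, o in zip(weights, offsets))
taps = merge_taps(weights, offsets)
fast_5tap = sum(w * lerp_sample(signal, center + o) for w, o in taps)
```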
```glsl
// Horizontal blur pass (from expansive reaction-diffusion)
#define BLUR_RADIUS 4 // Blur radius (kernel width = 2*BLUR_RADIUS+1; the 9-tap weights below assume radius 4)
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 d = vec2(1.0 / iResolution.x, 0.0); // Horizontal step
// 9-tap Gaussian weights (sigma ≈ 2.0)
float w[9] = float[9](0.05, 0.09, 0.12, 0.15, 0.16, 0.15, 0.12, 0.09, 0.05);
vec4 col = vec4(0.0);
for (int i = -BLUR_RADIUS; i <= BLUR_RADIUS; i++) {
col += w[i + 4] * texture(iChannel0, fract(uv + float(i) * d));
}
col /= 0.98; // Normalize: the nine weights above sum to 0.98
fragColor = col;
}
// Vertical blur pass: change d to vec2(0.0, 1.0/iResolution.y)
```
### Step 7: Dispersion Sampling (Wavelength-Dependent Displacement)
**What**: Sample a texture multiple times along a displacement vector with different offsets, weighted by spectral response curves, to simulate prismatic dispersion.
**Why**: Different wavelengths of real light have different refractive indices, causing spatial color separation. By progressively offsetting UV along the displacement direction and accumulating with different weights per RGB channel, this physical phenomenon can be simulated.
Design principles of spectral response weights:
- **Red channel** `t²`: Enhanced at the long wavelength end; red light is at the far end of the spectrum
- **Green channel** `46.6666 × ((1-t) × t)³`: Peak at middle wavelengths, simulating the human eye's greatest sensitivity to green
- **Blue channel** `(1-t)²`: Enhanced at the short wavelength end; blue light is at the near end of the spectrum
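These curves can be evaluated outside the shader to confirm the intended behavior: blue dominates at t=0, red at t=1, and green peaks mid-spectrum.

```python
# Spectral response curves from sampleWeights(), as plain Python
def sample_weights(t):
    return (t * t,                            # red: long-wavelength end
            46.6666 * ((1.0 - t) * t) ** 3,   # green: mid-spectrum peak
            (1.0 - t) ** 2)                   # blue: short-wavelength end

at_start = sample_weights(0.0)  # pure blue end of the spectrum
at_mid = sample_weights(0.5)    # green peak
at_end = sample_weights(1.0)    # pure red end of the spectrum
```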
```glsl
#define DISP_SAMPLES 64 // Adjustable: dispersion sample count, more = smoother
// Spectral response weights (simulating human eye cone response)
vec3 sampleWeights(float i) {
return vec3(
i * i, // Red: long wavelength enhancement
46.6666 * pow((1.0 - i) * i, 3.0), // Green: middle wavelength peak
(1.0 - i) * (1.0 - i) // Blue: short wavelength enhancement
);
}
// Dispersion sampling
vec3 sampleDisp(sampler2D tex, vec2 uv, vec2 disp) {
vec3 col = vec3(0.0);
vec3 totalWeight = vec3(0.0);
for (int i = 0; i < DISP_SAMPLES; i++) {
float t = float(i) / float(DISP_SAMPLES);
vec3 w = sampleWeights(t);
col += w * texture(tex, fract(uv + disp * t)).rgb;
totalWeight += w;
}
return col / totalWeight;
}
```
### Step 8: IBL Environment Sampling (textureLod + Roughness Mapping)
**What**: Select the cubemap mipmap level based on surface roughness for image-based lighting.
**Why**: In PBR, rough surfaces need to gather lighting from a wider range of the environment (equivalent to a blurred environment map). High mipmap levels naturally correspond to blurred versions of the environment map, so roughness can be directly mapped to LOD level. This is the split-sum approximation method popularized by Epic Games in UE4.
Complete split-sum IBL workflow:
1. Pre-filter environment map: different roughness values correspond to different mip levels
2. Pre-compute BRDF LUT: `vec2(NdotV, roughness)` -> `vec2(scale, bias)`
3. Final compositing: `specular = envColor * (F * brdf.x + brdf.y)`
```glsl
#define MAX_LOD 7.0 // Adjustable: cubemap maximum mip level
#define DIFFUSE_LOD 6.5 // Adjustable: diffuse sampling LOD (near the blurriest level)
// Specular IBL (from Old watch)
vec3 getSpecularLightColor(vec3 N, float roughness) {
vec3 raw = textureLod(iChannel0, N, roughness * MAX_LOD).rgb;
return pow(raw, vec3(4.5)) * 6.5; // HDR approximation boost
}
// Diffuse irradiance IBL
vec3 getDiffuseLightColor(vec3 N) {
return textureLod(iChannel0, N, DIFFUSE_LOD).rgb;
}
// BRDF LUT query (precomputed split-sum approximation)
vec2 brdf = texture(iChannel3, vec2(NdotV, roughness)).rg;
vec3 specular = envColor * (F * brdf.x + brdf.y);
```
## Variant Details
### Variant 1: Anisotropic Flow Field Blur
**Difference from basic version**: Instead of uniform Gaussian blur, performs directional blur along a noise-driven direction field, producing a flowing brushstroke effect. The direction field can come from a noise texture, velocity field, or user-defined vector field. The parabolic weight `4h(1-h)` makes the blur strongest at the path center and weakest at both ends, producing a more natural trailing effect.
```glsl
#define BLUR_ITERATIONS 32 // Adjustable: number of samples along flow field
#define BLUR_STEP 0.008 // Adjustable: UV offset per step
vec3 flowBlur(vec2 uv) {
vec3 col = vec3(0.0);
float acc = 0.0;
for (int i = 0; i < BLUR_ITERATIONS; i++) {
float h = float(i) / float(BLUR_ITERATIONS);
float w = 4.0 * h * (1.0 - h); // Parabolic weight
col += w * texture(iChannel0, uv).rgb;
acc += w;
// Direction from noise texture (or other vector field)
vec2 dir = texture(iChannel1, uv).xy * 2.0 - 1.0;
uv += BLUR_STEP * dir;
}
return col / acc;
}
```
### Variant 2: Texture as Data Storage (Buffer-as-Data)
**Difference from basic version**: Textures store structured data (positions, velocities, state) instead of colors, using `texelFetch` for exact reads to achieve inter-frame persistent state.
The key to this pattern is the "address-value" mapping: each pixel coordinate is an "address", and the `vec4` is the stored "value". In a buffer pass, the shader executes for every pixel, but only writes a new value when `fragPos == addr`; all other pixels retain their old values. This implements selective writing.
Applicable scenarios: Game state (health, score, position), particle system parameters, physics simulation global variables.
```glsl
// Address definitions
const ivec2 txPosition = ivec2(0, 0);
const ivec2 txVelocity = ivec2(1, 0);
const ivec2 txState = ivec2(2, 0);
// Data read/write interface
vec4 load(ivec2 addr) { return texelFetch(iChannel0, addr, 0); }
void store(ivec2 addr, vec4 val, inout vec4 fragColor, ivec2 fragPos) {
fragColor = (fragPos == addr) ? val : fragColor;
}
// Usage in mainImage
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
ivec2 p = ivec2(fragCoord);
fragColor = texelFetch(iChannel0, p, 0); // Default: keep old value
vec4 pos = load(txPosition);
vec4 vel = load(txVelocity);
// ... update logic ...
store(txPosition, pos + vel * 0.016, fragColor, p);
store(txVelocity, vel, fragColor, p);
}
```
### Variant 3: Chromatic Dispersion
**Difference from basic version**: Samples multiple times along a displacement vector, each at a different offset with wavelength-dependent weighted RGB accumulation, producing a prismatic dispersion effect. `DISP_STRENGTH` controls the spatial range of dispersion — larger values produce more pronounced RGB separation.
```glsl
#define DISP_SAMPLES 64 // Adjustable: sample count
#define DISP_STRENGTH 0.05 // Adjustable: dispersion strength
vec3 dispersion(vec2 uv, vec2 displacement) {
vec3 col = vec3(0.0);
vec3 w_total = vec3(0.0);
for (int i = 0; i < DISP_SAMPLES; i++) {
float t = float(i) / float(DISP_SAMPLES);
vec3 w = vec3(t * t, 46.6666 * pow((1.0 - t) * t, 3.0), (1.0 - t) * (1.0 - t));
col += w * texture(iChannel0, fract(uv + displacement * t * DISP_STRENGTH)).rgb;
w_total += w;
}
return col / w_total;
}
```
### Variant 4: Triplanar Texture Mapping
**Difference from basic version**: For 3D surfaces, samples textures using three projection directions (X/Y/Z axes) and blends by normal weights, avoiding seam issues with traditional UV mapping.
`TRIPLANAR_SHARPNESS` controls the blend transition sharpness: higher values produce sharper transitions between projection faces; a value of 1.0 provides the smoothest but potentially blurry transitions. Typical values are 2.0-4.0.
Applicable scenarios: Procedural terrain (where UV unwrapping cannot be done in advance), geometry generated by SDF ray marching.
```glsl
#define TRIPLANAR_SHARPNESS 2.0 // Adjustable: blend sharpness
vec3 triplanarSample(sampler2D tex, vec3 pos, vec3 normal, float scale) {
vec3 w = pow(abs(normal), vec3(TRIPLANAR_SHARPNESS));
w /= (w.x + w.y + w.z); // Normalize weights
vec3 xSample = texture(tex, pos.yz * scale).rgb;
vec3 ySample = texture(tex, pos.xz * scale).rgb;
vec3 zSample = texture(tex, pos.xy * scale).rgb;
return xSample * w.x + ySample * w.y + zSample * w.z;
}
```
### Variant 5: Temporal Reprojection (TAA)
**Difference from basic version**: Calculates the current frame pixel's UV position in the previous frame, samples the previous frame data from the buffer, and blends to achieve temporal anti-aliasing or accumulation effects.
`TAA_BLEND` controls the history frame weight: higher values (e.g., 0.95) provide better temporal stability but more motion trailing; lower values (e.g., 0.8) provide faster response but more flickering. The clamp operation prevents ghosting — when the history color exceeds the current frame's neighborhood range, it indicates a large scene change, and history weight should be reduced.
```glsl
#define TAA_BLEND 0.9 // Adjustable: history frame blend ratio (higher = smoother but more trailing)
vec3 temporalBlend(vec2 currUv, vec2 prevUv, vec3 currColor) {
vec3 history = textureLod(iChannel0, prevUv, 0.0).rgb;
// Simple clamp to prevent ghosting
vec3 minCol = currColor - 0.1;
vec3 maxCol = currColor + 0.1;
history = clamp(history, minCol, maxCol);
return mix(currColor, history, TAA_BLEND);
}
```
## Performance Optimization Details
### Bottleneck 1: Texture Sampling Bandwidth
- **Problem**: A large number of `texture()` calls (e.g., 64 dispersion samples) is a GPU bandwidth-intensive operation
- **Optimization**: Reduce sample count and compensate with smarter weight functions; use mipmap (`textureLod` at high LOD) to reduce cache misses
- **Details**: GPU texture cache works in cache lines; cache hit rates are high when adjacent pixels access similar texture regions. Higher LOD level textures are smaller and more likely to fit entirely in cache. For dispersion sampling, consider performing dispersion in a low-resolution buffer first, then bilinearly upsampling
### Bottleneck 2: Separable Blur
- **Problem**: A 2D Gaussian blur requires N² samples
- **Optimization**: Always use a separable two-pass approach (horizontal + vertical), reducing complexity from O(N²) to O(2N)
- **Advanced trick**: Leverage hardware bilinear filtering's "free" interpolation — sampling between two texels causes the hardware to automatically return the weighted average, achieving an N-tap effect with `(N+1)/2` samples. For example, a 9-tap Gaussian requires only 5 texture samples
### Bottleneck 3: Mip Selection in Ray Marching
- **Problem**: The GPU's screen-space derivatives (`dFdx`/`dFdy`) are incorrect inside ray march loops, because adjacent pixels may be at completely different ray march steps, causing incorrect automatic mip level selection
- **Optimization**: Use `textureLod(..., 0.0)` in all texture queries within ray march loops to force the base level
- **Alternative**: If mipmap anti-aliasing is needed, manually compute the LOD: estimate screen-space coverage based on ray length and surface tilt angle, then convert to LOD with `log2()`
### Bottleneck 4: Manual Interpolation for High-Frequency Noise
- **Problem**: Manual four-point sampling + Hermite interpolation is approximately 4x slower than hardware bilinear (4 `texture()` calls + math vs. 1 hardware-filtered `texture()` call)
- **Optimization**: Only use it when the visual difference is noticeable (first 1-2 octaves of FBM); higher-frequency octaves can fall back to `texture()` since the difference is no longer visible
- **Tradeoff**: For a 6-octave FBM, using Hermite for the first 2 octaves (8 samples) and hardware bilinear for the last 4 (4 samples) totals 12 samples — half of the 24 samples needed for full Hermite
### Bottleneck 5: Multi-Buffer Feedback Latency
- **Problem**: Each buffer in a multi-pass feedback loop adds one frame of latency (because a buffer's output is only readable in the next frame)
- **Optimization**: Combine mergeable operations into a single pass whenever possible; use `texelFetch` instead of `texture` to read buffer data to avoid unnecessary filtering overhead
- **Architecture suggestion**: When designing buffer topology, minimize feedback chain length. If A→B→C→A forms a three-frame delay loop, consider whether B and C can be merged into a single pass
## Complete Combination Code Examples
### Combining with SDF Ray Marching
Texture sampling provides surface detail for SDF scenes: sampling noise textures for displacement mapping, material lookup. Key: `textureLod(..., 0.0)` must be used inside ray march loops.
```glsl
// Using texture noise for detail displacement in an SDF scene
float map(vec3 p) {
float d = length(p) - 1.0; // Base sphere SDF
// Texture noise displacement (must use textureLod inside ray march)
float n = textureLod(iChannel0, p.xz * 0.5, 0.0).x;
d += n * 0.1; // Surface detail
return d;
}
// Material query also uses textureLod
vec3 getMaterial(vec3 p, vec3 n) {
// Triplanar mapping for material color
vec3 w = pow(abs(n), vec3(2.0));
w /= (w.x + w.y + w.z);
vec3 col = textureLod(iChannel1, p.yz * 0.5, 0.0).rgb * w.x
+ textureLod(iChannel1, p.xz * 0.5, 0.0).rgb * w.y
+ textureLod(iChannel1, p.xy * 0.5, 0.0).rgb * w.z;
return col;
}
```
### Combining with Procedural Noise (Domain Warping)
Texture-based noise (manual Hermite + FBM) serves as the driver for domain warping, used to generate terrain, clouds, flames, and other natural effects. Texture noise is faster than pure mathematical noise (one texture sample vs. multiple hash calculations).
```glsl
// Domain warping: use FBM to warp FBM's input coordinates
float domainWarp(vec2 p) {
// First warping layer
vec2 q = vec2(fbm(p + vec2(0.0, 0.0)),
fbm(p + vec2(5.2, 1.3)));
// Second warping layer (more complex effect)
vec2 r = vec2(fbm(p + 4.0 * q + vec2(1.7, 9.2)),
fbm(p + 4.0 * q + vec2(8.3, 2.8)));
return fbm(p + 4.0 * r);
}
```
### Combining with Post-Processing Pipeline
Multi-LOD sampling for bloom, separable Gaussian blur for depth of field, dispersion sampling for chromatic aberration. These techniques can be chained into a complete post-processing pipeline.
```glsl
// Complete post-processing chain (single-pass simplified version)
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
// 1. Read scene color (from Buffer A)
vec3 col = texture(iChannel0, uv).rgb;
// 2. Bloom (multi-LOD sampling)
vec3 bloom = vec3(0.0);
bloom += textureLod(iChannel0, uv, 4.0).rgb * 0.5;
bloom += textureLod(iChannel0, uv, 5.0).rgb * 0.3;
bloom += textureLod(iChannel0, uv, 6.0).rgb * 0.2;
col += bloom * 0.3;
// 3. Chromatic aberration (simplified 3-tap)
vec2 dir = uv - 0.5;
float strength = length(dir) * 0.02;
col.r = texture(iChannel0, uv + dir * strength).r;
col.b = texture(iChannel0, uv - dir * strength).b;
// 4. Tone mapping (Filmic)
col = (col * (6.2 * col + 0.5)) / (col * (6.2 * col + 1.7) + 0.06);
// 5. Vignette
col *= 0.5 + 0.5 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.2);
fragColor = vec4(col, 1.0);
}
```
### Combining with PBR/IBL Lighting
`textureLod` samples the cubemap by roughness for image-based lighting, combined with a precomputed BRDF LUT (queried via `texelFetch` or `texture`), forming a complete split-sum IBL pipeline.
```glsl
// Complete IBL lighting computation
vec3 computeIBL(vec3 N, vec3 V, vec3 albedo, float roughness, float metallic) {
float NdotV = max(dot(N, V), 0.0);
vec3 R = reflect(-V, N);
// Fresnel (Schlick approximation)
vec3 F0 = mix(vec3(0.04), albedo, metallic);
vec3 F = F0 + (1.0 - F0) * pow(1.0 - NdotV, 5.0);
// Specular: sample pre-filtered environment map by roughness
vec3 specEnv = textureLod(iChannel0, R, roughness * 7.0).rgb;
specEnv = pow(specEnv, vec3(4.5)) * 6.5; // HDR approximation
// BRDF LUT query
vec2 brdf = texture(iChannel3, vec2(NdotV, roughness)).rg;
vec3 specular = specEnv * (F * brdf.x + brdf.y);
// Diffuse irradiance
vec3 diffEnv = textureLod(iChannel0, N, 6.5).rgb;
vec3 kD = (1.0 - F) * (1.0 - metallic);
vec3 diffuse = kD * albedo * diffEnv;
return diffuse + specular;
}
```
### Combining with Simulation/Feedback Systems
Multi-buffer texture sampling for reaction-diffusion, fluid simulation, and other iterative systems. Buffer A stores state, Buffer B/C perform separable blur diffusion, and the Image pass handles final visualization. `fract()` wraps coordinates for torus boundaries.
```glsl
// Buffer A: Reaction-diffusion state update
// iChannel0: Buffer A itself (feedback)
// iChannel1: Buffer B (result after horizontal blur)
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 px = 1.0 / iResolution.xy;
// Read current state and diffused state
vec2 state = texelFetch(iChannel0, ivec2(fragCoord), 0).xy;
vec2 diffused = texture(iChannel1, uv).xy; // After separable blur
// Gray-Scott reaction-diffusion
float a = diffused.x;
float b = diffused.y;
float feed = 0.037;
float kill = 0.06;
float da = 1.0 * (diffused.x - state.x) - a * b * b + feed * (1.0 - a);
float db = 0.5 * (diffused.y - state.y) + a * b * b - (kill + feed) * b;
state += vec2(da, db) * 0.9;
state = clamp(state, 0.0, 1.0);
fragColor = vec4(state, 0.0, 1.0);
}
```

# Volumetric Rendering — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL Fundamentals**: uniforms, varyings, built-in functions
- **Vector Math**: dot product, cross product, normalize
- **Ray Representation**: `P = ro + t * rd` (ray origin + t × ray direction)
- **Noise Function Basics**: value noise, Perlin noise, fBM (Fractal Brownian Motion)
- **Basic Optical Concepts**:
- Transmittance: the fraction of light remaining after passing through a medium
- Scattering: light changing direction within a medium
- Absorption: light energy being converted to heat by the medium
## Core Principles
The core of volumetric rendering is **Ray Marching**: along each view ray, advancing with fixed or adaptive step sizes, querying medium density at each sample point, and accumulating color and opacity.
### Key Mathematical Formulas
#### 1. Beer-Lambert Transmittance Law
Transmittance of light passing through a medium of thickness `d` with extinction coefficient `σe`:
```
T = exp(-σe × d)
```
Where `σe = σs + σa` (scattering coefficient + absorption coefficient).
**Physical meaning**: the larger the extinction coefficient or thicker the medium, the less light passes through. This is the fundamental law of all volumetric rendering.
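A key consequence, and the reason a ray marcher can accumulate transmittance step by step (`T *= exp(-σe × dt)`), is that transmittance is multiplicative across consecutive slabs. A quick Python check:

```python
import math

def transmittance(sigma_e, d):
    # Beer-Lambert: fraction of light surviving thickness d
    return math.exp(-sigma_e * d)

# Splitting a slab of thickness 3 into 1 + 2 gives the same result
t_whole = transmittance(0.8, 3.0)
t_split = transmittance(0.8, 1.0) * transmittance(0.8, 2.0)
```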
#### 2. Front-to-Back Alpha Compositing
Standard form:
```
color_acc += sample_color × sample_alpha × (1.0 - alpha_acc)
alpha_acc += sample_alpha × (1.0 - alpha_acc)
```
Equivalent premultiplied alpha form (most commonly used in actual code):
```glsl
col.rgb *= col.a; // Premultiply
sum += col * (1.0 - sum.a); // Front-to-back compositing
```
**Why front-to-back?** Because it allows early exit (early ray termination) when accumulated opacity approaches 1.0, saving significant computation.
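The two orderings produce identical results over a black background; only front-to-back permits early exit. A Python check of the equivalence (scalar colors for brevity):

```python
# Samples along the ray, ordered front (nearest) to back: (color, alpha)
samples = [(0.9, 0.3), (0.5, 0.5), (0.2, 0.8), (0.7, 0.4)]

def front_to_back(samples):
    color_acc, alpha_acc = 0.0, 0.0
    for c, a in samples:
        color_acc += c * a * (1.0 - alpha_acc)  # premultiplied contribution
        alpha_acc += a * (1.0 - alpha_acc)
    return color_acc, alpha_acc

def back_to_front(samples):
    color = 0.0  # black background
    for c, a in reversed(samples):
        color = c * a + color * (1.0 - a)  # classic "over" operator
    return color

ftb_color, ftb_alpha = front_to_back(samples)
btf_color = back_to_front(samples)
```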
#### 3. Henyey-Greenstein Phase Function
Describes the directional distribution of light scattering in a medium:
```
HG(cosθ, g) = (1 - g²) / (1 + g² - 2g·cosθ)^(3/2)
```
- `g > 0`: forward scattering (e.g., the silver lining effect in clouds) — light primarily continues along its original direction
- `g < 0`: backward scattering — light primarily reflects back
- `g = 0`: isotropic scattering — light scatters uniformly in all directions
**Practical application**: Clouds typically use a dual-lobe HG function, mixing a forward scattering lobe (g≈0.8) and a backward scattering lobe (g≈-0.2) to simulate the real light scattering characteristics of cloud layers. Forward scattering produces the silver lining, while backward scattering provides volume definition.
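The asymmetry of the lobes is easy to see numerically. A Python sketch using the same (shader-style, unnormalized) form as the formula above:

```python
# Henyey-Greenstein as written above, i.e. without the 1/(4*pi)
# normalization constant commonly dropped in shader code
def hg(cos_theta, g):
    gg = g * g
    return (1.0 - gg) / (1.0 + gg - 2.0 * g * cos_theta) ** 1.5

forward_peak = hg(1.0, 0.8)   # looking toward the sun: 0.36 / 0.04^1.5 = 45
forward_back = hg(-1.0, 0.8)  # away from the sun: much weaker

# Dual-lobe mix as used for clouds (g = 0.8 forward, g = -0.2 backward)
def dual_lobe(cos_theta):
    return 0.5 * hg(cos_theta, 0.8) + 0.5 * hg(cos_theta, -0.2)
```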
#### 4. Frostbite Improved Integration Formula
In each step, the scattered light is not simply `S × dt`, but a more precise integral:
```
Sint = (S - S × exp(-σe × dt)) / σe
```
**Why is improvement needed?** The naive `S × dt` integration overestimates scattered light at larger step sizes or stronger scattering, leading to energy non-conservation (image too bright or too dark). The Frostbite formula ensures energy conservation at any step size through precise integration of the Beer-Lambert law.
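For a homogeneous medium the Frostbite formula is in fact exact at any step size, while the naive rectangle rule overestimates; the closed-form reference is `L = S(1 - exp(-σe·D))/σe`. A Python comparison (assuming constant density, which is where the closed form holds):

```python
import math

def march(sigma_e, S, depth, steps, frostbite):
    # Accumulate in-scattered light through a homogeneous medium
    dt = depth / steps
    T, L = 1.0, 0.0
    for _ in range(steps):
        if frostbite:
            L += T * (S - S * math.exp(-sigma_e * dt)) / sigma_e
        else:
            L += T * S * dt  # naive rectangle rule
        T *= math.exp(-sigma_e * dt)
    return L

sigma_e, S, depth = 2.0, 1.0, 4.0
exact = S * (1.0 - math.exp(-sigma_e * depth)) / sigma_e
coarse_frostbite = march(sigma_e, S, depth, 8, True)   # exact even at 8 steps
coarse_naive = march(sigma_e, S, depth, 8, False)      # too bright at 8 steps
```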
## Implementation Steps
### Step 1: Camera and Ray Construction
**What**: Generate a ray from the camera for each pixel.
**Why**: This is the starting point for all ray marching techniques. Camera position determines the viewing angle; ray direction determines the sampling path.
```glsl
// Normalize screen coordinates to [-1,1], correcting for aspect ratio
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Camera parameters
vec3 ro = vec3(0.0, 1.0, -5.0); // Tunable: camera position
vec3 ta = vec3(0.0, 0.0, 0.0); // Tunable: look-at target
// Build camera matrix
vec3 ww = normalize(ta - ro);
vec3 uu = normalize(cross(ww, vec3(0.0, 1.0, 0.0)));
vec3 vv = cross(uu, ww);
// Generate ray direction
float fl = 1.5; // Tunable: focal length, larger = narrower FOV
vec3 rd = normalize(uv.x * uu + uv.y * vv + fl * ww);
```
**Key parameter notes**:
- `ro`: camera position — changing it orbits around the volume
- `ta`: look-at target — the camera points toward this position
- `fl`: focal length — 1.0 ≈ 90° FOV, 1.5 ≈ 67° FOV, 2.0 ≈ 53° FOV
- Normalizing with `iResolution.y` ensures circles don't distort
### Step 2: Volume Boundary Intersection
**What**: Compute distances `tmin`/`tmax` where the ray enters and exits the volume, limiting the marching range.
**Why**: Avoids wasting samples in empty regions. Different volume shapes use different intersection methods.
```glsl
// --- Method A: Horizontal plane boundaries (cloud layers) ---
float yBottom = -1.0; // Tunable: volume bottom Y coordinate
float yTop = 2.0; // Tunable: volume top Y coordinate
float tmin = (yBottom - ro.y) / rd.y;
float tmax = (yTop - ro.y) / rd.y;
if (tmin > tmax) { float tmp = tmin; tmin = tmax; tmax = tmp; }
// In practice, handle edge cases like ray direction parallel to plane
// --- Method B: Sphere boundary (explosions, fur balls, atmospheres) ---
// Returns intersection distances of ray with sphere centered at origin with radius r
vec2 intersectSphere(vec3 ro, vec3 rd, float r) {
float b = dot(ro, rd);
float c = dot(ro, ro) - r * r;
float d = b * b - c;
if (d < 0.0) return vec2(1e5, -1e5); // No hit
d = sqrt(d);
return vec2(-b - d, -b + d);
}
```
**Selection guide**:
- Use plane boundaries (Method A) for horizontally distributed volumes like cloud layers
- Use sphere intersection (Method B) for spherical volumes like explosions or planetary atmospheres
- AABB (axis-aligned bounding box) intersection can also be used for cuboid-shaped volumes
### Step 3: Density Field Definition
**What**: Define the medium density at each point in space. This is the most core and flexible part of volumetric rendering.
**Why**: The density field determines the volume's shape, texture, and dynamic characteristics. Different density functions produce completely different visual effects.
```glsl
// 3D Value Noise (classic texture-lookup-based implementation)
float noise(vec3 x) {
vec3 p = floor(x);
vec3 f = fract(x);
f = f * f * (3.0 - 2.0 * f); // smoothstep interpolation
vec2 uv = (p.xy + vec2(37.0, 239.0) * p.z) + f.xy;
vec2 rg = textureLod(iChannel0, (uv + 0.5) / 256.0, 0.0).yx;
return mix(rg.x, rg.y, f.z);
}
// fBM (Fractal Brownian Motion) — layering multiple frequency noises
float fbm(vec3 p) {
float f = 0.0;
f += 0.50000 * noise(p); p *= 2.02;
f += 0.25000 * noise(p); p *= 2.03;
f += 0.12500 * noise(p); p *= 2.01;
f += 0.06250 * noise(p); p *= 2.02;
f += 0.03125 * noise(p);
return f;
}
// Cloud density function example
float cloudDensity(vec3 p) {
vec3 q = p - vec3(0.0, 0.1, 1.0) * iTime; // Wind direction animation
float f = fbm(q);
// Use Y coordinate to limit cloud height range
return clamp(1.5 - p.y - 2.0 + 1.75 * f, 0.0, 1.0);
}
```
**Density field design points**:
- The `noise` function uses texture lookup (`iChannel0`) to implement 3D value noise, faster than pure arithmetic implementations
- `fbm` layers 5 octaves of noise to produce natural fractal detail
- Non-integer frequency multipliers (2.02, 2.03) break repetitiveness
- In `cloudDensity`, `1.5 - p.y - 2.0` establishes a base density field that decreases with height
- Time offset `iTime` produces a wind-blown effect
### Step 4: Ray Marching Main Loop
**What**: March along the ray from `tmin` to `tmax`, sampling density at each step and accumulating color and opacity.
**Why**: This is the core loop of volumetric rendering. Step count and step size directly affect quality and performance.
```glsl
#define NUM_STEPS 64 // Tunable: march steps, more = finer
#define STEP_SIZE 0.05 // Tunable: fixed step size (or use adaptive)
vec4 raymarch(vec3 ro, vec3 rd, float tmin, float tmax, vec3 bgCol, vec2 fragCoord) {
vec4 sum = vec4(0.0); // rgb = accumulated color (premultiplied alpha), a = accumulated opacity
// Jitter starting position to eliminate banding artifacts
float t = tmin + STEP_SIZE * fract(sin(dot(fragCoord, vec2(12.9898, 78.233))) * 43758.5453);
for (int i = 0; i < NUM_STEPS; i++) {
if (t > tmax || sum.a > 0.99) break; // Early exit: out of range or fully opaque
vec3 pos = ro + t * rd;
float den = cloudDensity(pos);
if (den > 0.01) {
// --- Color and lighting (see Step 5) ---
vec4 col = vec4(1.0, 0.95, 0.8, den); // Placeholder color
// Opacity scaling
col.a *= 0.4; // Tunable: density scale factor
// Can also multiply by step size: col.a = min(col.a * 8.0 * dt, 1.0);
// Premultiply alpha and front-to-back compositing
col.rgb *= col.a;
sum += col * (1.0 - sum.a);
}
t += STEP_SIZE;
// Adaptive step variant: t += max(0.05, 0.02 * t);
}
return clamp(sum, 0.0, 1.0);
}
```
**Key design decisions**:
- **Steps vs step size**: fixed step count suits known volume sizes; fixed step size suits uncertain volume sizes
- **Jittering**: without jittering, visible banding artifacts appear; adding pixel-dependent random offset converts banding into invisible noise
- **Early exit condition**: `sum.a > 0.99` is one of the most important performance optimizations
- **Density threshold**: `den > 0.01` skips empty regions, avoiding unnecessary lighting calculations
- **Adaptive step size**: `max(0.05, 0.02 * t)` gives small steps up close (good detail) and large steps at distance (fast)
### Step 5: Lighting Calculation
**What**: Compute lighting color for each sample point within the volume.
**Why**: Lighting is the determining factor for visual quality in volumetric rendering. Different lighting models suit different scenarios.
```glsl
// === Method A: Directional derivative lighting (simplest, single extra sample) ===
// Classic directional derivative method, requires only 1 extra noise sample
vec3 sundir = normalize(vec3(1.0, 0.0, -1.0)); // Tunable: sun direction
float dif = clamp((den - cloudDensity(pos + 0.3 * sundir)) / 0.6, 0.0, 1.0);
vec3 lin = vec3(1.0, 0.6, 0.3) * dif + vec3(0.91, 0.98, 1.05); // Sunlight color + sky light
```
**Method A details**: Estimates lighting by comparing density at the current point with an offset position along the light direction. The direction where density decreases indicates the light source. This is an approximate method — extremely fast but not very physically accurate. Suitable for stylized clouds or performance-critical scenarios.
```glsl
// === Method B: Volumetric shadow (secondary ray march) ===
// Volumetric shadow (Frostbite-style)
float volumetricShadow(vec3 from, vec3 lightDir) {
float shadow = 1.0;
float dt = 0.5; // Tunable: shadow step size
float d = dt * 0.5;
for (int s = 0; s < 6; s++) { // Tunable: shadow steps (6-16)
vec3 pos = from + lightDir * d;
float muE = cloudDensity(pos);
shadow *= exp(-muE * dt); // Beer-Lambert
dt *= 1.3; // Tunable: step size increase factor
d += dt;
}
return shadow;
}
```
**Method B details**: For each sample point, performs a second ray march toward the light source, accumulating transmittance. This is the more physically accurate method but computationally expensive (each primary step requires an additional 6-16 shadow steps). The increasing step size (`dt *= 1.3`) is because distant regions contribute less to shadowing.
```glsl
// === Method C: Henyey-Greenstein phase function scattering ===
float HenyeyGreenstein(float cosTheta, float g) {
float gg = g * g;
return (1.0 - gg) / pow(1.0 + gg - 2.0 * g * cosTheta, 1.5);
}
// Mix forward and backward scattering
float sundotrd = dot(rd, -sundir);
float scattering = mix(
HenyeyGreenstein(sundotrd, 0.8), // Tunable: forward scattering g value
HenyeyGreenstein(sundotrd, -0.2), // Tunable: backward scattering g value
0.5 // Tunable: blend ratio
);
```
**Method C details**: The phase function describes the probability distribution of light scattering in different directions. The dual-lobe HG function mixes forward and backward scattering, simulating the cloud silver lining effect (forward scattering lobe) and dark-side volume definition (backward scattering lobe). Forward scattering with `g=0.8` makes the lit side very bright — an important visual characteristic of real clouds.
### Step 6: Color Mapping
**What**: Map density values to colors.
**Why**: Different media (clouds, flames, explosions) require different coloring strategies.
```glsl
// === Method A: Density interpolation coloring (clouds) ===
vec3 cloudColor = mix(vec3(1.0, 0.95, 0.8), // Lit side color (tunable)
vec3(0.25, 0.3, 0.35), // Dark side color (tunable)
den);
```
**Method A details**: Low density areas show bright color (near white, simulating thin cloud translucency), high density areas show dark color (gray-blue, simulating thick cloud light blocking). Simple and efficient.
```glsl
// === Method B: Radial gradient coloring (explosions, flames) ===
vec3 computeColor(float density, float radius) {
vec3 result = mix(vec3(1.0, 0.9, 0.8),
vec3(0.4, 0.15, 0.1), density);
vec3 colCenter = 7.0 * vec3(0.8, 1.0, 1.0); // Tunable: core highlight color
vec3 colEdge = 1.5 * vec3(0.48, 0.53, 0.5); // Tunable: edge color
result *= mix(colCenter, colEdge, min(radius / 0.9, 1.15));
return result;
}
```
**Method B details**: Explosion/flame cores are extremely bright (HDR values > 1.0, multiplied by 7.0), while edges are darker. Both density and distance from center determine the color. The core color multiplied by 7.0 creates an overexposure effect that, combined with post-processing tone mapping, produces a searing heat look.
```glsl
// === Method C: Height-based ambient gradient (production-grade clouds) ===
vec3 ambientLight = mix(
vec3(39., 67., 87.) * (1.5 / 255.), // Bottom ambient color (tunable)
vec3(149., 167., 200.) * (1.5 / 255.), // Top ambient color (tunable)
normalizedHeight
);
```
**Method C details**: Real cloud bottoms are darker blue (receiving ground reflection and sky scattering), while tops are brighter gray-blue (receiving more sky light). Using normalized height for interpolation produces a natural vertical gradient.
### Step 7: Final Compositing and Post-Processing
**What**: Blend volumetric rendering results with the background, applying tone mapping and post-processing.
**Why**: Post-processing significantly affects final visual quality.
```glsl
// Background sky
vec3 bgCol = vec3(0.6, 0.71, 0.75) - rd.y * 0.2 * vec3(1.0, 0.5, 1.0);
float sun = clamp(dot(sundir, rd), 0.0, 1.0);
bgCol += 0.2 * vec3(1.0, 0.6, 0.1) * pow(sun, 8.0); // Sun halo
// Composite volume with background
vec4 vol = raymarch(ro, rd, tmin, tmax, bgCol);
vec3 col = bgCol * (1.0 - vol.a) + vol.rgb;
// Sun flare
col += vec3(0.2, 0.08, 0.04) * pow(sun, 3.0);
// Tone mapping (simple smoothstep version)
col = smoothstep(0.15, 1.1, col);
// Optional: distance fog (inside the marching loop)
// col.xyz = mix(col.xyz, bgCol, 1.0 - exp(-0.003 * t * t));
// Optional: vignette
float vignette = 0.25 + 0.75 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.1);
col *= vignette;
```
**Post-processing details**:
- **Sky gradient**: `rd.y` controls sky color variation from horizon to zenith
- **Sun halo**: `pow(sun, 8.0)` produces a narrow, bright halo; higher exponent = narrower halo
- **Sun flare**: `pow(sun, 3.0)` produces a wider warm-colored flare
- **Distance fog**: `exp(-0.003 * t * t)` gradually blends distant volumes into the background
- **Tone mapping**: `smoothstep(0.15, 1.1, col)` lifts shadows, compresses highlights, and increases contrast
- **Vignette**: simulates lens vignette effect, guiding visual focus to the center of the frame
## Variant Details
### Variant 1: Emissive Volume (Flames/Explosions)
**Difference from the base version**: No external light source; color is entirely determined by density and position. Density maps to emissive color.
**Design concept**: Flames and explosions are self-luminous — no external lighting calculation needed. The core region is extremely bright (HDR), while edges are dim. Color is mapped through a combination of density and distance from center. Bloom effects are achieved by adding distance-attenuated light source contributions in the accumulation loop.
**Key code**:
```glsl
// Replace lighting calculation with emissive color mapping
vec3 emissionColor(float density, float radius) {
vec3 result = mix(vec3(1.0, 0.9, 0.8), vec3(0.4, 0.15, 0.1), density);
vec3 colCenter = 7.0 * vec3(0.8, 1.0, 1.0);
vec3 colEdge = 1.5 * vec3(0.48, 0.53, 0.5);
result *= mix(colCenter, colEdge, min(radius / 0.9, 1.15));
return result;
}
// Use bloom effect in the accumulation loop
vec3 lightColor = vec3(1.0, 0.5, 0.25);
// lDist: distance from the current sample point to the light source / explosion center
sum.rgb += lightColor / exp(lDist * lDist * lDist * 0.08) / 30.0;
```
### Variant 2: Physical Scattering Atmosphere (Rayleigh + Mie)
**Difference from the base version**: Uses nested ray marching to compute optical depth; separates Rayleigh and Mie scattering channels; uses precise Beer-Lambert transmittance.
**Design concept**: Atmospheric scattering requires handling two scattering mechanisms separately:
- **Rayleigh scattering**: wavelength-dependent (shorter wavelengths scatter more), producing the blue sky effect. Scattering coefficient proportional to λ⁻⁴.
- **Mie scattering**: wavelength-independent, primarily caused by aerosols/large particles, producing the orange-red of sunsets and white halos around the sun.
Density decreases exponentially with altitude, using different scale height parameters to control the altitude distribution of both scattering types. Nested ray marching (marching toward the sun for each sample point) computes optical depth for precise Beer-Lambert transmittance.
**Key code**:
```glsl
// Atmospheric density decreases exponentially with altitude
float density(vec3 p, float scaleHeight) {
return exp(-max(length(p) - R_INNER, 0.0) / scaleHeight);
}
// Nested ray march to compute optical depth
float opticDepth(vec3 from, vec3 to, float scaleHeight) {
vec3 s = (to - from) / float(NUM_STEPS_LIGHT);
vec3 v = from + s * 0.5;
float sum = 0.0;
for (int i = 0; i < NUM_STEPS_LIGHT; i++) {
sum += density(v, scaleHeight);
v += s;
}
return sum * length(s);
}
// Rayleigh phase function
float phaseRayleigh(float cc) { return (3.0 / 16.0 / PI) * (1.0 + cc); }
// Combined Rayleigh + Mie
vec3 scatter = sumRay * kRay * phaseRayleigh(cc) + sumMie * kMie * phaseMie(-0.78, c, cc);
```
### Variant 3: Frostbite Energy-Conserving Integration
**Difference from the base version**: Uses an improved scattering integration formula that maintains energy conservation in strongly scattering media.
**Design concept**: Naive Euler integration `S × dt` is inaccurate at large step sizes or in dense media. The Frostbite formula performs precise exponential integration for each step's scattering, ensuring that the sum of accumulated scattering and transmittance never exceeds the incident light regardless of step size. This is especially important for dense fog, volumetric lighting, and similar scenarios.
**Key code**:
```glsl
// Replace naive integration with Frostbite formula
vec3 S = evaluateLight(p) * sigmaS * phaseFunction() * volumetricShadow(p, lightPos);
vec3 Sint = (S - S * exp(-sigmaE * dt)) / sigmaE; // Improved integration
scatteredLight += transmittance * Sint;
transmittance *= exp(-sigmaE * dt);
```
### Variant 4: Production-Grade Clouds (Horizon Zero Dawn Style)
**Difference from the base version**: Uses Perlin-Worley noise textures instead of procedural noise; layered density modeling (base shape + detail erosion); dual-lobe HG phase function; temporal reprojection anti-aliasing.
**Design concept**: Production-grade cloud rendering uses a layered approach:
1. **Low-frequency shape layer** (`cloudMapBase`): uses Perlin-Worley 3D texture to define the rough cloud shape
2. **Height gradient** (`cloudGradient`): controls density distribution with altitude based on cloud type (cumulus, stratus, etc.)
3. **High-frequency detail layer** (`cloudMapDetail`): higher frequency noise erodes edges, adding detail
4. **Coverage control** (`COVERAGE`): global parameter controlling the proportion of cloud coverage in the sky
Temporal reprojection is key to the production-grade approach: each frame renders only 1/16 of the pixels (a 4×4 interleaved pattern, not a checkerboard), then reprojects the previous frame's result to fill the rest. Combined with 95% historical-frame blending, it achieves high-quality results with very few marching steps per frame.
**Key code**:
```glsl
// Layered noise modeling
float m = cloudMapBase(pos, norY); // Low-frequency shape
m *= cloudGradient(norY); // Height gradient
m -= cloudMapDetail(pos) * dstrength * 0.225; // High-frequency detail erosion
m = smoothstep(0.0, 0.1, m + (COVERAGE - 1.0));
// Dual-lobe HG scattering
float scattering = mix(
HenyeyGreenstein(sundotrd, 0.8), // Forward
HenyeyGreenstein(sundotrd, -0.2), // Backward
0.5
);
// Temporal reprojection (between Buffers)
vec2 spos = reprojectPos(ro + rd * dist, iResolution.xy, iChannel1);
vec4 ocol = texture(iChannel1, spos, 0.0);
col = mix(ocol, col, 0.05); // 5% new frame + 95% history frame
```
### Variant 5: Gradient Normal Surface Lighting (Fur Ball / Volume Surface)
**Difference from the base version**: Uses central differencing to compute gradient normals within the volume, then applies diffuse + specular lighting as if it were a surface. Suitable for volume objects with a clear "surface" feel (fur, translucent spheres).
**Design concept**: Some volume objects (fur balls, fuzzy surfaces) are volumetric data but visually resemble surfaced objects. In this case, central differencing in the density field computes the gradient (the direction of fastest density change), which serves as the normal for traditional surface lighting models.
- **Half-Lambert**: `dot(N, L) * 0.5 + 0.5` compresses the dark side range, simulating subsurface scattering
- **Blinn-Phong**: provides specular reflection, adding material definition
**Key code**:
```glsl
// Central differencing for normals
vec3 furNormal(vec3 pos, float density) {
float eps = 0.01;
vec3 n;
n.x = sampleDensity(pos + vec3(eps, 0, 0)) - density;
n.y = sampleDensity(pos + vec3(0, eps, 0)) - density;
n.z = sampleDensity(pos + vec3(0, 0, eps)) - density;
return normalize(n);
}
// Half-Lambert diffuse + Blinn-Phong specular
vec3 N = -furNormal(pos, density);
float diff = max(0.0, dot(N, L) * 0.5 + 0.5); // Half-Lambert
float spec = pow(max(0.0, dot(N, H)), 50.0); // Tunable: specular sharpness
```
## In-Depth Performance Optimization
### 1. Early Ray Termination
Immediately break from the loop when accumulated opacity exceeds a threshold (e.g., 0.99). This is the most important optimization — used by all analyzed shaders.
**Effect**: For dense volumes (such as thick cloud layers), many rays can exit within 20-30 steps instead of completing all 80+ steps, achieving 2-4x performance improvement.
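The pattern, as a minimal sketch inside the accumulation loop (variable names follow the base loop above; the 0.99 threshold is the conventional choice):

```glsl
// Inside the main marching loop: sum.a is the accumulated opacity
for (int i = 0; i < MAX_STEPS; i++) {
    // ... sample density, compute lighting, composite into sum ...
    if (sum.a > 0.99) break; // Early termination: the ray is effectively opaque
    t += dt;
}
```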
### 2. LOD Noise
Reduce the fBM octave count based on ray distance. Distant areas don't need high-frequency detail:
```glsl
int lod = 5 - int(log2(1.0 + t * 0.5));
```
**Effect**: Distant areas use only 2-3 fBM octaves (vs 5 up close), reducing noise sampling by 40-60%. Since distant pixels cover a larger spatial range, high-frequency detail wouldn't be visible anyway.
### 3. Adaptive Step Size
Small steps up close (fine detail), large steps at distance (speed):
```glsl
float dt = max(0.05, 0.02 * t);
```
**Effect**: Significantly reduces the number of distant steps without noticeably degrading near-field quality. However, abrupt step size changes may cause visual discontinuities.
### 4. Dithering
Add pixel-dependent random offset at the ray starting position to eliminate stepping banding artifacts:
```glsl
t += STEP_SIZE * hash(fragCoord);
```
**Note**: Dithering doesn't improve performance but significantly improves visual quality — converting visible banding artifacts into imperceptible high-frequency noise.
### 5. Bounding Volume Clipping
Only march within the interval where the ray intersects the volume (plane clipping, sphere intersection, AABB clipping).
**Effect**: For volumes that occupy a small portion of the screen, many rays can skip marching entirely. Performance improvement depends on the volume's screen coverage area.
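A sketch of the sphere case (an AABB slab test works the same way; `volCenter`/`volRadius` are illustrative names for the bounding volume):

```glsl
// Ray-sphere intersection: returns (tNear, tFar); interval is empty on a miss
vec2 intersectSphere(vec3 ro, vec3 rd, vec3 center, float radius) {
    vec3 oc = ro - center;
    float b = dot(oc, rd);
    float c = dot(oc, oc) - radius * radius;
    float h = b * b - c;
    if (h < 0.0) return vec2(1e10, -1e10); // Miss: empty interval
    h = sqrt(h);
    return vec2(-b - h, -b + h);
}
// Usage: clamp the march interval before entering the loop
// vec2 seg = intersectSphere(ro, rd, volCenter, volRadius);
// float tmin = max(seg.x, 0.0), tmax = seg.y;
// if (tmax <= tmin) return vec4(0.0); // Ray misses the volume entirely
```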
### 6. Density Threshold Skip
Skip lighting calculations when density is below a threshold (lighting is often the most expensive part):
```glsl
if (den > 0.01) { /* compute lighting and compositing */ }
```
**Effect**: Lighting calculations (especially secondary volumetric shadow marching) are the most time-consuming part. Skipping lighting for low-density regions saves significant computation.
### 7. Minimal Shadow Step Count
Volumetric self-shadow step counts can be far fewer than the main loop (6-16 steps suffice), with increasing step sizes to cover greater distances.
**Reason**: Human eyes are less sensitive to shadow detail than to shape detail. 6 steps with 1.3x increasing step size can cover approximately 20 units of distance.
### 8. Temporal Reprojection
Reproject the previous frame's results to the current frame for blending, dramatically reducing the required marching steps per frame.
**Typical configuration**: Using only 12 steps + 95% historical frame blending (`mix(oldColor, newColor, 0.05)`) can produce quality far exceeding 12-step single-frame rendering.
**Caveats**:
- Requires an additional Buffer for storing the historical frame
- Fast motion may cause ghosting
- Requires correct reprojection matrix handling for camera movement
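A minimal blending sketch, assuming the previous frame's view-projection matrix is available as a uniform and the history buffer is bound to a sampler (all names here are illustrative, not from a specific shader):

```glsl
// Reproject a world-space hit point into last frame's screen space and blend
vec4 temporalBlend(vec3 worldPos, vec4 newCol, mat4 prevViewProj, sampler2D history) {
    vec4 clip = prevViewProj * vec4(worldPos, 1.0);
    vec2 prevUV = clip.xy / clip.w * 0.5 + 0.5;    // Previous frame's screen position
    if (any(lessThan(prevUV, vec2(0.0))) || any(greaterThan(prevUV, vec2(1.0))))
        return newCol;                              // Off-screen: no valid history
    vec4 hist = texture(history, prevUV);
    return mix(hist, newCol, 0.05);                 // 5% new frame + 95% history
}
```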
## Combination Suggestions
### 1. SDF Terrain + Volumetric Clouds
Render ground/mountains with SDF ray marching, then render cloud layers above using volumetric marching. The two mutually occlude through depth values.
**Implementation points**:
- Render SDF terrain first, recording hit depth
- During volumetric marching, stop at the depth value (ground occludes clouds)
- If the ray passes through the cloud layer before hitting the ground, march within the cloud interval and terminate at the ground
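The points above can be sketched as a depth-limited march (`terrainMarch` and `cloudDensity` are assumed helpers; `CLOUD_FAR`, `STEPS`, `tmin`, `dt` are tunable):

```glsl
float tGround = terrainMarch(ro, rd);      // Assumed: returns hit distance, or 1e10 on miss
float tmax = min(CLOUD_FAR, tGround);      // Clouds behind the terrain are occluded
vec4 vol = vec4(0.0);
float t = tmin;
for (int i = 0; i < STEPS; i++) {
    if (t > tmax || vol.a > 0.99) break;   // Stop at the ground or when opaque
    float den = cloudDensity(ro + rd * t);
    // ... lighting + compositing into vol, as in the base loop ...
    t += dt;
}
```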
### 2. Volumetric Fog + Scene Lighting
Overlay volumetric fog on existing SDF/polygon scenes, applying `color = color * transmittance + scatteredLight` to already-rendered scenes.
**Implementation points**:
- After rendering the scene, march fog along the ray for each pixel
- Accumulate fog scattering and transmittance
- Final color = scene color × transmittance + fog scattered light
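A compositing sketch under those assumptions (`fogDensity` and `evaluateLight` are placeholder helpers; `FOG_STEPS` is tunable):

```glsl
vec3 applyFog(vec3 sceneCol, vec3 ro, vec3 rd, float sceneDepth) {
    vec3 scattered = vec3(0.0);
    float transmittance = 1.0;
    float dt = sceneDepth / float(FOG_STEPS);
    for (int i = 0; i < FOG_STEPS; i++) {
        vec3 p = ro + rd * (float(i) + 0.5) * dt;      // Mid-segment sample
        float sigmaE = fogDensity(p);                   // Assumed density function
        scattered += transmittance * evaluateLight(p) * sigmaE * dt;
        transmittance *= exp(-sigmaE * dt);             // Beer-Lambert
    }
    return sceneCol * transmittance + scattered;        // Attenuated scene + in-scattered fog
}
```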
### 3. Multi-Layer Volumes
Different heights or regions use different density functions (e.g., high-altitude cumulus + low-altitude fog layer), each marched independently then composited.
**Implementation points**:
- Each layer has its own boundaries and density function
- Can be processed in the same marching loop (checking which layer the current point is in), or marched separately then composited
- Separate marching is more flexible but requires correct inter-layer occlusion handling
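The single-loop option can be sketched as a combined density function (layer bounds and the two density helpers are illustrative):

```glsl
// Each layer contributes its own density at the sample point
float totalDensity(vec3 p) {
    float d = 0.0;
    if (p.y > CLOUD_BOTTOM && p.y < CLOUD_TOP) d += cloudDensity(p); // High cumulus layer
    if (p.y < FOG_TOP)                         d += fogDensity(p);   // Low ground-fog layer
    return d;
}
```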
### 4. Particle System + Volume
Particles provide macro-scale motion and shape; volumetric rendering adds internal detail and lighting to particles.
### 5. Post-Process Light Shafts (God Rays)
After volumetric rendering, add light shaft effects using radial blur or screen-space ray marching to enhance volume definition.
**Implementation points**:
- In screen space, sample radially outward from the sun position, accumulating brightness
- Or for each pixel, march a short distance along the light source direction, sampling occluder depth
- Light shaft intensity is multiplied by the dot product of light direction and view direction to control visible angles
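The radial-sampling option can be sketched as a post-process pass (`sunUV`, `NUM_TAPS`, and `DECAY` are illustrative parameters, not from a specific shader):

```glsl
// Screen-space radial accumulation toward the sun position
vec3 godRays(sampler2D sceneTex, vec2 uv, vec2 sunUV) {
    vec2 delta = (sunUV - uv) / float(NUM_TAPS);
    vec3 acc = vec3(0.0);
    float weight = 1.0;
    for (int i = 0; i < NUM_TAPS; i++) {
        uv += delta;                               // Step toward the sun
        acc += texture(sceneTex, uv).rgb * weight;
        weight *= DECAY;                           // Taps near the sun dominate
    }
    return acc / float(NUM_TAPS);
}
```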
### 6. Procedural Sky + Volumetric Clouds
First render a procedural sky/atmospheric scattering as background, then overlay volumetric clouds on top. The transition between the two is achieved through distance fog for natural blending.
**Implementation points**:
- Use an atmospheric scattering model (Variant 2) or a simplified gradient model for the sky
- Apply distance fog within the volumetric marching loop: `mix(litCol, bgCol, 1.0 - exp(-0.003 * t * t))`
- Distant clouds naturally blend into the sky color, avoiding abrupt boundaries


@@ -0,0 +1,486 @@
# Voronoi & Cellular Noise — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing prerequisites, step-by-step explanations, variant descriptions, performance analysis, and complete combination code.
## Prerequisites
- **GLSL Basic Syntax**: `vec2/vec3`, `floor/fract`, `dot`, `smoothstep` and other built-in functions
- **Vector Math**: dot product, distance calculation, vector normalization
- **Pseudo-Random Hash Function Concepts**: input coordinates -> pseudo-random values, deterministic but appearing random
- **fBm (Fractional Brownian Motion) Basics**: multi-layer noise summation, used for advanced variants
## Core Principles in Detail
The essence of Voronoi noise is **spatial partitioning**: scatter a set of feature points across 2D/3D space, and each pixel belongs to the "cell" defined by its nearest feature point.
**Core Algorithm Flow:**
1. Divide space into an integer grid (`floor`), placing one randomly offset feature point in each grid cell
2. For the current pixel, search all feature points in the surrounding 3x3 (2D) or 3x3x3 (3D) neighborhood
3. Calculate the distance to each feature point, recording the nearest distance F1 (and optionally the second-nearest distance F2)
4. Use F1, F2, or their combination (e.g., F2-F1) as the output value, mapping to color/height/shape
**Key Mathematics:**
- Distance metrics: Euclidean `length(r)` or `dot(r,r)` (squared distance, faster), Manhattan `abs(r.x)+abs(r.y)`, Chebyshev `max(abs(r.x), abs(r.y))`
- Exact border distance (two-pass algorithm): `dot(0.5*(mr+r), normalize(r-mr))` (perpendicular bisector projection)
- Rounded borders (harmonic mean of the two gaps): `2 / (1/(d2-d1) + 1/(d3-d1))` (matching the Variant 2 code below)
## Implementation Steps — Detailed Explanation
### Step 1: Hash Function — Generating Pseudo-Random Feature Points
**What**: Define a hash function that maps 2D integer coordinates to a pseudo-random `vec2` in the [0,1] range.
**Why**: Feature point positions within each grid cell need to be deterministic but appear random. Hash functions provide this "reproducible randomness". Different hash functions affect distribution uniformity and visual quality.
**Code**:
```glsl
// Classic sin-dot hash (concise and efficient, suitable for most scenarios)
vec2 hash2(vec2 p) {
p = vec2(dot(p, vec2(127.1, 311.7)),
dot(p, vec2(269.5, 183.3)));
return fract(sin(p) * 43758.5453);
}
// 3D version (for 3D Voronoi)
vec3 hash3(vec3 p) {
float n = sin(dot(p, vec3(7.0, 157.0, 113.0)));
return fract(vec3(2097152.0, 262144.0, 32768.0) * n);
}
// High-quality integer hash (more uniform distribution, for production-grade noise)
vec3 hash3_uint(vec3 p) {
uvec3 q = uvec3(ivec3(p)) * uvec3(1597334673U, 3812015801U, 2798796415U);
q = (q.x ^ q.y ^ q.z) * uvec3(1597334673U, 3812015801U, 2798796415U);
return vec3(q) / float(0xffffffffU);
}
```
### Step 2: Grid Partitioning and Neighborhood Search — F1 Distance
**What**: Split input coordinates into integer part (grid ID) and fractional part (position within cell), iterate over the 3x3 neighborhood to compute distances to all feature points, and find the nearest distance F1.
**Why**: `floor/fract` discretizes continuous space into a grid. Since feature points are offset within the [0,1] range, the nearest point can only be in the current cell or its 8 neighbors, so a 3x3 search covers all cases.
**Code**:
```glsl
// Basic 2D Voronoi — returns (F1 distance, cell ID)
vec2 voronoi(vec2 x) {
vec2 n = floor(x); // Current grid coordinate
vec2 f = fract(x); // Offset within cell [0,1)
vec3 m = vec3(8.0); // (min distance, corresponding hash value) — initialized to large value
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 g = vec2(float(i), float(j)); // Neighbor offset
vec2 o = hash2(n + g); // Feature point position in that cell [0,1)
vec2 r = g - f + o; // Vector from current pixel to that feature point
float d = dot(r, r); // Squared distance (avoids sqrt)
if (d < m.x) {
m = vec3(d, o); // Update nearest distance and cell ID
}
}
return vec2(sqrt(m.x), m.y + m.z); // (distance, ID)
}
```
### Step 3: F1 + F2 Tracking — Edge Detection
**What**: Simultaneously record the nearest distance F1 and second-nearest distance F2 during the search, using F2-F1 to extract cell boundaries.
**Why**: The value of F2-F1 is large inside cells (far from boundaries) and approaches 0 at cell junctions (two feature points equidistant). This is the most common Voronoi edge detection method.
**Code**:
```glsl
// F1 + F2 Voronoi — returns vec2(F1, F2)
vec2 voronoi_f1f2(vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
vec2 res = vec2(8.0); // res.x = F1, res.y = F2
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 b = vec2(i, j);
vec2 r = b - f + hash2(p + b);
float d = dot(r, r); // Can substitute other distance metrics
if (d < res.x) {
res.y = res.x; // Previous F1 becomes F2
res.x = d; // Update F1
} else if (d < res.y) {
res.y = d; // Update F2
}
}
res = sqrt(res);
return res;
// Edge value = res.y - res.x (F2 - F1)
}
```
### Step 4: Exact Border Distance — Two-Pass Algorithm
**What**: First pass finds the nearest feature point; second pass calculates the exact distance to all neighboring cell boundaries.
**Why**: Simple F2-F1 is only an approximation of the boundary. For geometrically exact equidistant lines and smooth boundary rendering, the distance to the perpendicular bisector must be computed. The second pass requires a 5x5 search range to ensure geometric correctness.
**Code**:
```glsl
// Exact border distance Voronoi — returns vec3(border distance, nearest point offset)
vec3 voronoi_border(vec2 x) {
vec2 ip = floor(x);
vec2 fp = fract(x);
// === Pass 1: Find nearest feature point ===
vec2 mg, mr;
float md = 8.0;
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 g = vec2(float(i), float(j));
vec2 o = hash2(ip + g);
vec2 r = g + o - fp;
float d = dot(r, r);
if (d < md) {
md = d;
mr = r; // Vector to nearest point
mg = g; // Grid offset of nearest point
}
}
// === Pass 2: Calculate shortest distance to border ===
md = 8.0;
for (int j = -2; j <= 2; j++)
for (int i = -2; i <= 2; i++) {
vec2 g = mg + vec2(float(i), float(j));
vec2 o = hash2(ip + g);
vec2 r = g + o - fp;
// Skip self
if (dot(mr - r, mr - r) > 0.00001)
// Distance to perpendicular bisector = midpoint projected onto direction vector
md = min(md, dot(0.5 * (mr + r), normalize(r - mr)));
}
return vec3(md, mr);
}
```
### Step 5: Feature Point Animation
**What**: Make feature points move smoothly over time, producing organic dynamic effects.
**Why**: Static Voronoi is suitable for texture maps, but real-time effects usually require animation. Using `sin(iTime + 6.2831*hash)` makes each point oscillate at a different phase while staying within the [0,1] range.
**Code**:
```glsl
// Within the neighborhood search loop, replace static hash with animated version:
vec2 o = hash2(n + g);
o = 0.5 + 0.5 * sin(iTime + 6.2831 * o); // Animation: each point has a different phase
vec2 r = g - f + o;
```
### Step 6: Coloring and Visualization
**What**: Map Voronoi distance values to colors, rendering cell fills, border lines, and feature point markers.
**Why**: Different mapping methods produce dramatically different visual effects. Distance values can be used directly as grayscale, or transformed into rich colors through palette functions.
**Code**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// Must use iTime, otherwise the compiler optimizes away this uniform
float time = iTime * 1.0;
vec2 p = fragCoord.xy / iResolution.xy;
vec2 uv = p * SCALE; // SCALE controls cell density
// Compute Voronoi
vec2 c = voronoi(uv);
float dist = c.x; // F1 distance
float id = c.y; // Cell ID
// --- Cell coloring (ID-driven palette) ---
vec3 col = 0.5 + 0.5 * cos(id * 6.2831 + vec3(0.0, 1.0, 2.0));
// --- Distance falloff (cell center bright, edges dark) ---
col *= clamp(1.0 - 0.4 * dist * dist, 0.0, 1.0);
// --- Border lines (draw black line when distance below threshold) ---
col -= (1.0 - smoothstep(0.08, 0.09, dist));
fragColor = vec4(col, 1.0);
}
```
## Variant Detailed Descriptions
### Variant 1: 3D Voronoi + fBm Fire
Difference from base version: extends 2D Voronoi to 3D space, multi-layer fBm summation produces volumetric feel, combined with blackbody radiation palette for rendering fire/nebula.
Key modified code:
```glsl
#define NUM_OCTAVES 5 // Tunable: fBm layer count
vec3 hash3(vec3 p) {
float n = sin(dot(p, vec3(7.0, 157.0, 113.0)));
return fract(vec3(2097152.0, 262144.0, 32768.0) * n);
}
float voronoi3D(vec3 p) {
vec3 g = floor(p);
p = fract(p);
float d = 1.0;
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++)
for (int k = -1; k <= 1; k++) {
vec3 b = vec3(i, j, k);
vec3 r = b - p + hash3(g + b);
d = min(d, dot(r, r));
}
return d;
}
float fbmVoronoi(vec3 p) {
vec3 t = vec3(0.0, 0.0, p.z + iTime * 1.5);
float tot = 0.0, sum = 0.0, amp = 1.0;
for (int i = 0; i < NUM_OCTAVES; i++) {
tot += voronoi3D(p + t) * amp;
p *= 2.0;
t *= 1.5; // Time frequency differs from spatial frequency -> parallax effect
sum += amp;
amp *= 0.5;
}
return tot / sum;
}
// Blackbody radiation palette
vec3 firePalette(float i) {
float T = 1400.0 + 1300.0 * i;
vec3 L = vec3(7.4, 5.6, 4.4);
L = pow(L, vec3(5.0)) * (exp(1.43876719683e5 / (T * L)) - 1.0);
return 1.0 - exp(-5e8 / L);
}
```
### Variant 2: Rounded Borders (3rd-Order Voronoi)
Difference from base version: simultaneously tracks F1, F2, and F3 (three nearest distances), using a harmonic mean formula to produce smoother, more uniform cell boundaries instead of standard Voronoi's sharp intersections.
Key modified code:
```glsl
float voronoiRounded(vec2 p) {
vec2 g = floor(p);
p -= g;
vec3 d = vec3(1.0); // d.x=F1, d.y=F2, d.z=F3
for (int y = -1; y <= 1; y++)
for (int x = -1; x <= 1; x++) {
vec2 o = vec2(x, y);
o += hash2(g + o) - p;
float r = dot(o, o);
// Maintain top 3 nearest distances simultaneously
d.z = max(d.x, max(d.y, min(d.z, r))); // F3
d.y = max(d.x, min(d.y, r)); // F2
d.x = min(d.x, r); // F1
}
d = sqrt(d);
// Harmonic mean formula -> rounded borders
return min(2.0 / (1.0 / max(d.y - d.x, 0.001)
+ 1.0 / max(d.z - d.x, 0.001)), 1.0);
}
```
### Variant 3: Voronoise (Unified Noise-Voronoi Framework)
Difference from base version: through two parameters `u` (jitter amount) and `v` (smoothness), continuously interpolates between Cell Noise, Perlin Noise, and Voronoi. Uses weighted accumulation instead of `min()` operation, requiring a 5x5 search range.
Key modified code:
```glsl
#define JITTER 1.0 // Tunable: 0=regular grid, 1=fully random
#define SMOOTH 0.0 // Tunable: 0=sharp Voronoi, 1=smooth noise
float voronoise(vec2 p, float u, float v) {
float k = 1.0 + 63.0 * pow(1.0 - v, 6.0); // Smoothness kernel
vec2 i = floor(p);
vec2 f = fract(p);
vec2 a = vec2(0.0);
for (int y = -2; y <= 2; y++)
for (int x = -2; x <= 2; x++) {
vec2 g = vec2(x, y);
vec3 o = hash3(i + g) * vec3(u, u, 1.0); // u controls jitter
vec2 d = g - f + o.xy;
float w = pow(1.0 - smoothstep(0.0, 1.414, length(d)), k);
a += vec2(o.z * w, w); // Weighted accumulation
}
return a.x / a.y;
}
// hash3 needs to return vec3
vec3 hash3(vec2 p) {
vec3 q = vec3(dot(p, vec2(127.1, 311.7)),
dot(p, vec2(269.5, 183.3)),
dot(p, vec2(419.2, 371.9)));
return fract(sin(q) * 43758.5453);
}
```
### Variant 4: Crack Textures (Multi-Layer Recursive Voronoi)
Difference from base version: uses extended jitter range to generate irregular cells, two-pass algorithm for exact boundaries, then overlays Perlin fBm perturbation on crack paths. Multi-layer recursion (rotation + scaling) produces fractal crack networks.
Key modified code:
```glsl
#define CRACK_DEPTH 3.0 // Tunable: recursion depth
#define CRACK_WIDTH 0.0 // Tunable: crack width
#define CRACK_SLOPE 50.0 // Tunable: crack sharpness
// Extended jitter range makes cell shapes more irregular
float ofs = 0.5;
#define disp(p) (-ofs + (1.0 + 2.0 * ofs) * hash2(p))
// Main loop: multi-layer crack overlay
vec4 O = vec4(0.0);
vec2 U = uv;
for (float i = 0.0; i < CRACK_DEPTH; i++) {
vec2 D = fbm22(U) * 0.67; // fBm perturbation of crack paths
vec3 H = voronoiBorder(U + D); // Exact border distance
float d = H.x;
d = min(1.0, CRACK_SLOPE * pow(max(0.0, d - CRACK_WIDTH), 1.0));
O += vec4(1.0 - d) / exp2(i); // Layer weight decay
U *= 1.5 * rot(0.37); // Rotate + scale into next layer
}
```
### Variant 5: Tileable 3D Worley (Cloud Noise)
Difference from base version: implements domain wrapping via `mod()` to generate seamlessly tileable 3D Worley noise. Combined with Perlin-Worley remapping for volumetric cloud rendering. Uses high-quality integer hash.
Key modified code:
```glsl
#define TILE_FREQ 4.0 // Tunable: tiling frequency
float worleyTileable(vec3 uv, float freq) {
vec3 id = floor(uv);
vec3 p = fract(uv);
float minDist = 1e4;
for (float x = -1.0; x <= 1.0; x++)
for (float y = -1.0; y <= 1.0; y++)
for (float z = -1.0; z <= 1.0; z++) {
vec3 offset = vec3(x, y, z);
// mod() implements domain wrapping -> seamless tiling
vec3 h = hash3_uint(mod(id + offset, vec3(freq))) * 0.5 + 0.5;
h += offset;
vec3 d = p - h;
minDist = min(minDist, dot(d, d));
}
return 1.0 - minDist; // Inverted Worley
}
// Worley fBm (GPU Pro 7 cloud approach)
float worleyFbm(vec3 p, float freq) {
return worleyTileable(p * freq, freq) * 0.625
+ worleyTileable(p * freq * 2.0, freq * 2.0) * 0.25
+ worleyTileable(p * freq * 4.0, freq * 4.0) * 0.125;
}
// Perlin-Worley remapping
float remap(float x, float a, float b, float c, float d) {
return (((x - a) / (b - a)) * (d - c)) + c;
}
// cloud = remap(perlinNoise, worleyFbm - 1.0, 1.0, 0.0, 1.0);
```
## Performance Optimization Details
### 1. Avoid sqrt in Distance Comparisons
Use `dot(r,r)` (squared distance) during the comparison phase, only taking `sqrt` for the final output. Saves 9 `sqrt` calls per pixel.
### 2. Unroll 3D Voronoi Loops
GPUs are not efficient with deeply nested loops. The 3x3x3 loop for 3D can be manually unrolled along the z-axis:
```glsl
// Instead of 3-level nesting, manually unroll z=-1, 0, 1
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
b = vec3(i, j, -1); r = b - p + hash3(g+b); d = min(d, dot(r,r));
b.z = 0.0; r = b - p + hash3(g+b); d = min(d, dot(r,r));
b.z = 1.0; r = b - p + hash3(g+b); d = min(d, dot(r,r));
}
```
### 3. Minimize Search Range
- Basic F1: 3x3 is sufficient
- Exact border / rounded border: second pass needs 5x5
- Voronoise (smooth blending): needs 5x5 to cover kernel radius
- Extended jitter (`ofs>0`): must use 5x5
- Don't blindly use 5x5; searching 16 extra cells means 16 extra hash computations
### 4. Hash Function Selection
- `sin(dot(...))` hash: fastest, but insufficient precision on some GPUs
- Texture lookup hash (`textureLod(iChannel0, ...)`): high quality but requires texture resources
- Integer hash (`uvec3`): high quality without textures, but requires ES 3.0+
### 5. Layer Count Control for Multi-Layer fBm
Each additional fBm layer adds a complete Voronoi search. 3 layers usually provide sufficient detail, 5 layers is the visual upper limit, and beyond 5 layers is rarely worth the performance cost.
## Combination Suggestions in Detail
### 1. Voronoi + fBm Perturbation
Use fBm noise to perturb Voronoi input coordinates, producing organic, irregular cell shapes (like stone textures, magma):
```glsl
vec2 distorted_uv = uv + 0.5 * fbm22(uv * 2.0);
vec2 v = voronoi(distorted_uv * SCALE);
```
### 2. Voronoi + Bump Mapping
Use Voronoi distance values as a height map, compute normals via finite differences for pseudo-3D bump effects:
```glsl
float h0 = voronoiRounded(uv);
float hx = voronoiRounded(uv + vec2(0.004, 0.0));
float hy = voronoiRounded(uv + vec2(0.0, 0.004));
float bump = max(hx - h0, 0.0) * 16.0; // Directional bump from the x-gradient (hy feeds the y component of a full normal)
```
### 3. Voronoi + Palette Mapping
Use cell ID or distance values to drive the cosine palette, quickly producing rich procedural colors:
```glsl
vec3 palette(float t) {
return 0.5 + 0.5 * cos(6.2831 * (t + vec3(0.0, 0.33, 0.67)));
}
col = palette(cellId * 0.1 + iTime * 0.1);
```
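The palette math can be verified outside the shader; a minimal Python transcription (phase offsets as in the snippet above):

```python
import math

def palette(t):
    # Cosine palette: 0.5 + 0.5 * cos(2*pi * (t + phase)) per channel
    phases = (0.0, 0.33, 0.67)
    return tuple(0.5 + 0.5 * math.cos(2.0 * math.pi * (t + p)) for p in phases)

r, g, b = palette(0.0)  # the red channel peaks at t = 0
```

Every channel stays in [0, 1] for any `t`, so the output can be used directly as a color.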
### 4. Voronoi + Raymarching
Use Voronoi distance as part of an SDF in raymarching scenes to sculpt cellular surface textures or crack effects.
### 5. Multi-Scale Voronoi Stacking
Compute multiple Voronoi layers at different frequencies and stack them for rich detail. Low-frequency layers control large structures, high-frequency layers add fine detail:
```glsl
float detail = voronoiRounded(uv * 6.0); // Main structure
float fine = voronoiRounded(uv * 16.0) * 0.5; // Fine detail
float result = detail + fine * detail; // Stacking (detail modulated by main structure)
```


@@ -0,0 +1,701 @@
# Voxel Rendering — Detailed Reference
> This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
### GLSL Fundamentals
- GLSL basic syntax (uniforms, varyings, built-in functions)
- Vector math: dot product, cross product, normalize, reflect
- Understanding of step functions like `floor()`, `sign()`, `step()`
### Ray-AABB Intersection (Ray-Box Intersection)
The foundation of voxel rendering is ray tracing. You need to understand how a ray `P(t) = O + t * D` intersects with an axis-aligned bounding box (AABB). The DDA algorithm is essentially an extension of this test to the entire grid space.
### Basic Lighting Models
- Lambert diffuse: `diffuse = max(dot(normal, lightDir), 0.0)`
- Phong specular: `specular = pow(max(dot(reflect(-lightDir, normal), viewDir), 0.0), shininess)`
### SDF (Signed Distance Field) Basics
An SDF function returns the signed distance from a point to the nearest surface (negative inside, positive outside). In voxel rendering, SDF is commonly used to define voxel occupancy: `d < 0.0` means occupied.
Common SDF primitives:
```glsl
float sdSphere(vec3 p, float r) { return length(p) - r; }
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return min(max(d.x, max(d.y, d.z)), 0.0) + length(max(d, 0.0));
}
```
SDF boolean operations:
- Union: `min(d1, d2)`
- Intersection: `max(d1, d2)`
- Subtraction: `max(d1, -d2)`
## Implementation Steps
### Step 1: Camera Ray Construction
**What**: Convert each pixel coordinate into a world-space ray origin and direction.
**Why**: Voxel rendering follows the ray tracing paradigm, with each pixel independently casting a ray. Screen coordinates must first be normalized to the [-1, 1] range, then transformed through camera parameters (focal length, plane vectors) to construct world-space ray directions.
**Mathematical derivation**:
1. `screenPos = (fragCoord.xy / iResolution.xy) * 2.0 - 1.0` normalizes pixel coordinates to [-1, 1]
2. The z component of `cameraDir` controls focal length: larger values = smaller FOV (more "telephoto")
3. `cameraPlaneV` is multiplied by aspect ratio correction to ensure square voxels aren't stretched
4. Final ray direction = camera forward + screen offset, no normalization needed (the DDA algorithm handles it naturally)
**Code**:
```glsl
vec2 screenPos = (fragCoord.xy / iResolution.xy) * 2.0 - 1.0;
vec3 cameraDir = vec3(0.0, 0.0, 0.8); // Tunable: focal length, larger = smaller FOV
vec3 cameraPlaneU = vec3(1.0, 0.0, 0.0);
vec3 cameraPlaneV = vec3(0.0, 1.0, 0.0) * iResolution.y / iResolution.x;
vec3 rayDir = cameraDir + screenPos.x * cameraPlaneU + screenPos.y * cameraPlaneV;
vec3 rayPos = vec3(0.0, 2.0, -12.0); // Tunable: camera position
```
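A Python sketch of the same mapping (tuples standing in for vec3; `focal` mirrors `cameraDir.z`) shows the center pixel looking straight down the camera axis:

```python
def camera_ray(frag, res, focal=0.8):
    # Normalize pixel coordinates to [-1, 1]
    sx = frag[0] / res[0] * 2.0 - 1.0
    sy = frag[1] / res[1] * 2.0 - 1.0
    aspect = res[1] / res[0]  # cameraPlaneV scale, keeps voxels square
    # rayDir = cameraDir + sx * planeU + sy * planeV, with planeU = +X, planeV = +Y * aspect
    return (sx, sy * aspect, focal)

center = camera_ray((960.0, 540.0), (1920.0, 1080.0))  # screen center
```

The center pixel yields `(0, 0, focal)` — exactly the camera forward vector.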
### Step 2: DDA Initialization
**What**: Compute the initial parameters needed for grid traversal by the ray.
**Why**: The DDA algorithm requires precomputing the step direction, step cost, and distance to the first boundary for each axis. These values are incrementally updated throughout traversal, avoiding per-step division.
**Key variable details**:
- **`mapPos = floor(rayPos)`**: grid coordinate of the cell containing the ray origin. `floor()` discretizes continuous coordinates to the integer grid.
- **`rayStep = sign(rayDir)`**: step direction for each axis. `sign()` returns +1 or -1, determining whether the ray advances in the positive or negative direction on that axis.
- **`deltaDist = abs(1.0 / rayDir)`**: the t cost for the ray to traverse one full grid cell on each axis. If the ray is normalized (length=1), use `1.0/rayDir` directly; when unnormalized, it's equivalent to `abs(vec3(length(rayDir)) / rayDir)`.
- **`sideDist`**: the t distance from the ray origin to the next grid boundary on each axis. The formula `(sign(rayDir) * (mapPos - rayPos) + sign(rayDir) * 0.5 + 0.5) * deltaDist` computes the distance ratio from the ray origin to the next boundary on that axis, then multiplies by deltaDist to get the actual t value.
**Code**:
```glsl
ivec3 mapPos = ivec3(floor(rayPos)); // Current grid coordinate
vec3 rayStep = sign(rayDir); // Step direction per axis (+1/-1)
vec3 deltaDist = abs(1.0 / rayDir); // t cost to traverse one cell (assumes normalized rayDir; otherwise use abs(vec3(length(rayDir)) / rayDir))
// Initial t distance to next boundary
vec3 sideDist = (sign(rayDir) * (vec3(mapPos) - rayPos) + (sign(rayDir) * 0.5) + 0.5) * deltaDist;
```
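The initialization can be checked numerically; a direct Python transcription (lists standing in for vec3, ray components assumed nonzero):

```python
import math

def dda_init(ray_pos, ray_dir):
    map_pos = [math.floor(p) for p in ray_pos]
    ray_step = [math.copysign(1.0, d) for d in ray_dir]
    delta = [abs(1.0 / d) for d in ray_dir]  # assumes no zero components
    side = [(s * (m - p) + s * 0.5 + 0.5) * dd
            for m, p, s, dd in zip(map_pos, ray_pos, ray_step, delta)]
    return map_pos, ray_step, delta, side

# From (0.25, 0.5, 0.75) along (1, 2, 4): boundary x=1 is 0.75 ray-units away, etc.
map_pos, ray_step, delta, side = dda_init((0.25, 0.5, 0.75), (1.0, 2.0, 4.0))
```

The `side` values match the hand-derived boundary distances, and the formula also works for negative directions (a ray at x=0.25 heading in -x reaches the x=0 boundary at t=0.25).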
### Step 3: DDA Traversal Loop (Branchless Version)
**What**: Traverse the grid cell by cell, checking for hits.
**Why**: The branchless version uses `lessThanEqual` + `min` vector comparisons to determine the minimum axis in one pass, avoiding nested if-else statements and improving GPU efficiency (reduces warp divergence).
**Algorithm logic**:
1. Each iteration first checks if the current cell is occupied
2. If no hit, find the axis corresponding to the smallest component in `sideDist`
3. `lessThanEqual(sideDist.xyz, min(sideDist.yzx, sideDist.zxy))` generates a bvec3 where the minimum axis is true
4. Add `deltaDist` to that axis's `sideDist`, and add `rayStep` to `mapPos`
5. `mask` records the axis of the last step, used later for normal calculation
**Code**:
```glsl
#define MAX_RAY_STEPS 64 // Tunable: maximum traversal steps, affects maximum view distance
bvec3 mask;
for (int i = 0; i < MAX_RAY_STEPS; i++) {
if (getVoxel(mapPos)) break; // Hit detection
// Branchless axis selection: choose the axis with smallest sideDist
mask = lessThanEqual(sideDist.xyz, min(sideDist.yzx, sideDist.zxy));
sideDist += vec3(mask) * deltaDist;
mapPos += ivec3(vec3(mask)) * ivec3(rayStep);
}
```
**Alternative form (step version, common in compact demos)**:
```glsl
vec3 mask = step(sideDist.xyz, sideDist.yzx) * step(sideDist.xyz, sideDist.zxy);
sideDist += mask * deltaDist;
mapPos += mask * rayStep;
```
`step(a, b)` returns `a <= b ? 1.0 : 0.0`; multiplying two steps is equivalent to "this axis is simultaneously <= both other axes," i.e., it is the minimum axis.
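Putting Steps 2-3 together in Python — with a hypothetical `get_voxel` describing a 6×6×6 block, not the document's scene — confirms the traversal lands on the first occupied cell:

```python
import math

def get_voxel(c):
    # Hypothetical scene: cell occupied if its center c + 0.5 lies inside a 6-cube
    return all(abs(v + 0.5) < 3.0 for v in c)

def dda_trace(ray_pos, ray_dir, max_steps=64):
    map_pos = [float(math.floor(p)) for p in ray_pos]
    ray_step = [math.copysign(1.0, d) for d in ray_dir]
    delta = [abs(1.0 / d) if d != 0.0 else math.inf for d in ray_dir]
    side = [(s * (m - p) + s * 0.5 + 0.5) * dd
            for m, p, s, dd in zip(map_pos, ray_pos, ray_step, delta)]
    for _ in range(max_steps):
        if get_voxel(map_pos):
            return map_pos
        axis = side.index(min(side))  # GLSL does this branchlessly via lessThanEqual
        side[axis] += delta[axis]
        map_pos[axis] += ray_step[axis]
    return None  # no hit within the step budget

hit = dda_trace((0.5, 0.5, -10.0), (0.0, 0.0, 1.0))  # enters the block at z = -3
```

A ray aimed past the block exhausts its budget and returns no hit, mirroring the shader falling through to the background.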
### Step 4: Voxel Occupancy Function
**What**: Determine whether a given grid coordinate is occupied.
**Why**: This is the sole "scene definition" interface. By replacing this function, you can generate voxel worlds from any data source — procedural SDF, heightmaps, noise, etc. This design completely decouples scene content from the rendering algorithm.
**Design points**:
- Input is integer grid coordinates; add 0.5 to get the voxel center point
- Returns a boolean (simple version) or material ID (advanced version)
- Can use any combination of SDFs, noise functions, or texture sampling internally
- Performance-critical: this function is called once per DDA step, so keep it concise
**Code**:
```glsl
// Basic version: solid cube (use this when user requests a "voxel cube")
// NOTE: getVoxel receives ivec3, but internal calculations must all use float!
bool getVoxel(ivec3 c) {
vec3 p = vec3(c) + vec3(0.5); // ivec3 → vec3 conversion (required!)
float d = sdBox(p, vec3(6.0)); // Solid 12x12x12 block
return d < 0.0;
}
// SDF boolean version: sphere carving out a block (keeping only edges)
bool getVoxelCarved(ivec3 c) {
vec3 p = vec3(c) + vec3(0.5);
float d = max(-sdSphere(p, 7.5), sdBox(p, vec3(6.0))); // box ∩ ¬sphere
return d < 0.0;
}
// Advanced version: heightmap terrain with material IDs
// NOTE: Two correct approaches:
// Approach 1: Use vec3 parameter (recommended)
int getVoxelMaterial(vec3 c) {
float height = getTerrainHeight(c.xz);
if (c.y < height) return 1; // Ground (c.y is float)
if (c.y < height + 4.0) return 7; // Tree trunk
return 0; // Air
}
// Approach 2: Use ivec3 parameter (requires explicit conversion)
int getVoxelMaterial(ivec3 c) {
vec3 p = vec3(c); // ivec3 → vec3 conversion (required!)
float height = getTerrainHeight(p.xz);
if (float(c.y) < height) return 1; // int → float comparison
if (float(c.y) < height + 4.0) return 7; // int → float comparison
return 0;
}
```
### Step 5: Face Shading (Normal + Base Color)
**What**: Assign different brightness levels to different faces based on the hit face's normal direction.
**Why**: This is the simplest voxel shading approach — three distinct face brightnesses produce the classic "Minecraft-style" visual effect. No additional lighting calculations needed; face orientation alone provides differentiation.
**Principle**:
- `mask` records the axis of the last DDA step
- Normal = reverse direction of the step axis: `-mask * rayStep`
- X-axis faces (sides) are darkest, Y-axis faces (top/bottom) brightest, Z-axis faces (front/back) medium brightness
- This fixed three-value shading simulates basic lighting under overhead illumination
**Code**:
```glsl
// Face normal derived directly from mask
vec3 normal = -vec3(mask) * rayStep;
// Three faces with different brightness
vec3 color;
if (mask.x) color = vec3(0.5); // Side face (X axis) darkest
if (mask.y) color = vec3(1.0); // Top face (Y axis) brightest
if (mask.z) color = vec3(0.75); // Front/back face (Z axis) medium
fragColor = vec4(color, 1.0);
```
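The mask-to-shading logic is small enough to verify directly (Python booleans standing in for bvec3):

```python
def face_normal(mask, ray_step):
    # Normal points back along the axis of the last DDA step
    return tuple(-float(m) * s for m, s in zip(mask, ray_step))

def face_shade(mask):
    # Fixed three-level brightness by hit axis (Minecraft-style)
    if mask[0]:
        return 0.5   # side (X)
    if mask[1]:
        return 1.0   # top/bottom (Y)
    return 0.75      # front/back (Z)

n = face_normal((False, False, True), (1.0, -1.0, 1.0))  # last step was +Z
```

Stepping in +Z means the ray entered through the voxel's -Z face, so the normal is (0, 0, -1).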
### Step 6: Precise Hit Position and Face UV
**What**: Compute the precise intersection point of the ray with the voxel surface, and the UV coordinates within that face.
**Why**: The precise intersection point is used for texture mapping and AO interpolation, rather than just grid coordinates. Face UV provides continuous coordinates (0 to 1) within a single voxel face — the basis for texture mapping and smooth AO.
**Mathematical derivation**:
1. `sideDist - deltaDist` steps back to get the t value of the hit face
2. `dot(sideDist - deltaDist, mask)` selects the hit axis's t
3. `hitPos = rayPos + rayDir * t` gives the precise intersection point
4. `uvw = hitPos - mapPos` gives voxel-local coordinates [0,1]^3
5. UV is obtained by projecting uvw onto the two tangent axes of the hit face:
- If X face is hit, UV = (uvw.y, uvw.z)
- If Y face is hit, UV = (uvw.z, uvw.x)
- If Z face is hit, UV = (uvw.x, uvw.y)
- `dot(mask * uvw.yzx, vec3(1.0))` cleverly uses mask to select the correct components
**Code**:
```glsl
// Precise t value: step back one step using sideDist
float t = dot(sideDist - deltaDist, vec3(mask));
vec3 hitPos = rayPos + rayDir * t;
// Face UV (for texturing, AO interpolation)
vec3 uvw = hitPos - vec3(mapPos); // Voxel-local coordinates [0,1]^3
vec2 uv = vec2(dot(vec3(mask) * uvw.yzx, vec3(1.0)),
dot(vec3(mask) * uvw.zxy, vec3(1.0)));
```
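A Python check of the back-step trick. The `side`/`delta` values on the two masked-out axes below are arbitrary placeholders — they are multiplied by zero, which is exactly why the `dot(..., mask)` selection works:

```python
def hit_uv(ray_pos, ray_dir, map_pos, side, delta, mask):
    # Step back one cell on the hit axis: t = dot(sideDist - deltaDist, mask)
    t = sum((s - d) * float(m) for s, d, m in zip(side, delta, mask))
    hit = [p + rd * t for p, rd in zip(ray_pos, ray_dir)]
    uvw = [h - mp for h, mp in zip(hit, map_pos)]  # voxel-local [0,1]^3
    u = sum(float(m) * uvw[(i + 1) % 3] for i, m in enumerate(mask))  # mask * uvw.yzx
    v = sum(float(m) * uvw[(i + 2) % 3] for i, m in enumerate(mask))  # mask * uvw.zxy
    return t, (u, v)

# Hypothetical post-traversal state: an axis-aligned ray hit the -Z face of voxel (0,0,-3)
t, uv = hit_uv((0.5, 0.5, -10.0), (0.0, 0.0, 1.0),
               (0.0, 0.0, -3.0), (99.0, 99.0, 8.0), (42.0, 42.0, 1.0),
               (False, False, True))
```

The hit lands at z = -3 (t = 7) and the face UV is (0.5, 0.5) — dead center, as expected for a ray through the middle of the face.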
### Step 7: Neighbor Voxel Ambient Occlusion (AO)
**What**: Sample the 8 neighboring voxels around the hit face (4 edges + 4 corners), compute an occlusion value for each vertex, then bilinearly interpolate.
**Why**: This is the core technique for Minecraft-style smooth lighting. When neighboring voxels are present at edges or corners, those vertex areas should appear darker. This AO requires no additional ray tracing — it's entirely based on neighbor queries, with low computational cost and good results.
**Algorithm details**:
1. For each vertex of the hit face, check the adjacent 2 edges and 1 corner
2. `vertexAo(side, corner)` formula: `(side.x + side.y + max(corner, side.x * side.y)) / 3.0`
- `side.x * side.y`: when both edges are occupied, even if the corner is empty, there should be full occlusion (prevents light leaking)
- `max(corner, side.x * side.y)`: takes the larger of the corner and edge product
3. Store the 4 vertex AO values in a vec4
4. Bilinearly interpolate using the face UV for a continuous AO value
5. `pow(ao, gamma)` controls AO contrast
**Code**:
```glsl
// Per-vertex AO: two edges + one corner
float vertexAo(vec2 side, float corner) {
return (side.x + side.y + max(corner, side.x * side.y)) / 3.0;
}
// Sample AO for 4 vertices of a face
// NOTE: the AO math needs float occupancy, so wrap the bool getVoxel in a float helper
float getVoxelF(vec3 p) { return getVoxel(ivec3(floor(p))) ? 1.0 : 0.0; }
vec4 voxelAo(vec3 pos, vec3 d1, vec3 d2) {
    vec4 side = vec4(
        getVoxelF(pos + d1), getVoxelF(pos + d2),
        getVoxelF(pos - d1), getVoxelF(pos - d2));
    vec4 corner = vec4(
        getVoxelF(pos + d1 + d2), getVoxelF(pos - d1 + d2),
        getVoxelF(pos - d1 - d2), getVoxelF(pos + d1 - d2));
    vec4 ao;
    ao.x = vertexAo(side.xy, corner.x);
    ao.y = vertexAo(side.yz, corner.y);
    ao.z = vertexAo(side.zw, corner.z);
    ao.w = vertexAo(side.wx, corner.w);
    return 1.0 - ao;
}
// Bilinear interpolation using face UV (convert the bvec3 mask to vec3 before doing math on it)
vec3 maskF = vec3(mask);
vec4 ambient = voxelAo(vec3(mapPos) - rayStep * maskF, maskF.zxy, maskF.yzx);
float ao = mix(mix(ambient.z, ambient.w, uv.x), mix(ambient.y, ambient.x, uv.x), uv.y);
ao = pow(ao, 1.0 / 3.0); // Tunable: gamma correction controls AO intensity
```
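The `vertexAo` formula's anti-light-leak behavior can be verified numerically in Python (values below are already flipped to `1 - ao`, i.e. 1.0 = fully lit):

```python
def vertex_ao(side, corner):
    # side: occupancy (0/1) of the two edge neighbors; corner: the diagonal neighbor
    return (side[0] + side[1] + max(corner, side[0] * side[1])) / 3.0

open_vertex = 1.0 - vertex_ao((0.0, 0.0), 0.0)  # nothing occupied
corner_only = 1.0 - vertex_ao((0.0, 0.0), 1.0)  # only the diagonal occupied
sealed = 1.0 - vertex_ao((1.0, 1.0), 0.0)       # both edges occupied, corner empty
```

With both edges occupied the vertex goes fully dark even though the corner is empty — the `side.x * side.y` term is what prevents light leaking through the diagonal gap.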
### Step 8: DDA Shadow Ray
**What**: Cast a second DDA ray from the hit point toward the light source to detect occlusion.
**Why**: Reusing the same DDA algorithm achieves hard shadows without requiring additional ray tracing infrastructure. Shadow rays typically use fewer steps (e.g., 16-32) to save performance.
**Implementation details**:
- The origin must be offset by `normal * 0.01` to avoid self-intersection
- Shadow rays only need to determine 0/1 occlusion (hard shadows), no precise intersection needed
- Returns 0.0 (occluded) or 1.0 (unoccluded)
- Step count can be lower than the primary ray since only occlusion detection is needed
**Code**:
```glsl
#define MAX_SHADOW_STEPS 32 // Tunable: shadow ray steps
float castShadow(vec3 ro, vec3 rd) {
vec3 pos = floor(ro);
vec3 ri = 1.0 / rd;
vec3 rs = sign(rd);
vec3 dis = (pos - ro + 0.5 + rs * 0.5) * ri;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
if (getVoxel(ivec3(pos))) return 0.0; // Occluded
vec3 mm = step(dis.xyz, dis.yzx) * step(dis.xyz, dis.zxy);
dis += mm * rs * ri;
pos += mm * rs;
}
return 1.0; // Unoccluded
}
// Usage during shading
vec3 sundir = normalize(vec3(-0.5, 0.6, 0.7));
float shadow = castShadow(hitPos + normal * 0.01, sundir);
float diffuse = max(dot(normal, sundir), 0.0) * shadow;
```
## Variant Details
### Variant 1: Glowing Voxels (Glow Accumulation)
**Difference from the base version**: During DDA traversal, accumulates a distance-based glow value at each step, producing a semi-transparent glow effect even without a hit.
**Use cases**: Neon light effects, energy fields, particle clouds, sci-fi style
**Principle**: Using the SDF distance field, glow contribution is large near the voxel surface (small distance → large 1/d²) and small far away. Accumulating contributions from all steps produces a continuous glow field.
**Key parameters**:
- `0.015`: glow intensity coefficient — larger = brighter
- `0.01`: minimum distance threshold — prevents division by zero and controls glow "sharpness"
- Glow color `vec3(0.4, 0.6, 1.0)`: can vary based on distance or material
**Code**:
```glsl
float glow = 0.0;
for (int i = 0; i < MAX_RAY_STEPS; i++) {
float d = sdSomeShape(vec3(mapPos)); // Distance to nearest surface
glow += 0.015 / (0.01 + d * d); // Tunable: glow falloff
if (d < 0.0) break;
// ... normal DDA stepping ...
}
vec3 col = baseColor + glow * vec3(0.4, 0.6, 1.0); // Overlay glow color
```
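A Python sketch of the accumulation — with a hypothetical sphere SDF and a fixed step length instead of real DDA stepping — shows the glow falling off as the ray passes farther from the surface:

```python
def sd_sphere(p, r):
    return (p[0] ** 2 + p[1] ** 2 + p[2] ** 2) ** 0.5 - r

def glow_along_ray(origin, direction, steps=64, step_len=0.25):
    glow = 0.0
    for i in range(steps):
        p = [o + d * step_len * i for o, d in zip(origin, direction)]
        dist = sd_sphere(p, 1.0)
        if dist < 0.0:
            break  # inside the surface: stop accumulating
        glow += 0.015 / (0.01 + dist * dist)  # same falloff as the shader
    return glow

near = glow_along_ray((1.2, 0.0, -8.0), (0.0, 0.0, 1.0))  # grazes 0.2 from the sphere
far = glow_along_ray((4.0, 0.0, -8.0), (0.0, 0.0, 1.0))   # passes 3.0 away
```

The near-miss ray accumulates far more glow, which is what produces the visible halo around the shape.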
### Variant 2: Rounded Voxels (Intra-Voxel SDF Refinement)
**Difference from the base version**: After DDA hit, performs a few SDF ray march steps inside the voxel, rendering rounded blocks instead of perfect cubes.
**Use cases**: Organic-style voxels, building block/LEGO effects, chibi characters
**Principle**: After DDA hit, we know which voxel the ray entered, but the precise shape inside is defined by the SDF. Starting SDF ray marching from the voxel entry point, using `sdRoundedBox` to define a rounded cube, marching to the surface yields the precise rounded intersection and normal.
**Key parameters**:
- `w` (corner radius): 0.0 = perfect cube, 0.5 = sphere
- 6 internal march steps are typically sufficient for convergence
- `hash31(mapPos)` randomizes the corner radius per voxel, adding variety
**Code**:
```glsl
// Refine inside the voxel after DDA hit
float id = hash31(mapPos);
float w = 0.05 + 0.35 * id; // Tunable: corner radius
float sdRoundedBox(vec3 p, float w) {
return length(max(abs(p) - 0.5 + w, 0.0)) - w;
}
// Start 6-step SDF march from voxel entry
vec3 localP = hitPos - mapPos - 0.5;
for (int j = 0; j < 6; j++) {
float h = sdRoundedBox(localP, w);
if (h < 0.025) break; // Hit rounded surface
localP += rd * max(0.0, h);
}
```
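The corner-radius parameter can be sanity-checked in Python: at `w = 0.5` the rounded box degenerates into a radius-0.5 sphere, and at `w = 0.0` into the plain unit cube:

```python
import math

def sd_rounded_box(p, w):
    # Unit voxel (half-extent 0.5) with corner radius w
    q = [max(abs(c) - 0.5 + w, 0.0) for c in p]
    return math.sqrt(q[0] ** 2 + q[1] ** 2 + q[2] ** 2) - w

on_sphere = sd_rounded_box((0.5, 0.0, 0.0), 0.5)  # surface point of the w=0.5 sphere
inside = sd_rounded_box((0.0, 0.0, 0.0), 0.2)     # voxel center is always inside
on_corner = sd_rounded_box((0.5, 0.5, 0.5), 0.0)  # cube corner at w=0
```

Both limit cases return a distance of zero on their respective surfaces, confirming the radius parameterization.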
### Variant 3: Hybrid SDF-Voxel Traversal
**Difference from the base version**: Uses SDF sphere-tracing (large steps) when far from surfaces, switching to precise DDA voxel traversal when close. Greatly improves traversal efficiency in open areas.
**Use cases**: Large open worlds, long-distance voxel terrain, scenes requiring high view distance
**Principle**:
1. In open areas far from any voxel surface, SDF values are large, allowing sphere-tracing to skip large distances in one step
2. When the SDF value approaches `sqrt(3) * voxelSize` (voxel diagonal length), we may be about to enter a voxel region
3. Switch to DDA to ensure no voxels are skipped
4. If DDA finds the ray has left the dense region (SDF value increases again), switch back to sphere-tracing
**Key parameters**:
- `VOXEL_SIZE`: voxel dimensions
- `SWITCH_DIST = VOXEL_SIZE * 1.732`: switching threshold, sqrt(3) is the voxel diagonal safety factor
**Code**:
```glsl
#define VOXEL_SIZE 0.0625 // Tunable: voxel size
#define SWITCH_DIST (VOXEL_SIZE * 1.732) // sqrt(3) * voxelSize
bool useVoxel = false;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 pos = ro + rd * t;
float d = mapSDF(useVoxel ? voxelCenter : pos);
if (!useVoxel) {
t += d;
if (d < SWITCH_DIST) {
useVoxel = true; // Switch to DDA
voxelPos = getVoxelPos(pos);
}
} else {
if (d < 0.0) { /* hit */ break; }
if (d > SWITCH_DIST) {
useVoxel = false; // Switch back to SDF
t += d;
continue;
}
// DDA step one cell
vec3 exitT = (voxelPos - ro * ird + ird * VOXEL_SIZE * 0.5);
// ... select minimum axis and advance ...
}
}
```
### Variant 4: Voxel Cone Tracing
**Difference from the base version**: Builds a multi-level mipmap hierarchy of voxels (e.g., 64→32→16→8→4→2), casts cone-shaped rays from hit points, samples coarser LOD levels as distance increases, achieving diffuse/specular global illumination.
**Use cases**: High-quality global illumination, colored indirect lighting, real-time GI for dynamic scenes
**Principle**:
1. Precompute mipmap levels of voxel data (resolution halved per level)
2. Cast multiple cone-shaped rays from the hit point across the normal hemisphere (typically 5-7 cones)
3. Each cone's diameter increases linearly with distance during traversal
4. Diameter maps to mipmap level: `lod = log2(diameter)`
5. Sample the corresponding mipmap level
6. Front-to-back compositing accumulates lighting and occlusion
**Key parameters**:
- `coneRatio`: cone angle — diffuse uses wide cones (~1.0), specular uses narrow cones (~0.1)
- 58 steps is a common balance value
- `voxelFetch(sp, lod)` requires a custom mipmap query function
**Code**:
```glsl
// Cone tracing: cast a cone-shaped ray along direction d
vec4 traceCone(vec3 origin, vec3 dir, float coneRatio) {
vec4 light = vec4(0.0);
float t = 1.0;
for (int i = 0; i < 58; i++) {
vec3 sp = origin + dir * t;
float diameter = max(1.0, t * coneRatio); // Cone diameter
float lod = log2(diameter); // Corresponding mipmap level
        vec4 voxSample = voxelFetch(sp, lod); // LOD sample ("sample" is a reserved word in newer GLSL)
        light += voxSample * (1.0 - light.w); // Front-to-back compositing
t += diameter;
}
return light;
}
```
### Variant 5: PBR Lighting + Multi-Bounce Reflections
**Difference from the base version**: Uses GGX BRDF instead of Lambert, supports metallic/roughness material parameters, and casts a second DDA ray for reflections.
**Use cases**: Realistic voxel rendering, metallic/glass materials, architectural visualization
**Principle**:
1. GGX (Trowbridge-Reitz) microfacet model provides physically correct light distribution
2. Roughness parameter controls specular sharpness: 0.0 = perfect mirror, 1.0 = fully diffuse
3. Schlick Fresnel approximation: `F = F0 + (1 - F0) * (1 - cos(theta))^5`
4. Reflection ray reuses the `castRay` function with reduced step count (64 steps typically sufficient)
5. Multi-bounce reflections can call recursively, but 1-2 bounces usually suffice
**Key parameters**:
- `roughness`: roughness [0, 1]
- `F0 = 0.04`: base reflectance for non-metals
- 64 steps for reflection ray (fewer than primary ray to save performance)
**Code**:
```glsl
// Burley (Disney) diffuse term — commonly paired with a GGX specular lobe
float burleyDiffuse(float NoL, float NoV, float LoH, float roughness) {
float FD90 = 0.5 + 2.0 * roughness * LoH * LoH;
float a = 1.0 + (FD90 - 1.0) * pow(1.0 - NoL, 5.0);
float b = 1.0 + (FD90 - 1.0) * pow(1.0 - NoV, 5.0);
return a * b / 3.14159;
}
// Reflection ray - needs a separate shading function to handle HitInfo
vec3 shadeHit(HitInfo h, vec3 rd, vec3 sunDir, vec3 skyColor) {
if (!h.hit) return skyColor;
vec3 matCol = getMaterialColor(h.mat, h.uv);
float diff = max(dot(h.normal, sunDir), 0.0);
return matCol * diff;
}
vec3 rd2 = reflect(rd, normal);
HitInfo reflHit = castRay(hitPos + normal * 0.001, rd2, 64);
vec3 reflColor = shadeHit(reflHit, rd2, sunDir, skyColor);
// Schlick Fresnel blending
float fresnel = 0.04 + 0.96 * pow(1.0 - max(dot(normal, -rd), 0.0), 5.0);
col += fresnel * reflColor;
```
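The Schlick term's two limits are easy to confirm numerically (F0 = 0.04 as in the snippet above):

```python
def schlick_fresnel(cos_theta, f0=0.04):
    # F = F0 + (1 - F0) * (1 - cos(theta))^5
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

head_on = schlick_fresnel(1.0)  # normal incidence: only 4% reflected
grazing = schlick_fresnel(0.0)  # grazing angle: full reflection
```

Reflectance rises monotonically from 4% to 100% as the view grazes the surface — the reason voxel water and metals brighten sharply at the silhouette.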
## In-Depth Performance Optimization
### Main Bottlenecks
1. **DDA Loop Step Count**: Each pixel needs to traverse tens to hundreds of cells — the largest performance cost. Step count is proportional to scene size and openness.
2. **Voxel Query Function**: `getVoxel()` is called once per step; if using noise/textures, texture fetch overhead is significant. The complexity of procedural SDF functions directly impacts frame rate.
3. **AO Neighbor Sampling**: Each hit point requires 8 additional `getVoxel()` queries. Manageable for simple scenes, but with a complex `getVoxel`, these 8 queries may exceed the main traversal cost.
4. **Shadow Rays**: Equivalent to a second full DDA traversal. Dual traversal doubles the pixel shader burden.
### Optimization Techniques
#### Early Exit
Break immediately when `mapPos` exceeds scene boundaries, avoiding continued traversal in meaningless space:
```glsl
if (any(lessThan(vec3(mapPos), vec3(-GRID_SIZE))) || any(greaterThan(vec3(mapPos), vec3(GRID_SIZE)))) break;
```
#### Reduce Shadow Steps
Shadow rays only need to determine occlusion — 16-32 steps usually suffice. No need for the same step count as the primary ray:
```glsl
#define MAX_SHADOW_STEPS 32 // Instead of MAX_RAY_STEPS of 128
```
#### Distance-Based Quality Scaling
Use high step counts for precise traversal up close, low step counts or LOD at distance. Dynamically adjust the step limit based on screen pixel size.
#### Hybrid Traversal
Use SDF sphere-tracing for large steps in open areas, switching to DDA near surfaces (see Variant 3). Can reduce traversal steps by 80%+ in large scenes.
#### Avoid Complex Computation Inside the Loop
Material queries, AO, normals, etc. are all done only after a hit. The traversal loop should only perform the simplest occupancy detection.
#### Leverage GPU Texture Hardware
Replace procedural voxel queries with texture sampling (`texelFetch`). 3D textures can store precomputed voxel data and are cache-friendly on hardware.
#### Temporal Accumulation
Multi-frame accumulation — each frame only needs a small number of samples, combined with reprojection for low-noise results. Suitable for scenarios requiring many rays (GI, soft shadows).
## Complete Combination Code Examples
### Procedural Noise Terrain
Use FBM/Perlin noise inside `getVoxel()` to generate heightmaps, producing Minecraft-style infinite terrain:
```glsl
// Recommended approach: use vec3 parameter (simple, no type conversion issues)
int getVoxel(vec3 c) {
// FBM noise heightmap
float height = 0.0;
float amp = 8.0;
float freq = 0.05;
vec2 xz = c.xz;
for (int i = 0; i < 4; i++) {
height += amp * noise(xz * freq);
amp *= 0.5;
freq *= 2.0;
}
if (c.y > height) return 0; // Air
if (c.y > height - 1.0) return 1; // Grass
if (c.y > height - 4.0) return 2; // Dirt
return 3; // Stone
}
// ivec3 parameter version (requires type conversion)
int getVoxel(ivec3 c) {
vec3 p = vec3(c); // ivec3 → vec3 conversion
float height = 0.0;
float amp = 8.0;
float freq = 0.05;
// NOTE: p.xz returns vec2, must pass vec2 version of noise!
// If noise only has vec3 version, use noise(vec3(p.xz * freq, 0.0))
vec2 xz = p.xz;
for (int i = 0; i < 4; i++) {
height += amp * noise(xz * freq);
amp *= 0.5;
freq *= 2.0;
}
if (float(c.y) > height) return 0; // int → float comparison
if (float(c.y) > height - 1.0) return 1; // int → float comparison
if (float(c.y) > height - 4.0) return 2; // int → float comparison
return 3;
}
```
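In Python, with a hypothetical hash-based `value_noise` standing in for the shader's `noise()`, the height stays within the amplitude budget 8 + 4 + 2 + 1 = 15 and the material bands behave as expected:

```python
import math

def value_noise(p):
    # Hypothetical stand-in for the shader's noise(): deterministic, output in [0, 1)
    return (math.sin(p[0] * 12.9898 + p[1] * 78.233) * 43758.5453) % 1.0

def terrain_height(xz):
    height, amp, freq = 0.0, 8.0, 0.05
    for _ in range(4):  # 4 FBM octaves, matching the shader
        height += amp * value_noise((xz[0] * freq, xz[1] * freq))
        amp *= 0.5
        freq *= 2.0
    return height

def material(y, height):
    if y > height: return 0          # air
    if y > height - 1.0: return 1    # grass
    if y > height - 4.0: return 2    # dirt
    return 3                         # stone

h = terrain_height((10.0, 20.0))
```

Any column queried top to bottom passes through air, grass, dirt, then stone — the layering the shader's comparisons encode.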
### Texture Mapping
Sample textures using face UV after hit, achieving a retro pixel art style:
```glsl
// During the shading stage
vec2 texUV = hit.uv;
// 4x4 tile atlas (each tile e.g. a 16x16-pixel texture)
int tileX = mat % 4;
int tileY = mat / 4;
vec2 atlasUV = (vec2(tileX, tileY) + texUV) / 4.0;
vec3 texCol = texture(iChannel0, atlasUV).rgb;
col *= texCol;
```
### Atmospheric Scattering / Volumetric Fog
Accumulate medium density during DDA traversal, achieving volumetric lighting and fog effects:
```glsl
float fogAccum = 0.0;
vec3 fogColor = vec3(0.0);
for (int i = 0; i < MAX_RAY_STEPS; i++) {
// ... DDA stepping ...
float density = getDensity(mapPos); // Atmospheric density
if (density > 0.0) {
float dt = length(vec3(mask) * deltaDist); // Current step size
fogAccum += density * dt;
// Volumetric light: compute lighting within fog
float shadowInFog = castShadow(vec3(mapPos) + 0.5, sunDir);
fogColor += density * dt * shadowInFog * sunColor * exp(-fogAccum);
}
if (getVoxel(mapPos) > 0) break;
}
// Apply fog effect
col = col * exp(-fogAccum) + fogColor;
```
### Water Surface Rendering (Voxel Water Scene)
A complete voxel water scene with surface wave reflections, underwater refraction, sand, and seaweed:
```glsl
float waterY = 0.0;
// Underwater voxel scene (sand + seaweed)
// IMPORTANT: c.xz returns vec2, which only has .x/.y components — never use .z!
int getVoxel(vec3 c) {
float sandHeight = -3.0 + 0.5 * sin(c.x * 0.3) * cos(c.z * 0.4);
if (c.y < sandHeight) return 1; // Sand interior
if (c.y < sandHeight + 1.0) return 2; // Sand surface
// Seaweed
float grassHash = fract(sin(dot(floor(c.xz), vec2(12.9898, 78.233))) * 43758.5453);
if (grassHash > 0.85 && c.y >= sandHeight + 1.0 && c.y < sandHeight + 1.0 + 3.0 * grassHash) {
return 3;
}
return 0;
}
// Check if ray intersects water surface
float tWater = (waterY - ro.y) / rd.y;
bool hitWater = tWater > 0.0 && (tWater < hit.t || !hit.hit);
if (hitWater) {
vec3 waterPos = ro + rd * tWater;
vec3 waterNormal = vec3(0.0, 1.0, 0.0);
// NOTE: waterPos.xz is vec2, access with .x/.y (not .x/.z)
vec2 waveXZ = waterPos.xz; // vec2: waveXZ.x = worldX, waveXZ.y = worldZ
waterNormal.x += 0.05 * sin(waveXZ.x * 3.0 + iTime);
waterNormal.z += 0.05 * cos(waveXZ.y * 2.0 + iTime * 0.7);
waterNormal = normalize(waterNormal);
// Fresnel
float fresnel = 0.04 + 0.96 * pow(1.0 - max(dot(waterNormal, -rd), 0.0), 5.0);
// Reflection
vec3 reflDir = reflect(rd, waterNormal);
HitInfo reflHit = castRay(waterPos + waterNormal * 0.01, reflDir, 64);
vec3 reflCol = reflHit.hit ? getMaterialColor(reflHit.mat, reflHit.uv) : skyColor;
// Refraction (underwater voxels: sand, seaweed)
vec3 refrDir = refract(rd, waterNormal, 1.0 / 1.33);
HitInfo refrHit = castRay(waterPos - waterNormal * 0.01, refrDir, 64);
vec3 refrCol;
if (refrHit.hit) {
vec3 matCol = getMaterialColor(refrHit.mat, refrHit.uv);
// Underwater color attenuation (bluer with distance)
float underwaterDist = length(refrHit.pos - waterPos);
refrCol = mix(matCol, vec3(0.0, 0.15, 0.3), 1.0 - exp(-0.1 * underwaterDist));
} else {
refrCol = vec3(0.0, 0.1, 0.3); // Deep water color
}
col = mix(refrCol, reflCol, fresnel);
col = mix(col, vec3(0.0, 0.3, 0.5), 0.2);
}
```
### Global Illumination (Monte Carlo Hemisphere Sampling)
Use random hemisphere direction sampling for diffuse indirect lighting:
```glsl
vec3 indirectLight = vec3(0.0);
int numSamples = 4; // Few samples per frame, accumulate across frames
for (int s = 0; s < numSamples; s++) {
// Cosine-weighted hemisphere sampling
vec2 xi = hash22(vec2(fragCoord) + float(iFrame) * 0.618 + float(s));
float cosTheta = sqrt(xi.x);
float sinTheta = sqrt(1.0 - xi.x);
float phi = 6.28318 * xi.y;
vec3 sampleDir = cosTheta * normal
+ sinTheta * cos(phi) * tangent
+ sinTheta * sin(phi) * bitangent;
HitInfo giHit = castRay(hitPos + normal * 0.01, sampleDir, 32);
if (giHit.hit) {
vec3 giColor = getMaterialColor(giHit.mat, giHit.uv);
float giDiff = max(dot(giHit.normal, sunDir), 0.0);
indirectLight += giColor * giDiff;
} else {
indirectLight += skyColor;
}
}
indirectLight /= float(numSamples);
col += matCol * indirectLight * 0.5; // Indirect light contribution
```
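The cosine-weighted sampling can be validated statistically in Python: every sample stays in the normal's hemisphere, each direction is unit length, and the mean cosine converges to E[sqrt(U)] = 2/3:

```python
import math
import random

def cosine_sample(normal, tangent, bitangent, xi):
    # cosTheta = sqrt(xi.x) gives a cosine-weighted hemisphere distribution
    cos_t = math.sqrt(xi[0])
    sin_t = math.sqrt(1.0 - xi[0])
    phi = 2.0 * math.pi * xi[1]
    return tuple(cos_t * n + sin_t * math.cos(phi) * t + sin_t * math.sin(phi) * b
                 for n, t, b in zip(normal, tangent, bitangent))

random.seed(1)
n, t, b = (0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)  # orthonormal frame
dirs = [cosine_sample(n, t, b, (random.random(), random.random()))
        for _ in range(20000)]
mean_cos = sum(d[1] for d in dirs) / len(dirs)  # d[1] = dot(dir, normal)
```

Because the distribution already carries the cosine factor, the shader can skip the `dot(N, L)` weight in the Monte Carlo estimator.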


@@ -0,0 +1,445 @@
# Water & Ocean Rendering — Detailed Reference
This document is the complete reference for [SKILL.md](SKILL.md), covering prerequisites, detailed explanations for each step, variant descriptions, in-depth performance optimization analysis, and complete code examples for combination suggestions.
## Prerequisites
- **GLSL Fundamentals**: uniforms, varyings, built-in functions
- **Vector Math**: dot product, cross product, reflection/refraction vectors
- **Basic Raymarching Concepts**
- **FBM (Fractal Brownian Motion) / Multi-octave Noise Layering Basics**
- **Physical Intuition of the Fresnel Effect**: strong reflection at grazing angles, strong transmission at normal incidence
## Core Principles
The essence of water rendering is solving three core problems: **water surface shape generation**, **light-water surface interaction**, and **water body color compositing**.
### 1. Wave Generation: Exponential Sine Layering + Derivative Domain Warping
Traditional sum-of-sines uses `sin(x)` to produce symmetric waveforms, but real ocean waves have **sharp crests and broad troughs**. The core formula:
```
wave(x) = exp(sin(x) - 1)
```
- When `sin(x) = 1` (crest): `exp(0) = 1.0`, sharp peak
- When `sin(x) = -1` (trough): `exp(-2) ≈ 0.135`, broad flat valley
This naturally produces a **trochoidal profile** similar to Gerstner waves, but at much lower computational cost.
When layering multiple waves, the key innovation is **derivative domain warping (Drag)**:
```
position += direction * derivative * weight * DRAG_MULT
```
Each wave layer's sampling position is offset by the previous layer's derivative, causing small ripples to naturally cluster on the crests of larger waves — simulating the real-ocean phenomenon of capillary waves riding on gravity waves.
### 2. Lighting Model: Schlick Fresnel + Subsurface Scattering Approximation
**Schlick Fresnel Approximation**:
```
F = F0 + (1 - F0) * (1 - dot(N, V))^5
```
Where water's F0 ≈ 0.04 (only 4% reflection at normal incidence).
**Subsurface Scattering (SSS)** is approximated through water thickness: troughs have thicker water layers with stronger blue-green scattering; crests have thinner layers with weaker scattering — naturally producing the visual effect of transparent crests and deep blue troughs.
### 3. Water Surface Intersection: Bounded Heightfield Marching
The water surface is constrained within a bounding box of `[0, -WATER_DEPTH]`, and rays only march between the intersection points of two planes. Step size is adaptive: `step = ray_y - wave_height` — large steps when far from the surface, small precise steps when close.
## Implementation Steps
### Step 1: Exponential Sine Wave Function
**What**: Define a single directional wave's value and derivative calculation function.
**Why**: `exp(sin(x) - 1)` transforms the symmetric sine into a realistic waveform with sharp crests and broad troughs. It also returns the analytical derivative, used for subsequent domain warping and normal calculation.
**Code**:
```glsl
vec2 wavedx(vec2 position, vec2 direction, float frequency, float timeshift) {
float x = dot(direction, position) * frequency + timeshift;
float wave = exp(sin(x) - 1.0); // Sharp crest, broad trough waveform
float dx = wave * cos(x); // Analytical derivative = exp(sin(x)-1) * cos(x)
return vec2(wave, -dx); // Return (value, negative derivative)
}
```
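The analytical derivative returned in the second component can be checked against a central finite difference; a quick Python verification (assuming a 1D wave with unit frequency):

```python
import math

def wavedx(x):
    w = math.exp(math.sin(x) - 1.0)
    return w, w * math.cos(x)  # d/dx exp(sin(x) - 1) = exp(sin(x) - 1) * cos(x)

h = 1e-6
for x in (0.3, 1.7, 4.2):
    _, d_analytic = wavedx(x)
    d_numeric = (wavedx(x + h)[0] - wavedx(x - h)[0]) / (2.0 * h)
    assert abs(d_analytic - d_numeric) < 1e-5
print("analytical derivative matches finite difference")
```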
### Step 2: Multi-Octave Wave Layering with Domain Warping
**What**: Layer multiple waves with different directions, frequencies, and speeds, applying derivative-driven position offset (drag) between each layer.
**Why**: A single wave is too regular. Multi-octave layering produces natural complex waveforms. Domain warping is the key — it causes small waves to cluster on top of large waves, which is the core technique distinguishing "good-looking ocean" from "ordinary noise." The frequency growth rate of 1.18 (instead of the traditional FBM 2.0) creates smoother transitions between wave layers.
**Code**:
```glsl
#define DRAG_MULT 0.38 // Tunable: domain warp strength, 0=none, 0.5=strong clustering
float getwaves(vec2 position, int iterations) {
float wavePhaseShift = length(position) * 0.1; // Break long-distance phase synchronization
float iter = 0.0;
float frequency = 1.0;
float timeMultiplier = 2.0;
float weight = 1.0;
float sumOfValues = 0.0;
float sumOfWeights = 0.0;
for (int i = 0; i < iterations; i++) {
vec2 p = vec2(sin(iter), cos(iter)); // Pseudo-random wave direction
vec2 res = wavedx(position, p, frequency, iTime * timeMultiplier + wavePhaseShift);
// Core: offset sampling position based on derivative (small waves ride big waves)
position += p * res.y * weight * DRAG_MULT;
sumOfValues += res.x * weight;
sumOfWeights += weight;
weight = mix(weight, 0.0, 0.2); // Tunable: weight decay, 0.2 = 80% retained per layer
frequency *= 1.18; // Tunable: frequency growth rate
timeMultiplier *= 1.07; // Tunable: higher frequency waves animate faster (dispersion)
        iter += 1232.399963; // Large arbitrary increment decorrelates successive wave directions
}
return sumOfValues / sumOfWeights;
}
```
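As a sanity check, the loop above ports directly to Python. Since every layer's value lies in [exp(-2), 1], the weighted average must as well; a sketch assuming `iTime = 0`:

```python
import math

DRAG_MULT = 0.38

def wavedx(px, py, dx, dy, freq, timeshift):
    x = (dx * px + dy * py) * freq + timeshift
    wave = math.exp(math.sin(x) - 1.0)
    return wave, -wave * math.cos(x)

def getwaves(px, py, iterations, time=0.0):
    phase = math.hypot(px, py) * 0.1
    it, freq, tmul, weight = 0.0, 1.0, 2.0, 1.0
    sum_v, sum_w = 0.0, 0.0
    for _ in range(iterations):
        dx, dy = math.sin(it), math.cos(it)
        v, d = wavedx(px, py, dx, dy, freq, time * tmul + phase)
        px += dx * d * weight * DRAG_MULT  # derivative-driven domain warp
        py += dy * d * weight * DRAG_MULT
        sum_v += v * weight
        sum_w += weight
        weight *= 0.8          # same as mix(weight, 0.0, 0.2)
        freq *= 1.18
        tmul *= 1.07
        it += 1232.399963
    return sum_v / sum_w

h = getwaves(3.7, -1.2, 12)
assert math.exp(-2.0) - 1e-9 <= h <= 1.0 + 1e-9
print(h)
```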
### Step 3: Bounded Bounding Box Ray Marching
**What**: Constrain the water surface between two horizontal planes and only march between the entry and exit points.
**Why**: Much faster than unbounded SDF marching. The step size `pos.y - height` automatically adapts — large jumps when far from the surface, fine convergence when close. Precomputing bounding box intersections avoids wasting steps in open air.
**Code**:
```glsl
#define WATER_DEPTH 1.0 // Tunable: water body thickness, affects SSS and wave amplitude
float intersectPlane(vec3 origin, vec3 direction, vec3 point, vec3 normal) {
return clamp(dot(point - origin, normal) / dot(direction, normal), -1.0, 9991999.0);
}
float raymarchwater(vec3 camera, vec3 start, vec3 end, float depth) {
vec3 pos = start;
vec3 dir = normalize(end - start);
for (int i = 0; i < 64; i++) { // Tunable: march steps, 64 is usually sufficient
float height = getwaves(pos.xz, ITERATIONS_RAYMARCH) * depth - depth;
if (height + 0.01 > pos.y) {
return distance(pos, camera);
}
pos += dir * (pos.y - height); // Adaptive step size
}
return distance(start, camera); // If missed, assume hit at top surface
}
```
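The adaptive step is easy to observe in a 1D toy version: with a simple made-up heightfield standing in for `getwaves`, the march converges close to the surface in a handful of steps (a Python sketch, not the shader path):

```python
import math

def height(x):
    # toy heightfield in roughly [-0.8, -0.2], standing in for the wave function
    return -0.5 + 0.3 * math.sin(2.0 * x)

def raymarch_water(ox, oy, dx, dy, steps=64, eps=0.01):
    x, y = ox, oy
    for _ in range(steps):
        h = height(x)
        if h + eps > y:
            return x, y                 # close enough: treat as a hit
        t = y - h                       # adaptive step = height difference
        x, y = x + dx * t, y + dy * t
    return x, y

hx, hy = raymarch_water(0.0, 2.0, 0.8, -0.6)
assert abs(hy - height(hx)) < 0.1       # converged near the surface
print(hx, hy)
```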
### Step 4: Normal Calculation with Distance Smoothing
**What**: Compute water surface normals using finite differences, and interpolate toward the up direction based on distance to eliminate distant aliasing.
**Why**: Normals determine all lighting details. Using more wave iterations for normals than for ray marching (36 vs 12) is a core performance technique — marching only needs coarse shape, normals need fine detail. The farther away, the more high-frequency normals cause flickering; smoothing toward `(0,1,0)` is equivalent to implicit LOD.
**Code**:
```glsl
#define ITERATIONS_RAYMARCH 12 // Tunable: wave iterations for marching (fewer = faster)
#define ITERATIONS_NORMAL 36 // Tunable: wave iterations for normals (more = finer detail)
vec3 normal(vec2 pos, float e, float depth) {
vec2 ex = vec2(e, 0);
float H = getwaves(pos.xy, ITERATIONS_NORMAL) * depth;
vec3 a = vec3(pos.x, H, pos.y);
return normalize(
cross(
a - vec3(pos.x - e, getwaves(pos.xy - ex.xy, ITERATIONS_NORMAL) * depth, pos.y),
a - vec3(pos.x, getwaves(pos.xy + ex.yx, ITERATIONS_NORMAL) * depth, pos.y + e)
)
);
}
// Distance smoothing: distant normals approach (0,1,0)
// N = mix(N, vec3(0.0, 1.0, 0.0), 0.8 * min(1.0, sqrt(dist * 0.01) * 1.1));
```
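For small `e`, the cross-product construction above reduces to the standard heightfield normal `normalize(-dH/dx, 1, -dH/dz)`. A Python check against an analytic test surface (the surface is hypothetical, chosen only because its gradient is known in closed form):

```python
import math

def h(x, z):
    return 0.2 * math.sin(x) * math.cos(z)

def normal_cross(x, z, e=1e-4):
    # Same edge vectors as the shader: a - (x-e, ...) and a - (..., z+e)
    u = (e, h(x, z) - h(x - e, z), 0.0)
    v = (0.0, h(x, z) - h(x, z + e), -e)
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    l = math.sqrt(sum(k * k for k in n))
    return tuple(k / l for k in n)

def normal_analytic(x, z):
    dhdx = 0.2 * math.cos(x) * math.cos(z)
    dhdz = -0.2 * math.sin(x) * math.sin(z)
    l = math.sqrt(dhdx * dhdx + 1.0 + dhdz * dhdz)
    return (-dhdx / l, 1.0 / l, -dhdz / l)

nc = normal_cross(0.7, 1.3)
na = normal_analytic(0.7, 1.3)
assert all(abs(a - b) < 1e-3 for a, b in zip(nc, na))
print(nc)
```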
### Step 5: Fresnel Reflection and Subsurface Scattering
**What**: Use Schlick Fresnel approximation to calculate reflection/scattering weights, combining sky reflection with depth-dependent blue-green scattering color.
**Why**: The Fresnel effect is key to water surface realism — nearly fully transparent up close, nearly fully reflective at a distance. The SSS color `(0.0293, 0.0698, 0.1717)` comes from empirical values of deep-sea scattering spectra. Troughs have thicker water layers with stronger SSS; crests have thinner layers with weaker SSS, naturally producing light-dark variation.
**Code**:
```glsl
// Schlick Fresnel, F0 = 0.04 (water's normal incidence reflectance)
float fresnel = 0.04 + 0.96 * pow(1.0 - max(0.0, dot(-N, ray)), 5.0);
// Reflection direction, force upward to avoid self-intersection
vec3 R = normalize(reflect(ray, N));
R.y = abs(R.y);
// Sky reflection + sun specular
vec3 reflection = getAtmosphere(R) + getSun(R);
// Subsurface scattering: deeper (trough) = bluer color
vec3 scattering = vec3(0.0293, 0.0698, 0.1717) * 0.1
* (0.2 + (waterHitPos.y + WATER_DEPTH) / WATER_DEPTH);
// Final compositing
vec3 C = fresnel * reflection + scattering;
```
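The Fresnel weights are worth checking numerically; a small Python sketch of the Schlick term:

```python
def schlick_fresnel(cos_theta, f0=0.04):
    # F = F0 + (1 - F0) * (1 - cos(theta))^5
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

normal_incidence = schlick_fresnel(1.0)  # looking straight down: 4% reflection
grazing = schlick_fresnel(0.0)           # grazing view: total reflection
mid = schlick_fresnel(0.5)               # 60 degrees: still mostly transmission
print(normal_incidence, grazing, mid)
```

At a viewing angle of 60 degrees the reflectance is still only about 0.07, which is why water near the camera reads as transparent while the horizon reads as a mirror.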
### Step 6: Atmosphere and Tone Mapping
**What**: Add a cheap atmospheric scattering model and ACES tone mapping.
**Why**: The water surface reflects the sky, so sky quality directly affects the water's appearance. `1/(ray.y + 0.1)` approximates optical path length, `vec3(5.5, 13.0, 22.4)/22.4` represents Rayleigh scattering coefficient ratios. ACES tone mapping maps HDR values to display range, preserving highlight detail while compressing shadows.
**Code**:
```glsl
vec3 extra_cheap_atmosphere(vec3 raydir, vec3 sundir) {
float special_trick = 1.0 / (raydir.y * 1.0 + 0.1);
float special_trick2 = 1.0 / (sundir.y * 11.0 + 1.0);
float raysundt = pow(abs(dot(sundir, raydir)), 2.0);
float sundt = pow(max(0.0, dot(sundir, raydir)), 8.0);
float mymie = sundt * special_trick * 0.2;
vec3 suncolor = mix(vec3(1.0), max(vec3(0.0), vec3(1.0) - vec3(5.5, 13.0, 22.4) / 22.4),
special_trick2);
vec3 bluesky = vec3(5.5, 13.0, 22.4) / 22.4 * suncolor;
vec3 bluesky2 = max(vec3(0.0), bluesky - vec3(5.5, 13.0, 22.4) * 0.002
* (special_trick + -6.0 * sundir.y * sundir.y));
bluesky2 *= special_trick * (0.24 + raysundt * 0.24);
return bluesky2 * (1.0 + 1.0 * pow(1.0 - raydir.y, 3.0));
}
vec3 aces_tonemap(vec3 color) {
mat3 m1 = mat3(
0.59719, 0.07600, 0.02840,
0.35458, 0.90834, 0.13383,
0.04823, 0.01566, 0.83777);
mat3 m2 = mat3(
1.60475, -0.10208, -0.00327,
-0.53108, 1.10813, -0.07276,
-0.07367, -0.00605, 1.07602);
vec3 v = m1 * color;
vec3 a = v * (v + 0.0245786) - 0.000090537;
vec3 b = v * (0.983729 * v + 0.4329510) + 0.238081;
return pow(clamp(m2 * (a / b), 0.0, 1.0), vec3(1.0 / 2.2));
}
```
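The ACES fit can be replicated in Python to confirm it maps HDR input into [0, 1]. Note that GLSL `mat3` constructors are column-major, so the matrices below are the row-major (transposed) form of the ones in the shader:

```python
def aces_tonemap(c):
    m1 = ((0.59719, 0.35458, 0.04823),
          (0.07600, 0.90834, 0.01566),
          (0.02840, 0.13383, 0.83777))
    m2 = ((1.60475, -0.53108, -0.07367),
          (-0.10208, 1.10813, -0.00605),
          (-0.00327, -0.07276, 1.07602))

    def mul(m, v):
        return tuple(sum(m[r][k] * v[k] for k in range(3)) for r in range(3))

    v = mul(m1, c)
    a = tuple(x * (x + 0.0245786) - 0.000090537 for x in v)
    b = tuple(x * (0.983729 * x + 0.4329510) + 0.238081 for x in v)
    lin = mul(m2, tuple(a[i] / b[i] for i in range(3)))
    clamped = tuple(min(1.0, max(0.0, x)) for x in lin)
    return tuple(x ** (1.0 / 2.2) for x in clamped)  # gamma encode

lo = aces_tonemap((0.0, 0.0, 0.0))
hi = aces_tonemap((20.0, 20.0, 20.0))   # HDR white, well above display range
print(lo, hi)
```

Black stays black, and an HDR value of 20 is compressed to just under 1.0 instead of clipping.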
## Common Variants
### Variant 1: 2D Underwater Caustic Texture
Difference from the base version: No 3D ray marching — purely a 2D screen-space effect. Uses an iterative triangular feedback loop to generate caustic light patterns, suitable as a ground projection texture for underwater scenes or as an overlay layer.
Key code:
```glsl
#define TAU 6.28318530718
#define MAX_ITER 5 // Tunable: iteration count, more = finer caustics
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
float time = iTime * 0.5 + 23.0;
vec2 uv = fragCoord.xy / iResolution.xy;
vec2 p = mod(uv * TAU, TAU) - 250.0; // mod TAU ensures tileability
vec2 i = vec2(p);
float c = 1.0;
float inten = 0.005; // Tunable: caustic line width (smaller = thinner)
for (int n = 0; n < MAX_ITER; n++) {
float t = time * (1.0 - (3.5 / float(n + 1)));
i = p + vec2(cos(t - i.x) + sin(t + i.y), sin(t - i.y) + cos(t + i.x));
c += 1.0 / length(vec2(p.x / (sin(i.x + t) / inten), p.y / (cos(i.y + t) / inten)));
}
c /= float(MAX_ITER);
c = 1.17 - pow(c, 1.4);
vec3 colour = vec3(pow(abs(c), 8.0));
colour = clamp(colour + vec3(0.0, 0.35, 0.5), 0.0, 1.0); // Aqua blue tint
fragColor = vec4(colour, 1.0);
}
```
### Variant 2: FBM Bump-Mapped Lake Surface (Plane Intersection + Bump Mapping)
Difference from the base version: No per-pixel ray marching — uses analytical plane intersection + FBM bump mapping instead. Extremely fast, suitable for distant lake surfaces or situations where water must be embedded in complex scenes (e.g., with volumetric cloud reflections).
Key code:
```glsl
// Water surface heightmap (FBM + abs folding produces ridge-like ripples)
float waterMap(vec2 pos) {
mat2 m2 = mat2(0.60, -0.80, 0.80, 0.60); // Rotation matrix to avoid axis alignment
vec2 posm = pos * m2;
return abs(fbm(vec3(8.0 * posm, iTime)) - 0.5) * 0.1;
}
// Analytical plane intersection replaces ray marching
float t = -ro.y / rd.y; // Water surface at y=0
vec3 hitPos = ro + rd * t;
// Bump strength fades with distance (LOD); declared before first use
float bumpfactor = 0.1 * (1.0 - smoothstep(0.0, 60.0, distance(ro, hitPos)));
// Finite difference normals (central differencing)
float eps = 0.1;
vec3 normal = vec3(0.0, 1.0, 0.0);
normal.x = -bumpfactor * (waterMap(hitPos.xz + vec2(eps, 0.0)) - waterMap(hitPos.xz - vec2(eps, 0.0))) / (2.0 * eps);
normal.z = -bumpfactor * (waterMap(hitPos.xz + vec2(0.0, eps)) - waterMap(hitPos.xz - vec2(0.0, eps))) / (2.0 * eps);
normal = normalize(normal);
// Refraction uses the built-in refract() function
vec3 refracted = refract(rd, normal, 1.0 / 1.333);
```
### Variant 3: Ridged Noise Coastal Waves
Difference from the base version: Uses `1 - abs(noise)` instead of `exp(sin)` to generate waveforms, combined with in-loop domain warping. Suitable for coastal scenes with sharper, more impactful waves that naturally connect to shore foam.
Key code:
```glsl
float sea(vec2 p) {
float f = 1.0;
float r = 0.0;
float time = -iTime;
for (int i = 0; i < 8; i++) { // Tunable: 8 octaves
r += (1.0 - abs(noise(p * f + 0.9 * time))) / f; // Ridged noise
f *= 2.0;
p -= vec2(-0.01, 0.04) * (r - 0.2 * time / (0.1 - f)); // In-loop domain warping
}
return r / 4.0 + 0.5;
}
// Shore foam: based on distance between water surface and terrain
float dh = seaDist - rockDist; // Water-terrain SDF difference
float foam = 0.0;
if (dh < 0.0 && dh > -0.02) {
foam = 0.5 * exp(20.0 * dh); // Exponentially decaying shoreline glow
}
```
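The foam term behaves as a thin exponential falloff just below the waterline; a Python sketch of the same branch:

```python
import math

def foam(dh):
    # dh = seaDist - rockDist: negative just below the waterline
    if -0.02 < dh < 0.0:
        return 0.5 * math.exp(20.0 * dh)  # brightest as dh approaches 0
    return 0.0

at_waterline = foam(-1e-6)   # about 0.5
at_band_edge = foam(-0.0199) # decayed before the hard cutoff
on_land = foam(0.01)         # no foam above the waterline
print(at_waterline, at_band_edge, on_land)
```

Note the hard cutoff at dh = -0.02 leaves a small step (about 0.34 down to 0); widening the band or raising the decay constant softens it.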
### Variant 4: Flow Map Water Animation (Rivers/Streams)
Difference from the base version: Adds flow-field-driven FBM animation. Uses a two-phase time cycle to eliminate texture stretching, with water flow direction procedurally generated from terrain gradients. Suitable for rivers, streams, and other water bodies with a clear flow direction.
Key code:
```glsl
// FBM with analytical derivatives + flow field offset
vec3 FBM_DXY(vec2 p, vec2 flow, float persistence, float domainWarp) {
vec3 f = vec3(0.0);
float tot = 0.0;
float a = 1.0;
for (int i = 0; i < 4; i++) {
p += flow;
flow *= -0.75; // Negate + shrink each layer to prevent uniform sliding
vec3 v = SmoothNoise_DXY(p);
f += v * a;
p += v.xy * domainWarp; // Gradient domain warping
p *= 2.0;
tot += a;
a *= persistence;
}
return f / tot;
}
// Two-phase flow cycle (eliminates stretching)
float t0 = fract(time);
float t1 = fract(time + 0.5);
vec4 sample0 = SampleWaterNormal(uv + Hash2(floor(time)), flowRate * (t0 - 0.5));
vec4 sample1 = SampleWaterNormal(uv + Hash2(floor(time+0.5)), flowRate * (t1 - 0.5));
float weight = abs(t0 - 0.5) * 2.0;
vec4 result = mix(sample0, sample1, weight);
```
```
### Variant 5: Beer's Law Water Absorption + Volumetric Scattering
Difference from the base version: Replaces the simple SSS approximation with physically correct Beer-Lambert exponential decay for underwater color absorption, plus a forward scattering term. Suitable for realistic scenes requiring tunable clear/turbid water.
Key code:
```glsl
// Beer-Lambert attenuation: red light absorbed fastest, blue light slowest
vec3 GetWaterExtinction(float dist) {
float fOpticalDepth = dist * 6.0; // Tunable: larger = more turbid water
    vec3 vExtinctCol = vec3(0.9, 0.6, 0.5); // Tunable: absorption coefficients (R largest, so red decays fastest, blue slowest)
return exp2(-fOpticalDepth * vExtinctCol);
}
// Volumetric in-scattering
vec3 vInscatter = vSurfaceDiffuse * (1.0 - exp(-refractDist * 0.1))
* (1.0 + dot(sunDir, viewDir)); // Forward scattering enhancement
// Final underwater color
vec3 underwaterColor = terrainColor * GetWaterExtinction(waterDepth) + vInscatter;
// Fresnel compositing
vec3 finalColor = mix(underwaterColor, reflectionColor, fresnel);
```
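The per-channel decay can be verified in Python. The coefficients here are illustrative, chosen with the red component largest so red is absorbed fastest, which is what gives deep water its blue-green cast:

```python
def water_extinction(dist, coeffs=(0.9, 0.6, 0.5), turbidity=6.0):
    # Beer-Lambert: transmission = 2^(-optical_depth * coefficient) per channel
    optical_depth = dist * turbidity
    return tuple(2.0 ** (-optical_depth * c) for c in coeffs)

shallow = water_extinction(0.1)
deep = water_extinction(2.0)
assert shallow[0] < shallow[1] < shallow[2]  # red attenuated most
print(shallow, deep)
```

At two units of depth nearly all red light is gone while some blue survives, so anything viewed through the water shifts toward blue-green.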
## In-Depth Performance Optimization
### 1. Dual Iteration Count Strategy (Most Critical Optimization)
Ray marching uses few iterations (12), normal calculation uses many (36). Marching only needs a rough intersection point; normals need fine wave detail. This single technique can halve render time with virtually no visual quality loss.
### 2. Distance-Adaptive Normal Smoothing
```glsl
N = mix(N, vec3(0.0, 1.0, 0.0), 0.8 * min(1.0, sqrt(dist * 0.01) * 1.1));
```
Distant normals approach `(0,1,0)`, eliminating high-frequency flickering at distance (equivalent to implicit normal mipmapping), while saving expensive normal calculations at long range.
### 3. Bounding Box Clipping
Precompute ray intersections with the top and bottom horizontal planes, and only march between the two intersection points. Rays pointing skyward (`ray.y >= 0`) skip water surface calculations entirely — the simplest and most effective early-out.
### 4. Adaptive Step Size
`pos += dir * (pos.y - height)` uses the current height difference as step size — potentially jumping large distances when far from the surface, automatically shrinking when close. 3-5x faster than fixed step size.
### 5. Filter Width-Aware Normal Attenuation (Advanced)
For scenes requiring more precise LOD:
```glsl
vec2 vFilterWidth = max(abs(dFdx(uv)), abs(dFdy(uv)));
float fScale = 1.0 / (1.0 + max(vFilterWidth.x, vFilterWidth.y) * max(vFilterWidth.x, vFilterWidth.y) * 2000.0);
normalStrength *= fScale;
```
Uses screen-space derivatives to automatically detect pixel coverage area — the larger the area, the flatter the normal. This is a precise implementation of manual mipmapping.
### 6. LOD Conditional Detail
```glsl
if (distanceToSurface < threshold) {
// Only compute high-frequency detail when close to the water surface
for (int i = 0; i < detailOctaves; i++) { ... }
}
```
High-frequency displacement of the water surface SDF is only calculated when close to the surface; at distance, the base plane is used directly, avoiding unnecessary noise sampling.
## Combination Suggestions
### 1. Combining with Volumetric Clouds
Including cloud reflections in the water surface is key to enhancing realism. Steps: first perform volumetric cloud raymarching along the reflection direction `R`, then mix the cloud color as part of `reflection` in the Fresnel compositing. This is a common technique in water rendering shaders.
### 2. Combining with Terrain Systems
Shoreline rendering requires interaction between the water surface SDF and terrain SDF. Key technique: maintain `dh = waterSDF - terrainSDF`, and render foam when `dh ≈ 0` (`exp(k * dh)` produces exponentially decaying coastal glow). A standard technique in shoreline rendering.
### 3. Combining with Caustics
In underwater scenes, project the caustic texture from Variant 1 onto the underwater terrain surface. Modulate caustic intensity as `caustic * exp(-waterDepth * absorption)` for depth-based attenuation.
### 4. Combining with Fog/Atmospheric Scattering
Distant water surfaces must blend into atmospheric fog. Use an independent extinction + in-scatter fog model (not a simple lerp), with each RGB channel attenuating independently:
```glsl
vec3 fogExtinction = exp2(fogExtCoeffs * -distance);
vec3 fogInscatter = fogColor * (1.0 - exp2(fogInCoeffs * -distance));
finalColor = finalColor * fogExtinction + fogInscatter;
```
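Unlike a scalar `mix` toward a fog color, this form lets each channel attenuate at its own rate yet still converges exactly to the fog color at large distance. A Python sketch (coefficients are made-up placeholders, and for simplicity the same coefficients drive extinction and in-scatter, which the shader keeps separate):

```python
def apply_fog(color, dist, coeffs=(0.30, 0.35, 0.45), fog_color=(0.7, 0.8, 0.9)):
    out = []
    for c, k, f in zip(color, coeffs, fog_color):
        transmittance = 2.0 ** (-k * dist)           # per-channel extinction
        out.append(c * transmittance + f * (1.0 - transmittance))
    return tuple(out)

near = apply_fog((1.0, 0.2, 0.1), 0.5)
far = apply_fog((1.0, 0.2, 0.1), 100.0)  # converges to fog_color
print(near, far)
```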
### 5. Combining with Post-Processing
- **Bloom**: Sun specular highlights on the water surface need bloom to look natural; Fibonacci spiral blur works better than simple Gaussian
- **Tone Mapping**: ACES is the standard choice for ocean scenes, preserving sun highlights while compressing shadows
- **Depth of Field (DOF)**: Focusing on mid-ground waves with near and far blur greatly enhances cinematic quality (post-process bokeh DOF)
# WebGL2 Pitfalls Reference
This is a reference document for the [webgl-pitfalls](../techniques/webgl-pitfalls.md) technique.
## Complete Error Message Reference
| Error Message | Likely Cause | Solution |
|---|---|---|
| `'fragCoord' : undeclared identifier` | Using `fragCoord` instead of `gl_FragCoord.xy` in WebGL2 | Replace with `gl_FragCoord.xy` |
| `'' : Missing main()` | Fragment shader has no `main()` function | Add `void main() { mainImage(fragColor, gl_FragCoord.xy); }` wrapper |
| `'functionName' : no matching overloaded function found` | Wrong argument types OR function declared after use | Check types; reorder or forward-declare functions |
| `'return' : function return is not matching type:` | Return expression type doesn't match declared return type | Verify `vec3 foo()` returns `vec3`, not `float` |
| `#version` must be first | Leading whitespace when extracting from script tag | Use `.trim()` on shader source string |
| Uniform returns `null` from `getUniformLocation` | Uniform optimized away for being unused | Ensure uniform is actually referenced in shader code |
## Type Mismatch Examples
```glsl
// ERROR: terrainM expects vec2, passing vec3
float calcAO(vec3 pos, vec3 nor) {
float d = terrainM(pos + h * nor); // Wrong: pos + h*nor is vec3
}
// FIX: Extract xz components
float calcAO(vec3 pos, vec3 nor) {
float d = terrainM(pos.xz + h * nor.xz); // Correct: vec2
}
```
```glsl
// ERROR: can't access .z on vec2
vec2 uv = vec2(1.0, 2.0);
float z = uv.z; // Wrong: vec2 has no .z
// FIX: use proper swizzle or conversion
float z = uv.y; // Or if you need third component, use vec3
```
## GLSL ES 3.0 Specific Notes
- All declared `uniform` variables must be used in shader code, otherwise compiler may optimize them away
- When `gl.getUniformLocation()` returns `null`, setting that uniform triggers `INVALID_OPERATION`
- In GLSL ES 1.00 (WebGL1), loop bounds must be compile-time constants; GLSL ES 3.00 (WebGL2) allows dynamic loop bounds, though very long or divergent loops can still hit driver limits