# Spore Animation Studio

**A procedural authorship surface for the multiplane primitive. Feature-length film for pennies in Haiku calls + your own electricity.**

The studio is not a video pipeline. It's an engineering instance of the **multiplane primitive** — z-ordered flat objects + camera traversal across them + per-layer modulation under attentional constraint — which recurs across visual phenomenology (Husserl's horizonal structure), cinematic technique (Disney multiplane camera, 1937), interactive video games (Paper Mario, 2000), and now procedural code. The same primitive across four substrates. See [`docs/multiplane_substrate.md`](docs/multiplane_substrate.md) for the working paper.

Turn an English script into character-persistent video on a laptop CPU, in minutes, for the electricity alone. Characters are built by **parametric programs** that read YAML and draw sprites deterministically — no AI required for rendering. One Claude Haiku call (~$0.0015 cached) formalizes a free-text script into a structured project YAML; the procedural engine renders it on a multi-core CPU with parallel post-process passes. The industry spends billions on per-frame neural inference trying to approximate character persistence; we ship a procedural mint that reads the character's parameters and draws the same person every time, and exposes the phenomenological knobs (depth, atmospheric haze, focal sharpness, felt-velocity) as a direct API.

**Live state (2026-04-27):** dashboard at `localhost:5000` (run `STUDIO.bat`) — 11 sections covering matrix / projects / styles / cast / docs / tools / sources / pipeline / commits / outputs / gaps. **Studio Console v0.2** with 17-threat security model (loopback only, CSRF + Origin + Host validated, allowlisted markdown viewer, `X-Frame-Options` / `nosniff` / `same-origin` headers, theme cookie HttpOnly). **SPOREFACE theme** toggle for 2005-vibes mode (✦ button in topbar). **Compute target dispatch** — Render button defaults to Box A via SSH+rsync; flip to laptop with the dropdown or `STUDIO_COMPUTE_TARGET=local`. **Parallel-by-default** — `studio.parallel` is the canonical helper for new code; same topology on every server. 17 video types; 10 style backends MVP + 1 photoreal scaffold; 6 source modules; 56-character cast (4 originals + 26 alphabet humanoid citizens + 26 letter-form glyph beings); Compose tab v0.1 (form) + v0.2 (drag-drop canvas) + v0.3 (auto-render); SRT auto-export; multi-aspect (7 platforms) parallel emission; Box A SSH render bridge; Blender photoreal scaffold + one-shot preflight script (`scripts/setup_photoreal_box_a.sh`). Landing site at [`site/`](site/) ready to deploy.

---

## Table of contents

1. [See it work](#see-it-work)
2. [Thesis](#thesis)
3. [Repository layout](#repository-layout)
4. [Quickstart](#quickstart)
5. [Smoke tests — each track in isolation](#smoke-tests)
6. [End-to-end — article to film](#end-to-end)
7. [Architecture](#architecture)
8. [Characters](#characters)
9. [Scenes — layered parallax backdrops](#scenes)
10. [Camera primitives](#camera-primitives)
11. [Renderer backends](#renderer-backends)
12. [Caption / UI rendering](#caption-ui)
13. [Mouth animation (2D and 3D)](#mouth-animation)
14. [Authoring new characters](#authoring)
15. [Running on Box A (headless Linux)](#box-a)
16. [Roadmap](#roadmap)

---

<a name="see-it-work"></a>
## See it work

| MP4 | What it proves |
|---|---|
| [`tv/output/camera_smoke.mp4`](tv/output/camera_smoke.mp4) | All 6 camera primitives in 36 seconds — locked_wide, talking_head, zoom_reveal, parallax_pan, tracking_shot, zoom_push |
| [`tv/output/scenes_smoke.mp4`](tv/output/scenes_smoke.mp4) | All 3 layered backdrops in 27 seconds — shoreline, forest, cityscape, each with depth-per-layer parallax |
| [`tv/output/mushroom_smoke.mp4`](tv/output/mushroom_smoke.mp4) | Non-humanoid template (mycelial_sage) speaking in a forest with camera zoom + caption that survives the crop |
| [`tv/output/shoreline_ape_voiced.mp4`](tv/output/shoreline_ape_voiced.mp4) | 16:50 narrated essay-film, 4 characters in rotation, walk cycles, ambient scenes, phonetic bridges, AAC audio |
| [`tv/output/blender_polish_v6.png`](tv/output/blender_polish_v6.png) | 3D polished humanoid — Cycles render on Box A, 7-head-heights proportions, face geometry, mouth-swap |
| [`tv/output/backend_comparison.png`](tv/output/backend_comparison.png) | Side-by-side of the 3D backend vs the 2D pipeline |

<a name="thesis"></a>
## Thesis

Sora, Veo, Runway, Pika, Kling, and Luma compete on **per-frame inference quality**. Character consistency is not a property of any single frame; it's a property of the *sequence*. No amount of per-frame inference gives you a character that remains the same character across two minutes, a hundred shots, or fifty episodes. The incumbents add parameters and compute to approximate the right answer; character-identity drift is still the #1 failure mode of every public video model in 2026.

This library takes a different route entirely. Character identity is a **data object** (YAML parameters + deterministic draw program). The sprite generator is a **parametric program** that reads those parameters and produces the same sprite set every time — no training, no inference, no stochastic sampling. Rendering is **deterministic composition** (Pillow + ffmpeg, optionally Blender). AI is not in the hot path — it is one optional *backend* for the sprite-drawing step alongside the procedural backend, the Blender backend, and future SVG / watercolor backends.

The reference implementation is the [mushroom_creature mint](tv/sprites/mint_mushroom.py): a ~200-line Python program that reads a YAML, draws the character with Pillow primitives (stalk, cap, spore dots, ring, eyes, mouth variants), and emits the full variant set in ~1 second at $0.00 cost. **That is the architecture.** Adding a new character means writing a new YAML and (if the template exists) running the mint — no model access, no API key, no training data, no drift.
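A toy version of the pattern, to make "parametric program" concrete (this is illustrative, not `mint_mushroom.py` itself): parameters in, deterministic Pillow drawing out.

```python
# Toy mint in the mint_mushroom.py spirit (illustrative, not the
# shipped program): same params dict -> same pixels, every run.
from PIL import Image, ImageDraw

def mint_toy_mushroom(params: dict, out_path: str, size: int = 128) -> None:
    img = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    d = ImageDraw.Draw(img)
    cx = size // 2
    stalk_h = int(params["stalk_height"] * size)
    cap_r = int(params["cap_radius"] * size)
    # stalk first, cap second, eyes last -- fixed z-order, no sampling
    d.rectangle([cx - size // 10, size - stalk_h, cx + size // 10, size],
                fill=params["stalk_color"])
    d.pieslice([cx - cap_r, size - stalk_h - cap_r,
                cx + cap_r, size - stalk_h + cap_r],
               180, 360, fill=params["cap_color"])
    for dx in (-cap_r // 3, cap_r // 3):
        d.ellipse([cx + dx - 3, size - stalk_h + 4,
                   cx + dx + 3, size - stalk_h + 10], fill="#201810")
    img.save(out_path)

mint_toy_mushroom({"stalk_height": 0.38, "cap_radius": 0.42,
                   "cap_color": "#4a2e20", "stalk_color": "#b8a080"},
                  "toy_mushroom.png")
```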

### Economics

|                                 | Frontier AI video        | This pipeline (procedural)               | This pipeline (AI backend, optional)     |
|---------------------------------|--------------------------|------------------------------------------|------------------------------------------|
| Cost per new character          | $0 (just prompt, drifts) | $0 (parametric program)                  | ~$0.15 (one-time flux-schnell mint)      |
| Cost per second of video        | $0.10 – $0.30            | ~$0.00 (CPU render)                      | ~$0.00 (CPU render)                      |
| Character consistency           | Drifts per frame         | Bit-exact (same YAML = same pixels)      | Locked by YAML                           |
| Time to create a new character  | seconds (but drifts)     | seconds (YAML edit + `python -m mint`)   | ~3 minutes (API round-trip)              |
| Max output length               | ~10 seconds              | Unlimited                                | Unlimited                                |
| Determinism                     | Nominal seed, silent drift | bit-identical from params_hash         | bit-identical (sprites cached)           |
| Runs on                         | GPU cluster              | Any CPU                                  | Any CPU (mint once, render forever)      |
| Rights-cleanliness              | Opaque training data     | 100% clean (we drew it)                  | Known flux-schnell provenance            |

---

<a name="repository-layout"></a>
## Repository layout

```
spore-animation-studio/
├── engine/                              # DSL + dispatch substrate
│   ├── parameter_vocab.yaml             # the DSL Haiku speaks (camera/scene/template names)
│   ├── render_registry.yaml             # backend routing (pixel_pillow, blender_headless, ...)
│   ├── store_layout.yaml                # hot/cold/fresh cache schema
│   ├── template_schema.yaml             # what a template is
│   ├── dispatcher.py                    # scene-YAML runtime; imperative_script bootstrap
│   ├── templates/
│   │   ├── humanoid.yaml                # first production template (32 voice params)
│   │   └── mushroom_creature.yaml       # non-humanoid — stalk + cap geometry
│   └── scenes/                          # scene-definition YAMLs
├── tv/                                  # the 2D pipeline
│   ├── silent_film.py                   # main frame renderer (speaking/walking/ambient)
│   ├── camera.py                        # 6 camera primitives + apply_camera crop+resize
│   ├── scenes/                          # layered backdrop compositor
│   │   ├── layered.py                   # LayeredScene + Layer dataclasses
│   │   ├── shoreline.py                 # tide + horizon silhouettes + wet sand
│   │   ├── forest.py                    # canopy + trunks + leaf litter + foliage
│   │   └── cityscape.py                 # skyline + window grids + lamp posts
│   ├── characters/                      # one YAML per character (canonical identity)
│   │   ├── editor_prime.yaml            # humanoid template
│   │   ├── reporter_field.yaml          # humanoid
│   │   ├── reporter_tech.yaml           # humanoid
│   │   ├── spore_oracle.yaml            # humanoid
│   │   └── mycelial_sage.yaml           # mushroom_creature template
│   ├── sprites/                         # minting tools
│   │   ├── mint.py                      # AI (flux-schnell) mint for humanoids
│   │   └── mint_mushroom.py             # procedural mint for mushroom_creature
│   ├── voice/                           # voice stack (Phase 2)
│   │   ├── espeak_voice.py              # espeak-ng wrapper (32 voice params → flags)
│   │   ├── refine.py / optimize.py      # naturalness foldtoy
│   │   ├── phonetic_bridges.py          # JIT per-pair bridge optimizer
│   │   ├── apply_bridges.py             # runtime boundary smoothing
│   │   ├── parametric_tts.py            # Phase 2.5 MVP: pure pyworld, espeak-less
│   │   └── articulatory/                # Phase 2.5→4: inverse-rendering fold pipeline
│   ├── article_to_script.py             # HTML → Script code (auto split + speaker assign)
│   ├── compose.py / compose_voiced.py   # ffmpeg mux (silent / voiced variants)
│   ├── voiced_film.py                   # voice-enabled render driver
│   ├── *_smoke_test.py                  # per-track smoke tests (see below)
│   └── output/                          # generated MP4s (gitignored)
├── backends/                            # pluggable renderers
│   ├── pixel_pillow/                    # (implicit — the tv/ 2D pipeline is this)
│   └── blender_headless/                # Blender 3.6 headless (Cycles)
│       ├── bpy_humanoid.py              # runs INSIDE Blender via --background --python
│       ├── render_scene.py              # system-Python wrapper (subprocess bridge)
│       └── render_mouth_cycle.py        # proof of mouth-swap at shape granularity
├── examples/                            # shipped reference MP4s
├── store/                               # .gitignored content-addressed cache
└── ROADMAP.md
```

---

<a name="quickstart"></a>
## Quickstart

```bash
pip install -r requirements.txt
cd tv

# Render the 16:50 narrated reference film (Shoreline Ape, silent variant)
python shoreline_ape_script.py

# Render a camera-primitives reel in 36 seconds
python camera_smoke_test.py

# Render every shipped backdrop (shoreline + forest + cityscape)
python scenes_smoke_test.py

# Render the non-humanoid template (mushroom in a forest)
python mushroom_smoke_test.py

# Turn a Daily Spore Report article into a silent film
python article_to_script.py bridge_goes_live_article.html --out my_script.py
python my_script.py
```

All outputs land in `tv/output/`. All are deterministic — same script + same assets = bit-identical MP4.

---

<a name="smoke-tests"></a>
## Smoke tests — each track in isolation

Each capability has a smoke test that exercises it in ~30 seconds of video and takes ~2 minutes of render wall-clock.

| Test | What it renders | What it proves |
|---|---|---|
| [`camera_smoke_test.py`](tv/camera_smoke_test.py) | Editor Prime locked_wide → Spore Oracle talking_head → Reporter Tech zoom_reveal → Reporter Field tracking walking → 2 ambient zoom variants | 6 camera primitives, UI-overlay-after-transform (Track B) |
| [`scenes_smoke_test.py`](tv/scenes_smoke_test.py) | 3 × 5s ambient scenes, each with scrolling parallax | Layered backdrops at differential depth (Track C) |
| [`mushroom_smoke_test.py`](tv/mushroom_smoke_test.py) | Mycelial Sage speaking in a forest, talking_head zoom | Non-humanoid template + speaker in themed backdrop + caption survives zoom (Track D + caption fix) |
| [`compare_backends.py`](tv/compare_backends.py) | Side-by-side PNG: Blender 3D humanoid vs pixel_pillow 2D | Both renderers produce comparable output |
| [`backends/blender_headless/render_mouth_cycle.py`](backends/blender_headless/render_mouth_cycle.py) | 5 stills, one per mouth shape, identical camera | 3D mouth-swap mechanism (Track A polish) |

Run all five in about 10 minutes on a laptop CPU.

---

<a name="end-to-end"></a>
## End-to-end — article to film

### One command, one article, one MP4

```bash
cd tv
python article_to_script.py <article.html> --out <script.py>
python <script.py>
```

`article_to_script.py` does:

1. Parses the HTML → extracts title, subline, kicker, paragraphs
2. Splits each paragraph on sentence boundaries into **caption-safe chunks** (≤180 chars, so a 3-line wrapped caption fits)
3. Assigns speakers via keyword matching:
   - `reporter_tech` ← "token", "service", "api", "bridge", "architecture", ...
   - `reporter_field` ← "watch", "observ", "shoreline", "corpus", "record", ...
   - `spore_oracle` ← "consciou", "meaning", "substrate", "geometry", ...
   - `editor_prime` ← "edition", "tonight", "we file", "dispatch", ...
4. Picks a backdrop per speaker:
   - editor_prime → shoreline
   - reporter_tech → cityscape
   - reporter_field → shoreline
   - spore_oracle → forest
5. Varies `camera_move`:
   - spore_oracle → talking_head (portrait zoom)
   - speaker-change transitions → zoom_reveal
   - otherwise → locked_wide
6. Inserts an AmbientScene every ~6 Lines for breathing room
7. Emits a plain Python source file you can inspect and tweak

The emitted script is plain Python that imports `silent_film.Script / Line / AmbientScene / render_script`. Running it writes an MP4 into `tv/output/`.
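For orientation, a hand-written script in the same shape might look like this (the `Line` / `AmbientScene` keyword arguments appear in usage examples elsewhere in this README; the `Script` constructor fields and the exact `render_script` signature are assumptions):

```python
# Hypothetical minimal script -- Script fields and render_script's
# signature are assumptions; Line/AmbientScene keywords are from the
# usage examples in this README.
from silent_film import Script, Line, AmbientScene, render_script

script = Script(
    title="Smoke: two speakers and a tide",
    lines=[
        Line(speaker="editor_prime",
             text="Tonight we file from the shoreline.",
             scene_name="shoreline", camera_move="locked_wide"),
        AmbientScene(duration_s=5.0, caption="The tide comes in.",
                     scene_name="shoreline"),
        Line(speaker="spore_oracle",
             text="Meaning lives in the substrate.",
             scene_name="forest", camera_move="talking_head",
             camera_params={"zoom": 1.35, "anchor_y": 0.34}),
    ],
)

render_script(script)   # writes the MP4 into tv/output/
```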

### Reference run

[`tv/bridge_goes_live_script.py`](tv/bridge_goes_live_script.py) is generated from *[The Bridge Goes Live: How Prometheus7 Wired Its Newsroom to the Organism](tv/bridge_goes_live_article.html)* — the Apr 22 2026 Daily Spore Report article about our own infrastructure. 60 segments, 4 characters, 3 backdrops, ~5 minutes of video.

---

<a name="architecture"></a>
## Architecture

```
        article.html  or  dialogue script
                     │
                     ▼
          ┌──────────┴──────────┐
          │ article_to_script   │    ← sentence split, speaker assign,
          │    (optional)       │      backdrop pick, camera vary
          └──────────┬──────────┘
                     ▼
         Script / Line / AmbientScene
                     │
                     ▼
        ┌────────────┴─────────────┐
        │ MINT TIME  (one-time)    │
        │ character YAML + AI      │    ← humanoids: flux-schnell ~$0.12
        │ produces locked sprites  │    ← mushroom: procedural, $0.00
        └────────────┬─────────────┘
                     ▼
        ┌────────────┴─────────────┐
        │ RENDER TIME  (per-film)  │
        │ silent_film orchestrator │
        │  ├─ scenes registry      │  ← layered parallax backdrop
        │  ├─ camera.resolve_state │  ← 6 camera primitives
        │  ├─ sprite[mouth_shape]  │  ← per-frame mouth-variant lookup
        │  └─ apply_camera + UI    │  ← camera crop+resize, then HUD overlays
        │ writes PNGs to tv/frames │
        └────────────┬─────────────┘
                     ▼
             ffmpeg compose
                     │
                     ▼
                  final MP4
```

The **load-bearing commitments**:
- Identity lives in YAML + sprite files, never in the frame-generator at render time
- Each mouth state is **its own sprite file**, not an overlay on a single sprite (see [Mouth animation](#mouth-animation))
- UI overlays (caption, kicker, phoneme) are painted AFTER the camera transform, so they pin to output-frame space instead of getting cropped by zoom

---

<a name="characters"></a>
## Characters

Every character is a single YAML in [`tv/characters/`](tv/characters/). The file declares the fields below (a minimal example follows the list):

- `id`, `template` (humanoid | mushroom_creature), `template_version`
- `identity` — seed, archetype, age/gender reads, distinguishing marks, canonical palette
- `morphology` (mushroom_creature) or `silhouette` (humanoid)
- `mouth_anchor` — normalized `(cx, cy) ∈ [0,1]` for placing mouth overlays at any resolution
- `resolutions` — which sprite sizes to mint
- `variants` — mouth × emotion × pose matrix
- `styles` — canonical / allowed / forbidden style registers
- `voice` — ElevenLabs / parametric_tts settings
- `editorial` — desk, topics, tone keywords
- `prompt` — how the mint model sees them (humanoids only)
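
A minimal sketch of such a file (top-level keys from the list above; the sub-keys and values are illustrative, not the shipped schema):

```yaml
# Illustrative character YAML -- key names follow the list above;
# exact sub-structure is an assumption, not the shipped schema.
id: reporter_weather
template: humanoid
template_version: 1
identity:
  seed: 4217
  archetype: field_reporter
  palette: ["#d8b48a", "#6b6158", "#385a7a"]
mouth_anchor: { cx: 0.50, cy: 0.62 }   # normalized (cx, cy) in [0,1]
resolutions: [16, 48, 128, 512]
variants:
  mouth: [closed, half, wide, oh, ee]
  emotion: [neutral]
  pose: [standing, walk_e, walk_w]
styles:
  canonical: parchment_flat
voice:
  backend: parametric_tts
editorial:
  desk: Atmospheric desk
```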

### Shipping cast

| Character              | Template            | Desk                              | Mint backend               | Cost    |
|------------------------|---------------------|------------------------------------|----------------------------|---------|
| **The Mycelial Sage**  | **mushroom_creature** | Spore Council (elder)            | **procedural (Pillow)**    | **$0.00** |
| **The Truffle Hermit** | **mushroom_creature** | Spore Council · Underground       | **procedural (Pillow)**    | **$0.00** |
| **Reporter Weather**   | **humanoid**         | Atmospheric desk                   | **procedural (Pillow)**    | **$0.00** |
| The Spore Oracle       | humanoid            | Symbolic Cognition                 | AI (flux-schnell, v0.1)¹   | ~$0.12  |
| Editor Prime           | humanoid            | Managing editorial                 | AI (flux-schnell, v0.1)¹   | ~$0.12  |
| Reporter Field         | humanoid            | Shoreline / field dispatch         | AI (flux-schnell, v0.1)¹   | ~$0.12  |
| Reporter Tech          | humanoid            | Technology desk                    | AI (flux-schnell, v0.1)¹   | ~$0.12  |

¹ *The four v0.1 humanoid characters still ship with their original AI-minted sprites. They can be re-minted procedurally with [`mint_humanoid.py`](tv/sprites/mint_humanoid.py) at any time; the existing sprites are kept for aesthetic compatibility with the already-shipped films.*

The Mycelial Sage and Reporter Weather are the reference implementations of the procedural path. Each is defined by a YAML of parameters, and a ~200-line Python program reads those parameters and draws the character with Pillow primitives. You get a complete, minted character in one second without talking to any model.

---

<a name="scenes"></a>
## Scenes — layered parallax backdrops

Each scene is a stack of `Layer` objects; each Layer has a `depth ∈ (0, 1]` and a `paint(scaled_offset_x, W, H) → PIL.Image`. The `LayeredScene` passes `int(offset_x * layer.depth)` to each layer, so a layer at depth 0.1 drifts slowly (sky), a layer at 1.0 moves at full camera speed (foreground).
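A sketch of a new backdrop under that interface (the `Layer` / `LayeredScene` constructor arguments are assumptions; the `paint` signature is the one documented above):

```python
# Sketch of a custom scene -- assumes Layer(depth, paint) and
# LayeredScene(layers) shapes; paint receives the offset already
# scaled by depth, per the text above. Import path assumes the
# repo root is on sys.path.
from PIL import Image, ImageDraw
from tv.scenes.layered import Layer, LayeredScene

def paint_sky(offset_x: int, W: int, H: int) -> Image.Image:
    return Image.new("RGBA", (W, H), "#e8e0c8")        # parchment sky

def paint_dunes(offset_x: int, W: int, H: int) -> Image.Image:
    img = Image.new("RGBA", (W, H), (0, 0, 0, 0))
    d = ImageDraw.Draw(img)
    for i in range(-1, W // 200 + 2):                  # tile past both edges
        x = i * 200 - (offset_x % 200)
        d.ellipse([x, H - 120, x + 260, H + 60], fill="#c8b088")
    return img

desert = LayeredScene(layers=[
    Layer(depth=0.12, paint=paint_sky),    # barely drifts
    Layer(depth=0.85, paint=paint_dunes),  # near camera speed
])
```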

### Shipped scenes

Each scene is a Python module in [`tv/scenes/`](tv/scenes/).

| Scene | Layers (depth) | Distinct look |
|---|---|---|
| **shoreline** | sky (0.15) · horizon silhouettes (0.50) · shore plane + kelp (0.85) · static vignette | Cream parchment sky, tide line, distant structures |
| **forest** | sky (0.12) · canopy line (0.35) · mid trunks (0.65) · leaf litter (0.85) · foliage (1.0) · vignette | Pale warm-green sky, undulating canopy, vertical trunks |
| **cityscape** | sky (0.10) · distant skyline (0.30) · lit facade grids (0.65) · street + cracks (0.85) · lamp posts + glow (1.0) · vignette | Sepia dusk, window grids, curb, occasional streetlights |

Pick a scene in your script:

```python
AmbientScene(duration_s=5.0, caption="The tide comes in.", scene_name="shoreline")
Line(speaker="reporter_tech", text="...", scene_name="cityscape",
     camera_move="zoom_reveal", camera_params={"start_zoom": 1.4, "end_zoom": 1.0})
WalkingScene(speaker="reporter_field", direction="e", scene_name="forest")
```

---

<a name="camera-primitives"></a>
## Camera primitives

Defined in [`tv/camera.py`](tv/camera.py). Each primitive is a function `(frame, total, **params) → CameraState` carrying `offset_x`, `pan_y`, `zoom`, `anchor_x`, `anchor_y`.

| Name | What it does | Key params |
|---|---|---|
| `locked_wide` | Static wide shot (default) | — |
| `parallax_pan` | Lateral scroll; layered backdrops differentiate depth | `velocity`, `direction` |
| `tracking_shot` | Camera follows a subject (same mechanic as parallax_pan) | `velocity`, `direction` |
| `zoom_reveal` | Pull-back: start zoomed-in, end at full frame | `start_zoom`, `end_zoom`, `anchor_x`, `anchor_y`, `curve` |
| `zoom_push` | Push-in: opposite of zoom_reveal | same as zoom_reveal |
| `talking_head` | Portrait crop around character head/shoulders | `zoom`, `anchor_y` |

**Two-layer model:** `offset_x` is consumed upstream by backdrop layers (for parallax scroll); zoom/pan/anchor are post-composition, applied by `apply_camera(img, state, target_w, target_h)` which crops+resizes the scene canvas to the output frame. The 1280×720 output is always the same size no matter what the camera does.
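A sketch of what a new primitive looks like under that contract (the `CameraState` dataclass below mirrors the fields named above; its defaults are assumptions):

```python
# Hypothetical new primitive -- same (frame, total, **params) ->
# CameraState contract as the six shipped moves.
from dataclasses import dataclass

@dataclass
class CameraState:
    offset_x: float = 0.0
    pan_y: float = 0.0
    zoom: float = 1.0
    anchor_x: float = 0.5
    anchor_y: float = 0.5

def slow_drift(frame: int, total: int, *, velocity: float = 2.0,
               max_zoom: float = 1.08) -> CameraState:
    """Lateral drift plus a barely perceptible push-in."""
    t = frame / max(total - 1, 1)          # 0..1 across the shot
    return CameraState(offset_x=velocity * frame,
                       zoom=1.0 + (max_zoom - 1.0) * t)
```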

Usage in a script:

```python
Line(speaker="spore_oracle", text="...",
     camera_move="talking_head",
     camera_params={"zoom": 1.35, "anchor_y": 0.34})
```

---

<a name="renderer-backends"></a>
## Renderer backends

Declared in [`engine/render_registry.yaml`](engine/render_registry.yaml). Two shipped.

### pixel_pillow (default, 2D)

The `tv/` pipeline. Pillow composition of sprite PNGs against layered scene backdrops, ffmpeg mux, all on CPU. Cost: electricity.

- `tv/silent_film.py:render_script` orchestrates
- `tv/silent_film.py:render_speaking_frame` / `render_walking_frame` / `render_ambient_frame` produce one PNG per frame
- `tv/compose.py` muxes the PNG sequence into an MP4

### blender_headless (3D, MVP)

Runs Blender 3.6 LTS headless on Box A or any Linux host.

- [`backends/blender_headless/bpy_humanoid.py`](backends/blender_headless/bpy_humanoid.py) — runs INSIDE Blender's Python, builds a 7-head-heights humanoid from primitives with parametric face geometry and mouth-swap per `character.mouth_shape`
- [`backends/blender_headless/render_scene.py`](backends/blender_headless/render_scene.py) — system-Python wrapper, writes scene JSON, invokes `blender --background --python bpy_humanoid.py -- <json> <png>`
- [`backends/blender_headless/render_mouth_cycle.py`](backends/blender_headless/render_mouth_cycle.py) — proves the mouth-swap: renders 5 stills, one per shape

Current render time: ~30 s/frame at 1280×720, Cycles, 32 samples. Suitable for single-frame stills and short reels; feature-length 3D films need GPU Cycles or EEVEE + xvfb.
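
The bridge pattern itself is small. A sketch of what `render_scene.py` does (the JSON contents and temp-file handling here are assumptions; the CLI shape is the documented `--background --python ... -- <json> <png>`):

```python
# Subprocess-bridge sketch -- invokes Blender headless with a scene
# JSON, per the documented CLI; helper names are stand-ins.
import json, subprocess, tempfile

def render_still(scene: dict, out_png: str,
                 blender: str = "/root/blender-3.6/blender") -> None:
    with tempfile.NamedTemporaryFile("w", suffix=".json",
                                     delete=False) as f:
        json.dump(scene, f)          # character params, mouth_shape, camera
        scene_json = f.name
    subprocess.run([blender, "--background", "--python",
                    "backends/blender_headless/bpy_humanoid.py",
                    "--", scene_json, out_png], check=True)
```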

Installation on Box A:

```bash
# Official Blender 3.6 LTS (Ubuntu apt-packaged Blender is stripped of OCIO)
cd /root
curl -sLO https://download.blender.org/release/Blender3.6/blender-3.6.15-linux-x64.tar.xz
tar -xf blender-3.6.15-linux-x64.tar.xz
mv blender-3.6.15-linux-x64 blender-3.6
apt-get install -y libsm6 libxxf86vm1 libxfixes3 libxrender1 libxkbcommon0 libgl1
```

Invoke:

```bash
python3 -m backends.blender_headless.render_scene \
    --output /tmp/out.png \
    --blender /root/blender-3.6/blender
```

---

<a name="caption-ui"></a>
## Caption / UI rendering

UI overlays (caption bar, speaker kicker, phoneme indicator) are painted in **output-frame space**, after `apply_camera` has transformed the scene. This is a deliberate render-order choice:

```
1. Build scene canvas (backdrop + sprite)
2. apply_camera  → crop+resize to output W×H
3. Paint caption bar at bottom of OUTPUT frame
4. Paint kicker at top-right of OUTPUT frame
5. Paint phoneme indicator at bottom-right of OUTPUT frame
6. Save PNG
```

Before this order: caption drawn on scene → camera zoom cropped the caption off.
After: captions stay pinned no matter how the scene zooms.
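
In code, the order reduces to a few lines (function names other than `apply_camera` are stand-ins for the real helpers):

```python
# Render-order sketch: camera transform first, UI second.
def render_frame(scene, sprite, cam_state, line, out_path):
    canvas = scene.paint_all(cam_state.offset_x)        # backdrop layers
    canvas.alpha_composite(sprite, dest=(560, 300))     # character sprite
    frame = apply_camera(canvas, cam_state, 1280, 720)  # crop + resize
    draw_caption(frame, line.text)      # painted on the OUTPUT frame,
    draw_kicker(frame, line.speaker)    # so zoom can never crop them
    frame.save(out_path)
```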

### Caption wrapping

Long dialogue auto-wraps to at most **3 lines × 34 px**. Bar height auto-sizes to fit. Usable width is `W − 40 − 40 = 1200px`; `_wrap_text` splits on word boundaries.
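
A minimal word-boundary wrapper in the same spirit (the shipped `_wrap_text` presumably measures pixel width with the caption font; this sketch uses a character budget instead):

```python
# Greedy word-boundary wrap -- character budget stands in for the
# real pixel measurement; not the shipped _wrap_text.
def wrap_text(text: str, max_chars: int = 52, max_lines: int = 3) -> list[str]:
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines[:max_lines]
```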

### Caption-safe Line chunking

`article_to_script.chunk_text` breaks paragraphs on sentence boundaries, keeping each Line under **180 chars** so the 3-line wrap fits comfortably. Long single sentences are further split on commas.
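
The policy, sketched (not the shipped implementation; comma splits in this sketch drop the comma at the seam):

```python
import re

# Sentence-first chunking with a comma fallback for oversize
# sentences -- a sketch of the policy, not the shipped code.
def chunk_text(paragraph: str, limit: int = 180) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    pieces = []
    for s in sentences:
        if len(s) <= limit:
            pieces.append(s)
        else:
            pieces.extend(p.strip() for p in s.split(","))
    chunks, current = [], ""
    for p in pieces:
        candidate = f"{current} {p}".strip() if current else p
        if len(candidate) <= limit or not current:
            current = candidate
        else:
            chunks.append(current)
            current = p
    if current:
        chunks.append(current)
    return chunks
```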

---

<a name="mouth-animation"></a>
## Mouth animation (2D and 3D)

Mouth animation is **sprite-swap**, never overlay. Each mouth state is its own file or its own geometry.

### 2D

```python
# silent_film.render_speaking_frame picks a shape per frame:
shape = _silent_mouth_for_frame(frame_num)   # "closed" | "half" | "wide"
sprite_path = ASSETS / f"{line.speaker}_mouth_{shape}.png"
sprite = Image.open(sprite_path)             # DIFFERENT FILE per shape
```

For each character, five sprite PNGs ship:

```
tv/assets/<character>_mouth_closed.png
tv/assets/<character>_mouth_half.png
tv/assets/<character>_mouth_wide.png
tv/assets/<character>_mouth_oh.png
tv/assets/<character>_mouth_ee.png
```

The mouth is drawn INTO each sprite at mint time. No overlay layer. This is why Editor Prime's mouth visibly changes between frames in the Shoreline Ape film, and why the Mycelial Sage's mouth is clearly distinct between closed / wide / oh variants.

### 3D

Same abstraction at a different level. [`bpy_humanoid.py`](backends/blender_headless/bpy_humanoid.py) reads `character.mouth_shape` from the scene JSON and constructs the mouth geometry accordingly:

```python
mouth_cfg = {
    "closed": (1.6, 0.10, 0.12, 0.22),   # thin flat slit
    "half":   (1.3, 0.28, 0.35, 0.22),   # moderately open oval
    "wide":   (1.4, 0.45, 0.65, 0.30),   # big open ellipsoid
    "oh":     (0.9, 0.40, 0.55, 0.22),   # round "oh"
    "ee":     (1.7, 0.10, 0.22, 0.22),   # wide thin (teeth line)
}
```

For open shapes, a darker cavity sphere is spawned behind the mouth to read as depth.

Proof: [`render_mouth_cycle.py`](backends/blender_headless/render_mouth_cycle.py) renders all 5 shapes from one character config, identical camera, only mouth geometry changes.

---

<a name="authoring"></a>
## Authoring new characters

### One command, three steps: description → parameters → sprites

The shipped flow is [`tv/sprites/create_character.py`](tv/sprites/create_character.py). It takes character parameters as CLI args, writes the YAML, and runs the appropriate procedural mint — all in one invocation, no AI image generation, no API key needed.

**Humanoid example** — Reporter Weather, the new atmospheric desk reporter:

```bash
python -m tv.sprites.create_character \
    --id reporter_weather \
    --name "Reporter Weather" \
    --template humanoid \
    --role "Atmospheric Desk" \
    --gender masculine --age forties --archetype field_reporter \
    --skin "#d8b48a" --hair "#6b6158" --eyes "#3c5a7a" \
    --garment "#385a7a" --accent "#c8a15a" \
    --hair-style short --glasses
```

Output:
- `tv/characters/reporter_weather.yaml` — the canonical identity file
- `tv/assets/reporter_weather.png` — default portrait
- `tv/assets/reporter_weather_mouth_{closed,half,wide,oh,ee}.png` — 5 mouth variants
- `tv/assets/walk/reporter_weather_walk_{e,w}_{f1,f2}.png` — 4 walk-cycle frames

**Wall-clock: ~1 second. Cost: $0.00. Determinism: same YAML seed → same pixels, every time, forever.**

**Mushroom example** — the Truffle Hermit, the underground desk:

```bash
python -m tv.sprites.create_character \
    --id truffle_hermit \
    --template mushroom_creature \
    --name "The Truffle Hermit" \
    --role "Spore Council · Underground Desk" \
    --archetype truffle_hermit \
    --cap-shape bell --stalk-height 0.38 --cap-radius 0.42 \
    --cap-color "#4a2e20" --cap-accent "#d8c9a8" --stalk-color "#b8a080" \
    --eye-style glowing --size small
```

Same ~1 second, same $0.00, same determinism.

### Three-step mental model

```
   description        parameters          sprites
   ───────────        ──────────          ──────────
   "a weather         --skin              mint_humanoid.py
    reporter,          --hair             reads the YAML,
    forties,           --garment          draws head/torso/
    blue blazer,       --hair-style       mouth variants/
    glasses"           --glasses          walk frames with
                                          Pillow primitives
```

Step 1 is in your head. Step 2 is the CLI call (or a YAML file you edit directly). Step 3 is the procedural mint that draws the character from YAML — a plain Python program using ellipses, rectangles, and polygons. No diffusion, no GAN, no API.

### Optional — LLM-parameterized path (text is parsed, images are not generated)

If you want to go straight from a free-text description to the CLI parameters, `create_character.py --description "..."` will optionally call Claude Haiku to extract structured parameters from the text, then emit the YAML and run the mint exactly as above. **The LLM parses the description; it does not generate any pixels.** Sprite drawing is still 100% procedural.

```bash
# Requires ANTHROPIC_API_KEY and `pip install anthropic`
python -m tv.sprites.create_character \
    --id reporter_sports \
    --template humanoid \
    --description "Sports reporter, late 30s, salt-and-pepper beard, green team jacket, no glasses, upbeat tone"
```

This path is optional. The default, deterministic, zero-dependency path is the structured CLI above.
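
A sketch of the parse step under those constraints (prompt wording and model id are assumptions; the only contract is text in, structured parameters out, zero generated pixels):

```python
# Hypothetical parameter-extraction call -- not the shipped prompt.
import json
import anthropic

def params_from_description(description: str) -> dict:
    client = anthropic.Anthropic()              # reads ANTHROPIC_API_KEY
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",        # any small, cheap model works
        max_tokens=400,
        messages=[{
            "role": "user",
            "content": (
                "Extract character parameters as JSON with keys gender, "
                "age, archetype, skin, hair, eyes, garment, accent, "
                "hair_style, glasses. Text:\n" + description
            ),
        }],
    )
    return json.loads(msg.content[0].text)      # parameters only, no pixels
```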

### Goal state — every template has a procedural mint

- ✅ `humanoid` → [`mint_humanoid.py`](tv/sprites/mint_humanoid.py)
- ✅ `mushroom_creature` → [`mint_mushroom.py`](tv/sprites/mint_mushroom.py)
- ⏳ `quadruped` → `mint_quadruped.py` (TBD)
- ⏳ `abstract_figure` → `mint_abstract.py` (TBD)

### Fitting to a reference (inverse-rendering via fold engine)

When you want a procedural sprite to match a specific *existing* reference — say, an AI-minted sprite you want to reproduce deterministically, or hand-drawn concept art — the [`fit_to_reference`](tv/sprites/fit_to_reference.py) pipeline searches the drawing program's parameter space for the configuration whose output most closely matches the reference.

```bash
python -m tv.sprites.fit_to_reference \
    --char editor_prime \
    --reference tv/assets/editor_prime.png \
    --ticks 400 --restarts 6
```

**What it does:**

1. [`extract_features.py`](tv/sprites/extract_features.py) pulls a feature vector from the reference PNG: median palette per region (face / hair / torso / accent), bounding box, dark-pixel centroids (eye/mouth placement), pixel density, edge density, palette warmth. Robust to parchment backgrounds via corner-BG sampling + largest-component masking.
2. [`mint_humanoid.render_portrait_from_params`](tv/sprites/mint_humanoid.py) exposes ~25 tunable parameters — palette RGB (12), silhouette (6), shading deltas (3), face-feature placement (4).
3. [`fit_to_reference.local_fold`](tv/sprites/fit_to_reference.py) seeds the palette from the reference's extracted features, then runs a random-restart hill-climb to minimize `feature_distance(render(params), reference)` (sketched after this list).
4. Winner params are written to `tv/sprites/fit/<char>.json` + a re-render PNG for visual inspection.
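
The optimizer loop in step 3, sketched (the `render` / `extract` / `feature_distance` / `perturb` / `seed_params` callables stand in for the real modules):

```python
# Random-restart hill-climb sketch -- greedy accept, jitter one axis
# at a time; stand-in callables, not the shipped local_fold.
def fit(reference_feats, seed_params, perturb, render, extract,
        feature_distance, ticks=400, restarts=6):
    best_p, best_d = None, float("inf")
    for _ in range(restarts):
        p = seed_params()                         # palette seeded from reference
        d = feature_distance(extract(render(p)), reference_feats)
        for _ in range(ticks):
            q = perturb(p)                        # small random move
            dq = feature_distance(extract(render(q)), reference_feats)
            if dq < d:                            # keep only improvements
                p, d = q, dq
        if d < best_d:
            best_p, best_d = p, d
    return best_p, best_d
```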

**Results on the v0.1 AI-minted cast:**

| Character       | Baseline distance | Fitted distance | Improvement | Visual match                        |
|-----------------|-------------------|-----------------|-------------|-------------------------------------|
| Editor Prime    | 7.35              | 5.86            | +1.49       | Dark hair, tan skin, brown vest, gold V-collar |
| Spore Oracle    | ~9                | ~7              | +2          | Dark bun-like hair, deep red robe, gold accent |
| Reporter Field  | ~8                | ~6              | +2          | Umber jacket, warm palette          |
| Reporter Tech   | ~8                | ~6              | +2          | Navy blazer, distinct darker palette |

**The architecture win:** AI's role inverts. Instead of generating the image in production, it's specifying a *reference target* that a deterministic program learns to reproduce. The production pipeline stays 100% procedural; AI leaves fingerprints only in the form of parameter values in the YAML. Ship the YAML, ship the drawing program, never call AI again.

**Known ceiling** — the fold fits the drawing program's *parameter space* to the reference, but not beyond. If the reference has a beard and the drawing program has no beard axis, the fit cannot add one; it compensates by darkening the face-bottom region. Closing those gaps means *adding axes* to the drawing program (beard_density, hair_wave_count, clothing_layer, etc.). Every new axis is a permanent capability — the fold will use it next time.

**Same three-term structure as the voice work:** term 1 = baseline procedural draw; term 2 = reference refinement target; term 3 = fold engine optimizer. Sprite domain is the second falsifiable demonstration of the composition primitive; voice is the first.

### Legacy AI mint (optional, for the v0.1 humanoid cast)

The four v0.1 humanoid characters (Spore Oracle, Editor Prime, Reporter Field, Reporter Tech) were originally minted via flux-schnell before the procedural humanoid mint existed. Those sprites still ship and still work; re-minting them with the procedural path produces a simpler, more-stylized portrait. The AI mint remains available for anyone who wants to re-use it:

```bash
export REPLICATE_API_TOKEN=...
python tv/sprites/volume_mint.py --chars <new_id>
```

Cost ~$0.12, ~3 minutes wall-clock, depends on a third-party model. Not the intended path.

---

<a name="box-a"></a>
## Running on Box A (headless Linux)

Box A is the Institute's render server (95.217.3.65). It's kept in rough sync with local via rsync. To run a render on Box A:

```bash
# Local → Box A sync (one-time per change)
scp -r tv backends engine root@95.217.3.65:/root/spore-animation-studio/

# Render remotely
ssh root@95.217.3.65 'cd /root/spore-animation-studio/tv && python3 bridge_goes_live_script.py'

# Pull the MP4 back
scp root@95.217.3.65:/root/spore-animation-studio/tv/output/bridge_goes_live.mp4 ./
```

### Reasons to render on Box A instead of local

- 3D renders — `backends/blender_headless` expects Blender 3.6 at `/root/blender-3.6/`
- Long-form films — Box A has more RAM + no desktop overhead
- Voice work — alphafoldmicro + pyworld + the full articulatory stack lives there

---

<a name="roadmap"></a>
## Roadmap

### Shipped

- [x] Frame compositor + title cards + act structure (`silent_film.py`)
- [x] Five shipped characters across two templates (humanoid × 4, mushroom_creature × 1)
- [x] 2D sprite-swap mouth animation (5 variants per character)
- [x] Cross-resolution engine (16 / 48 / 128 / 512, one mouth anchor)
- [x] Full parametric voice stack (espeak + JIT phonetic bridges, 3525 bridges)
- [x] 16:50 narrated reference film (`shoreline_ape_voiced.mp4`)
- [x] **Track B** — six camera primitives + UI-overlay render order
- [x] **Track C** — layered parallax backdrops (shoreline + forest + cityscape)
- [x] **Track D** — mushroom_creature template + mycelial_sage + procedural mint
- [x] **Track A MVP** — Blender headless backend with toon-adjacent flat materials
- [x] **Track A polish** — 7-head proportions, segmented limbs, face, mouth-swap, linear color
- [x] Article → Script auto-parser (HTML → caption-safe Lines + speaker assignment)

### In progress / open

- [ ] **Voice Phase 2.5 closure** — articulatory synthesis + exemplar-based inverse rendering (awaiting reference WAV)
- [ ] **Track A outlines** — Freestyle line renderer (inverted-hull shader didn't land in Cycles)
- [ ] **Phase 3** — sporeanimationstudios.com landing + GitHub sync
- [ ] **Phase 5** — source modules + rights enforcement (web_image, internet_archive, user_camera, ...)
- [ ] **Phase 6** — CLIP match engine + bezier tracer effects + music_video_parallel
- [ ] **Phase 7** — 131 foldtoys wired onto cinematography / pacing / coherence lanes
- [ ] **Phase 8** — Flask+HTMX review UI (approve/rework → Haiku → re-render)
- [ ] **Phase 9** — feature-length arc decomposition (brief → acts → sequences → scenes → shots)
- [ ] **Phase 10** — public launch (YouTube + Gumroad + press)

---

## Contributing / agent-friendly notes

For an AI agent (like Codex) reading this and wanting to operate the pipeline:

- **The architecture is programs, not models.** Character sprites come from parametric mint programs (`tv/sprites/mint_mushroom.py` is the reference). AI mint is one optional backend, not the primary path. When adding a new character template, write a procedural mint alongside it.
- **Entry points for non-trivial work:** `tv/silent_film.py` (2D render loop), `backends/blender_headless/bpy_humanoid.py` (3D), `tv/article_to_script.py` (article → script), `tv/scenes/` (new backdrops), `tv/camera.py` (new camera moves), `tv/sprites/mint_mushroom.py` (the canonical procedural mint example).
- **Never bypass the YAML layer.** New characters go through `tv/characters/*.yaml`; new templates go through `engine/templates/*.yaml`. The YAML is the single source of truth; every backend (procedural Pillow, AI flux-schnell, Blender) reads the same parameters.
- **Mouth animation is sprite-swap, not overlay.** Any new character needs 5 separate mouth variant PNGs (or, for 3D, a `mouth_cfg` entry in `bpy_humanoid.py`).
- **UI overlays belong in output-frame space**, after the camera transform. Don't draw captions on the scene canvas.
- **Long text must be chunked.** 180 chars per Line keeps the 3-line caption wrap safe. `article_to_script.chunk_text` does this.
- **Smoke tests are the verification path.** Run the 5 smokes before claiming a change works.

---

## License

[MIT](LICENSE)

## Acknowledgments

- [Prometheus7 Research Institute](https://prometheus7.com) — the research home this pipeline serves
- The [Daily Spore Report](https://dailyspore.prometheus7.com) editorial team — who use this library every day
- ConcernedApe and the Stardew Valley aesthetic — a one-artist studio that demonstrated how far a small library of well-made assets can go
- [Plate (1995)](https://www.cs.toronto.edu/~tplate/papers/) — whose Holographic Reduced Representations formalism names what the underlying substrate does

---

*Built in the institute.*
