🍅   CANVAS A
Canvas A · Tutor Studio · Dashboard

Where we are.

2026-04-29 · D1–D4 dimensional spine · Phase 1 active

Shareable URL — open in any browser
physolympiad.com/canvasa
also available at canvasa.physolympiad.com (where it actually serves) · local dev at localhost:8765

What's live

Canvas A is a working interactive physics tutor staged for collaborator testing. The Studio landing page lets a student start from a curated demo card, type any topic, paste any wiki/article URL, or upload a PDF.

Key capabilities shipped

F016
Staging deploy — physolympiad.com/canvasa redirects to canvasa.physolympiad.com (EC2 + nginx + Let's Encrypt + Cloudflare DNS).
F017
Hello layer + background pipeline — POST /api/generate-lesson returns immediately with a hello-audio URL; tutor speaks within ~1s while generation runs in a background thread.
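The hello-layer shape is roughly this — a framework-free sketch (the real endpoint is a web handler; `handle_generate_lesson`, `JOBS`, and the audio path are illustrative stand-ins, not the shipped module):

```python
# Sketch of the F017 hello-layer pattern: respond instantly with a canned
# hello clip while the slow lesson pipeline runs on a background thread.
import threading
import time
import uuid

JOBS = {}  # job_id -> "running" | "done"

def generate_lesson_job(job_id, topic):
    time.sleep(0.05)       # stand-in for the slow authoring + TTS pipeline
    JOBS[job_id] = "done"

def handle_generate_lesson(topic):
    job_id = uuid.uuid4().hex
    JOBS[job_id] = "running"
    # Fire-and-forget worker thread; the handler returns without waiting.
    threading.Thread(target=generate_lesson_job, args=(job_id, topic), daemon=True).start()
    # Immediate response: a pre-synthesized hello clip URL, so the tutor
    # speaks within ~1s while generation continues in the background.
    return {"job_id": job_id, "hello_audio_url": "/static/audio/hello_blank.mp3"}

resp = handle_generate_lesson("simple harmonic motion")
```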
F018
Bridging dialogue — 28 pre-synthesized Antoni utterances across hello / action / idle / transition. Plays through latency windows so there's no dead air, ever.
F019
Tutor personality v0 (locked) — voice (Antoni), tone (warm-rigorous), demeanor (asks more than tells), recurring phrases. Locked spec drives all spoken content.
F020
Studio landing page — 6 input mode tiles + 20 pre-cooked demo cards (10 HS + 10 UG) + free-text input + 🍅 track banner.
F021
Source ingestion module — per-domain dispatcher: PhysOlympiad wiki → JSON API + markdown render · Wikipedia → REST API + section extraction · generic HTML → readability fallback · PDF → pypdf. Content-hashed source IDs, idempotent.
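The dispatch-by-domain and content-hash ideas can be sketched like this (handler names and return shapes are illustrative, not the real module API):

```python
# Sketch of the F021 per-domain dispatcher: route a URL to the right ingester
# by hostname, with a generic fallback, and derive idempotent source IDs from
# content bytes. All handler names here are illustrative.
import hashlib
from urllib.parse import urlparse

def ingest_physolympiad(url): return {"kind": "wiki_json", "url": url}
def ingest_wikipedia(url):    return {"kind": "rest_api", "url": url}
def ingest_generic(url):      return {"kind": "readability", "url": url}

DISPATCH = {
    "physolympiad.com": ingest_physolympiad,
    "en.wikipedia.org": ingest_wikipedia,
}

def source_id(content: bytes) -> str:
    # Content-hashed ID: same bytes -> same ID, so re-ingesting is idempotent.
    return hashlib.sha256(content).hexdigest()[:16]

def ingest(url: str):
    host = urlparse(url).netloc
    handler = DISPATCH.get(host, ingest_generic)  # readability fallback
    return handler(url)
```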
F022
Live URL/PDF endpoints — POST /api/generate-from-url and /api/generate-from-pdf. Tutor reads the source, then teaches it. Bridging covers the ~60–90s wait. Demo cards now autoplay on click (?autoplay=1).
F023
20 demo cards source-conditioned — 19/20 lessons now reflect their real source URL (15 PhysOlympiad wiki + 5 Wikipedia). Maxwell's equations is the lone holdout (still needs investigation).
F029
JSON robustness — bumped max_tokens 8K→16K (root cause of half the F023 failures was Claude truncation) + added a repair-retry pass when first-pass JSON fails to parse.
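A minimal sketch of the repair-retry idea — try a cheap local repair (strip fences, drop trailing commas) before spending a model round-trip; `ask_model_to_repair` is a hypothetical callback, not the real API:

```python
# Sketch of the F029 repair-retry pass: local cleanup first, model repair second.
import json
import re

def try_local_repair(text: str) -> str:
    # Strip markdown fences and trailing commas -- two common truncation/format faults.
    text = re.sub(r"^```(?:json)?|```$", "", text.strip(), flags=re.M)
    return re.sub(r",\s*([}\]])", r"\1", text)

def parse_lesson_json(raw: str, ask_model_to_repair=None):
    for candidate in (raw, try_local_repair(raw)):
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass
    if ask_model_to_repair is not None:
        # Second-pass repair: hand the broken text back to the model.
        return json.loads(ask_model_to_repair(raw))
    raise ValueError("unparseable lesson JSON")
```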

Known gaps — staged for Phase 1.5

  • Voice ↔ visuals not synchronized. ✅ FIXED 2026-04-29 (M5): word-level focus cues. Author emits [focus:elem-id]…[/focus] markers; runtime polls audio.currentTime and toggles the tutor-focus class by char-position approximation.
  • Lessons feel demoware. ✅ FIXED 2026-04-29 (M2B): 9-phase pedagogical arc in the SYSTEM_PROMPT (HOOK → PRIOR → CORE → SETUP → DERIVATION → EXAMPLE → TRAP → VERIFY → CONSOLIDATE) + second-pass critique-revise rubric (rewrites if any dim < 4). Lesson re-cook running on EC2 to surface this across the corpus.
  • Lesson length was fixed at 8–12 beats. ✅ FIXED 2026-04-29 (M2B): beat-count guidance now scales with complexity — 7–10 simple / 10–13 standard / 14–20 complex.
  • 1 of 20 demo cards (Maxwell) couldn't be source-conditioned — Claude consistently emits malformed JSON for that source. Falls back to the topic-only version (still works as a card click). [16K-token bump should help — verify after re-cook.]

Module status — post M1–M9 push (2026-04-29)

9-module architecture (per CANVAS_TUTOR_ARCHITECTURE_v1.md). All modules at 🟠 developing or better, except M4 (deferred). Full module roadmap →

M1
Input Handler · 🟡 solid — 6/6 input modes; 10 question cards on landing page (auto-fill on click).
M2A
Cached lessons · 🟠⁺ developing+ — patch_lessons.py shipped runtime helpers into 29 cached HTMLs; full re-cook running on devbox tmux.
M2B
Live ingest+gen · 🟠 developing — 9-phase pedagogical arc, beat-count by complexity, source-faithfulness, M9 footprint contract, critique-revise.
M3
First-Response · 🟡 solid — input-aware hellos: hello_url / hello_pdf / hello_blank / hello_question (16 utterances, 15 synthesized).
M4
Streaming Engine · ❌ deferred — not in this push.
M5
Skill Executor · 🟠 developing — word-level focus cues via [focus:elem-id]…[/focus] markers + scheduleFocusCues() runtime poller.
M6
Interruption FSM · 🟠 developing — state transitions formalized.
M7
Response Router · 🟡 solid — 5-way classifier (INLINE / TANGENT / PARK / REFUSE / CLARIFY) with confidence + rationale.
M8
Activity Indicator · 🟠 developing — visual progress modal, 4 sub-states with heuristic timing.
M9
Board State + Layout · 🟠 developing — Phase A: 6×4 footprint contract. Phase B: window.boardState with permanence levels. Phase C (active reconciliation) deferred.

Status by the numbers

29 features tracked · 26 live · 0 in flight · 2 partial · ∞ future phases
D1
Activation
What can the tutor start from? How fast?
D2
Board craft
What can the tutor do on the blackboard?
D3
Interjection
What happens when the student breaks in?
D4
Engagement
How does the tutor pull participation?
Open in your browser
physolympiad.com/canvasa
Click the link above — it'll redirect to canvasa.physolympiad.com

Four things to test

Each takes <2 minutes. Try them in order — they go from instant (cached) to slow (live ingestion).

SCENARIO 1 · INSTANT

Click any of the 20 demo cards

  1. Open physolympiad.com/canvasa.
  2. Scroll to the HIGH SCHOOL or UNDERGRAD sections — 20 demo cards.
  3. Click any card (e.g. "Simple Harmonic Motion" or "Bohr Model of Hydrogen").
  4. The lesson page should auto-play within ~1 second of landing — Antoni's voice + scene elements drawing on the board.
  5. If the browser blocks autoplay you'll see a "Tap to begin" overlay — one tap unblocks audio.

Expected feel: instant. No waiting. No dead air. Watch for: voice-visual sync — a first pass (M5 word-level focus cues) just landed, so check whether the highlighted element actually tracks the narration. Tightening this is the Phase 1.5 priority.

SCENARIO 2 · ~30–60s WITH BRIDGING

Type a free-text topic

  1. On the landing page, scroll to the "Or, just type a topic" panel.
  2. Type something the demo cards don't have (e.g. Bernoulli's principle, Doppler effect, Compton scattering).
  3. Hit Enter or click Generate.
  4. Within ~1 second the tutor should say "Lovely. Let me set this up — give me just a moment." (or one of 5 hello variants).
  5. Every ~10 seconds the tutor speaks another bridging line: "Picking the milestones…" → "Sketching the first diagram…" → "Almost there — just polishing…"
  6. After ~30–60 seconds the page navigates to the new lesson and auto-plays.

Expected feel: continuous engagement, no spinner, never silent.

SCENARIO 3 · ~60–90s LIVE INGESTION

Paste a Wikipedia URL

  1. Scroll to the "External wiki / URL" panel.
  2. Paste any Wikipedia URL — e.g. https://en.wikipedia.org/wiki/Bernoulli%27s_principle or https://en.wikipedia.org/wiki/Wave-particle_duality.
  3. Click Ingest.
  4. Tutor speaks the hello, then bridging utterances cover ~60–90s while: server fetches the URL, parses the article, conditions Claude on the actual content, synthesizes audio.
  5. Page navigates to the lesson — and the lesson should genuinely reflect that specific article, not just generic knowledge of the topic.

Also try a PhysOlympiad wiki URL: https://physolympiad.com/wiki/biot-savart-ampere

SCENARIO 4 · ~60–90s PDF UPLOAD

Drop a PDF

  1. Scroll to the "Upload PDF" panel.
  2. Choose any text-based physics PDF (a chapter, paper, problem set).
  3. Click Upload.
  4. Server extracts text via pypdf, conditions the lesson on it, runs the same hello-layer flow.
  5. Lesson plays on the topic the PDF discusses.

Limitation: scanned-image PDFs (no embedded text) won't work yet — that needs OCR, which isn't built. Text-based PDFs only for now.

Mid-lesson interactions (any scenario above)

  • Type a question in the "ask anything" input at the bottom of the lesson page → tutor pauses, answers in a side panel or opens a sub-board for derivations, then resumes.
  • Push to talk — hold the mic button, ask aloud, release → Whisper transcribes, tutor answers the same way as a text question.
  • Pause / Restart — top-right buttons.
  • Navigate beats — Previous / Next at the top of the lesson.

Phase 1.5 — Voice-visual sync + Lesson quality

Phase 1 (Activation engine) is mostly done. Before moving to Phase 2 (Board craft), we need to close two visible quality gaps identified during testing: (1) voice and visuals don't talk to each other, (2) lessons feel like demoware, not real teaching. Phase 1.5 is dedicated to fixing both.

Track A — Voice-visual coordination

F024 · D2
Word-level timing pipeline
Use the ElevenLabs with-timestamps endpoint to get per-character timing alongside the synthesized audio. Store as lesson.beats[i].word_timings. Without this, no real sync is possible.
~1 day · per-beat re-synthesis required
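Assuming the endpoint returns parallel lists of characters and character start times (the exact response shape is an assumption here), folding them into per-word timings might look like:

```python
# Sketch: fold per-character TTS timings into the word_timings the cue engine
# needs. Inputs are parallel lists: chars[i] is a character, starts[i] its
# start time in seconds (an assumed shape for a with-timestamps response).
def chars_to_word_timings(chars, starts):
    words, buf, t0 = [], [], None
    for ch, t in zip(chars, starts):
        if ch.isspace():
            if buf:
                # Whitespace closes the current word; its end is this char's start.
                words.append({"word": "".join(buf), "start": t0, "end": t})
                buf, t0 = [], None
        else:
            if t0 is None:
                t0 = t  # first character of a new word
            buf.append(ch)
    if buf:
        # Trailing word: approximate its end with the last known timestamp.
        words.append({"word": "".join(buf), "start": t0, "end": starts[-1]})
    return words
```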
F025 · D2 · D4
Inline focus markers in narration
Update the lesson author prompt to write [focus:elem-id]…[/focus] blocks inline. The phrase "the magnetic field B" gets tagged with the B-vector's element ID. Source of truth for "what's being talked about right now."
~half day · prompt + re-cook 20 cards
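A sketch of the marker convention: strip the tags from the narration and record, for each tag, the character span it covers in the clean text (mapping spans to time then falls to the word-timing data):

```python
# Sketch of F025 marker extraction: [focus:elem-id]...[/focus] tags are removed
# from the narration, and each tag becomes a cue with char offsets into the
# cleaned text. Function and field names are illustrative.
import re

FOCUS = re.compile(r"\[focus:([\w-]+)\](.*?)\[/focus\]", re.S)

def extract_focus_cues(marked: str):
    clean_parts, cues, pos, last = [], [], 0, 0
    for m in FOCUS.finditer(marked):
        clean_parts.append(marked[last:m.start()])
        pos += m.start() - last
        # Record the span this element's mention occupies in the clean text.
        cues.append({"elem": m.group(1), "start": pos, "end": pos + len(m.group(2))})
        clean_parts.append(m.group(2))
        pos += len(m.group(2))
        last = m.end()
    clean_parts.append(marked[last:])
    return "".join(clean_parts), cues
```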
F026 · D2
Runtime cue engine
Wire word timings + focus markers into the runtime. At each cue's start time, fire the existing highlighter underline (F011, locked but unintegrated) on the target element. Optional secondary cue: chalk-glow / arrow / pulse depending on element type.
~1 day · port F011 highlighter to runtime

Track B — Lesson pedagogical quality

F027 · D4 · quality
Richer system prompt + critique-and-revise loop
Current prompt asks for "8–12 beats." It should specify the pedagogical arc (hook → prior knowledge → core insight → derivation → worked example → trap → consolidation), demand source citations, require unit checks, and ban demoware patterns. Add a second-pass critic that rewrites if the first pass misses any rubric dimension.
~1 day · re-cook 20 cards
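The critique-and-revise control flow can be as small as this (the critic and author callables are stand-ins for the two model calls; the below-4 threshold mirrors the rubric rule noted in the known-gaps list):

```python
# Sketch of the F027 second-pass loop: score the draft on rubric dimensions,
# rewrite if any dimension falls below threshold, stop when it passes.
RUBRIC_DIMS = ("correctness", "pedagogy", "clarity", "depth")

def critique_and_revise(draft, author, critic, max_rounds=2):
    lesson = draft
    for _ in range(max_rounds):
        scores = critic(lesson)               # dict: dim -> 1..5
        weak = [d for d in RUBRIC_DIMS if scores.get(d, 0) < 4]
        if not weak:
            return lesson                     # passes the rubric as-is
        lesson = author(lesson, weak)         # rewrite, naming the weak dims
    return lesson

# Toy stand-ins for the two model calls:
toy_critic = lambda l: {d: (5 if "revised" in l else 3) for d in RUBRIC_DIMS}
toy_author = lambda l, weak: l + " (revised: " + ", ".join(weak) + ")"
out = critique_and_revise("draft beats", toy_author, toy_critic)
```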
F028 · quality
Hand-edit golden lessons (the 20 demo cards)
Mukesh + a physicist walk through each demo card lesson, edit narration + scene choices + cues. These become the gold standard the eval harness scores future lessons against.
~3–5 days elapsed · ~2 hours per lesson active editing
F030 · quality
Eval harness v0
Sympy unit/dimension checks on every numeric claim + LLM-judge rubric (correctness, pedagogy, clarity, depth) on every generated lesson. CI gate: rubric < threshold = fail. From Phase 0 of the v3 roadmap, pulled forward.
~1 day
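The dimension-check half of the harness, shown framework-free (the real gate uses sympy per the roadmap; hand-rolled (M, L, T) exponent tuples stand in here just to show the check):

```python
# Sketch of an F030-style dimension gate: represent each quantity's dimensions
# as (mass, length, time) exponents, multiply by adding exponents, and flag
# any claimed relation whose two sides disagree.
FORCE = (1, 1, -2)   # kg * m / s^2
MASS  = (1, 0, 0)
ACCEL = (0, 1, -2)
SPEED = (0, 1, -1)

def dim_mul(*dims):
    # Multiplying quantities adds their dimension exponents axis-by-axis.
    return tuple(sum(axis) for axis in zip(*dims))

def consistent(lhs, rhs):
    return lhs == rhs

ok  = consistent(FORCE, dim_mul(MASS, ACCEL))   # F = m*a -- dimensionally fine
bad = consistent(FORCE, dim_mul(MASS, SPEED))   # F = m*v -- should be flagged
```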

Track C — Loose ends

F031
Fix Maxwell's equations card
Maxwell is the 1 of 20 cards that couldn't be source-conditioned (Claude consistently emits malformed JSON for that source). Investigate: try the Wikipedia URL fallback, check whether the source markdown has unusual unicode, try a different model.
~30 min
F032 · D2
Variable lesson length + branching pathways
Lessons should scale with topic complexity (Newton's 2nd law: 6 beats; Maxwell's equations: 25 beats; Lagrangian: 18 beats). Some lessons should support pathways — e.g., "fast track" vs "deep dive" or "historical motivation" vs "quick math." Architectural change, not a quick fix.
~2–3 days

Order of operations

Recommended sequence for Phase 1.5:

  1. F024 + F025 + F026 in parallel — voice-visual sync (~3 days).
  2. F027 — richer prompt + critique loop (~1 day) — produces draft golden lessons.
  3. F031 — fix the Maxwell card (~30 min, parallel with F027).
  4. F028 — hand-edit the 20 cards into actual gold-standard lessons (~3–5 days elapsed).
  5. F030 — eval harness so future generations are gated.
  6. F032 later — variable length + pathways once gold-standard set is locked.

When all of Phase 1.5 is done: tutor speech is synchronized to what the student sees on the board, the 20 demo lessons are genuinely teaching (not demoware), and the pipeline has automated quality gates. Then Phase 2 (Board craft v1) starts on solid ground.

14 phases — D1–D4 dimensional spine

Click any phase header to expand. Active phase + partials are expanded by default.