Interactive World Model

Click the action buttons or use your keyboard to inject actions into the world model and see how it responds.

This is how agents learn: by acting and observing the result.

Policy learning inside world models vs learning from video

World models predict the next frame conditioned on the agent's action, creating an interactive simulator where RL policies can train by trial and error. A video model, by contrast, predicts frames from visual history alone, with no action conditioning, so it cannot close the feedback loop that policy learning requires.
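The distinction can be sketched with two toy one-dimensional models. The class names, dynamics, and reward are illustrative assumptions, not the demo's actual model:

```python
class WorldModel:
    """Toy action-conditioned simulator: the next state depends on the action."""
    def step(self, state, action):
        next_state = 0.9 * state + action   # the action changes the outcome
        reward = -abs(next_state)           # a feedback signal the agent can learn from
        return next_state, reward

class VideoModel:
    """Toy video predictor: the next frame depends only on visual history."""
    def step(self, state):
        return 0.9 * state                  # no action input, no reward

# An agent can probe the world model and observe the consequences of its action...
s_next, r = WorldModel().step(1.0, action=0.5)
# ...but it can only watch the video model unfold.
frame = VideoModel().step(1.0)
```

The only structural difference is the `action` argument, and that single argument is what turns a passive predictor into a simulator an agent can act in.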

World Model

LEARNING PIPELINE
- Trajectory (4 s): (s_t, a_t, r_t, s_{t+1}) tuples
- Episode: rollout τ collected
- Gradient: ∇L(θ) from real (a, r) pairs
- Policy update: π_θ(a|s), closed loop
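The stages above can be sketched end to end with a REINFORCE-style update. The dynamics, the Gaussian policy, and the learning rate are all illustrative assumptions; only the tuple structure and the 4 s-at-6-FPS rollout length come from the demo:

```python
import random

random.seed(0)

def world_step(s, a):
    """Hypothetical action-conditioned dynamics with a reward (stay near zero)."""
    s_next = 0.9 * s + a
    return s_next, -abs(s_next)

theta = 0.0                                # parameter of the policy pi_theta(a|s)

def sample_action(s):
    """Gaussian policy: a ~ N(theta * s, 1)."""
    return theta * s + random.gauss(0.0, 1.0)

# 1) Trajectory: collect (s_t, a_t, r_t, s_{t+1}) tuples over one rollout
s, tau = 1.0, []
for _ in range(24):                        # 4 s of rollout at 6 FPS
    a = sample_action(s)
    s_next, r = world_step(s, a)
    tau.append((s, a, r, s_next))
    s = s_next

# 2) Gradient: REINFORCE estimate from the real (a, r) pairs in the episode
ret = sum(r for _, _, r, _ in tau)
grad = sum((a - theta * s_t) * s_t for s_t, a, _, _ in tau) * ret

# 3) Policy update: closes the act -> observe -> improve loop
theta += 1e-3 * grad
```

Every term in the gradient uses an action the agent actually took and a reward the simulator actually returned, which is exactly what the video model below cannot supply.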

Video Model

LEARNING PIPELINE
- Trajectory (4 s): (s_t, s_{t+1}) pairs, frames only
- Missing signal: no a_t, no r_t; actions and rewards unknown
- Policy: π_θ(a|s) cannot learn, no feedback loop
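To see concretely why the loop cannot close: a policy-gradient estimate needs full (s_t, a_t, r_t, s_{t+1}) tuples, and frames-only data is structurally missing two of the four fields. A small hypothetical sketch:

```python
tau_video = [(1.0, 0.9), (0.9, 0.81)]        # (s_t, s_{t+1}) pairs only

def reinforce_grad(tau, theta=0.0):
    """REINFORCE-style estimate; requires (s, a, r, s') tuples."""
    ret = sum(r for _, _, r, _ in tau)
    return sum((a - theta * s) * s for s, a, _, _ in tau) * ret

try:
    reinforce_grad(tau_video)                # unpacking (s, a, r, s') fails
    learned = True
except ValueError:                           # not enough values: no a_t, no r_t
    learned = False

print("policy learned from video:", learned)   # -> False
```

No amount of extra video fixes this: the missing actions and rewards are not noisy, they are absent, so there is nothing for the gradient to be taken with respect to.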