Interactive World Model (interactive demo: the actions ←, →, ↑, ↓, and Space are injected into the world model, which predicts how the scene responds)

This is how agents learn: they act, then observe the result.

Policy learning inside world models vs learning from video

World models predict the next frame conditioned on the agent's action, which turns them into an interactive simulator where RL policies can train through trial and error. Video models predict future frames from visual history alone, with no action conditioning and no reward, so they cannot close the feedback loop that policy learning requires.
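A toy sketch can make the contrast concrete. Everything in it is hypothetical: `world_model_step` and `video_model_step` are hand-written stand-ins for learned models, and the 1-D track used as a "frame" plus the goal reward are assumptions for illustration, not the models in the demo. The point is the signatures: one takes an action and returns a reward, the other does neither.

```python
from typing import Callable, List, Tuple

N_STATES, GOAL = 10, 9  # hypothetical toy "frame": a position on a 1-D track

def world_model_step(state: int, action: int) -> Tuple[int, float]:
    """Stand-in for a learned world model: predicts the next state
    conditioned on the action, plus a reward of 1.0 for every step
    that ends at GOAL."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def video_model_step(history: List[int]) -> int:
    """Stand-in for a video model: it sees only past frames and extrapolates
    the motion. There is no action argument, so nothing the agent does can
    change its prediction, and it emits no reward."""
    if len(history) < 2:
        return history[-1]
    return max(0, min(N_STATES - 1, 2 * history[-1] - history[-2]))

def rollout(policy: Callable[[int], int], start: int = 0, horizon: int = 20):
    """Closed loop: the policy acts, the world model responds, repeat."""
    state, trajectory = start, []
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = world_model_step(state, action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory

def always_right(state: int) -> int:
    return +1

def always_left(state: int) -> int:
    return -1

for policy in (always_right, always_left):
    ret = sum(step[2] for step in rollout(policy))
    print(policy.__name__, ret)  # always_right 12.0, then always_left 0.0
```

Because different actions lead the world model to different states and rewards, comparing returns tells the agent which behavior is better. The video model's interface offers no way to ask "what happens if I go left instead?", so the same comparison cannot be made.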

World Model (interactive, 6 FPS): first frame → current frame (state), updated in response to the agent's actions.
Learning pipeline: trajectory of (s_t, a_t, r_t, s_{t+1}) tuples → episode τ (rollout collected) → gradient ∇L(θ) computed from real (a, r) pairs → policy update π_θ(a|s) → back to acting (closed loop ↻)
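The world-model pipeline (trajectory of (s_t, a_t, r_t, s_{t+1}) tuples, episode τ, gradient ∇L(θ), policy update π_θ(a|s)) has the shape of a basic policy-gradient method. The sketch below uses REINFORCE as one possible choice; the article does not say which RL algorithm it has in mind. The toy track, the per-state sigmoid policy, and the names `collect_episode` and `reinforce_update` are hypothetical.

```python
import math
import random
from typing import List, Tuple

N_STATES, GOAL = 6, 5                 # hypothetical 1-D track, goal at the right end
Step = Tuple[int, int, float, int]    # (s_t, a_t, r_t, s_{t+1})

def world_model_step(s: int, a: int) -> Tuple[int, float]:
    """Hypothetical learned world model: next state and reward depend on the action."""
    s_next = max(0, min(N_STATES - 1, s + a))
    return s_next, (1.0 if s_next == GOAL else 0.0)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def collect_episode(theta: List[float], horizon: int = 12) -> List[Step]:
    """Roll pi_theta out inside the world model and return the episode tau."""
    s, tau = 0, []
    for _ in range(horizon):
        p_right = sigmoid(theta[s])   # pi_theta(a=+1 | s); a=-1 otherwise
        a = 1 if random.random() < p_right else -1
        s_next, r = world_model_step(s, a)
        tau.append((s, a, r, s_next))
        s = s_next
    return tau

def reinforce_update(theta: List[float], tau: List[Step],
                     lr: float = 0.1, gamma: float = 0.99) -> List[float]:
    """One REINFORCE step on L(theta) = -sum_t G_t * log pi_theta(a_t | s_t)."""
    neg_grad = [0.0] * len(theta)     # accumulates -dL/dtheta at the current theta
    G = 0.0
    for s, a, r, _ in reversed(tau):
        G = r + gamma * G             # discounted return from step t onward
        p_right = sigmoid(theta[s])
        dlogp = (1.0 - p_right) if a == 1 else -p_right  # d log pi(a|s) / d theta[s]
        neg_grad[s] += G * dlogp
    # Gradient descent on L: theta <- theta - lr * dL/dtheta
    return [t + lr * g for t, g in zip(theta, neg_grad)]

random.seed(0)
theta = [0.0] * N_STATES
for _ in range(500):
    theta = reinforce_update(theta, collect_episode(theta))
print(f"P(right | s=0) after training: {sigmoid(theta[0]):.2f}")
```

Every term in the update comes from the (a, r) pairs the world model returned: an episode with zero reward contributes no gradient, and the sign of each step depends on which action was actually taken. A frames-only trajectory has neither a_t nor r_t, so `reinforce_update` would have nothing to compute with.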

Video Model (no actions, 6 FPS): first frame → current frame (state), with no action input.
Learning pipeline: trajectory of (s_t, s_{t+1}) pairs, frames only → missing signal: no a_t and no r_t, so actions and rewards are unknown → policy π_θ(a|s) cannot learn, because there is no feedback loop ✗