World Model
INTERACTIVE, 6 FPS
First Frame
→
Current Frame (State)
No action input
LEARNING PIPELINE
Trajectory
4s
(s_t, a_t, r_t, s_{t+1})
→
Episode
τ
rollout collected
→
Gradient
∇ L(θ)
from real (a, r) pairs
→
Policy Update
π_θ(a|s)
closed loop
↻
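
The closed loop above (trajectory, episode, gradient, policy update) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the pipeline of any particular system: the two-state toy environment, its reward rule, the tabular softmax policy, the score-function (REINFORCE-style) update, and every name below are hypothetical. The episode length of 24 steps assumes the "4s" and "6 FPS" labels mean a 4-second trajectory sampled at 6 frames per second.

```python
import math
import random

N_STATES, N_ACTIONS = 2, 2
STEPS_PER_EPISODE = 24  # assumption: a 4 s trajectory at 6 FPS


def softmax(logits):
    """Turn a row of logits into action probabilities pi_theta(a|s)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def collect_episode(theta, rng):
    """Roll out the current policy and record (s_t, a_t, r_t, s_{t+1}) tuples."""
    episode = []
    s = rng.randrange(N_STATES)
    for _ in range(STEPS_PER_EPISODE):
        probs = softmax(theta[s])
        a = rng.choices(range(N_ACTIONS), weights=probs)[0]
        r = 1.0 if a == s else 0.0        # toy reward: action should match state
        s_next = rng.randrange(N_STATES)  # toy dynamics: random next state
        episode.append((s, a, r, s_next))
        s = s_next
    return episode


def policy_gradient(theta, episode):
    """Score-function gradient estimate built from the real (a, r) pairs."""
    grad = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for s, a, r, _ in episode:
        probs = softmax(theta[s])
        for k in range(N_ACTIONS):
            # d log pi(a|s) / d theta[s][k] for a tabular softmax policy
            grad[s][k] += r * ((1.0 if k == a else 0.0) - probs[k])
    return grad


def train(n_episodes=200, lr=0.1, seed=0):
    """Repeat: collect an episode, estimate the gradient, update the policy."""
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(n_episodes):
        episode = collect_episode(theta, rng)
        grad = policy_gradient(theta, episode)
        for s in range(N_STATES):
            for k in range(N_ACTIONS):
                # Ascent on expected reward, i.e. descent on L = -reward
                theta[s][k] += lr * grad[s][k]
    return theta


theta = train()
policy = [softmax(row) for row in theta]
```

After training, `policy[s][s]` should be close to 1 for both states. The point of the sketch is that every gradient term multiplies a recorded reward by a term that depends on the recorded action, so the update is only possible because each step stores a_t and r_t.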
Video Model
NO ACTIONS, 6 FPS
First Frame
→
Current Frame (State)
No action input
LEARNING PIPELINE
Trajectory
4s
(s_t, s_{t+1}), frames only
→
Missing Signal
no a_t, no r_t
actions & rewards unknown
→
Policy
π_θ(a|s)
cannot learn — no feedback loop
✗
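
To make the missing signal concrete, here is a small sketch, assuming a hypothetical `Step` record and helper name. It shows that a frames-only clip yields no (action, reward) pairs for a policy gradient to use, while a rollout of the same length yields one per step. The length of 24 again assumes a 4-second clip at 6 FPS.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    """One transition; action and reward are absent in video-only data."""
    state: int
    next_state: int
    action: Optional[int] = None
    reward: Optional[float] = None


def usable_pairs(trajectory: List[Step]) -> List[Tuple[int, int, float]]:
    """Return the (s_t, a_t, r_t) triples a policy gradient could be built from."""
    return [
        (step.state, step.action, step.reward)
        for step in trajectory
        if step.action is not None and step.reward is not None
    ]


# Video model data: (s_t, s_{t+1}) only
video_clip = [Step(state=t, next_state=t + 1) for t in range(24)]

# World model data: (s_t, a_t, r_t, s_{t+1}), with placeholder actions and rewards
rollout = [
    Step(state=t, next_state=t + 1, action=t % 2, reward=1.0) for t in range(24)
]

video_terms = usable_pairs(video_clip)
rollout_terms = usable_pairs(rollout)
```

With an empty list of terms, a gradient estimate formed from (a, r) pairs has nothing to sum over, so π_θ(a|s) receives no update from the clip.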