Cogitator
A perceptual system that earns its beliefs.
Cogitator is a probabilistic world model for scene understanding. It associates predictive affordances with the geometry of what it sees, maintains graded beliefs about entities and their relations, and directs its attention toward what surprises it: the salient events it could not predict.
The premise
Detection is not understanding.
Most perception stacks treat the output of a neural net as ground truth — a label arrives, a box arrives, the system acts on it. This works until something is occluded, ambiguous, contradicted, or simply new. Cogitator takes a different stance: detections and teacher inputs are all evidence, not facts. The system maintains graded beliefs about what likely caused the evidence, and revises them under invariants that cannot be violated.
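As a minimal sketch of treating a detection as evidence rather than fact, the snippet below applies Bayes' rule to a single binary hypothesis ("this entity is present"). The function name and the detector's hit and false-alarm rates are illustrative assumptions, not Cogitator's actual interface or values.

```python
def update_belief(prior: float, p_hit: float, p_false_alarm: float, detected: bool) -> float:
    """Posterior probability that an entity is present after one detector reading.

    p_hit:         P(detector fires | entity present)  (assumed value)
    p_false_alarm: P(detector fires | entity absent)   (assumed value)
    """
    if detected:
        like_present, like_absent = p_hit, p_false_alarm
    else:
        like_present, like_absent = 1.0 - p_hit, 1.0 - p_false_alarm
    numerator = like_present * prior
    evidence = numerator + like_absent * (1.0 - prior)
    return numerator / evidence


# Two detections raise the belief; one missed detection lowers it again.
belief = 0.5
for reading in [True, True, False]:
    belief = update_belief(belief, p_hit=0.9, p_false_alarm=0.2, detected=reading)
```

The belief never jumps to 0 or 1 on a single reading, which is the practical difference between acting on evidence and acting on a label.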
Affordances from geometry
What an object is, and what it offers.
For every entity in the scene, Cogitator derives not just its state (position, velocity, bounding box) but a belief about what the object affords. Is it graspable? Supportive? Viewable? Readable? Every entity must carry these affordance beliefs. Each one has its own uncertainty and is extracted directly from the geometric and semantic features of the detection. Affordance is what turns a shape into a hypothesis about action.
- Geometry suggests a handle, a rim, a stem: something the agent could hold.
- A flat upper surface at the right height: somewhere to set things down.
- A flat face with contrast: pictures, screens, signs worth looking at.
- A reachable extent in the environment: somewhere to go.
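One way to picture per-affordance uncertainty is a Beta distribution per affordance, nudged by geometric cues. The sketch below is an assumption-laden illustration: the `BetaBelief` class, the feature keys (`has_handle`, `flat_top`, `top_height_m`, `flat_face_contrast`) and the height and contrast thresholds are all hypothetical, and Cogitator may represent these beliefs quite differently.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Belief about one affordance as a Beta(alpha, beta) distribution."""
    alpha: float = 1.0  # pseudo-count of supporting evidence
    beta: float = 1.0   # pseudo-count of contradicting evidence

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total * total * (total + 1.0))

    def observe(self, supports: bool, weight: float = 1.0) -> None:
        if supports:
            self.alpha += weight
        else:
            self.beta += weight


AFFORDANCES = ("graspable", "supportive", "viewable")


def affordances_from_geometry(features: dict) -> dict:
    """Return one belief per affordance, so no affordance is ever missing.

    A missing cue is counted as weak contradicting evidence (a design
    choice made for this sketch only).
    """
    beliefs = {name: BetaBelief() for name in AFFORDANCES}
    beliefs["graspable"].observe(features.get("has_handle", False))
    top_ok = features.get("flat_top", False) and 0.4 <= features.get("top_height_m", 0.0) <= 1.2
    beliefs["supportive"].observe(top_ok)
    beliefs["viewable"].observe(features.get("flat_face_contrast", 0.0) > 0.5)
    return beliefs


mug = affordances_from_geometry({"has_handle": True, "flat_top": False})
```

Here `mug["graspable"].mean` rises above 0.5 while `mug["supportive"].mean` falls below it, and each belief keeps its own variance.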
Attention chases surprise
Gaze follows what wasn't predicted.
Cogitator continuously predicts the near future of the scene: where things will be, which relations will persist, which affordances will hold. When the world does something the predictor did not forecast, the mismatch is recorded as surprise. Surprise is not noise; it is the signal the system directs its gaze toward. The attention policy turns the world model’s own uncertainty into a decision about where to look next.
- Gaze shifts toward the entity with the highest unexplained surprise: what did we get wrong?
- Gaze follows the entity with the widest belief variance: what do we not yet know?
- In the absence of either, gaze follows a serpentine sweep: what have we forgotten to look at?
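The three gaze behaviours above can be sketched as a simple priority rule. This is only an illustration: the `TrackedEntity` fields, the two thresholds, and the grid used for the sweep are assumptions introduced here, not parameters taken from Cogitator.

```python
from dataclasses import dataclass


@dataclass
class TrackedEntity:
    name: str
    surprise: float  # unexplained prediction error (assumed non-negative)
    variance: float  # spread of the belief about this entity


def serpentine_sweep(cols: int, rows: int):
    """Yield grid cells row by row, reversing direction on every other row."""
    while True:
        for r in range(rows):
            columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
            for c in columns:
                yield (c, r)


def choose_gaze(entities, sweep, surprise_threshold=0.5, variance_threshold=0.1):
    """Return (mode, target), where mode is 'surprise', 'uncertainty' or 'sweep'."""
    if entities:
        most_surprising = max(entities, key=lambda e: e.surprise)
        if most_surprising.surprise > surprise_threshold:
            return "surprise", most_surprising.name
        least_known = max(entities, key=lambda e: e.variance)
        if least_known.variance > variance_threshold:
            return "uncertainty", least_known.name
    # Nothing surprising and nothing uncertain: continue the sweep.
    return "sweep", next(sweep)


sweep = serpentine_sweep(cols=3, rows=2)
scene = [TrackedEntity("cup", surprise=0.9, variance=0.02),
         TrackedEntity("door", surprise=0.1, variance=0.30)]
mode, target = choose_gaze(scene, sweep)
```

With this scene the cup wins because its surprise exceeds the threshold. Once that surprise is explained away, the door's wide variance would take over, and an empty or well-understood scene falls back to the sweep.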
Architecture
Three layers of governance.
Beliefs are not committed by whatever module produced them. Every update flows through a three-layer governance system — modeled on separation of powers — before it can modify the world model.
- A small set of rules every belief change must satisfy: every belief must trace to evidence, confidence stays bounded, identity changes stay coherent. None of these can be relaxed at runtime.
- Knobs that govern how the system behaves day-to-day: how fast beliefs decay, when surprise is worth flagging, how much to trust each source. They are tunable, but only within defined ranges.
- Any module can propose a change. Only the Judiciary commits. Each proposal is evaluated on its own merits, never in batches where bad ones can hide behind good ones.
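A rough sketch of how such a pipeline could look follows. Everything here is hypothetical scaffolding: the `Proposal` fields, the two invariant checks shown (the identity-coherence rule is omitted), the `decay_rate` range, and the choice to clamp out-of-range parameters rather than reject them are assumptions, not Cogitator's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    entity_id: str
    new_confidence: float
    evidence_ids: tuple  # ids of the observations backing the change


# Invariants: plain functions fixed in code, with no runtime switch to relax them.
def traces_to_evidence(p: Proposal) -> bool:
    return len(p.evidence_ids) > 0


def confidence_bounded(p: Proposal) -> bool:
    return 0.0 <= p.new_confidence <= 1.0


INVARIANTS = (traces_to_evidence, confidence_bounded)

# Tunable parameters: adjustable, but clamped to a defined range (assumed values).
PARAM_RANGES = {"decay_rate": (0.01, 0.5)}


def set_param(params: dict, name: str, value: float) -> None:
    low, high = PARAM_RANGES[name]
    params[name] = min(max(value, low), high)


class Judiciary:
    """The only component in this sketch that writes to the belief store."""

    def __init__(self):
        self.beliefs = {}
        self.rejected = []

    def review(self, proposal: Proposal) -> bool:
        """Judge one proposal on its own; commit it only if every invariant holds."""
        failed = [check.__name__ for check in INVARIANTS if not check(proposal)]
        if failed:
            self.rejected.append((proposal, failed))
            return False
        self.beliefs[proposal.entity_id] = proposal.new_confidence
        return True


court = Judiciary()
proposals = [
    Proposal("cup-1", 0.8, ("det-17",)),
    Proposal("cup-2", 1.4, ("det-18",)),  # confidence out of bounds
    Proposal("box-1", 0.6, ()),           # no supporting evidence
]
results = [court.review(p) for p in proposals]
```

Because `review` takes one proposal at a time, the valid update to `cup-1` is committed while the two invalid ones are rejected individually, each with the names of the checks it failed.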
The world model
A graph of graded beliefs.
The world Cogitator maintains is a graph: nodes for the things it thinks are there, edges for how it thinks they’re related, and latent hypotheses for what they might be about to do. Every belief carries uncertainty, and every belief decays without fresh evidence to reinforce it.
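As a minimal sketch of a belief graph whose contents fade without fresh evidence, the snippet below stores one decaying belief per node and per edge. The exponential half-life decay, the `Belief` and `WorldGraph` names, and the numeric values are assumptions chosen for illustration; the source does not specify the decay law.

```python
from dataclasses import dataclass, field


@dataclass
class Belief:
    confidence: float       # in [0, 1] at the time of the last evidence
    last_evidence_t: float  # timestamp (seconds) of the last supporting evidence

    def current(self, now: float, half_life_s: float) -> float:
        """Confidence after exponential decay since the last evidence."""
        age = max(0.0, now - self.last_evidence_t)
        return self.confidence * 0.5 ** (age / half_life_s)

    def reinforce(self, now: float, confidence: float) -> None:
        """Fresh evidence resets the clock and sets a new confidence."""
        self.confidence = confidence
        self.last_evidence_t = now


@dataclass
class WorldGraph:
    nodes: dict = field(default_factory=dict)  # entity id -> Belief that it is there
    edges: dict = field(default_factory=dict)  # (id_a, relation, id_b) -> Belief

    def live_edges(self, now: float, half_life_s: float = 5.0, floor: float = 0.2):
        """Relations whose decayed confidence is still at or above the floor."""
        return [key for key, belief in self.edges.items()
                if belief.current(now, half_life_s) >= floor]


graph = WorldGraph()
graph.nodes["cup"] = Belief(0.9, last_evidence_t=0.0)
graph.nodes["table"] = Belief(0.95, last_evidence_t=0.0)
graph.edges[("table", "supports", "cup")] = Belief(0.8, last_evidence_t=0.0)
```

Querying `graph.live_edges(now=5.0)` still returns the support relation, but after 20 seconds without evidence it drops below the floor until something calls `reinforce` on it again.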
Entities (nodes)
- What’s there, and whether it’s still there
- What it is, and whether it’s the same one as before
- Where it is, how it’s moving, how big it is
- What it offers to an acting agent
Relations (edges)
- Which things are close in space
- Which ones are supporting or supported by others
- Which ones are closing in on each other
Interactions
- Built-in patterns — grasping, placing, approaching, observing
- New patterns surfaced as candidates when old ones don’t fit
Intent
- Spatial triggers — closing, leaving, reaching, gazing
- Open-vocabulary hypotheses built from those plus context
- Inferred from how well predictions hold, not from labels
Status
Foundational research, applied where ready.
Cogitator is a research framework that informs everything Midvale builds. The cognition layer in Mosaic will draw directly from it. We’re also interested in collaborators working on robotics, calibrated perception, and active inference.