Research · Active

Cogitator

A perceptual system that earns its beliefs.

Cogitator is a probabilistic world model for scene understanding. It associates predictive affordances with the geometry of what it sees, maintains graded beliefs about entities and their relations, and directs its attention toward what surprised it — the salient events it couldn't predict.


The premise

Detection is not understanding.

Most perception stacks treat the output of a neural net as ground truth. A label arrives, a box arrives, and the system acts on it. This works until something is occluded, ambiguous, contradicted, or simply new. Cogitator takes a different stance: detections and teacher inputs alike are evidence, not facts. The system maintains graded beliefs about what likely caused the evidence, and revises them under invariants that cannot be violated.

Affordances from geometry

What an object is, and what it offers.

For every entity in the scene, Cogitator derives not just its state (position, velocity, bounding box) but a belief about what the object affords. Is it graspable? Supportive? Viewable? Readable? Every entity must carry these affordance beliefs, each with its own uncertainty, and they are extracted directly from the geometric and semantic features of each detection. Affordance is what turns a shape into a hypothesis about action.

graspable

Geometry suggests a handle, a rim, a stem — something the agent could hold.

supportive

A flat upper surface at the right height — somewhere to set things down.

viewable

A flat face with contrast — pictures, screens, signs worth looking at.

approachable

A reachable extent in the environment — somewhere to go.
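The page doesn't say how per-affordance uncertainty is represented. One common choice, assumed here, is an independent Beta belief per affordance, nudged by cue strengths in [0, 1]. The `Geometry` fields, the cue-to-affordance heuristics, and the 0.4 to 1.2 m table-height band are hypothetical stand-ins, not Cogitator's actual extractors.

```python
from dataclasses import dataclass


@dataclass
class AffordanceBelief:
    """Beta(alpha, beta) belief that an entity offers one affordance."""
    alpha: float = 1.0  # pseudo-count of supporting evidence
    beta: float = 1.0   # pseudo-count of contrary evidence

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n * n * (n + 1.0))

    def update(self, likelihood: float, weight: float = 1.0) -> None:
        """Soft update: split `weight` units of evidence according to `likelihood`."""
        self.alpha += weight * likelihood
        self.beta += weight * (1.0 - likelihood)


@dataclass
class Geometry:
    # Hypothetical per-detection features; names are illustrative only.
    height_m: float       # height of the top surface above the floor
    top_flatness: float   # 0..1, how planar the upper surface is
    has_handle: float     # 0..1, detector score for a handle, rim, or stem
    face_contrast: float  # 0..1, contrast on the largest flat face


def affordances_from_geometry(g: Geometry) -> dict[str, AffordanceBelief]:
    """Map geometric cues to one belief per affordance (toy heuristics)."""
    beliefs = {name: AffordanceBelief()
               for name in ("graspable", "supportive", "viewable", "approachable")}
    beliefs["graspable"].update(g.has_handle)
    # Supportive: a flat top at roughly table height (band is an assumption).
    at_height = 1.0 if 0.4 <= g.height_m <= 1.2 else 0.0
    beliefs["supportive"].update(g.top_flatness * at_height)
    beliefs["viewable"].update(g.face_contrast)
    # No cue for "approachable" in this toy feature set, so it keeps its prior.
    return beliefs
```

An affordance with no supporting cue keeps the uniform Beta(1, 1) prior (mean 0.5, variance 1/12), so "we have no idea" stays explicit instead of being scored as zero.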

Attention chases surprise

Gaze follows what wasn't predicted.

Cogitator continuously predicts the near future of the scene — where things will be, which relations will persist, which affordances will hold. When the world does something the predictor did not forecast, the mismatch is recorded as surprise. Surprise is not noise; it is the signal the system directs its gaze toward. The attention policy turns the world model’s own uncertainty into where to look next.
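How surprise is measured isn't stated here. A standard choice, assumed in this sketch, is the negative log-likelihood of what was observed under the predictor's forecast, taken as a diagonal Gaussian. The `worth_flagging` gate and its threshold are hypothetical, standing in for the "when surprise is worth flagging" knob.

```python
import math
from typing import Sequence


def gaussian_surprise(pred_mean: Sequence[float], pred_var: Sequence[float],
                      observed: Sequence[float]) -> float:
    """Surprise as -log p(observed | forecast) under a diagonal Gaussian forecast."""
    if not (len(pred_mean) == len(pred_var) == len(observed)):
        raise ValueError("forecast and observation must have the same dimension")
    nll = 0.0
    for mu, var, x in zip(pred_mean, pred_var, observed):
        # Per-dimension Gaussian NLL: 0.5 * (log(2*pi*var) + squared error / var)
        nll += 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return nll


def worth_flagging(surprise: float, threshold: float) -> bool:
    """Hypothetical gate: only surprise above a tunable threshold is recorded."""
    return surprise > threshold
```

A confident forecast that misses scores far higher than a vague one that misses by the same distance, which is what lets surprise single out errors the predictor should not have made.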

Prediction verification

Gaze shifts toward the entity with the highest unexplained surprise — what did we get wrong?

Uncertainty pursuit

Gaze follows the entity with the widest belief variance — what do we not yet know?

Scan

In the absence of either, a serpentine sweep — what have we forgotten to look at?
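The three modes read as a priority order, which can be sketched as a single selection function. The thresholds, the `EntityBelief` fields, and the grid-based serpentine sweep are assumptions for illustration; the page does not give Cogitator's actual policy parameters.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator


@dataclass
class EntityBelief:
    entity_id: str
    surprise: float  # unexplained surprise from the latest prediction check
    variance: float  # spread of the current belief about this entity


def serpentine_cells(rows: int, cols: int) -> Iterator[tuple[int, int]]:
    """Sweep a grid left-to-right, then right-to-left on the next row, and so on."""
    for r in range(rows):
        cols_in_order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in cols_in_order:
            yield (r, c)


def choose_gaze(entities: list[EntityBelief], scan: Iterator[tuple[int, int]],
                surprise_threshold: float = 1.0, variance_threshold: float = 0.5):
    """Return ('verify', id), ('pursue', id), or ('scan', cell), in that priority."""
    surprising = [e for e in entities if e.surprise > surprise_threshold]
    if surprising:  # prediction verification: what did we get wrong?
        return ("verify", max(surprising, key=lambda e: e.surprise).entity_id)
    uncertain = [e for e in entities if e.variance > variance_threshold]
    if uncertain:  # uncertainty pursuit: what do we not yet know?
        return ("pursue", max(uncertain, key=lambda e: e.variance).entity_id)
    return ("scan", next(scan))  # serpentine sweep: what have we forgotten?
```

Wrapping the sweep in `itertools.cycle(serpentine_cells(rows, cols))` makes it repeat indefinitely, and the scan only advances on steps where neither surprise nor uncertainty claims the gaze.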

Architecture

Three layers of governance.

Beliefs are not committed by whatever module produced them. Every update flows through a three-layer governance system — modeled on separation of powers — before it can modify the world model.

Constitution

Immutable invariants

A small set of rules every belief change must satisfy — every belief must trace to evidence, confidence stays bounded, identity changes stay coherent. None of these can be relaxed at runtime.
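A minimal sketch of the three invariants named above as pure checks over a proposed belief change, assuming a hypothetical `Proposal` shape. "Identity changes stay coherent" is interpreted here, as an assumption, to mean an entity cannot be re-identified onto an identity another live entity already holds. The invariants live in a tuple so the set can't be edited in place.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proposal:
    entity_id: str
    confidence: float
    evidence_ids: tuple[str, ...]
    new_identity: Optional[str] = None  # set only when the proposal re-identifies


# An invariant returns a violation message, or None if the proposal satisfies it.
Invariant = Callable[[Proposal, frozenset[str]], Optional[str]]


def traces_to_evidence(p: Proposal, live_ids: frozenset[str]) -> Optional[str]:
    return None if p.evidence_ids else "belief change cites no evidence"


def confidence_bounded(p: Proposal, live_ids: frozenset[str]) -> Optional[str]:
    if 0.0 <= p.confidence <= 1.0:
        return None
    return f"confidence {p.confidence} outside [0, 1]"


def identity_coherent(p: Proposal, live_ids: frozenset[str]) -> Optional[str]:
    # Taking over another live entity's identity would silently merge two tracks.
    if p.new_identity is not None and p.new_identity != p.entity_id \
            and p.new_identity in live_ids:
        return f"identity {p.new_identity} already held by a live entity"
    return None


CONSTITUTION: tuple[Invariant, ...] = (
    traces_to_evidence, confidence_bounded, identity_coherent)


def violations(p: Proposal, live_ids: frozenset[str]) -> list[str]:
    """Every invariant is checked; any returned message blocks the change."""
    found = []
    for invariant in CONSTITUTION:
        message = invariant(p, live_ids)
        if message is not None:
            found.append(message)
    return found
```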

Legislature

Adaptive parameters

Knobs that govern how the system behaves day-to-day: how fast beliefs decay, when surprise is worth flagging, how much to trust each source. They can be tuned, but only within defined ranges.
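One way to enforce "always within defined ranges" is to attach the range to the parameter itself. The page doesn't say whether out-of-range requests are rejected or clamped; this sketch assumes clamping, and the parameter names and ranges are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BoundedParam:
    """A tunable parameter that can only take values inside [low, high]."""
    name: str
    value: float
    low: float
    high: float

    def __post_init__(self) -> None:
        if not self.low <= self.value <= self.high:
            raise ValueError(f"{self.name}: initial value outside [{self.low}, {self.high}]")

    def set(self, proposed: float) -> float:
        """Clamp the proposed value into range and return the value actually applied."""
        self.value = min(self.high, max(self.low, proposed))
        return self.value


# Hypothetical knobs mirroring the prose: decay speed, surprise gate, source trust.
legislature = {
    "belief_half_life_s": BoundedParam("belief_half_life_s", 5.0, 0.5, 60.0),
    "surprise_threshold": BoundedParam("surprise_threshold", 3.0, 1.0, 10.0),
    "trust_detector": BoundedParam("trust_detector", 0.8, 0.1, 1.0),
}
```

Returning the applied value lets a caller see when its request was clipped; a stricter variant would raise instead of clamping.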

Judiciary

The single committer

Any module can propose a change. Only the Judiciary commits. Each proposal is evaluated on its own merits — never in batches where bad ones can hide behind good ones.
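The single-committer rule can be sketched as one class that owns the belief store and judges proposals one at a time. The `Proposal` fields, the pluggable `checks` (a stand-in for the Constitution), and the simple last-write-wins store are assumptions for illustration, not Cogitator's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proposal:
    source: str  # module that proposed the change (hypothetical field)
    entity_id: str
    confidence: float


# A check returns a rejection reason, or None if the proposal passes.
Check = Callable[[Proposal], Optional[str]]


class Judiciary:
    """Sole writer to the belief store; every other module only proposes."""

    def __init__(self, checks: list[Check]):
        self._checks = list(checks)
        self._beliefs: dict[str, float] = {}
        self.rejected: list[tuple[Proposal, str]] = []

    def _first_violation(self, p: Proposal) -> Optional[str]:
        for check in self._checks:
            reason = check(p)
            if reason is not None:
                return reason
        return None

    def review(self, proposals: list[Proposal]) -> list[Proposal]:
        """Judge each proposal on its own; a bad one never blocks or rides on another."""
        committed: list[Proposal] = []
        for p in proposals:
            reason = self._first_violation(p)
            if reason is None:
                self._beliefs[p.entity_id] = p.confidence  # the only write path
                committed.append(p)
            else:
                self.rejected.append((p, reason))
        return committed

    def belief(self, entity_id: str) -> Optional[float]:
        return self._beliefs.get(entity_id)
```

In this sketch only `review` writes to `_beliefs`, so a faulty module can at worst get its own proposals rejected; the good proposals submitted alongside it still commit.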

The world model

A graph of graded beliefs.

The world Cogitator maintains is a graph: nodes for the things it thinks are there, edges for how it thinks they’re related, and latent hypotheses for what they might be about to do. Every belief carries uncertainty, and without fresh evidence to reinforce it, every belief loses confidence over time.
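The decay rule isn't specified beyond "without fresh evidence". Assumed here: a graded confidence relaxes exponentially toward an uninformed prior with a half-life (the kind of knob the Legislature would own), and continuous state follows a random-walk model whose variance grows linearly between observations. Both functions and their parameters are hypothetical.

```python
def decay_confidence(confidence: float, prior: float, dt_s: float, half_life_s: float) -> float:
    """Relax a confidence toward its prior: half the remaining gap closes every half_life_s."""
    if half_life_s <= 0:
        raise ValueError("half_life_s must be positive")
    keep = 0.5 ** (dt_s / half_life_s)  # fraction of the gap that survives dt_s seconds
    return prior + (confidence - prior) * keep


def grow_variance(variance: float, process_noise_per_s: float, dt_s: float) -> float:
    """Random-walk prediction step: variance grows linearly with time since the last evidence."""
    return variance + process_noise_per_s * dt_s
```

Decaying toward a prior rather than toward zero means an unobserved belief drifts back to "unknown", not to "false".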

Entities (nodes)

What’s there, and whether it’s still there
What it is, and whether it’s the same one as before
Where it is, how it’s moving, how big it is
What it offers to an acting agent

Relations (edges)

Which things are close in space
Which ones are supporting or supported by others
Which ones are closing in on each other

Interactions

Built-in patterns — grasping, placing, approaching, observing
New patterns surfaced as candidates when old ones don’t fit

Intent

Spatial triggers — closing, leaving, reaching, gazing
Open-vocabulary hypotheses built from those plus context
Inferred from how well predictions hold, not from labels
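The three relation types listed above can be derived from entity state alone. This sketch assumes axis-aligned boxes with z up and hypothetical thresholds; Cogitator keeps relations as graded beliefs, so the boolean tests here are a deliberate simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    pos: tuple[float, float, float]   # box centre (x, y, z), z up, metres
    vel: tuple[float, float, float]   # metres per second
    size: tuple[float, float, float]  # box extents (dx, dy, dz), metres


def near(a: Entity, b: Entity, radius: float = 0.5) -> bool:
    return math.dist(a.pos, b.pos) < radius


def supports(lower: Entity, upper: Entity, tol: float = 0.03) -> bool:
    """Upper's bottom face rests on lower's top face, with horizontal overlap."""
    lower_top = lower.pos[2] + lower.size[2] / 2
    upper_bottom = upper.pos[2] - upper.size[2] / 2
    overlap_x = abs(lower.pos[0] - upper.pos[0]) < (lower.size[0] + upper.size[0]) / 2
    overlap_y = abs(lower.pos[1] - upper.pos[1]) < (lower.size[1] + upper.size[1]) / 2
    return abs(upper_bottom - lower_top) < tol and overlap_x and overlap_y


def closing_speed(a: Entity, b: Entity) -> float:
    """Rate at which the distance between a and b shrinks (positive means closing)."""
    d = [pb - pa for pa, pb in zip(a.pos, b.pos)]
    dist = math.hypot(*d)
    if dist == 0.0:
        return 0.0
    rel_v = [vb - va for va, vb in zip(a.vel, b.vel)]
    return -sum(dv * dc for dv, dc in zip(rel_v, d)) / dist


def relation_edges(entities: list[Entity], min_closing: float = 0.05) -> list[tuple[str, str, str]]:
    """Emit (subject, relation, object) edges for every pair of entities."""
    edges = []
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            if near(a, b):
                edges.append((a.id, "near", b.id))
            if supports(a, b):
                edges.append((a.id, "supports", b.id))
            if supports(b, a):
                edges.append((b.id, "supports", a.id))
            if closing_speed(a, b) > min_closing:
                edges.append((a.id, "closing", b.id))
    return edges
```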

Status

Foundational research, applied where ready.

Cogitator is a research framework that informs everything Midvale builds. The cognition layer in Mosaic will draw directly from it. We’re also interested in collaborators working on robotics, calibrated perception, and active inference.
