Claude Code gave Claude tools to see code.
Claude Spatial gives it tools to see geometry.

It doesn't need a smarter model — it needs a product that lets it see 3D data.

Claude Code (2025)

Latent: Code reasoning
Tools: bash, file read, grep, LSP
Moat: Claude knows your repo

Claude Spatial (proposed)

Latent: 3D geometry reasoning
Tools: select_region, measure_distance, query_feature_tree
Moat: Claude knows your assembly

The spatial reasoning capability is partially latent

Prompting alone yields 27-33% gains. Tool access should recover much more of the remaining gap.

  • Internal representations exist. Probing studies find spatially selective units and "border cells" in transformer layers — emergent spatial features that encode geometry invariant to prompt format.1
  • Prompting activates them. Spatial Prefix-Prompting yields 33% F1 gains; Visualization-of-Thought gives 27% accuracy boosts over chain-of-thought — no fine-tuning needed.2,3
  • But there's a ceiling. GeoGramBench shows <50% accuracy at high abstraction levels. Simple primitives are easy (80%+), compositional geometry is hard (42-80% degradation).4,5
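The prompting techniques above are cheap to sketch. A minimal illustration of the Spatial Prefix-Prompting idea: prepend an instruction that makes the model build an explicit spatial frame before answering. The prefix wording below is a paraphrase of the idea, not the paper's exact template.

```python
# Illustrative sketch of Spatial Prefix-Prompting (SPP).
# The prefix text is a plausible paraphrase, not the published template.
SPATIAL_PREFIX = (
    "Before answering, establish a coordinate frame: identify each object's "
    "position, orientation, and the distances between objects. Then reason "
    "step by step within that frame."
)

def with_spatial_prefix(question: str) -> str:
    """Wrap a spatial question with the prefix prompt."""
    return f"{SPATIAL_PREFIX}\n\nQuestion: {question}"

prompt = with_spatial_prefix("Does the bracket clear the motor housing?")
```

No fine-tuning, no tools: the gain comes entirely from steering the model into its existing spatial representations.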

CAD-as-code transfers Claude's strongest skill

Models that treat CAD as code dramatically outperform raw geometry approaches.

  • CAD-Llama hits 99.9% generation success and 0.966 command accuracy by fine-tuning on structured CAD code grammars. CADmium shows code LLMs naturally handle JSON CAD histories.6,7
  • The bottleneck is data, not architecture. SpatialVLM proved this with 2B synthetic examples from 10M images. Anthropic can generate synthetic CAD training data at scale.8
  • cadrille unifies all modalities. First framework handling point clouds, images, and text for CAD reconstruction. RL fine-tuning on procedural data outperforms all single-modal approaches.9
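"CAD as code" concretely means a parametric build history serialized as structured text. A hypothetical sketch-extrude sequence in the JSON style that systems like CADmium consume — operation and field names here are illustrative, not a shipped schema:

```python
import json

# Hypothetical parametric CAD history as a JSON command sequence,
# in the spirit of sketch-extrude formats. Field names are illustrative.
cad_history = [
    {"op": "sketch", "plane": "XY", "curves": [
        {"type": "circle", "center": [0.0, 0.0], "radius": 12.5},
    ]},
    {"op": "extrude", "distance": 40.0, "direction": [0, 0, 1]},
    {"op": "fillet", "edges": "top_loop", "radius": 2.0},
]

serialized = json.dumps(cad_history, indent=2)
```

For a code model, predicting this sequence is ordinary structured-text generation — the same skill exercised on source code, which is why the transfer is so strong.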

A tool surface closes the remaining gap

Four MCP tools give Claude eyes for geometry — the spatial equivalent of bash and grep.

  • select_region + measure_distance let Claude query geometry on demand instead of holding it in context. query_feature_tree traverses assemblies. set_section_plane inspects internals.
  • Hypothesis: tools close 40-60% of the gap without fine-tuning — analogous to how Claude Code's bash tool unlocked capabilities already in the model.4
  • Connects to the full data surface: CAD, simulation, PCB, supply chain, inspection. Each source becomes a tool call, not a context dump.
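The four tools above can be pinned down as JSON Schema tool definitions in the name/description/inputSchema shape that MCP servers advertise. The tool names match the text; every parameter schema is an assumption, not a shipped spec:

```python
# Hypothetical MCP-style tool definitions for the four tools named above.
# All parameter schemas are assumptions for illustration.
TOOLS = [
    {
        "name": "select_region",
        "description": "Return geometry entities inside an axis-aligned box.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "min": {"type": "array", "items": {"type": "number"}},
                "max": {"type": "array", "items": {"type": "number"}},
            },
            "required": ["min", "max"],
        },
    },
    {
        "name": "measure_distance",
        "description": "Minimum distance between two named entities.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "entity_a": {"type": "string"},
                "entity_b": {"type": "string"},
            },
            "required": ["entity_a", "entity_b"],
        },
    },
    {
        "name": "query_feature_tree",
        "description": "Walk the assembly/feature tree from a given node.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "node": {"type": "string"},
                "depth": {"type": "integer"},
            },
            "required": ["node"],
        },
    },
    {
        "name": "set_section_plane",
        "description": "Cut the model with a plane to inspect internals.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "origin": {"type": "array", "items": {"type": "number"}},
                "normal": {"type": "array", "items": {"type": "number"}},
            },
            "required": ["origin", "normal"],
        },
    },
]
```

The design choice mirrors bash and grep: each call returns a small, targeted answer (a distance, a subtree) instead of dumping the whole model into context.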

The honest gaps

Where the claim is strong, and where it needs qualification.

  • Strong: Code reasoning transfers (CADmium, CAD-Llama). Internal spatial representations exist (probing studies). Training data is the bottleneck, not architecture (SpatialVLM).1,6,7,8
  • Needs work: Metric-accurate measurement requires fine-tuning, not just tools. Compositional geometry (50+ parts, 6 subsystems) degrades 42-80%.5,10
  • Net: The capability is partially latent. Four parallelizable research gaps (vision fine-tune, MCP schema, CAD encoder, token-efficient geometry) are well-scoped. 6-12 month program, not a 3-year bet.

References
  1. Martorell (Feb 2025). Spatial representations in LLMs — border cells in intermediate transformer layers.
  2. Spatial Prefix-Prompting (SPP). 33% F1 gains on 3D trajectories from prompting alone.
  3. Wu et al. (2024). Visualization-of-Thought. 27% accuracy boost over chain-of-thought.
  4. GeoGramBench (NeurIPS 2025). <50% at high abstraction, >80% on local primitives.
  5. Bai et al. (2025). 42-80% performance loss as spatial task complexity grows.
  6. Li et al. — CAD-Llama (CVPR 2025). 99.9% generation success, 0.966 command accuracy.
  7. CADmium — Mila (Dec 2025). Code LLMs handle JSON CAD histories naturally.
  8. SpatialVLM — Google (2024). 2B synthetic examples unlock metric spatial reasoning.
  9. cadrille (ICLR 2026). Unified multimodal CAD reconstruction from point clouds, images, text.
  10. SpatialBench — Cai et al. (2024). RGB-D fine-tuning needed for >99% depth accuracy.