Claude Code gave Claude tools to see code.
Claude Spatial gives it tools to see geometry.

It doesn't need a smarter model — it needs a product that lets it see 3D data.

Claude Code (2025)

Latent: Code reasoning
Tools: bash, file read, grep, LSP
Moat: Claude knows your repo

Claude Spatial (proposed)

Latent: 3D geometry reasoning
Tools: select_region, measure_distance, query_feature_tree
Moat: Claude knows your assembly

The spatial reasoning capability is partially latent

Prompting alone yields 27-33% gains. Tool access should recover much more of the remaining gap.

  • Internal representations exist. Probing studies find spatially selective units and "border cells" in transformer layers — emergent spatial features that encode geometry invariant to prompt format.1
  • Prompting activates them. Spatial Prefix-Prompting yields 33% F1 gains; Visualization-of-Thought gives 27% accuracy boosts over chain-of-thought — no fine-tuning needed.2,3
  • But there's a ceiling. GeoGramBench shows <50% accuracy at high abstraction levels. Simple primitives are easy (80%+), compositional geometry is hard (42-80% degradation).4,5
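The prompting techniques above are cheap to sketch. A minimal illustration of the Spatial Prefix-Prompting idea: prepend an instruction that makes the model build an explicit spatial frame before answering. The prefix wording below is a paraphrase of the idea, not the paper's exact template.

```python
# Illustrative sketch of Spatial Prefix-Prompting (SPP).
# The prefix text is a plausible paraphrase, not the published template.
SPATIAL_PREFIX = (
    "Before answering, establish a coordinate frame: identify each object's "
    "position, orientation, and the distances between objects. Then reason "
    "step by step within that frame."
)

def with_spatial_prefix(question: str) -> str:
    """Wrap a spatial question with the prefix prompt."""
    return f"{SPATIAL_PREFIX}\n\nQuestion: {question}"

prompt = with_spatial_prefix("Does the bracket clear the motor housing?")
```

No fine-tuning, no tools: the gain comes entirely from steering the model into its existing spatial representations.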

CAD-as-code transfers Claude's strongest skill

Models that treat CAD as code dramatically outperform raw geometry approaches.

  • CAD-Llama hits 99.9% generation success and 0.966 command accuracy by fine-tuning on structured CAD code grammars. CADmium shows code LLMs naturally handle JSON CAD histories.6,7
  • The bottleneck is data, not architecture. SpatialVLM proved this with 2B synthetic examples from 10M images. Anthropic can generate synthetic CAD training data at scale.8
  • cadrille unifies all modalities. First framework handling point clouds, images, and text for CAD reconstruction. RL fine-tuning on procedural data outperforms all single-modal approaches.9
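"CAD as code" concretely means a parametric build history serialized as structured text. A hypothetical sketch-extrude sequence in the JSON style that systems like CADmium consume — operation and field names here are illustrative, not a shipped schema:

```python
import json

# Hypothetical parametric CAD history as a JSON command sequence,
# in the spirit of sketch-extrude formats. Field names are illustrative.
cad_history = [
    {"op": "sketch", "plane": "XY", "curves": [
        {"type": "circle", "center": [0.0, 0.0], "radius": 12.5},
    ]},
    {"op": "extrude", "distance": 40.0, "direction": [0, 0, 1]},
    {"op": "fillet", "edges": "top_loop", "radius": 2.0},
]

serialized = json.dumps(cad_history, indent=2)
```

For a code model, predicting this sequence is ordinary structured-text generation — the same skill exercised on source code, which is why the transfer is so strong.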

A tool surface closes the remaining gap

Four MCP tools give Claude eyes for geometry — the spatial equivalent of bash and grep.

  • select_region + measure_distance let Claude query geometry on demand instead of holding it in context. query_feature_tree traverses assemblies. set_section_plane inspects internals.
  • Hypothesis: tools close 40-60% of the gap without fine-tuning — analogous to how Claude Code's bash tool unlocked capabilities already in the model.4
  • Connects to the full data surface: CAD, simulation, PCB, supply chain, inspection. Each source becomes a tool call, not a context dump.
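The four tools above can be pinned down as JSON Schema tool definitions in the name/description/inputSchema shape that MCP servers advertise. The tool names match the text; every parameter schema is an assumption, not a shipped spec:

```python
# Hypothetical MCP-style tool definitions for the four tools named above.
# All parameter schemas are assumptions for illustration.
TOOLS = [
    {
        "name": "select_region",
        "description": "Return geometry entities inside an axis-aligned box.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "min": {"type": "array", "items": {"type": "number"}},
                "max": {"type": "array", "items": {"type": "number"}},
            },
            "required": ["min", "max"],
        },
    },
    {
        "name": "measure_distance",
        "description": "Minimum distance between two named entities.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "entity_a": {"type": "string"},
                "entity_b": {"type": "string"},
            },
            "required": ["entity_a", "entity_b"],
        },
    },
    {
        "name": "query_feature_tree",
        "description": "Walk the assembly/feature tree from a given node.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "node": {"type": "string"},
                "depth": {"type": "integer"},
            },
            "required": ["node"],
        },
    },
    {
        "name": "set_section_plane",
        "description": "Cut the model with a plane to inspect internals.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "origin": {"type": "array", "items": {"type": "number"}},
                "normal": {"type": "array", "items": {"type": "number"}},
            },
            "required": ["origin", "normal"],
        },
    },
]
```

The design choice mirrors bash and grep: each call returns a small, targeted answer (a distance, a subtree) instead of dumping the whole model into context.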

The honest gaps

Where the claim is strong, and where it needs qualification.

  • Strong: Code reasoning transfers (CADmium, CAD-Llama). Internal spatial representations exist (probing studies). Training data is the bottleneck, not architecture (SpatialVLM).1,6,7,8
  • Needs work: Metric-accurate measurement requires fine-tuning, not just tools. Compositional geometry (50+ parts, 6 subsystems) degrades 42-80%.5,10
  • Net: The capability is partially latent. Four parallelizable research gaps (vision fine-tune, MCP schema, CAD encoder, token-efficient geometry) are well-scoped. 6-12 month program, not a 3-year bet.

References
  1. Martorell (Feb 2025). Spatial representations in LLMs — border cells in intermediate transformer layers.
  2. Spatial Prefix-Prompting (SPP). 33% F1 gains on 3D trajectories from prompting alone.
  3. Wu et al. (2024). Visualization-of-Thought. 27% accuracy boost over chain-of-thought.
  4. GeoGramBench (NeurIPS 2025). <50% at high abstraction, >80% on local primitives.
  5. Bai et al. (2025). 42-80% performance loss as spatial task complexity grows.
  6. Li et al. — CAD-Llama (CVPR 2025). 99.9% generation success, 0.966 command accuracy.
  7. CADmium — Mila (Dec 2025). Code LLMs handle JSON CAD histories naturally.
  8. SpatialVLM — Google (2024). 2B synthetic examples unlock metric spatial reasoning.
  9. cadrille (ICLR 2026). Unified multimodal CAD reconstruction from point clouds, images, text.
  10. SpatialBench — Cai et al. (2024). RGB-D fine-tuning needed for >99% depth accuracy.