Research
Technical deep-dives, model releases, and insights from our work on spatial intelligence.
spatial-reasoning, vision-transformers, survey, world-index, real-time, architecture, object-detection, edge-deployment, engineering
Spatial Reasoning in Vision Transformers: A Survey and New Directions
We survey recent advances in the spatial reasoning capabilities of vision transformers and propose a new benchmark for evaluating 3D scene understanding from 2D inputs.
Building a Real-Time World Index: From Pixels to Semantic Maps
How we approach building structured, queryable representations of physical environments from streaming video: our architecture and the lessons we learned.
Open-Vocabulary Object Detection on the Edge: Practical Tricks That Work
Notes from our experience deploying open-vocabulary detection models on edge devices: what works, what doesn't, and the trade-offs we've made.