Author: Will Douglas Heaven
DeepFlow Tech
DeepChaohao Summary: Niantic Spatial, an AI company spun off from Niantic, has turned 30 billion city photos taken by Pokémon Go players into a new business. It used this data to train a visual positioning system that achieves centimeter-level accuracy, far surpassing GPS performance in urban canyons. Its first major client is the delivery robot company Coco Robotics. From catching Pikachu to delivering pizza, this may be one of the most unexpected commercialization paths for crowdsourced data.
The full text is as follows:
Pokémon Go was the world's first blockbuster AR game. Released in 2016 by Niantic, a company spun out of Google, the augmented-reality take on the Pokémon franchise quickly swept across the globe. From Chicago to Oslo to Enoshima, players flooded the streets, hoping to catch a Pidgey, a Squirtle, or, if they were lucky, a rare Galarian Zapdos hovering just out of reach above the real world.
Simply put, that meant millions of people pointing their phones at countless buildings. “Five hundred million people installed this app within 60 days,” said Brian McClendon, CTO of Niantic Spatial, an AI company spun off from Niantic in May last year. According to the game company Scopely, which acquired Pokémon Go from Niantic at the same time, the game still had over 100 million active players in 2024, eight years after its release.
Now, Niantic Spatial is leveraging this unparalleled crowdsourced data repository (city landmark photos from the phones of hundreds of millions of Pokémon Go players worldwide, accompanied by ultra-precise location tags) to build a world model. World models are a hot area of technology that aims to ground the intelligence of LLMs in real-world environments.
The company’s latest product is a model that can pinpoint your location on a map to within a few centimeters using just a few snapshots of buildings or other landmarks. The company aims to use it to help robots navigate more accurately in areas where GPS is unreliable.
As the first large-scale validation of the technology, Niantic Spatial has just partnered with Coco Robotics, a startup that deploys last-mile food delivery robots in multiple cities across the U.S. and Europe. “Everyone thought AR was the future and that AR glasses were coming,” McClendon said. “Turns out, the robots came first as users.”
From Pikachu to pizza delivery
Coco Robotics has deployed approximately 1,000 suitcase-sized robots in Los Angeles, Chicago, Jersey City, Miami, and Helsinki, each capable of carrying up to eight extra-large pizzas or four bags of groceries. According to CEO Zach Rash, these robots have completed over 500,000 deliveries and traveled millions of miles in all types of weather conditions.
But to compete with human couriers, Coco’s robots, which travel at about 5 miles per hour on sidewalks, must be highly reliable. “Our best way to work is to arrive right on time, as promised,” Rash said. That means not getting lost.
The problem Coco faces is that it cannot rely on GPS. In cities, radio signals bounce between buildings and interfere with each other, resulting in weak GPS signals. “We deliver in many dense areas with tall buildings, underground passages, and overpasses, where GPS basically never works well,” says Rash.
“Urban canyons are where GPS performs the worst worldwide,” said McClendon. “You see that blue dot on your phone, and it often drifts 50 meters, placing you on another block, in another direction, or across the street.” This is the problem Niantic Spatial aims to solve.
Over the past few years, Niantic Spatial has been aggregating data generated by players of Pokémon Go and Ingress (Niantic’s previous mobile AR game, released in 2013) to build a visual positioning system, which determines your location based on what you see. “Making Pikachu run realistically down the street and enabling Coco’s robot to navigate a city safely and precisely are essentially the same problem,” said John Hanke, CEO of Niantic Spatial.
"Visual positioning is not a new technology," says Konrad Wenzel at Esri, a digital mapping and geospatial analysis company, "but clearly, the more cameras out there, the better it works."
Niantic Spatial trained its model on 30 billion images captured in urban environments. These images are especially dense around "hotspots"—key locations in Niantic games that encourage players to visit, such as Pokémon Gyms. "We have over one million locations worldwide that can pinpoint your position with high accuracy," said McClendon. "We know exactly where you're standing, within a few centimeters. More importantly, we know which direction you're looking."
As a result, for each of these one million locations, Niantic Spatial has thousands of photos taken from slightly different angles, at different times, and under varying weather conditions—all centered around the same spot. Each photo is accompanied by detailed metadata: the exact position, orientation, pose, movement status, speed, and direction of the phone at the time the photo was taken.
The company trained the model on this dataset to predict where a camera is located from what it "sees," even in areas beyond the one million hotspots, where image and location data are relatively scarce.
Coco’s robots, each equipped with four cameras, now use this model alongside GPS to determine where they are and where they are heading. The cameras are mounted at hip height and face in all directions, offering a slightly different perspective than that of a Pokémon Go player, but Rash says adapting the data is not complex.
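The article does not say how the robots combine the two sources of position. One common way to merge two independent estimates is inverse-variance weighting, sketched below. This is an illustration only, not Coco's or Niantic Spatial's method: the Estimate type, the fuse function, and the uncertainty values (about 50 meters for GPS drift and a few centimeters for visual positioning, borrowed loosely from the figures McClendon quotes) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """A 2D position in a local frame (hypothetical type for illustration)."""
    x_m: float      # east offset in meters
    y_m: float      # north offset in meters
    sigma_m: float  # assumed 1-sigma uncertainty in meters


def fuse(a: Estimate, b: Estimate) -> Estimate:
    """Combine two independent estimates by inverse-variance weighting."""
    weight_a = 1.0 / a.sigma_m ** 2
    weight_b = 1.0 / b.sigma_m ** 2
    total = weight_a + weight_b
    return Estimate(
        x_m=(weight_a * a.x_m + weight_b * b.x_m) / total,
        y_m=(weight_a * a.y_m + weight_b * b.y_m) / total,
        sigma_m=(1.0 / total) ** 0.5,
    )


# Illustrative values only: a GPS fix that has drifted tens of meters in an
# urban canyon, and a visual fix assumed to be good to about 5 centimeters.
gps = Estimate(x_m=30.0, y_m=-40.0, sigma_m=50.0)
visual = Estimate(x_m=0.02, y_m=0.01, sigma_m=0.05)

fused = fuse(gps, visual)
print(fused)
```

With uncertainties this far apart, the fused position sits almost exactly on the visual estimate, which matches the article's point that GPS adds little in dense city streets.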
Competitors are also using visual positioning systems. For example, Starship Technologies, a robot delivery company founded in Estonia in 2014, states that its robots use sensors to create 3D maps of their surroundings, marking building edges and streetlight locations.
But Rash is betting that Niantic Spatial’s technology will give Coco an advantage. He believes it will let the robots park precisely at the correct pickup spot outside restaurants, without blocking anyone, and stop right at the customer’s doorstep rather than a few steps away, which has sometimes happened in the past.
The Cambrian explosion of robots
When Niantic Spatial began developing its visual positioning system, the goal was augmented reality, Hanke said. “If you’re wearing AR glasses and want the virtual world to lock onto where you’re looking, you need some way to achieve that. But now we’re witnessing a Cambrian explosion in robotics.”
Some robots need to share space with humans, such as on construction sites and sidewalks. “For robots to integrate into these environments without disturbing people, they must possess spatial understanding similar to that of humans,” says Hanke. “When robots are pushed or bumped, we can help them accurately reorient themselves.”
The partnership with Coco Robotics is just the beginning. Hanke says Niantic Spatial is building the first components of what he calls a "Living Map"—a highly precise virtual world simulation that evolves in real time alongside the physical world. As Coco and other companies’ robots travel around the globe, they will provide new sources of mapping data, making the digital replica increasingly detailed.
In Hanke and McClendon’s view, maps are not only becoming more detailed but are also increasingly used by machines—changing the purpose of maps. For a long time, maps have helped humans orient themselves. From 2D to 3D to 4D (think of real-time simulations like digital twins), the fundamental principle remains the same: points on a map correspond to points in space or time.
But maps designed for machines may need to become more like guidebooks, filled with information that humans take for granted. Companies like Niantic Spatial and Esri aim to add descriptions to maps, telling machines exactly what they are seeing by labeling each object with a set of attributes. “The task of this era is to build useful descriptions of the world for machines,” says Hanke. “The data we have is a strong starting point for understanding how the interconnected fabric of the world operates.”
World models are currently very popular, and Niantic Spatial is well aware of this. While LLMs appear to know everything, they lack common sense when it comes to interpreting and interacting with everyday environments. World models are designed to address this issue. Companies such as Google DeepMind and World Labs are developing models that can instantly generate virtual fantasy worlds, which they then use as training grounds for AI agents.
Niantic Spatial says it is approaching this problem from a different angle. "If you make the map extreme enough, you'll eventually capture everything," McClendon said. "We haven't gotten there yet, but that's where we want to go. Right now, I'm very focused on trying to reconstruct the real world."
