The request was deceptively simple: film yourself performing mundane domestic tasks — placing food in a bowl, microwaving it, removing it — in exchange for cryptocurrency. Elsewhere, users are being recruited to play games that involve remotely controlling a robotic arm in Shenzhen, China, to solve puzzles. These are not merely oddities of the gig economy. They represent a systematic effort to collect the raw material that robotics companies believe they need most: real-world movement data, the substrate of what the industry has taken to calling "physical intelligence."
Just as the written word fueled the rise of large language models, the nuances of human movement have become the latest frontier for data collection. The parallel is instructive — and its limits are revealing.
The missing internet of movement
When AI companies built the large language models behind products like ChatGPT, they drew on a vast, pre-existing archive: the open internet, decades of digitized books, forums, code repositories, and encyclopedias. The data was messy, but it was abundant and, critically, it already existed. Roboticists enjoy no such inheritance. There is no "internet of movement" to download. Human motor behavior — the way a hand adjusts grip pressure on a wet glass, the micro-corrections a body makes while stepping over a threshold — has never been recorded at scale in a format machines can learn from.
Until recently, the standard workaround was simulation. Companies built virtual environments where digital robots could practice millions of tasks in compressed time, learning through reinforcement in physics engines that approximate gravity, friction, and collision. The approach has produced results in controlled settings: warehouse pick-and-place operations, for instance, where objects are standardized and surfaces predictable. But domestic environments are a different proposition entirely. A kitchen counter may be cluttered, damp, or uneven. A bag of groceries deforms under its own weight. The specific elasticity of a sponge, the resistance of a stuck drawer — these are properties that simulations routinely get wrong or omit altogether.
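The friction problem can be made concrete with introductory mechanics. In a simple pinch-grasp model, the normal force needed to hold an object against gravity scales inversely with the friction coefficient, so a simulator that overestimates friction systematically underestimates the grip force the real world demands. The masses and coefficients below are illustrative assumptions, not measured values:

```python
# Minimal pinch-grasp model: two fingers squeeze an object of mass m.
# Each finger contributes a friction force mu * F_normal, so static
# equilibrium requires 2 * mu * F_normal >= m * g.

G = 9.81  # gravitational acceleration, m/s^2

def required_grip_force(mass_kg: float, friction_coeff: float) -> float:
    """Minimum normal force per finger to hold the object statically."""
    return (mass_kg * G) / (2 * friction_coeff)

# Illustrative numbers: a 0.4 kg glass, with the simulator assuming a dry
# friction coefficient of 0.5 while the real, wet glass offers only 0.3.
sim_force = required_grip_force(0.4, 0.5)   # what the policy trained on
real_force = required_grip_force(0.4, 0.3)  # what the world demands

print(f"simulated requirement: {sim_force:.2f} N")
print(f"real-world requirement: {real_force:.2f} N")
```

A policy calibrated to the simulated value would squeeze roughly 40 percent too gently on the wet glass: the reality gap in miniature.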
The consequence is what researchers call the "reality gap," a term that has circulated in robotics literature for years but has gained renewed urgency as companies race to commercialize humanoid platforms. Robots trained exclusively in simulation often stumble, hesitate, or fail outright when confronted with the physical world's irreducible messiness. The gap is not merely a technical inconvenience; it is arguably the central bottleneck standing between current prototypes and commercially viable household robots.
Scaling laws meet the physical world
The emerging response borrows a conviction from the language-model era: scaling laws. The hypothesis is straightforward: feed a model enough diverse, high-quality data and it will eventually learn to generalize to novel situations. In robotics, that means harvesting enormous volumes of recordings of humans performing ordinary tasks: cooking, cleaning, folding, carrying. The bet is that statistical patterns buried in thousands of hours of recorded movement will teach a robot the implicit physics that simulation cannot reliably provide.
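In the language-model literature, a scaling law is typically expressed as a power law: predicted loss falls smoothly as dataset size grows, roughly L(D) = (D_c / D)^alpha. The constant and exponent below are placeholders, not fitted values from any robotics study; the sketch only shows the shape of the bet:

```python
def power_law_loss(dataset_size: float, d_c: float = 1e6, alpha: float = 0.3) -> float:
    """Predicted loss under a power-law scaling hypothesis L(D) = (D_c / D)^alpha.

    d_c and alpha are illustrative placeholders, not fitted constants.
    """
    return (d_c / dataset_size) ** alpha

# Each tenfold increase in data shrinks predicted loss by the same factor,
# 10 ** -alpha (roughly halving it here) -- but this says nothing about
# whether the curve itself transfers from text to physical manipulation.
for samples in (1e6, 1e7, 1e8):
    print(f"{samples:.0e} samples -> predicted loss {power_law_loss(samples):.3f}")
```

The smoothness of that curve is precisely what makes the data-harvesting strategy attractive, and precisely what remains unproven for movement.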
This creates a new kind of data economy. Where internet users once generated training data passively — by writing emails, posting reviews, uploading photos — contributors to movement datasets are being asked to participate actively, often compensated through micropayments or cryptocurrency. The economics resemble early crowdsourcing platforms, but the data itself is fundamentally different. Text and images are two-dimensional and static; movement data is temporal, three-dimensional, and context-dependent. Capturing it faithfully requires either specialized hardware or carefully designed tasks that approximate naturalistic behavior through consumer devices.
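To see why movement data is heavier than text, consider what one recorded instant might contain. The schema below is a hypothetical illustration (field names, sampling rate, and joint count are assumptions, not any company's actual format): each frame carries timestamped, three-dimensional state, and a trajectory is an ordered sequence of thousands of such frames.

```python
from dataclasses import dataclass

@dataclass
class MotionFrame:
    """One instant of a recorded movement (hypothetical schema)."""
    timestamp_s: float                                  # time since recording start
    joint_positions: list[tuple[float, float, float]]   # (x, y, z) per tracked joint
    grip_force_n: float                                 # contact force, if instrumented
    context: str                                        # task label for the recording

# A 10-second chore sampled at 30 Hz is already 300 frames; with 20 tracked
# joints, that is 18,000 coordinates for one short clip -- temporal,
# three-dimensional, and tied to its context, unlike a static sentence.
frame = MotionFrame(
    timestamp_s=0.033,
    joint_positions=[(0.12, 0.85, 0.40)] * 20,
    grip_force_n=4.2,
    context="place bowl in microwave",
)
print(len(frame.joint_positions) * 3)  # coordinates in a single frame
```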
The strategic logic is clear enough. Humanoid robots are being designed to operate in environments built for human bodies — homes, offices, retail spaces — rather than in purpose-built factories. If they are to function in those spaces, they need to understand how humans already move through them. The companies pursuing this approach are, in effect, attempting to compile a behavioral corpus of domestic life, one recorded chore at a time.
Whether scaling laws that proved transformative for language will transfer cleanly to physical manipulation remains an open question. Text is symbolic and compositional; movement is continuous and governed by physics that tolerates no rounding errors. A language model that hallucinates a fact produces a wrong sentence. A robot that hallucinates a grip force drops a plate. The tolerance for error is categorically different, and it is not yet established that more data alone resolves that difference — or whether something architecturally distinct will be required to close the reality gap for good.
With reporting from MIT Technology Review.