You’ve been training Google’s AI for 15 years. You had no idea.

Original author: Sharbel, Co-founder of Unfungible

Original compilation: Lila, BlockBeats

Editor’s Note: CAPTCHA, the distorted digits or images you click through when logging into a website, is familiar to every internet user. Each time you click “I’m not a robot,” you think you are simply verifying your identity. In fact, you are contributing to the world’s largest and most covert data production system. Luis von Ahn’s reCAPTCHA turned scattered human actions into the foundational data that powers core businesses at Google and its autonomous driving company, Waymo.

Beneath the surface of “free” and “secure,” the internet has quietly created a new form of labor: you spend time proving you are human and, in doing so, help train AI, only for that labor to be replaced entirely once the AI has learned. This article, published less than 20 hours ago, has already garnered over 9.5 million views on Twitter. Below is the original text:

Approximately 500,000 hours of human labor are used by Google for free each day, contributed by people who simply wanted to log in to their online banking.

reCAPTCHA is the most successful covert data operation in internet history. At its peak, 200 million people completed verification each day. Yet almost no one realizes what each click truly entails.

Google's autonomous vehicle company, Waymo, is now valued at $45 billion, and much of its core training data was provided freely by you as you browsed various websites.

Here is the complete story:

Origin: A clever idea

In 2000, spam bots were destroying the internet. Forums were flooded, inboxes were overflowing, and websites urgently needed a way to distinguish humans from machines.

Luis von Ahn, then a graduate student at Carnegie Mellon University, helped solve this problem. He co-invented CAPTCHA: distorted text that humans can read but bots cannot.

But von Ahn saw more than this. Millions of people were expending effort on these challenges. What if that effort could accomplish two things at once?

In 2007, he launched reCAPTCHA. Its brilliance lies in the fact that instead of displaying random gibberish, it shows two words: one that the system knows, and another that is a real scanned word from books that computers cannot yet recognize. Your response helps digitize these books.
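The two-word scheme described above can be sketched in a few lines. This is a minimal illustration, not reCAPTCHA's actual implementation: the function names `submit` and `consensus`, the vote thresholds, and the sample words are all hypothetical. The only parts taken from the text are that one word is known to the system and the other is not.

```python
from collections import Counter
from typing import Optional


def submit(votes: Counter, control_answer: str,
           typed_control: str, typed_unknown: str) -> bool:
    """Grade the known (control) word; if it matches, count the user's
    guess for the unknown scanned word as one vote."""
    passed = typed_control.strip().lower() == control_answer.strip().lower()
    if passed:
        votes[typed_unknown.strip().lower()] += 1
    return passed


def consensus(votes: Counter, min_votes: int = 3,
              min_share: float = 0.75) -> Optional[str]:
    """Return the agreed transcription, or None if agreement is too weak.
    Both thresholds are assumed values chosen for illustration."""
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None


votes = Counter()
submit(votes, "morning", "morning", "upon")
submit(votes, "morning", "Morning ", "upon")
submit(votes, "morning", "mourning", "open")  # control word wrong: guess ignored
submit(votes, "morning", "morning", "upon")
print(consensus(votes))  # upon
```

The design point is that the control word does the bot check, while the unknown word only collects votes, so a single careless answer cannot corrupt the transcription.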

The scanned words came from the New York Times archive and Google Books, a corpus the author puts at up to 130 million books.

You think you're just logging into a regular website, but you're actually performing OCR (optical character recognition) for the world's largest digital library.

In 2009, Google acquired reCAPTCHA.

Later, Google changed the rules.

The era of "distorted text" ended around 2012.

Google had run into a new challenge: its Street View cars had photographed roads all over the world, but the images were just raw data. For AI to make use of them, it needs to understand what it sees: traffic signs, crosswalks, traffic lights, and storefronts.

So Google redesigned reCAPTCHA as v2. Instead of distorted text, it shows a grid of photos. “Click all squares with traffic lights.” “Select every crosswalk.” “Identify storefronts.”

These images are directly from Google Street View. Your clicks are the labels.

Each choice tells Google's computer vision model: this cluster of pixels is a traffic light, that shape is a crosswalk. You're not taking a test—you're building a dataset.
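One way clicks can become labels is by aggregating many users' selections on the same grid and keeping only the tiles most of them agree on. The sketch below assumes a 3x3 grid numbered 0 to 8 and a 60% agreement threshold; the function name `aggregate_tile_labels` and both parameters are hypothetical, and Google's real pipeline is not described in the text.

```python
from typing import Iterable, List, Set


def aggregate_tile_labels(selections: Iterable[Set[int]],
                          n_tiles: int = 9,
                          min_agreement: float = 0.6) -> List[bool]:
    """Mark a tile as containing the object when at least `min_agreement`
    of users clicked it. Out-of-range tile indices are ignored."""
    selections = list(selections)
    if not selections:
        return [False] * n_tiles
    counts = [0] * n_tiles
    for chosen in selections:
        for tile in chosen:
            if 0 <= tile < n_tiles:
                counts[tile] += 1
    return [c / len(selections) >= min_agreement for c in counts]


# Five users answer "Click all squares with traffic lights" on the same grid.
clicks = [{0, 1, 4}, {0, 1}, {0, 1, 4}, {1, 4, 8}, {0, 1, 4}]
labels = aggregate_tile_labels(clicks)
print([i for i, is_light in enumerate(labels) if is_light])  # [0, 1, 4]
```

Tile 8 was clicked by only one of five users, so it falls below the threshold and is treated as noise rather than as a label.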

Beyond imagination in scale

At its peak, 200 million reCAPTCHAs were solved daily. At about 10 seconds per challenge, that is 2 billion seconds of human labor each day, or about 555,000 hours, in line with the roughly 500,000 hours per day cited above.

The cost of paid data labeling is approximately $10 to $50 per hour. At the minimum rate, and using the rounded figure of 500,000 hours, the labor extracted for free each day is worth about $5 million.
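The arithmetic behind these figures can be checked directly. The inputs below are the article's own numbers (200 million solves, 10 seconds each, $10 to $50 per hour); the variable names are chosen here for illustration.

```python
SOLVES_PER_DAY = 200_000_000      # peak daily solves, per the article
SECONDS_PER_SOLVE = 10            # assumed average time per challenge
MIN_RATE_USD = 10                 # low end of paid labeling cost per hour
MAX_RATE_USD = 50                 # high end of paid labeling cost per hour

seconds_per_day = SOLVES_PER_DAY * SECONDS_PER_SOLVE
hours_per_day = seconds_per_day / 3600

# Value of the labor at each rate, using the unrounded hour count.
low_value = hours_per_day * MIN_RATE_USD
high_value = hours_per_day * MAX_RATE_USD

# The article's headline figure uses the rounded 500,000 hours.
rounded_low_value = 500_000 * MIN_RATE_USD

print(f"{seconds_per_day:,} seconds per day")
print(f"{hours_per_day:,.0f} hours per day")
print(f"${low_value:,.0f} to ${high_value:,.0f} per day")
print(f"${rounded_low_value:,} per day at the rounded figure")
```

Unrounded, the daily total is about 555,556 hours, so the $5 million figure is a conservative floor: it applies the lowest hourly rate to the rounded-down hour count.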

And reCAPTCHA isn't limited to just one app—it’s found on every bank, every government portal, and every e-commerce site. You have no choice: want to log into your account? First, label this dataset. Google never asked your permission, never paid you a cent, and never even told you about it.

What has all of this created?

These data are directly fed into two products:

- Google Maps: The most widely used navigation tool worldwide. Its ability to recognize road signs, stores, and city geography is partly due to billions of human annotations made when users log in to the website.

- Waymo: Google’s autonomous driving project. To navigate safely, autonomous vehicles must identify thousands of visual patterns with near-perfect accuracy.

The ground truth training data for those recognition tasks was annotated by millions of people unaware that they were doing so through reCAPTCHA. Waymo completed over 4 million paid rides in 2024 and is valued at $45 billion. Its foundation was laid by those “unpaid internet users” who simply wanted to check their email.

Why can't anyone replicate this model?

Data labeling is extremely expensive. Companies like Scale AI, Appen, and Labelbox exist to address this issue, employing hundreds of thousands of workers, some of whom earn less than $1 per hour.

Google took a different approach: it made labeling mandatory. No payment, no consent required, just a “ticket” to access every corner of the internet. The result: billions of labeled images with global coverage, captured around the clock, in all weather, and in every city in the world. No labeling company could achieve this. The internet itself became the factory, and every internet user became an uncontracted worker.

You are still participating.

reCAPTCHA v3, introduced in 2018, no longer displays challenges at all. It analyzes how you move your mouse, your scrolling speed, and how long you stay on a page. Your behavioral fingerprint tells it whether you are human. This behavioral data is also fed back into Google’s AI system.
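How the behavioral analysis works internally is not public, so it cannot be sketched here. What a website sees is only the result: a verification response containing a score between 0.0 and 1.0, where higher means more likely human. The sketch below shows how a site might act on that response. The field names `success`, `score`, and `action` follow Google's documented verification response as best understood here; the function name `is_probably_human`, the 0.5 threshold, and the sample data are assumptions for illustration.

```python
from typing import Any, Mapping


def is_probably_human(verify_response: Mapping[str, Any],
                      expected_action: str,
                      threshold: float = 0.5) -> bool:
    """Decide whether to let a request through, given an already-parsed
    verification response. No network call is made here."""
    if not verify_response.get("success", False):
        return False
    # The action name should match what the page declared for this request.
    if verify_response.get("action") != expected_action:
        return False
    return float(verify_response.get("score", 0.0)) >= threshold


# Made-up sample responses, not real data.
likely_human = {"success": True, "score": 0.9, "action": "login"}
likely_bot = {"success": True, "score": 0.1, "action": "login"}

print(is_probably_human(likely_human, "login"))  # True
print(is_probably_human(likely_bot, "login"))    # False
```

The user never sees any of this: there is no challenge to solve, only a score computed in the background and a threshold chosen by the site.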

You never actively chose to join, and there was never a checkbox for you to tick. Yet on most websites you visit today, you are still taking part.

Disturbing irony

Luis von Ahn’s original intent was brilliant: to transform the energy humans were already wasting into useful output. But what Google did with this vision is another matter entirely. They exploited a security mechanism users were forced to use, deployed it across the web, and harvested the output to build a multi-billion-dollar commercial product. Users gained nothing—and didn’t even know it.

The deepest irony is this: you spent years proving you were human by completing visual recognition tasks that AI couldn’t yet do—only for human visual labeling to become unnecessary once AI learned to do them.

You proved you were human, only to make yourself replaceable.
