Thousands worldwide sell personal data for AI training amid privacy risks


Author: The Guardian

Compiled by Shenchao TechFlow

Shenchao Overview: This investigative report reveals a rapidly growing gray industry: thousands of people worldwide are earning income from AI training by selling their voices, faces, call records, and daily videos.

This is not a vague discussion about privacy concerns—it’s an investigation involving real people, real amounts of money, and real consequences: an actor who sold his likeness later saw “himself” promoting an unknown medical product on Instagram, with comments evaluating his “appearance.”

When AI companies' hunger for data meets global economic disparity, the result is a deeply unequal exchange.

The full text is as follows:

Last year, on a morning in Cape Town, South Africa, Jacobus Louw went for his usual walk, feeding seagulls along the way. But this time he recorded several videos, capturing his footsteps and surroundings as he walked down the sidewalk. The videos earned him $14, about ten times South Africa's hourly minimum wage and equal to half a week's food expenses for the 27-year-old.

This is a "City Navigation" task completed by Louw on Kled AI. Kled AI is an app that pays users for uploading photos, videos, and other data used to train AI models. Within just a few weeks, Louw earned $50 by uploading photos and videos from daily life.

Thousands of miles away in Ranchi, India, 22-year-old student Sahil Tigga regularly earns money through Silencio—an app that crowdsources audio data for AI training by accessing his phone’s microphone to capture ambient noise from places like restaurant interiors or busy intersections. He also uploads recordings of his own voice. Sahil makes special trips to unique locations not yet documented on Silencio’s map, such as hotel lobbies. He earns over $100 per month, enough to cover all his food expenses.

In Chicago, 18-year-old welding apprentice Ramelio Hill earned hundreds of dollars by selling his private cellphone chats with friends and family to Neon Mobile, a conversational AI training platform that pays $0.50 per minute. For Hill, the math was simple: he believed tech companies were already harvesting vast amounts of his personal data, so he might as well profit from it too.

These "AI training gigs"—uploading photos, videos, and audio of one's surroundings and oneself—are at the forefront of a new global data gold rush. As Silicon Valley's demand for high-quality human data outstrips what can be scraped from the open internet, a thriving data-marketplace industry has emerged to fill the gap. From Cape Town to Chicago, thousands of people are micro-licensing their biometric identities and private data to the next generation of AI.

But this new gig economy comes at a cost. Behind the few dollars earned, these workers are fueling an industry that may ultimately render their skills obsolete, while exposing themselves to future risks of deepfakes, identity theft, and digital exploitation—risks they are only just beginning to understand.

Keep the AI gears turning

AI language models like ChatGPT and Gemini require vast amounts of training material to keep improving, but they now face a data shortage. Researchers have found that about a quarter of the data in the most commonly used high-quality training corpora—C4, RefinedWeb, and Dolma—now comes from websites that restrict generative AI companies from using their content to train models. Researchers estimate that AI companies could exhaust the supply of fresh, high-quality text as early as 2026. Although some labs have begun training on synthetic data generated by AI itself, this recursive process yields increasingly degraded, error-ridden outputs, a failure mode known as model collapse.


This is where apps like Kled AI and Silencio come in. In these data marketplaces, millions of people feed and train AI by selling their own identity data. Beyond Kled AI, Silencio, and Neon Mobile, AI trainers have many other options: Luel AI, backed by the well-known incubator Y Combinator, buys multilingual dialogue samples at roughly $0.15 per minute, while ElevenLabs lets users digitally clone their voices and makes them available to others at a base rate of $0.02 per minute.

Professor Bouke Klein Teeselink of King’s College London stated that AI training gig work is an emerging job category that will grow significantly.

AI companies know that paying people for data authorization helps avoid copyright disputes that could arise from relying entirely on web scraping, Teeselink said. AI researcher Veniamin Veselovsky noted that these companies also need high-quality data to model new and improved behaviors for their systems. "For now, human-generated data remains the gold standard for sampling outside the model distribution," Veselovsky added.

The humans who power these machines—especially those in developing countries—often rely on this income and have little other choice. For many AI training gig workers, taking on this work is a pragmatic response to economic inequality. In countries with high unemployment and depreciating local currencies, earning U.S. dollars is often more stable and profitable than local jobs. Some struggle to find entry-level work and turn to AI training out of necessity for survival. Even in wealthier nations, rising living costs make selling one’s labor a logical financial decision.

Louw, an AI trainer from Cape Town, is well aware of the privacy costs involved. Despite unstable income that doesn’t cover all his monthly expenses, he accepts these conditions to earn money. Having suffered from a neurological disorder for years that prevented him from finding employment, the money he earned from the AI data market—including Kled AI—allowed him to save $500 to enroll in a spa training course and become a massage therapist.

"As a South African, receiving U.S. dollars is worth more than people realize," said Louw.

Mark Graham, Professor of Internet Geography at the University of Oxford and author of "Feeding the Machine," acknowledges that for individuals in developing countries, this income may have practical significance in the short term, but he warns that "structurally, this work is unstable, offers no upward mobility, and is effectively a dead end."

Graham added that the AI data market relies on "a race to the bottom in wages" and "temporary demand for human data." Once this demand shifts, "workers will have no safeguards, no transferable skills, and no safety net."

Graham said the only winners are "the platforms of the Global North, which capture all the lasting value."


Full authorization

Hill, an AI trainer from Chicago, feels conflicted about selling his private phone calls to Neon Mobile. He earned $200 for about 11 hours of call data, but says the app frequently goes offline and delays payments. "Neon has always seemed suspicious to me, but I kept using it just to earn extra cash for bills," Hill said.

Now he’s beginning to reconsider whether that money was really so easy to come by. In September last year, Neon Mobile shut down just weeks after launching, after TechCrunch discovered a security vulnerability that allowed anyone to access users’ phone numbers, call recordings, and text logs. Hill says Neon Mobile never informed him about this, and now he’s concerned his voice could be misused online.

Jennifer King, a data privacy researcher at Stanford University’s Institute for Human-Centered Artificial Intelligence, is concerned that the AI data market lacks transparency regarding how and where user data will be used. She adds that, without understanding their rights or having the ability to negotiate on the matter, "consumers face the risk of their data being repurposed in ways they dislike, do not understand, or did not anticipate—with little to no recourse available."

When AI trainers share their data on Neon Mobile and Kled AI, they grant a worldwide, exclusive, irrevocable, transferable, and royalty-free license permitting the platform to sell, use, publicly display, and store their likenesses, and to create derivative works based on them.

Avi Patel, founder of Kled AI, said the company's data agreements restrict use to AI training and research purposes. "The entire business model relies on user trust. If contributors believe their data could be misused, the platform cannot function," he said. The company vets potential buyers before selling datasets, to avoid partnering with entities it deems "suspicious," such as the pornography industry, or "government agencies" it believes might use the data in ways that violate that trust.

Neon Mobile did not respond to requests for comment.

Professor Enrico Bonadio from City, University of London, noted that these terms allow the platform and its users to "virtually do anything with the material, permanently and without additional payment, and contributors have no practical way to withdraw consent or renegotiate."

More concerning risks include trainers' data being used to create deepfakes and impersonations. Although data markets claim to strip identifying information—such as names and locations—from data before sale, Bonadio adds that biometric patterns are inherently difficult to anonymize meaningfully.

Seller's regret

Even if AI trainers can negotiate more detailed protection terms regarding how their data is used, they may still regret it. In 2024, actor Adam Coy from New York sold his likeness to Captions—an AI video editing software now renamed Mirage—for $1,000. His agreement stipulated that his identity would not be used for any political purposes, nor for promoting alcohol, tobacco, or adult content, and that the license would last for one year.

Captions did not respond to requests for comment.

Soon after, Adam’s friends began sharing videos they found online featuring his face and voice, which had garnered millions of views. One Instagram video showed an AI-generated clone of Adam identifying himself as a “vagina doctor,” promoting unverified supplements to pregnant and postpartum women.

"It makes me uncomfortable to explain this to others," Coy said.

“The comments section was weird because they were commenting on my appearance, but that’s not even me,” Coy added. “When I made the decision to sell my likeness, I thought that most models already have their data and images scraped online anyway, so why not get paid for it?”

Coy said he has not taken any other AI data gig since then. He said he would only consider doing so if a company offered substantial compensation.
