Thousands worldwide sell personal data for AI training amid privacy risks


Author: The Guardian

Compiled by Shenchao TechFlow

Shenchao Overview: This investigative report reveals a rapidly growing gray industry: thousands of people worldwide are earning income from AI training by selling their voices, faces, call records, and daily videos.

This is not a vague discussion about privacy concerns—it’s an investigation involving real people, real amounts of money, and real consequences: an actor who sold his likeness later saw “himself” promoting an unknown medical product on Instagram, with comments evaluating his “appearance.”

When AI companies' hunger for data meets global economic disparity, the result is a deeply unequal exchange.

The full text is as follows:

Last year, on a morning in Cape Town, South Africa, Jacobus Louw went for his usual walk, feeding seagulls along the way. But this time he recorded several videos, capturing his footsteps and surroundings as he walked down the sidewalk. The videos earned him $14, about ten times South Africa's hourly minimum wage and equal to half a week's food expenses for the 27-year-old.

This is a "City Navigation" task completed by Louw on Kled AI. Kled AI is an app that pays users for uploading photos, videos, and other data used to train AI models. Within just a few weeks, Louw earned $50 by uploading photos and videos from daily life.

Thousands of miles away in Ranchi, India, 22-year-old student Sahil Tigga regularly earns money through Silencio—an app that crowdsources audio data for AI training by accessing his phone’s microphone to capture ambient noise from places like restaurant interiors or busy intersections. He also uploads recordings of his own voice. Sahil makes special trips to unique locations not yet documented on Silencio’s map, such as hotel lobbies. He earns over $100 per month, enough to cover all his food expenses.

In Chicago, 18-year-old welding apprentice Ramelio Hill earned hundreds of dollars by selling his private cellphone chats with friends and family to Neon Mobile, a conversational AI training platform that pays $0.50 per minute. For Hill, the math was simple: he believed tech companies were already harvesting vast amounts of his personal data, so he might as well profit from it too.

These "AI training gigs"—uploading photos, videos, and audio of one's surroundings and oneself—are at the forefront of a new global data gold rush. As Silicon Valley's demand for high-quality human data outstrips what can be scraped from the open internet, a thriving data-marketplace industry has emerged to fill the gap. From Cape Town to Chicago, thousands of people are micro-licensing their biometric identities and private data to the next generation of AI.

But this new gig economy comes at a cost. Behind the few dollars earned, these workers are fueling an industry that may ultimately render their skills obsolete, while exposing themselves to future risks of deepfakes, identity theft, and digital exploitation—risks they are only just beginning to understand.

Keep the AI gears turning

AI language models like ChatGPT and Gemini require vast amounts of training material to keep improving, but they now face a data shortage. Researchers have found that about a quarter of the data in the most commonly used high-quality training corpora—C4, RefinedWeb, and Dolma—now comes from websites that restrict generative AI companies from using their content to train models. Researchers estimate that AI companies could exhaust the supply of fresh, high-quality text as early as 2026. Although some labs have begun training on synthetic data generated by AI itself, this recursive process yields increasingly degraded, error-ridden outputs, a failure mode known as model collapse.


This is where apps like Kled AI and Silencio come in. In these data marketplaces, millions of people feed and train AI by selling their own identity data. Beyond Kled AI, Silencio, and Neon Mobile, AI trainers have many other options: Luel AI, backed by the well-known incubator Y Combinator, buys multilingual dialogue samples at roughly $0.15 per minute, while ElevenLabs lets users digitally clone their voices and makes them available to others at a base rate of $0.02 per minute.

Professor Bouke Klein Teeselink of King’s College London stated that AI training gig work is an emerging job category that will grow significantly.

AI companies know that paying people for data authorization helps avoid copyright disputes that could arise from relying entirely on web scraping, Teeselink said. AI researcher Veniamin Veselovsky noted that these companies also need high-quality data to model new and improved behaviors for their systems. "For now, human-generated data remains the gold standard for sampling outside the model distribution," Veselovsky added.

The humans who power these machines—especially those in developing countries—often rely on this income and have little other choice. For many AI training gig workers, taking on this work is a pragmatic response to economic inequality. In countries with high unemployment and depreciating local currencies, earning U.S. dollars is often more stable and profitable than local jobs. Some struggle to find entry-level work and turn to AI training out of necessity for survival. Even in wealthier nations, rising living costs make selling one’s labor a logical financial decision.

Louw, an AI trainer from Cape Town, is well aware of the privacy costs involved. Despite unstable income that doesn’t cover all his monthly expenses, he accepts these conditions to earn money. Having suffered from a neurological disorder for years that prevented him from finding employment, the money he earned from the AI data market—including Kled AI—allowed him to save $500 to enroll in a spa training course and become a massage therapist.

"As a South African, receiving U.S. dollars is worth more than people realize," said Louw.

Mark Graham, Professor of Internet Geography at the University of Oxford and author of "Feeding the Machine," acknowledges that for individuals in developing countries, this income may have practical significance in the short term, but he warns that "structurally, this work is unstable, offers no upward mobility, and is effectively a dead end."

Graham added that the AI data market relies on "a race to the bottom in wages" and "temporary demand for human data." Once this demand shifts, "workers will have no safeguards, no transferable skills, and no safety net."

Graham said the only winners are "the platforms of the Global North, which capture all the lasting value."


Full authorization

Hill, an AI trainer from Chicago, feels conflicted about selling his private phone calls to Neon Mobile. He earned $200 for about 11 hours of call data, but says the app frequently goes offline and delays payments. "Neon has always seemed suspicious to me, but I kept using it just to earn extra cash for bills," Hill said.

Now he’s beginning to reconsider whether that money was really so easy to come by. In September last year, Neon Mobile shut down just weeks after launching, after TechCrunch discovered a security vulnerability that allowed anyone to access users’ phone numbers, call recordings, and text logs. Hill says Neon Mobile never informed him about this, and now he’s concerned his voice could be misused online.

Jennifer King, a data privacy researcher at Stanford University’s Institute for Human-Centered Artificial Intelligence, is concerned that the AI data market lacks transparency regarding how and where user data will be used. She adds that, without understanding their rights or having the ability to negotiate on the matter, "consumers face the risk of their data being repurposed in ways they dislike, do not understand, or did not anticipate—with little to no recourse available."

When AI trainers share their data on Neon Mobile and Kled AI, they grant a worldwide, exclusive, irrevocable, transferable, and royalty-free license permitting the platform to sell, use, publicly display, and store their likenesses, and to create derivative works based on them.

Avi Patel, founder of Kled AI, said the company's data agreements restrict use to AI training and research purposes. "The entire business model relies on user trust. If contributors believe their data could be misused, the platform cannot function," he said. The company vets potential buyers before selling datasets, to avoid partnering with entities it deems "suspicious," such as the pornography industry, or "government agencies" it believes might use the data in ways that violate that trust.

Neon Mobile did not respond to requests for comment.

Professor Enrico Bonadio from City, University of London, noted that these terms allow the platform and its users to "virtually do anything with the material, permanently and without additional payment, and contributors have no practical way to withdraw consent or renegotiate."

More concerning risks include trainers' data being used to create deepfakes and impersonations. Although data markets claim to strip identifying information—such as names and locations—from data before sale, Bonadio adds that biometric patterns are inherently difficult to anonymize meaningfully.

Seller's regret

Even if AI trainers can negotiate more detailed protection terms regarding how their data is used, they may still regret it. In 2024, actor Adam Coy from New York sold his likeness to Captions—an AI video editing software now renamed Mirage—for $1,000. His agreement stipulated that his identity would not be used for any political purposes, nor for promoting alcohol, tobacco, or adult content, and that the license would last for one year.

Captions did not respond to requests for comment.

Soon after, Adam’s friends began sharing videos they found online featuring his face and voice, which had garnered millions of views. One Instagram video showed an AI-generated clone of Adam identifying himself as a “vagina doctor,” promoting unverified supplements to pregnant and postpartum women.

"It makes me uncomfortable to explain this to others," Coy said.

“The comments section was weird because they were commenting on my appearance, but that’s not even me,” Coy added. “When I made the decision to sell my likeness, I thought that most models already have their data and images scraped online anyway, so why not get paid for it?”

Coy said he has not taken any other AI data gig since then. He said he would only consider doing so if a company offered substantial compensation.
