Asset managers have embraced web scraping as a cornerstone of contemporary alpha generation, with the industry spending more than $2 billion annually to extract alternative data. Some estimate that web scraping now represents 5 percent of global internet traffic. As Bloomberg reports, “Almost all quant funds use machine learning to sweep through social media, news articles, and earnings reports.”
But this dependence on digital intelligence exposes managers to a serious threat. The exponential growth of misleading content — misinformation, disinformation, and AI slop — is polluting digital ecosystems. Compounding the risk is the fact that no current detection system can reliably verify the authenticity of all forms of digital information.
Institutional allocators may not even be aware of the systemic risk they face, which could compromise investment outcomes on an unprecedented scale.
How to assess this risk? Let’s begin by defining some terms.
Misinformation is false information spread without malicious intent — human error leading to inaccurate earnings rumors or misinterpreted company data circulating organically.
Disinformation is information deliberately fabricated to deceive, such as a coordinated campaign to manipulate stock prices or spread false merger rumors.
AI slop is low-quality, misleading, or entirely synthetic content generated by artificial intelligence systems. This phenomenon represents an evolution beyond traditional misinformation and disinformation, creating qualitatively different threats to investment intelligence.
Since May 2023, there has been a 1,000 percent increase in AI-generated news and information sites specifically designed to spread misinformation and disinformation. NewsGuard has identified more than 1,200 such websites masquerading as legitimate news outlets and churning out bot-written financial, political, and economic content. This is more than a quantitative increase — it signals a qualitative transformation in how disinformation operates. Whereas traditional misinformation and disinformation campaigns relied on human authors, editors, and distribution networks, AI slop sites can operate with minimal human oversight, with a single actor able to operate 24/7 and generate thousands of pieces of content that automatically adapt to trending topics and emerging market events.
A.W. Ohlheiser, a senior technology reporter and editor at Vox, says of this digital pollution, “After a decade of covering online culture and information manipulation, I don’t think I’ve ever seen things as bad as they are now.”
An insidious aspect of AI slop is its self-reinforcing nature. AI systems, such as large language models like OpenAI’s ChatGPT and Anthropic’s Claude, increasingly train on internet data that includes AI-generated content, which can lead to what is called model collapse — where algorithms learn from their own synthetic output rather than from authentic human knowledge and produce increasingly useless or undesirable outputs. Add to this the problem of LLM grooming — the systematic duplication of false narratives online with the intent of manipulating LLM outputs.
This creates a deteriorating information environment, where each generation of AI-produced content is less reliable than the previous one.
Unfortunately, the situation is likely to get even worse. The increasing power of readily available multimodal LLMs to generate false content, the proliferation of deepfakes, and the development of new AI tools will undoubtedly result in a tsunami of false digital information.
Executives at social media and news platforms are aware of the problem but are struggling to develop reliable solutions for detecting and removing mis- and disinformation. The platforms typically employ a combination of human judgment and technology. However, X, Meta, Google, and others have significantly cut their trust and safety staffs, and the automated algorithms these companies have developed are “unable to detect emerging fake news at an early stage, and thus [fail] to minimize the damage caused by fake news.”
Social media platforms’ revenue models have long incentivized them to promote content regardless of its accuracy, prioritizing engagement over safety. Although critics have rightly focused on social harms, including election interference and adverse effects on children, investors face their own issues.
Governments, too, are aware of the problem. But the U.S., for example, is constrained by Section 230 of the 1996 Communications Decency Act and by free speech questions, leaving it struggling to develop and implement effective, sustainable legislative and policy directives.
Social media platforms’ abdication of responsibility and regulators’ failure to develop robust frameworks governing web scraping and data quality — frameworks that can keep pace with the AI slop threat — mean allocators and managers must find ways to cope in real time with what author and activist Cory Doctorow calls “enshittification,” the progressive degradation of online platforms and services. In the investment context, that degradation takes the form of an endless stream of bot-generated content with credible-seeming bylines, fake analyst reports and earnings previews, deepfakes of prominent figures making false statements, fabricated images of events, websites publishing thousands of AI-generated articles daily, and artificial social media posts designed to manipulate sentiment. (The American Dialect Society chose “enshittification” as its 2023 Word of the Year.)
We’ve seen real-world examples of how this enshittification has already permeated the investment ecosystem. In 2023, an AI-generated image showing a fabricated explosion near the Pentagon caused brief but significant market volatility, demonstrating how synthetic content can trigger reactions in automated trading systems and influence investor behavior. A highly realistic deepfake video showing Goldman Sachs chief U.S. equity strategist David Kostin endorsing a fraudulent investment scheme with promises of outsize returns circulated online in April 2025. The video spread rapidly across social media, messaging apps, and investment scam networks. And deepfake videos of Elon Musk have appeared on YouTube, X, and other platforms, showing him seemingly promoting fraudulent cryptocurrency schemes.
The Illusion of Detection: Why Current Safeguards Are Insufficient
A Coalition Greenwich research report on AI adoption by investment managers shows they are aware of the rapid proliferation of digital pollution and the risks it poses, with 58 percent saying they are concerned about data integrity.
In addition to human oversight and manual verification of web-scraped data, managers have responded to data quality concerns with increasingly sophisticated detection systems using such techniques as multisource cross-verification (one example is consensus-building algorithms to identify outliers), advanced machine learning systems such as natural language processing for anomaly detection, and external validation sources, including misinformation databases like NewsGuard’s False Claim Fingerprints.
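To make the consensus-building idea concrete, here is a minimal sketch that flags scraped sources whose reported figure sits far from the cross-source median, using a median-absolute-deviation test. The source names, revenue figures, and threshold are illustrative assumptions, not a description of any manager’s actual pipeline.

```python
# Minimal sketch of consensus-based cross-verification: flag scraped sources
# whose reported figure deviates sharply from the cross-source median.
# Source names, figures, and the threshold are illustrative assumptions,
# not a description of any manager's actual pipeline.
from statistics import median

def flag_outlier_sources(reports: dict[str, float], k: float = 3.0) -> list[str]:
    """Return sources whose value lies more than k median-absolute-deviations
    from the consensus (median) of all scraped values."""
    values = list(reports.values())
    consensus = median(values)
    mad = median(abs(v - consensus) for v in values) or 1e-9  # guard against zero spread
    return [src for src, v in reports.items() if abs(v - consensus) > k * mad]

# Hypothetical example: five scraped sources reporting a company's quarterly revenue ($M).
scraped = {
    "newswire_a": 412.0,
    "aggregator_b": 410.5,
    "blog_c": 611.0,       # the outlier a consensus check should surface
    "filing_mirror_d": 411.8,
    "forum_e": 409.9,
}
print(flag_outlier_sources(scraped))  # ['blog_c']
```

A statistical screen like this is only one layer; in practice it would sit alongside the NLP-based anomaly detection and external validation feeds described above.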
However, these techniques, like those employed by social media companies, cannot reliably separate the gold from the dross.
The sheer volume of content being generated makes comprehensive human review impossible. There are more than 500 million posts daily on X, and a single AI system can generate thousands of articles per day, overwhelming human verification capacity.
The Center for Academic Innovation cites a large-scale study that tested 14 AI detection tools, concluding that they were neither accurate nor reliable. “Most of them had trouble telling human writing from AI, and the best-performing tool only got it right about half the time. That’s basically a coin toss.”
Automated detection systems rely on historical patterns to identify synthetic content. Entirely new forms of AI slop — using novel generation techniques or targeting previously unseen vulnerabilities — can evade detection until they’re identified and then added to training datasets.
We can add to this list of challenges the recurring problem of transparency, as many advanced AI detection systems operate as black boxes, making it difficult for managers to understand why content has been flagged or cleared. This opacity undermines confidence in the detection process, making it impossible to assess the reliability of specific data points.
Due Diligence Framework: Essential Questions for Allocators
The failure of current detection methods to reliably identify AI slop introduces systemic risk into any investment strategy relying on web-scraped data. This means allocators should include in their due diligence an assessment of the safeguards managers have put in place to mitigate the risk of ingesting false digital information.
This assessment can begin with a simple question: What specific processes ensure the quality and integrity of your scraped data, particularly regarding “data poisoning” from AI-generated content?
Allocators should expect managers to detail their data validation systems from collection through analysis; due diligence procedures for external data vendors, especially related to compliance with data privacy laws; protocols for identifying and addressing data corruption; and regular auditing of data sources and validation processes.
Managers should be able to describe their methodology for identifying AI slop, including empirical performance metrics and acknowledged limitations. In addition, they should provide the names of their specific detection algorithms and providers along with their tested-accuracy rates; outline their use of external misinformation databases and validation feeds; offer an honest assessment of detection limitations and failure modes; and explain their process for regular updating and improvement of detection systems.
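As a rough illustration of what “tested-accuracy rates” can look like in practice, the sketch below computes precision, recall, and the false-negative rate of a hypothetical synthetic-content detector against a hand-labeled validation sample. The labels and predictions are placeholders, not results from any real detection product.

```python
# Illustrative sketch of the kind of empirical performance reporting an allocator
# might ask for: precision, recall, and false-negative rate of a synthetic-content
# detector measured on a hand-labeled validation sample. Labels and predictions
# are placeholders, not results from any real detection product.

def detection_metrics(labels: list[int], preds: list[int]) -> dict[str, float]:
    """labels/preds: 1 = AI-generated ("slop"), 0 = authentic."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,            # flagged items that really were synthetic
        "recall": tp / (tp + fn) if tp + fn else 0.0,               # synthetic items the detector caught
        "false_negative_rate": fn / (tp + fn) if tp + fn else 0.0,  # synthetic items that slipped through
    }

# Hypothetical validation sample of 10 documents.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(detection_metrics(y_true, y_pred))
# precision ~0.67, recall 0.5, false_negative_rate 0.5
```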
If managers use a combination of human oversight and AI to detect bad information, allocators should ask how that oversight is incorporated and how transparency is ensured in the AI-driven detection process. Does the process include clear human-in-the-loop protocols for reviewing flagged content, explainable AI systems that provide reasoning for classification decisions, customizable oversight levels and intervention capabilities, and regular human validation of detection accuracy?
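One way such a human-in-the-loop protocol might be structured — offered purely as an assumed sketch, not any manager’s actual workflow — is confidence-based routing: content the detector scores as clearly synthetic is rejected, clearly authentic content passes, and the ambiguous middle band is queued for human review, with the reasoning recorded so decisions remain auditable.

```python
# Minimal sketch of one possible human-in-the-loop protocol: route content to
# automatic acceptance, automatic rejection, or a human review queue based on a
# detector's confidence score, and record the reason so the decision is auditable.
# The thresholds and score field are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Decision:
    item_id: str
    action: str      # "accept", "reject", or "human_review"
    reason: str

def route(item_id: str, synthetic_score: float,
          reject_above: float = 0.9, accept_below: float = 0.2) -> Decision:
    """synthetic_score: detector's estimated probability the item is AI-generated."""
    if synthetic_score >= reject_above:
        return Decision(item_id, "reject", f"score {synthetic_score:.2f} >= {reject_above}")
    if synthetic_score <= accept_below:
        return Decision(item_id, "accept", f"score {synthetic_score:.2f} <= {accept_below}")
    return Decision(item_id, "human_review", f"ambiguous score {synthetic_score:.2f}")

for item, score in [("article_001", 0.95), ("article_002", 0.05), ("article_003", 0.55)]:
    print(route(item, score))
```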
Because AI slop techniques are evolving so quickly, allocators should make sure they understand a manager’s plans for adapting detection capabilities to new synthetic-content threats. A manager with a thorough plan should cite systematic monitoring for new forms of AI slop and regular model retraining and capability updates. The manager should also collaborate with industry and academic research communities to develop proactive threat assessment and response protocols.
The Uncomfortable Truth: Perfect Detection Is Impossible
AI slop detection and generation are in a perpetual arms race. As detection improves, so does generation, creating a systematic lag between the emergence of threats and the development of threat detection capabilities. AI-assisted detection cannot keep pace with sophisticated AI generation on the current trajectory.
Moreover, content generators have a fundamental advantage: They need to succeed only occasionally to cause damage, whereas detection systems must achieve near-perfect accuracy to be reliable. This asymmetry is not a temporary technological limitation but a fundamental characteristic of the adversarial relationship between content generation and detection systems. The perfect detection method simply does not exist.
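A back-of-the-envelope calculation, using assumed rather than measured numbers, shows why this asymmetry bites: even a detector that catches 99 percent of synthetic items both misses a steady stream of them at scale and buries analysts in false alarms drawn from the much larger pool of authentic content.

```python
# Back-of-the-envelope illustration of the asymmetry described above, using
# assumed (not measured) volumes and rates.
synthetic_per_day = 10_000      # assumed output of a single automated content farm
authentic_per_day = 1_000_000   # assumed volume of legitimate scraped items
recall = 0.99                   # share of synthetic items the detector catches
false_positive_rate = 0.01      # share of authentic items wrongly flagged

missed_synthetic = synthetic_per_day * (1 - recall)      # synthetic items that slip through daily
false_alarms = authentic_per_day * false_positive_rate   # authentic items wrongly flagged daily

flagged_total = synthetic_per_day * recall + false_alarms
precision = (synthetic_per_day * recall) / flagged_total  # share of flags that are correct

print(round(missed_synthetic), round(false_alarms), round(precision, 2))  # 100 10000 0.5
```

At these assumed volumes, roughly half of everything the detector flags is a false alarm, and about 100 synthetic items still reach the data pipeline every day.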
For allocators, the AI slop era demands a new level of diligence in manager selection and ongoing monitoring. Given the impossibility of perfect detection, allocators (and managers) must approach AI slop as an irreducible risk requiring management rather than elimination. Allocators should prioritize managers who can clearly explain their detection process and its limitations over those claiming perfect protection.
The alternative — continuing to operate under the assumption that sophisticated technology can perfectly filter truth from fiction — risks systemic exposure to an evolving threat that shows no signs of slowing.
The age of perfect information was always an illusion. The age of AI slop has made that illusion impossible to maintain.
Angelo Calvello, PhD, is the founder of C/79 Consulting LLC and writes extensively on the impact of AI on institutional investing. All views expressed herein are solely the author’s and not those of any entity with which the author is affiliated.