Reddit sues Perplexity over ‘industrial‑scale’ data scraping for AI
Reddit has filed a case in Manhattan federal court accusing Perplexity and three data‑scraping firms of bypassing its protections and harvesting Reddit content, including via Google results, to power an AI “answer engine”.
Reddit has filed a federal lawsuit in the Southern District of New York against Perplexity AI and three data‑scraping outfits, alleging they bypassed technical safeguards and harvested Reddit conversations, including via Google search results, to power Perplexity’s AI “answer engine.”
The complaint names Perplexity AI alongside Oxylabs UAB, AWMProxy and SerpApi, and accuses them of unfair competition, unjust enrichment and copyright violations stemming from “industrial‑scale” scraping of user comments for commercial gain.
Reddit said the firms evaded its protections and, when blocked, scraped Reddit content from Google’s results pages instead.
Reddit further claimed that after it sent Perplexity a cease‑and‑desist letter in May 2024, citations to Reddit within Perplexity’s results increased “forty‑fold,” and that a trap post visible only to Google later surfaced in Perplexity’s answers.
Perplexity rejected the accusations and said it will “fight vigorously for users’ rights to freely and fairly access public knowledge.” Oxylabs said it was “shocked and disappointed” and will defend itself, while SerpApi has also denied wrongdoing. AWMProxy has not responded publicly.
Training data for AI
Reddit has positioned its archive of human conversation as a valuable dataset and has previously entered paid data‑licensing deals with Google and OpenAI as part of a strategy to monetise access to its content.
Reddit’s chief executive and co‑founder Steve Huffman has championed partnerships that bring Reddit content into AI assistants under licence, in contrast with what the company describes as unlawful scraping that undermines its anti‑abuse investments.
Reddit said it has spent tens of millions of dollars on anti‑scraping systems.
Reddit is seeking damages and an injunction to prevent further unauthorised use of its data.


