Perplexity Sued: Inside the Copyright Lawsuit Shaking Up AI and Publishing

September 13, 2025

Encyclopedia Britannica and Merriam-Webster Sue Perplexity for Copying Their Definitions: The AI Copyright Battle That's Reshaping Digital Publishing

The irony couldn't be more perfect. Perplexity AI, the company positioning itself as Google's smart competitor, now faces accusations of plagiarizing the very definition of "plagiarize" from Merriam-Webster. This isn't just another tech industry squabble—it's a watershed moment that could determine whether AI companies can freely harvest content from established publishers or must pay for the privilege.

Encyclopedia Britannica and Merriam-Webster have filed a comprehensive lawsuit against Perplexity in New York federal court, marking the latest escalation in the growing conflict between AI companies and traditional content creators. The case represents more than simple copyright infringement; it's a fundamental challenge to how artificial intelligence companies build their knowledge bases and whether they can continue operating under current fair use interpretations.

What makes this lawsuit particularly compelling isn't just the legal implications; it's the brazen nature of the alleged copying. The complaint includes screenshots that, according to the plaintiffs, show Perplexity's responses to be virtually identical to the original sources, raising serious questions about whether AI answer engines are sophisticated plagiarism machines disguised as innovation tools.

The Lawsuit Details - What Encyclopedia Britannica and Merriam-Webster Are Claiming

Copyright Infringement and Content Scraping Allegations

The case centers on what the plaintiffs describe as systematic content scraping in violation of fundamental copyright protections. According to the federal court filing, Perplexity has been harvesting content from both publishers' websites without permission, authorization, or compensation. The plaintiffs portray this not as casual browsing but as industrial-scale content extraction designed to fuel Perplexity's AI-driven answer engine.

The filing describes how Perplexity's crawlers systematically work through Britannica's encyclopedia entries and Merriam-Webster's dictionary definitions, extracting not just individual facts but entire explanatory passages. The plaintiffs argue that Perplexity has built its competitive advantage by essentially photocopying intellectual property that these publishers have spent decades and millions of dollars developing.

What's particularly damaging to Perplexity's position is the evidence showing how the AI reproduces not just facts, but the specific way those facts are explained and contextualized. Copyright law doesn't protect individual facts, but it absolutely protects the creative expression of those facts—the unique way Britannica explains historical events or how Merriam-Webster contextualizes word meanings and usage examples.

The Plagiarism Irony - Copying the Definition of 'Plagiarize'

Perhaps no single piece of evidence in the case is more symbolically powerful than Perplexity's alleged copying of the definition of "plagiarize" itself. The court documents include screenshots showing that when users ask Perplexity to define the word, the AI returns a response that's virtually identical to Merriam-Webster's definition, complete with usage examples and contextual explanations.

This isn't just about one word—it represents the core problem with how Perplexity operates. The definition of plagiarize, according to Merriam-Webster, involves taking someone else's work and presenting it as your own without proper attribution. Yet Perplexity appears to have done exactly that with the very definition of the term, creating a meta-level irony that perfectly encapsulates the plaintiffs' broader argument.

The screenshot evidence shows Perplexity presenting Merriam-Webster's carefully crafted definition without clear attribution, allowing users to believe they're receiving original AI-generated content when they're actually getting repackaged dictionary entries. This practice undermines the fundamental value proposition that makes dictionary publishers viable—if users can get identical content from AI tools without visiting the original source, why would they ever subscribe to or visit Merriam-Webster directly?
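Claims that an answer is "virtually identical" to a source can be made a little more concrete with a standard text-reuse measure. The sketch below is a minimal illustration, assuming simple word trigrams and case-insensitive matching: it computes what fraction of a candidate passage's three-word sequences also appear in a source passage. The function names and sample sentences are hypothetical placeholders, not text from the filing, and this is not how the plaintiffs or the court evaluated the evidence; whether copying infringes is a legal question that no overlap score settles.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase the text, keep only word tokens, and return its set of n-word sequences."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(candidate: str, source: str, n: int = 3) -> float:
    """Share of the candidate's n-grams that also occur in the source, from 0.0 to 1.0."""
    cand = word_ngrams(candidate, n)
    if not cand:  # too short to form a single n-gram
        return 0.0
    return len(cand & word_ngrams(source, n)) / len(cand)


if __name__ == "__main__":
    # Hypothetical sample texts, not taken from any publisher or from the complaint.
    source = "A quick brown fox jumps over the lazy dog near the river bank."
    near_copy = "a quick brown fox jumps over the lazy dog near the river"
    paraphrase = "A fast auburn fox leaps above a sleepy hound by the water."
    print(containment(near_copy, source))   # 1.0
    print(containment(paraphrase, source))  # 0.0
```

A score near 1.0 flags near-verbatim reuse, while a close paraphrase can score near 0.0, which is one reason overlap metrics are only a starting point for spotting copied expression.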

Trademark Infringement Through Brand Misrepresentation

Beyond copyright violations, the lawsuit includes serious trademark infringement claims that could prove even more damaging to Perplexity's business model. The plaintiffs allege that Perplexity uses their brand names to lend credibility to AI-generated content that may be inaccurate or entirely fabricated, which the industry calls "hallucinations."

This aspect of the case touches on a critical vulnerability in AI answer engines. When Perplexity generates a response and attributes it to "Encyclopedia Britannica" or "Merriam-Webster," users naturally assume they're getting information that meets these publishers' rigorous editorial standards. But if the AI has hallucinated or misinterpreted the original content, these trusted brands become unwitting endorsers of misinformation.

The trademark claims argue that this practice dilutes and damages the carefully cultivated reputations these publishers have built over centuries. Britannica has been synonymous with authoritative, fact-checked knowledge since 1768. Merriam-Webster has defined American English usage since 1847. When an AI tool falsely attributes information to these brands, it potentially erodes public trust in their reliability.

'Stealth Crawling' Accusations and Technical Violations

The lawsuit also exposes sophisticated technical methods that Perplexity allegedly uses to bypass website protections—a practice known as "stealth crawling." This isn't simply about AI companies reading publicly available web pages; it's about deliberately circumventing the technical measures that websites use to control how their content gets accessed and used.

Most major websites, including Britannica and Merriam-Webster, use robots.txt files and other technical barriers to signal which parts of their sites should be off-limits to automated crawlers. These protections exist for good reasons—they help publishers control server load, protect premium content, and maintain some control over how their intellectual property gets distributed.

The stealth crawling accusations suggest that Perplexity has developed methods to get around these protections, treating "do not crawl" signals as optional. Strictly speaking, robots.txt is a voluntary convention rather than a binding technical contract: it only works if crawlers choose to honor it, which is why the plaintiffs treat deliberate evasion as a serious charge. The alleged approach lets an AI company harvest content that publishers explicitly wanted to keep restricted, gaining an unfair advantage over traditional search engines like Google, which typically respect these signals.
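To see why evading robots.txt matters, it helps to look at how a compliant crawler uses the file. The sketch below uses Python's standard-library urllib.robotparser to parse a hypothetical robots.txt (the rules, the example.com URLs, and the "ExampleAIBot" crawler name are all made up for illustration) and to check each URL before fetching it. It also shows the weakness behind the stealth crawling allegation: rules keyed to a crawler's declared user agent stop applying if the crawler identifies itself as something else, such as an ordinary browser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named AI crawler entirely and keeps every
# other crawler out of a /premium/ section. "ExampleAIBot" is a made-up name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /premium/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A compliant crawler calls this before each request and skips the URL if it returns False."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    page = "https://www.example.com/dictionary/word"
    print(allowed(rp, "ExampleAIBot", page))  # False: the named bot is blocked site-wide
    print(allowed(rp, "Mozilla/5.0", page))   # True: a generic browser agent falls under "*"
    print(allowed(rp, "Mozilla/5.0", "https://www.example.com/premium/article"))  # False
```

Nothing in the protocol enforces the answer; the check only protects a site if the crawler runs it honestly under its real name.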

Understanding Perplexity's Business Model and How It Works

Perplexity as a Google Search Competitor

Perplexity has positioned itself as the next evolution beyond traditional search engines, promising users direct answers instead of lists of links to explore. While Google shows you ten blue links and expects you to click through to find information, Perplexity's AI reads those sources for you and synthesizes a comprehensive answer in seconds. It's an undeniably appealing user experience that explains the company's rapid growth and substantial venture capital funding.

The business model represents a fundamental shift in how people access information online. Instead of driving traffic to original sources, Perplexity keeps users on its platform by providing complete answers. This "answer engine" approach has attracted millions of users who appreciate getting immediate, synthesized responses without the need to visit multiple websites and piece together information themselves.

However, this user-friendly approach creates an existential threat to content publishers. If users can get Encyclopedia Britannica's information without ever visiting Britannica's website, the publisher loses not just traffic but also subscription opportunities, advertising revenue, and the ability to build direct relationships with readers. The Perplexity lawsuit represents publishers fighting for their economic survival in an AI-dominated information landscape.

The 'Bullshit Machine' Controversy and Content Sourcing

Critics have dubbed Perplexity and similar AI answer engines "bullshit machines," a harsh but not entirely unfair characterization that highlights fundamental problems with how these systems operate. The term reflects concerns about AI tools that confidently present information without the rigorous fact-checking and editorial oversight that traditional publishers provide.

The controversy stems from how AI systems blend information from multiple sources, sometimes creating responses that sound authoritative but contain subtle errors or misinterpretations. Unlike human editors who understand context and can verify claims across multiple sources, AI systems can confidently combine incompatible information or misunderstand nuanced distinctions that matter enormously in fields like history, science, or law.

This content sourcing approach creates particular problems for reference publishers like Britannica and Merriam-Webster, whose entire value proposition depends on accuracy and reliability. When AI systems scrape their content and blend it with potentially less reliable sources, the result can be authoritative-sounding misinformation that damages the original publishers' reputations while providing no compensation for their editorial investment.

Revenue Sharing Programs vs. Content Theft Allegations

Interestingly, not all publishers view Perplexity as an enemy. The company has successfully negotiated revenue-sharing partnerships with media outlets like Time magazine and the Los Angeles Times, demonstrating that cooperation between AI companies and content creators is possible. These partnerships typically involve sharing advertising revenue when Perplexity cites content from partner publications.

These collaborative relationships reveal the complexity of the AI content ecosystem. Some publishers see AI answer engines as inevitable and prefer to negotiate favorable terms rather than fight losing battles in court. The revenue-sharing model acknowledges that AI companies derive value from publisher content and should compensate creators accordingly.

However, the existence of these partnerships could also strengthen the plaintiffs' case in the current lawsuit. If Perplexity pays some publishers for content access, why should it receive Britannica's and Merriam-Webster's content for free? The plaintiffs can argue that this selective approach to compensation shows Perplexity understands the value of publisher content but chooses to take it without permission where it believes it can avoid consequences.

The Broader Context - AI Companies vs. Content Publishers

Perplexity's Previous Legal Battles with Media Companies

The current lawsuit isn't Perplexity's first encounter with angry publishers. News Corp filed a similar lawsuit against the company in October 2024, alleging systematic copyright infringement and unfair competition. That earlier case established many of the legal arguments that Britannica and Merriam-Webster are now advancing, creating a pattern of litigation that suggests widespread industry dissatisfaction with Perplexity's practices.

The News Corp case was particularly significant because it involved major publications like The Wall Street Journal and The New York Post, outlets with significant legal resources and strong motivations to protect their content. The proliferation of these cases suggests that publishers may be moving beyond individual complaints toward a more coordinated legal strategy designed to establish clear boundaries for AI content use.

These multiple lawsuits create compound legal pressure on Perplexity. Even if the company successfully defends against individual cases, the mounting legal costs and negative publicity could undermine its business model. Investors and users may lose confidence in a platform that faces constant litigation over its core operating practices.

Industry-Wide AI Content Scraping Issues

The Perplexity controversies reflect broader tensions throughout the AI industry regarding content scraping and copyright compliance. Major AI companies like OpenAI, Google, and Anthropic all face similar challenges regarding training data and content sourcing, though they've generally been more careful about public-facing content presentation than Perplexity.

Publishers argue that the stealth crawling practices Perplexity allegedly employs are not unique to it, though few AI companies discuss their crawling methods openly. The technical arms race between content creators trying to protect their material and AI companies trying to access it has become a defining characteristic of the modern internet.

What makes AI copyright lawsuits particularly complex is the scale involved. Traditional copyright disputes might involve one company copying one piece of content. AI training involves potentially millions of copyrighted works being processed and synthesized in ways that existing legal frameworks struggle to address. The Perplexity case could establish important precedents for how courts handle these large-scale, algorithmic copyright questions.

Mixed Industry Response - Collaboration vs. Litigation

The publishing industry's response to AI content scraping has been notably divided. While Britannica and Merriam-Webster have chosen litigation, other respected publishers have opted for collaboration. The World History Encyclopedia's partnership with Perplexity demonstrates how AI technology can enhance rather than replace traditional publishing.

In the World History Encyclopedia collaboration, Perplexity powers an AI chatbot that helps users navigate the encyclopedia's extensive database of academic articles and primary sources. This partnership preserves the original publisher's role as curator and authority while using AI to improve user access and engagement. Users get better search functionality, while the encyclopedia maintains traffic and attribution.

These successful partnerships suggest that the conflict between AI companies and publishers isn't inevitable. When AI tools clearly attribute sources, drive traffic back to original publishers, and share revenue appropriately, they can create win-win relationships. The problem arises when AI companies treat publisher content as free raw material for their own competing products.

What This Means for the AI Industry and Content Creation

Potential Outcomes of the Britannica-Merriam-Webster Lawsuit

The legal outcomes of this case could fundamentally reshape how AI companies operate and access content. If the plaintiffs prevail, other AI companies may need to dramatically change their content sourcing practices or face similar lawsuits. A significant monetary judgment could also establish the financial stakes for unauthorized content use, making licensing agreements more attractive than legal risks.

Several possible resolutions could emerge from this litigation. A favorable settlement might establish industry-standard licensing fees for reference content, creating a sustainable revenue model for publishers while allowing AI companies predictable access costs. Alternatively, a court ruling could establish clear boundaries around fair use in AI contexts, providing legal certainty for both sides.

The precedent implications extend far beyond Perplexity. Every major AI company watches these cases closely because they all face similar content sourcing challenges. A ruling that AI content scraping violates copyright could require massive changes to how AI systems are trained and operated, potentially slowing innovation but protecting content creators' economic interests.

Impact on AI Development and Training Practices

Regardless of the specific legal outcomes, this lawsuit and others like it are already influencing how AI companies approach content acquisition. More companies are proactively negotiating licensing agreements rather than assuming fair use protections cover their activities. This trend toward permission-based content access could slow AI development but create more sustainable relationships with content creators.

The technical implications are also significant. If stealth crawling practices face legal restrictions, AI companies may need to develop new methods for accessing training data. This could favor companies with existing content partnerships or those willing to invest heavily in licensing agreements. Smaller AI startups might find themselves at a disadvantage if content access becomes expensive.

The broader AI industry may also need to invest more heavily in content verification and attribution systems. If AI companies can be held liable for inaccurate citations and misattribution, they'll need better technical systems for tracking content provenance and for making sure that the sources cited in AI-generated responses actually support what is attributed to them.
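One way to picture "tracking content provenance" is a record kept for every passage a system retrieves, plus a check that a claim attributed to a publisher actually appears in something retrieved from that publisher. The sketch below is a hypothetical design, not a description of how Perplexity or any other product works: the ProvenanceRecord type, the helper functions, and the publisher names and URLs are invented for illustration, and a real system would need fuzzy matching for paraphrased answers rather than the exact substring check used here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record stored for every passage the system retrieves."""
    publisher: str
    url: str
    sha256: str  # fingerprint of the exact retrieved text, for later audits
    text: str


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the passage's UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_record(publisher: str, url: str, text: str) -> ProvenanceRecord:
    return ProvenanceRecord(publisher, url, fingerprint(text), text)


def is_intact(record: ProvenanceRecord) -> bool:
    """True if the stored text still matches the fingerprint taken at retrieval time."""
    return fingerprint(record.text) == record.sha256


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so formatting differences don't block a match."""
    return " ".join(text.lower().split())


def supporting_urls(claim: str, publisher: str, records: list[ProvenanceRecord]) -> list[str]:
    """URLs of intact passages from `publisher` that contain `claim` verbatim after normalization.

    An empty list means nothing actually retrieved supports the attribution.
    """
    needle = normalize(claim)
    return [
        r.url
        for r in records
        if r.publisher == publisher and is_intact(r) and needle in normalize(r.text)
    ]


if __name__ == "__main__":
    records = [
        make_record(
            "Example Encyclopedia",  # hypothetical publisher
            "https://encyclopedia.example/entry/river",
            "The river   rises in the northern hills.",
        )
    ]
    print(supporting_urls("the river rises in the northern hills", "Example Encyclopedia", records))
    # ['https://encyclopedia.example/entry/river']
    print(supporting_urls("the river rises in the southern hills", "Example Encyclopedia", records))
    # []
```

Refusing to attach a brand name when the check returns an empty list is one simple guard against the kind of misattribution the trademark claims describe.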

Effects on Publishers and Content Creators

For publishers, this lawsuit represents both an opportunity and a crossroads. Success could establish their right to compensation when AI systems use their content, creating new revenue streams to offset declining traditional publishing economics. However, overly aggressive legal strategies could also alienate AI companies and reduce opportunities for beneficial partnerships.

The reference publishing industry faces particular challenges because their content is especially valuable for AI training. Dictionary definitions, encyclopedia entries, and other reference materials provide the kind of authoritative, structured information that AI systems need for accurate responses. This creates both leverage and vulnerability—their content is valuable but also particularly susceptible to wholesale copying.

Content creators across industries are watching this case for signals about how to protect their intellectual property in an AI-dominated landscape. The strategies that work for established publishers like Britannica and Merriam-Webster may not translate directly to individual creators, bloggers, or smaller publications, but the legal precedents will influence everyone's options.

Legal Analysis - Copyright Law Meets Artificial Intelligence

Fair Use Defense Strategies for AI Companies

Perplexity and other AI companies typically defend their content practices under fair use doctrine, arguing that their use of copyrighted material is transformative and serves different purposes than the original works. The fair use analysis considers factors like the purpose of use, the nature of the copyrighted work, the amount used, and the effect on the market for the original.

AI companies argue that their systems transform raw content into new, synthesized responses that serve different user needs than consulting the original sources directly. They contend that users seeking quick answers from AI tools wouldn't necessarily have purchased dictionary subscriptions or encyclopedia access, so there's minimal market harm to the original publishers.

However, the fair use defense faces significant challenges in this context. The systematic, commercial nature of AI content scraping fits uneasily with the purposes the fair use statute gives as examples, such as criticism, comment, news reporting, teaching, scholarship, and research. When AI companies build billion-dollar businesses by processing millions of copyrighted works, and when their answers can substitute for a visit to the original source (the market-effect factor), courts may be skeptical of fair use claims.

Publishers' Rights in the Digital Age

Copyright law gives content creators exclusive rights to reproduce, distribute, and create derivative works from their original material. Publishers argue that AI systems violate all three rights by copying their content, distributing it through AI responses, and creating derivative works by synthesizing multiple sources into new presentations.

The compilation copyright that protects reference works like dictionaries and encyclopedias may be particularly relevant to this case. These publications gain copyright protection not just for individual entries but for the selection, arrangement, and coordination of information. When AI systems replicate this organizational structure, they may infringe on compilation copyrights even if individual facts aren't protectable.

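Database rights, which exist in European law (notably the EU's sui generis database right) but have no direct U.S. equivalent, protect the investment publishers make in collecting and organizing information. These rights recognize that even factual databases represent significant editorial and financial investment. U.S. law, by contrast, has declined to protect effort alone since the Supreme Court rejected the "sweat of the brow" doctrine in Feist Publications v. Rural Telephone Service (1991), so American publishers must rely mainly on copyright in original expression, selection, and arrangement.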

What Readers and Internet Users Should Know

How to Verify AI-Generated Information

As AI answer engines become more prevalent, users need better skills for evaluating the information they provide. The most important practice is cross-referencing AI responses with original sources, especially for important decisions or controversial topics. AI systems can sound confident while being completely wrong, so independent verification remains essential.

Users should pay particular attention to how AI systems cite their sources. Vague attributions like "according to experts" or "studies show" are red flags that suggest the AI may be hallucinating or misremembering information. Quality AI systems should provide specific, verifiable citations that users can check independently.

The convenience of AI answers comes with trade-offs in reliability and completeness. While AI can quickly synthesize information from multiple sources, it may miss important nuances, recent developments, or conflicting perspectives that human experts would consider. For complex or consequential questions, AI should be a starting point for research rather than the final authority.

Understanding Source Attribution in AI Tools

Different AI platforms handle source attribution very differently, and users should understand these variations when evaluating information quality. Some systems like Perplexity attempt to cite specific sources, while others like ChatGPT traditionally provided responses without citations. The quality and accuracy of these citations vary enormously.

Users should be skeptical when AI systems attribute specific quotes or claims to well-known sources without providing direct links or page numbers. The Britannica and Merriam-Webster lawsuit highlights how AI systems can falsely attribute information to authoritative sources, potentially misleading users about the reliability of their responses.

Supporting original content creators remains important even when AI tools provide convenient access to information. When AI responses cite helpful sources, users should consider visiting those original sources directly, especially for topics they care about. This practice helps sustain the publishers and creators who produce the authoritative content that makes AI responses possible.

Industry Expert Opinions and Reactions

Legal Expert Perspectives on the Perplexity Case

Copyright attorneys following AI litigation see this case as potentially precedent-setting for how courts handle large-scale, algorithmic content use. The clear evidence of copying, including the plagiarism definition irony, gives plaintiffs unusually strong factual arguments that could overcome AI companies' fair use defenses.

Legal experts note that the trademark claims may prove even more significant than copyright issues. While copyright law includes fair use exceptions that might protect some AI activities, trademark law offers fewer defenses for using brands to endorse potentially inaccurate information. If courts find that AI hallucinations constitute trademark infringement when falsely attributed to trusted sources, it could force major changes in how AI systems present information.

The technical aspects of stealth crawling also present novel legal questions that courts haven't extensively addressed. If publishers can demonstrate that AI companies deliberately bypassed technical restrictions, it could establish that AI content scraping isn't passive fair use but active circumvention of publisher protections.

AI Industry Response to Content Copying Allegations

The AI industry's response to these lawsuits has been notably defensive, with companies emphasizing their innovative benefits while downplaying content creators' concerns. Many AI executives argue that their systems provide valuable services that ultimately benefit information access and shouldn't be constrained by outdated copyright frameworks designed for different technologies.

However, the proliferation of lawsuits has also encouraged more proactive licensing approaches. Companies that initially assumed fair use would protect their activities are increasingly negotiating permission-based content access agreements. This shift suggests that even AI companies recognize the legal risks of purely extractive content strategies.

Industry associations and AI advocacy groups generally support broad fair use interpretations that would protect current AI practices. They argue that overly restrictive copyright enforcement could stifle beneficial AI innovation and limit public access to information. However, they've also acknowledged the need for better attribution and compensation mechanisms for content creators.

Future Implications - Where AI and Copyright Law Are Heading

Proposed Solutions and Compromise Approaches

Several potential frameworks could resolve the tensions between AI innovation and content creator rights without requiring complete victory for either side. Compulsory licensing systems, similar to those used in music publishing, could establish standard rates for AI content access while ensuring creators receive compensation for their contributions.

Revenue-sharing models like those Perplexity has negotiated with some publishers could become industry standards, creating sustainable economic relationships between AI companies and content creators. These arrangements would acknowledge AI companies' need for content access while preserving creators' economic interests in their intellectual property.

Technical solutions for attribution and provenance tracking could also help address many concerns. If AI systems could accurately track and cite content sources, it would reduce trademark infringement risks while helping users understand the reliability and origins of AI-generated information. Blockchain and other distributed technologies might eventually provide comprehensive content tracking capabilities.

Regulatory and Legislative Responses

Government agencies and legislators are increasingly aware that existing copyright law may not adequately address AI content use scenarios. The scale and algorithmic nature of AI content processing creates challenges that traditional copyright frameworks, developed for human-scale copying, struggle to address effectively.

International coordination on AI copyright issues is becoming essential as AI companies operate globally while content creators seek protection in multiple jurisdictions. European database rights and other regional legal frameworks may influence how U.S. courts interpret AI copyright questions, especially for multinational companies and publishers.

Industry self-regulation initiatives are also emerging as companies recognize that proactive standards could prevent more restrictive government intervention. Technical standards for content attribution, ethical guidelines for AI training data, and voluntary compensation mechanisms might help establish stable industry practices before courts or legislators impose mandatory solutions.

Conclusion: The High-Stakes Battle Between AI Innovation and Content Protection

The lawsuit by Encyclopedia Britannica and Merriam-Webster against Perplexity represents far more than a simple copyright dispute; it's a defining moment that will shape the relationship between artificial intelligence and human knowledge creation for years to come. The irony of an AI company allegedly plagiarizing the definition of "plagiarize" perfectly encapsulates the broader tensions between technological innovation and intellectual property rights.

This case highlights fundamental questions about how we value and protect human expertise in an age of artificial intelligence. The encyclopedias and dictionaries that form the backbone of human knowledge didn't emerge spontaneously—they represent centuries of scholarly work, editorial judgment, and financial investment. If AI companies can freely harvest this content without compensation, they undermine the economic incentives that make such authoritative resources possible.

The outcome of this litigation will influence how AI companies access training data, how they present information to users, and whether content creators can maintain sustainable business models in an AI-dominated information landscape. For users, the stakes include the continued availability of authoritative, fact-checked information sources and the reliability of AI-generated responses.

As this legal battle unfolds, all stakeholders—AI companies, publishers, and users—have interests in finding solutions that preserve both innovation and content quality. The most likely positive outcomes will involve new forms of collaboration that compensate content creators while enabling AI advancement, rather than zero-sum conflicts that pit technological progress against intellectual property rights.

The Perplexity lawsuit may well be remembered as the case that established how artificial intelligence and human knowledge creation can coexist sustainably. Whether that coexistence emerges through legal precedent, industry compromise, or regulatory intervention, the fundamental questions raised by this case will influence the information ecosystem for generations to come.

MORE FROM JUST THINK AI

IDP: Build or Buy? Making the Right Decision

September 11, 2025

OpenAI's First APAC Partner: A Game-Changer for Thinking Machines

September 10, 2025

OpenAI's New Focus: The Team Reshaping ChatGPT's Personality

September 5, 2025