The Controversial Rise of Perplexity AI: A Battle Over Information Control and Profit - New Summarization Technology Sparks Debate on Data Ownership and Monetization
CEO Aravind Srinivas has stated that Perplexity AI collects data aggressively.
Perplexity AI aims to provide answers directly, potentially starving primary sources of ad revenue.
Perplexity AI is a startup backed by tech giants like Google, Jeff Bezos, and Susan Wojcicki.
Perplexity AI's mission is to create an 'answer engine' instead of a traditional search engine.
Title: The Controversial Rise of Perplexity AI: A Battle Over Information Control and Profit
Lead:
Perplexity AI, a startup backed by tech giants like Google, Jeff Bezos, and Susan Wojcicki, is shaking up the information industry with its innovative summarization technology. However, its aggressive approach to data collection has sparked controversy among publishers and raised concerns over plagiarism and copyright infringement.
Background:
Perplexity AI's mission is to create an 'answer engine' instead of a traditional search engine. The company aims to provide answers directly, potentially starving primary sources of ad revenue. Perplexity CEO Aravind Srinivas has stated,
Amazon's cloud division, AWS, is investigating Perplexity AI over allegations of scraping abuse.
Perplexity AI appears to rely on content from scraped websites that had forbidden access through the Robots Exclusion Protocol.
Perplexity CEO Aravind Srinivas refused to name the third-party company performing web crawling and indexing services for the company citing a nondisclosure agreement.
Perplexity maintains that its PerplexityBot ignores robots.txt only when a user enters a specific URL in their prompt, a use case the company describes as infrequent.
Accuracy
Perplexity maintains that its PerplexityBot, which runs on AWS, respects robots.txt and does not crawl in any way that violates AWS Terms of Service.
Perplexity has been ignoring robots.txt directives, allowing third-party scrapers to violate them on behalf of its ‘answer engine.’
Deception
(30%)
The article reports on Perplexity AI's alleged scraping of websites that have forbidden access through the Robots Exclusion Protocol. The authors quote WIRED's previous findings and confirmations from various sites that Perplexity had accessed their properties despite robots.txt files. However, the authors do not disclose whether they have confirmed if Perplexity is still scraping these sites or if this was a past issue. Additionally, the article reports on Perplexity's response to WIRED's investigation and their claim that a third-party company performs web crawling and indexing services for them. The authors do not provide any evidence or confirmation of this claim, making it an unsubstantiated statement.
Perplexity CEO Aravind Srinivas responded to WIRED’s investigation first by saying the questions we posed to the company ‘reflect a deep and fundamental misunderstanding of how Perplexity and the Internet work.’
An AWS spokesperson, who talked to WIRED on the condition that they not be named, confirmed the company’s investigation of Perplexity.
The Amazon spokesperson told WIRED that AWS customers must adhere to the robots.txt standard while crawling websites.
Platnick says PerplexityBot will ignore robots.txt when a user enters a specific URL in their prompt—a use-case Platnick describes as ‘very infrequent.’
Fallacies
(80%)
The authors make an appeal to authority by stating that 'most have traditionally respected' the Robots Exclusion Protocol and quoting an AWS spokesperson. They also state that 'Digital Content Next is a trade association for the digital content industry whose members include The New York Times, The Washington Post, and Condé Nast. Last year, the organization shared draft principles for governing generative AI to prevent potential copyright violations.' This implies that these organizations have authority on the matter of web scraping and respect for robots.txt files.
[The authors] The Robots Exclusion Protocol is a decades-old web standard that involves placing a plaintext file (like wired.com/robots.txt) on a domain to indicate which pages should not be accessed by automated bots and crawlers. While companies that use scrapers can choose to ignore this protocol, most have traditionally respected it.
[The authors] Digital Content Next is a trade association for the digital content industry whose members include The New York Times, The Washington Post, and Condé Nast. Last year, the organization shared draft principles for governing generative AI to prevent potential copyright violations.
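The Robots Exclusion Protocol described in the quotes above can be illustrated with a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt content, the bot names (ExampleBot, OtherBot), and the example.com URLs are hypothetical and chosen only for illustration; they are not taken from WIRED's or any other publisher's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only).
# It blocks one named crawler entirely and blocks /private/ for everyone else.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    urls = (
        "https://example.com/articles/story",
        "https://example.com/private/draft",
    )
    for agent in ("ExampleBot", "OtherBot"):
        for url in urls:
            verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS_TXT, agent, url) else "disallowed"
            print(f"{agent} -> {url}: {verdict}")
```

The check is purely advisory: nothing in the protocol stops a crawler from skipping it, which is why the dispute turns on whether a scraper chooses to honor the file.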
Bias
(80%)
The authors of the article state that Perplexity AI is scraping websites that have forbidden access through the Robots Exclusion Protocol. While this is not a legal violation, it goes against traditional web standards and terms of service. The authors also mention that Perplexity has been accused of stealing articles and plagiarism, which could be seen as an ethical issue.
Amazon's cloud division has launched an investigation into Perplexity AI. At issue is whether the AI search startup is violating Amazon Web Services rules by scraping websites that had forbidden access through the Robots Exclusion Protocol, a common web standard.
WIRED investigations confirmed the practice and found further evidence of scraping abuse and plagiarism by systems linked to Perplexity's AI-powered search chatbot.
In April, Perplexity CEO Aravind Srinivas told Forbes that 'Citations are our currency'.
Perplexity marks up its short summaries with footnotes that are supposed to link to recent and reliable sources of real-time information.
A study conducted by GPTZero determined that Perplexity is citing AI-generated blog posts on various topics including travel, sports, food, technology and politics.
Accuracy
Perplexity maintains that its PerplexityBot, which runs on AWS, respects robots.txt and does not crawl in any way that violates AWS Terms of Service.
Deception
(30%)
The article makes several statements about Perplexity's use of AI-generated sources without explicitly stating that these are the author's opinions. This is a form of selective reporting and emotional manipulation as it presents the information in a way that implies Perplexity's actions are problematic, but does not provide enough context to fully understand the situation. The article also uses sensational language such as 'drawing information from and citing AI-generated posts on a wide variety of topics including travel, sports, food, technology and politics.' This is an exaggeration that creates a misleading impression. Furthermore, the author quotes GPTZero CEO Edward Tian stating 'If the sources are AI hallucinations, then the output is too.' This is an editorializing statement that goes beyond reporting the facts.
On average, Perplexity users only need to enter three prompts before they encounter an AI-generated source, according to the study
Searches like ‘cultural festivals in Kyoto, Japan,’ ‘impact of AI on the healthcare industry,’ ‘street food must-tries in Bangkok Thailand,’
The study determined if a source was AI-generated by running it through GPTZero’s AI detection software, which provides an estimation of how likely a piece of writing was written with AI with a 97% accuracy rate; for the study, sources were only considered AI-generated if GPTZero determined with at least 95% certainty that they were written with AI
Fallacies
(80%)
The author makes an appeal to authority by quoting the CEO of GPTZero, Edward Tian, stating that 'If the sources are AI hallucinations, then the output is too.' This statement implies that if a source is identified as being generated by AI and contains inaccurate information, then Perplexity's output will also be inaccurate. However, this does not necessarily mean that all AI-generated sources contain inaccuracies or that Perplexity's output will always be incorrect when it cites such sources.
“Perplexity is only as good as its sources.”
“If the sources are AI hallucinations, then the output is too.”
Bias
(90%)
The author expresses concern about Perplexity's citation of AI-generated sources and quotes CEO Aravind Srinivas stating 'Citations are our currency.' However, the article itself goes on to provide examples of Perplexity citing inaccurate and outdated information from AI-generated sources. This demonstrates a bias against the use of AI-generated sources as reliable information.
According to a study conducted by AI content detection platform GPTZero, Perplexity’s search engine is drawing information from and citing AI-generated posts on a wide variety of topics including travel, sports, food, technology and politics.
On average, Perplexity users only need to enter three prompts before they encounter an AI-generated source...
Perplexity is increasingly citing AI-generated blog posts on a wide variety of topics...
Perplexity is a startup aiming to create an 'answer engine' instead of a search engine.
Perplexity wants to provide answers directly, which may starve primary sources of ad revenue.
Perplexity CEO Aravind Srinivas used a fake academic identity to scrape Twitter data for research purposes.
Accuracy
Perplexity wants to provide answers directly, which may starve primary sources of ad revenue.
Perplexity's Pages product creates summaries based on primary sources and plagiarizes content without proper citation or permission.
Perplexity has been ignoring robots.txt directives, allowing third-party scrapers to violate them on behalf of its 'answer engine.'
Deception
(35%)
The article contains several examples of deception. The author uses editorializing language and makes accusations against Perplexity without providing concrete evidence. She also employs sensationalism by implying that Perplexity is a 'vampire' and a 'rent-seeking middleman.' Furthermore, the author selectively reports information by focusing on instances where Perplexity has allegedly plagiarized content from other sources, while ignoring the potential benefits of their technology. Lastly, the author makes an unfounded claim that Perplexity is surfacing AI-generated results and actual misinformation without providing any evidence.
Perplexity is now surfacing AI-generated results and actual misinformation, Forbes reports.
Perplexity has taken it a step further with its Pages product, which creates a summary ‘report’ based on those primary sources. It’s not just quoting a sentence or two to directly answer a user’s question – it’s creating an entire aggregated article, and it’s accurate in the sense that it is actively plagiarizing the sources it uses.
Perplexity is trying to create an ‘answer engine.’ The idea is that instead of combing through a bunch of results to answer your own question with a primary source, you’ll simply get an answer Perplexity has found for you.
Forbes discovered Perplexity was dodging the publication’s paywall in order to provide a summary of an investigation the publication did of former Google CEO Eric Schmidt’s drone company. Though Forbes has a metered paywall on some of its work, the premium work – like that investigation – is behind a hard paywall.
Fallacies
(50%)
The author commits several informal fallacies throughout the article. The most prominent is an appeal to emotion when she describes Perplexity as a 'rent-seeking middleman on high-quality sources' and 'a group of vampires'. She also uses hyperbole when she states that Perplexity has taken aggregation to a 'remarkable' scale. Additionally, the author commits an ad hominem fallacy when she refers to Srinivas as someone who feels free to lie whenever it is more convenient.
Perplexity is trying to create an ‘answer engine.’ The idea is that instead of combing through a bunch of results to answer your own question with a primary source, you’ll simply get an answer Perplexity has found for you.
Forbes discovered Perplexity was dodging the publication’s paywall in order to provide a summary of an investigation the publication did of former Google CEO Eric Schmidt’s drone company. Though Forbes has a metered paywall on some of its work, the premium work – like that investigation – is behind a hard paywall. Not only did Perplexity somehow dodge the paywall but it barely cited the original investigation and ganked the original art to use for its report.
Perplexity cannot generate actual information on its own and relies instead on third parties whose policies it abuses. The ‘answer engine’ was developed by people who feel free to lie whenever it is more convenient, and that preference is necessary for how Perplexity works.
Bias
(0%)
The author demonstrates a clear bias against Perplexity and its business practices. The author uses language that depicts Perplexity as unethical and dishonest, referring to it as a 'rent-seeking middleman', 'vampire', and 'answer engine' that starves primary sources of ad revenue. The author also accuses Perplexity of plagiarism, copyright infringement, and ignoring robots.txt codes. These statements are not objective facts but rather the author's opinion.
But Perplexity has taken it a step further with its Pages product, which creates a summary ‘report’ based on those primary sources. It’s not just quoting a sentence or two to directly answer a user’s question – it’s creating an entire aggregated article, and it’s accurate in the sense that it is actively plagiarizing the sources it uses.
Perplexity is now surfacing AI-generated results and actual misinformation, Forbes reports.
Perplexity is trying to create an ‘answer engine.’
Perplexity is using content from publishers to create summaries, guide pages, and even verbatim copies.
Forbes has threatened Perplexity with legal action for reposting copyrighted material.
Traditional publishers are accusing Perplexity of plagiarism and outright theft of content.
Accuracy
Perplexity maintains that its PerplexityBot respects robots.txt and does not crawl in any way that violates AWS Terms of Service.
Perplexity has come under fire for republishing the work of journalists without proper attribution.
Deception
(30%)
The article contains selective reporting and emotional manipulation. The author chooses to focus on the conflicts between publishers and Perplexity without providing a balanced perspective or mentioning any potential benefits of Perplexity's summaries. Additionally, the author uses emotionally charged language such as 'bulls--t machine' when describing Perplexity, which is an attempt to manipulate readers' emotions against the company.
The smaller companies have less to lose if a court rules against them, but the deep-pocketed giants are always ready to move in and occupy the terrain that the startups open up.
Perplexity Plagiarized Our Story About How Perplexity Is a Bulls--t Machine.
The big picture: From disputes over the use of "publicly available" material to train AI models, the argument between information providers and AI companies is now shifting to the creation of summaries, "guide" pages and even verbatim copies of previously published material.
Fallacies
(100%)
None Found At Time Of Publication
Bias
(90%)
The authors use language that depicts Perplexity as a 'bulls--t machine' and accuse it of plagiarism without providing any evidence other than the opinions of Wired. They also quote Wired's accusation that Perplexity is not obeying websites' rules for what content can be accessed by site-scraping robots, but do not mention that the problem stemmed from a third-party web crawler and not Perplexity's own bot.
Perplexity Plagiarized Our Story About How Perplexity Is a Bulls--t Machine.
The publication says Perplexity reposted it and then turned it into a podcast and a YouTube video.