The report titled 'Controversial Practices of Perplexity AI: Scraping, Plagiarism, and Legal Scrutiny' examines the controversies surrounding the artificial intelligence startup Perplexity AI. The central allegations are unauthorized content scraping, plagiarism, and the fabrication of quotes, all of which have drawn substantial media and legal attention. Following reporting by WIRED, Amazon Web Services (AWS) opened a review of whether Perplexity's practices violate its terms of service, while outlets such as Forbes have accused the company of plagiarism. Perplexity AI, backed by notable investors including Jeff Bezos and Nvidia, now faces heightened scrutiny and legal threats from content publishers. The company maintains that it complies with AWS's terms of service and that its operations amount to aggregation rather than illegal content scraping.
Perplexity AI is an artificial intelligence startup based in San Francisco, California. The company is known for search tools that give users instant answers to questions, complete with sources and citations, drawing on a variety of large language models, including models from OpenAI and Meta's open-source Llama. Founded with the aim of rivaling Google in information search, Perplexity AI has grown quickly, serving over 500 million queries in 2023 with minimal marketing spend.
Perplexity AI has attracted significant financial backing from several high-profile investors. In January, the company raised $73.6 million from a group of investors that included Nvidia and Jeff Bezos, at a valuation of $520 million. More recently, SoftBank Group Corp's Vision Fund 2, the Japanese technology investor's fund, planned to invest between $10 million and $20 million in Perplexity AI as part of a larger $250 million funding round valuing the company at $3 billion. SoftBank and Perplexity AI declined to comment on the details, but the sums involved point to strong investor confidence in the startup's potential.
Perplexity AI aims to establish itself as a leader in the search market, challenging industry giants like Google. Its AI-driven search tools are designed to give users quick, accurate answers drawn from reputable sources. The startup's ability to raise substantial funding, together with usage figures such as serving over 500 million queries in a year, positions it strongly in a competitive market. With backing from prominent technology investors, Perplexity AI's market position appears robust despite the current controversies surrounding its practices.
A WIRED investigation accused Perplexity AI of scraping content from websites without permission and of generating fake quotes. WIRED's analysis, together with findings by developer Robb Knight, suggested that Perplexity's bot was ignoring the Robots Exclusion Protocol (robots.txt) to access parts of websites that publishers had blocked. Perplexity was observed using an unpublicized IP address for this purpose; that address reportedly visited Condé Nast properties, including WIRED, at least 822 times over three months.
Following WIRED's findings, Amazon Web Services opened a review. Amazon spokesperson Samantha Mayowa stated that the company routinely examines reports of abuse involving its services and reiterated that its terms of service prohibit abusive and illegal activities. The review followed allegations that Perplexity AI was using AWS servers to host a crawler that ignored the Robots Exclusion Protocol. Perplexity spokesperson Sara Platnick maintained that the company's services complied with AWS terms and that it had responded to Amazon's inquiries.
Perplexity AI, through its CEO Aravind Srinivas and spokesperson Sara Platnick, denied any wrongdoing. Srinivas stated that the company's operations did not involve illegal scraping of content and emphasized that it only aggregated information generated by other AI systems. He did, however, acknowledge Forbes' call for more prominent source citations. Perplexity also admitted to using third-party web crawlers, which could include the bot identified by WIRED. Platnick confirmed that while PerplexityBot generally respects the Robots Exclusion Protocol, it may ignore the protocol when a user includes a specific URL in a query.
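To make the protocol at the center of these allegations concrete: a crawler that honours robots.txt reads the site's rules and checks each URL against the entry for its user-agent before requesting it. The sketch below uses Python's standard `urllib.robotparser` module with a hypothetical robots.txt (the `publisher.example` domain and its rules are invented for illustration). The `should_fetch` function is likewise hypothetical; it only models the exception Platnick described, where a URL supplied directly in a user query bypasses the check. It is not Perplexity's actual code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher site:
# PerplexityBot is blocked site-wide, every other agent only from /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def should_fetch(user_agent: str, url: str, user_requested: bool,
                 robots_txt: str = ROBOTS_TXT) -> bool:
    """Hypothetical policy modelling the reported exception: skip the
    robots.txt check when the URL came directly from a user's query."""
    if user_requested:
        return True
    return is_allowed(user_agent, url, robots_txt)


if __name__ == "__main__":
    story = "https://publisher.example/story.html"
    print(is_allowed("PerplexityBot", story))   # blocked by the site-wide rule
    print(is_allowed("SomeOtherBot", story))    # allowed: only /private/ is blocked
    print(should_fetch("PerplexityBot", story, user_requested=True))
```

The point of the sketch is that robots.txt is purely advisory: nothing in the protocol stops a client from skipping `can_fetch`, which is why the dispute turns on whether a crawler chooses to honour it, and in which situations.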
On June 6, 2024, Forbes published an investigative article on Eric Schmidt's AI-drone startup. The following day, Perplexity AI published an AI-generated webpage using its new 'Perplexity Pages' feature that drew heavily on this article without prominent attribution. John Paczkowski, an executive editor at Forbes, criticized Perplexity on social media for citing Forbes only alongside a few reblogs of its story, in a way that marginalized Forbes' original reporting. Perplexity then updated the webpage to cite Forbes' work more clearly. However, an AI-generated Perplexity podcast on the same topic failed to mention Forbes at all, and Forbes went on to accuse Perplexity of plagiarism, alleging that it had also lifted content from CNBC and Bloomberg. WIRED likewise found Perplexity paraphrasing its stories inaccurately and without proper attribution, while potentially bypassing publisher settings intended to block AI web scraping. Perplexity's CEO, Aravind Srinivas, responded that the company acts as an aggregator of information rather than an outright content ripper.
The Associated Press uncovered instances in which Perplexity AI's products fabricated quotes. In one case, Bill Rossi, a former town official from Martha's Vineyard, was falsely quoted about his views on marijuana legalization; Rossi denied ever making the statements attributed to him. Srinivas acknowledged that the service's 'Writing' feature, which composes text without searching the web, was more prone to 'hallucinations' that produce fabricated information. Separately, Forbes has threatened legal action against Perplexity for using its content without permission, demanding changes to how Perplexity cites sources and reimbursement of any advertising revenue generated from that content.
The controversy around Perplexity AI has drawn significant media attention. Reports from Forbes, WIRED, and the Associated Press have highlighted alleged content misuse and fabricated quotes. Srinivas has acknowledged some of these issues, attributing them to 'rough edges' in Perplexity's early-stage products. Despite updates to improve source citation, Perplexity continues to face scrutiny and legal threats. The company has expressed interest in revenue-sharing partnerships that would compensate news publishers when their content is used. Media critics suggest that while these improvements are steps forward, a more fundamental respect for original content creators is needed.
The controversy surrounding Perplexity AI has significant implications for journalism and content ownership. According to reports from the Associated Press and WinBuzzer, the startup has been accused of scraping content from websites without permission, violating terms of service, and plagiarizing articles from outlets such as Forbes. Such conduct, if substantiated, undermines the value and ownership of original journalistic work and raises concerns about intellectual property rights and the ethical use of content in AI systems.
The industry is divided on content scraping and reuse. Mustafa Suleyman, head of Microsoft's AI division, argued that content on the open web is free to use, describing this as a long-standing assumption dating back to the 1990s. This view, however, conflicts with U.S. copyright law, which protects original works automatically once they are fixed in a tangible medium, with no registration required. Suleyman conceded that websites using robots.txt to opt out of scraping should have that preference respected, but this adds a layer of complexity, since there is no universal consensus on how such preferences apply to AI systems. The debate highlights the tension between advancing AI technologies and respecting intellectual property rights.
Perplexity AI faces several legal and regulatory challenges. Amazon Web Services is investigating whether the startup breached its terms of service through unauthorized web scraping. Perplexity has defended itself, stating that its services comply with AWS terms. Meanwhile, allegations of fabricated quotes and plagiarized content continue to draw scrutiny. This turmoil underscores the need for AI firms to navigate the complex landscape of content ownership and legal compliance.
Perplexity AI's challenges highlight the critical intersection between AI technology and content ownership. Despite substantial investment from prominent backers such as Jeff Bezos and Nvidia, the startup is under fire for alleged content misuse, including unauthorized scraping, plagiarism, and fabricated quotes. These accusations, coupled with legal threats and an investigation by AWS, underscore the importance of ethical standards in AI-driven content generation. Criticism from WIRED and other outlets invites a broader reevaluation of content usage practices across the AI industry. Going forward, firms like Perplexity AI will need frameworks that respect intellectual property rights while advancing AI technologies. Future research should examine the long-term implications for the AI sector and how content ownership standards are enforced. Practical solutions might include revenue-sharing partnerships with content creators to ensure fair compensation and respect for original work.
Perplexity AI: An AI startup focused on developing a search engine that aggregates content from various sources. The company has faced allegations of unauthorized content scraping, plagiarism, and generating fake quotes. Perplexity AI has nonetheless secured significant funding from investors, including Jeff Bezos.
Amazon Web Services (AWS): A subsidiary of Amazon providing on-demand cloud computing platforms. AWS is investigating Perplexity AI for potential violations of its terms of service related to unauthorized content scraping. The case is relevant to understanding how cloud providers enforce content ownership standards.
Jeff Bezos: Founder of Amazon and an investor in Perplexity AI. His involvement brings significant attention to the startup and its controversies, and his backing underscores the connection between major tech investments and emerging AI technologies.
WIRED: A technology-focused publication that has been at the forefront of reporting on Perplexity AI's alleged content scraping practices. Its investigations have played a key role in bringing the ethical concerns surrounding Perplexity AI to light.
Forbes: A global media company known for its business and technology news. Forbes has accused Perplexity AI of plagiarism and improper citation of its content, leading to legal threats and increased scrutiny of the startup's practices.