AI is bad at news, BBC finds

Summary

- AI applications, including Gemini and ChatGPT, had more than half of their news story summaries rated by journalists as having "significant issues."
- AI inaccuracies include false statements about health recommendations, current officeholders, and global events.
- An upcoming Google TV feature meant to summarize news with AI will involve human oversight.

From misconstruing jokes and memes as facts to outright hallucinating output that isn't grounded in any existing information, artificial intelligence applications are infamously poor arbiters of reality. Today, the BBC published the results of a small-scale research study that quantifies the issue. In a review of a handful of AI chatbots, including Gemini and ChatGPT, journalists rated the apps' summaries of news stories and found that more than half had "significant issues of some form."

In the study, the BBC fed content from 100 of its news stories into ChatGPT, Microsoft Copilot, Gemini, and Perplexity AI. It asked for summaries of each story, then had "journalists who were relevant experts in the subject of the article" rate those summaries. According to the BBC, 51 percent of AI answers were flagged as having "significant issues of some form." Nearly one in five summaries included outright falsehoods, like "incorrect factual statements, numbers and dates."
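To make the setup concrete, here is a minimal sketch of the kind of pipeline described above: send an article's text to a chatbot API and collect the summary for later human review. The model name, prompt wording, and use of the OpenAI Python SDK are assumptions for illustration only; the BBC has not published its exact method, and the expert-rating step is human work that no script replaces.

```python
# Minimal sketch of a summarize-and-collect pipeline, assuming the OpenAI
# Python SDK (openai>=1.0). The BBC also tested Copilot, Gemini, and
# Perplexity; model choice and prompt here are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize(article_text: str) -> str:
    """Ask the model for a short summary of one news story."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[
            {"role": "system",
             "content": "Summarize the following news article accurately and concisely."},
            {"role": "user", "content": article_text},
        ],
    )
    return response.choices[0].message.content

# In the study's design, the collected summaries would then be rated by
# journalists with subject expertise -- the step this sketch cannot automate.
summaries = [summarize(article) for article in ["<article text here>"]]
```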

Specifically, the BBC cites flubs like the following:

- Gemini incorrectly said the NHS did not recommend vaping as an aid to quit smoking.
- ChatGPT and Copilot said Rishi Sunak and Nicola Sturgeon were still in office even after they had left.
- Perplexity misquoted BBC News in a story about the Middle East, saying Iran initially showed "restraint" and described Israel's actions as "aggressive."

The BBC says that Copilot and Gemini "had more significant issues" than ChatGPT or Perplexity. The outlet notes that it typically blocks AI chatbots from scraping its content, but it allowed access for these tests, which took place in December.

Not a surprising result

If you've been following AI developments for the past couple of years, the results of these tests probably won't come as a shock. After years of seemingly manic development by some of the most highly funded organizations on the planet, AI is still notoriously unreliable for many purposes. AI-powered chatbot apps like Gemini and ChatGPT all carry disclaimers urging users to check results for accuracy.

In January, Apple pulled an iOS Apple Intelligence feature meant to summarize news stories after users found results similar to the BBC's more controlled study: summaries came through jumbled or, in the worst cases, included fabricated details. An upcoming Google TV feature is set to include AI-summarized news stories, but, according to Google, there will also be human involvement. That seems like the way to go -- though if a human has to evaluate an AI-generated summary for accuracy, that human may as well write the summary in the first place.
