Google Books, which indexes published material and has become essential for academics, has begun indexing low-quality books, which could affect the data behind its language-tracking tool, Ngram.
404Media reports that Google Books includes several books that appear to have been written by AI. The publication searched Google Books for the phrase “as of my last knowledge update,” which chatbots like ChatGPT commonly use. Google Books can be searched for specific sentences or terms, and it normally returns written works that contain them.
The publication found that most of the books in the first few pages of results were works about AI, but scattered among them were books that did not discuss the technology and seemed to have been written by a bot.
404Media said the books it found, like Tristin McIver’s Bears, Bulls, and Wolves: Stock Trading for the Twenty-Year-Old, looked as though they had trawled Wikipedia for information about financial events, and they did include the sentence “as of my last knowledge update.” Other books, on topics like Twitter, contained information only up to 2021, when some AI models would have last received training data.
Google Books supplies most of the data behind Google's Ngram viewer, a research tool that draws on written works to show how language usage has changed over time.
Google Books has scanned and indexed written works dating back to the 1500s, and Ngram last updated the data it cites in 2019. Though Ngram is not perfect, many linguists and other academics use the tool in their research.
Google told 404Media that recent works on Google Books do not show up in Ngram results, but it is possible that they will make it into future data updates.