“Machine entities who have as much, or more, intelligence as human beings and who have the same emotional potentialities in their personalities as human beings… Most computer theorists believe that once you have a computer which is more intelligent than man and capable of learning by experience, it’s inevitable that it will develop an equivalent range of emotional reactions — fear, love, hate, envy, etc.”
Stanley Kubrick, quoted in “Stanley Kubrick’s astute prediction regarding the future of AI” [faroutmagazine.co.uk], published in Far Out magazine
If Stanley Kubrick was “right” about AI… and they are referencing 2001 (not A.I. Artificial Intelligence), then please let’s stop AI now. It didn’t end well for Dave or Frank or anyone else on the Discovery.
But I think we are a long way from AIs “learning by experience”. Today there is an iterative process to training the current batch of AIs, but it’s humans who are still learning what needs to be improved and making adjustments to the system based on that experience. It is, in fact, what and how they learn that I think is going to be the problem. Maybe in fixing that we will make a true AI like all those AIs in sci-fi, one that can truly learn from experience, and then it may well go down a dark path just like HAL, like Ultron, like Skynet and many more.
Today, we do seem to be running full steam into a risk of a different kind: not a true thinking machine that decides to destroy us, but a downward spiral where we are undone by AIs that are not actually intelligent, but are tricking us with what amounts to mind-reading parlor tricks turned up to 11 with massive amounts of computing power. The AIs that are in the news are just mentalists predicting that you are thinking of a gray elephant from Denmark. They don’t rely on a carefully constructed series of questions that will lead almost everyone to think of a gray elephant from Denmark, but it’s the same principle and the same tool set, statistics, that allows them to answer any question you ask. They only decide, based on their statistics, what the next letter, or word, or phrase (the next ‘token’) of the output should be. And of course the statistics are very complex. ‘h’ does not always follow ‘t’, with ‘e’ after that; sometimes, given the context of the input and where the AI is in the output, ‘e’, then ‘x’ and then ‘t’ could follow ‘t’.
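To make the “statistics” idea a bit more concrete, here is a toy sketch in Python. It is not how a real LLM works inside (those use huge neural networks, not lookup tables), and the training text, the function names `build_counts` and `most_likely_next`, and the context lengths are all made up for illustration. It only shows the basic principle: count what tends to come next, and notice that the answer changes once you look at more context.

```python
from collections import Counter, defaultdict


def build_counts(text, context_len):
    """Count which character follows each context of `context_len` characters."""
    counts = defaultdict(Counter)
    for i in range(context_len, len(text)):
        context = text[i - context_len:i]
        counts[context][text[i]] += 1
    return counts


def most_likely_next(counts, context):
    """Return the most frequent follower of `context`, or None if it was never seen."""
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text, purely for illustration.
corpus = "the theory then the text the next context"

one_char = build_counts(corpus, 1)  # looks only at the previous letter
two_char = build_counts(corpus, 2)  # looks at the previous two letters

print(most_likely_next(one_char, "t"))   # best guess after a lone 't'
print(most_likely_next(two_char, "te"))  # best guess after 'te'
print(most_likely_next(two_char, "ex"))  # best guess after 'ex'
```

In this tiny text, ‘h’ is the most common letter after a lone ‘t’, but once the table can see ‘te’ it picks ‘x’, and after ‘ex’ it picks ‘t’. Nothing in there knows what a “text” is; it is counting all the way down.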
It’s how the statistics the AI relies on are created, and how we are starting to use these AIs, that creates the issue: the issue of feedback loops. The current batch of headline-stealing AIs, based on Large Language Models (LLMs), are trained on content scraped from the internet, the same way that Google or Bing scrape the internet and build ‘indexes’ that allow you to find the particular needle you are looking for. By sucking down all the text they can from the internet and processing it to build their indexes, search engines can quickly match your input in the search box and give you the results. Over the years much work has gone into adding things to the ‘algorithm’ that creates the index and processes your input to make the results better: fixing spelling mistakes and typos, looking for related terms, ranking results based on how many other sites link to the result, etc. AI is doing a similar thing, taking your input and returning results, but rather than returning a web page, the AIs are constructing ‘new’ output based on the statistics they calculated from their web scraping. The fatal feedback will come as people, and companies, start to use AIs to generate the very content that makes up the internet. The AIs will start to eat themselves and their kids. AI cannibalism.
People already have a massive issue with identifying correct information on the internet. Social media is a dumpster fire of self-proclaimed experts who are at best misguided and delusional, and at worst deceitful and nefarious. LLMs trained on this poor-quality data may learn internet colloquial syntax and vocabulary, they may be able to speak well and sound like they know what they are talking about, but they are not learning any other subject. They are not able to tell right from wrong or correct from incorrect. They didn’t study medicine or history, only the structure of language on the internet. The vast size of the models, the volume of training data and the clever tricks of the researchers and developers impress us, but it’s just a better mentalist. LLMs have only a statistical sense of what to say, not any understanding or knowledge of why it should be said. Ironically, what the AIs actually lack is intelligence.
This is quickly becoming a problem as people and companies embrace LLMs to generate content faster and cheaper, to drive traffic to their websites and earn advertising money. Without human experts in the loop to review and revise the AI’s output you end up with hallucinations, or “a confident response by an AI that does not seem to be justified by its training data,” as Wikipedia [wikipedia.org] explains. Wikipedia also gives an example: “a hallucinating chatbot might, when asked to generate a financial report for Tesla, falsely state that Tesla’s revenue was $13.6 billion (or some other random number apparently “plucked from thin air”).”
Again, the problem is that the LLM lacks any knowledge or understanding of the subject. It can’t analyze the question or its own answer except from the point of view that, statistically, based on its training data, it should output the tokens “13.6 billion” in this situation. It’s amazing how much they do get right, how lucid they seem. This is down to their size and complexity. But if people blindly accept the AI’s output and post these hallucinations to the internet (even to point out they are wrong, as I’m doing) then the next batch of training data will be polluted with yet more inaccurate data, and over time the whole model may start to be overwhelmed by some sort of delirium, a digital mad cow disease.
Mad Cow Disease [wikipedia.org] was caused by (or really spread by) feeding cows the remains of their own dead. To save money and reuse the parts of the cow with no other use, the remains were ground up and added to cow feed. This allowed, it seems, a random harmful mutation in a prion, maybe only in a single cow, to be ingested by more cows who were, in turn, ground up, and the cycle repeated and the disease spread. Now the LLMs will be fed their own output and the output of competing AIs in the next training cycle. So a hallucination from one AI makes its way, like a defective prion, into the model of another AI to be regurgitated, and the cycle repeats, allowing this digital mad cow disease to spread. And like real-world mad cow disease, humans who digest this infected content may come down with a digital analog of Variant Creutzfeldt–Jakob Disease (vCJD) [wikipedia.org]. Let’s hope digital vCJD is not irreversible and 100% fatal like its namesake.
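For what it’s worth, the feedback loop can be sketched with some back-of-the-envelope arithmetic. This is a toy model I made up to illustrate the argument, not a measurement of anything: the function names and every number in it (`human_error`, `ai_share`, `hallucination_rate`) are assumptions. It supposes that each generation of AI repeats whatever errors were in its training data, adds a few fresh hallucinations on top, and that the next training set is a mix of human-written and AI-written content.

```python
def next_error_rate(prev_error, human_error, ai_share, hallucination_rate):
    """Fraction of wrong content in the next training set (toy model)."""
    # Assumption: the AI repeats the errors it was trained on and
    # hallucinates on a fixed share of everything else.
    ai_output_error = prev_error + (1.0 - prev_error) * hallucination_rate
    # Assumption: the next scrape is a blend of human and AI content.
    return (1.0 - ai_share) * human_error + ai_share * ai_output_error


def simulate(generations, human_error=0.05, ai_share=0.5, hallucination_rate=0.05):
    """Return the error rate of each successive training set, starting from human-only data."""
    rates = [human_error]
    for _ in range(generations):
        rates.append(
            next_error_rate(rates[-1], human_error, ai_share, hallucination_rate)
        )
    return rates


# Compare a web that is half AI-written with one that is entirely AI-written.
for share in (0.5, 1.0):
    rates = simulate(10, ai_share=share)
    print(share, [round(r, 3) for r in rates])
```

Under these assumptions, as long as humans keep writing a good share of the content the error rate climbs but levels off. When `ai_share` is 1.0, meaning the AIs are fed nothing but AI output, each generation is a little worse than the last and nothing pulls it back.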
Maybe the people who design these systems will figure out how to inject some sort of fact checking and guidelines. But who decides what is right when we can’t even agree on facts? The internet is filled with bigots and conspiracy theorists, well-meaning idiots and scam artists, trolls and shills. How will AIs be any different? Maybe we will find a way to regulate things so AIs can’t be fed their own verbal diarrhea, but who decides what is right and wrong? Who decides what is morally or socially acceptable?
This post is a good example of the problem. Should LLMs use it as training data? It’s pure opinion by someone unqualified. I studied the principles of AI and built neural networks back in college, but I’m not an expert. The content of this post could be useful to an AI answering a question about the perception of the problem among the general public or their concerns, but it should not be used as fact or confused with expert opinion; it should not be used to answer a question about “what are the risks of training AIs on data scraped from the internet”. How does an AI trained on the internet know the difference? We seem to have jumped out of the plane without checking if we packed the parachute.
One note on the article where I got the quote: I take issue with the fact that it talks about all the effort Kubrick put in to make 2001 as accurate as possible, speaking with experts for “countless hours”, but fails to mention that 2001 was co-written with an actual scientist – Arthur C. Clarke.