By now I’m sure you’ve heard that the Internet is obsessed with ChatGPT. I’ve been watching from the sidelines as people find more and more uses for our current favorite large language model (LLM) toy. Why a toy and not a full-blown solution to all our ills? Because ChatGPT has one glaring flaw that I can see right now, and it betrays its immaturity. ChatGPT knows everything. Or at least it thinks it does.
Unknown Unknowns
If I asked you the answer to a basic trivia question you could probably recall it quickly. Like “who was the first president of the United States?” These are answers we have memorized over the years to things we are expected to know. History, math, and even written communication have questions and answers like this. Even in an age of access to search engines we’re still expected to know basic things and have near-instant recall.
What if I asked you a trivia question you didn’t know the answer to? Like “what is the name of the metal cap at the end of a pencil?” You’d likely go look it up on a search engine or on some form of encyclopedia. You don’t know the answer so you’re going to find it out. That’s still a form of recall. Once you learn that it’s called a ferrule you’ll file it away in the same place as George Washington, 2+2, and the aglet as “things I just know”.
Now, what if I asked you a question that required you to think a little more than just recalling info? Such as “Who would have been the first president if George Washington refused the office?” Now we’re getting into murkier territory. Instead of instantly recalling information you’re going to have to analyze what you know about the situation. Most people who aren’t history buffs might recall who Washington’s vice president was and answer with that. History buffs might bring more specialized knowledge to bear, apply additional facts, and infer a different answer, such as Jefferson or even Samuel Adams. They’re adding more information to the puzzle to come up with a better answer.
Now, for completeness’ sake, what if I asked you “Who would have become the Grand Vizier of the Galactic Republic if Washington hadn’t been assassinated by the separatists?” You’d probably look at me like I was crazy and say you couldn’t answer a question like that because I made up most of that information or I’m trying to confuse you. You may not know exactly what I’m talking about, but you know, based on your knowledge of elementary school history, that there is no Galactic Republic and George Washington was definitely not assassinated. Hold on to this because we’ll come back to it later.
Spinning AI Yarns
How does this all apply to an LLM? The first thing to realize is that LLMs are not replacements for search engines. I’ve heard of many people asking ChatGPT basic trivia and recall-type questions. That’s not what LLMs are best at. We have a multitude of ways to learn trivia and none of them need the power of a cloud-scale computing cluster interpreting inputs. Even asking that trivia question of a smart assistant from Apple or Amazon is a better way to learn.
So what does an LLM excel at doing? Nvidia will tell you that it is “a deep learning algorithm that can recognize, summarize, translate, predict and generate text and other content based on knowledge gained from massive datasets”. In essence it can take a huge amount of input, recognize certain aspects of it, and produce content based on the requirements. That’s why ChatGPT can “write” things in the style of something else. It knows what that style is supposed to look and sound like and can produce an output based on that. It analyzes its training data and comes up with results, using prediction to create grammatically correct output. Think of it like Advanced Predictive Autocorrect.
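To make the “Advanced Predictive Autocorrect” analogy concrete, here’s a minimal sketch in Python. It’s a toy bigram model I made up purely for illustration, nowhere near how ChatGPT actually works: it counts which word follows which in a tiny corpus and then keeps emitting the most likely next word, with nothing anywhere checking whether the output is true.

```python
from collections import Counter, defaultdict

# Toy "predictive autocorrect": learn which word follows which in a tiny corpus,
# then always emit the most likely next word. Real LLMs are vastly larger and
# smarter, but the core job -- predict the next token -- is the same, and note
# that nothing here checks whether the output is actually true.
corpus = "the first president of the united states was george washington".split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def generate(start, length=8):
    words = [start]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:
            break  # the model has nothing to say about this word, so it stops
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))  # prints a fluent-looking phrase built purely from word statistics
```

The output reads like English because the statistics say it should, not because anything verified it. That gap is the whole problem.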
If you think I’m oversimplifying what LLMs like ChatGPT can bring to the table then I challenge you to ask it a question that doesn’t have an answer. If you really want to see it work some magic, ask it something oddly specific about something that doesn’t exist, especially if that process involves steps or can be broken down into parts. I’d bet you get a confident answer at least as often as you get back an error message.
To me, the problem with ChatGPT is that the model is designed to produce an answer unless it has specifically been programmed not to do so. There are a variety of answers that the developers have overridden in the algorithm, usually something racially or politically sensitive. Otherwise ChatGPT is happy to spit out lots of stuff that looks and sounds correct. Case in point? This gem of a post from Joy Larkin of ZeroTier:
https://mastodon.social/@joy/109859024438664366
Short version: ChatGPT gave a user instructions for a product that doesn’t exist, and the customer was very frustrated when they couldn’t find the software to download on the ZeroTier site. The LLM just made up a convincing answer to the question, inventing something that doesn’t exist. Just to satisfy the prompt.
Does that sound like a creative writing exercise to you? “Imagine what a bird would look like with elephant feet.” Or “picture a world where people only communicated with dance.” You’ve probably gone through these exercises before in school. You stretch your imagination to take specific inputs and produce outputs based on your knowledge. It’s like the applied history example above. You take inputs and produce a logical outcome based on facts and reality.
ChatGPT is immature enough to not realize that some things shouldn’t be answered. If you use a search engine to find the steps to configure a feature on a product, the search algorithm will return a page that has the steps listed. Are they correct? Maybe. It depends on how popular the result is. But the results will include a real product. If you search for nonexistent functionality or a software package that doesn’t exist, your search won’t have many results.
ChatGPT doesn’t have a search algorithm to rely on. It’s based on language. It’s designed to approximate writing when given a prompt. That means, aside from things it’s been programmed not to answer, it’s going to give you an answer. Is it correct? You won’t know. You’d have to take the output and send it to a search engine to determine if that even exists.
The danger here is that LLMs aren’t smart enough to realize they are creating fabricated answers. If someone asked me how to do something that I didn’t know I would preface my answer with “I’m not quite sure but this is how I think you would do it…” I’ve established a frame of reference: I’m not familiar with the specific scenario and I’m drawing on inferred knowledge to complete the task. Or I could just answer “I don’t know” and be done with it. ChatGPT doesn’t understand “I don’t know” and will respond with answers that look right according to the model but may not be correct.
Tom’s Take
What’s funny is that ChatGPT has managed to create an approximation of another human behavior. Anyone who has ever worked in sales knows one of the maxims is “never tell the customer ‘no’”. In a way, ChatGPT is like a salesperson. No matter what you ask it, the answer is always yes, even if it has to make something up to answer the question. Sci-fi fans know that in fiction we’ve built guardrails for robots to keep society from being harmed by their functions. AI, no matter how advanced, needs protections from approximating bad behaviors. It’s time for ChatGPT and future LLMs to learn that they don’t know everything.