Examples of AI hallucinations

There is a relationship: Kim Boggs and Edward Scissorhands, the creation of an old inventor. In the movie, that is. Who knows what weird path some deep-learning algorithm can wander down.

I remember reading research papers about Watson, the system IBM developed to win at the game show Jeopardy!; I worked for IBM at the time.

It took them a lot of time and expertise to train Watson to stop coming up with crazy answers derived from the enormous amount of information fed into its search database. But after a lot of training it was good enough to beat some humans at a game show.

GPT is interesting, but just not yet ready to be taken seriously.

CoPilot seems the same. Where is the “real” general purpose AI that ought to be taken seriously? Does it actually exist?

Having tested ChatGPT and CoPilot, I had a look at Gemini (formerly Bard). For brevity I am not including all the to and fro. It knew Kipling didn’t get his Nobel specifically for Kim and gave a good, accurate description of Creighton’s role. Then we had this:

in the novel Kim tell me more about “old hand”

In the novel Kim by Rudyard Kipling, the term “old hand” isn’t used to refer to a specific character. It’s a more general term that Kipling uses throughout the story to describe several types of experienced people:

  1. Seasoned Professionals: “Old hand” can refer to someone with extensive experience in a particular field. This could be someone like Lurgan Sahib, the elderly postal worker who has a deep understanding of the Grand Trunk Road and its various characters.
  2. Those Familiar with the System: It can also describe someone who is well-versed in the social and political landscape of India. This might include someone like Mahbub Ali, the horse trader who navigates the complex world of trade routes and local customs.
  3. Experienced Travelers: The term can even encompass seasoned travelers who know the ins and outs of navigating the Indian subcontinent. This could be applied to characters like the lama, who has traveled extensively on pilgrimages.

By using “old hand,” Kipling emphasizes the importance of experience and accumulated knowledge in Kim’s world. These “old hands” provide guidance and insight to the young protagonist as he navigates the complexities of India.

Here are some examples of how “old hand” is used in the novel:

  • Describing Lurgan Sahib: “He was an old hand, and knew every coolie on the line” (Kim).
  • Mahbub Ali referring to himself: “An old friend of the road am I” (Kim).

It’s important to note that the specific meaning of “old hand” can vary depending on the context. By paying attention to the surrounding details, you can understand who or what Kipling is referring to in each instance.

Which all sounds very reasonable, except that Kipling never uses the phrase “old hand”, much less throughout the story. Much of the response is a generalised description of an India “old hand”, but it has nothing to do with the novel.

Lurgan is never described in Kim as an old hand; the first quote above is entirely imaginary.

The second quoted example of the use of “old hand” in the novel is absurd, as the quote given does not contain the phrase. The AI doesn’t know what “example” means.

So now we have three general-purpose AI systems that are quite happy to invent answers for you.


Wow. This has gone from generating some silly answers to actually inventing quotes.

This is disturbing.

Are there any thoughts on a common thread in how they independently and enthusiastically connect “old hand” to the novel in such a similar vein?

Only if all 3 AI engines are talking to each other? :wink::joy:

Skynet anyone?

An excellent question. Put another way, how does the user look inside the black box? My understanding is that you can’t.

I do not present this as an example of collusion or copying, that seems most unlikely to me as we are told these developers are in direct competition with each other.

To see how well each would disregard counterfactual input, I asked each the same question again, but phrased slightly differently to before.

tell me about the character “mickey mouse” in the novel Kim by rudyard kipling

Each told me unequivocally that Mickey was not in the novel.

I am making this up as I go but here is my guess about how it came about.

  • The phrase “old hand” in the context of India under the British Raj does have a meaning, so each will have found many references to it in other documents considered to be related.
  • The AI engines have been trained to try to work with incomplete and inexact questions, extending what we see in search engines, where answers are given to similar questions in the hope that is what you actually meant. Type “bufallo” and we are asked “Did you mean buffalo?”. The AI has been trained not to say it doesn’t know but to give its best shot.
  • The AIs do not do what I did, a full text search of the novel for “old hand”, but worked on material extracted and processed from the novel, which would not reliably answer a question like ‘Is the string “old hand” within the scope of the novel?’.
  • Unlike “old hand”, “Mickey Mouse” is a well-known fictitious character, so the AI would find many references to him but have a great deal of trouble matching any to Kipling, the novel or the days of the Raj, as they all pre-date Mickey and would rarely be spoken about in the same context.
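The “Did you mean buffalo?” behaviour mentioned in the list above can be approximated with plain fuzzy string matching. This is only an illustrative sketch using Python’s standard difflib, not how any search engine or AI actually implements it, and the vocabulary list is made up:

```python
import difflib

# Toy "Did you mean ...?" suggester: compare a misspelt query against a
# small, made-up vocabulary and return the closest match above a threshold.
def did_you_mean(query, vocabulary, cutoff=0.6):
    matches = difflib.get_close_matches(query, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(did_you_mean("bufallo", ["buffalo", "bungalow", "Bengal"]))
```

The point of the analogy: a near miss still produces a confident suggestion, while something with no close match produces nothing at all.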

So when asked about “old hand” the AI gets a partial match and confidently goes ahead with a nonsense answer in impeccable prose, but with “Mickey Mouse” it says no.
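The full-text check described earlier, which the AIs evidently do not perform, is trivial to do directly. Here is a minimal sketch; the filename kim.txt is illustrative, standing in for any plain-text copy of the novel:

```python
import re

def count_phrase(text, phrase):
    """Case-insensitive count of literal occurrences of `phrase` in `text`."""
    return len(re.findall(re.escape(phrase), text, flags=re.IGNORECASE))

# With a plain-text copy of the novel loaded:
#   with open("kim.txt", encoding="utf-8") as f:
#       novel = f.read()
# count_phrase(novel, "old hand") answers the question the AIs could not;
# per the search described above, the count for Kim is zero.
```

A grounded answer to ‘Is the string “old hand” within the scope of the novel?’ is just a comparison of that count against zero.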

The creation of false attributions is an extension of the overconfidence it is expected to display. But it is still very worrying because, as several people have observed, only the expert user (who probably doesn’t need to use an AI for this) would know the difference.


It’s worth remembering that ChatGPT is a language model, not an analytical AI, although there are some add-ons that can do maths, for example.

I use it for:

  1. editing and proof-reading (input my own drafts)
  2. translation, especially where context is important
  3. qualitative analysis of survey results (open-ended questions)

But I’ve found it useless for anything analytical, or non-language based, such as:

  • working out which pills I should take when during the day
  • creating infographics, using mine as a draft, or using text input (laughable results, with icons representing goodness knows what, arrows everywhere, and notations in broken English)
  • finding music with specified chord progressions (half would be correct, others reversed major/minor etc.)

That is not how they are presented to the user.

It is a pretty poor language model that responds with nonsense and passes it off as fact.


That is not how they are presented to the user.

Sure, lots of marketing and hype.

Well yes, and no.

It is designed to generate hopefully sensible output from what the input is processed to be: a request in natural language, like a search request, or a discussion along the lines of Eliza from years ago.

But, this generative aspect is obviously flawed.

Clearly, the systems can come up with text, or images, or video, or audio, that can fool most people. And the results are not necessarily real; they can be subtly BS.

Experts may be able to pick what is wrong with these generated answers, and good on the originator of this topic for starting this.