Course: AI for education > Unit 2
Lesson 2: AI in the classroom: Promising practices
© 2024 Khan Academy
Setting realistic expectations
A quick overview of what large language models do well, and what they don't do so well—yet.
What should I expect?
Large language models like OpenAI's ChatGPT are computer programs that have been trained on huge data sets to extract meaning from text and produce language. These models can do some things very well, but they also have some limitations!
You probably wouldn't try to use a screwdriver to hammer a nail, right? When we use LLM-based chatbot tools for tasks they were designed for, we get much better results!
What do LLMs do well?
Good: Language processor
Large language models are great at extracting meaning from language. They don't "understand" text in a human sense, but they can ingest it and make sense of it, even if it's written in a way that's not perfectly clear. They've been trained on so much data that they've learned to recognize patterns and, from those patterns, the meaning of words in context.
Simplifying and summarizing long, complex text is one of AI’s superpowers. It is good at extracting key takeaways, but it is always a good idea to double-check that it hasn't missed any critical points!
Good: Text generator
Large language models are also good at generating text. They can take a prompt and write a paragraph or even an entire article that sounds like it may have been written by a human—a human with really good grammar skills!
Good: Brainstorming partner
Teachers are already using LLMs to help them come up with ideas for their classrooms! Given a clear request and a couple of examples, LLMs can generate multiple variations on ideas, like possible class activities, interesting thesis statements, or drafts of quiz questions.
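"A clear request and a couple of examples" is a prompting pattern often called few-shot prompting. A minimal sketch of how such a request could be assembled in Python, using the role/content message format common to chat-style LLM APIs (the teacher name-free example activities and the helper function are invented for illustration):

```python
# Build a few-shot brainstorming prompt: a clear request plus two worked
# examples of the style we want. The message format (role/content dicts)
# mirrors what many chat-style LLM APIs accept; the activities are made up.

def brainstorm_messages(topic: str, examples: list[str]) -> list[dict]:
    """Return a chat-message list asking an LLM for class activity ideas."""
    shots = "\n".join(f"- {e}" for e in examples)
    return [
        {"role": "system",
         "content": "You are a helpful brainstorming partner for teachers."},
        {"role": "user",
         "content": (
             f"Suggest three class activities about {topic}.\n"
             f"Here are two examples of the style I like:\n{shots}"
         )},
    ]

messages = brainstorm_messages(
    "the water cycle",
    ["Act out evaporation and condensation as a class skit",
     "Build a mini terrarium and observe it for a week"],
)
print(messages[1]["content"])
```

The examples do most of the work here: they show the model the tone, length, and level you expect, so its variations tend to land closer to what you can actually use.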
What are LLMs not as good at?
Not so good: They make things up!
While large language models can process language well on common topics, they sometimes give wrong information with complete confidence—in effect, making things up. People in the AI and LLM business call these errors “hallucinations.” This can happen for a number of reasons:
- Faulty training data: The huge datasets that LLMs are trained on can contain millions of words, and they often come from a variety of sources, including articles, books, the Internet, and even social media posts. If the training data contain inaccuracies, the model will inherit those mistakes. If the training data are messy or inconsistent, the AI can infer patterns that don't actually exist, or misinterpret information.
- Old training data: It can take a long time to assemble data on which to train a model, and it takes time to actually do the training. LLMs can't just be "updated" with "whatever is new on the internet." So, the model won't know anything about things that have occurred in the recent past—sometimes up to two years in the past. When an LLM's training data don't give it a basis for a fact-based response, the LLM will hallucinate. Some search engines are working on this by connecting models to the internet, but you shouldn't assume that every model you interact with has this capability.
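The training-cutoff problem can be made concrete with a tiny sketch: before trusting a model's answer about a dated event, compare the event's date to the model's published knowledge cutoff. The cutoff date below is hypothetical; check the model card of whatever model you actually use.

```python
from datetime import date

# Hypothetical knowledge cutoff -- consult the real model's documentation.
MODEL_CUTOFF = date(2023, 4, 1)

def might_be_stale(event_date: date) -> bool:
    """True if an event happened after the model's training cutoff,
    meaning the model cannot know about it and may hallucinate."""
    return event_date > MODEL_CUTOFF

# An event after the cutoff: the model's training data can't cover it.
print(might_be_stale(date(2024, 6, 1)))
```

The point is not the code itself but the habit it encodes: "is this question about something newer than the model's training data?" is the first thing to ask before trusting a factual answer.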
Not so good at math!
Large language models do not, on their own, make calculations. When LLMs are asked to generate math, they generate it the same way they generate text: probabilistically. Because of this, they can sometimes make mistakes when working with simple arithmetic or more advanced mathematical concepts.
Mistakes can also happen when the model is asked to generate text that includes numbers or calculations. If the training data contain incorrect calculations, the model may replicate those errors—confidently asserting, say, a wrong answer to a simple addition problem.
Not so good: Fake websites and other “hallucinations”
As mentioned above, if an LLM doesn’t have the data it needs to generate a correct response, it may make up a convincing one—these “hallucinations” can happen frequently:
- Fake websites: It may refer you to a URL, but the webpage doesn’t actually exist.
- Wrong websites: It might provide a link to a site that is completely unrelated to the topic.
- False citations: It might provide as a source a work that never existed, or claim that two real authors who have never collaborated are co-authors of a study—or invent fake names for authors, or fake titles for articles, research studies, or books!
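One low-tech defense against all three failure modes is to verify every citation an LLM hands you against a source you trust before using it. A sketch of that habit in code, where `TRUSTED_INDEX` is a hypothetical stand-in for a real library catalog or database lookup, and all titles and authors are invented:

```python
# Check LLM-supplied citations against a trusted index before trusting them.
# TRUSTED_INDEX is a hypothetical stand-in for a real catalog lookup; the
# title and author names below are invented for illustration.
TRUSTED_INDEX = {
    "a study of classroom ai": {"j. rivera", "p. okafor"},
}

def citation_checks_out(title: str, authors: list[str]) -> bool:
    """Return True only if the work exists AND the claimed authors match."""
    known_authors = TRUSTED_INDEX.get(title.strip().lower())
    if known_authors is None:
        return False                      # possible fake title
    return {a.strip().lower() for a in authors} <= known_authors

# A citation a model might hallucinate: plausible-looking, but not indexed.
print(citation_checks_out("AI and the Future of Homework", ["J. Rivera"]))  # False
```

In practice the "index" would be a library catalog, a journal database, or simply a web search for the exact title—the code only makes the verification step explicit.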
Not so good: Doesn't have deep understanding of specialized concepts
While large language models can process language well on common topics, they're not always as effective when discussing the details of highly specialized concepts. For example, they might struggle to accurately identify and explain the nuances of a complex medical procedure. When pushed, they will start to make things up (see the hallucination examples above).
Not so good: Doesn't have your context
This may sound obvious, but the models don't have all the information about you and your environment. If you are a student or teacher in a school, the model doesn't know about the sequence of lessons this week, who is having a bad day, or that you never really understood that one idea in science. So, it may suggest ideas or generate writing that does not make sense for you or your class.
Summary: Don’t trust! Verify!
LLMs are specialized productivity tools: Learn how to use them to help you be more productive. Don't ask them for answers to things they can't know, or they will make things up. Collaborate with them, but don't depend on them to create.
Forewarned is forearmed: The only way to guard against the hallucinations of an LLM is to make a habit of fact-checking everything it tells you.
The takeaway: Overall, large language models are good at processing language, generating text, and answering questions. However, they struggle with complex concepts and reasoning, and they don’t have “judgment.”
As these models continue to develop, though, they may improve in these areas!
Want to join the conversation?
- If it is constantly making serious errors and "hallucinations," what is it good for?
- I wouldn't say that it is constantly making serious errors and hallucinations. The key point to note is that hallucinations are likely only when you ask an LLM about something for which it has little training data. The training data set for a modern LLM like GPT-4 is huge: the model reportedly has over 1.7 trillion parameters and was trained on more than a petabyte (over 1 million GB) of data. So it can provide good answers to lots of different prompts, as long as you're not asking about a highly specialized topic.
Khan Academy's AI assistant, Khanmigo, receives additional context prompting with information about the content and course that a learner is currently studying. This allows Khanmigo to make well-informed (and generally correct, in my experience) responses when asked about an article, video, or exercise that the student is currently on.
- How large is the database storage for these?! Although text takes up less room than images do, it still must be astronomical! Wow! Just wow! :-)
- The training data size of GPT-4 is reportedly over 1 petabyte. That's over 1 million gigabytes!
- Would it be possible to have AI that updates itself from the internet without human input?
- Yes, but without a human filtering the training data, the AI would learn all sorts of bad habits.
- Is this going to help our students or harm them? How will it help and encourage English Language learners?
- How many years will it take for AI to become perfect?
- There is really no answer to that question. Only time will tell. It also depends on what you consider "perfect" AI.
- How will LLMs and English Language learners be able to communicate effectively with each other?
- Some LLMs, such as GPT-4, are able to understand and respond in multiple languages since they have been trained on data sets in multiple languages.
One of the good things about LLMs is that they are remarkably good at extracting meaning from text, even if there are mistakes or grammatical errors. Given some prompting, they are also able to generate multiple responses to the same question at different reading and vocabulary levels.
- Can these tools be used to check if their sources are correct?
- How can AI not be good at math unless whoever programmed it seriously goofed? Like computers, AI is only as good as its human programmers.
- AI is programmed to create a probabilistic answer to your question, and it is trained only on generating text. True, the AI can read numbers, but it wasn't trained to do math. The math it produces may just be copied from somewhere in its training data, not from its own calculations. However, it is starting to become better at math.
- How do you use AI in a school environment?
- Why not train on facts rather than language? Wouldn't this eliminate the hallucinations, lessen the biases, and allow for more accurate results? Is that the next step in AI's evolution?