Course: AI for education > Unit 2
Lesson 2: AI in the classroom: Promising practices
© 2024 Khan Academy
Setting realistic expectations
A quick overview of what large language models do well, and what they don't do so well—yet.
What should I expect?
Large language models like OpenAI's ChatGPT are computer programs that have been trained on huge data sets to extract meaning from text and produce language. These models can do some things very well, but they also have some limitations!
You probably wouldn't try to use a screwdriver to hammer a nail, right? When we use LLM-based chatbot tools for tasks they were designed for, we get much better results!
What do LLMs do well?
Good: Language processor
Large language models are great at extracting meaning from language. They don't "understand" text in a human sense, but they can ingest it and make sense of it, even if it's written in a way that's not perfectly clear. They've been trained on so much data that they've learned to recognize patterns and, from those patterns, the meaning of words in context.
Simplifying and summarizing long, complex text is one of AI’s superpowers. It is good at extracting key takeaways, but it is always a good idea to double-check that it hasn't missed any critical points!
Good: Text generator
Large language models are also good at generating text. They can take a prompt and write a paragraph or even an entire article that sounds like it may have been written by a human—a human with really good grammar skills!
Good: Brainstorming partner
Teachers are already using LLMs to help them come up with ideas for their classrooms! Given a clear request and a couple of examples, LLMs can generate multiple variations on ideas, like possible class activities, interesting thesis statements, or drafts of quiz questions.
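"A clear request and a couple of examples" is a prompting pattern often called few-shot prompting. A minimal sketch of how such a request could be assembled in Python, using the role/content message format common to chat-style LLM APIs (the teacher name-free example activities and the helper function are invented for illustration):

```python
# Build a few-shot brainstorming prompt: a clear request plus two worked
# examples of the style we want. The message format (role/content dicts)
# mirrors what many chat-style LLM APIs accept; the activities are made up.

def brainstorm_messages(topic: str, examples: list[str]) -> list[dict]:
    """Return a chat-message list asking an LLM for class activity ideas."""
    shots = "\n".join(f"- {e}" for e in examples)
    return [
        {"role": "system",
         "content": "You are a helpful brainstorming partner for teachers."},
        {"role": "user",
         "content": (
             f"Suggest three class activities about {topic}.\n"
             f"Here are two examples of the style I like:\n{shots}"
         )},
    ]

messages = brainstorm_messages(
    "the water cycle",
    ["Act out evaporation and condensation as a class skit",
     "Build a mini terrarium and observe it for a week"],
)
print(messages[1]["content"])
```

The examples do most of the work here: they show the model the tone, length, and level you expect, so its variations tend to land closer to what you can actually use.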
What are LLMs not as good at?
Not so good: They make things up!
While large language models can process language well on common topics, they sometimes give wrong information with complete confidence—in effect, making things up. People in the AI and LLM business call these errors “hallucinations.” This can happen for a number of reasons:
- Faulty training data: The huge datasets that LLMs are trained on can contain millions of words, and they often come from a variety of sources, including articles, books, the Internet, and even social media posts. If the training data contain inaccuracies, the model will inherit those mistakes. If the training data are messy or inconsistent, the AI can infer patterns that don't actually exist, or misinterpret information.
- Old training data: It can take a long time to assemble data on which to train a model, and it takes time to actually do the training. LLMs can't just be "updated" with "whatever is new on the internet." So, the model won't know anything about things that have occurred in the recent past—sometimes up to two years in the past. When an LLM's training data don't give it a basis for a fact-based response, the LLM will hallucinate. Some search engines are working on this by connecting models to the internet, but you shouldn't assume that every model you interact with has this capability.
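The training-cutoff problem can be made concrete with a tiny sketch: before trusting a model's answer about a dated event, compare the event's date to the model's published knowledge cutoff. The cutoff date below is hypothetical; check the model card of whatever model you actually use.

```python
from datetime import date

# Hypothetical knowledge cutoff -- consult the real model's documentation.
MODEL_CUTOFF = date(2023, 4, 1)

def might_be_stale(event_date: date) -> bool:
    """True if an event happened after the model's training cutoff,
    meaning the model cannot know about it and may hallucinate."""
    return event_date > MODEL_CUTOFF

# An event after the cutoff: the model's training data can't cover it.
print(might_be_stale(date(2024, 6, 1)))
```

The point is not the code itself but the habit it encodes: "is this question about something newer than the model's training data?" is the first thing to ask before trusting a factual answer.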
Not so good at math!
Large language models do not, on their own, make calculations. When LLMs are asked to generate math, they generate it the same way they generate text: probabilistically. Because of this, they can sometimes make mistakes when working with simple arithmetic or more advanced mathematical concepts.
Mistakes can also happen when the model is asked to generate text that includes numbers or calculations. If the training data contain incorrect calculations, the model may replicate those errors—confidently asserting, say, a wrong answer to a simple addition problem.
Not so good: Fake websites and other “hallucinations”
As mentioned above, if an LLM doesn’t have the data it needs to generate a correct response, it may make up a convincing one—these “hallucinations” can happen frequently:
- Fake websites: It may refer you to a URL, but the webpage doesn’t actually exist.
- Wrong websites: It might provide a link to a site that is completely unrelated to the topic.
- False citations: It might provide as a source a work that never existed, or claim that two real authors who have never collaborated are co-authors of a study—or invent fake names for authors, or fake titles for articles, research studies, or books!
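One low-tech defense against all three failure modes is to verify every citation an LLM hands you against a source you trust before using it. A sketch of that habit in code, where `TRUSTED_INDEX` is a hypothetical stand-in for a real library catalog or database lookup, and all titles and authors are invented:

```python
# Check LLM-supplied citations against a trusted index before trusting them.
# TRUSTED_INDEX is a hypothetical stand-in for a real catalog lookup; the
# title and author names below are invented for illustration.
TRUSTED_INDEX = {
    "a study of classroom ai": {"j. rivera", "p. okafor"},
}

def citation_checks_out(title: str, authors: list[str]) -> bool:
    """Return True only if the work exists AND the claimed authors match."""
    known_authors = TRUSTED_INDEX.get(title.strip().lower())
    if known_authors is None:
        return False                      # possible fake title
    return {a.strip().lower() for a in authors} <= known_authors

# A citation a model might hallucinate: plausible-looking, but not indexed.
print(citation_checks_out("AI and the Future of Homework", ["J. Rivera"]))  # False
```

In practice the "index" would be a library catalog, a journal database, or simply a web search for the exact title—the code only makes the verification step explicit.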
Not so good: Doesn't have deep understanding of specialized concepts
While large language models can process language well on common topics, they're not always as effective when discussing the details of highly specialized concepts. For example, they might struggle to accurately identify and explain the nuances of a complex medical procedure. When pushed, they will start to make things up (see the hallucination examples above).
Not so good: Doesn't have your context
This may sound obvious, but the models don't have all the information about you and your environment. If you are a student or teacher in a school, the model doesn't know about the sequence of lessons this week, who is having a bad day, or that you never really understood that one idea in science. So, it may suggest ideas or generate writing that does not make sense for you or your class.
Summary: Don’t trust! Verify!
LLMs are specialized productivity tools: Learn how to use them to help you be more productive. Don't ask them for answers to things they can't know, or they will make things up. Collaborate with them, but don't depend on them to create.
Forewarned is forearmed: The only way to guard against the hallucinations of an LLM is to make a habit of fact-checking everything it tells you.
The takeaway: Overall, large language models are good at processing language, generating text, and answering questions. However, they struggle with complex concepts and reasoning, and they don’t have “judgment.”
As these models continue to develop, though, they may improve in these areas!
Want to join the conversation?
- If it is constantly making serious errors and "hallucinations," what is it good for?
- I wouldn't say that it is constantly making serious errors and hallucinations. The key point to note is that hallucinations are likely only when you ask an LLM about something for which it has little training data. The training data set for a modern LLM like GPT-4 is huge: the model reportedly has over 1.7 trillion parameters and was trained on more than a petabyte (over 1 million GB) of data. So it can provide good answers to lots of different prompts, as long as you're not asking about a highly specialized topic.
Khan Academy's AI assistant, Khanmigo, receives additional context prompting with information about the content and course that a learner is currently studying. This allows Khanmigo to make well-informed (and generally correct, in my experience) responses when asked about an article, video, or exercise that the student is currently on.
- How large is the database storage for these?! Although text takes up less room than images do, it still must be astronomical! Wow! Just wow! :-)
- The training data size of GPT-4 is reportedly over 1 petabyte. That's over 1 million gigabytes!
- Would it be possible to have AI that updates itself from the internet without human input?
- Yes, but without a human filtering the training data, the AI would learn all sorts of bad habits.
- Is this going to help our students or harm them? How will it help and encourage English Language learners?
- How many years will it take for AI to become perfect?
- There is really no answer to that question. Only time will tell. It also depends on what you consider "perfect" AI.
- How will LLMs and English Language learners be able to communicate effectively with each other?
- Some LLMs, such as GPT-4, are able to understand and respond in multiple languages since they have been trained on data sets in multiple languages.
One of the good things about LLMs is that they are remarkably good at extracting meaning from text, even if there are mistakes or grammatical errors. Given some prompting, they are also able to generate multiple responses to the same question at different reading and vocabulary levels.
- Can these tools be used to check if their sources are correct?
- How can AI not be good at math unless whoever programmed it seriously goofed? Like computers, AI is only as good as its human programmers.
- AI is programmed to create a probabilistic answer to your question, and it is trained only on generating text. True, the AI can read numbers, but it wasn't trained to do math. The math it produces may just be copied from somewhere in its training data, not from its own calculations. However, it is starting to become better at math.
- How do you use AI in a school environment?
- Why not train on facts rather than language? Wouldn't this eliminate the hallucinations, lessen the biases, and allow for more accurate results? Is that the next step in AI's evolution?