LLMs (Large Language Models) form the core of AI chatbots, offering remarkable predictive capabilities that are both impressive and puzzling. Understanding LLMs provides a solid foundation for exploring AI, as we delve into their unexpected abilities and learn how to communicate effectively with them.
🙂User: Write a funny one sentence description of a typical cat personality.
🤖AI Assistant: Luna the mischievous feline, with an insatiable love for toilet paper rolls and a knack for unlocking drawers by napping on them.
If we input, 🙂: "There's the ocean and a nice beach. Let's go for a..."
The LLM should predict something like, 🤖: "swim".
This seemingly simple task is the foundation upon which chatbots are built. The LLM's ability to predict words based on context enables it to generate human-like responses, making conversations feel more natural.
But how does this turn into a full-fledged chatbot? Keep reading, and we'll delve into that soon.
For instance,
π:"There's the ocean and a nice trail. Let's go for a"
should predict π€:"hike" not "swim".Β Β
But inputting,
π:"There are ocean zombies... There's the ocean and a nice beach, let's..."
might result in:Β
π€:"RUN!!!".
This prediction ability is remarkable: in that example, the LLM had to factor in zombies, fear, spatial awareness, and typical human reactions.
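To make this concrete, here is a minimal sketch of asking a small open model for its single most likely next word. The choice of GPT-2 via the Hugging Face transformers library is just an example for illustration, not how any particular chatbot is built, and the word the model actually picks may differ from the ones in this article.

```python
# A sketch of next-word prediction with a small open model (GPT-2).
# Assumes the Hugging Face `transformers` library and `torch` are installed;
# the exact word a given model picks may differ from the examples in the text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "There's the ocean and a nice beach. Let's go for a"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits             # a score for every word piece in the vocabulary

next_token_id = logits[0, -1].argmax().item()   # take the single most likely next token
print(tokenizer.decode(next_token_id))          # e.g. " swim" or " walk"
```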
Input
🙂:"There's the ocean and a nice beach. Let's go for a"
LLM predicts 🤖:"swim", then adds it to the sentence.
New input: "There's the ocean and a nice beach. Let's go for a swim"
Predicts 🤖:"before", adds it:
New input: "There's the ocean and a nice beach. Let's go for a swim before"
Predicts 🤖:"sunset", adds it:
Final output: 🤖:"There's the ocean and a nice beach. Let's go for a swim before sunset".
This continuous prediction and addition enables LLMs to generate coherent stories, one word at a time, by considering each new word in its context.
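Written as a loop, that predict, append, repeat process looks roughly like the sketch below. It reuses the model and tokenizer from the earlier snippet and always takes the single top word to keep things simple; real chatbots use much larger models and more varied sampling.

```python
# Sketch of the predict-append-repeat loop, reusing `model` and `tokenizer`
# from the previous snippet. Always taking the top word keeps it simple;
# real chatbots sample more creatively.
sentence = "There's the ocean and a nice beach. Let's go for a"

for _ in range(5):                                    # add five more word pieces
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    next_token_id = logits[0, -1].argmax().item()     # most likely next token
    sentence += tokenizer.decode(next_token_id)       # append it and go again

print(sentence)   # e.g. "...Let's go for a swim before sunset" (output will vary)
```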
The complexity of language and its countless possible contexts make a rule-based approach unviable.
For instance, in a context where previous sentences describe zombie hunters, 🙂: "There are ocean zombies... There's the ocean and a nice beach, let's..." might deserve a response like 🤖: "Get 'em!" instead of "RUN!!!".
We create a "brain-like" computer program using billions of interconnected, adjustable parameters (like neurons).
These parameters turn an input sentence into a prediction of the next word through a vast number of mathematical operations.
Imagine billions of tiny dials that can be tuned to improve predictions.
🎛️🎛️🎛️🎛️🎛️🎛️🎛️
🎛️🎛️🎛️🎛️🎛️🎛️🎛️
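As a toy illustration only (real models have billions of parameters arranged in many interacting layers, and no single parameter maps neatly to a word), here is what "a grid of adjustable dials that scores candidate words" might look like in code. The candidate words and the random "context features" are made up for the example.

```python
import numpy as np

# Toy illustration only: each "dial" is just an adjustable number (a parameter).
# Real LLMs have billions of them; the candidates and features here are made up.
rng = np.random.default_rng(0)

candidates = ["swim", "hike", "RUN"]
context_features = rng.normal(size=8)           # stand-in for the "ocean + beach" context
dials = rng.normal(size=(8, len(candidates)))   # a small grid of dials

scores = context_features @ dials               # one score per candidate word
print(candidates[int(scores.argmax())])         # turning any dial can change this choice
```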
For each training sentence, we'd hide the last word and feed the rest into our model.
The model would predict a word; if wrong, we'd adjust its parameters and repeat.
With billions of dials and trillions of sentences, over time our "brain" learns to predict the next word exceptionally well.
(This process involves a specific method called Machine Learning, which we'll explore later.)
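Continuing the toy example above (and standing in for the real gradient-based training covered later), the hide-the-last-word loop can be sketched like this. The `featurize` helper is a made-up placeholder for turning text into numbers, and the nudge-the-dials update is a deliberately simplified stand-in.

```python
# Sketch of the training idea, continuing the toy "dials" example:
# hide the last word, let the model guess it, nudge the dials when it's wrong.
training_sentences = [
    ("There's the ocean and a nice beach. Let's go for a", "swim"),
    ("There's the ocean and a nice trail. Let's go for a", "hike"),
]

def featurize(sentence: str) -> np.ndarray:
    # Made-up placeholder: real models learn how to turn text into numbers.
    seeded = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return seeded.normal(size=8)

learning_rate = 0.1
for _ in range(100):                               # many passes over the sentences
    for context, hidden_word in training_sentences:
        features = featurize(context)
        guess_idx = int((features @ dials).argmax())
        if candidates[guess_idx] != hidden_word:   # wrong guess? adjust the dials
            dials[:, candidates.index(hidden_word)] += learning_rate * features
            dials[:, guess_idx] -= learning_rate * features
```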
We can't easily interpret these numbers or "fix" the code directly.
If an LLM outputs something unexpected (like saying the sky is "green"), we can't pinpoint which numbers caused the error.
Instead, we improve LLMs by providing more relevant data and continuing to adjust parameters through training.
π:"How many ears on a bear?"
π€:"Three!".
π€¦
Training: 🧑‍💻: 🐻🎛️🎛️🐨🎛️🎛️🧸🎛️🎛️
π:"How many ears on a bear?"
π€:"Two!".
Online LLMs
Offline LLMs
Number of parameters
Size of training dataset
Training period
Fine tuning
Prompting
Retrieval Augmented Generation (RAG)