An Intro to LLMs

At their core, all Large Language Models (LLMs) do one thing: predict text.

Their task is simple: given a sequence of words, determine which word is most likely to come next.

Exercise: predict the next word in the poem.

Mary had a little lamb, its fleece was white as snow. And everywhere that Mary went, the lamb was sure to ___.
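
If you guessed "go", you just did next-word prediction yourself. As a loose illustration (a toy sketch, not how real LLMs actually work), the same idea can be mechanized by counting which word most often follows each word in some training text:

    from collections import Counter, defaultdict

    # Toy "language model": for each word, count which words follow it.
    text = ("Mary had a little lamb , its fleece was white as snow . "
            "And everywhere that Mary went , the lamb was sure to go .")
    words = text.split()

    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1

    def predict_next(word):
        # Predict the most frequent follower seen during "training".
        return follow_counts[word].most_common(1)[0][0]

    print(predict_next("sure"))  # -> to
    print(predict_next("to"))    # -> go

Real LLMs replace these raw counts with a neural network that scores every possible next word given the full preceding context, but the underlying task is the same.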

But how does this simple task translate into anything useful?

Consider the following example:

Mary has a little lamb that follows her everywhere she goes.

One day, Mary decides to take her lamb to school.

The teacher asks the class a question: "If Mary has 3 lambs and one of them goes missing, how many does she have left?"

What is the correct answer?

ANSWER:

A system that can accurately predict the next word would respond with something like:

"She has 2 lambs left, because 3 minus 1 is 2."

To do this, the system would need to:

  • Understand the concept of subtraction
  • Recognize that lambs are countable objects
  • Subtract 1 from 3
  • Determine that the answer is 2

These capabilities emerge naturally from the next-word-prediction task, without explicit programming.

Let's consider another example:

The following is a Python function that takes in a 'starting_sheep_count' and a 'missing_sheep_count', and returns the 'current_sheep_count'.

PYTHON CODE:

A system proficient in next-word prediction would generate something like:
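
One plausible completion is sketched below; only the parameter names come from the prompt above, and the function name is illustrative:

    def get_current_sheep_count(starting_sheep_count, missing_sheep_count):
        # Subtract the missing sheep from the starting count.
        current_sheep_count = starting_sheep_count - missing_sheep_count
        return current_sheep_count

    print(get_current_sheep_count(3, 1))  # -> 2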

Through this simple task of predicting the next word, the system demonstrates an understanding of Python syntax and programming concepts.

This core capability of predicting the next word is what makes LLMs so versatile and powerful. The better they become at this task, the more diverse and complex the operations they can perform.