Lesson 2 of many
About 20 minutes
Goal: Python subset for building LLMs

Python from Zero

You do not need all of Python. You need the four things that appear in every neural network: numbers, lists, functions, and loops. That is the lesson.

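Why Python? Every major AI library (PyTorch, NumPy, Hugging Face Transformers) is used from Python, even though much of the heavy lifting underneath runs in C, C++, or CUDA. Andrej Karpathy builds GPT in Python. The AI Engineering from Scratch curriculum runs in Python. It is the language of this field.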

1. Numbers and variables

A variable is a named box that holds a value.

learning_rate = 0.001
num_layers = 6
temperature = 0.7

print(learning_rate)   # 0.001
print(num_layers * 2)  # 12

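These three variables appear in almost every LLM codebase you will ever work with. learning_rate controls how big each adjustment is during training, and so how fast the model learns. num_layers is how many transformer blocks are stacked on top of each other. temperature controls how random the output is: low values keep the model close to its most likely words, high values make it more adventurous.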
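To see two of these numbers at work, here is a minimal sketch with made-up values: weight, gradient, and logits are hypothetical, not taken from a real model. The first part shows learning_rate scaling one training nudge. The second divides a model's raw scores (logits) by temperature before turning them into probabilities with softmax, which is the usual way temperature is applied.

```python
import math

# --- learning_rate: how big one training nudge is ---
learning_rate = 0.001
weight = 0.5      # hypothetical: one of the model's numbers
gradient = 2.0    # hypothetical: how steeply the error rises as this weight grows

# Step against the gradient, scaled down by the learning rate.
weight = weight - learning_rate * gradient
print(weight)     # very close to 0.498

# --- temperature: how random the output is ---
def softmax_with_temperature(logits, temperature):
    # Divide every raw score by the temperature first.
    scaled = [x / temperature for x in logits]
    # Subtract the largest score so math.exp never overflows.
    biggest = max(scaled)
    exps = [math.exp(x - biggest) for x in scaled]
    total = sum(exps)
    # Normalise so the probabilities add up to 1.
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]   # hypothetical scores for three candidate words
print(softmax_with_temperature(logits, 0.7))  # sharper: the top word gets more
print(softmax_with_temperature(logits, 1.5))  # flatter: more randomness
```

Compare the two printed lists: at the lower temperature the first word takes a bigger share, which is why low temperature feels more predictable.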

2. Lists

A list holds many values in order. In LLMs, lists hold tokens — the chunks of text the model processes.

tokens = [15496, 11, 995, 0]   # "Hello, world!" as token IDs
print(tokens[0])    # 15496  (first token)
print(len(tokens))  # 4      (how many tokens)

When you type "Hello, world!" to an LLM, it does not see letters: it sees a list of numbers like this. Each number is the ID of a chunk of text in the model's vocabulary. Splitting text into these chunks and turning them into IDs is called tokenisation.
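To make the number-to-text mapping concrete, here is a minimal sketch with a hand-written vocabulary covering only these four IDs. It assumes a GPT-2-style tokeniser, where they stand for "Hello", ",", " world" and "!"; a real vocabulary has tens of thousands of entries (GPT-2's has 50,257). The names id_to_text and decode are illustrative, not a library API.

```python
# The four token IDs from the list above.
tokens = [15496, 11, 995, 0]

# A tiny, hand-written slice of a vocabulary: token ID -> the text chunk it stands for.
# Note the leading space in " world": tokens often include the space before a word.
id_to_text = {15496: "Hello", 11: ",", 995: " world", 0: "!"}

def decode(token_ids):
    # Look up each ID and glue the text chunks back together.
    return "".join(id_to_text[t] for t in token_ids)

print(decode(tokens))  # Hello, world!
```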

3. Functions

A function takes input, does something, returns output. You will write and read hundreds of these.

def add(a, b):
    return a + b

result = add(3, 4)
print(result)  # 7

In a transformer, every single operation — attention, feed-forward, layer normalisation — is a function. The model itself is a function: text in, text out.

def gpt(prompt):
    tokens = tokenise(prompt)
    output = transformer(tokens)
    return decode(output)

response = gpt("The sky is")
# "The sky is blue."

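That is not real code yet, because tokenise, transformer, and decode are not defined. But structurally it is how a real GPT works: turn text into tokens, run the model, turn the result back into text. (A real model repeats the middle step, producing one token at a time.) You will write this for real in Lesson 5.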
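If you want to watch that three-step shape run today, here is a toy sketch where every stage is a hypothetical stand-in: tokenise splits on spaces, transformer is a fake that just appends the next ID, and decode maps IDs back to words. None of these works like a real tokeniser or model; only the structure is the point.

```python
# Hypothetical four-word vocabulary: word -> ID, and the reverse.
vocab = {"The": 0, "sky": 1, "is": 2, "blue.": 3}
id_to_word = {i: w for w, i in vocab.items()}

def tokenise(prompt):
    # Fake tokeniser: split on spaces and look up each word's ID.
    return [vocab[word] for word in prompt.split()]

def transformer(tokens):
    # Fake "model": append the ID that comes after the last one.
    return tokens + [tokens[-1] + 1]

def decode(tokens):
    # Turn IDs back into words and join them with spaces.
    return " ".join(id_to_word[t] for t in tokens)

def gpt(prompt):
    tokens = tokenise(prompt)
    output = transformer(tokens)
    return decode(output)

print(gpt("The sky is"))  # The sky is blue.
```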

4. Loops

A loop runs the same code many times. Training an LLM is fundamentally a loop: show the model text, measure how wrong it is, nudge it to be less wrong. Repeat millions of times.

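for step in range(10000):
    loss = model.forward(batch)    # how wrong the model is on this batch
    optimizer.zero_grad()          # clear adjustments left over from the last step
    loss.backward()                # work out which way to adjust each weight
    optimizer.step()               # make the adjustment

    if step % 1000 == 0:
        print(f"Step {step}: loss = {loss.item():.4f}")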

Again, this is not real code yet: model, batch, and optimizer are not defined. But this is the skeleton of every training loop you will write. loss measures how wrong the model is. backward() works out what to adjust. optimizer.step() makes the adjustment. Loop until the loss stops going down.
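Here is a version you can actually run, with no libraries at all: a hypothetical one-weight "model" learning that y = 3 * x. The data, starting weight, and learning rate are made-up illustration values, and the gradient is worked out by hand with a bit of calculus, standing in for what loss.backward() computes automatically in PyTorch.

```python
# Hypothetical toy data: the "right answer" is y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

weight = 0.0          # the model's only number; it starts out wrong
learning_rate = 0.01  # how big each nudge is

for step in range(1000):
    # Forward: make predictions and measure how wrong they are (mean squared error).
    preds = [weight * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

    # Backward: slope of the loss with respect to weight, derived by hand.
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)

    # Step: nudge the weight in the direction that lowers the loss.
    weight -= learning_rate * grad

    if step % 200 == 0:
        print(f"Step {step}: loss = {loss:.4f}")

print(f"Learned weight: {weight:.3f}")  # Learned weight: 3.000
```

The first line printed is Step 0: loss = 67.5000, and the loss shrinks from there as the weight creeps toward 3.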

Notice: you just read the shape of a real training loop, even if you do not understand every word. That is the goal of these early lessons: pattern recognition before full comprehension.

Putting it together

Here is a tiny working Python program that uses all four concepts. Read it — you will understand more than you expect.

# A tiny fake "LLM" that predicts the next word

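vocabulary = ["hello", "world", "the", "sky", "is", "blue"]   # the words this toy model knows

def predict_next(word):
    # A hand-written lookup table: each word points to the word that follows it
    pairs = {"the": "sky", "sky": "is", "is": "blue", "hello": "world"}
    return pairs.get(word, "?")   # "?" if the word is not in the table

sentence = ["the"]

for step in range(3):              # three predictions: "sky", "is", "blue"
    last_word = sentence[-1]       # [-1] is the last item in the list
    next_word = predict_next(last_word)
    sentence.append(next_word)     # add the prediction to the end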

print(sentence)
# ['the', 'sky', 'is', 'blue']

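This is a one-rule "LLM": it predicts the next word from a lookup table. A real LLM does the same job, predicting the next token, but instead of a hand-coded table it uses billions of parameters learned from data, looks at all the previous tokens rather than only the last one, and outputs a probability for every token in its vocabulary.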
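One small step toward learning from data: instead of writing the table by hand, count which word follows which in some text. This is a bigram model, a much older and simpler technique than a transformer, and the training text below is made up; but it shows a next-word table being learned rather than hand-coded.

```python
from collections import Counter, defaultdict

# Hypothetical tiny "training data".
text = "the sky is blue . the sky is clear . the sea is blue ."
words = text.split()

# Learn the table: for each word, count which words follow it.
counts = defaultdict(Counter)
for current, following in zip(words, words[1:]):
    counts[current][following] += 1

def predict_next(word):
    # Pick the follower seen most often in the data ("?" if the word never appeared).
    if word not in counts:
        return "?"
    return counts[word].most_common(1)[0][0]

sentence = ["the"]
for step in range(3):
    sentence.append(predict_next(sentence[-1]))

print(sentence)  # ['the', 'sky', 'is', 'blue']
```

"the" is followed by "sky" twice and "sea" once, so the counts pick "sky"; change the text and the predictions change with it.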

Where to run this code — free, in your browser

Google Colab — free Python environment, no install needed

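Create a new notebook, paste the code above, and press Shift+Enter. You just ran Python. Do this for every code block in this lesson; the two sketches marked "not real code yet" will stop with an error, because pieces like tokenise and model are not defined.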

