How to make a guess
21 May 2025

Today we read up to Chapter 3, Frame 37, on page 67 of The Little Learner. You can find the code for the session in the GitHub repo.
Three steps to learning
So far we have only seen the data structures for deep learning. Today, we started to see how to apply learning algorithms to the data.
At a high level, we found that deep learning involves three steps:
1. Guess the function parameters
2. Measure the ‘loss’
3. Improve the guess

You keep doing steps 1, 2 and 3 until finally the ‘loss’ is nearly $0.0$. At this point, further improvement makes little difference, and you stop.
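As a very rough sketch (not the book’s code), the loop could look like this in Racket, where `objective` measures the loss of a guess and `improve` stands in for some yet-to-be-seen procedure that turns the current guess into a better one:

; step 1 is the initial guess passed in as `guess`
(define learn
  (lambda (objective improve guess)
    (let ((loss (objective guess)))        ; step 2: measure the loss
      (if (< loss 0.001)                   ; 'nearly 0.0' -- an arbitrary threshold
          guess                            ; good enough: stop and keep this guess
          (learn objective improve
                 (improve guess loss)))))) ; step 3: improve the guess, then repeat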
What a guess looks like
The first step is to “guess the function parameters.” How do we do this?
For the sake of this chapter, a guess is a list of parameters. Let’s say we are trying to fit a straight line to some data. A straight line can be represented by the following function:
$$ f(x) = wx + b $$
In a machine learning situation, we already know $x$ and $y$ for a large number of entities. What we don’t know is what $w$ and $b$ should be. So, let’s make a guess! We can put $w$ and $b$ together into a list, and call that list $\theta$. In the book, the initial guess was $w = 0.0$ and $b = 0.0$. That results in the following $\theta$:
$$ \theta = (0.0, 0.0) $$
or, in Scheme/Racket:
(define theta (list 0.0 0.0))
That’s it: a guess is a list of numbers. Now, the `line` function (reproduced below) is very simple. There are only two parameters, $w$ and $b$. A more complex function may have many parameters, and those parameters may have a very complex structure. In the future, `theta` may be a very complex variable: a whole collection of tensors all linked together. But that is, of course, the future.
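For reference, here is the `line` function, essentially as the book gives it, just spelled with `lambda` and `theta` instead of the book’s Greek letters. `ref` picks an element out of the parameter list:

(define line
  (lambda (x)                ; first take an input x
    (lambda (theta)          ; then take the current guess theta = (list w b)
      (+ (* (ref theta 0) x) ; w * x
         (ref theta 1)))))   ; ... + b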
How to calculate the loss
So far we have encountered just one “loss function,” which allows us to measure how good our latest guess is. The loss function works as follows:

- Use the current guess for the parameters to produce predicted $y$-values for the $x$-values in the training set
- Subtract these predicted $y$-values from the real $y$-values that we already know
- Square the differences
- Take the sum of the squares
This is called the “l2 loss”. The “2” comes from the fact that we square the errors, which is the same as raising them to the power of 2.
The l2-loss is defined as follows in the text:
(define l2-loss
(lambda (target) ; the function we are trying to 'learn'
(lambda (xs ys) ; the data: the x-values and y-values we have
(lambda (theta) ; our current guess for the function parameters
(let ((pred-ys ((target xs) theta)))
(sum
(sqr
(- ys pred-ys))))))))
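In more conventional notation (my paraphrase, not the book’s frames), for the `line` target and a guess $\theta = (w, b)$ this computes

$$ L(\theta) = \sum_{i} \big( y_i - (w x_i + b) \big)^2 $$

over all the points $(x_i, y_i)$ in the training data.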
Since the `target` function and the `xs` and `ys` will stay the same throughout the whole training process, we can fix these in place, producing an “objective function.” The objective function already knows which function we would like to learn parameters for. It already knows what the $x$-values and $y$-values are in the training data. It is just waiting for some guesses that it can try out. In the session, we tried the following guesses for $\theta$, using `line` as the target function and some simple points as the `xs` and `ys`:
; fix the target function in place, producing an "expectant function"
(define expectant-function (l2-loss line))
; fix the data in place too, producing the objective function
(define objective-function (expectant-function line-xs line-ys))
; try first guess
(objective-function (list 0.0 0.0))
; = 33.21
; try second guess
(objective-function (list 0.0099 0.0))
; = 32.5892403
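The snippet above assumes that `line-xs` and `line-ys` are already bound. These should be the book’s four data points for this example; as a sanity check, they reproduce the first loss above, since $1.8^2 + 1.2^2 + 4.2^2 + 3.3^2 = 33.21$:

(define line-xs (tensor 2.0 1.0 4.0 3.0)) ; the x-values of the four points
(define line-ys (tensor 1.8 1.2 4.2 3.3)) ; the corresponding y-values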
We could keep applying the algorithm of increasing $\theta_0$ by $0.0099$ each time:
(objective-function (list 0.0198 0.0))
; = 31.9743612
(objective-function (list 0.0297 0.0))
; = 31.3653627
If you perform this operation 106 times, you get the best answer that these fixed $0.0099$ increments can reach:
(objective-function (list (* 106 0.0099) 0.0))
; = 0.13501079999999996
If you keep going, the loss starts to go up again. That is, the guesses get worse:
(objective-function (list (* 107 0.0099) 0.0))
; = 0.13759470000000001
As the authors of The Little Learner would say, after 107 iterations, we have “overshot” the best possible answer.
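Rather than typing each guess out by hand, a small helper (my sketch, not from the book or the session) can sweep over the step count $k$ and pair each $k$ with the loss of the guess $(k \cdot 0.0099,\ 0.0)$, assuming the `objective-function` defined above is in scope:

; needs (require racket/list) for argmin and second
(define losses-up-to
  (lambda (max-k)
    (for/list ([k (in-range (add1 max-k))])
      (list k (objective-function (list (* k 0.0099) 0.0))))))
; the k with the smallest loss among the first 120 guesses
(argmin second (losses-up-to 120))
; should report k = 106 (paired with its loss), the same 'best' step found above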
How to improve the guess
Clearly, just increasing $\theta_0$ by $0.0099$ is not a great algorithm for learning the function. It takes many iterations to learn an extremely simple function. Looking at the graph on page 58, the correct function is clearly something very close to $y = (1)x + 0$. Shouldn’t this be pretty easy to find? And what if we need to learn $b$ as well as $w$? This algorithm assumes that $b = 0$!
Well, we’ve not seen a better solution yet, but there are many chapters to go, and many layers left in this onion. Assuredly by the end of all this, we will know the arcana of the field.
First interlude
At our next meeting, we will take a break from The Little Learner and look at the LLM code for HESPI, a fascinating AI-for-GLAM project.