Is a neuron a line?

Today we read up to frame 29 on page 27 of The Little Learner.1 You can read/run the code from the chapter in our GitHub repo for this reading.

$y = wx + b$

In this session, we learned how to represent this familiar function in Racket. The form in which we eventually cast the function was, at first sight, rather strange:

(define line
  (lambda (x)
    (lambda (theta)
      (+ (* (zeroth-member theta) x) ; wx
         (first-member theta)))))    ; b

There are three main ways that this representation differs from the $y = mx + b$ we all learned at school:

First difference: $m$ is $w$

The authors don’t tell us why they use $w$ for the coefficient of $x$, rather than the more familiar $m$, but it is presumably because the parameters which are coefficients are typically called the “weights” of a model. They left $b$ as is, presumably because the non-coefficient parameters of a model are typically called the “biases.” Thus $y = wx + b$ can be read as “$y$ equals $x$ times its weight plus the bias.”

Second difference: $w$ and $b$ are stored in a “tensor”

We’ve not been introduced to “tensors” yet, but they are implicit in the notation used in the chapter (see page xxiii of the book). For now, a “tensor” is just an array of numbers. In the final version of line, w and b are contained in a single value theta, which looks like this: (tensor w b). Since x needs to be multiplied by w, we need to get the “zeroth” member of theta: (zeroth-member theta). Since b needs to be added to this, we need to get the “first” member of theta: (first-member theta).
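To make these accessors concrete, here is one possible toy implementation, on the assumption that a rank-1 tensor is nothing more than a Racket vector. The book builds tensors differently, so treat this as a sketch of the idea rather than the real definitions:

(define tensor               ; a toy stand-in: store the members in a vector
  (lambda members
    (list->vector members)))

(define zeroth-member        ; w lives at index 0
  (lambda (theta)
    (vector-ref theta 0)))

(define first-member         ; b lives at index 1
  (lambda (theta)
    (vector-ref theta 1)))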

Third difference: line is not one function, but two!

We normally think of $y = mx + b$ as a single function. We could rewrite it as $f(x) = mx + b$. But there are two lambda expressions in line, meaning there are two functions. What is the deal? The first lambda is a “function maker.” As its input, it takes some x. It then gives us a new function, where x is fixed, and theta (i.e. w and b) aren’t known yet. If you plug theta into this new function, out will pop the answer.

((line 10.0)        ; new function, where x = 10.0
 (tensor 4.0 2.0))  ; now plug in w = 4.0, b = 2.0
;; answer = 42.0

This is called a “parameterized” function, the authors explain. In an “unparameterized” function, w and b would be fixed. But now we allow the function to change according to some parameters that we give it. It could be $y=7x+3$ or $y=22x-9$ or $y=(z+58)x-(31z)$. Because it is a “parameterized” function, we can try out different “parameters,” and let the computer choose the right ones.

(define machine-learning
  (find-the-right-parameters the-machine))

The reason for all this is explained in the book. In machine learning, you usually know $x$ and $y$: this is your ‘training data’. The problem is to work out what $w$ and $b$ should be. Hence our line function is constructed so that it is given x to begin with, and only later is supplied with w and b. In principle, we could try many different values for w and b, and see which combination of w and b produces the correct y for a given x. If the computer tries out the combinations, and chooses the w and b for us, then this is called “machine learning.”
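To make “trying out the combinations” concrete, here is a deliberately naive sketch. It is not the book’s method (the book will get to something much cleverer): given a single training point x and y, it tests each candidate theta and keeps the one whose prediction lands closest to y. The name try-thetas and the list of candidates are my own inventions:

(define try-thetas
  (lambda (x y candidates)
    (argmin (lambda (theta)                  ; argmin ships with Racket
              (abs (- ((line x) theta) y)))  ; distance from the true y
            candidates)))

(try-thetas 10.0 42.0
            (list (tensor 1.0 0.0)
                  (tensor 4.0 2.0)
                  (tensor 7.0 3.0)))
;; answer = (tensor 4.0 2.0), since 4.0 × 10.0 + 2.0 = 42.0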

We had a wide-ranging discussion about the ‘learning’ metaphor in this context. Do linear functions provide a good model of learning? Is it possible that humans “learn” by adjusting activations of neurons in the same manner that a machine learning algorithm adjusts the parameters of a large and complex linear function? My notes are inadequate to the breadth of the discussion, and we have only just scratched the outermost surface of this topic into whose deeps we are about to plunge.

Computer programming is a highly metaphorical pursuit, as the wisest heads in the trade are quick to admit. Programmers’ source code is a textual model of the world they are trying to enact in their software. Programmers have different conceptualisations of what they are doing, and the code they write reflects the world-picture that tells them how to write it. Well—these are the ambits of our group, in any case, and the aptness of the “learning” metaphor will be a topic of conversation for many meetings to come…

  1. Once again, the moderator of the group didn’t look ahead to realise that the chapter ended on the following page… but the discussion that waylaid us was good in any case.

Now we have this insight

This week we read up to frame 53, on page 16 of The Little Learner.1

The aesthetics of recursion

This week the authors treated us to an extremely concise introduction to the theory of recursion. They presented recursion as a strategy of extreme parsimony. Recursion allows us to write “interesting programs” with no loops. Recursion allows us to implement integer addition without using +. One member of the group marvelled at the resulting programs. He found them elegant. Did anyone else?
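For those reading along, the definition in question looks something like the following. This is a from-memory sketch rather than the book’s exact code, leaning on Racket’s built-in add1, sub1 and zero?:

(define plus
  (lambda (n m)
    (if (zero? n)
        m                           ; adding 0 changes nothing
        (add1 (plus (sub1 n) m))))) ; move a 1 from n onto the result

(plus 3 4)
;; answer = 7, and not a + in sight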

We discussed more largely the value of parsimony in reasoning. One member of the group observed that parsimony—or elegance—is an aesthetic value that transcends disciplinary boundaries. In a recent article for Critical Inquiry, Imogen Forbes-Macphail observes that

Like mathematicians, literary scholars find beauty in the pursuit of their work; in the products of that work (critical arguments or scholarship); and in the objects of that scholarship, literary artifacts themselves. (2025, p. 481)

Forbes-Macphail is not the first to identify a significant aesthetic dimension in scientific thought, though it is interesting to argue that the aesthetics of mathematical and literary inquiry are similar. What does elegance mean in literary criticism? Do literary critics have the same relish for parsimony as LISP hackers like the authors of The Little Learner? Is an interpretation of The Rover more powerful if it can be made using fewer concepts? The old debate among creative writing instructors, about the merits of ‘minimalist’ and ‘maximalist’ style, rears its head again.

This discussion hearkens back to Knuth’s theory of ‘psychological correctness’, which we discussed last year. I also note Douglas Hofstadter and Melanie Mitchell’s brilliant argument about the vitality of ‘aesthetic perception’ in science, in their jointly-authored chapters for Fluid Concepts and Creative Analogies.

Now we have this insight

Several times in the exposition, the authors claim that something “gives us an insight,” or that we now “have this insight.” “Do we?” quipped one member of the group.

The central conceit of the book is that we, the readers, are identical with the voice in the second column. The second voice models our own experience. Of course, this whole literary structure implies that we are not the voice in the second column. The book constantly entreats us to compare ourselves to this model student, and respond to the teacherly voice in the first column in our own way. The irony of the form finds its counterpart in the irony of the reader’s response.

Why do we ‘invoke’?

The idea of recursion is that a function “invokes itself.” We paused for a while on the idea of “invocation.” Why is it that we “call” or “invoke” functions? What is the underlying metaphor?

The discussion recalled to me these famous sentences from the beginning of Structure and Interpretation of Computer Programs:

The evolution of a process is directed by a pattern of rules called a program. People create programs to direct processes. In effect, we conjure the spirits of the computer with our spells. (1996, p. 2)

One member of the group preferred “invoke” to “call,” for the very reason that it implies that the “invoker” has command over the function they summon to their bidding. On the final verge of computation, at the very brink of the machine, when symbols have lost their meaning and the stack trace has buried itself in silicon, the programmer may find herself in the position of Byron’s Manfred:

I have no choice; there is no form on earth
Hideous or beautiful to me. Let him,
Who is most powerful of ye, take such aspect
As unto him may seem most fitting.—Come!

“Come, add1!” cries the wizard in his misery. “Come, add1, and increment my integer!”

The efficiency of Scheme

One member of the group asked how it is possible to write efficient programs in Scheme, when recursion is required for all looping.

The answer: tail-call optimisation. A topic slightly off the main track of our Critical Code Studies group! But a fascinating one regardless… parsimony strikes again.
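The trick, in one picture (my own sketch, not from the book): when a recursive call is the very last thing a function does, Scheme is required to reuse the current stack frame, so the recursion consumes no stack at all.

(define sum-to
  (lambda (n acc)
    (if (zero? n)
        acc
        (sum-to (sub1 n) (+ acc n))))) ; tail position: the frame is reused

(sum-to 1000000 0)
;; answer = 500000500000, with no stack overflow

A recursive call that still has work left to do afterwards, like the add1 wrapped around the recursive call in plus above, enjoys no such optimisation.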

References

Forbes-Macphail, Imogen. “The Four-Color Theorem and the Aesthetics of Computational Proof.” Critical Inquiry 51, no. 3 (March 2025): 470–91. doi:10.1086/734121.

Hofstadter, Douglas R., Daniel Defays, David Chalmers, Robert French, Melanie Mitchell, and Gary McGraw. Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. New York: Basic Books, 1995.

Abelson, Harold, Gerald Jay Sussman, and Julie Sussman. Structure and Interpretation of Computer Programs. Second edition. Cambridge: MIT Press, 1996.

Notes

  1. Due to a cognitive deficiency of the group leader, we did not actually complete Chapter 0, which we would have had ample time to do… 

What is, was and shall be

In our session this week, we continued to learn the basics of the Scheme/Racket programming language, working through pages 4–8 of The Little Learner.

As so often happens in close reading, our attention was arrested by an apparently innocuous word: “is.” The word “is” has a peculiar meaning in the language of the book. The authors frequently write that something “is” or “is the same as” something else. For example, they pose the question

What is

(area-of-rectangle 3.0)

?

The answer:

(λ (height)
  (* 3.0 height))

Or again later:

The expression

(add3 4)

is the same as

((λ (x)
  (+ 3 x))
 4)

There is a curious inversion in their presentation. First they present a series of these examples in which s-expressions are evaluated, some involving closures, where a higher-order function returns a new function that ‘remembers’ values passed to the outer function. Only then do they admit that their use of “is” and “is the same as” is not entirely idiomatic:

This way of remembering arguments passed in for formals of outer functions inside inner functions is known as β-substitution.

In other words, the word “is” actually means “can be transformed into via β-substitution.” Two s-expressions “are the same expression” when they can be transformed in this way. But what does this transformation entail? It entails taking the name of something, e.g. add3, height, area-of-rectangle, and replacing it with its value. Is “is” the right word for this? Is the name of a thing “the same as” the thing?
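Written out step by step, the transformation looks like this (my own sketch of the reduction, not the book’s notation):

(define add3
  (lambda (x)
    (+ 3 x)))

(add3 4)
;; => ((λ (x) (+ 3 x)) 4) ; the name replaced by its value
;; => (+ 3 4)             ; β-substitution: x becomes 4
;; => 7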

This use of “is” conflicted with intuitions we had in the group. It seems paradoxical to say that the name of something “is the same as” the thing itself. Of course, it is reasonable in the context of evaluating Scheme code. Whenever the code is run, it will be evaluated, so there is a sense in which the code simply is what it evaluates to. This sense of “is” also makes sense in an intellectual culture dominated by mathematics. In everyday algebra, there is no real distinction between equality and identity.

$$3 + 2$$

really is

$$1 + 4$$

Isn’t it? They are equal. Who cares how they are written?1 Seeing this “is”, of course, can take some work. How many people can really remember why $a^2$ really is $b^2 + c^2$ in a right triangle?

In everyday life, we are quite happy to “dereference” or “substitute” names for the things themselves. When I ask you to “pass the pepper,” I’m quite happy when you hand me the pepper grinder. You ask, “Is this what you wanted?” I reply, “Yes, it is!”

But nonetheless there is something alarming in being told that two things “are” one another when you haven’t internalised the substitution process that allows you to move between them. And names do have a reality of which we are sometimes reminded. If I ask a Canadian to “pass me the pepper,” and they give me a capsicum, I may be disappointed.

The whole discussion reminded me of a piece by Lewis Carroll, in which a person’s name has a name, which itself has a name, which itself has a name, and so on. I was sure that this infinite regress featured in Gödel, Escher, Bach, but I have tried and failed to find either the Carroll story or the Hofstadter variation on it! Is an intimation of a thing the same as the name of a thing? Or is the intimation the thing? Or is the name the intimation? Can a vague recollection be substituted for a textual authority? Or only for a vague apprehension…?

We recommence next week on frame 24, at the top of page 9.

  1. I’m sure there are varieties of algebra where identity matters—but that is way beyond my knowledge! 

Psst! Psst! Psst!

Today we commenced The Little Learner, the text that will occupy the group for many months to come. We read the Preface and the first page of Chapter 0.

Our discussion focussed mainly on the book’s authorial persona and implied reader. For those of us in the group who have a mainly adversarial attitude towards AI, the book presented a challenge. Isn’t deep learning interesting and fun? Aren’t the algorithms elegant and surprisingly simple? Shouldn’t everyone dive into this fresh and exciting area of research, and learn how to do it?

To invite the reader into the text, Friedman and Mendhekar carefully establish the reader as a novice, and themselves as kind, avuncular teachers. The reader need only know “high-school maths” and have a minimum of “programming experience.” The book proceeds from these foundations in a strict order, to build up from simple pieces the whole complex machinery of modern deep learning.

As some in the group observed, this “novice” reader was already expected to know some terms of art. Concepts such as “problem domain,” “equalities,” “invariants,” “superset” and “subset” were introduced as though they were the general coinage of the realm. Of course, all textbook writers face the problem that their students need to somehow learn the language that even makes it possible to express knowledge of their subject. How can you learn anything about a topic without having the words to describe the topic? But we as a group are intrigued to see precisely who or what these writers assume an interested and relatively ignorant reader to be as the book progresses.

We discussed the possible ideological implications of the book. Is this a book that subtly asserts a “tech-bro” persona? Or does its goofy and academic tone bespeak a different attitude? In our disciplines, we worry endlessly about surveillance capitalism, about the power of tech billionaires, about the algorithmic mediation of human interaction. The writers of The Little Learner sidestep such issues. Deep learning is fun. It’s for categorising cat photos, not for empowering intelligence agencies to more rapidly scan citizens’ text messages. It’s something anyone can do as a hobby, rather than a tool used by rich and powerful people to make themselves richer and more powerful.

Everyone agreed the book is fun, and the topic is interesting. We will see in coming months how we can reconcile the fun with the cultural critique.

Any code written in the sessions can be found in the GitHub repository for this reading.

The End of Literate Programming

The end of the affair

At our final meeting for 2024, we completed reading Knuth’s Literate Programming. In the final section, Knuth considers “Retrospects and Prospects” for literate programming, and is explicit about who literate programming is for: computer scientists and systems programmers, rather than hobbyists. This context justifies many of Knuth’s arguments throughout the essay, about the kinds of literacy assumed by the WEB system. But it also widens the main gap in his philosophy—the gap between the programmer and the reader. He anticipates that programs will become works of literature, which implies a wide readership, but he restricts WEB to a small subset of people, resulting in a restricted writership. In this way Knuth sharpens the literary aspect of his enterprise, for indeed literature too is written by the few for the many to consume.

With that, our first major reading for the group came to an end.

After the end

We spent some time looking at the upshot of Knuth’s new programming system. On his website, he has published many literate programs, in addition to publishing three books written in either WEB or CWEB, which contain the programs for TeX, Metafont and the MMIX virtual machine.

Literate programming has inspired many programming systems, but not in the manner Knuth proposed. He saw WEB as a system for highly skilled programmers to write complex software systems. But WEB (and its descendant CWEB) have not found much use in this domain. Instead, programming language designers have designed ever more capable documentation-generation systems, which allow software to be composed in a more conventional format, but with excellent computer-generated documentation. Python includes pydoc as part of its standard library, for example, while Rust ships with rustdoc. Such tools allow a developer to include documentation in their code, and generate attractive websites for their software. They do not support the creation of elegant books of the kind that Knuth prefers, and in particular, do not free the programmer from the syntax of their chosen programming language.

The Knuthian ideal of literate programming has caught on in a different community: data science. Statisticians, digital humanists, data analysts, lab scientists and others frequently use tools such as Emacs Org-mode, RMarkdown and Jupyter notebooks to write their software. This form of literate programming is nonetheless distinct from Knuth’s. Knuth foresaw systems programmers building complex reusable systems using literate tools. Data scientists tend to write more ephemeral, simple programs, which analyse a particular dataset or form the basis for a particular article or report. When a data scientist does take the time to develop a more complex and reusable piece of software, they are more likely to do so in the form of a simple R, Python or Julia package. While it is possible to write such software in a more literate style using tools such as nbdev, this is not a common practice.

It is a pity that Knuth’s vision of programming-as-literature has not gone mainstream. Source code is the primary medium of communication for millions of people who work every day as programmers. They write software that affects all of us, and if this software were readable by the general public, then the systems that govern our lives could in principle be more open and democratic. Even an experienced programmer can find it difficult to find a reading path through a complex program. If essential pieces of software such as MediaWiki (i.e. Wikipedia), TensorFlow or Bluesky were written and published in a Knuthian style with a linear narrative, then more people might be inclined to read and debate the code.

What next?

We will reconvene in February 2025, as the summer recess draws to a close here in Australia. Stay tuned for our next reading, which will involve the source code of a Large Language Model. Till then, Happy Holidays!