Code recitation as ASMR
07 May 2025

The blog has been irregular of late due to my teaching schedule. But we are ploughing on through The Little Learner. This week we reached the end of “Interlude I: The More We Extend, the Less Tensor We Get.”
The gist
Interlude I concludes our basic introduction to the tensor, the fundamental data structure of deep learning. Chapter 2 covered how to create tensors, and how to use them with the line function to represent parameterised linear equations. “Interlude I” shows how to do arithmetic with tensors.
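As a taste of what “extended” arithmetic means in practice, here are two expressions in the style of the book. This is only a sketch, and it assumes that malt’s extended operators are in scope (via (require malt)), so that + works pointwise on tensors rather than only on scalars:
#lang racket
(require malt)                       ; assuming the malt package is installed
(+ (tensor 3 6 9) (tensor 22 5 4))   ; = (tensor 25 11 13), pointwise
(+ 10 (tensor 1 2 3))                ; = (tensor 11 12 13): the scalar is added
                                     ;   to every element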
The joke in the title, “The More We Extend, the Less Tensor We Get,” has to do with the way a tensor’s rank changes when you perform arithmetic on it. The basic arithmetical operations, such as + and /, are defined only for scalar values such as 3 and 708.4. To compute the “sum” or “quotient” of a tensor or tensors, the concepts of “sum” and “quotient” need to be extended. That’s the first part of the title: “The More We Extend.” The second part refers to the way a tensor’s rank is affected by arithmetic. For example, if you take the sum of a tensor, its rank is reduced by one. Consider the following example:
$$ sum( \begin{Bmatrix} 3 & 22 \\ 6 & 5 \\ 9 & 4 \end{Bmatrix} ) $$
or, in Scheme (using malt):
(sum (tensor (tensor 3 22)
             (tensor 6 5)
             (tensor 9 4)))
This is a tensor². It has rank 2. Intuitively, it is two-dimensional. The two dimensions are the rows and columns. You can see the number of dimensions in the Scheme/Racket code, because there are two layers of tensor: an outer tensor, and the three inner tensors.
When you compute the sum of this tensor, the rank drops by one, and you get a tensor¹ as the answer:
$$ \begin{Bmatrix} 25 \\ 11 \\ 13 \end{Bmatrix} $$
(tensor 25 11 13) ; the answer
If you take the sum of this tensor, the rank falls by 1 again, and you get a tensor⁰, that is, a scalar:
$$ sum( \begin{Bmatrix} 25 \\ 11 \\ 13 \end{Bmatrix} ) = 49 $$
(sum (tensor 25 11 13))
; = 49
The more we extend (the more “extended” arithmetical operations we apply to a tensor), the less tensor we get (the lower the tensor’s rank becomes).
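To see why each extended operation strips off exactly one rank, here is a toy sketch of the descent, written in plain Racket with nested vectors standing in for tensors. It is not malt’s actual implementation (the real extension machinery and tensor representation are more involved); it only shows the pattern: a base operation defined on tensors¹, plus a wrapper that recurses until it reaches them.
#lang racket
; A toy sketch of extension, not malt's implementation. Tensors are
; represented here as nested vectors; scalars are plain numbers.
(define (sum-1 t)                    ; base case: sum the scalars of a rank-1 tensor
  (for/sum ([s (in-vector t)]) s))
(define (sum* t)                     ; extended sum: descend until rank 1 is reached
  (if (number? (vector-ref t 0))
      (sum-1 t)                      ; a rank-1 tensor: sum its scalars
      (vector-map sum* t)))          ; higher rank: recur one level down
(sum* (vector (vector 3 22) (vector 6 5) (vector 9 4))) ; = #(25 11 13)
(sum* (vector 25 11 13))                                ; = 49
Each recursive call peels away one layer of vectors, which is exactly the “less tensor” of the title.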
ASMR
As a group, we have been slowly working out how to actually recite the code in the book. It is remarkably difficult to read code in a natural way. One of the main problems is how literally to read the symbols. For example, try to read this out in English:
; a tensor-3
(tensor (tensor (tensor 3 4)
                (tensor 7 8))
        (tensor (tensor 21 9)
                (tensor 601 1)))
One way is to read the symbols very literally:
Open parentheses tensor. Open parentheses tensor. Open parentheses tensor, 3, 4, close parentheses. Open parentheses tensor, 7, 8, close parentheses. Close parentheses. Open parentheses tensor. Open parentheses tensor, 21, 9, close parentheses. Open parentheses tensor, 601, 1, close parentheses. Close parentheses. Close parentheses.
The problems with this approach are obvious. If you managed to read that blockquote without your eyes glazing over, well done! Your powers of concentration are supreme. The approach that we have been nutting out as a group involves interpreting the symbols a bit more:
A tensor³, consisting of two tensors², the first tensor² containing the tensors¹ 3-4 and 7-8. The second tensor² contains the tensors¹ 21-9 and 601-1.
This form of recitation is superior for human comprehension. Why are there all those parentheses, words and digits? Why, to create a tensor³, consisting of two tensors² … etc. When we read this way, we are essentially “evaluating” the code in the way that the Scheme interpreter evaluates it. We work out what data structure is represented by all the parentheses and other symbols, and try to represent that data structure in a way that works with our hardware: our minds!
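As a small illustration of that mental evaluation, here is a hypothetical helper (it is not from the book, and it does not use anything from malt) that recovers the nesting structure a reciter describes aloud, again treating tensors as nested Racket vectors:
#lang racket
; Hypothetical helper: report the shape of a nested-vector "tensor",
; which is what the reciter reconstructs in their head.
(define (shape* t)
  (if (number? t)
      '()                               ; a scalar has no dimensions
      (cons (vector-length t)           ; the size of this layer...
            (shape* (vector-ref t 0))))) ; ...then descend into the next
(shape* (vector (vector (vector 3 4) (vector 7 8))
                (vector (vector 21 9) (vector 601 1)))) ; = '(2 2 2)
Reciting the code well amounts to computing something like shape* in your head, and then reading the result back out in words.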
One waggish member of the group suggested that code-reading is a kind of ASMR. It is indeed hypnotic to read out so many of these tensors. When we recite a “same-as chart,” which shows the evaluation of a complex expression, the effect is similarly hypnotic. At the beginning of every line of a “same-as chart,” the reader says “which is the same as.” It is not the most intrinsically poetic refrain, but its repetition a dozen times in the space of a few minutes does lull the mind.
Does this “ASMR” quality tell us something about the nature of code as a form of writing? I think it might, but I’m not sure exactly what it tells us. I wonder if there is some useful analogy to be made between code and minimalism in art and music. Like minimalist art and music, code is repetitious, and it is this repetition more than anything that creates the ASMR effect noted by the group.
Where are the ANNs?
One member of the group expressed some mild and good-natured frustration at our progress through the book. “I don’t feel like I’ve learned anything about deep learning yet.”
I can understand this point of view, even though it is “wrong” in an obvious way. Another member of the group observed that she has already learned many things about deep learning that she wished she had known earlier. Learning about the data structures used to build artificial neural networks (ANNs) is clearly important if you want to learn how ANNs are built!
But it is easy to see why a reader at this point in the book might be frustrated, and not feel that they have learned anything yet about ANNs. I think there are two good reasons for this feeling:
- The dialogic style of the book does not lend itself to summary and signposting. The authors do not explain why they are introducing certain topics at certain times. The book tries to maintain the Socratic framework in which the teacher (left column) has the entire subject mapped out in advance, and the student (right column) is just happy to follow the reasoning step-by-step without trying to glance at what’s coming next. This is an interesting decision on the authors’ part. Why not just have their student character ask the teacher from time to time why a given topic is important to the overall progress of the book?
- So far we have learned about data structures only. The line function allows a linear equation to be represented in a parameterised form, so that its parameters can be learned, and the tensor structure allows the parameters for many interlinked line functions to be stored in a convenient way, so that lines can be built up into complex networks that model the relationships between observed values in the training data (there is a sketch of line after this list). While data structures are very important, they somehow seem less essential than algorithms in computing. The authors decided it was important to learn the data structures before the algorithms, perhaps because it is impossible to demonstrate the algorithms without having data for the algorithms to work on. This decision to teach the data structures first makes good sense, but it is bound to cause frustration in readers who expect to be shown the algorithms that make an ANN work.
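For readers who have not met it, here is roughly what a parameterised line looks like in this style: a function of the input x that returns a function of the parameters θ. This is a minimal sketch of the idea rather than the book’s own definition, and it works only on plain numbers (the book’s version, used with malt, also accepts tensors):
#lang racket
; A minimal sketch of a parameterised line; the book's definition may
; differ in detail. θ is a list of parameters: (list slope intercept).
(define line
  (lambda (x)
    (lambda (θ)
      (+ (* (list-ref θ 0) x)   ; slope times x
         (list-ref θ 1)))))     ; plus the intercept
((line 2.0) (list 3.0 1.5))     ; = 7.5, that is, 3.0 * 2.0 + 1.5
The point of the curried form is that the same line, with the same input, can be tried against many different parameter lists, which is exactly what learning will do.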
The Little Learner is a great book. We are making good progress through it, and it is a lot of fun to read aloud. It is an enlivening challenge to learn how to recite the code. It is an enlivening challenge to try to see the bigger picture as it is gradually sketched out piece-by-piece. Next week we do learn an algorithm, one of the most important algorithms of our time: gradient descent. We will have to see where that slippery slope lands us.