The Hundred-Page Machine Learning Book - Book Review

By Skander, 2 January, 2026
The one-hundred-page ML book

I’ve spent the past few days reading The One-Hundred-Page Machine Learning Book by Andriy Burkov, and I’m genuinely glad I did.
This is one of those rare books that manages to strike what feels like a perfect balance: it covers a remarkably wide range of machine learning algorithms and techniques in a very short space, explains them in a clear and engaging way, and yet never sacrifices rigor. The mathematical foundations are always there, equations included, but introduced only when they are truly needed.

The book starts by revisiting just enough linear algebra, calculus, and probability to get you going, and then gradually builds up concepts on a need-to-know basis. Fundamental regression and classification methods (linear and logistic regression, SVMs, k-NN) are explained intuitively, before moving on to neural networks, CNNs, and RNNs. I particularly enjoyed the treatment of gated recurrent units: the explanation of how a GRU learns to read, write, and erase memory, complete with the governing equations, was crystal clear.
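
To make the read/write/erase idea concrete, here is a minimal sketch of one step of a GRU with a scalar hidden state. It follows a common textbook formulation and is not necessarily the book's own notation; the parameter names (wz, uz, bz, and so on) are hypothetical, and some references swap the roles of z and 1 - z.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: float, h_prev: float, p: dict) -> float:
    """One step of a scalar GRU (hidden size 1); `p` holds hypothetical weights."""
    # Update gate: how much of the memory gets rewritten at this step.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the old memory is read when forming the candidate.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate memory built from the input and the (partly erased) old state.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend: keep (1 - z) of the old state, write z of the candidate.
    return (1.0 - z) * h_prev + z * h_cand
```

In this convention, an update gate near 0 passes the old memory through untouched, while an update gate near 1 combined with a reset gate near 0 overwrites the state from the current input alone.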

What also stood out to me is how pragmatic the book is. Beyond algorithms, it walks through the real steps of an ML project: feature engineering, model selection, hyperparameter tuning, and the ever-present bias–variance tradeoff, which is discussed thoughtfully throughout. The sections on ensemble methods, with bagging and gradient boosting presented side by side, were especially well done.
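
The contrast between the two ensemble styles can be sketched in a few lines. This is my own toy illustration under simplifying assumptions, not code from the book: the base learner is a one-split regression stump, the data is a made-up 1-D curve, and all function names are hypothetical. Bagging averages stumps trained independently on bootstrap resamples; boosting adds stumps one after another, each fit to the current residuals (which, for squared error, are the negative gradient).

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump and return it as a predict function."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: fall back to a constant
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bagging(xs, ys, n_models=25, seed=0):
    """Average stumps trained independently on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(s(x) for s in stumps)


def boosting(xs, ys, n_models=25, lr=0.5):
    """Add stumps sequentially, each one fit to the current residuals."""
    base = mean(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_models):
        residuals = [y - p for y, p in zip(ys, preds)]
        s = fit_stump(xs, residuals)
        stumps.append(s)
        preds = [p + lr * s(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Toy data (an assumption for illustration): y = x^2 on 20 evenly spaced points.
xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]
baseline = lambda x: mean(ys)
print("baseline:", mse(baseline, xs, ys))
print("bagging: ", mse(bagging(xs, ys), xs, ys))
print("boosting:", mse(boosting(xs, ys), xs, ys))
```

The structural difference is the point here: the bagged models never see each other, whereas each boosted model depends on the mistakes of all the previous ones.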

I also appreciated how the book clears up several modern confusions. It rigorously defines “shallow learning” as learning directly from features, reminding us that deep learning is not only about neural networks. It clarifies that embeddings don’t necessarily live in higher-dimensional spaces, a misconception reinforced by today’s focus on LLMs. And it shows how encoders and decoders go far beyond transformers, with practical examples using autoencoders and ladder networks for problems like limited labels or sparse recommender data.

Finally, the learning-to-rank section, and in particular the discussion of LambdaMART, really stood out to me. Although introduced in the context of information retrieval and ranking documents returned by a search engine, the ideas immediately generalize beyond search. I couldn’t help but think about applications in logistics: learning to sequence stops, prioritize actions, or even approximate solutions to routing problems like the vehicle routing problem (VRP) or the traveling salesman problem (TSP). It’s one of those sections that doesn’t just explain a method; it actively expands how you think about where that method can be used.
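
As a rough sketch of what drives this family of methods, the snippet below computes LambdaRank-style per-item "pushes" from the current scores and relevance labels; LambdaMART fits gradient-boosted regression trees to quantities of this kind. It is a simplified illustration, not the book's code. Treating delivery stops as the items and their priorities as relevance labels is purely my own speculative mapping, and the function names are hypothetical.

```python
import math


def dcg_gain(rel: int, rank: int) -> float:
    """DCG contribution of an item with relevance `rel` at 0-based `rank`."""
    return (2 ** rel - 1) / math.log2(rank + 2)


def lambdas(scores, rels):
    """LambdaRank-style push per item (positive means: raise this score)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = {item: pos for pos, item in enumerate(order)}
    # DCG of the ideal ordering, used to normalise swap effects into NDCG units.
    ideal = sum(dcg_gain(r, k) for k, r in enumerate(sorted(rels, reverse=True)))
    push = [0.0] * n
    for i in range(n):
        for j in range(n):
            if rels[i] <= rels[j]:
                continue  # only consider pairs where i should outrank j
            # Absolute change in NDCG if items i and j swapped positions.
            delta = abs(
                dcg_gain(rels[i], rank[i]) + dcg_gain(rels[j], rank[j])
                - dcg_gain(rels[i], rank[j]) - dcg_gain(rels[j], rank[i])
            ) / ideal
            # Pairwise logistic term: close to 1 when the pair is badly misordered.
            rho = 1.0 / (1.0 + math.exp(scores[i] - scores[j]))
            push[i] += rho * delta
            push[j] -= rho * delta
    return push


# Hypothetical example: three stops, current model scores and priority labels.
stop_scores = [0.1, 0.9, 0.5]
stop_priority = [2, 0, 1]
print(lambdas(stop_scores, stop_priority))
```

The pushes sum to zero across items: every pair moves one item up and the other down by the same amount, weighted by how much the swap would matter for the ranking metric.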

I highly recommend this book to anyone who wants to start machine learning on solid footing. It’s equally valuable for seasoned practitioners like myself as a way to periodically step back, reconnect the dots, and remind ourselves what the field is really built on.

Related websites

https://themlbook.com/

Tags
Science books

Skander Kort