After several attempts, I finally found the energy to start writing a book. It grew out of lecture notes for a graduate class I taught last semester. I am making the draft available so that I can get feedback before a (hopefully) final effort next semester.
The goal of the book is to present old and recent results in learning theory for the most widely used learning architectures. It is geared towards theory-oriented students, as well as students who want to acquire a basic mathematical understanding of the algorithms used throughout machine learning and in associated fields that rely heavily on learning methods, such as computer vision and natural language processing.
A particular effort is made to prove many results from first principles, while keeping the exposition as simple as possible. This will naturally lead to a choice of key results that showcase the important concepts of learning theory in simple but relevant instances. Some general results will also be presented without proofs. Of course, the notion of first principles is subjective, and a good knowledge of linear algebra, probability theory, and differential calculus will be assumed.
Moreover, I will focus on the part of learning theory that is tied to algorithms that can be run in practice, and thus all algorithmic frameworks described in this book are routinely used. For most learning methods, some simple illustrative experiments are presented, with the plan to provide accompanying code (Matlab, Julia, and Python) so that students can see for themselves that the algorithms are simple and effective on synthetic experiments.
This is not an introductory textbook on machine learning. There are already several good ones in various languages [1, 2]. Many topics are not covered, and many more are not covered in much depth. There are many good textbooks on learning theory that go deeper [3, 4, 5].
The choice of topics is arbitrary (and thus personal). Many important algorithmic frameworks are left out (e.g., reinforcement learning and unsupervised learning). Suggestions of extra themes are welcome! A few additional chapters are currently being written, such as ensemble learning, bandit optimization, probabilistic methods, and structured prediction.
Help wanted!
This is still work in progress. In particular, there are still a lot of typos, probably some mistakes, and almost surely places where more details are needed; readers are most welcome to report them to me (and then get credit for it). Also, the bibliography is currently quite short and would benefit from some expansion (all suggestions welcome, in particular for giving proper credit).
Moreover, I am convinced that simpler mathematical arguments are possible in many places in the book. If you are aware of elegant and simple ideas that I have overlooked, please let me know.
References
[1] Chloé-Agathe Azencott. Introduction au Machine Learning. Dunod, 2019.
[2] Ethem Alpaydin. Introduction to Machine Learning. MIT Press, 2020.
[3] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2018.
[4] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[5] Andreas Christmann and Ingo Steinwart. Support Vector Machines. Springer, 2008.
Dear professor,
I think you should add some remarks/discussion/references at the end of each chapter. That would be very helpful for self-study students.
Good suggestion. I’ll keep it in mind when preparing the final version (but these may take a lot of time).
Francis
Hey, Francis! I’m writing a book, too. “Control Systems & Reinforcement Learning” will be complete by July, and published by CUP early in 2022. I need good references on ML so will be interested to see a draft of your book. I’m happy to share the 80% of my book that is complete. It is for super beginners.
Thank you for your interest. Considering that you are preparing your new book, this might be an appropriate time to contribute learning theory entries to the TCS awesome list:
https://github.com/mostafatouny/awesome-theoretical-computer-science
I was unaware of this list. It is very nice!
Francis
Hi Francis, I think I found a typo on page i. You write that harder sections are denoted by diamonds, and you list 2 diamonds, 2 diamonds, 3 diamonds rather than 1 diamond, 2 diamonds, 3 diamonds.
Thanks.
I will look into it.
Francis
Hi,
I’m available to proofread if you need it.
I speak French too!
Regards,
Fabien
Thanks!
Francis
Sounds great! Unfortunately I don’t have time to be a really useful reader for the book, but I will most certainly get a copy when it is available. The one thing I will add, though, is that I recently came across this blog post about how to self-publish a book:
http://theroadchoseme.com/how-i-self-published-a-professional-paperback-and-ebook-using-latex-and-pandoc?1
I found it very inspiring since I have a fairly strong dislike of the (academic) publishing system, and this allows one to avoid it.
Have fun writing, and do post here when you are done 🙂
Hi!
Not all academic publishers are evil. The university presses (MIT Press, Cambridge University Press, etc.) typically offer good conditions that meet most researchers’ needs.
Disclaimer: I am a series editor at MIT Press (https://mitpress.mit.edu/books/series/adaptive-computation-and-machine-learning-series).
Francis
Hi Francis,
thanks for your work! It looks like a really nice book. Two quick questions:
1. What are the main differences with respect to Mohri et al., and Shalev-Shwartz and Ben-David?
2. Any reasons why you chose not to cover the growth function and the VC-dimension?
Thanks for your message.
The two books you mention are great, and a significant amount of material is shared. I would say that I put more emphasis on the estimation of real-valued prediction functions (this is why I don’t cover the VC-dimension) and on the links with high-dimensional statistics, with generalization bounds that depend more finely on the underlying problem, under assumptions that typically cannot be checked in practice but which I think are more informative about why a learning method works or does not. See more details in this month’s blog post: https://francisbach.com/quest-for-adaptivity/.
Francis
Thanks for your reply! I see that you published a new draft. Great! Coming back to your answer, I see now that your book definitely puts more emphasis on the regression task than the other two. Also, with respect to those two texts, I see there is extra material of high relevance to modern neural networks, such as:
– concentration inequalities for matrices.
– double descent. I think this is the first book which mentions it! Thanks a lot for including this topic.
Finally, do you think it would make sense to extend the neural network part a bit, including, e.g., a discussion of the Neural Tangent Kernel? Or is it out of scope?
Thanks for your encouraging message. As for the neural tangent kernel, I will most probably mention it when I take material from this blog post (https://francisbach.com/gradient-descent-for-wide-two-layer-neural-networks-implicit-bias/), but not at great length.
Francis
I just wanted to (randomly xD) let you know that I just read the table of contents and got really excited about the book. It feels like a great way to start getting a more solid theoretical grasp. I also love the fact that it covers more recent phenomena like the double descent curve, which are often not covered in slightly older but otherwise solid introductory machine learning books. Thanks for making this draft publicly available!
2nd year Ph.D. in applied math here : )
I started looking into a recently proposed first-order method called SAM (sharpness-aware minimization), which still has a lot of open questions. This method improves generalization in neural network training, and I am trying to understand how it does so.
Not sure if you are willing to add this to your book haha.
Very interesting topic indeed, but a bit too advanced for the book.
Best
Francis