Machine Learning Research Blog

Francis Bach

I am writing a book!

Posted on April 5, 2021 (updated April 6, 2021) by Francis Bach

After several attempts, I finally found the energy to start writing a book. It grew out of lecture notes for a graduate class I taught last semester. I make the draft available so that I can get feedback before a (hopefully) final effort next semester.

The goal of the book is to present old and recent results in learning theory for the most widely used learning architectures. It is geared towards theory-oriented students, as well as students who want to acquire a basic mathematical understanding of the algorithms used throughout machine learning and in associated fields that rely heavily on learning methods, such as computer vision and natural language processing.

A particular effort is made to prove many results from first principles, while keeping the exposition as simple as possible. This naturally leads to a choice of key results that showcase, in simple but relevant instances, the important concepts in learning theory. Some general results will also be presented without proofs. Of course, the concept of first principles is subjective, and a good knowledge of linear algebra, probability theory, and differential calculus will be assumed.

Moreover, I will focus on the part of learning theory that deals with algorithms that can be run in practice, and thus all of the algorithmic frameworks described in this book are routinely used. For most learning methods, simple illustrative experiments are presented, with the plan to provide accompanying code (Matlab, Julia, and Python) so that students can see for themselves that the algorithms are simple and effective in synthetic experiments.
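
To give the flavor of such an experiment, here is a minimal sketch in Python (one of the three planned languages): ridge regression on synthetic data. The data-generating model and the regularization parameter below are illustrative choices, not taken from the book.

    import numpy as np

    # Synthetic regression data: y = X w_* + noise (illustrative model).
    rng = np.random.default_rng(0)
    n, d = 200, 20
    w_star = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.5 * rng.normal(size=n)

    # Ridge regression: minimize ||X w - y||^2 / n + lam * ||w||^2.
    lam = 0.1
    w_hat = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

    # Test error on fresh data; the noise variance 0.25 is the best achievable.
    X_test = rng.normal(size=(1000, d))
    y_test = X_test @ w_star + 0.5 * rng.normal(size=1000)
    print("test MSE:", np.mean((X_test @ w_hat - y_test) ** 2))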

This is not an introductory textbook on machine learning. There are already several good ones in multiple languages [1, 2]. Many topics are not covered, and many more are not covered in much depth. There are many good textbooks on learning theory that go deeper [3, 4, 5].

The choice of topics is arbitrary (and thus personal). Many important algorithmic frameworks are left out (e.g., reinforcement learning, unsupervised learning). Suggestions of extra themes are welcome! A few additional chapters are currently being written, on topics such as ensemble learning, bandit optimization, probabilistic methods, and structured prediction.

Help wanted!

This is still work in progress. In particular, there are still a lot of typos, probably some mistakes, and almost surely places where more details are needed; readers are most welcome to report them to me (and then get credit for it). Also, the bibliography is currently quite short and would benefit from some expansion (all suggestions welcome, in particular for giving proper credit).

Moreover, I am convinced that simpler mathematical arguments are possible in many places in the book. If you are aware of elegant and simple ideas that I have overlooked, please let me know.

References

[1] Chloé-Agathe Azencott. Introduction au Machine Learning. Dunod, 2019.
[2] Ethem Alpaydin. Introduction to Machine Learning. MIT Press, 2020.
[3] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2018.
[4] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[5] Andreas Christmann and Ingo Steinwart. Support Vector Machines. Springer, 2008.

18 thoughts on “I am writing a book!”

  1. Statistical algorithm lover says:
    April 5, 2021 at 1:13 pm

    Dear professor,

    I think you should add some remark/discussion/reference at the end of each chapter. It is very helpful for self-study students.

    Reply
    1. Francis Bach says:
      June 14, 2021 at 7:47 am

      Good suggestion. I’ll keep it in mind when preparing the final version (but these may take a lot of time).
      Francis

      Reply
  2. Sean Meyn says:
    April 5, 2021 at 8:13 pm

    Hey, Francis! I’m writing a book, too. “Control Systems & Reinforcement Learning” will be complete by July, and published by CUP early in 2022. I need good references on ML so will be interested to see a draft of your book. I’m happy to share the 80% of my book that is complete. It is for super beginners.

    Reply
  3. Mostafa Touny says:
    April 5, 2021 at 10:03 pm

    Thank you for your interest. Considering you are preparing your new book, this might be an appropriate time to contribute learning theory to the TCS awesome list:

    https://github.com/mostafatouny/awesome-theoretical-computer-science

    Reply
    1. Francis Bach says:
      June 14, 2021 at 7:44 am

      I was unaware of this list. It is very nice!
      Francis

      Reply
  4. Alan DenAde says:
    April 6, 2021 at 2:07 am

    Hi Francis, I think I found a typo on page i. You write that harder sections are denoted by diamonds, and you list 2 diamonds, 2 diamonds, 3 diamonds rather than 1 diamond, 2 diamonds, 3 diamonds.

    Reply
    1. Francis Bach says:
      June 14, 2021 at 7:40 am

      Thanks.
      I will look into it.
      Francis

      Reply
  5. Fabien Champigny says:
    April 7, 2021 at 9:08 pm

    Hi,
    I’m available to do a reading if you need one.
    I speak French too!

    Regards,
    Fabien

    Reply
    1. Francis Bach says:
      June 14, 2021 at 7:40 am

      Thanks!
      Francis

      Reply
  6. Richard Oberdieck says:
    April 12, 2021 at 9:30 am

    Sounds great! Unfortunately I don’t have time to really be a useful reader for the book, but I will most certainly get a copy when it is available. The one thing I will add, though, is that I recently came across this blog post, which describes how to self-publish a book:

    http://theroadchoseme.com/how-i-self-published-a-professional-paperback-and-ebook-using-latex-and-pandoc?1

    I found it very inspiring since I have a fairly strong dislike of the (academic) publishing system, and this allows one to avoid it.

    Have fun writing, and do post here when you are done 🙂

    Reply
    1. Francis Bach says:
      June 14, 2021 at 7:39 am

      Hi!
      Not all academic publishers are evil. The university presses (like MIT Press, Cambridge University Press, etc.) typically offer good conditions that meet most researchers’ needs.
      Disclaimer: I am a series editor at MIT Press (https://mitpress.mit.edu/books/series/adaptive-computation-and-machine-learning-series).
      Francis

      Reply
  7. Andrea Panizza says:
    June 22, 2021 at 4:33 pm

    Hi Francis,

    thanks for your work! It looks like a really nice book. Two quick questions:

    1. What are the main differences with respect to Mohri et al., and Shalev-Shwartz and Ben-David?
    2. Any reasons why you chose not to cover the growth function and the VC-dimension?

    Reply
    1. Francis Bach says:
      June 25, 2021 at 1:53 pm

      Thanks for your message.
      The two books you mention are great, and a significant amount of material is shared. I would say that I put more emphasis on the estimation of real-valued prediction functions (this is why I don’t cover the VC dimension) and on the links with high-dimensional statistics, with generalization bounds that depend more finely on the underlying problem, under assumptions that typically cannot be checked in practice but that I think are more informative about why a learning method works or does not; see more details in this month’s blog post: https://francisbach.com/quest-for-adaptivity/.
      Francis

      Reply
      1. Andrea Panizza says:
        June 28, 2021 at 7:37 pm

        Thanks for your reply! I see that you published a new draft. Great! Coming back to your answer, I see now that your book definitely puts more emphasis on the regression task than the other two. Also, with respect to those two texts, there is extra material of high relevance to modern neural networks, such as:

        – concentration inequalities for matrices.
        – double descent. I think this is the first book that mentions it! Thanks a lot for including this topic.

        Finally, do you think it would make sense to extend the neural network part a bit, including, e.g., a discussion of the Neural Tangent Kernel? Or is it out of scope?

        Reply
        1. Francis Bach says:
          June 29, 2021 at 7:56 am

          Thanks for your encouraging message. As for the neural tangent kernel, I will most probably mention it when I take material from this blog post (https://francisbach.com/gradient-descent-for-wide-two-layer-neural-networks-implicit-bias/), but not at great length.
          Francis

          Reply
  8. Calimero says:
    June 25, 2021 at 12:29 pm

    I just wanted to (randomly xD) let you know that I just read the table of contents and got really excited about the book. It feels like a great way to start getting a more solid theoretical grasp. I also love the fact that it covers more recent phenomena like the double descent curve, which are often not covered in slightly older but otherwise solid introductory machine learning books. Thanks for making this draft publicly available!

    Reply
  9. Suavis says:
    March 2, 2024 at 4:53 am

    2nd year Ph.D. in applied math here : )

    I started looking into a recently proposed first-order method called SAM (sharpness-aware minimization), which still has a lot of open questions. This method improves generalization in neural network training, and I am trying to understand how it does so.

    Not sure if you are willing to add this to your book haha.
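
    In case it helps, here is a minimal sketch of SAM’s two-step ascend-then-descend update, in toy NumPy form; the quadratic objective, perturbation radius, and step size are arbitrary illustrative values, not from the SAM paper:

        import numpy as np

        # Toy objective f(w) = 0.5 * ||A w - b||^2 (illustrative only).
        rng = np.random.default_rng(0)
        A = rng.normal(size=(50, 10))
        b = rng.normal(size=50)

        def grad(w):
            return A.T @ (A @ w - b)

        w = np.zeros(10)
        rho, lr = 0.05, 0.01  # perturbation radius and learning rate
        for _ in range(500):
            g = grad(w)
            # Step 1: ascend to an approximate worst-case point in a rho-ball.
            eps = rho * g / (np.linalg.norm(g) + 1e-12)
            # Step 2: descend using the gradient taken at the perturbed point.
            w -= lr * grad(w + eps)

        print("final objective:", 0.5 * np.sum((A @ w - b) ** 2))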

    Reply
    1. Francis Bach says:
      October 3, 2024 at 8:05 pm

      Very interesting topic indeed, but a bit too advanced for the book.
      Best
      Francis

      Reply
