Machine Learning Research Blog

Francis Bach

Category: Machine learning

Machine learning concepts or tools

Closed-form dynamics beyond quadratics

Posted on March 5, 2026 by Francis Bach

Quadratic functions are the workhorse of the analysis of iterative algorithms such as gradient-based optimization. They lead, in discrete and continuous time, to closed-form dynamics that treat all eigensubspaces of the Hessian matrix independently. This yields simple math for understanding convergence behaviors (maximal step-size, condition number, acceleration, scaling laws for gradient descent or its…
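
As a quick illustration of this decoupling (a minimal sketch of my own, not code from the post): for gradient descent on \(f(x) = \frac{1}{2} x^\top H x\) with \(H = V \Lambda V^\top\), the iterates satisfy \(x_k = V (I - \eta \Lambda)^k V^\top x_0\), so each eigencoefficient contracts independently as \((1 - \eta \lambda_i)^k\).

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric positive definite "Hessian" H and a starting point
# (hypothetical example for illustration).
A = rng.standard_normal((5, 5))
H = A @ A.T + np.eye(5)
x0 = rng.standard_normal(5)

eigvals, V = np.linalg.eigh(H)   # H = V diag(eigvals) V^T
step = 1.0 / eigvals.max()       # any step below 2 / lambda_max is stable

# Gradient descent on f(x) = 0.5 * x^T H x for k steps.
x, k = x0.copy(), 50
for _ in range(k):
    x -= step * (H @ x)

# Closed form: each eigencoefficient contracts as (1 - step * lambda_i)^k.
x_closed = V @ (((1 - step * eigvals) ** k) * (V.T @ x0))
print(np.allclose(x, x_closed))  # True: eigensubspaces evolve independently
```

The same diagonalization argument gives the classical bound \(\eta < 2/\lambda_{\max}\) on the maximal stable step-size.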

Beyond Power Laws: Scaling Laws for Next-Token Prediction

Posted on September 27, 2025 by Frederik Kunstner and Francis Bach

The past posts on optimization scaling laws [1, 2] focused on problems that do not become significantly harder as the problem size increases. We showed that for some problems, as the dimension \(d\) goes to infinity, the optimality gap converges at a sublinear rate \(\Theta(k^{-p})\) for some power \(p\) depending on the problem, but independent…
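
To make the \(\Theta(k^{-p})\) behavior concrete, here is a hedged sketch (synthetic data, not from the post) of how such an exponent is typically read off: generate gaps following a power law and recover \(p\) as minus the slope of a least-squares fit in log-log coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical optimality gaps following e_k ≈ C * k^{-p} with p = 1/2,
# mildly perturbed to mimic measurement noise.
k = np.arange(1, 1001)
gaps = 3.0 * k ** (-0.5) * (1 + 0.01 * rng.standard_normal(k.size))

# Slope of log(gaps) against log(k) estimates -p.
slope, _ = np.polyfit(np.log(k), np.log(gaps), 1)
print(f"estimated power p ≈ {-slope:.3f}")  # close to 0.5
```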

Revisiting scaling laws via the z-transform

Posted on July 18, 2025 by Francis Bach

In the last few years, we have seen a surge of empirical and theoretical works on “scaling laws”, which aim to characterize the performance of learning methods as a function of various problem parameters (e.g., number of observations and parameters, or amount of compute). From a theoretical point of view, this marks a renewed interest in…

Unraveling spectral properties of kernel matrices – II

Posted on March 24, 2025 by Francis Bach

This month, we pursue our exploration of spectral properties of kernel matrices. As mentioned in a previous post, understanding how eigenvalues decay is not only fun but also key to understanding algorithmic and statistical properties of many learning methods (see, e.g., chapter 7 of my book “Learning Theory from First Principles”). In this post, we look…

My book is (at last) out!

Posted on December 21, 2024 by Francis Bach

Just in time for Christmas, I received the first hard copies of my book two days ago! It comes with a mix of relief and pride after 3 years of work. As most book writers will probably acknowledge, it took much longer than I expected when I started, but overall it was an enriching…

Scaling laws of optimization

Posted on October 5, 2024 by Francis Bach

Scaling laws have been one of the key achievements of theoretical analysis in various fields of applied mathematics and computer science, answering the following question: how fast does my method or algorithm converge as a function of (potentially partially) observable problem parameters? For supervised machine learning and statistics, probably the simplest and oldest…

Unraveling spectral properties of kernel matrices – I

Posted on January 7, 2024 by Francis Bach

Since my early PhD years, I have plotted and studied eigenvalues of kernel matrices. In the simplest setting, take independent and identically distributed (i.i.d.) data, such as points sampled uniformly from a cube in 2 dimensions, take your favorite kernels, such as the Gaussian or Abel kernels, plot eigenvalues in decreasing order, and see what happens. The…
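
This recipe is easy to reproduce; here is a minimal sketch (my own illustration, with an arbitrary bandwidth choice) for the Gaussian kernel on i.i.d. points in the 2-dimensional cube:

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. points in the 2-dimensional cube [0, 1]^2.
n = 500
X = rng.uniform(size=(n, 2))

# Gaussian kernel matrix; the bandwidth 0.2 is an arbitrary choice here.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / (2 * 0.2 ** 2))

# Eigenvalues in decreasing order; for the Gaussian kernel in low
# dimension they decay very quickly.
eigvals = np.linalg.eigvalsh(K)[::-1]
print(eigvals[:10])
```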

Revisiting the classics: Jensen’s inequality

Posted on March 13, 2023 by Francis Bach

There are a few mathematical results that any researcher in applied mathematics uses on a daily basis. One of them is Jensen’s inequality, which allows bounding expectations of functions of random variables. It comes up constantly in probabilistic arguments, and also serves as a tool to derive inequalities and optimization algorithms. In this blog…
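
As a one-line reminder of the statement (standard material, not specific to the post): for a convex function \(f\) and an integrable random variable \(X\),
\[ f(\mathbb{E}[X]) \;\le\; \mathbb{E}[f(X)]. \]
Taking \(f(x) = x^2\), for instance, gives \((\mathbb{E}[X])^2 \le \mathbb{E}[X^2]\), that is, the variance \(\mathbb{E}[X^2] - (\mathbb{E}[X])^2\) is nonnegative.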

Rethinking SGD’s noise – II: Implicit Bias

Posted on September 18, 2022 by Loucas Pillaud-Vivien and Scott Pesme

In the previous post, we showed (or at least tried to!) how the inherent noise of the stochastic gradient descent algorithm (SGD), in the context of modern overparametrised architectures, is structured and carries two important features: (i) it vanishes for interpolating solutions and (ii) it belongs to a low-dimensional manifold spanned by the gradients. Building…
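
A minimal numerical sketch of these two features (my own illustration on overparametrized least squares, not the post's setting): with more parameters than samples, an interpolating solution exists, every per-sample gradient vanishes there, and away from it the \(n\) stochastic gradients span at most an \(n\)-dimensional subspace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparametrized least squares: n samples, d > n parameters, so an
# interpolating solution exists (all residuals can be driven to zero).
n, d = 20, 100
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w_star = np.linalg.lstsq(X, y, rcond=None)[0]   # min-norm interpolator

# Per-sample gradients of 0.5 * (x_i^T w - y_i)^2 at the interpolator.
residuals = X @ w_star - y
per_sample_grads = residuals[:, None] * X        # shape (n, d)

# (i) SGD noise vanishes at interpolation: every stochastic gradient is ~0.
print(np.abs(per_sample_grads).max())            # ≈ 0 (numerical precision)

# (ii) Away from interpolation, the noise lives in the span of the x_i's:
# n gradients in d = 100 dimensions span at most a 20-dimensional subspace.
w = rng.standard_normal(d)
grads = (X @ w - y)[:, None] * X
print(np.linalg.matrix_rank(grads))              # ≤ n = 20
```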

Rethinking SGD’s noise

Posted on July 25, 2022 by Loucas Pillaud-Vivien

It seemed a bit unfair to devote a blog to machine learning (ML) without talking about its current core algorithm: stochastic gradient descent (SGD). Indeed, SGD has become, year after year, the basic foundation of many algorithms used for large-scale ML problems. However, the history of stochastic approximation is much older than that of ML:…

About

I am Francis Bach, a researcher at INRIA in the Computer Science department of Ecole Normale Supérieure, in Paris, France. I have been working on machine learning since 2000, with a focus on algorithmic and theoretical contributions, in particular in optimization. All of my papers can be downloaded from my web page or my Google Scholar page. I also have a Twitter account. I recently published a book, “Learning Theory from First Principles”.
