Machine Learning Research Blog

Francis Bach


Going beyond least-squares – I : self-concordant analysis of Newton method

Posted on February 1, 2021 (updated February 25, 2021) by Francis Bach

Least-squares is a workhorse of optimization, machine learning, statistics, signal processing, and many other scientific fields. I find it particularly appealing (too much, according to some of my students and colleagues…), because all algorithms, such as stochastic gradient [1], and analyses, such as for kernel ridge regression [2], are much simpler and rely on reasonably…


Finding global minima with kernel approximations

Posted on January 5, 2021 (updated January 24, 2021) by Francis Bach

Last month, I showed how global optimization based only on accessing function values can be hard with no convexity assumption. In a nutshell, with limited smoothness, the number of function evaluations has to grow exponentially fast in dimension, which is a rather negative statement. On the positive side, this number does not grow as fast…


Optimization is as hard as approximation

Posted on December 18, 2020 (updated June 14, 2021) by Francis Bach

Optimization is a key tool in machine learning, where the goal is to achieve the best possible objective function value in a minimum amount of time. Obtaining any form of global guarantees can usually be done with convex objective functions, or with special cases such as risk minimization with over-parameterized one-hidden-layer neural networks (see…


The Cauchy residue trick: spectral analysis made “easy”

Posted on November 7, 2020 (updated November 27, 2022) by Francis Bach

In many areas of machine learning, statistics and signal processing, eigenvalue decompositions are commonly used, e.g., in principal component analysis, spectral clustering, convergence analysis of Markov chains, convergence analysis of optimization algorithms, low-rank inducing regularizers, community detection, seriation, etc. Understanding how the spectral decomposition of a matrix changes as a function of the matrix is…


Polynomial magic III : Hermite polynomials

Posted on October 8, 2020 by Francis Bach

After two blog posts earlier this year on Chebyshev and Jacobi polynomials, I am coming back to orthogonal polynomials, with Hermite polynomials. This time, in terms of applications to machine learning, no acceleration, but some interesting closed-form expansions in positive-definite kernel methods.

Definition and first properties

There are many equivalent ways to define Hermite polynomials…


The many faces of integration by parts – II : Randomized smoothing and score functions

Posted on September 7, 2020 (updated January 10, 2021) by Francis Bach

This month I will follow up on last month's blog post and look at another application of integration by parts, which is central to many interesting algorithms in machine learning, optimization and statistics. In this post, I will consider extensions in higher dimensions, where we take integrals on a subset of \(\mathbb{R}^d\), and focus primarily on…


The many faces of integration by parts – I : Abel transformation

Posted on August 4, 2020 (updated August 13, 2020) by Francis Bach

Integration by parts is a highlight of any calculus class. It leads to multiple classical applications for integration of logarithms, exponentials, etc., and it is the source of an infinite number of exercises and applications to special functions. In this post, I will look at a classical discrete extension that is useful in machine learning…


Gradient descent for wide two-layer neural networks – II: Generalization and implicit bias

Posted on July 13, 2020 (updated July 27, 2020) by Lénaïc Chizat

In this blog post, we continue our investigation of gradient flows for wide two-layer “relu” neural networks. In the previous post, Francis explained that under suitable assumptions these dynamics converge to global minimizers of the training objective. Today, we build on this to understand qualitative aspects of the predictor learnt by such neural networks. The…


Gradient descent for wide two-layer neural networks – I : Global convergence

Posted on June 1, 2020 (updated November 15, 2022) by Francis Bach

Supervised learning methods come in a variety of flavors. While local averaging techniques such as nearest-neighbors or decision trees are often used with low-dimensional inputs where they can adapt to any potentially non-linear relationship between inputs and outputs, methods based on empirical risk minimization are the most commonly used in high-dimensional settings. Their principle is…


Effortless optimization through gradient flows

Posted on May 1, 2020 (updated May 22, 2020) by Francis Bach

Optimization algorithms often rely on simple intuitive principles, but their analysis quickly leads to a lot of algebra, where the original idea is not transparent. In last month's post, Adrien Taylor explained how convergence proofs could be automated. This month, I will show how proof sketches can be obtained easily for algorithms based on gradient…


Recent Posts

  • Closed-form dynamics beyond quadratics
  • Beyond Power Laws: Scaling Laws for Next-Token Prediction
  • Revisiting scaling laws via the z-transform
  • Unraveling spectral properties of kernel matrices – II
  • My book is (at last) out!

About

I am Francis Bach, a researcher at INRIA in the Computer Science department of Ecole Normale Supérieure, in Paris, France. I have been working on machine learning since 2000, with a focus on algorithmic and theoretical contributions, in particular in optimization. All of my papers can be downloaded from my web page or my Google Scholar page. I also have a Twitter account. I recently published a book, “Learning Theory from First Principles”.




