Francis Bach – Page 3 – Machine Learning Research Blog

The many faces of integration by parts – I : Abel transformation

Posted on August 4, 2020 (updated August 13, 2020) by Francis Bach

Integration by parts is a highlight of any calculus class. It leads to multiple classical applications for integration of logarithms, exponentials, etc., and it is the source of an infinite number of exercises and applications to special functions. In this post, I will look at a classical discrete extension that is useful in machine learning…
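
As a rough illustration of the discrete analogue mentioned in this excerpt, here is a minimal sketch of summation by parts (Abel transformation) in its standard textbook form: with partial sums $A_k = a_0 + \dots + a_k$, one has $\sum_{k=0}^{n-1} a_k b_k = A_{n-1} b_{n-1} - \sum_{k=0}^{n-2} A_k (b_{k+1} - b_k)$. The function name `abel_sum` and the zero-based indexing are assumptions made for this example and are not taken from the post.

```python
from itertools import accumulate


def abel_sum(a, b):
    """Compute sum(a[k] * b[k]) through summation by parts.

    Hypothetical helper for illustration: it uses the partial sums of `a`
    and the successive differences of `b` instead of the direct products.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if not a:
        return 0.0
    # Partial sums A[k] = a[0] + ... + a[k].
    A = list(accumulate(a))
    # Boundary term A[n-1] * b[n-1].
    boundary = A[-1] * b[-1]
    # Correction term: partial sums of a against differences of b.
    correction = sum(A[k] * (b[k + 1] - b[k]) for k in range(len(a) - 1))
    return boundary - correction


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
direct = sum(x * y for x, y in zip(a, b))
print(direct, abel_sum(a, b))
```

Both printed values agree, which is all the identity claims; its usefulness comes from settings where the partial sums of one sequence and the differences of the other are easier to bound than the products themselves.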

Gradient descent for wide two-layer neural networks – I : Global convergence

Posted on June 1, 2020 (updated November 15, 2022) by Francis Bach

Supervised learning methods come in a variety of flavors. While local averaging techniques such as nearest-neighbors or decision trees are often used with low-dimensional inputs where they can adapt to any potentially non-linear relationship between inputs and outputs, methods based on empirical risk minimization are the most commonly used in high-dimensional settings. Their principle is…

Effortless optimization through gradient flows

Posted on May 1, 2020 (updated May 22, 2020) by Francis Bach

Optimization algorithms often rely on simple intuitive principles, but their analysis quickly leads to a lot of algebra, where the original idea is not transparent. In last month's post, Adrien Taylor explained how convergence proofs could be automated. This month, I will show how proof sketches can be obtained easily for algorithms based on gradient…

On the unreasonable effectiveness of Richardson extrapolation

Posted on March 1, 2020 by Francis Bach

This month, I will follow up on last month’s blog post, and describe classical techniques from numerical analysis that aim at accelerating the convergence of a vector sequence to its limit, by only combining elements of the sequence, and without the detailed knowledge of the iterative process that has led to this sequence. Last month,…
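
To make the idea of combining elements of a sequence concrete, here is a minimal sketch of one classical Richardson step. It assumes, purely for illustration, that the sequence has an error expansion of the form $x_k = x_\ast + c/k + O(1/k^2)$; under that assumption, $2 x_{2k} - x_k$ cancels the $1/k$ term. The function name `richardson` and the toy sequence are hypothetical and not taken from the post.

```python
def richardson(seq, k):
    """Return 2 * seq(2k) - seq(k).

    Assumes seq(k) = limit + c / k + O(1 / k**2); the combination then
    cancels the c / k term without knowing c or how seq was produced.
    """
    return 2.0 * seq(2 * k) - seq(k)


def toy_sequence(k):
    """Toy sequence with limit 1 and a known expansion in 1 / k."""
    return 1.0 + 1.0 / k + 1.0 / k**2


k = 10
print(abs(toy_sequence(k) - 1.0))              # error of the raw iterate
print(abs(richardson(toy_sequence, k) - 1.0))  # error after extrapolation
```

On this toy sequence the extrapolated value is $1 - 1/(2k^2)$, so the error drops from order $1/k$ to order $1/k^2$. Whether a given iterative algorithm satisfies such an expansion has to be checked case by case.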

Acceleration without pain

Posted on February 4, 2020 (updated May 31, 2021) by Francis Bach

I don’t know of any user of iterative algorithms who has not complained one day about their convergence speed. Whether the data are too big, the processors not fast or numerous enough, waiting for an algorithm to converge unfortunately remains a core practical component of computer science and applied mathematics. This was already a concern…

The sum of a geometric series is all you need!

Posted on January 6, 2020 (updated March 10, 2020) by Francis Bach

I sometimes joke with my students about one of the main tools I have been using in the last ten years: the explicit sum of a geometric series. Why is this?
From numbers to operators
The simplest version of this basic result for real numbers is the following: $$ \forall r \neq 1, \ \forall…

Polynomial magic II : Jacobi polynomials

Posted on December 2, 2019 (updated April 16, 2020) by Francis Bach

Following up on my last post on Chebyshev polynomials, another piece of polynomial magic this month. This time, Jacobi polynomials will be the main players. Since definitions and various formulas are not as intuitive as for Chebyshev polynomials, I will start with the machine learning / numerical analysis motivation, which is an elegant refinement of Chebyshev…

Polynomial magic I : Chebyshev polynomials

Posted on November 4, 2019 (updated December 1, 2019) by Francis Bach

Orthogonal polynomials pop up everywhere in applied mathematics and in particular in numerical analysis. Within machine learning and optimization, typically (a) they provide natural basis functions which are easy to manipulate, or (b) they can be used to model various acceleration mechanisms. In this post, I will describe one class of such polynomials, the Chebyshev…

Are all kernels cursed?

Posted on October 8, 2019 (updated October 28, 2019) by Francis Bach

The word “kernel” appears in many areas of science (it is even worse in French with “noyau”); it can have different meanings depending on context (see here for a nice short historical review for mathematics). Within machine learning and statistics, kernels are used in two related but different contexts, with different definitions and some kernels…

The Gumbel trick

Posted on September 2, 2019 (updated March 28, 2022) by Francis Bach

Quantities of the form $\displaystyle \log \Big( \sum_{i=1}^n \exp( x_i) \Big)$ for $x \in \mathbb{R}^n$, often referred to as “log-sum-exp” functions, are ubiquitous in machine learning, as they appear in normalizing constants of exponential families, and thus in many supervised learning formulations such as softmax regression, but also more generally in (Bayesian or frequentist) probabilistic…
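
For readers who want to evaluate the log-sum-exp function from this excerpt numerically, here is a minimal sketch using the usual max-shift, $\log \sum_i \exp(x_i) = m + \log \sum_i \exp(x_i - m)$ with $m = \max_i x_i$, which avoids overflow for large inputs. The function name `log_sum_exp` is an assumption made for this example; this is a standard stabilization, not the Gumbel trick discussed in the post.

```python
import math


def log_sum_exp(x):
    """Evaluate log(sum(exp(x_i))) with the max-shift for stability.

    Hypothetical helper for illustration; x is a non-empty sequence of floats.
    """
    m = max(x)
    # If the maximum is infinite, the shifted form is not needed
    # (and would produce inf - inf), so return it directly.
    if math.isinf(m):
        return m
    # After shifting, every exponent is <= 0, so exp cannot overflow.
    return m + math.log(sum(math.exp(xi - m) for xi in x))


print(log_sum_exp([0.0, 0.0]))        # equals log(2)
print(log_sum_exp([1000.0, 1000.0]))  # equals 1000 + log(2), no overflow
```

A direct evaluation of `math.exp(1000.0)` would raise an `OverflowError`, which is why the shift matters in practice.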