This month, I will follow up on last month’s blog post and describe classical techniques from numerical analysis that aim at accelerating the convergence of a vector sequence to its limit, by combining only elements of the sequence and without detailed knowledge of the iterative process that has led to it. Last month,…
Acceleration without pain
I don’t know of any user of iterative algorithms who has not complained one day about their convergence speed. Whether the data are too big or the processors not fast or numerous enough, waiting for an algorithm to converge unfortunately remains a core part of practice in computer science and applied mathematics. This was already a concern…
The sum of a geometric series is all you need!
I sometimes joke with my students about one of the main tools I have been using over the last ten years: the explicit sum of a geometric series. Why is this?
From numbers to operators
The simplest version of this basic result for real numbers is the following: $$ \forall r \neq 1, \ \forall…
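As an aside, the basic identity referred to here (the excerpt cuts off mid-formula) is presumably the finite geometric sum, stated here for completeness:
$$ \forall r \neq 1, \ \forall n \geq 0, \quad \sum_{k=0}^{n} r^k = \frac{1 - r^{n+1}}{1 - r}, $$
with the familiar limit \( \frac{1}{1-r} \) when \( |r| < 1 \) and \( n \to \infty \), which is the form that extends from numbers to operators.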
Polynomial magic II: Jacobi polynomials
Following up on my last post on Chebyshev polynomials, here is another piece of polynomial magic this month. This time, Jacobi polynomials will be the main players. Since the definitions and various formulas are not as intuitive as for Chebyshev polynomials, I will start with the machine learning / numerical analysis motivation, which is an elegant refinement of Chebyshev…
Polynomial magic I: Chebyshev polynomials
Orthogonal polynomials pop up everywhere in applied mathematics and in particular in numerical analysis. Within machine learning and optimization, typically (a) they provide natural basis functions which are easy to manipulate, or (b) they can be used to model various acceleration mechanisms. In this post, I will describe one class of such polynomials, the Chebyshev…
The Gumbel trick
Quantities of the form \(\displaystyle \log \Big( \sum_{i=1}^n \exp( x_i) \Big)\) for \(x \in \mathbb{R}^n\), often referred to as “log-sum-exp” functions, are ubiquitous in machine learning, as they appear in the normalizing constants of exponential families, and thus in many supervised learning formulations such as softmax regression, but also more generally in (Bayesian or frequentist) probabilistic…
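As a side note (a minimal numerical sketch, not code from the post), two standard facts about this quantity are easy to check: log-sum-exp can be evaluated stably by subtracting the maximum, and the Gumbel-max trick that the post’s title presumably refers to states that adding i.i.d. standard Gumbel noise to the \(x_i\) and taking the argmax samples from the softmax distribution of \(x\).

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.array([1.0, -0.5, 2.0, 0.3])

    # Numerically stable log-sum-exp: subtract the maximum before exponentiating.
    def logsumexp(x):
        m = np.max(x)
        return m + np.log(np.sum(np.exp(x - m)))

    softmax = np.exp(x - logsumexp(x))

    # Gumbel-max trick: with i.i.d. standard Gumbel noise G_i,
    # argmax_i (x_i + G_i) is distributed according to softmax(x).
    n_samples = 200_000
    gumbel = -np.log(-np.log(rng.uniform(size=(n_samples, x.size))))
    counts = np.bincount(np.argmax(x + gumbel, axis=1), minlength=x.size)

    print(softmax)             # exact softmax probabilities
    print(counts / n_samples)  # empirical argmax frequencies, should be close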
The “η-trick” reloaded: multiple kernel learning
In my previous post, I described various (potentially non-smooth) functions that have quadratic (and thus smooth) variational formulations, a possibility that I referred to as the η-trick. For example, in its simplest formulation, we have \( \displaystyle |w| = \min_{ \eta \geq 0} \frac{1}{2} \frac{w^2}{\eta} + \frac{1}{2} \eta\). While it seems most often used for…
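A quick numerical check of this identity (a sketch of my own, not from the post): evaluate the right-hand side on a grid of \( \eta \) values and confirm that the minimum equals \( |w| \) and is attained near \( \eta = |w| \).

    import numpy as np

    # Variational identity: |w| = min_{eta > 0} (1/2) w^2/eta + (1/2) eta,
    # with the minimum attained at eta = |w|.
    w = -1.7
    etas = np.linspace(1e-3, 5.0, 100_000)
    values = 0.5 * w**2 / etas + 0.5 * etas

    print(abs(w), values.min(), etas[values.argmin()])
    # -> 1.7, approximately 1.7, minimizer close to |w| = 1.7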
The “η-trick” or the effectiveness of reweighted least-squares
Optimizing a quadratic function is often considered “easy”, as it is equivalent to solving a linear system, for which many algorithms exist. Thus, reformulating a non-quadratic optimization problem as a sequence of quadratic problems is a natural idea. While the standard generic approach is Newton’s method, which is adapted to smooth (at least twice-differentiable) functions,…
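To make the idea concrete, here is a small illustrative sketch (my own, under the usual formulation, not code from the post) of reweighted least-squares for the Lasso problem \( \min_w \frac{1}{2}\|Xw-y\|^2 + \lambda \|w\|_1 \): the variational form of the absolute value turns the \( \ell_1 \)-penalty into a quadratic in \( w \), so each iteration alternates a closed-form update of \( \eta \) with the solution of a linear system.

    import numpy as np

    # Illustrative reweighted least-squares for the Lasso (eps is a small
    # smoothing constant, added here to avoid division by zero as w_j -> 0).
    rng = np.random.default_rng(0)
    n, d = 50, 10
    X = rng.standard_normal((n, d))
    w_true = np.zeros(d)
    w_true[:3] = [2.0, -1.0, 0.5]
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    lam, eps = 0.5, 1e-8
    eta = np.ones(d)
    for _ in range(100):
        # w-step: minimize 1/2 ||Xw - y||^2 + (lam/2) sum_j w_j^2 / eta_j,
        # i.e., solve the linear system (X^T X + lam * diag(1/eta)) w = X^T y.
        w = np.linalg.solve(X.T @ X + lam * np.diag(1.0 / eta), X.T @ y)
        # eta-step: the variational bound is minimized at eta_j = |w_j|.
        eta = np.sqrt(w**2 + eps)

    print(np.round(w, 3))  # approximately sparse, close to w_true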