Machine Learning Research Blog

Francis Bach


The sum of a geometric series is all you need!

Posted on January 6, 2020 (updated March 10, 2020) by Francis Bach

I sometimes joke with my students about one of the main tools I have been using in the last ten years: the explicit sum of a geometric series. Why is this?

From numbers to operators

The simplest version of this basic result for real numbers is the following: $$ \forall r \neq 1, \ \forall…
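
For reference, the identity the excerpt cuts off is the standard one (completed here from the usual statement, so the indexing convention may differ from the full post): $$ \forall r \neq 1, \ \forall n \geq 0, \quad \sum_{k=0}^{n} r^k = \frac{1 - r^{n+1}}{1 - r}. $$ The operator version replaces \(r\) by a matrix or operator \(M\) with spectral radius less than one, for which the same telescoping argument gives \( (I - M)^{-1} = \sum_{k=0}^{\infty} M^k \).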

Polynomial magic II: Jacobi polynomials

Posted on December 2, 2019 (updated April 16, 2020) by Francis Bach

Following up on my last post on Chebyshev polynomials, here is another piece of polynomial magic this month. This time, Jacobi polynomials will be the main players. Since the definitions and various formulas are not as intuitive as for Chebyshev polynomials, I will start with the machine learning / numerical analysis motivation, which is an elegant refinement of Chebyshev…

Polynomial magic I: Chebyshev polynomials

Posted on November 4, 2019 (updated December 1, 2019) by Francis Bach

Orthogonal polynomials pop up everywhere in applied mathematics and in particular in numerical analysis. Within machine learning and optimization, typically (a) they provide natural basis functions which are easy to manipulate, or (b) they can be used to model various acceleration mechanisms. In this post, I will describe one class of such polynomials, the Chebyshev…
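
As a quick numerical illustration (my own, not from the post): the Chebyshev polynomials of the first kind are defined by \( T_n(\cos \theta) = \cos(n \theta) \) and satisfy the three-term recurrence \( T_{n+1}(x) = 2 x T_n(x) - T_{n-1}(x) \), with \( T_0 = 1 \) and \( T_1(x) = x \), which is easy to check with NumPy:

    import numpy as np

    # Build T_0, ..., T_5 on a grid via the three-term recurrence
    # T_{n+1}(x) = 2 x T_n(x) - T_{n-1}(x), with T_0 = 1 and T_1 = x.
    x = np.linspace(-1.0, 1.0, 7)
    T = [np.ones_like(x), x.copy()]
    for n in range(1, 5):
        T.append(2 * x * T[n] - T[n - 1])

    # Check the defining identity T_n(cos(theta)) = cos(n * theta).
    theta = np.arccos(x)
    for n, Tn in enumerate(T):
        assert np.allclose(Tn, np.cos(n * theta))
    print("recurrence matches cos(n * arccos(x))")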

Are all kernels cursed?

Posted on October 8, 2019 (updated October 28, 2019) by Francis Bach

The word “kernel” appears in many areas of science (it is even worse in French with “noyau”); it can have different meanings depending on context (see here for a nice short historical review for mathematics). Within machine learning and statistics, kernels are used in two related but different contexts, with different definitions and some kernels…

The Gumbel trick

Posted on September 2, 2019 (updated March 28, 2022) by Francis Bach

Quantities of the form \(\displaystyle \log \Big( \sum_{i=1}^n \exp( x_i) \Big)\) for \(x \in \mathbb{R}^n\), often referred to as “log-sum-exp” functions, are ubiquitous in machine learning, as they appear in normalizing constants of exponential families, and thus in many supervised learning formulations such as softmax regression, but also more generally in (Bayesian or frequentist) probabilistic…
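
The excerpt stops short, but the trick behind the title is easy to sketch (my illustration, not code from the post): if \( G_1, \dots, G_n \) are i.i.d. standard Gumbel random variables, then \( \mathop{\rm arg\,max}_i \, (x_i + G_i) \) is distributed according to the softmax of \(x\), and \( \mathbb{E}\big[ \max_i (x_i + G_i) \big] = \log \sum_{i=1}^n \exp(x_i) + \gamma \), where \( \gamma \) is the Euler-Mascheroni constant (the mean of a standard Gumbel). A quick NumPy check of the argmax statement:

    import numpy as np

    def logsumexp(x):
        """Numerically stable log-sum-exp: log(sum_i exp(x_i))."""
        m = np.max(x)
        return m + np.log(np.sum(np.exp(x - m)))

    rng = np.random.default_rng(0)
    x = rng.normal(size=5)

    # Standard Gumbel samples via inverse CDF: G = -log(-log(U)).
    n_samples = 100_000
    g = -np.log(-np.log(rng.uniform(size=(n_samples, x.size))))
    samples = np.argmax(x + g, axis=1)

    # Empirical argmax frequencies should match softmax probabilities.
    freq = np.bincount(samples, minlength=x.size) / n_samples
    softmax = np.exp(x - logsumexp(x))
    print(np.round(freq, 3))
    print(np.round(softmax, 3))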

The “η-trick” reloaded: multiple kernel learning

Posted on August 5, 2019 by Francis Bach

In my previous post, I described various (potentially non-smooth) functions that have quadratic (and thus smooth) variational formulations, a possibility that I referred to as the η-trick. For example, in its simplest formulation, we have \( \displaystyle |w| = \min_{ \eta \geq 0} \frac{1}{2} \frac{w^2}{\eta} + \frac{1}{2} \eta\). While it seems most often used for…
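
A one-line sanity check of this identity (my addition, not from the post): by the arithmetic-geometric mean inequality, $$ \frac{1}{2} \frac{w^2}{\eta} + \frac{1}{2} \eta \ \geq \ \sqrt{ \frac{w^2}{\eta} \cdot \eta } \ = \ |w|, $$ with equality at \( \eta = |w| \), so the minimum over \( \eta \geq 0 \) (with the convention \( w^2 / 0 = +\infty \) for \( w \neq 0 \), and \( 0/0 = 0 \)) is indeed \( |w| \).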

The “η-trick” or the effectiveness of reweighted least-squares

Posted on July 1, 2019 (updated July 21, 2022) by Francis Bach

Optimizing a quadratic function is often considered “easy”, as it is equivalent to solving a linear system, for which many algorithms exist. Thus, reformulating a non-quadratic optimization problem into a sequence of quadratic problems is a natural idea. While the standard generic way is Newton's method, which is adapted to smooth (at least twice-differentiable) functions,…
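
As a concrete sketch of the idea (my own minimal example, using the η-trick identity above for the ℓ1 norm; the small ε safeguard is an ad hoc numerical fix, not part of the exact identity), reweighted least-squares for ℓ1-penalized regression alternates between setting \( \eta_j = |w_j| \) and solving a weighted ridge-like linear system:

    import numpy as np

    def irls_lasso(X, y, lam, n_iters=100, eps=1e-8):
        """Minimize (1/2) ||y - X w||^2 + lam * ||w||_1 by iteratively
        reweighted least-squares, via |w_j| = min_eta (w_j^2/eta + eta)/2."""
        XtX, Xty = X.T @ X, X.T @ y
        w = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares init
        for _ in range(n_iters):
            eta = np.abs(w) + eps  # eps guards against division by zero
            # Given eta, the objective is quadratic in w: solve a linear system.
            w = np.linalg.solve(XtX + lam * np.diag(1.0 / eta), Xty)
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    w_true = np.zeros(10)
    w_true[:3] = [1.0, -2.0, 0.5]
    y = X @ w_true + 0.1 * rng.normal(size=50)
    print(np.round(irls_lasso(X, y, lam=1.0), 2))

Each iteration only requires solving a linear system, which is precisely the appeal of the quadratic reformulation.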

I am starting a blog!

Posted on June 27, 2019 by Francis Bach

I have finally found time to start a blog. Although I don’t know if I will find the energy to do all this in the next few months, I am planning to use this blog in a variety of ways, with one post every first Monday of the month: to develop in reasonable depth some…


Recent Posts

  • Unraveling spectral properties of kernel matrices – II
  • My book is (at last) out!
  • Scaling laws of optimization
  • Unraveling spectral properties of kernel matrices – I
  • Revisiting the classics: Jensen’s inequality

About

I am Francis Bach, a researcher at INRIA in the Computer Science department of Ecole Normale Supérieure, in Paris, France. I have been working on machine learning since 2000, with a focus on algorithmic and theoretical contributions, in particular in optimization. All of my papers can be downloaded from my web page or my Google Scholar page. I also have a Twitter account.




