There are a few mathematical results that any researcher in applied mathematics uses on a daily basis. One of them is Jensen’s inequality, which allows bounding expectations of functions of random variables. It comes up constantly in probabilistic arguments, but also as a tool to derive inequalities and design optimization algorithms. In this blog…
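As a quick reminder (a standard statement, not a summary of the post itself): for a convex function \(f\) and an integrable random variable \(X\), Jensen’s inequality states that \( f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)] \). For instance, with \(f(x) = e^x\) and \(X \sim \mathcal{N}(\mu, \sigma^2)\), it gives \( e^{\mu} \leq \mathbb{E}[e^X] = e^{\mu + \sigma^2/2} \), a bound which becomes tight as \(\sigma \to 0\).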
Cute mathematical tools
Playing with positive definite matrices – II: entropy edition
Symmetric positive semi-definite (PSD) matrices come up in a variety of places in machine learning, statistics, and optimization, and more generally in most domains of applied mathematics. When estimating or optimizing over the set of such matrices, several geometries can be used. The most direct one is to consider PSD matrices as a convex set…
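A small numerical aside (a minimal sketch; the assumption that the relevant entropy is the von Neumann entropy \( -\mathrm{tr}(X \log X) \) of a unit-trace PSD matrix is mine, not the post’s): this entropy depends only on the eigenvalues and is maximized by the normalized identity.

import numpy as np

def von_neumann_entropy(X, eps=1e-12):
    # X is assumed symmetric PSD with unit trace (a "density matrix").
    eigvals = np.linalg.eigvalsh(X)
    eigvals = np.clip(eigvals, eps, None)   # guard against tiny negative eigenvalues
    return -np.sum(eigvals * np.log(eigvals))

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
X = A @ A.T
X /= np.trace(X)                            # normalize to unit trace
print(von_neumann_entropy(X))               # strictly between 0 and log(d)
print(von_neumann_entropy(np.eye(d) / d))   # equals log(d), the maximal value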
Playing with positive definite matrices – I: matrix monotony and convexity
In a short series of blog posts, I will present classical and non-classical results on symmetric positive definite matrices. Beyond being mathematically exciting, they arise naturally in machine learning and optimization, as Hessians of twice continuously differentiable convex functions and through kernel methods. In this post, I will focus on the benefits…
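A minimal numerical illustration (a toy example, not taken from the post, built on a classical \(2 \times 2\) counterexample): \(A \succeq B \succeq 0\) implies \(A^{1/2} \succeq B^{1/2}\) because the square root is operator monotone, but it does not imply \(A^2 \succeq B^2\).

import numpy as np

def psd_sqrt(M):
    # Symmetric square root of a PSD matrix via its eigendecomposition.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 1.0], [1.0, 1.0]])

print(np.linalg.eigvalsh(A - B))                      # nonnegative: A >= B >= 0
print(np.linalg.eigvalsh(psd_sqrt(A) - psd_sqrt(B)))  # nonnegative: the square root is operator monotone
print(np.linalg.eigvalsh(A @ A - B @ B))              # one negative eigenvalue: squaring is not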
Approximating integrals with Laplace’s method
Integrals appear in all scientific fields, and their numerical computation is an active area of research. In the playbook of approximation techniques, my personal favorite is “la méthode de Laplace”, a must-know for students who like to cut integrals into pieces, and one that comes with many applications. We will be concerned with integrals of…
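A quick sanity check (a toy computation, not from the post): applying Laplace’s method to \( n! = \int_0^\infty e^{n \log x - x} \, dx \), whose exponent is maximized at \( x = n \), yields Stirling’s formula \( \sqrt{2 \pi n} \, (n/e)^n \), and the ratio to the exact value tends to one.

import math

for n in (5, 10, 20, 50):
    exact = math.factorial(n)
    laplace = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(n, exact / laplace)   # the ratio approaches 1 (roughly 1 + 1/(12 n))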
The Cauchy residue trick: spectral analysis made “easy”
In many areas of machine learning, statistics and signal processing, eigenvalue decompositions are commonly used, e.g., in principal component analysis, spectral clustering, convergence analysis of Markov chains, convergence analysis of optimization algorithms, low-rank inducing regularizers, community detection, seriation, etc. Understanding how the spectral decomposition of a matrix changes as a function of the matrix itself is…
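To make the connection concrete (a small numerical sketch, not the post’s derivation): by Cauchy’s residue theorem, the orthogonal projector onto an eigenspace of a symmetric matrix \(A\) equals the contour integral \( \frac{1}{2 i \pi} \oint_{\mathcal{C}} (z I - A)^{-1} dz \) of the resolvent over a contour \(\mathcal{C}\) enclosing only the corresponding eigenvalue.

import numpy as np

rng = np.random.default_rng(0)
d = 4
M = rng.standard_normal((d, d))
A = (M + M.T) / 2                                # random symmetric matrix
w, V = np.linalg.eigh(A)                         # reference eigendecomposition

# Discretized circle around the smallest eigenvalue only.
radius = 0.5 * (w[1] - w[0])
thetas = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
P = np.zeros((d, d), dtype=complex)
for theta in thetas:
    z = w[0] + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2 * np.pi / len(thetas))
    P += np.linalg.inv(z * np.eye(d) - A) * dz
P /= 2j * np.pi

print(np.abs(P.real - np.outer(V[:, 0], V[:, 0])).max())   # close to zero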
Polynomial magic III: Hermite polynomials
After two blog posts earlier this year on Chebyshev and Jacobi polynomials, I am coming back to orthogonal polynomials, with Hermite polynomials. This time, in terms of applications to machine learning, no acceleration, but some interesting closed-form expansions in positive-definite kernel methods. Definition and first properties. There are many equivalent ways to define Hermite polynomials…
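For concreteness (an illustrative snippet, following the probabilists’ convention \( He_{k+1}(x) = x\, He_k(x) - k\, He_{k-1}(x) \); the post may use a different normalization): a quick numerical check of the orthogonality relation \( \mathbb{E}[He_m(Z) He_n(Z)] = n! \, 1_{m=n} \) for \( Z \sim \mathcal{N}(0, 1) \).

import math
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_prob(n, x):
    # Probabilists' Hermite polynomial He_n evaluated at the array x,
    # via the three-term recurrence He_{k+1}(x) = x He_k(x) - k He_{k-1}(x).
    h_prev, h = np.ones_like(x), x.copy()
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

x, w = He.hermegauss(50)        # Gauss-Hermite nodes/weights for the weight exp(-x^2 / 2)
w = w / np.sqrt(2 * np.pi)      # renormalize to the standard Gaussian density
for m in range(4):
    for n in range(4):
        val = np.sum(w * hermite_prob(m, x) * hermite_prob(n, x))
        expected = math.factorial(n) if m == n else 0.0
        assert abs(val - expected) < 1e-8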
The many faces of integration by parts – II: Randomized smoothing and score functions
This month I will follow up on last month’s blog post and look at another application of integration by parts, which is central to many interesting algorithms in machine learning, optimization and statistics. In this post, I will consider extensions in higher dimensions, where we take integrals over a subset of \(\mathbb{R}^d\), and focus primarily on…
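As a toy illustration (with \( f(x) = \|x\|^2 \) chosen here only because its Gaussian smoothing has the explicit gradient \( 2x \)): the identity \( \nabla \, \mathbb{E}[f(x + \sigma Z)] = \frac{1}{\sigma} \mathbb{E}[f(x + \sigma Z) Z] \) for \( Z \sim \mathcal{N}(0, I) \) gives a gradient estimator that only uses function values.

import numpy as np

rng = np.random.default_rng(0)
d, sigma, n_samples = 3, 0.5, 200_000
x = rng.standard_normal(d)

Z = rng.standard_normal((n_samples, d))
f_vals = np.sum((x + sigma * Z) ** 2, axis=1)          # f evaluated at the perturbed points
grad_est = (f_vals[:, None] * Z).mean(axis=0) / sigma  # Monte Carlo score-function estimator

print(grad_est)   # noisy estimate of the smoothed gradient
print(2 * x)      # exact gradient of the smoothed function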
The many faces of integration by parts – I: Abel transformation
Integration by parts is a highlight of any calculus class. It leads to multiple classical applications to the integration of logarithms, exponentials, etc., and it is the source of an endless stream of exercises and applications to special functions. In this post, I will look at a classical discrete extension that is useful in machine learning…
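For reference (the standard identity, stated here ahead of the post, which may use an equivalent variant): with partial sums \( B_k = \sum_{j=1}^k b_j \), the Abel transformation, or summation by parts, reads
\[ \sum_{k=1}^n a_k b_k \;=\; a_n B_n \;-\; \sum_{k=1}^{n-1} (a_{k+1} - a_k) B_k, \]
the discrete analogue of \( \int u \, dv = uv - \int v \, du \).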
Effortless optimization through gradient flows
Optimization algorithms often rely on simple, intuitive principles, but their analysis quickly leads to a lot of algebra in which the original idea is no longer transparent. In last month’s post, Adrien Taylor explained how convergence proofs could be automated. This month, I will show how proof sketches can be obtained easily for algorithms based on gradient…
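A minimal sketch (a toy example, not from the post): the gradient flow \( \dot{x}(t) = -\nabla f(x(t)) \) can be simulated by an explicit Euler scheme with a tiny step, and the very same recursion with a larger step \( \gamma \) is plain gradient descent.

import numpy as np

A = np.diag([1.0, 10.0])                 # ill-conditioned quadratic f(x) = 0.5 x^T A x
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x_flow = np.array([1.0, 1.0])
x_gd = x_flow.copy()
dt, gamma = 1e-3, 0.1                    # tiny step for the flow, step 1/L for gradient descent
for _ in range(1000):
    x_flow = x_flow - dt * grad(x_flow)  # (approximate) gradient flow
    x_gd = x_gd - gamma * grad(x_gd)     # gradient descent iterate
print(f(x_flow), f(x_gd))                # both function values decrease monotonically toward 0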
Computer-aided analyses in optimization
In this blog post, I want to illustrate how computers can be great allies in designing (and verifying) convergence proofs for first-order optimization methods. This task can be daunting and highly non-trivial, but it is usually unavoidable when performing complexity analyses. A notable example is probably the convergence analysis of the stochastic average gradient (SAG) [1],…
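In a much more modest spirit (a brute-force toy, not the SDP-based performance estimation framework discussed in the post), one can already let the computer search for worst cases numerically, e.g., the worst one-step contraction of gradient descent over one-dimensional quadratics with curvature in \( [\mu, L] \).

import numpy as np

mu, L = 0.1, 1.0
lams = np.linspace(mu, L, 10_001)                 # grid of curvatures in [mu, L]
for gamma in (0.5, 1.0, 2.0 / (mu + L), 1.9):
    worst_numerical = np.max(np.abs(1 - gamma * lams))
    worst_theory = max(abs(1 - gamma * mu), abs(1 - gamma * L))
    print(gamma, worst_numerical, worst_theory)   # the last two columns coincide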