Symmetric positive semi-definite (PSD) matrices come up in a variety of places in machine learning, statistics, and optimization, and more generally in most domains of applied mathematics. When estimating or optimizing over the set of such matrices, several geometries can be used. The most direct one is to consider PSD matrices as a convex set…
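As a concrete illustration of the convex-set viewpoint mentioned in this excerpt (not taken from the post itself), the Euclidean (Frobenius-norm) projection of a symmetric matrix onto the PSD cone has a simple closed form obtained by clipping negative eigenvalues: if $A = U \,\mathrm{diag}(\lambda_1,\dots,\lambda_d)\, U^\top$ is an eigendecomposition of the symmetric matrix $A$, then
\[
\Pi_{\mathbb{S}_+^d}(A) \;=\; U \,\mathrm{diag}\big(\max(\lambda_1,0),\dots,\max(\lambda_d,0)\big)\, U^\top .
\]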
Playing with positive definite matrices – I: matrix monotony and convexity
In a short series of blog posts, I will present classical and non-classical results on symmetric positive definite matrices. Beyond being mathematically exciting, they arise frequently and naturally in machine learning and optimization, as Hessians of twice continuously differentiable convex functions and through kernel methods. In this post, I will focus on the benefits…
Approximating integrals with Laplace’s method
Integrals appear in all scientific fields, and their numerical computation is an active area of research. In the playbook of approximation techniques, my personal favorite is “la méthode de Laplace”, a must-know for students who like to cut integrals into pieces, and one that comes with many applications. We will be concerned with integrals of…
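The excerpt above is truncated, but as a reference point (and presumably the setting the post builds on), the textbook form of Laplace’s method states that for a smooth function $f$ on $[a,b]$ with a unique interior minimizer $x^\ast$ such that $f''(x^\ast) > 0$,
\[
\int_a^b e^{-n f(x)}\, dx \;\sim\; e^{-n f(x^\ast)} \sqrt{\frac{2\pi}{n\, f''(x^\ast)}} \qquad \text{as } n \to \infty,
\]
that is, the integral is asymptotically dominated by a Gaussian-shaped contribution around the minimizer.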
The quest for adaptivity
Most machine learning classes and textbooks mention that there is no universal supervised learning algorithm that can do reasonably well on all learning problems. Indeed, a series of “no free lunch” theorems states that even in a simple input space, for any learning algorithm, there always exists a bad conditional distribution of outputs given inputs…
Going beyond least-squares – II: self-concordant analysis for logistic regression
Last month, we saw that self-concordance is a key property in optimization, allowing local quadratic approximations to be used in the sharpest possible way. In particular, it is an affine-invariant quantity that leads to a simple and elegant analysis of Newton’s method. The key assumption was a link between third- and second-order derivatives, which took the following form…
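For reference (the exact form used in the posts may differ slightly, e.g., through a generalized variant adapted to logistic regression), the classical self-concordance condition of Nesterov and Nemirovski bounds the third derivative by the second one: in one dimension,
\[
|f'''(x)| \;\leq\; 2\, f''(x)^{3/2},
\]
and in higher dimensions the same inequality is imposed along every direction $h$, i.e. $|f'''(x)[h,h,h]| \leq 2\, \big(f''(x)[h,h]\big)^{3/2}$.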
Going beyond least-squares – I: self-concordant analysis of Newton method
Least-squares is a workhorse of optimization, machine learning, statistics, signal processing, and many other scientific fields. I find it particularly appealing (too much, according to some of my students and colleagues…), because all algorithms, such as stochastic gradient descent [1], and analyses, such as those for kernel ridge regression [2], are much simpler and rely on reasonably…