This month I will follow up on last month's blog post and look at another application of integration by parts, which is central to many interesting algorithms in machine learning, optimization and statistics. In this post, I will consider extensions in higher dimensions, where we take integrals on a subset of \(\mathbb{R}^d\), and focus primarily on…
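As a pointer to the kind of identity at play (stated here in a generic form, which may differ from the exact setting of the post): for smooth functions \(u\) and \(v\) on a bounded domain \(\Omega \subset \mathbb{R}^d\) with outward unit normal \(n\),
$$ \int_{\Omega} u \, \partial_i v \, dx = \int_{\partial \Omega} u \, v \, n_i \, dS \ - \ \int_{\Omega} v \, \partial_i u \, dx. $$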
The many faces of integration by parts – I : Abel transformation
Integration by parts is a highlight of any calculus class. It leads to multiple classical applications for integration of logarithms, exponentials, etc., and it is the source of an infinite number of exercises and applications to special functions. In this post, I will look at a classical discrete extension that is useful in machine learning…
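For reference, the Abel transformation (summation by parts) named in the title can be stated as follows, for sequences \((a_k)\) and \((b_k)\) with partial sums \(B_k = b_1 + \dots + b_k\) (a generic statement, not necessarily the exact form used in the post):
$$ \sum_{k=1}^{n} a_k b_k = a_n B_n \ - \ \sum_{k=1}^{n-1} (a_{k+1} - a_k) B_k. $$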
Gradient descent for wide two-layer neural networks – II: Generalization and implicit bias
In this blog post, we continue our investigation of gradient flows for wide two-layer “ReLU” neural networks. In the previous post, Francis explained that under suitable assumptions these dynamics converge to global minimizers of the training objective. Today, we build on this to understand qualitative aspects of the predictor learnt by such neural networks. The…
Gradient descent for wide two-layer neural networks – I : Global convergence
Supervised learning methods come in a variety of flavors. While local averaging techniques such as nearest-neighbors or decision trees are often used with low-dimensional inputs where they can adapt to any potentially non-linear relationship between inputs and outputs, methods based on empirical risk minimization are the most commonly used in high-dimensional settings. Their principle is…
Effortless optimization through gradient flows
Optimization algorithms often rely on simple intuitive principles, but their analysis quickly leads to a lot of algebra, where the original idea is not transparent. In last month's post, Adrien Taylor explained how convergence proofs could be automated. This month, I will show how proof sketches can be obtained easily for algorithms based on gradient…
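To give a flavor of why gradient flows lead to short proof sketches (a generic computation, not a summary of the post): for the flow \(x'(t) = -\nabla f(x(t))\), the chain rule gives
$$ \frac{d}{dt} f(x(t)) = \nabla f(x(t))^\top x'(t) = - \| \nabla f(x(t)) \|^2 \leq 0, $$
so the objective value can only decrease along the trajectory.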
Computer-aided analyses in optimization
In this blog post, I want to illustrate how computers can be great allies in designing (and verifying) convergence proofs for first-order optimization methods. This task can be daunting and highly non-trivial, but it is nevertheless usually unavoidable when performing complexity analyses. A notable example is probably the convergence analysis of the stochastic average gradient (SAG) [1],…
On the unreasonable effectiveness of Richardson extrapolation
This month, I will follow up on last month’s blog post, and describe classical techniques from numerical analysis that aim at accelerating the convergence of a vector sequence to its limit, by only combining elements of the sequence, and without detailed knowledge of the iterative process that led to this sequence. Last month,…
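The simplest instance of this idea, as an illustration (the post's setting may be more general): if a sequence of iterates satisfies \(x_t = x_\ast + c/t + o(1/t)\) for some unknown vector \(c\), then the combination
$$ 2 x_{2t} - x_t = x_\ast + o(1/t) $$
cancels the leading error term while using only elements of the sequence.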
Acceleration without pain
I don’t know of any user of iterative algorithms who has not complained one day about their convergence speed. Whether the data are too big, or the processors not fast or numerous enough, waiting for an algorithm to converge unfortunately remains a core practical concern of computer science and applied mathematics. This was already a concern…
The sum of a geometric series is all you need!
I sometimes joke with my students about one of the main tools I have been using over the last ten years: the explicit sum of a geometric series. Why is this? From numbers to operators. The simplest version of this basic result, for real numbers, is the following: $$ \forall r \neq 1, \ \forall…
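For completeness (the excerpt above cuts the formula short), the identity in question is presumably stated in this or an equivalent form: for any real \(r\) and integer \(n \geq 0\),
$$ \forall r \neq 1, \quad \sum_{k=0}^{n} r^k = \frac{1 - r^{n+1}}{1 - r}. $$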
Polynomial magic II : Jacobi polynomials
Following up on my last post on Chebyshev polynomials, here is another piece of polynomial magic this month. This time, Jacobi polynomials will be the main players. Since definitions and various formulas are not as intuitive as for Chebyshev polynomials, I will start with the machine learning / numerical analysis motivation, which is an elegant refinement of Chebyshev…