In many areas of machine learning, statistics and signal processing, eigenvalue decompositions are commonly used, e.g., in principal component analysis, spectral clustering, convergence analysis of Markov chains, convergence analysis of optimization algorithms, low-rank inducing regularizers, community detection, seriation, etc.
Understanding how the spectral decomposition of a matrix changes as a function of the matrix itself is thus of primary importance, both algorithmically and theoretically. We then need a perturbation analysis, or more generally some differentiability properties, for eigenvalues and eigenvectors [1], or any spectral function [2]. These properties can be obtained from many angles, but a single generic tool applies to all of them: it is a surprising and elegant application of Cauchy’s residue formula, which is due to Kato [3].
Before diving into spectral analysis, I will first present the Cauchy residue theorem and some nice applications in computing integrals that are needed in machine learning and kernel methods.
Cauchy residue formula
A function \(f : \mathbb{C} \to \mathbb{C}\) is said to be holomorphic at \(\lambda \in \mathbb{C}\), with derivative \(f'(\lambda) \in \mathbb{C}\), if it is complex-differentiable at \(\lambda\), that is, if \(\displaystyle \frac{f(z)-f(\lambda)}{z-\lambda}\) tends to \(f'(\lambda)\) when \(z\) tends to \(\lambda\). Many classical functions are holomorphic on \(\mathbb{C}\) or portions thereof, such as the exponential, sines, cosines and their hyperbolic counterparts, rational functions, and the logarithm on suitable domains.
We consider a function which is holomorphic in a region of \(\mathbb{C}\) except in \(m\) values \(\lambda_1,\dots,\lambda_m \in \mathbb{C}\), which are usually referred to as poles. We also consider a simple closed directed contour \(\gamma\) in \(\mathbb{C}\) that goes strictly around the \(m\) values above.
The Cauchy residue formula gives an explicit formula for the contour integral along \(\gamma\):
$$ \oint_\gamma f(z) dz = 2 i \pi \sum_{j=1}^m {\rm Res}(f,\lambda_j), \tag{1}$$
where \({\rm Res}(f,\lambda)\) is called the residue of \(f\) at \(\lambda\). If, around \(\lambda\), \(f(z)\) has a Laurent series expansion in powers of \((z - \lambda)\), that is, \(\displaystyle f(z) = \sum_{k=-\infty}^{+\infty}a_k (z -\lambda)^k\), then \({\rm Res}(f,\lambda)=a_{-1}\).
For example, if \(\displaystyle f(z) = \frac{g(z)}{z-\lambda}\) with \(g\) holomorphic around \(\lambda\), then \({\rm Res}(f,\lambda) = g(\lambda)\), and more generally, if \(\displaystyle f(z) = \frac{g(z)}{(z-\lambda)^k}\) for \(k \geqslant 1\), then \(\displaystyle {\rm Res}(f,\lambda) = \frac{g^{(k-1)}(\lambda) }{(k-1)!}\). For more details on complex analysis, see [4].
The result above can be naturally extended to vector-valued functions (and thus to any matrix-valued function), by applying the identity to all components of the vector.
This result is due to Cauchy [10] in 1825. The original paper where it is presented is a nice read in French, where you can find gems like “la fonction s’évanouit pour \(x = \infty\)” (“the function vanishes for \(x = \infty\)”).
If you are already familiar with complex residues, you can skip the next section.
Where does it come from?
At first, the formula in Eq. (1) seems unsettling. Why doesn’t the result depend more explicitly on the contour \(\gamma\)? Where does the multiplicative term \( {2i\pi}\) come from? Here is a very partial and non-rigorous account (go to the experts for more rigor!).
Complex-valued functions on \(\mathbb{C}\) can be seen as functions from \(\mathbb{R}^2\) to itself, by writing $$ f(x+iy) = u(x,y) + i v(x,y),$$ where \(u\) and \(v\) are real-valued functions. We thus have a function \((x,y) \mapsto (u(x,y),v(x,y))\) from \(\mathbb{R}^2\) to \(\mathbb{R}^2\). Expanding \(f(z+dz) = f(z) + f'(z) dz\) to first order, which is the definition of complex differentiability, into real and imaginary parts, we get (using \(i^2 = -1\)): $$\left\{ \begin{array}{l} u(x+dx,y+dy) = u(x,y) + {\rm Re}(f'(z)) dx - {\rm Im}(f'(z)) dy \\ v(x+dx,y+dy) = v(x,y) + {\rm Re}(f'(z)) dy + {\rm Im}(f'(z)) dx. \end{array}\right.$$ This leads to $$\left\{ \begin{array}{l} \displaystyle \frac{\partial u}{\partial x}(x,y) = {\rm Re}(f'(z)) \\ \displaystyle \frac{\partial u}{\partial y}(x,y) = - {\rm Im}(f'(z)) \\ \displaystyle \frac{\partial v}{\partial x}(x,y) = {\rm Im}(f'(z)) \\ \displaystyle \frac{\partial v}{\partial y}(x,y) = {\rm Re}(f'(z)). \end{array}\right.$$
This in turn leads to the Cauchy-Riemann equations \(\displaystyle \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}\) and \(\displaystyle \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}\), which are essentially necessary and sufficient conditions to be holomorphic. Thus holomorphic functions correspond to differentiable functions on \(\mathbb{R}^2\) with some equal partial derivatives. These equations are key to obtaining the Cauchy residue formula.
Contour integral with no poles. We first consider a contour integral over a contour \(\gamma\) enclosing a region \(\mathcal{D}\) where the function \(f\) is holomorphic everywhere. The contour \(\gamma\) is defined as a differentiable function \(\gamma: [0,1] \to \mathbb{C}\) with \(\gamma(0)=\gamma(1)\), and the integral is equal to $$\oint_\gamma f(z) dz = \int_0^1 \!\!f(\gamma(t)) \gamma'(t) dt = \int_0^1 \!\![u(x(t),y(t)) +i v(x(t),y(t))] [ x'(t) + i y'(t)] dt,$$ where \(x(t) = {\rm Re}(\gamma(t))\) and \(y(t) = {\rm Im}(\gamma(t))\). By expanding the product of complex numbers, it is thus equal to $$\int_0^1 [ u(x(t),y(t)) x'(t) - v(x(t),y(t))y'(t)] dt +i \int_0^1 [ v(x(t),y(t)) x'(t) +u (x(t),y(t))y'(t)] dt,$$ which we can rewrite in compact form as (with \(dx = x'(t) dt\) and \(dy = y'(t)dt\)): $$\oint_\gamma ( u \, dx - v \, dy ) + i \oint_\gamma ( v \, dx + u \, dy ).$$ We can then use Green’s theorem, because our functions are differentiable on the entire region \(\mathcal{D}\) (the set “inside” the contour), to get $$\oint_\gamma ( u \, dx - v \, dy ) + i \oint_\gamma ( v \, dx + u \, dy ) = - \int\!\!\!\!\int_\mathcal{D} \! \Big( \frac{\partial v}{\partial x} + \frac{\partial u}{\partial y} \Big) dx dy \, + \, i \!\! \int\!\!\!\!\int_\mathcal{D} \!\Big( \frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} \Big) dx dy.$$ Thus, because of the Cauchy-Riemann equations, the contour integral is always zero within the domain of differentiability of \(f\). Note that this extends to piecewise smooth contours \(\gamma\).
Circle and rational functions. For a circle contour of center \(\lambda \in \mathbb{C}\) and radius \(r\), we have, with \(\gamma(t) = \lambda + re^{ 2i \pi t}\): $$\oint_{\gamma} \frac{dz}{(z-\lambda)^k} =\int_0^{1} \frac{ 2 i \pi r e^{2i\pi t}}{ r^k e^{2i\pi kt}}dt= \int_0^{1} 2 i \pi \, r^{1-k} e^{2i\pi (1-k)t} dt,$$ which is equal to zero if \(k \neq 1\), and to \(\int_0^{1} 2i\pi dt = 2 i \pi\) for \(k =1\). Thus, for a function with a Laurent series expansion, the Cauchy residue formula is true for a circle around a single pole, because only the term in \(\frac{1}{z-\lambda}\) contributes.
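To make this concrete, here is a minimal numerical sketch (all values are illustrative): discretizing the circle and approximating the contour integral by a Riemann sum recovers \(2i\pi\) for \(k=1\) and zero otherwise.

```python
import numpy as np

lam, r, m = 0.3 + 0.2j, 1.5, 2000   # illustrative pole, radius, grid size
t = np.arange(m) / m
z = lam + r * np.exp(2j * np.pi * t)              # points on the circle
dz = 2j * np.pi * r * np.exp(2j * np.pi * t) / m  # gamma'(t) dt

for k in range(1, 4):
    integral = np.sum(dz / (z - lam) ** k)
    print(k, integral)  # ~ 2*i*pi (~ 6.2832j) for k = 1, ~ 0 for k >= 2
```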
No dependence on the contour. Now that the Cauchy formula is true for the circle around a single pole, we can “deform” the contour below to a circle.
This can be done by considering two contours \(\gamma_1\) and \(\gamma_2\) below with no poles inside, and thus with zero contour integrals, and for which the integrals along the added lines cancel.
This “shows” that the integral does not depend on the contour, and so in applications we can be quite liberal in the choice of contour. Note that similar constructions can be used to take into account several poles.
Before going to the spectral analysis of matrices, let us explore some cool choices of contours and integrands, and (again!) some positive definite kernels.
Classical examples
The Cauchy residue theorem can be used to compute integrals, by choosing the appropriate contour, looking for poles and computing the associated residues. Here are classical examples, before I show applications to kernel methods. See more examples in http://residuetheorem.com/, and many in [11].
Fourier transforms. For \(\omega>0\), we can compute \( \displaystyle \int_{-\infty}^\infty \!\! f(x) e^{ i \omega x} dx\) for functions \(f\) holomorphic in the upper half plane with appropriate decay, by integrating along the real line and closing with a large half-circle in the upper half plane, as shown below, with \(R\) tending to infinity (so that the contribution of the half-circle goes to zero because of the exponential term). This leads to \(2i \pi\) times the sum of all residues of the function \(z \mapsto f(z) e^{ i \omega z}\) in the upper half plane. See an example below related to kernel methods.
Trigonometric integrals. For holomorphic functions \(Q\), we can compute the integral \(\displaystyle \int_0^{2\pi} \!\!\! Q(\cos \theta, \sin \theta) d\theta\). Indeed, with the substitution \(z = e^{i\theta}\) and letting \(f(z) = \frac{1}{iz} Q\big( \frac{z+z^{-1}}{2}, \frac{z-z^{-1}}{2i} \big)\), the desired integral is exactly the contour integral of \(f\) along the unit circle. It is then equal to \(2i\pi\) times the sum of all residues of \(f\) within the unit disk.
For example, when \(Q(\cos \theta, \sin \theta) = \frac{1}{2 + \sin \theta}\), we have \(f(z) = \frac{2}{z^2+4iz-1}\), with a single pole inside the unit circle, namely \(\lambda = i ( \sqrt{3}-2)\), and residue equal to \(-i / \sqrt{3}\), leading to \(\int_0^{2\pi} \frac{d\theta}{2+\sin \theta} = \frac{2\pi}{\sqrt{3}}\).
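As a sanity check, the value above can be verified numerically; here is a minimal sketch using scipy:

```python
import numpy as np
from scipy.integrate import quad

# integrate 1/(2 + sin(theta)) over [0, 2*pi] and compare with 2*pi/sqrt(3)
value, _ = quad(lambda theta: 1.0 / (2.0 + np.sin(theta)), 0.0, 2.0 * np.pi)
print(value, 2.0 * np.pi / np.sqrt(3.0))  # both ~ 3.6276
```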
Series. If the function \(f\) is meromorphic with no poles at integer real values, and satisfies some basic boundedness conditions, then $$\sum_{n \in \mathbb{Z}} f(n) = - \!\!\! \sum_{ \lambda \in {\rm poles}(f)} {\rm Res}\big( f(z) \pi \frac{\cos \pi z}{\sin \pi z} ,\lambda\big).$$ This is a simple consequence of the fact that the function \(z \mapsto \pi \frac{\cos \pi z}{\sin \pi z}\) has all integers \(n \in \mathbb{Z}\) as poles, with corresponding residue equal to \(1\). This is obtained from the contour below with \(m\) tending to infinity.
The same trick can be applied to \(\displaystyle \sum_{n \in \mathbb{Z}} (-1)^n f(n) = - \!\!\! \sum_{ \lambda \in {\rm poles}(f)} {\rm Res}\big( f(z) \frac{\pi}{\sin \pi z} ,\lambda\big).\) See [7, Section 11.2] for more details. Experts will see an interesting link with the Euler-Maclaurin formula and Bernoulli polynomials.
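For instance, for \(f(z) = \frac{1}{1+z^2}\), whose poles are \(\pm i\), the summation formula gives \(\sum_{n \in \mathbb{Z}} \frac{1}{1+n^2} = \pi \coth(\pi)\); the following sketch checks this by brute-force truncation:

```python
import numpy as np

n = np.arange(-100000, 100001)
lhs = np.sum(1.0 / (1.0 + n ** 2))   # truncated series
rhs = np.pi / np.tanh(np.pi)         # pi * coth(pi), from the residue formula
print(lhs, rhs)                      # both ~ 3.1533
```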
Applications to kernel methods. In non-parametric estimation, regularization penalties are used to constrain real-valued functions to be smooth. One such example is a combination of squared \(L_2\) norms of derivatives. For functions \(f\) defined on an interval \(I\) of the real line, penalties are typically of the form \(\int_I \sum_{k=0}^s \alpha_k | f^{(k)}(x)|^2 dx\), for non-negative weights \(\alpha_0,\dots,\alpha_s\). For these Sobolev space norms, a positive definite kernel \(K\) can be used for estimation (see, e.g., last month’s blog post).
A classical question is: given the norm defined above, how to compute \(K\)? For \(I = \mathbb{R}\), this can be done using Fourier transforms as: $$K(x,y) = \frac{1}{2\pi} \int_\mathbb{R} \frac{e^{i\omega(x-y)}}{\sum_{k=0}^s \alpha_k \omega^{2k}} d\omega.$$ This is exactly an integral of the form above, for which we can use the contour integration technique. For example, for \(\alpha_0=1\) and \(\alpha_1=a^2\), and for \(x-y>0\), the integrand has a single pole in the upper half plane, at \(i/a\), since \(\frac{1}{1+a^2 z^2} = \frac{1}{(1+iaz)(1-iaz)}\); its residue is \(-\frac{i}{2a} e^{-(x-y)/a}\), leading to the familiar exponential kernel \(K(x,y) = \frac{1}{2a} e^{-|x-y|/a}\). More complex kernels can be considered (see, e.g., [8, page 277], for \(\sum_{k=0}^s \alpha_k \omega^{2k} = 1 + \omega^{2s}\)).
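The following sketch checks this numerically, integrating the Fourier representation by quadrature (the values of \(a\) and \(d = x-y\) are illustrative):

```python
import numpy as np
from scipy.integrate import quad

a, d = 0.5, 0.7  # illustrative scale a and difference d = x - y
# integrate cos(d*w)/(1 + a^2 w^2) over [0, inf) with a Fourier weight,
# then double it (even integrand) and normalize by 2*pi: value / pi
value, _ = quad(lambda w: 1.0 / (1.0 + a ** 2 * w ** 2),
                0.0, np.inf, weight='cos', wvar=d)
print(value / np.pi, np.exp(-abs(d) / a) / (2.0 * a))  # both ~ 0.2466
```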
We can also consider the same penalty on the unit interval \([0,1]\) with periodic functions, leading to the kernel (see [9] for more details): $$ K(x,y) = \sum_{n \in \mathbb{Z}} \frac{ e^{2in\pi(x-y)}}{\sum_{k=0}^s \alpha_k( 2n\pi)^{2k}}.$$ For the same example as above, that is, \(\alpha_0=1\) and \(\alpha_1=a^2\), this gives an infinite series to which we can apply the Cauchy residue formula as explained above, leading to, for \(x-y \in [0,1/2]\), \(K(x,y) = \frac{1}{2a} \frac{ \cosh (\frac{1-2(x-y)}{2a})}{\sinh (\frac{1}{2a})}\). We can then extend to all \(x-y\) by \(1\)-periodicity and parity. See the detailed computation at the end of the post.
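Again, a quick numerical check (a sketch with illustrative values of \(a\) and \(q = x-y\)):

```python
import numpy as np

a, q = 0.3, 0.2  # illustrative scale a and q = x - y in [0, 1/2]
n = np.arange(-100000, 100001)
series = np.sum(np.exp(2j * np.pi * q * n)
                / (1.0 + (2.0 * a * np.pi * n) ** 2)).real
closed = np.cosh((1.0 - 2.0 * q) / (2.0 * a)) \
    / (2.0 * a * np.sinh(1.0 / (2.0 * a)))
print(series, closed)  # both ~ 1.0133
```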
Now that you are all experts in residue calculus, we can move on to spectral analysis.
Spectral analysis of symmetric matrices
We consider a symmetric matrix \(A \in \mathbb{R}^{n \times n}\), with its \(n\) ordered real eigenvalues \(\lambda_1 \geqslant \cdots \geqslant \lambda_n\), counted with multiplicity, and an orthonormal basis of eigenvectors \(u_j \in \mathbb{R}^n\), \(j=1,\dots,n\). We have \(A = \sum_{j=1}^n \lambda_j u_j u_j^\top\). When we consider eigenvalues as functions of \(A\), we use the notation \(\lambda_j(A)\), \(j=1,\dots,n\). These functions are always well-defined even when eigenvalues are multiple (this is not the case for eigenvectors, because of the invariance under orthogonal transformations within eigenspaces).
The key property that we will use below is that we can express the so-called resolvent matrix \((z I - A)^{-1} \in \mathbb{C}^{n \times n}\), for \(z \in \mathbb{C}\) not an eigenvalue of \(A\), as: $$ (z I - A)^{-1} = \sum_{j=1}^n \frac{1}{z-\lambda_j} u_j u_j^\top. $$ The dependence on \(z\) of the form \( \displaystyle \frac{1}{z- \lambda_j}\) leads to a nice application of the Cauchy residue formula.
Assuming the \(k\)-th eigenvalue \(\lambda_k\) is simple, we consider the contour \(\gamma\) going strictly around \(\lambda_k\) like below (for \(k=5\)).
We consider integrating the resolvent above along \(\gamma\), which leads to: $$ \oint_\gamma (z I - A)^{-1} dz = \sum_{j=1}^n \Big( \oint_\gamma \frac{1}{z - \lambda_j} dz \Big) u_j u_j^\top = 2 i \pi \, u_k u_k^\top, $$ using the identity \(\displaystyle \oint_\gamma \frac{dz}{z - \lambda_j} = 2 i \pi\) if \(j=k\) and \(0\) otherwise (because the pole is then outside of \(\gamma\)). We thus obtain a contour-integral expression for the projector onto the one-dimensional eigenspace associated with the eigenvalue \(\lambda_k\).
With simple manipulations, we can also access the eigenvalues. Indeed, we have: $$ \oint_\gamma (z I - A)^{-1} z \, dz = \sum_{j=1}^n \Big( \oint_\gamma \frac{z}{z - \lambda_j} dz \Big) u_j u_j^\top = 2 i \pi \lambda_k u_k u_k^\top, $$ and by taking the trace, we obtain $$ \frac{1}{2i\pi} \oint_\gamma {\rm tr} \big[ z (z I - A)^{-1} \big] dz = \lambda_k. $$ The key benefit of these representations is that when the matrix \(A\) is slightly perturbed, the same contour \(\gamma\) can be used to enclose the corresponding eigenvalues of the perturbed matrix, and perturbation results are simply obtained by taking gradients within the contour integral. Note that several eigenvalues may be summed up by selecting a contour enclosing more than one eigenvalue.
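These two identities are easy to check numerically: the sketch below discretizes a small circle around a simple eigenvalue of a random symmetric matrix and recovers both the projector and the eigenvalue (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2
A = rng.standard_normal((n, n))
A = (A + A.T) / 2.0                     # random symmetric matrix
eigvals, eigvecs = np.linalg.eigh(A)    # eigenvalues in ascending order
lam_k, u_k = eigvals[k], eigvecs[:, k]

# small circle around lambda_k, radius below the gap to other eigenvalues
gap = np.min(np.abs(np.delete(eigvals, k) - lam_k))
r, m = 0.5 * gap, 500
t = np.arange(m) / m
z = lam_k + r * np.exp(2j * np.pi * t)
dz = 2j * np.pi * r * np.exp(2j * np.pi * t) / m

proj = np.zeros((n, n), dtype=complex)
val = 0.0j
for zi, dzi in zip(z, dz):
    R = np.linalg.inv(zi * np.eye(n) - A)   # resolvent (zI - A)^{-1}
    proj += R * dzi
    val += zi * np.trace(R) * dzi
proj /= 2j * np.pi
val /= 2j * np.pi

print(np.max(np.abs(proj - np.outer(u_k, u_k))))  # ~ 0
print(val.real, lam_k)                            # ~ equal
```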
Gradients of eigenvalues
The expression with contour integrals makes it possible to derive simple formulas for gradients of eigenvalues. These can be obtained by other means [5], but using contour integrals shows that this is simply done by looking at the differential of \((z I - A)^{-1}\) and integrating it. The central component is the following expansion, which is a classical result in matrix differential calculus, with \(\|\Delta\|_2\) the operator norm of \(\Delta\) (i.e., its largest singular value): $$
(z I - A - \Delta)^{-1} = (z I - A)^{-1} + (z I - A)^{-1} \Delta (z I - A)^{-1} + o(\| \Delta\|_2). $$ Note here that the asymptotic remainder \(o(\| \Delta\|_2)\) can be made explicit.
By expanding the expression in the basis of eigenvectors of \(A\), we get $$
z (z I - A - \Delta)^{-1} - z (z I - A)^{-1} = \sum_{j=1}^n \sum_{\ell=1}^n u_j u_\ell^\top \frac{ z \cdot u_j^\top \Delta u_\ell}{(z-\lambda_j)(z-\lambda_\ell)} + o(\| \Delta \|_2).
$$ Taking the trace, the cross-product terms \({\rm tr}(u_j u_\ell^\top) = u_\ell^\top u_j\) disappear for \(j \neq \ell\), and we get: $$
{\rm tr} \big[ z (z I - A - \Delta)^{-1} \big] - {\rm tr} \big[ z (z I - A)^{-1} \big]= \sum_{j=1}^n \frac{ z \cdot u_j^\top \Delta u_j}{(z-\lambda_j)^2} + o(\| \Delta \|_2).
$$ This leads to, by contour integration:
$$
\lambda_{k}(A+\Delta) -\lambda_k(A)
=
\frac{1}{2i \pi} \oint_\gamma \Big[
\sum_{j=1}^n \frac{ z \cdot u_j^\top \Delta u_j}{(z-\lambda_j)^2} \Big] dz + o(\| \Delta \|_2).
$$ By keeping only the pole \(\lambda_k\) which is inside the contour \(\gamma\), we get $$ \lambda_{k}(A+\Delta) -\lambda_k(A)
=
\frac{1}{2i \pi} \oint_\gamma
\frac{ z \cdot u_k^\top \Delta u_k}{(z-\lambda_k)^2} dz + o(\| \Delta \|_2)
= u_k^\top \Delta u_k + o(\| \Delta \|_2),
$$ using the identity \(\displaystyle
\oint_\gamma \frac{z \, dz}{(z - \lambda_k)^2} =
\oint_\gamma \Big( \frac{\lambda_k}{(z - \lambda_k)^2} + \frac{1}{z - \lambda_k} \Big) dz = 2 i \pi\).
Thus the gradient of \(\lambda_k\) at a matrix \(A\) where the \(k\)-th eigenvalue is simple is \( u_k u_k^\top\), where \(u_k\) is a corresponding unit eigenvector. Note that this result can also be obtained by the following (rough) calculation: if \(x\) is a unit eigenvector of \(A\), then \(Ax =\lambda x\) and \(x^\top x = 1\), leading to \(x^\top dx = 0\) and \(dA\ x + A dx = d\lambda \ x + \lambda dx\); by taking the dot product with \(x\), \(d\lambda = x^\top dA\ x + x^\top A dx = x^\top dA \ x + \lambda x^\top dx = x^\top dA \ x\), which is the same result. However, this reasoning is more cumbersome, and does not lead to neat approximation guarantees, in particular in the extensions below.
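A finite-difference sketch of this gradient formula, with an illustrative random matrix and a small symmetric perturbation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
A = rng.standard_normal((n, n))
A = (A + A.T) / 2.0
Delta = rng.standard_normal((n, n))
Delta = 1e-6 * (Delta + Delta.T) / 2.0   # small symmetric perturbation

eigvals, eigvecs = np.linalg.eigh(A)
u_k = eigvecs[:, k]
exact = np.linalg.eigvalsh(A + Delta)[k] - eigvals[k]
predicted = u_k @ Delta @ u_k            # first-order prediction
print(exact, predicted)                  # agree up to o(||Delta||_2)
```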
Other perturbation results
Given the gradient, other more classical perturbation results can be derived, such as Hessians of eigenvalues, or gradients of the projectors \(u_k u_k^\top\). Here I derive a perturbation result for the projector \(\Pi_k(A)=u_k u_k^\top\), when \(\lambda_k\) is a simple eigenvalue. Using the same technique as above, we get: $$ \Pi_k(A+\Delta ) - \Pi_k(A) = \frac{1}{2i \pi} \oint_\gamma (z I - A)^{-1} \Delta (z I - A)^{-1}dz + o(\| \Delta\|_2),$$ which we can expand in the basis of eigenvectors as $$ \frac{1}{2i \pi} \oint_\gamma \sum_{j=1}^n \sum_{\ell=1}^n u_j u_j^\top \Delta u_\ell u_\ell^\top \frac{ dz}{(z-\lambda_\ell) (z-\lambda_j) } + o(\| \Delta\|_2).$$ We can then split this into two contributions (all other terms are equal to zero by lack of poles within \(\gamma\)): $$ \frac{1}{2i \pi} \oint_\gamma \sum_{j \neq k} u_j^\top \Delta u_k ( u_j u_k^\top + u_k u_j^\top) \frac{ dz}{(z-\lambda_k) (z-\lambda_j) }= \sum_{j \neq k} u_j^\top \Delta u_k ( u_j u_k^\top + u_k u_j^\top) \frac{1}{\lambda_k - \lambda_j} $$ and $$\frac{1}{2i \pi} \oint_\gamma u_k^\top \Delta u_k \, u_k u_k^\top \frac{ dz}{(z-\lambda_k)^2 } = 0 ,$$ finally leading to $$\Pi_k(A+\Delta ) - \Pi_k(A) = \sum_{j \neq k} \frac{u_j^\top \Delta u_k}{\lambda_k - \lambda_j} ( u_j u_k^\top + u_k u_j^\top) + o(\| \Delta\|_2),$$ from which we can compute the Jacobian of \(\Pi_k\).
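As for the gradient, this first-order expansion can be checked by finite differences; here is a minimal sketch (illustrative values; note that the projector \(u_k u_k^\top\) is insensitive to the sign ambiguity of eigenvectors):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
A = rng.standard_normal((n, n))
A = (A + A.T) / 2.0
Delta = rng.standard_normal((n, n))
Delta = 1e-6 * (Delta + Delta.T) / 2.0

eigvals, U = np.linalg.eigh(A)
_, U2 = np.linalg.eigh(A + Delta)
Pi = lambda V: np.outer(V[:, k], V[:, k])  # projector u_k u_k^T
exact = Pi(U2) - Pi(U)

predicted = np.zeros((n, n))
for j in range(n):
    if j != k:
        c = (U[:, j] @ Delta @ U[:, k]) / (eigvals[k] - eigvals[j])
        predicted += c * (np.outer(U[:, j], U[:, k])
                          + np.outer(U[:, k], U[:, j]))
print(np.max(np.abs(exact - predicted)))   # ~ o(||Delta||_2)
```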
Spectral functions
Spectral functions are functions on symmetric matrices defined as \(F(A) = \sum_{k=1}^n f(\lambda_k(A))\), for a real-valued function \(f\). For \(f(x) = x\), we get back the trace, for \(f(x) = \log x\) we get back the log determinant, and so on. When \(f\) extends to a holomorphic function on a region enclosing all eigenvalues of \(A\), the function \(F\) can be represented as $$F(A) = \sum_{k=1}^n f(\lambda_k(A)) = \frac{1}{2i \pi} \oint_\gamma f(z) {\rm tr} \big[ (z I - A)^{-1} \big] dz,$$ where the contour \(\gamma\) encloses all eigenvalues (as shown below).
This representation can be used to compute derivatives of \(F\), by differentiating under the integral sign, recovering the results of [12].
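Here is a numerical sketch of the representation itself, with \(f = \exp\), so that the contour integral should recover \({\rm tr}(e^A)\) (the contour is a circle chosen large enough to enclose all eigenvalues; values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2.0
eigvals = np.linalg.eigvalsh(A)

# circle centered at the mean of the spectrum, enclosing all eigenvalues
c = eigvals.mean()
r = np.max(np.abs(eigvals - c)) + 1.0
m = 2000
t = np.arange(m) / m
z = c + r * np.exp(2j * np.pi * t)
dz = 2j * np.pi * r * np.exp(2j * np.pi * t) / m

val = sum(np.exp(zi) * np.trace(np.linalg.inv(zi * np.eye(n) - A)) * dzi
          for zi, dzi in zip(z, dz)) / (2j * np.pi)
print(val.real, np.sum(np.exp(eigvals)))  # both ~ tr(exp(A))
```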
Singular values of rectangular matrices
The singular value decomposition is also often used, for a rectangular matrix \(W \in \mathbb{R}^{n \times d}\). It consists of finding \(r\) pairs \((u_j,v_j) \in \mathbb{R}^{n} \times \mathbb{R}^d\), \(j=1,\dots,r\), of singular vectors and \(r\) positive singular values \(\sigma_1 \geqslant \cdots \geqslant \sigma_r > 0\) such that \(W = \sum_{j=1}^r \sigma_j u_j v_j^\top\) and \((u_1,\dots,u_r)\) and \((v_1,\dots,v_r)\) are orthonormal families.
There are two natural ways to relate the singular value decomposition to the classical eigenvalue decomposition of a symmetric matrix, first through \(WW^\top\) (or similarly \(W^\top W\)). Here it is more direct to consider the so-called Jordan-Wielandt matrix, defined by blocks as $$
\bar{W} = \left( \begin{array}{cc}
0 & W \\
W^\top & 0 \end{array} \right). $$ The matrix \(\bar{W}\) is symmetric, and its nonzero eigenvalues are \(+\sigma_i\) and \(-\sigma_i\), \(i=1,\dots,r\), associated with the eigenvectors \(\frac{1}{\sqrt{2}} \left( \begin{array}{c}
u_i \\ v_i \end{array} \right)\) and \(\frac{1}{\sqrt{2}} \left( \begin{array}{c}
u_i \\ -v_i \end{array} \right)\).
All necessary results (derivatives of singular values \(\sigma_j\), or of the projectors \(u_j v_j^\top\)) can be obtained from there; see more details in, e.g., the appendix of [6].
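A quick sketch checking the Jordan-Wielandt construction numerically (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 5, 3
W = rng.standard_normal((n, d))
Wbar = np.block([[np.zeros((n, n)), W], [W.T, np.zeros((d, d))]])

sing = np.linalg.svd(W, compute_uv=False)   # singular values of W
eig = np.linalg.eigvalsh(Wbar)              # eigenvalues of Jordan-Wielandt
print(np.sort(eig))                         # +/- singular values, plus zeros
print(np.sort(np.concatenate([sing, -sing, np.zeros(n - d)])))
```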
Going beyond
In this post, I have shown various applications of the Cauchy residue formula, for computing integrals and for the spectral analysis of matrices. I have just scratched the surface of spectral analysis, and what I presented extends to many interesting situations, for example, to more general linear operators in infinite-dimensional spaces [3], or to the analysis of the eigenvalue distribution of random matrices (see a nice and reasonably simple derivation of the semi-circular law on Terry Tao’s blog).
References
[1] Gilbert W. Stewart and Ji-Guang Sun. Matrix Perturbation Theory. Academic Press, 1990.
[2] Adrian Stephen Lewis. Derivatives of spectral functions. Mathematics of Operations Research, 21(3):576–588, 1996.
[3] Tosio Kato. Perturbation Theory for Linear Operators, volume 132. Springer, 2013.
[4] Serge Lang. Complex Analysis, volume 103. Springer, 2013.
[5] Jan R. Magnus. On differentiating eigenvalues and eigenvectors. Econometric Theory, 1(2):179–191, 1985.
[6] Francis Bach. Consistency of trace norm minimization. Journal of Machine Learning Research, 9:1019-1048, 2008.
[7] Joseph Bak and Donald J. Newman. Complex Analysis. Springer, 2010.
[8] Alain Berlinet and Christine Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer Science & Business Media, 2011.
[9] Grace Wahba. Spline Models for Observational Data. Society for Industrial and Applied Mathematics, 1990.
[10] Augustin Louis Cauchy, Mémoire sur les intégrales définies, prises entre des limites imaginaires, 1825, re-published in Bulletin des Sciences Mathématiques et Astronomiques, Tome 7, 265-304, 1874.
[11] Dragoslav S. Mitrinovic and Jovan D. Keckic. The Cauchy Method of Residues: Theory and Applications, volume 9. Springer Science & Business Media, 1984.
[12] Adrian S. Lewis and Hristo S. Sendov. Twice differentiable spectral functions. SIAM Journal on Matrix Analysis and Applications, 23(2):368-386, 2001.
Computing the Sobolev kernel
The goal is to compute the infinite sum $$\sum_{n \in \mathbb{Z}} \frac{e^{2i\pi q n}}{1+(2a \pi n)^2}$$ for \(q \in (0,1/2)\). We consider the function $$f(z) = \frac{e^{i\pi (2q-1) z}}{1+(2a \pi z)^2} \frac{\pi}{\sin (\pi z)}.$$ It is holomorphic on \(\mathbb{C}\) except at all integers \(n \in \mathbb{Z}\), where it has a simple pole with residue \(\displaystyle \frac{e^{i\pi (2q-1) n}}{1+(2a \pi n)^2} (-1)^n = \frac{e^{2i\pi q n}}{1+(2a \pi n)^2}\); at \(z = i/(2a\pi)\), where it has a residue equal to \(\displaystyle \frac{e^{ - (2q-1)/(2a)}}{4ia\pi} \frac{\pi}{\sin (i/(2a))} = - \frac{e^{ - (2q-1)/(2a)}}{4a} \frac{1}{\sinh (1/(2a))}\); and at \(z = -i/(2a\pi)\), where it has a residue equal to \(\displaystyle \frac{e^{ (2q-1)/(2a)}}{-4ia\pi} \frac{\pi}{\sin (-i/(2a))} = - \frac{e^{ (2q-1)/(2a)}}{4a} \frac{1}{\sinh (1/(2a))}\). With all residues summing to zero (note that this fact requires a precise analysis of limits when \(m\) tends to infinity for the contour defined in the main text), we get: $$\sum_{n \in \mathbb{Z}} \frac{e^{2i\pi q n}}{1+(2a \pi n)^2} =\frac{e^{ - (2q-1)/(2a)}}{4a} \frac{1}{\sinh (1/(2a))}+ \frac{e^{ (2q-1)/(2a)}}{4a} \frac{1}{\sinh (1/(2a))} = \frac{1}{2a} \frac{ \cosh (\frac{2q-1}{2a})}{\sinh (\frac{1}{2a})}.$$