In the previous post, we showed (or at least tried to!) how the inherent noise of the stochastic gradient descent algorithm (SGD), in the context of modern overparametrised architectures, is structured and carries two important features: (i) it vanishes for interpolating solutions and (ii) it belongs to a low-dimensional manifold spanned by the gradients. Building…