University of Twente Student Theses
Saddle-to-saddle behaviour in shallow diagonal linear neural networks
Kerk, Y. van de (2024) Saddle-to-saddle behaviour in shallow diagonal linear neural networks.
Abstract: Neural networks are usually trained with the gradient descent method, which leads to minima of the loss function that generalize well to new data. However, it is still unclear why this method works so well. It has been observed that, for a neural network with small weight initialization, the minimization of the loss function makes very little progress for some time, until a sharp transition to a lower value occurs, corresponding to a new feature being learned. This incremental learning process corresponds to the iterates jumping from one saddle point of the loss function to another. Pesme and Flammarion (2023) describe this saddle-to-saddle behaviour of the gradient flow in a shallow diagonal linear neural network in the limit of vanishing initialization, without restrictive assumptions on the data. Motivated by this, the present study determines the saddle points of the loss function and investigates how these points influence the minimum found by gradient descent in a shallow diagonal linear neural network with small weight initialization. We find that the equilibrium points of the gradient flow correspond to sparse vectors that minimize the loss function over their non-zero coordinates. Furthermore, we conclude that the direction of the jumps does not simply align with the expected direction, namely that of the eigenvector corresponding to the largest positive eigenvalue of the linearized gradient flow.
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 31 mathematics
Programme: Applied Mathematics BSc (56965)
Link to this item: https://purl.utwente.nl/essays/100681
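
The plateau-and-jump dynamics described in the abstract can be illustrated with a short simulation. The sketch below is not taken from the thesis: it runs gradient descent on a shallow diagonal linear network with effective predictor beta = u * v, a sparse ground-truth vector, and a small initialization scale alpha. The data sizes, the initialization u = alpha, v = 0, the learning rate and the step counts are all hypothetical choices. With such a small alpha, the printed loss typically stays nearly constant for long stretches and then drops sharply each time one coordinate of the sparse solution is picked up.

    # Illustrative sketch (not the thesis code): gradient descent on a shallow
    # diagonal linear network with a small weight initialization scale alpha.
    import numpy as np

    rng = np.random.default_rng(0)

    n, d = 50, 10                         # hypothetical numbers of samples and features
    X = rng.standard_normal((n, d))
    beta_star = np.zeros(d)
    beta_star[:3] = [3.0, -2.0, 1.0]      # sparse ground-truth regression vector
    y = X @ beta_star

    alpha = 1e-6                          # small initialization scale
    u = alpha * np.ones(d)                # first diagonal layer
    v = np.zeros(d)                       # second diagonal layer, so beta = u * v starts at 0
    lr = 1e-3                             # gradient-descent step size

    for step in range(40001):
        beta = u * v                      # effective linear predictor
        residual = X @ beta - y
        grad_beta = X.T @ residual / n    # gradient of the quadratic loss w.r.t. beta
        # chain rule through the product parametrization beta = u * v
        u, v = u - lr * grad_beta * v, v - lr * grad_beta * u
        if step % 2000 == 0:
            print(f"step {step:5d}   loss {0.5 * np.mean(residual**2):.6f}")

Running the script prints a staircase-shaped loss curve: long plateaus separated by sudden drops, one for each non-zero coordinate of the sparse target, which is the saddle-to-saddle behaviour studied in the thesis.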