Continuous and Discrete Perspectives
How Euler’s Methods Relate to Gradient Descent and Proximal Point
Differential equations are surprisingly useful
in optimization: their discretizations yield
famous algorithms. Explicit Euler applied to
gradient flow gives gradient descent, and
implicit Euler gives the proximal point method.
Why care?
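As a minimal sketch of this correspondence (the quadratic objective, step size, and iteration count below are illustrative choices, not from the post): discretizing the gradient flow ẋ = −∇f(x) with a forward Euler step recovers gradient descent, while a backward Euler step recovers the proximal point method.

```python
import numpy as np

# Illustrative quadratic objective f(x) = 0.5 * x^T A x, so grad f(x) = A x.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
grad = lambda x: A @ x

h = 0.1                        # Euler step size = algorithm step size
x0 = np.array([1.0, 1.0])

# Explicit (forward) Euler on x' = -grad f(x):
#   x_{k+1} = x_k - h * grad f(x_k)   <- exactly gradient descent
x = x0.copy()
for _ in range(100):
    x = x - h * grad(x)
print(x)  # close to the minimizer 0

# Implicit (backward) Euler:
#   x_{k+1} = x_k - h * grad f(x_{k+1}),
# equivalently x_{k+1} = argmin_y f(y) + ||y - x_k||^2 / (2h),
# i.e. the proximal point method. For a quadratic, the implicit
# step has a closed form: solve (I + h A) x_{k+1} = x_k.
x = x0.copy()
I = np.eye(2)
for _ in range(100):
    x = np.linalg.solve(I + h * A, x)
print(x)  # also close to 0
```

The only difference between the two loops is where the gradient is evaluated: at the current iterate (explicit) or the next one (implicit).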
The analysis of continuous-time ODEs
can be much simpler than that of the
corresponding discrete versions.
For example, a convergence rate can often be
established in a few sentences.
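For instance, for gradient flow with a convex objective f and minimizer x*, a standard Lyapunov argument (a sketch; this specific derivation is not from the post) takes only a few lines:

```latex
% Gradient flow: \dot{x}(t) = -\nabla f(x(t)), with f convex and x^\star a minimizer.
% Consider the Lyapunov function
E(t) = t\,\bigl(f(x(t)) - f(x^\star)\bigr) + \tfrac{1}{2}\,\lVert x(t) - x^\star \rVert^2 .
% Differentiating and substituting \dot{x} = -\nabla f(x):
\dot{E}(t) = f(x) - f(x^\star) - t\,\lVert \nabla f(x) \rVert^2
             - \langle \nabla f(x),\, x - x^\star \rangle .
% Convexity gives \langle \nabla f(x), x - x^\star \rangle \ge f(x) - f(x^\star),
% so \dot{E}(t) \le 0, and since E is nonincreasing:
f(x(t)) - f(x^\star) \;\le\; \frac{E(0)}{t}
  \;=\; \frac{\lVert x(0) - x^\star \rVert^2}{2t} .
```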
Moving from discrete to continuous
can also reveal more general classes of ODEs,
which in turn translate into novel
optimization algorithms.
I often find that people are surprised
when I point out these connections.
To this end, a few slides are linked below;
they show only the tip of the iceberg.
Enjoy!
⸻
Further reading:
"A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights" by Su, Boyd, and Candes.
"A Lyapunov Analysis of Accelerated Methods in Optimization" by Wilson, Recht, and Jordan.