Gradient descent is more than an update rule. Seen as an operator, it becomes a system component, enabling composable solvers.
By viewing T(x) = x - 𝛼 ∇f(x) as an operator, we can show:
🔹 fixed points = minimizers of f
🔹 f is L-smooth and convex (with step size 𝛼 ∈ (0, 2/L)) ⇒ T is averaged
🔹 f is also strongly convex ⇒ T is a contraction
Once T has these properties, general fixed-point convergence results apply immediately, complete with convergence rates.
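To make this concrete, here is a minimal sketch (not from the slides) of gradient descent run as a fixed-point iteration of T on a strongly convex quadratic; the matrix, vector, and variable names are illustrative assumptions. On this problem the iterates contract linearly toward the unique fixed point, which is exactly the minimizer.

```python
import numpy as np

# Illustrative sketch: gradient descent as fixed-point iteration of
# T(x) = x - alpha * grad_f(x), applied to the strongly convex quadratic
# f(x) = 0.5 * x^T A x - b^T x. (A, b, and the step size are hypothetical choices.)

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # symmetric positive definite -> f is strongly convex
b = np.array([1.0, -1.0])

def grad_f(x):
    return A @ x - b

eigs = np.linalg.eigvalsh(A)
L, mu = eigs.max(), eigs.min()        # smoothness and strong-convexity constants
alpha = 1.0 / L                       # step size in (0, 2/L), so T is averaged (here, a contraction)

def T(x):
    return x - alpha * grad_f(x)      # the gradient descent operator

x_star = np.linalg.solve(A, b)        # minimizer of f = unique fixed point of T

x = np.zeros(2)
for k in range(20):
    x = T(x)
    # Linear (geometric) decrease of the distance to the fixed point,
    # as predicted by the contraction property of T.
    print(k, np.linalg.norm(x - x_star))
```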
Future posts will dive into combining this operator with others in splitting techniques. But first, check out the slides below for a lecture on the properties of gradient descent listed above. 👇
Enjoy!