Proximal gradient not only minimizes nonsmooth costs; it can also be used to define smooth costs with the same minimizers. We call these forward-backward envelopes.
This envelope is defined via the subproblem solved at each step of proximal gradient. Specifically, given a nonsmooth function f(x) and a smooth function g(x), each update x⁺ of the algorithm at a point x minimizes
f(u) + g(x) + (u - x) • g'(x) + |u - x|^2 / (2a)
with respect to u. The associated envelope ϕₐ is defined by
ϕₐ(x) = minᵤ f(u) + g(x) + (u - x) • g'(x) + |u - x|^2 / (2a)
Using the proximal gradient step x⁺ above, this becomes
ϕₐ(x) = f(x⁺) + g(x) + (x⁺ - x) • g'(x) + |x⁺ - x|^2 / (2a)
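To make the formulas above concrete, here is a minimal numerical sketch. The setup is my own toy example (not from the post): a lasso-type problem with nonsmooth f(u) = λ‖u‖₁ and smooth g(x) = ½‖Ax − b‖², so the proximal step is soft-thresholding.

```python
import numpy as np

# Toy problem (an assumption for illustration): f(u) = lam*||u||_1 (nonsmooth),
# g(x) = 0.5*||A x - b||^2 (smooth), so prox_f is soft-thresholding.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
lam = 0.1
a = 0.5 / np.linalg.norm(A, 2) ** 2  # step size below 1/L, with L = ||A||^2

def f(u):
    return lam * np.abs(u).sum()

def g(x):
    return 0.5 * np.sum((A @ x - b) ** 2)

def grad_g(x):
    return A.T @ (A @ x - b)

def prox_grad_step(x):
    # x+ minimizes f(u) + g(x) + (u - x).grad_g(x) + |u - x|^2/(2a) over u,
    # i.e. x+ = prox_{a f}(x - a*grad_g(x)); for the l1-norm this prox is
    # soft-thresholding.
    v = x - a * grad_g(x)
    return np.sign(v) * np.maximum(np.abs(v) - a * lam, 0.0)

def fbe(x):
    # phi_a(x) = f(x+) + g(x) + (x+ - x).grad_g(x) + |x+ - x|^2/(2a)
    xp = prox_grad_step(x)
    d = xp - x
    return f(xp) + g(x) + d @ grad_g(x) + d @ d / (2 * a)

x = np.zeros(10)
print(fbe(x), f(x) + g(x))
```

Since u = x is a feasible choice in the minimization defining ϕₐ, the envelope always lower-bounds the original cost, ϕₐ(x) ≤ f(x) + g(x), with equality exactly at fixed points of the proximal gradient step.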
Even though the original cost f + g may be nonsmooth (or even discontinuous), its forward-backward envelope ϕₐ can be continuously differentiable, e.g. when g is twice differentiable and the step size a is small enough. So, quasi-Newton methods (e.g. BFGS, L-BFGS) can be used to minimize the envelope function. These have been shown to converge to minimizers of the original nonsmooth cost significantly faster than proximal gradient.
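As a self-contained illustration of this idea (a toy lasso instance of my own, not code from the papers), the envelope and its gradient can be handed to an off-the-shelf L-BFGS routine such as SciPy's. For twice-differentiable g, the envelope gradient is ∇ϕₐ(x) = (I − a∇²g(x))(x − x⁺)/a.

```python
import numpy as np
from scipy.optimize import minimize

# Toy lasso instance (an assumption for illustration):
# f(u) = lam*||u||_1, g(x) = 0.5*||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
lam = 0.1
H = A.T @ A                              # Hessian of g
a = 0.5 / np.linalg.norm(A, 2) ** 2      # step size strictly below 1/L

def prox_grad_step(x):
    v = x - a * (A.T @ (A @ x - b))      # forward (gradient) step
    return np.sign(v) * np.maximum(np.abs(v) - a * lam, 0.0)  # backward (prox)

def fbe_and_grad(x):
    # Envelope value phi_a(x) and its gradient, for use by L-BFGS.
    xp = prox_grad_step(x)
    d = xp - x
    gx = A.T @ (A @ x - b)
    phi = (lam * np.abs(xp).sum() + 0.5 * np.sum((A @ x - b) ** 2)
           + d @ gx + d @ d / (2 * a))
    # For twice-differentiable g: grad phi_a(x) = (I - a*H) (x - xp) / a.
    grad = (np.eye(10) - a * H) @ (x - xp) / a
    return phi, grad

res = minimize(fbe_and_grad, np.zeros(10), jac=True, method="L-BFGS-B")
x_star = res.x
# Stationary points of phi_a coincide with fixed points of the proximal
# gradient step, so the residual below should be near zero at a minimizer.
print(np.linalg.norm(x_star - prox_grad_step(x_star)))
```

Here the quasi-Newton method never touches the nonsmooth cost directly; it only sees the smooth envelope, yet its minimizer is a fixed point of proximal gradient on the original problem.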
For some graphical intuition about the envelope's shape, check out the animation above. The original nonsmooth function is shown in green and its forward-backward envelope in orange.
Cheers,
Howard
⸻
𝗙𝘂𝗿𝘁𝗵𝗲𝗿 𝗥𝗲𝗮𝗱𝗶𝗻𝗴:
Check out the neat papers "Forward-Backward Quasi-Newton Methods for Nonsmooth Optimization Problems" and "Forward-backward envelope for the sum of two nonconvex functions: further properties and nonmonotone line-search algorithms" by Lorenzo Stella, Panagiotis Patrinos, and Andreas Themelis.
Also, thanks are due to Felipe Antinas for inspiring this diagram below and contributing ideas for it 🙂. Soon I hope to make a similar one for the Douglas-Rachford Envelope.