Scaling Grothendieck

This blog serves as a chronicle of my ideas, thoughts, and projects. Also presented are expositions of topics in mathematics and engineering.


Gradient flows and partitioning problems

Proofs are hard, machine learning is easier, college is over. I still feel dumb, still feel full of myself, here we go. This post about an application of the heat flow equation to ML. Uh... you'll see what I mean. I thought most people wouldn't be exposed to these ideas and it would be nice to share contemporary results on this blog for once.


Gradient Flows

We start with a vector space $X$, and define a map $$E: X \to \mathbb{R}.$$
You're supposed to think of $E$ as “energy”, but it's math, what the heck, it's all abstract. Next, starting at some $u_0 \in X$, we define $$U(0) = u_0$$ and $$\partial_t U(t) = - \nabla E(U(t)),$$
the gradient flow, which is essentially just the continuous version of gradient descent.

The first question that arises is, why gradient flows? Well, a lot of problems in physics are driven by energy, and involve systems that evolve by making the energy smaller. Also, lots of optimization problems are simply finding $$\min_U E(U),$$ and for all such problems, except in the most trivial cases, $\lim_{t \to \infty } U(t)$, or, gradient descent is one's best bet.

The key property for a sensible application of gradient flow is that for $t > s$, we must have $E(U(t)) < E(U(s))$.

Examples

  • Let $E: \mathbb{R}^d \to \mathbb{R}$ as $$E: U \mapsto \frac{1}{2}||U||^2.$$ Then, we have $\nabla E(U) = U$ and the gradient flow $\partial_t U(t) = -U(t)$. In other words, we have the simple first-order differential equation $U'(t) = U(t)$ whose solution is $U(t) = U_0 e^{-t}$. This is a trivial example where the system has exponential decay to $0$ and never gets there. In most cases we care about how long it takes to get to the $\min$.
  • Now consider $E: \mathbb{R} \to \mathbb{R}$ as $$E(U) = |U|.$$ Here, we get $$U(t) = \begin{cases} U_0 - t \text{ if } t \leq U_0 \\0 \text{ if } t > U_0 \end{cases} $$ Here, we arrive at the solution in finite time (of course, assuming $U_0 > 0$).
  • One final example, before we begin the crux of the matter. Consider $$E(U) = \frac{1}{p} |U|^p,$$ where $E: \mathbb{R} \to \mathbb{R}$. Then, we have $U(t) \sim \frac{1}{t^{\frac{1}{p}}}$. Again, the question is, can one solve these problems in a reasonable amount of time.

Heat Flow

Consider the function $U$ over a Caccioppoli set $\Omega$, such that $U_0(x)$ gives the temperature at $x$ and $U(x, t)$ gives the evolution of the temperature with time $t$. Then, consider $$\partial_{t} \int_{\Omega} U(x, t) = \int_{\partial \Omega} \nabla U(x, t) \cdot n(x) dA,$$ which by Stokes' theorem is simply $$\int_{\Omega} \Delta U(x, t),$$ where $\Delta U(x, t)$ is the Laplacian.

Then, shrinking $\Omega$ down to a point, we get $\partial_t U = \Delta U$. The question, then, is, is this a gradient flow? Answer: of course it is, cf. post title.

Now, take it as a claim without a proof (since I don't actually know how to prove this yet (jk, I could probably do it, it's just PDEs)), this is the gradient flow of what's called the Dirichelet energy $$E(U) = \int_{\Omega} |\Delta U(x)|^2 dx.$$
One neat result we can immediately see is that $E(U + \tau V) \sim E(U) + \tau\langle \nabla E(U), V\rangle$ since $$\begin{align} E(U + \tau V) & = \int |\nabla U + \tau \nabla V| ^2\\ & = \int |\nabla U|^2 + \tau \int \nabla U \cdot \nabla V + O(\tau^2),\end{align}$$
and then ignoring the $O(\tau^2)$ term and integrating by parts gives us $$\int |\nabla U|^2 - \tau \int V(x) \cdot \Delta U(x) dx = \int |\nabla U|^2 - \tau \langle V, \Delta U \rangle $$

Thus, to summarize, we have $\nabla E(U) = - \Delta U$ and thus the gradient flow $$\partial_t U = \Delta U.$$

Aside: $$\lim_{\tau \to 0} \frac{E(U + \tau V) - E(U)}{\tau} = \frac{\partial E}{\partial V}$$
is called the first variation of $E$. The first variation itself isn't a gradient, but gradients of energies require both the first variation and an inner product.

The first variation tells you how the energy changes in a single direction $V$, but writing the gradient flow as a dot product lets you compute in any direction $V$.

So far, so good. Now, for the reason you came here. jk no one comes here.

Partitioning problems

Partitioning problems are well knowing in ML. They include K-means clustering, unsupervised image segmenting, etc. You already know all this. An interesting example of using partitioning techniques would be to tackle the problem of gerrymandering, where you want to partition a state such that each district has an equal number of people.

A partition of a region $\Omega$ is $(\Sigma_1, \dots, \Sigma_N)$ such that $$\Sigma_i \bigcap \Sigma_j = \phi$$ $\forall i \neq j$ and $$\bigcup_{j} \Sigma_j = \Omega.$$
The boundaries of the partition are denoted as $\delta \Sigma_{j}$, and the perimeter of the boundary is given as $Per(\partial \Sigma_j) = \int_{\partial \Sigma_j} dA$

Note: The partition is not a vector space, for example the "sum" of two partitions isn't a partition. Also, this set isn't convex and boundaries can be moving. Partitioning is a hard problem, we have very little to work with.

Two class partitioning

We have $\Omega = \Sigma \bigcup \Sigma ^c$, $\partial \Sigma = \partial \Sigma^c$. Then, measure heat flow: $$U_0(x) = \begin{cases}1 \text{ if } x \in \Sigma \\0 \text{ otherwise} \end{cases}$$
Then, we have $$Per(\partial \Sigma) = \frac{1}{\sqrt{\tau}} \int_{\Sigma^c} U(x, \tau) dx$$
The term inside the integral measures how much heat has escaped into the complement. When I first saw this equation, I was confused about the $\frac{1}{\sqrt{\tau}}$ term. Turns out, that's how the heat equation scales with time. That is, slope of temperature with time depends on ${\sqrt{\tau}}$ rather than $\tau$. Huh, the more you know.

And but so now, we have $$\begin{align}\frac{1}{\sqrt{\tau}} \int_{\Sigma^c} U(x, \tau) - U(x, 0) & = \frac{1}{\sqrt{\tau}} \int_{\Sigma^c} \int_{0}^{\tau} \partial_t U(x, s) ds\\ & = \frac{1}{\sqrt{\tau}} \int_{0}^{\tau} \int_{\Sigma^c} \nabla U(x, s) \cdot n(x) dA(x)\end{align}$$

One can show that $$\nabla U(x, s) = \frac{n(x)}{\sqrt{s}} + O(s),$$
and as the time parameter goes to $0$ the $O(s)$ term vanishes. Thus we have $$\nabla U(x, s) \sim \frac{n(x)}{\sqrt{s}},$$ and
$$Per(\partial \Sigma) = \frac{1}{\sqrt{\tau}} \int_{0}^{\tau} \int_{\partial \Sigma} \frac{n(x)}{\sqrt{s}} dA(x) ds = c \cdot \int_{\partial \Sigma} dA.$$


The main problem is that solving the heat equation in high dimensions is a pain in the computational ass since it requires using the Fast Fourier Transform, which has exponential complexity. So yeah, that's that. ¯\(ツ)/¯
Honestly this post was so tedious, maybe because it's 3 in the morning and I've allowed myself to give in to my weariness. Anyway, I'm going to stick to algebraic geometry, from now, forever.