Gotta Go Fast 🏃🏻💨💨 when generating data with score-based models

Ke Li, Rémi Piché-Taillefer, Tal Kachman, Ioannis Mitliagkas and I just released our new paper Gotta Go Fast When Generating Data with Score-Based Models. Score-based generative models are a new method for generating fake data from an unknown distribution. These models, although powerful, are extremely slow 🐌 at generating data. Our paper shows how to generate data as fast as possible with score-based models while keeping the same output quality. Since generating data amounts to solving a Stochastic Differential Equation (SDE), we tried a wide variety of SDE solvers in order to improve speed. Most of these solvers did not converge or were too slow, so we decided to build a well-designed adaptive SDE solver that gives us both speed and quality.

Results

Y-axis: Quality/Diversity of the images as determined by the Fréchet Inception Distance (FID; lower is better).
X-axis: Number of score-function evaluations (higher means slower; scales proportionally with time taken).

Key results:

  1. Our SDE solver reaches its best performance at a much faster speed than other methods.
  2. It generally obtains much higher-quality images than competing methods given the same computational budget.
  3. It requires no step-size/schedule tuning.
  4. It works on all types of SDEs.

How to make an SDE solver go fast?

To be fast, an SDE solver must use adaptive step sizes, which means taking steps as large as possible to reach the finish line as quickly as possible, but without going off track (due to numerical errors)! How large the step size can be is determined by comparing two solving methods: a lower-order method (faster, less precise) and a higher-order method (slower, more precise). We estimate the solution of the SDE at time t+h, where h is the proposed step size. If the estimate from the lower-order method is close enough to the estimate from the higher-order method, the lower-order method is doing an excellent job, so we accept its estimate at t+h. We then dynamically update the step size in proportion to how small the error between the lower-order and higher-order estimates is. This is the basic idea behind fast adaptive ODE/SDE solvers. Because they need both a lower-order and a higher-order method, adaptive solvers require at least two function evaluations (one for the lower-order method and one extra for the higher-order method).
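For intuition, here is a minimal sketch of that accept/reject logic. It is purely illustrative: `low_order_step` and `high_order_step` stand in for any pair of integration methods, and the names and constants are mine, not taken from our code.

```python
import numpy as np

def adaptive_solve(x, t_start, t_end, low_order_step, high_order_step,
                   h_init=1e-2, tol=1e-2, safety=0.9):
    """Generic adaptive integration loop (illustrative sketch only)."""
    t, h = t_start, h_init
    while t < t_end:
        h = min(h, t_end - t)
        x_low = low_order_step(x, t, h)    # cheap, less precise estimate at t + h
        x_high = high_order_step(x, t, h)  # more precise estimate at t + h
        # How far apart the two estimates are, relative to the tolerance.
        err = np.linalg.norm(x_low - x_high) / (tol * np.sqrt(x.size))
        if err <= 1.0:
            # Close enough: accept the step and move on to t + h.
            x, t = x_low, t + h
        # Grow h when the error is small, shrink it when it is large.
        h *= min(5.0, max(0.2, safety / np.sqrt(max(err, 1e-12))))
    return x
```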

With two function evaluations, the step size needs to be at least twice as big to go faster than a fixed-step-size method with only one function evaluation. If we take more accurate solvers (for example, RK45 for ODEs, which requires six function evaluations per step), we need more function evaluations, which slows us down (by a factor of the number of function evaluations). In exchange, we should be able to take much larger step sizes thanks to the improved precision; at least, this is what is supposed to happen. However, the idea that high-order methods let you take much bigger step sizes is only really true for ODEs; it is generally not the case for SDEs. Even with a very accurate solver, if you end up adding a massive amount of Gaussian noise (which you often do with SDEs), most of your improved accuracy is drowned in the sea of Gaussian noise 🌊 and you end up with much less gain in accuracy than you had hoped for (╯°□°)╯︵ ┻━┻. This means that high-order SDE solvers, which require many function evaluations, are rarely worth the trouble if your goal is speed (compared to simply taking smaller steps with a less accurate solver). This is the sad reality of SDEs; they are a lot more troublesome than ODEs.

This suggests that we should do exactly two function evaluations to get adaptive step sizes, but no more, given the extra computation. We propose using Euler-Maruyama (EM) as the lower-order method and Improved Euler as the higher-order method. Although this lets us generate data fast, EM is not accurate enough to handle large step sizes, and we end up with lower-quality images. We solve this issue with one simple trick: instead of taking the sample proposed by the lower-order method (EM), we take the sample proposed by the higher-order method (Improved Euler). Technically, we have not tested the error of Improved Euler against an even higher-order method, but if EM and Improved Euler are close enough, why not take the better one (Improved Euler)? This trick makes a huge difference in quality and requires no additional function evaluations.
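To make this concrete, below is a rough sketch of what one candidate step could look like, with both estimates sharing the same Gaussian noise. The names (`em_heun_step`, `drift`, `diffusion`) are placeholders of mine, and the step is a simplification of, not a substitute for, the exact algorithm in the paper.

```python
import numpy as np

def em_heun_step(x, t, h, drift, diffusion, rng):
    """One candidate step (rough sketch, not the paper's exact algorithm).

    Both estimates reuse the same Gaussian noise `z`, so their difference
    reflects discretization error rather than a mismatch in the noise.
    """
    z = rng.standard_normal(x.shape)
    noise = diffusion(t) * np.sqrt(h) * z

    # Lower-order estimate: Euler-Maruyama.
    x_em = x + drift(x, t) * h + noise

    # Higher-order estimate: Improved Euler (Heun-style) on the drift,
    # averaging the drift evaluated at the start and at the EM proposal.
    x_heun = x + 0.5 * (drift(x, t) + drift(x_em, t + h)) * h + noise

    # Local error of the cheap estimate against the more precise one.
    err = np.linalg.norm(x_em - x_heun) / np.sqrt(x.size)

    # The trick: when the step is accepted, keep the *higher-order* sample.
    return x_heun, err
```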

A few other key choices make us go faster, like taking the L-2 norm over the L-infinity norm; please see the paper for more details.
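As an illustration of that norm choice, here is roughly how a tolerance-scaled L2 error could be computed instead of a worst-pixel (L-infinity) error; the function and the tolerance names `eps_abs` and `eps_rel` are my own sketch, not necessarily the paper's notation.

```python
import numpy as np

def scaled_l2_error(x_low, x_high, eps_abs=1e-2, eps_rel=1e-2):
    """Tolerance-scaled L2 (RMS) error between two estimates (sketch only).

    Each pixel is scaled by a mixed absolute/relative tolerance and the
    result is averaged with an L2 norm, so no single badly-behaved pixel
    can force the step size down on its own (unlike an L-infinity norm).
    """
    scale = np.maximum(eps_abs,
                       eps_rel * np.maximum(np.abs(x_low), np.abs(x_high)))
    return np.sqrt(np.mean(((x_low - x_high) / scale) ** 2))
```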

Conclusion

Our SDE solver allows us to generate data very quickly and at high quality. Using our approach almost always leads to higher-quality results at a faster speed than competing solvers for a given score-based model. Nevertheless, data generation remains relatively slow 🐌: instead of taking between 30 min and 1 h to generate a mini-batch of images, it now takes between 2 min and 9 min with our solver. Since we only change the data generation algorithm, we can still benefit from other improvements to score-based model training and architectures. It would be interesting to try our SDE solver with the various training methods that make score-based models faster (12).

Details

Our paper is available on ArXiv.

Our code is available on GitHub!

Theme song

Samples (see the paper for more samples)