A Programming Language for Differentiable Image Processing
ABOUT THE PROJECT
At a glance
We propose to develop a programming system that allows easy expression of end-to-end trainable image processing algorithms, while compiling to high-performance implementations on modern heterogeneous and specialized processors, from embedded image processing DSPs (e.g., Qualcomm HVX) to FPGAs to large GPUs. We build on the Halide language and compiler, extending it with native derivative and gradient transformations via automatic differentiation (AD). Unlike traditional AD systems, the exact strategy for generating derivative and adjoint code will be controllable by a language of composable primitives, in the spirit of Halide's language of "schedules" for controlling code generation. This will allow easy expression of portable, high-performance implementations of novel neural network layers, automatically tunable image and array processing programs, and optimizers for inverse reconstruction problems. All of these will help advance the state of the art in computational imaging and in vision systems for autonomous driving, from prototypes through deployment.
Motivation & Proposed Work
End-to-end learning and optimization are driving rapid progress in the state of the art for many data-driven problems, especially in visual computing. Key to this progress is the surprising power of gradient-based optimization methods to find reasonable solutions to the nonlinear objectives that result from parameterizing long and complex computational pipelines like deep neural networks. These pipelines are not limited to simple mathematical functions, but can be arbitrary programs.
Modern machine learning frameworks are therefore built around programming systems that provide automatic differentiation as their central feature, deriving gradients of arbitrary programs in order to optimize through them [2,3]. These systems are increasingly appealing to developers of applications far beyond traditional neural networks, wherever gradients of programs must be taken easily. In image processing and computational photography, many state-of-the-art methods now require gradients to optimize parameters or solve inverse/reconstruction problems through entire complex image processing algorithms [4,5,6,7]. However, when existing machine learning frameworks are used as general-purpose programming languages, the low level of abstraction they offer makes more complex programs hard to express. These systems were largely designed to describe coarse-grained compositions of pre-made aggregate operations (neural network layers), so they make it difficult to define new computational operations.
At the same time, there is no single way to compute the gradients of a given function. Rather, automatic differentiation admits a large space of possible implementation choices (e.g., forward vs. reverse mode; balancing storage/bandwidth against recomputation with checkpointing; polynomial factorization to control working-set size; scatter-to-gather conversion when backpropagating through stencil computations). The AD logic in current frameworks is simple, optimized for the common case in neural network backpropagation, where little care is needed at the coarse granularity of composing large aggregate operations; within these operations, fast derivative implementations must still be hand-written and carefully optimized. As a result, the efficiency of gradient computation in these systems suffers when they are used to describe large compositions of many small, heterogeneous computations, as complex programs in new domains like image processing require. Because the iterative optimization algorithms used for end-to-end learning are computationally intensive, the performance of the gradient computations is critical. This cost is compounded by the computational demands inherent in high-quality image processing algorithms, where standard resolutions are tens of megapixels, orders of magnitude higher than those commonly used in popular vision tasks like object detection.
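To make the scatter-to-gather choice concrete, the sketch below takes a 1D three-tap blur and computes its adjoint two ways. A direct reverse-mode transcription walks over outputs and scatters each output's adjoint into the three inputs it read, so different iterations write to the same location; the equivalent gather form computes each input's adjoint independently by reading the outputs that used it. This is plain Python rather than Halide, it assumes zero boundary conditions, and the function names are hypothetical.

```python
# Sketch only: plain Python, not Halide. Function names are hypothetical.
# Forward stencil: out[x] = w[0]*in[x-1] + w[1]*in[x] + w[2]*in[x+1],
# with out-of-range samples treated as zero.


def blur(inp, w):
    n = len(inp)

    def at(i):
        return inp[i] if 0 <= i < n else 0.0

    return [w[0] * at(x - 1) + w[1] * at(x) + w[2] * at(x + 1) for x in range(n)]


def blur_adjoint_scatter(d_out, w):
    """Direct reverse-mode transcription: each output x scatters its adjoint
    to the three inputs it read. Different x update the same d_in[i], so a
    parallel version would need atomics or a separate reduction."""
    n = len(d_out)
    d_in = [0.0] * n
    for x in range(n):
        for k, offset in enumerate((-1, 0, 1)):
            i = x + offset
            if 0 <= i < n:
                d_in[i] += w[k] * d_out[x]
    return d_in


def blur_adjoint_gather(d_out, w):
    """The same adjoint rewritten as a gather: d_in[i] reads the outputs that
    used in[i] (x = i+1, i, i-1). Each i is independent, so the loop is
    trivially parallel; it is just a stencil with the taps reversed."""
    n = len(d_out)

    def at(x):
        return d_out[x] if 0 <= x < n else 0.0

    return [w[0] * at(i + 1) + w[1] * at(i) + w[2] * at(i - 1) for i in range(n)]
```

A quick correctness check is the adjoint identity: for any vectors a and b, the sum of blur(a, w)[x] * b[x] equals the sum of a[i] * blur_adjoint_gather(b, w)[i], up to floating-point rounding.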
We propose to develop a new programming system for differentiable image processing computations. We will build on the Halide language and compiler [1], extending it with native derivative and gradient computation as a first-class language feature. The basic AD transformation rules, while sometimes subtle for general programs, are relatively well-known and low risk. The key challenges are in performance and integration with the Halide programming model.
First, Halide provides a powerful abstraction for portable high-performance image processing code by decoupling the functional algorithm specification from its "schedule," which controls its mapping onto physical execution. In this spirit, we aim to implement not just a single heuristic reverse-mode adjoint transformation for Halide programs, but a whole space of equivalent transformations, controllable by a compositional language of transformation choices. Choices like checkpointing are deeply related to concepts in Halide's existing scheduling language. As with Halide schedules, very different choices will be most efficient for different programs and on different hardware. This language will be usable both by human experts and by automatic compilers and autotuners to concisely describe optimized derivative and adjoint computations.
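As a rough illustration of why checkpointing yields a space of equivalent programs, the sketch below differentiates a scalar chain of n stages, y_{i+1} = h(y_i), in reverse mode two ways: storing every intermediate, or storing only every k-th one and recomputing each segment during the backward pass. Both produce bit-identical gradients and differ only in how many values are held at once versus how much forward work is repeated, loosely analogous to Halide's choice between storing a stage and recomputing it where it is consumed. The stage function h, its derivative dh, and the interval k are hypothetical choices, and this is plain Python, not the proposed scheduling language.

```python
import math

# Sketch only: a scalar chain y_{i+1} = h(y_i), standing in for a pipeline of
# n stages. h, dh and the checkpoint interval k are hypothetical choices.


def h(y):
    return math.sin(y) + 0.5 * y


def dh(y):
    return math.cos(y) + 0.5


def grad_store_all(y0, n):
    """Reverse mode that keeps every intermediate y_0..y_n in memory."""
    ys = [y0]
    for _ in range(n):
        ys.append(h(ys[-1]))
    adj = 1.0  # d y_n / d y_n
    for i in reversed(range(n)):  # chain rule, last stage first
        adj *= dh(ys[i])
    return ys[-1], adj, len(ys)  # value, gradient, values held at peak


def grad_checkpointed(y0, n, k):
    """Keep only every k-th intermediate, then recompute each segment from its
    checkpoint during the backward pass: less storage, extra forward work."""
    checkpoints = {0: y0}
    y = y0
    for i in range(n):
        y = h(y)
        if (i + 1) % k == 0:
            checkpoints[i + 1] = y
    adj, peak = 1.0, 0
    for start in reversed(range(0, n, k)):
        end = min(start + k, n)
        seg = [checkpoints[start]]  # recompute y_start .. y_{end-1}
        for _ in range(start, end - 1):
            seg.append(h(seg[-1]))
        peak = max(peak, len(checkpoints) + len(seg))
        for y_i in reversed(seg):
            adj *= dh(y_i)
    return y, adj, peak
```

With n = 100 and k = 10, the first strategy holds 101 values while the checkpointed one holds at most 21 (11 checkpoints plus one recomputed 10-value segment), at the cost of roughly one extra forward pass. Which trade-off wins depends on the relative cost of memory bandwidth and arithmetic on the target hardware.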
Second, because these AD transformations generate new code not directly written by the programmer, existing scheduling primitives cannot refer to it. Black-box automatic scheduling [8] is a natural fallback, but we believe it is also possible to use the well-defined relationships between the forward and adjoint forms to mechanically derive good schedules for the gradient computation from existing schedules for the forward form.
Finally, just as expert programmers often need manual control over schedules to maximize performance, expert ML programmers often need to precisely control key parts of the derived gradient code (to perform "gradient surgery"). This requires language support for selectively controlling gradient approximations within an otherwise automatic transformation.
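A small example of gradient surgery is the straight-through estimator: a rounding or quantization step has zero derivative almost everywhere, which blocks learning through it, so its derivative is commonly overridden to the identity while the rest of the gradient stays automatic. The sketch below shows this as a per-operation override inside a tiny tape-based reverse-mode AD over scalars. It is a hypothetical illustration in plain Python; Var, mul, quantize, backward, and the straight_through flag are invented names, not part of Halide or of the proposed language.

```python
# Sketch only: a tiny scalar reverse-mode AD with a per-op derivative
# override. All names here are hypothetical.


class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (input Var, local derivative)
        self.grad = 0.0


def mul(a, b):
    return Var(a.value * b.value, [(a, b.value), (b, a.value)])


def quantize(a, straight_through=False):
    """round() has zero derivative almost everywhere. With
    straight_through=True the local derivative is overridden to 1
    (the straight-through estimator): a targeted piece of gradient surgery."""
    local = 1.0 if straight_through else 0.0
    return Var(float(round(a.value)), [(a, local)])


def backward(out):
    # Post-order DFS puts every node after its inputs; walking it in reverse
    # visits each node only after all of its consumers have added to its grad.
    order, seen = [], set()

    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for p, _ in v.parents:
            visit(p)
        order.append(v)

    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        for p, local in v.parents:
            p.grad += local * v.grad
```

With x = 1.3 and w = 2.0, building y = mul(quantize(mul(x, w)), w) and calling backward(y) leaves x.grad at 0.0 by default and at 4.0 with straight_through=True. Only the one marked operation changes; every other derivative is still derived automatically.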
We aim to demonstrate the power of differentiable image processing in applications ranging from end-to-end training of neural networks that incorporate advanced image processing operators, to automatically tuning the parameters of a camera pipeline on a corpus of examples, to inverse image reconstruction via end-to-end optimization. We will evaluate our system in terms of both the simplicity and the performance of the resulting implementations, compared to existing frameworks and hand-coded derivatives. If successful, the resulting system will also apply to other programs cleanly expressible in Halide's array- and stencil-oriented programming model, from linear algebra and machine learning computations to 3D physics simulations.
I plan to make the results of this work widely available, both by publishing it in a top conference such as SIGGRAPH, where most of my previous work has appeared, and by releasing the developed software as open source under a permissive (BSD) license, as all of my recent work has been. This will include both actively contributing the core implementation to the mainline Halide project, and releasing a suite of motivating applications both in our language, and in existing systems like TensorFlow and PyTorch for comparison. This will all be released to BDD.
1. J. Ragan-Kelley, A. Adams, D. Sharlet, Z. Stern, C. Barnes, S. Paris, M. Levoy, S. Amarasinghe, F. Durand. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. CACM Research Highlights, Dec. 2017.
2. M. Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.
3. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093, 2014.
4. J. T. Barron and B. Poole. The Fast Bilateral Solver. ECCV 2016.
5. F. Heide, S. Diamond, M. Nießner, J. Ragan-Kelley, W. Heidrich, G. Wetzstein. ProxImaL: Efficient Image Optimization using Proximal Algorithms. SIGGRAPH 2016.
6. F. Heide et al. FlexISP: A Flexible Camera Image Processing Framework. SIGGRAPH Asia 2014.
7. M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, F. Durand. Deep Bilateral Learning for Real-Time Image Enhancement. SIGGRAPH 2017.
8. R. Mullapudi, A. Adams, D. Sharlet, J. Ragan-Kelley, K. Fatahalian. Automatically Scheduling Halide Image Processing Pipelines. SIGGRAPH 2016.
Jonathan Ragan-Kelley | Keywords: image processing, automatic differentiation, optimization, programming language