Motion Estimation and Optical Flow

We continually work on improving the quality of optical flow estimation and other motion estimation methods, such as point tracking and scene flow estimation. As optical flow is the cornerstone of all video analysis, we believe that even the smallest improvement has a large effect on the overall performance of video-related methods. We have made several important contributions to the field, the latest and most revolutionary one being a convolutional network that can predict high-accuracy optical flow almost in real time.

High accuracy optical flow estimation based on a theory for warping

Code

Thomas Brox, A. Bruhn, N. Papenberg, J. Weickert

European Conference on Computer Vision (ECCV), Springer, LNCS, Vol.3024: 25-36, May 2004

This work largely reshaped the field of optical flow estimation. It contributes a theoretical justification for earlier warping approaches, an effective numerical scheme, the use of gradient constancy, and the use of the total variation norm in optical flow estimation. It is the most cited paper from our group.
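For orientation, the ingredients named above (warping without linearization, gradient constancy, and a total-variation-like penalty) can be written as one energy functional. The block below is a sketch of the form in which this model is commonly stated. The symbols ($\gamma$, $\alpha$, $\epsilon$, $\Psi$) do not appear on this page and are introduced here as assumptions; the paper has the authoritative formulation.

```latex
% x = (x, y, t)^T is a space-time point, w = (u, v, 1)^T the displacement to the next frame.
\begin{align}
E(u,v) &= E_{\text{Data}}(u,v) + \alpha\, E_{\text{Smooth}}(u,v), \\
E_{\text{Data}}(u,v) &= \int_\Omega \Psi\!\left( |I(\mathbf{x}+\mathbf{w}) - I(\mathbf{x})|^2
  + \gamma\, |\nabla I(\mathbf{x}+\mathbf{w}) - \nabla I(\mathbf{x})|^2 \right) d\mathbf{x}, \\
E_{\text{Smooth}}(u,v) &= \int_\Omega \Psi\!\left( |\nabla u|^2 + |\nabla v|^2 \right) d\mathbf{x}, \\
\Psi(s^2) &= \sqrt{s^2 + \epsilon^2}.
\end{align}
```

The first data term expresses brightness constancy and the second gradient constancy. With a small $\epsilon$, $\Psi$ approximates the absolute value, so the smoothness term behaves like a total variation penalty.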

Large displacement optical flow: descriptor matching in variational motion estimation

Publisher's Link

Code

Thomas Brox, J. Malik

IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3):500-513, 2011

We addressed a general problem in optical flow estimation, the handling of large motion, by integrating combinatorial optimization ideas (descriptor matching) into a variational model.

Dense point trajectories by GPU-accelerated large displacement optical flow

Code

N. Sundaram, Thomas Brox, K. Keutzer

European Conference on Computer Vision (ECCV), Springer, LNCS, Sept. 2010

High-quality optical flow makes it possible to build point trackers that track far more points than previous trackers, offer higher accuracy, and, thanks to large displacement optical flow, can follow even fast-moving body limbs.

Stereoscopic scene flow computation for 3D motion understanding

Publisher's Link

A. Wedel, Thomas Brox, T. Vaudrey, C. Rabe, U. Franke, D. Cremers

International Journal of Computer Vision, 95(1):29-51, 2011

We also worked on stereo videos, where combining optical flow with disparity estimation makes it possible to estimate 3D motion vectors.
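The geometric idea can be illustrated with a minimal sketch. It assumes a rectified pinhole stereo rig, and every name in it (`triangulate`, `scene_flow`, `focal`, `baseline`, `cx`, `cy`) is hypothetical and chosen for illustration. It shows only how a flow vector and two disparities determine a 3D motion vector, not how the paper estimates flow and disparity.

```python
def triangulate(x, y, disparity, focal, baseline, cx, cy):
    """Back-project pixel (x, y) with a given disparity into 3D camera coordinates.

    Assumes a rectified stereo pair: focal length in pixels, baseline in metres,
    principal point (cx, cy) in pixels.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal * baseline / disparity          # depth from disparity
    return ((x - cx) * z / focal, (y - cy) * z / focal, z)


def scene_flow(x, y, flow, disp_t0, disp_t1, focal, baseline, cx, cy):
    """3D motion of the point seen at pixel (x, y) in the first frame.

    flow    -- (u, v) optical flow at (x, y), in pixels
    disp_t0 -- disparity at (x, y) in the first frame
    disp_t1 -- disparity at (x + u, y + v) in the second frame
    """
    u, v = flow
    p0 = triangulate(x, y, disp_t0, focal, baseline, cx, cy)
    p1 = triangulate(x + u, y + v, disp_t1, focal, baseline, cx, cy)
    return tuple(b - a for a, b in zip(p0, p1))


# A point on the optical axis whose disparity doubles has halved its depth.
motion = scene_flow(320.0, 240.0, (0.0, 0.0), 25.0, 50.0,
                    focal=500.0, baseline=0.5, cx=320.0, cy=240.0)
print(motion)
```

In this example the depth changes from 10 m to 5 m, so the motion vector points 5 m toward the camera along the optical axis.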

FlowNet: Learning Optical Flow with Convolutional Networks

Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, P. Häusser, C. Hazırbaş, V. Golkov, P. Smagt, D. Cremers, Thomas Brox

IEEE International Conference on Computer Vision (ICCV), Dec. 2015

The key finding is that optical flow estimation can be formulated as a learning problem. A convolutional network is trained end-to-end to predict the optical flow field for two input images. The network achieves competitive accuracy on the Sintel and KITTI datasets at frame rates of 5 to 10 fps.
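Accuracy on optical flow benchmarks is commonly reported as the average endpoint error: the mean Euclidean distance between predicted and ground-truth flow vectors. The following is a minimal sketch of that metric, assuming flow fields are given as flat lists of (u, v) pairs. The function name is hypothetical, and this is not the official evaluation code of either benchmark.

```python
import math


def average_endpoint_error(predicted, ground_truth):
    """Mean Euclidean distance between corresponding flow vectors.

    Both arguments are sequences of (u, v) pairs of equal, non-zero length.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("flow fields must be non-empty and of equal length")
    total = sum(
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(predicted, ground_truth)
    )
    return total / len(predicted)


# One vector is off by (3, 4) pixels, the other is exact.
error = average_endpoint_error([(3.0, 4.0), (0.0, 0.0)],
                               [(0.0, 0.0), (0.0, 0.0)])
print(error)
```

Lower values are better. An error of 0 means the predicted flow matches the ground truth at every pixel.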

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

Eddy Ilg, Nikolaus Mayer, T. Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

This improved version of our earlier FlowNet yields state-of-the-art accuracy while being orders of magnitude faster than competing optical flow methods. FlowNet 2.0 places special emphasis on the synthesized data used for training and stacks multiple networks to refine the optical flow computed at earlier stages.

Demo video of FlowNet

Demo videos on large displacement optical flow

The following demo videos show the optical flow computed on several sequences. The color code for interpreting the direction of the flow vectors is shown on the left. For instance, green means a pixel is moving to the left. Videos are shown in the original resolution and frame rate. Quantization artifacts from video compression are visible even in the high-quality videos. The uncompressed results are as smooth as shown in the still image.
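A color code of this kind can be sketched as a mapping from flow direction to hue and from flow magnitude to saturation. The function below is a generic illustration using Python's standard `colorsys` module. The function name, the `max_magnitude` parameter, and the assignment of hues to directions are assumptions and do not reproduce the legend used in these videos (in this sketch, rightward motion is red and leftward motion is cyan).

```python
import colorsys
import math


def flow_to_rgb(u, v, max_magnitude):
    """Map a flow vector (u, v) to an 8-bit RGB color.

    Direction selects the hue, magnitude (relative to max_magnitude) selects
    the saturation, so zero motion is white and fast motion is fully saturated.
    """
    magnitude = math.hypot(u, v)
    angle = math.atan2(v, u) % (2.0 * math.pi)   # direction in [0, 2*pi]
    hue = angle / (2.0 * math.pi)
    saturation = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


print(flow_to_rgb(0.0, 0.0, 10.0))    # no motion
print(flow_to_rgb(10.0, 0.0, 10.0))   # fastest motion to the right
```

Applying this to every pixel of a flow field gives an image in which regions of uniform color move coherently.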

Shot from Miss Marple: A pocket full of rye

Low quality video (200kB) High quality video (4MB) Slow motion (4MB) Input images (1MB)

Another shot from Miss Marple: A pocket full of rye

Low quality video (500kB) High quality video (13MB) Slow motion (13MB) Input images (5MB)

Low quality video (1MB) High quality video (36MB) Slow motion (36MB) Input images (173MB)

Low quality video (1.6MB) High quality video (last part only, 12MB) Slow motion (last part only, 12MB) Input images (21MB)