
Scene Flow Datasets: FlyingThings3D, Driving, Monkaa

This dataset collection has been used to train convolutional networks in our CVPR 2016 paper A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. Here, we make all generated data freely available.

Terms of use

This dataset is provided for research purposes only and without any warranty. Any commercial use is prohibited. If you use the dataset or parts of it in your research, you should cite the aforementioned paper:

@inproceedings{MIFDB16,
  author    = "N. Mayer and E. Ilg and P. H{\"a}usser and P. Fischer and D. Cremers and A. Dosovitskiy and T. Brox",
  title     = "A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation",
  booktitle = "IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)",
  year      = "2016",
  note      = "arXiv:1512.02134",
  url       = "http://lmb.informatik.uni-freiburg.de/Publications/2016/MIFDB16"
}


The collection contains more than 39000 stereo frames in 960x540 pixel resolution, rendered from various synthetic sequences. For details on the characteristics and differences of the three subsets, we refer the reader to our paper. The following kinds of data are currently available:


RGB stereo renderings: Rendered images are available in cleanpass and finalpass versions (the latter with more realistic—but also more difficult—effects such as motion blur and depth of field). Both versions can be downloaded as lossless PNG or high-quality lossy WebP images.


Segmentations: Object-level and material-level segmentation images.


Optical flow maps: Optical flow describes how pixels move between images (here, between time steps in a sequence). It is the projected screen-space component of full scene flow and is used in many computer vision applications.


Disparity maps: Disparity here describes how pixels move between the two views of a stereo frame. It is a formulation of depth which is independent of camera intrinsics (although it depends on the configuration of the stereo rig), and can be seen as a special case of optical flow.


Disparity change maps: Disparity alone is only valid for a single stereo frame. In image sequences, pixel disparities change with time. This disparity change data fills the gaps in scene flow that occur when one uses only optical flow and static disparity.


Motion boundaries: Motion boundaries divide an image into regions with significantly different motion. They can be used to better judge the performance of an algorithm at discontinuities.


Raw 3D data: The original source from which we derive disparities, disparity changes, optical flows, and (by extension) motion boundaries. Because these packs are extremely large and offer relatively little that is not covered by the other data, we make them available on request only.

Camera data: Full intrinsic and extrinsic camera data is available for each view of every stereo frame in our dataset collection.


Example pack

Want to get your feet wet? Wish to check out our data without downloading gigabytes of archives first? Then get our sample pack!
  • Contains three consecutive frames from each dataset, in full resolution!
  • Includes samples from all available data: RGB, disparity, flow, segmentation...
  • Stereo pairs for RGB
  • Forward and backward optical flow
  • Less than 100 megabytes!
  • Handcrafted with love and care, just for you <3

                         FlyingThings3D                    Driving                           Monkaa
Raw data
  RGB images (cleanpass)
    PNG                  .tar (37GB)                       .tar (6.3GB)                      .tar (9.1GB)
    WebP                 .tar (7.4GB)                      .tar (1.5GB)                      .tar (1.8GB)
  RGB images (finalpass)
    PNG                  .tar (43GB)                       .tar (6.1GB)                      .tar (17GB)
    WebP                 .tar (5.7GB)                      .tar (926MB)                      .tar (2.9GB)
  Camera data            .tar (15MB)                       .tar (1.8MB)                      .tar (3.7MB)
  Object segmentation    .tar.bz2 (409MB, unzipped 104GB)  .tar.bz2 (78MB, unzipped 18GB)    .tar.bz2 (83MB, unzipped 34GB)
  Material segmentation  .tar.bz2 (510MB, unzipped 104GB)  .tar.bz2 (170MB, unzipped 18GB)   .tar.bz2 (115MB, unzipped 34GB)
  Raw 3D view data       on request                        on request                        on request
Derived data
  Disparity              .tar.bz2 (87GB, unzipped 104GB)   .tar.bz2 (9GB, unzipped 18GB)     .tar.bz2 (28GB, unzipped 34GB)
  Disparity change       .tar.bz2 (116GB, unzipped 208GB)  .tar.bz2 (22GB, unzipped 35GB)    .tar.bz2 (35GB, unzipped 68GB)
  Optical flow           .tar.bz2 (311GB, unzipped 621GB)  .tar.bz2 (50GB, unzipped 102GB)   .tar.bz2 (89GB, unzipped 201GB)
  Motion boundaries      .tar.bz2 (615MB, unzipped 52GB)   .tar.bz2 (206MB, unzipped 8.6GB)  .tar.bz2 (106MB, unzipped 17GB)

Note that some compressed archives expand to a very much larger size (more than 100GB larger, or an expansion factor above 10); check the unzipped sizes before extracting.
For batch downloads, here is a list of URLs. MD5 checksums for all files can be found here.

Data formats and organization

  1. Use bunzip2 to decompress .tar.bz2 files, then use "tar xf <file.tar>" to unpack the resulting .tar archives. Caution: some archives expand to massively larger sizes (see the unzipped sizes in the table above).

  2. The RGB image packs are available in both cleanpass and finalpass settings. The cleanpass setting includes lighting and shading effects, but no additional effects. In contrast, finalpass images also contain motion blur and defocus blur.
    All RGB images are provided as both lossless PNG and lossy WebP (used in our experiments). WebP images are compressed using a quality setting of 95%, using the publicly available source code (version 0.5.0). WebP offers 80-90% smaller files than PNG, with virtually indistinguishable results.

  3. The virtual imaging sensor has a size of 32.0mm x 18.0mm.
    Most scenes use a virtual focal length of 35.0mm. For those scenes, the virtual camera intrinsics matrix is given by

    fx=1050.0 0.0 cx=479.5
    0.0 fy=1050.0 cy=269.5
    0.0 0.0 1.0

    where (fx,fy) are focal lengths and (cx,cy) denotes the principal point.

    Some scenes in the Driving subset use a virtual focal length of 15.0mm (the directory structure describes this clearly). For those scenes, the intrinsics matrix is given by

    fx=450.0 0.0 cx=479.5
    0.0 fy=450.0 cy=269.5
    0.0 0.0 1.0

    Please note that due to Blender's coordinate system convention (see below), the focal length values (fx,fy) really should be negative numbers. We list positive numbers here because in practice this detail only matters when working with the raw 3D data.
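Both intrinsics matrices follow directly from the sensor and image sizes: 960 px across a 32.0mm-wide sensor gives 30 px/mm, so a 35.0mm (or 15.0mm) focal length maps to 1050 (or 450) pixels. A sketch in NumPy (the function name is ours):

```python
import numpy as np

def intrinsics(focal_length_mm, width=960, height=540, sensor_width_mm=32.0):
    # 960 px / 32.0 mm = 30 px per mm on the virtual sensor
    px_per_mm = width / sensor_width_mm
    f = focal_length_mm * px_per_mm
    return np.array([[f,   0.0, (width  - 1) / 2.0],   # cx = 479.5
                     [0.0, f,   (height - 1) / 2.0],   # cy = 269.5
                     [0.0, 0.0, 1.0]])

K_35mm = intrinsics(35.0)  # fx = fy = 1050.0
K_15mm = intrinsics(15.0)  # fx = fy = 450.0 (some Driving scenes)
```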

  4. All data comes in a stereo setting, i.e. there are "left" and "right" subfolders for everything. The one exception to this rule is the camera data, where everything is stored in a single (small) text file per scene.

  5. Camera extrinsics data is stored as follows: Each camera_data.txt file contains the following entry for each frame of its scene:

    Frame <frame_id>\n
        frame_id is the frame index. All images and data files for this frame
        carry this name, as a four-digit number with leading zeroes for padding.
    L T00 T01 T02 T03 T10 ... T33\n
        Camera-to-world 4x4 matrix for the left view of the stereo pair, in
        row-major order, i.e. (T00 T01 T02 T03) encodes the uppermost row from
        left to right.
    R T00 T01 T02 T03 T10 ... T33\n
        Ditto for the right view of the stereo pair.
    \n
        (an empty line)
    Frame <frame_id>\n
        (the next frame's index)
    ...
        (and so on)

    The camera-to-world matrices T encode a transformation from camera-space to world-space, i.e. multiplying a camera-space position column vector p_cam with T yields a world-space position column vector p_world = T*p_cam.

    The coordinate system is that of Blender: positive-X points to the right, positive-Y points upwards, positive-Z points "backwards", from the scene into the camera (left-hand rule with middle finger=x, index finger=y, thumb=z).

    The right stereo view's camera is translated by 1.0 Blender units, with no rotation relative to the left view's camera.

  6. The image origin (x=0,y=0) is located in the upper left corner, i.e. a flow vector of (x=10,y=10) points towards the lower right.

  7. Non-RGB data is provided in either PFM (single-channel or three-channel) or PGM format, depending on value range and dimensionality. While PFM is a defined standard (think "PGM/PPM for non-integer entries"), it is not widely supported. For C++, we recommend the excellent CImg library. For Python+NumPy, see this code snippet.

    • Disparity is a single-channel PFM image. Note that disparities for both views are stored as positive numbers.

    • Disparity change is a single-channel PFM image.

    • Optical flow is a three-channel PFM image. Layer 0/1 contains the flow component in horizontal/vertical image direction, while layer 2 is empty (all zeroes).

    • Object and material segmentation are single-channel PFM images each. All indices are integer numbers.

    • Motion boundaries are PGM images. Background is 0, boundary pixels are 255.

  8. For data which depends on the direction of time (optical flow, disparity change, motion boundaries), we provide both forward and backward versions.

  9. Please note that the frame ranges differ between scenes and datasets:

    • FlyingThings3D: 6–15 for every scene
    • Driving: 1–300 or 1–800
    • Monkaa: Different for every scene
  10. The FlyingThings3D dataset is split into "TEST" and "TRAIN" parts. These two parts differ only in the assets used for rendering: All textures and all 3D model categories are entirely disjoint. However, both parts exhibit the same structure and characteristics. The "TRAIN" part is 5 times larger than the "TEST" part.

    Each of these parts is itself split into three subsets A, B, and C. The same rendering asset pools were used for each subset, but the object and camera motion paths are generated with different parameter settings. As a result, motion characteristics are not uniform across subsets.

    TODO: More details.

Frequently asked questions (FAQ)

  • Q: I want to use depth data, but you only provide disparity!
    A: Depth is perfectly equivalent to disparity as long as you know the focal length of the camera and the baseline of the stereo rig (both are given above). You can convert disparity to depth using this formula: depth = focallength*baseline/disparity
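As a sketch (the function name is ours), using the values given above: a focal length of 1050 px (or 450 px for the 15.0mm Driving scenes) and a baseline of 1.0 Blender units:

```python
def disparity_to_depth(disparity, focal_length_px=1050.0, baseline=1.0):
    # depth = f * B / d; depth comes out in Blender units
    return focal_length_px * baseline / disparity

# a pixel with disparity 105.0 px lies at depth 10.0 Blender units
```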

  • Q: I would like to render my own data. Can I get the Blender scene files?
    A: Sorry, but we do not have the necessary licenses for all the assets that we use. We cannot redistribute them.

Version history and changelog

  • 13Oct2016: Added sample pack, FAQ section

  • 05Oct2016: Updated md5 checksums for camera extrinsics files

  • 17Aug2016: Uploaded fixed camera extrinsics

  • 02May2016: Added md5 checksums

  • 28Apr2016: Fixed download links for Driving, Monkaa subsets

  • 25Apr2016: Fixed intrinsics matrices

  • Version 0.9 (April 2016): Initial release of most data. Version used for training the CVPR 2016 paper.


For questions, please contact Nikolaus Mayer ().