
Scene Flow Datasets: FlyingThings3D, Driving, Monkaa

This dataset collection has been used to train convolutional networks in our CVPR 2016 paper A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. Here, we make all generated data freely available.



Terms of use

This dataset is provided for research purposes only and without any warranty. Any commercial use is prohibited. If you use the dataset or parts of it in your research, you should cite the aforementioned paper:


@InProceedings{MIFDB16,
  author    = "N. Mayer and E. Ilg and P. H{\"a}usser and P. Fischer and D. Cremers and A. Dosovitskiy and T. Brox",
  title     = "A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation",
  booktitle = "IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)",
  year      = "2016",
  note      = "arXiv:1512.02134",
  url       = "http://lmb.informatik.uni-freiburg.de/Publications/2016/MIFDB16"
}

Overview

The collection contains more than 39000 stereo frames in 960x540 pixel resolution, rendered from various synthetic sequences. For details on the characteristics and differences of the three subsets, we refer the reader to our paper. The following kinds of data are currently available:


  • RGB stereo renderings: Rendered images are available in cleanpass and finalpass versions (the latter with more realistic—but also more difficult—effects such as motion blur and depth of field). Both versions can be downloaded as lossless PNG or high-quality lossy WebP images.

  • Segmentations: Object-level and material-level segmentation images.

  • Optical flow maps: The optical flow describes how pixels move between images (here, between time steps in a sequence). It is the projected screen-space component of full scene flow, and is used in many computer vision applications.

  • Disparity maps: Disparity here describes how pixels move between the two views of a stereo frame. It is a formulation of depth which is independent of camera intrinsics (although it depends on the configuration of the stereo rig), and can be seen as a special case of optical flow.

  • Disparity change maps: Disparity alone is only valid for a single stereo frame. In image sequences, pixel disparities change with time. This disparity change data fills the gaps in scene flow that occur when one uses only optical flow and static disparity.

  • Motion boundaries: Motion boundaries divide an image into regions with significantly different motion. They can be used to better judge the performance of an algorithm at discontinuities.

  • Raw 3D data (on request): The original source from which we derive disparities, disparity changes, optical flows, and (by extension) motion boundaries. Due to their extremely unwieldy size and the fact that they offer relatively little which is not covered by the other data, we make these packs available on request.

  • Camera data: Full intrinsic and extrinsic camera data is available for each view of every stereo frame in our dataset collection.



Downloads

Example pack

Want to get your feet wet? Wish to check out our data without downloading gigabytes of archives first? Then get our sample pack!
  • Contains three consecutive frames from each dataset, in full resolution!
  • Includes samples from all available data: RGB, disparity, flow, segmentation...
  • Stereo pairs for RGB
  • Forward and backward optical flow
  • Less than 100 megabytes!
  • Handcrafted with love and care, just for you <3

Full datasets

                           FlyingThings3D                     Driving                           Monkaa
Raw data
  RGB images (cleanpass)   PNG: .tar (37GB)                   PNG: .tar (6.3GB)                 PNG: .tar (9.1GB)
                           WebP: .tar (7.4GB)                 WebP: .tar (1.5GB)                WebP: .tar (1.8GB)
  RGB images (finalpass)   PNG: .tar (43GB)                   PNG: .tar (6.1GB)                 PNG: .tar (17GB)
                           WebP: .tar (5.7GB)                 WebP: .tar (926MB)                WebP: .tar (2.9GB)
  Camera data              .tar (15MB)                        .tar (1.8MB)                      .tar (3.7MB)
  Object segmentation      .tar.bz2 (409MB, unzipped 104GB)   .tar.bz2 (78MB, unzipped 18GB)    .tar.bz2 (83MB, unzipped 34GB)
  Material segmentation    .tar.bz2 (510MB, unzipped 104GB)   .tar.bz2 (170MB, unzipped 18GB)   .tar.bz2 (115MB, unzipped 34GB)
  Raw 3D view data         on request                         on request                        on request
Derived data
  Disparity                .tar.bz2 (87GB, unzipped 104GB)    .tar.bz2 (9GB, unzipped 18GB)     .tar.bz2 (28GB, unzipped 34GB)
  Disparity change         .tar.bz2 (116GB, unzipped 208GB)   .tar.bz2 (22GB, unzipped 35GB)    .tar.bz2 (35GB, unzipped 68GB)
  Optical flow             .tar.bz2 (311GB, unzipped 621GB)   .tar.bz2 (50GB, unzipped 102GB)   .tar.bz2 (89GB, unzipped 201GB)
  Motion boundaries        .tar.bz2 (615MB, unzipped 52GB)    .tar.bz2 (206MB, unzipped 8.6GB)  .tar.bz2 (106MB, unzipped 17GB)

Note that some compressed archives expand to a far larger size (more than 100GB larger, or an expansion factor above 10).
For batch downloads, here is a list of URLs. MD5 checksums for all files can be found here.


DispNet/FlowNet2.0 dataset subsets

For network training and testing in our DispNet, FlowNet2.0, and related papers, we omitted some extremely hard samples from the FlyingThings3D dataset. Here you can download these subsets for the modalities we used:

FlyingThings3D subset

Subset data
  RGB images (cleanpass)        .tar.bz2 (35GB, unzipped 35GB)
  Object segmentation           .tar.bz2 (570MB, unzipped 674MB)
  Disparity                     .tar.bz2 (5GB, unzipped 102GB)
  Disparity change              .tar.bz2 (2.4GB, unzipped 182GB)
  Optical flow                  .tar.bz2 (75GB, unzipped 364GB)
  Motion boundaries             .tar.bz2 (979MB, unzipped 1167MB)
Derived data
  Disparity Occlusions          .tar.bz2 (420MB, unzipped 525MB)
  Disparity Occlusion Weights   .tar.bz2 (9GB, unzipped 102GB)
  Flow Occlusions               .tar.bz2 (691MB, unzipped 889MB)
  Flow Occlusion Weights        .tar.bz2 (15GB, unzipped 182GB)
  Motion Boundary Weights       .tar.bz2 (12GB, unzipped 137GB)
  Depth Boundaries              .tar.bz2 (654MB, unzipped 755MB)
  Depth Boundary Weights        .tar.bz2 (11GB, unzipped 102GB)

Note that some compressed archives expand to a far larger size (more than 100GB larger, or an expansion factor above 10).
For batch downloads, here is a list of URLs. MD5 checksums for all files can be found here.




Data formats and organization

  1. Download handy Python IO routines. (Read/write .float3/.flo/.ppm/.pgm/.png/.jpg/.pfm)


  2. Use bunzip2 to decompress .tar.bz2 files, then use "tar xf <file.tar>" to unpack the resulting .tar archives. Caution: some archives expand to massively larger sizes.
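
The same can be done from Python's standard library, which decompresses and unpacks in one step. A self-contained sketch (the archive here is a throwaway created for the demonstration; all file and directory names are placeholders, not actual dataset files):

```python
import os
import tarfile
import tempfile

# Build a tiny throwaway .tar.bz2 so the example runs on its own; for the
# real dataset, point tarfile.open at e.g. a downloaded disparity archive.
workdir = tempfile.mkdtemp()
member = os.path.join(workdir, "0006.pfm")
with open(member, "wb") as f:
    f.write(b"dummy payload")

archive_path = os.path.join(workdir, "disparity.tar.bz2")
with tarfile.open(archive_path, "w:bz2") as archive:  # pack (demo only)
    archive.add(member, arcname="left/0006.pfm")

# "r:bz2" decompresses and unpacks in one step; plain .tar files use "r:".
with tarfile.open(archive_path, "r:bz2") as archive:
    archive.extractall(path=os.path.join(workdir, "out"))

print(os.path.exists(os.path.join(workdir, "out", "left", "0006.pfm")))  # prints True
```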


  3. The RGB image packs are available in both cleanpass and finalpass settings. The cleanpass setting includes lighting and shading effects, but no additional effects. In contrast, finalpass images also contain motion blur and defocus blur.
    All RGB images are provided as both lossless PNG and lossy WebP (used in our experiments). WebP images are compressed using a quality setting of 95%, using the publicly available source code (version 0.5.0). WebP offers 80-90% smaller files than PNG, with virtually indistinguishable results.


  4. The virtual imaging sensor has a size of 32.0mmx18.0mm.
    Most scenes use a virtual focal length of 35.0mm. For those scenes, the virtual camera intrinsics matrix is given by

    fx=1050.0 0.0 cx=479.5
    0.0 fy=1050.0 cy=269.5
    0.0 0.0 1.0

    where (fx,fy) are focal lengths and (cx,cy) denotes the principal point.

    Some scenes in the Driving subset use a virtual focal length of 15.0mm (the directory structure describes this clearly). For those scenes, the intrinsics matrix is given by

    fx=450.0 0.0 cx=479.5
    0.0 fy=450.0 cy=269.5
    0.0 0.0 1.0

    Please note that, due to Blender's coordinate system convention (see below), the focal length values (fx,fy) should really be negative numbers. We list positive values here because in practice this catch only matters when working with the raw 3D data.
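
    The intrinsics values above follow directly from the sensor and image dimensions: fx = fy = focal_length_mm / sensor_width_mm * image_width_px, with the principal point at the image center. A sketch in NumPy (the function name is our own):

```python
import numpy as np

def intrinsics(focal_length_mm, sensor_width_mm=32.0, image_size=(960, 540)):
    """Build the 3x3 intrinsics matrix for the virtual camera."""
    width, height = image_size
    # Convert the focal length from millimeters to pixels.
    f_px = focal_length_mm / sensor_width_mm * width
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0  # principal point
    return np.array([[f_px, 0.0,  cx],
                     [0.0,  f_px, cy],
                     [0.0,  0.0,  1.0]])

K_35mm = intrinsics(35.0)  # fx = fy = 1050.0, cx = 479.5, cy = 269.5
K_15mm = intrinsics(15.0)  # fx = fy = 450.0 (some Driving scenes)
```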


  5. All data comes in a stereo setting, i.e. there are "left" and "right" subfolders for everything. The one exception to this rule is the camera data, where everything is stored in a single (small) text file per scene.


  6. Camera extrinsics data is stored as follows: Each camera_data.txt file contains the following entry for each frame of its scene:

    ...
    Frame <frame_id>\n frame_id is the frame index. All images and data files for this frame carry this name, as a four-digit number with leading zeroes for padding.
    L T00 T01 T02 T03 T10 ... T33\n Camera-to-world 4x4 matrix for the left view of the stereo pair in row-major order, i.e. (T00 T01 T02 T03) encodes the uppermost row from left to right.
    R T00 T01 T02 T03 T10 ... T33\n Ditto for the right view of the stereo pair.
    \n (an empty line)
    Frame <frame_id>\n (the next frame's index)
    ... (and so on)


    The camera-to-world matrices T encode a transformation from camera-space to world-space, i.e. multiplying a camera-space position column vector p_cam with T yields a world-space position column vector p_world = T*p_cam.


    The coordinate system is that of Blender: positive-X points to the right, positive-Y points upwards, positive-Z points "backwards", from the scene into the camera (left-hand rule with middle finger=x, index finger=y, thumb=z).


    The right stereo view's camera is translated by 1.0 Blender units, with no rotation relative to the left view's camera.
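
    A minimal parser for this camera_data.txt layout could look as follows (the function name and representation are our own; error handling is omitted):

```python
import numpy as np

def read_camera_data(path):
    """Parse a camera_data.txt file into {frame_id: {"L": T, "R": T}},
    where each T is a 4x4 camera-to-world matrix."""
    frames, current = {}, None
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if not tokens:
                continue                      # skip the empty separator lines
            if tokens[0] == "Frame":
                current = int(tokens[1])      # frame index, e.g. 6 -> "0006"
                frames[current] = {}
            elif tokens[0] in ("L", "R"):
                # 16 values in row-major order -> 4x4 matrix
                frames[current][tokens[0]] = \
                    np.array([float(v) for v in tokens[1:]]).reshape(4, 4)
    return frames

# Moving a camera-space point into world space via p_world = T * p_cam:
# p_world = frames[6]["L"] @ np.array([x, y, z, 1.0])
```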


  7. The image origin (x=0,y=0) is located in the upper left corner, i.e. a flow vector of (x=10,y=10) points towards the lower right.


  8. Non-RGB data is provided in either PFM (single-channel or three-channel) or PGM format, depending on value range and dimensionality. While PFM is a defined standard (think "PGM/PPM for non-integer entries"), it is not widely supported. For C++, we recommend the excellent CImg library. For Python+NumPy, see this code snippet.

    • Disparity is a single-channel PFM image. Note that disparities for both views are stored as positive numbers.

    • Disparity change is a single-channel PFM image.

    • Optical flow is a three-channel PFM image. Layer 0/1 contains the flow component in horizontal/vertical image direction, while layer 2 is empty (all zeroes).

    • Object and material segmentation are single-channel PFM images each. All indices are integer numbers.

    • Motion boundaries are PGM images. Background is 0, boundary pixels are 255.
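
    The PFM format itself is simple: a "PF" (three-channel) or "Pf" (single-channel) header line, a dimensions line, a scale line whose sign encodes endianness (negative means little-endian), then raw float32 data with rows stored bottom-to-top. A minimal NumPy reader, as a sketch without error handling (the function name is ours):

```python
import numpy as np

def read_pfm(path):
    """Minimal PFM reader: returns an HxW (single-channel) or HxWx3
    (three-channel) float32 array."""
    with open(path, "rb") as f:
        header = f.readline().decode().rstrip()  # "PF" = color, "Pf" = gray
        channels = 3 if header == "PF" else 1
        width, height = map(int, f.readline().split())
        scale = float(f.readline())              # negative => little-endian
        endian = "<" if scale < 0 else ">"
        data = np.fromfile(f, dtype=endian + "f4",
                           count=width * height * channels)
    shape = (height, width, 3) if channels == 3 else (height, width)
    return np.flipud(data.reshape(shape))  # PFM rows run bottom-to-top
```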

  9. For data which depends on the direction of time (optical flow, disparity change, motion boundaries), we provide both forward and backward versions.


  10. Please note that the frame ranges differ between scenes and datasets:

    • FlyingThings3D: 6–15 for every scene
    • Driving: 1–300 or 1–800
    • Monkaa: Different for every scene
  11. The FlyingThings3D dataset is split into "TEST" and "TRAIN" parts. These two parts differ only in the assets used for rendering: All textures and all 3D model categories are entirely disjoint. However, both parts exhibit the same structure and characteristics. The "TRAIN" part is 5 times larger than the "TEST" part.

    Each of these parts is itself split into three subsets A, B, and C. The same rendering asset pools were used for each subset, but the object and camera motion paths are generated with different parameter settings. As a result, motion characteristics are not uniform across subsets.


  12. We did not use the entire FlyingThings3D dataset for DispNet, FlowNet2.0 etc.: samples with extremely difficult data were omitted. See here for a list of images which we did not use.


Frequently asked questions (FAQ)

  • Q: I want to use depth data, but you only provide disparity!
    A: Depth is perfectly equivalent to disparity as long as you know the focal length of the camera and the baseline of the stereo rig (both are given above). You can convert disparity to depth using this formula: depth = focallength*baseline/disparity
    Note that the focal length unit in this equation is pixels, not (milli)meters.
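
    As a sketch of that formula (the function name is ours; the baseline is the 1.0 Blender units between the two stereo views, and depths come out in Blender units):

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px=1050.0, baseline=1.0):
    """Convert a disparity map to depth: depth = f * b / disparity.
    Use focal_length_px=450.0 for the 15mm-focal-length Driving scenes."""
    return focal_length_px * baseline / disparity

disparity_to_depth(np.array([10.0, 105.0, 1050.0]))  # -> [105., 10., 1.]
```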

  • Q: I would like to render my own data. Can I get the Blender scene files?
    A: We do not have the necessary licenses for all the assets that we use. We cannot redistribute them.

  • Q: How do I get depth in meters? What is the metric scale of the dataset?
    A: No metric scale or measure exists for these datasets. The focal length is listed as "35mm", but this is only within Blender. It cannot be extrapolated to depth measures in the dataset.

  • Q: What is the disparity/depth range of the dataset?
    A: There is no feasible fixed range. If you normalize the values, expect extreme outliers.

  • Q: Are semantic segmentation labels / class names available?
    A: No, these datasets have no notion of semantics. The labels are random across objects and scenes without any consistency guarantees. The labels are essentially meaningless.

  • Q: Where are the "0016.png" images?
    A: The "0015.pfm" flow files in "into_future" direction (and "0006.pfm" in "into_past" direction) describe optical flows, but their target images ("0016.png"/"0005.png") are not included in the dataset. If you want data for supervised flow training, please ignore these files.


Changelog

  • 20Aug2018: Added downloads for FlyingThings3D subset used in our papers

  • 20Jul2018: Added list of unused samples

  • 13Oct2016: Added sample pack, FAQ section

  • 05Oct2016: Updated md5 checksums for camera extrinsics files

  • 17Aug2016: Uploaded fixed camera extrinsics

  • 02May2016: Added md5 checksums

  • 28Apr2016: Fixed download links for Driving, Monkaa subsets

  • 25Apr2016: Fixed intrinsics matrices

  • XXApr2016: Initial release of most data.


Contact

For questions, please contact Nikolaus Mayer ().