Our training and test samples consist of manually labeled
pedestrian and non-pedestrian bounding boxes in images captured
from a vehicle-mounted calibrated stereo camera rig
in an urban environment. For each manually labeled pedestrian, we
created additional samples by geometric
jittering. Non-pedestrian samples were generated by a shape
detection pre-processing step with a relaxed threshold setting,
i.e. they are biased towards
more difficult patterns.
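
The dataset description does not state the jittering parameters. As a rough illustration only, geometric jittering of a labeled bounding box could look like the sketch below. The function name `jitter_box`, the shift and scale ranges, and the number of copies are all hypothetical and are not taken from the dataset or the paper.

```python
import random

def jitter_box(box, n=4, max_shift=0.05, max_scale=0.05, seed=0):
    """Return n jittered copies of box = (x, y, w, h).

    max_shift and max_scale are hypothetical fractions of the box size;
    the actual values used for the dataset are not documented here.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    x, y, w, h = box
    jittered = []
    for _ in range(n):
        # One common scale factor preserves the box aspect ratio.
        s = 1.0 + rng.uniform(-max_scale, max_scale)
        dx = rng.uniform(-max_shift, max_shift) * w
        dy = rng.uniform(-max_shift, max_shift) * h
        new_w, new_h = w * s, h * s
        # Shift the box centre, then rebuild the top-left corner.
        cx, cy = x + w / 2 + dx, y + h / 2 + dy
        jittered.append((cx - new_w / 2, cy - new_h / 2, new_w, new_h))
    return jittered

boxes = jitter_box((100, 50, 48, 96))
```

Each jittered box would then be cropped from the image and rescaled to the sample resolution.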
Dense stereo is computed using the semi-global matching
algorithm (H. Hirschmueller, Stereo processing by semi-global
matching and mutual information, IEEE PAMI, 30(2):328-341, 2008). To
compute dense optical flow, we use structure- and
motion-adaptive regularized flow (A. Wedel et al., Structure- and
motion-adaptive regularization for high accuracy optic
flow, ICCV, 2009).
Training and test samples have a resolution of 48 x 96 pixels with a 12-pixel border around the pedestrians. Note that the experiments in our paper (see above) were done on 36 x 84 pixel images with a border of 6 pixels, i.e. crops of the provided dataset, with a three-component layout corresponding to head, torso, and legs. For publication of the dataset, we chose to provide images with a larger border and without a pre-defined component layout, to allow for greater flexibility in the selection of components.
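
To go from the provided 48 x 96 samples (12-pixel border) to the 36 x 84 layout (6-pixel border), 6 pixels are removed from every side: 48 - 2*6 = 36 and 96 - 2*6 = 84. A minimal sketch of that crop follows, assuming a sample is held as a row-major nested list (96 rows of 48 values). The helper name `crop_border` is hypothetical; the same slicing applies to array types that support 2-D slices.

```python
def crop_border(sample, margin=6):
    """Remove `margin` pixels from every side of a row-major 2-D sample."""
    height = len(sample)
    width = len(sample[0])
    if height <= 2 * margin or width <= 2 * margin:
        raise ValueError("margin is too large for this sample size")
    # Drop the top/bottom rows first, then the left/right columns of each row.
    return [row[margin:width - margin] for row in sample[margin:height - margin]]

# Dummy 48 x 96 sample (width x height), stored as 96 rows of 48 pixels.
sample = [[0] * 48 for _ in range(96)]
cropped = crop_border(sample)
print(len(cropped[0]), len(cropped))  # 36 84
```

The same crop has to be applied to the intensity, stereo, and flow images of a sample so that the cues stay aligned.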