How is this dataset structured?
The dataset is organized into sequences. Inside each sequence you'll find the frames that compose it. Each frame consists of 4 color images, 4 sets of 2D joints (one projected onto each image plane), 4 bounding boxes, 1 set of 3D points as provided by the Leap Motion Controller, and 4 sets of 3D points reprojected into each camera coordinate frame.
- data_1
- 0_webcam_1.jpg
- 0_webcam_2.jpg
- 0_webcam_3.jpg
- 0_webcam_4.jpg
- 0_joints.txt
- 1_webcam_1.jpg
- 1_webcam_2.jpg
- 1_webcam_3.jpg
- 1_webcam_4.jpg
- 1_joints.txt
- 2_webcam_1.jpg
- 2_webcam_2.jpg
- 2_webcam_3.jpg
- 2_webcam_4.jpg
- 2_joints.txt
- data_2
- ...
The files are named X_type_Y. X denotes the frame number; since the data was recorded continuously, frames are numbered in ascending order. Type denotes the type of data:
- webcam: the sample itself, a color image of a hand
- joints: the 3D joint positions in the real-world coordinate frame
Finally, Y denotes the source camera. Since we captured data from 4 different cameras simultaneously, we provide the image, the 2D points, and the bounding box of the hand for each camera for each frame. (The joints files carry no camera suffix, as the 3D positions are camera-independent.)
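For example, a minimal Python sketch for loading one frame of a sequence. The parsing of the joints file assumes one joint per line as whitespace-separated values; inspect your copy of X_joints.txt, as the exact layout may differ:

```python
import os
import cv2  # pip install opencv-python

def load_frame(sequence_dir, frame_idx):
    """Load the 4 color images and the raw 3D joints of a single frame."""
    images = [
        cv2.imread(os.path.join(sequence_dir, f"{frame_idx}_webcam_{cam}.jpg"))
        for cam in (1, 2, 3, 4)
    ]
    # Assumed layout: one joint per line, whitespace-separated values.
    joints = []
    with open(os.path.join(sequence_dir, f"{frame_idx}_joints.txt")) as f:
        for line in f:
            parts = line.split()
            if parts:
                joints.append(parts)
    return images, joints

images, joints = load_frame("data_1", 0)
```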
How can I obtain the bounding boxes, the 2D projections and the 3D projections?
Along with the dataset itself, we also provide a set of Python scripts (in the utils folder) that allow you to compute the aforementioned data locally, using the calibration files we now provide (in the calibrations folder). The utils folder contains the following scripts:
- generate2Dpoints.py: creates the 2D joint projections for each image
- generate3Dpoints.py: creates the 3D joint projections for each camera
- generateBBoxes.py: creates the bounding boxes of the hands for each image
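As a rough illustration of the standard pipeline these scripts presumably follow: transform the Leap Motion world coordinates into a camera frame with that camera's extrinsics (the output of generate3Dpoints.py), project with the intrinsics (generate2Dpoints.py), and take the min/max of the projections for a bounding box (generateBBoxes.py). This is a hedged sketch, not the scripts' actual code; the R, t, K values below are placeholders, and the real ones come from the calibrations folder in whatever format the provided scripts define:

```python
import numpy as np

def world_to_camera(points_world, R, t):
    """Transform Nx3 world points into the camera coordinate frame.
    R (3x3) and t (3,) are the camera extrinsics."""
    return points_world @ R.T + t

def project_to_image(points_cam, K):
    """Pinhole projection of Nx3 camera-frame points with intrinsics K (3x3)."""
    p = points_cam @ K.T
    return p[:, :2] / p[:, 2:3]

def bounding_box(points_2d, margin=0):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) around 2D joints."""
    x_min, y_min = points_2d.min(axis=0) - margin
    x_max, y_max = points_2d.max(axis=0) + margin
    return x_min, y_min, x_max, y_max

# Hypothetical usage with placeholder calibration and joints:
R = np.eye(3)
t = np.zeros(3)
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
joints_world = np.random.rand(21, 3) * 100  # placeholder 3D joints
joints_cam = world_to_camera(joints_world, R, t)
joints_2d = project_to_image(joints_cam, K)
print(bounding_box(joints_2d, margin=10))
```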