Fallen Person Dataset
This dataset consists of depth images for training and testing fallen person detection systems. It does not contain the fall event itself, but rather the lying pose after a fall. The training set comprises more than 90,000 depth images of ten persons; the test set comprises 7,727 depth images from 45 robot drive sequences.
For training, we recorded ten persons in different lying poses. Each pose was repeated eight times to obtain different views in front of the sensor. Since we used NDT maps in our detector, we did not record single snapshots of each view; instead, we recorded continuously. This way, we also captured the persons' movements, including the transitions between poses. Since we want to detect persons after a fall event with a mobile robot, we additionally simulated movement noise by slightly moving the recording sensor.
We recorded the data as raw depth images; each pixel value is the distance of an object to the sensor plane in millimeters. To separate the persons from the rest of the scene, we estimated the ground plane and removed the corresponding ground plane pixels (each scene contains only a person and a ground plane, no other objects). We do not publish the planes, but we used them to compute foreground pixel masks that can be used to extract the pixels belonging to the persons. These masks may not be perfect, so estimating the ground plane yourself with different parameters could yield better masks. Furthermore, we recorded background depth images without a person, so a simple background subtraction between a depth image and the background image is an option too (e.g. in combination with some morphological operations). However, due to the sensor movement, the plane estimation approach or our masks are clearly the better choice.
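As a minimal sketch of the background subtraction option mentioned above, assuming the depth images have already been loaded as NumPy arrays (e.g. with imageio) — the 50 mm tolerance is an arbitrary assumption, not a value from the dataset:

```python
import numpy as np

def person_mask(depth, background, tol_mm=50, min_depth=1):
    """Foreground mask via background subtraction on raw depth (millimeters).

    A pixel is foreground if it is valid (non-zero) in both images and
    closer to the sensor than the background by more than tol_mm.
    tol_mm is an assumed tolerance, tune it for your use case.
    """
    depth = depth.astype(np.int32)
    background = background.astype(np.int32)
    valid = (depth >= min_depth) & (background >= min_depth)
    return valid & (background - depth > tol_mm)

# Tiny synthetic example: a 4x4 background at 3000 mm with a "person"
# blob at 1500 mm in the current frame.
bg = np.full((4, 4), 3000, dtype=np.uint16)
frame = bg.copy()
frame[1:3, 1:3] = 1500
mask = person_mask(frame, bg)
```

A morphological opening on the resulting mask (e.g. scipy.ndimage.binary_opening) can remove isolated noise pixels, as suggested above.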
As the recordings were made while the persons were moving, samples in which a person is not actually lying on the ground may be included as well (e.g. sitting instead of lying). If you need the set to be free of such examples, you will have to remove them manually.
The depth images of the recorded persons are separated into directories. Since continuous recording is exhausting for the subjects, we split the recordings into takes; the number of takes differs per person. In each take directory we stored a depth background image, i.e. a learned background model without the person. The "images" directory contains the actual depth images of the person.
The background image is named "Background_Image_NR.pgm", where NR is the number of the background image within that take (since we decided to use the ground plane estimation for person separation, this should always be 000001).
The images are named "Image_FrameNumber_BG_BackgroundNumber.pgm", where FrameNumber is the frame number within the take and BackgroundNumber is the number of the corresponding background image (should always be 000001).
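The naming scheme above can be parsed with a small helper. The zero-padding width in the example filename below is an assumption extrapolated from the 000001 shown above; the pattern accepts any number of digits:

```python
import re

# Pattern for the documented naming scheme
# "Image_FrameNumber_BG_BackgroundNumber.pgm".
IMAGE_RE = re.compile(r"Image_(\d+)_BG_(\d+)\.pgm")

def parse_image_name(name):
    """Return (frame_number, background_number) from a dataset image name."""
    m = IMAGE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a dataset image name: {name}")
    frame, bg = (int(g) for g in m.groups())
    return frame, bg

# Hypothetical example name (padding width assumed):
frame, bg = parse_image_name("Image_000042_BG_000001.pgm")
```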
The intrinsic parameters of the sensor are stored as "intrinsics.txt" in the base directory.
The test data consists of 45 sequences captured with a mobile robot driving through our living lab. In each sequence, at most one fallen person was present. To allow testing a detector for false positives, the robot's entire drive through the apartment was captured, i.e. the fallen person is not present in every frame. Furthermore, we included sequences without persons, in which we placed objects on the ground or recorded a dog, to provide difficult negative examples. Each sequence is a set of depth images and label files.
Each driving sequence is in a separate directory and contains subdirectories for images and labels. Images and labels are in ascending order, and each depth image pgm-file has a corresponding label xml-file. A label file contains the size of the recording sensor's depth image, the intrinsic parameters (once normalized and once matching the image size), and a rigid 3D transformation between the 3D ground truth bounding box and the sensor. Furthermore, for each frame in which at least one point of the depth image lies within the 3D bounding box, the bounding box is included both in 3D and in 2D (a rectangle, i.e. the 3D bounding box projected onto the image plane). If no bounding box information is available for a frame, that frame does not contain a person lying on the ground.
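Whether a 3D point falls inside the labeled box can be tested by moving the point into the box's coordinate frame. This is only a sketch: the element names in the xml files are not specified here, so R, t, and extents stand for the rotation, translation, and box side lengths however your label files encode them, and the direction of the rigid transform (box-to-sensor vs. sensor-to-box) must be verified against the data:

```python
import numpy as np

def point_in_box(p_sensor, R, t, extents):
    """Test whether a 3D point (sensor frame) lies inside the labeled 3D box.

    R (3x3) and t (3,) are taken here as the box-to-sensor rigid transform,
    and extents as the box's full side lengths. These names and the
    transform direction are assumptions; invert the transform if your
    label files store it the other way around.
    """
    p_box = R.T @ (np.asarray(p_sensor, float) - np.asarray(t, float))
    return bool(np.all(np.abs(p_box) <= np.asarray(extents, float) / 2.0))
```

Applying this test to every valid depth pixel (back-projected to 3D) reproduces the criterion above for whether a frame carries a bounding box annotation.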
All data was captured with an Asus Xtion depth camera. The depth frames have a resolution of 640 x 480, and each pixel value is the distance of an object to the sensor plane in millimeters.
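Given the intrinsic parameters, each valid depth pixel can be back-projected to a 3D point with the standard pinhole model. The sketch below assumes the intrinsics are available as focal lengths fx, fy and principal point cx, cy in pixels; the layout of intrinsics.txt is not specified here, so load those values however your copy is formatted:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a raw depth image (millimeters) to 3D points in meters.

    Returns an (N, 3) array of points for all valid (non-zero) pixels,
    using the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth.astype(np.float64) / 1000.0            # mm -> m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1)
    return pts[depth > 0]                            # drop invalid pixels
```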
For the test sequences, the sensor was mounted on our mobile robot, where it was inclined downwards. For training, we used a separate sensor on a tripod, posed to approximate the mounting on the robot.
For all sensors the intrinsic parameters are available. For the robot sequences, the extrinsic parameters are available too.
If you consider using the data sets on this page, please reference the following:
Lewandowski, B., Wengefeld, T., Schmiedel, Th., Gross, H.-M.
I See You Lying on the Ground - Can I Help You? Fast Fallen Person Detection in 3D with a Mobile Robot.
In: IEEE Int. Symp. on Robot and Human Interactive Communication (RO-MAN), Lisbon, Portugal, pp. 74-80, IEEE 2017
Send the completed form by email to email@example.com to acquire a login. Please note that the dataset is available for academic use only (academic institutions and non-profit organizations), and the form must be signed by a permanent staff member of such an institution. You may therefore have to ask your supervisor to sign.