Camera Measurement of Physiological Vital Signs


Datasets

Public datasets serve two important purposes for the research community. First, they give researchers who may not have the means to collect their own data access to it, lowering the barrier to entry. Second, they provide a transparent test set for fairly comparing computational methods and setting benchmarks. In addition to videos and gold-standard contact measurements, descriptions of benchmark datasets should include details of the imaging device, lighting and participant demographics.
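One way to make these reporting recommendations concrete is to capture each dataset description as a structured record and check it for gaps. The sketch below is illustrative only: the class, field and method names are hypothetical and do not come from any existing tool. It fills in the COHFACE details listed in the table; that entry does not describe the lighting, so the check flags it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetDescriptor:
    """Hypothetical record of the details a benchmark dataset description should include."""
    name: str
    camera: Optional[str] = None           # imaging device (make/model)
    resolution: Optional[str] = None       # e.g. "640x480"
    frame_rate_hz: Optional[float] = None
    lighting: Optional[str] = None         # illumination conditions
    n_subjects: Optional[int] = None
    demographics: Optional[str] = None     # e.g. sex, age range, ethnicity
    gold_standard: List[str] = field(default_factory=list)  # contact measurements

    def missing_fields(self) -> List[str]:
        """Return the names of recommended fields that have not been filled in."""
        missing = [
            name
            for name in ("camera", "resolution", "frame_rate_hz", "lighting",
                         "n_subjects", "demographics")
            if getattr(self, name) is None
        ]
        if not self.gold_standard:
            missing.append("gold_standard")
        return missing


# COHFACE, using only the details given in its table entry.
cohface = DatasetDescriptor(
    name="COHFACE",
    camera="Logitech HD C525",
    resolution="640x480",
    frame_rate_hz=20,
    n_subjects=40,
    demographics="12 female, 28 male",
    gold_standard=["PPG", "respiration"],
)
print(cohface.missing_fields())  # ['lighting']
```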
There are a number of extremely valuable datasets that have been released publicly. Here is a table with links and information about how to obtain them:

Name Details
Mahnob-HCI This dataset was originally collected for the purpose of building systems for implicit tagging of multimedia content. Videos of 27 participants (15 women, 12 men) were recorded while they wore an ECG sensor. This was one of the earliest public datasets to include videos and time-synchronized physiological ground truth. One limitation of this dataset is heavy video compression, which attenuates the physiological information in the videos. Videos were recorded at a resolution of 780x580 and 61 Hz. Most analyses~\cite{li2014remote,chen2018deepphys} use a 30-second clip (frames 306 through 2135) from each of 527 video sequences.
BP4D+ The BP4D+ dataset is a multimodal dataset containing time-synchronized 3D, 2D, thermal and physiological recordings. This large dataset contains videos of 140 subjects performing ten emotion-eliciting tasks while seated. The videos are stored in a relatively uncompressed format, and the dataset covers a relatively broad range of ages (18 to 66) and of ethnic and racial backgrounds. Furthermore, unlike many other datasets, the majority of its participants are female. Of note, this dataset does not include PPG or ECG gold-standard measures; instead, it contains pulse pressure waves measured with a finger cuff. The pulse pressure wave is similar to, but differs in morphology from, the PPG signal. RGB videos were recorded at a resolution of 1040x1392 (portrait orientation) and 24 Hz.
VIPL-HR The largest multimodal dataset with videos and time-synchronized physiological recordings. It contains 2,378 RGB (visible light) videos and 752 near-infrared (NIR) videos of 107 subjects. Gold-standard PPG, heart rate and SpO$_2$ were recorded. Videos were recorded with three RGB cameras and one NIR camera: i) an RGB Logitech C310 at a resolution of 960×720 and 25 Hz, ii) a RealSense F200 with an NIR camera at a resolution of 640×480 and an RGB camera at 1920×1080, both at 30 Hz, and iii) an RGB HUAWEI P9 at a resolution of 1920×1080 and 30 Hz.
COHFACE Contains RGB video recordings synchronized with cardiac (PPG) and respiratory signals. The dataset includes 160 one-minute video sequences of 40 subjects (12 female, 28 male). The video sequences were recorded with a Logitech HD C525 at a resolution of 640x480 pixels and a frame rate of 20 Hz. Gold-standard measurements were acquired using the Thought Technology BioGraph Infiniti system.
UBFC-rPPG A similar RGB video dataset, collected with a Logitech C920 HD Pro at 30 Hz and a resolution of 640x480 in uncompressed 8-bit RGB format. A CMS50E transmissive pulse oximeter was used to obtain the gold-standard PPG data. During recording, subjects were seated one meter from the camera. All experiments were conducted indoors under a mixture of sunlight and indoor illumination.
UBFC-PHYS Another public multimodal dataset with RGB videos, in which 56 subjects (46 women and 10 men) took part in an experiment inspired by the Trier Social Stress Test (TSST). Each subject completed three tasks (rest, speech and arithmetic), resulting in 168 videos. Gold-standard BVP and EDA measurements were collected with a wristband (Empatica E4). Before and after the experiment, participants completed a questionnaire from which self-reported anxiety scores were calculated. Videos were recorded at a resolution of 1024x1024 and 35 Hz.
Rice CameraHRV This dataset focuses on activities involving complex facial movement. It contains video recordings of 12 subjects (8 male, six female) during stationary, reading, talking, video-watching and deep-breathing tasks (60 recordings in total). Each video is 2 minutes long. Gold-standard PPG data were collected with an FDA-approved pulse oximeter. The camera recordings were made with a Blackfly BFLY-U3-23S6C (Point Grey Research) with a Sony IMX249 sensor. Frames were captured at a resolution of 1920x1200 and 30 Hz.
MERL-Rice NIR Pulse Contains 19 recordings of subjects in a car cockpit while driving around a city and 18 recordings of subjects seated stationary in a garage. Each video recorded in the garage is two minutes long, and those recorded while driving are 2-5 minutes long. The 18 subjects (16 male, 2 female) were healthy and aged 25–60 years. Four of the subjects were recorded at night and 14 during the day. Recordings were made with NIR (Point Grey Grasshopper GS3-U3-41C6NIR-C) and RGB (FLIR Grasshopper3 GS3-PGE23S6C-C) cameras mounted on the dashboard in front of the subject. The NIR camera was fitted with a 940 nm hard-coated optical density bandpass filter from Edmund Optics with a 10 nm passband. Frames were captured at a resolution of 640x640 and 30 Hz, with no gamma correction and fixed exposure. Gold-standard PPG data were recorded with a CMS 50D+ finger pulse oximeter at 60 Hz.
PURE Recordings of 10 subjects (8 male, 2 female), each performing six tasks. The videos were captured with an RGB eco274CVGE camera (SVS-Vistek GmbH) at a resolution of 640x480 and 60 Hz. The subjects were seated in front of the camera at an average distance of 1.1 meters and lit from the front by ambient natural light through a window. Gold-standard measures of PPG and SpO$_2$ were collected with a pulox CMS50E attached to the finger. The six tasks were as follows: i) The subject was seated, stationary and looking directly into the camera. ii) The subject was asked to talk while avoiding additional head motion. iii) The subject moved their head horizontally (translational motion) at an average speed proportional to the size of the face within the video. iv) Similar to task iii, with twice the velocity. v) Subjects were asked to orient their head towards targets placed in an arc around the camera in a predefined sequence. The motions were designed to be random rather than periodic (approx. 20° rotations). vi) Similar to task v, with larger head rotations (approx. 35° rotations).
rPPG The rPPG dataset includes 52 recordings from three RGB cameras: a Logitech C920 webcam at a resolution of 1920×1080 (WMV2 video codec), a Microsoft VX800 webcam at 640×480 (WMV3 video codec), and a Lenovo B590 laptop's integrated webcam at 640×480 pixels (WMV3 video codec). All recordings had 24-bit color depth (3 × 8 bits per channel) and were captured at 15 Hz. Recordings lasted between 60 and 80 seconds. Between 2 and 14 videos were recorded for each of eight healthy subjects (7 male, 1 female, aged 24 to 37). Primary illumination was ambient daylight and indoor lighting. Subjects were seated 0.5-0.7 m from the camera. Gold-standard PR measures were collected with a Choicemmed MD300C318 pulse oximeter. Participants completed a combination of stationary and head-motion tasks. In the motion tasks, subjects rotated their head from right to left (120° amplitude) and from up to down (100° amplitude). Subjects were also asked to speak and change their facial expressions.
OBF The Oulu Bio-Face (OBF) database includes facial videos recorded from healthy subjects and from patients with atrial fibrillation. Recordings were made with an RGB camera and an NIR camera. The subjects were seated one meter from the cameras. Two light sources were placed on either side of the cameras and illuminated the face at a 45-degree angle from a distance of 1.5 meters. According to their published work, the authors plan to make this dataset publicly available; however, we were unable to find information about how to access it at the time of writing.
PFF The PPG From Face (PFF) database includes facial videos of 13 subjects, each performing five tasks (65 videos in total). Each video is 3 minutes long, recorded at a resolution of 1280x720 and 50 Hz. Gold-standard PR was collected with two Mio Alpha II wrist heart rate monitors (the average PR of the two readings is used). The subjects were seated in front of the camera at a distance of 0.5 meters. The five tasks were: 1) The subject was seated stationary under fluorescent illumination. 2) The subject moved their head/body in a horizontal translational motion (right and left) at a frequency between 0.2 and 0.5 Hz, with the fluorescent lights on. 3) The subject was seated stationary with ambient illumination primarily from windows and a computer monitor. 4) The same motion as task 2, with ambient illumination primarily from windows and a computer monitor. 5) Under the same illumination as task 1, the subject rode an exercise bike at a constant speed.
CMU PPG Videos were recorded from 140 subjects in India (44) and Sierra Leone (96). Three deidentified videos were generated from each face video, one each of the forehead, left cheek and right cheek: a rectangular 60x30 pixel region of the forehead and square 25x25 pixel regions of the left and right cheeks. Videos were recorded at 15 Hz.
VicarPPG VicarPPG includes 20 recordings of 10 subjects (aged 20-35). Each video is 90 seconds long, recorded at a resolution of 720x1280 and 30 Hz. Two videos were recorded for each subject: in the first, the subject was stationary after a rest period; in the second, the subject was stationary after a physical exercise task. Gold-standard PPG waveforms were recorded with a CMS50 pulse oximeter attached to the subject's fingertip.
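As a check on the Mahnob-HCI convention in the table, frames 306 through 2135 inclusive span 2135 - 306 + 1 = 1830 frames, and 1830 / 61 Hz = 30 seconds. The sketch below assumes the range is inclusive at both ends. The table does not say whether frame numbers start at 0 or 1; that choice changes which frames are selected but not how many, so the hypothetical `one_based` flag makes the assumption explicit.

```python
def clip_frame_range(first_frame: int, last_frame: int, fps: float):
    """Return (number of frames, duration in seconds) for an inclusive frame range."""
    if last_frame < first_frame:
        raise ValueError("last_frame must not precede first_frame")
    n_frames = last_frame - first_frame + 1  # both endpoints are included
    return n_frames, n_frames / fps


def select_clip(frames, first_frame: int, last_frame: int, one_based: bool = False):
    """Slice an inclusive frame range out of a frame sequence.

    one_based is a hypothetical flag: set it if frame numbers start at 1.
    """
    start = first_frame - 1 if one_based else first_frame
    n_frames, _ = clip_frame_range(first_frame, last_frame, fps=1.0)
    return frames[start:start + n_frames]


n, seconds = clip_frame_range(306, 2135, fps=61)
print(n, seconds)  # 1830 30.0

video = list(range(3000))  # stand-in for a sequence of decoded frames
clip = select_clip(video, 306, 2135)
print(len(clip), clip[0], clip[-1])  # 1830 306 2135
```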
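Several datasets sample the gold-standard signal at a different rate from the video. In MERL-Rice NIR Pulse, for example, frames are captured at 30 Hz and the finger PPG at 60 Hz. A common way to pair them is to interpolate the contact signal onto the frame timestamps. The sketch below uses linear interpolation with `numpy.interp` and assumes both streams start at t = 0 on a shared clock; real recordings may need an explicit time offset. The function name and the synthetic 1.2 Hz test signal are illustrative, not taken from any dataset.

```python
import numpy as np


def resample_to_frames(ppg, ppg_fs: float, n_frames: int, video_fs: float) -> np.ndarray:
    """Linearly interpolate a contact PPG signal onto video frame timestamps.

    Assumes both streams start at t = 0 and share a clock.
    """
    ppg = np.asarray(ppg, dtype=float)
    t_ppg = np.arange(len(ppg)) / ppg_fs       # PPG sample times (s)
    t_frames = np.arange(n_frames) / video_fs  # video frame times (s)
    return np.interp(t_frames, t_ppg, ppg)


# Synthetic 10 s pulse at 1.2 Hz (72 bpm) sampled at 60 Hz, paired with 30 Hz video.
fs_ppg, fs_video, duration_s = 60, 30, 10
t = np.arange(duration_s * fs_ppg) / fs_ppg
ppg = np.sin(2 * np.pi * 1.2 * t)
ppg_at_frames = resample_to_frames(ppg, fs_ppg, duration_s * fs_video, fs_video)
print(ppg_at_frames.shape)  # (300,)
```

Because 60 Hz is an exact multiple of 30 Hz here, each frame time coincides with every second PPG sample. For non-integer rate ratios (for example 61 Hz video), the same function interpolates between neighboring samples.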
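The CMU PPG dataset is deidentified by releasing only small skin patches rather than full faces. The sketch below shows that kind of region-of-interest cropping with NumPy slicing. It assumes "60x30" means 60 pixels wide by 30 pixels high. The frame size and patch positions are hypothetical, since the dataset description does not say where the regions were placed.

```python
import numpy as np


def crop(frame: np.ndarray, x: int, y: int, width: int, height: int) -> np.ndarray:
    """Crop a (height, width, channels) frame; (x, y) is the region's top-left corner."""
    if x < 0 or y < 0 or y + height > frame.shape[0] or x + width > frame.shape[1]:
        raise ValueError("region extends beyond the frame")
    return frame[y:y + height, x:x + width]


# Hypothetical frame size and region positions, for illustration only.
frame = np.zeros((240, 320, 3), dtype=np.uint8)
forehead = crop(frame, x=130, y=40, width=60, height=30)
left_cheek = crop(frame, x=110, y=130, width=25, height=25)
right_cheek = crop(frame, x=185, y=130, width=25, height=25)
print(forehead.shape, left_cheek.shape, right_cheek.shape)
# (30, 60, 3) (25, 25, 3) (25, 25, 3)
```

Applying the same crop to every frame of a video yields a patch video that retains the skin-color signal while discarding most identifying facial structure.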