This is the dataset download and documentation page for DeeperForensics-1.0.
DeeperForensics-1.0, a new dataset for real-world face forgery detection, features three appealing properties: good quality, large scale, and high diversity. The full dataset includes 48,475 source videos and 11,000 manipulated videos, an order of magnitude larger than existing datasets. The source videos are carefully collected from 100 paid and consenting actors from 26 countries, and the manipulated videos are generated by a newly proposed many-to-many end-to-end face swapping method, DF-VAE. In addition, 7 types of real-world perturbations at 5 intensity levels are employed to ensure a larger scale and higher diversity. Detailed information is provided below; you can also refer to our paper.
To download our dataset, please read the Terms of Use first and then fill out this form (if you have an educational email address, e.g., *.edu*, please use it). The download link will be sent to you once your request is approved. If you are unable to access the form or have any other questions, please contact us by email at deeperforensics@gmail.com.
After you download all the files, simply run the script (bash version >= 4) provided in the dataset root:
bash unzip.sh
or manually unzip all the files. This may take from a few minutes to a few hours. After unzipping the files, you will get all the data in the following folder structure:
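If you prefer not to use the provided shell script, the archives can also be extracted from Python. The sketch below is a minimal alternative, not part of the official tooling; the function name unzip_all is our own and it simply extracts every .zip archive it finds next to itself.

```python
import zipfile
from pathlib import Path

def unzip_all(root: str) -> list:
    """Extract every .zip archive found under `root`, each into its
    own parent directory, and return the archive names processed."""
    extracted = []
    for archive in sorted(Path(root).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive.name)
    return extracted
```

For example, `unzip_all("DeeperForensics-1.0")` would extract all archives in the dataset root, equivalent in effect to the provided unzip.sh.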
DeeperForensics-1.0
|--lists
|--manipulated_videos_distortions_meta
<several meta files of distortion information in the manipulated videos>
|--manipulated_videos_lists
<several lists of the manipulated videos>
|--source_videos_lists
<several lists of the source videos>
|--splits
<train, val, test data splits for your reference>
|--manipulated_videos
<11 folders named by the type of variants for the manipulated videos>
|--source_videos
<100 folders named by the ID of the actors for the source videos, and their subfolders named by the source video information>
|--terms_of_use.pdf
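After unzipping, you may want to sanity-check that the layout above is complete. The helper below is a hypothetical sketch (check_layout and EXPECTED are our own names, derived directly from the folder tree shown above); it reports any expected entry that is missing.

```python
from pathlib import Path

# Expected top-level layout, transcribed from the folder tree above.
EXPECTED = [
    "lists/manipulated_videos_distortions_meta",
    "lists/manipulated_videos_lists",
    "lists/source_videos_lists",
    "lists/splits",
    "manipulated_videos",
    "source_videos",
    "terms_of_use.pdf",
]

def check_layout(root: str) -> list:
    """Return the expected entries that are missing under `root`."""
    return [rel for rel in EXPECTED if not (Path(root) / rel).exists()]
```

An empty return value means the dataset root matches the documented structure.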
The following sections provide detailed information about the DeeperForensics-1.0 dataset. You can also refer to our paper for related information.
We carefully collect 48,475 source videos in total to improve the quality of our dataset.
The resolution of all the source videos is 1080p (1920 × 1080).
In most cases, we ask the actors to speak naturally to avoid excessive frames that show a closed mouth.
We invite 100 paid actors from 26 countries to record the source videos. We obtain consent from all the actors to use and manipulate their faces, avoiding portrait-rights issues. The actors have 4 typical skin tones: white, black, yellow, brown. Their ages range from 20 to 45 years old to match the most common age group appearing in real-world videos.
There are 55 males and 45 females. An actor folder ID starting with 'M' (e.g., M004) indicates a male actor, and one starting with 'W' (e.g., W101) indicates a female actor.
There are 40 Asian actors in the source videos, and the other 60 actors are from other parts of the world. You can also distinguish them by looking at the first digit of the actor ID: if it is '1' (e.g., W101), the actor is Asian; if it is '0' (e.g., M004), the actor is from another part of the world.
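The two conventions above (leading letter for gender, first digit for region) can be decoded programmatically. The helper below is a minimal sketch under those stated conventions; the function name decode_actor_id is our own.

```python
def decode_actor_id(actor_id: str) -> dict:
    """Decode gender and region from an actor folder ID like 'M004' or 'W101'.

    Per the naming convention: 'M' = male, 'W' = female; a first digit of
    '1' marks an Asian actor, '0' an actor from another part of the world.
    """
    if len(actor_id) != 4 or actor_id[0] not in "MW" or not actor_id[1:].isdigit():
        raise ValueError("unexpected actor ID: %r" % actor_id)
    return {
        "gender": "male" if actor_id[0] == "M" else "female",
        "region": "Asian" if actor_id[1] == "1" else "other",
    }
```

For example, `decode_actor_id("W101")` yields a female Asian actor, matching the example in the text.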
light_<light_direction> shows the illumination setting. The <light_direction> can be: down, left, leftdown, leftup, right, rightdown, rightup, uniform, up. Note that the direction up means 'top' and down indicates 'bottom', in line with our paper. uniform indicates that illumination comes from all directions, which is common in daily life.

<emotion> shows the expression performed by the actor: neutral, happy, surprise, angry, contempt, sad, disgust, fear.
Furthermore, the actors are asked to perform 53 expressions defined by 3DMM blendshapes to supplement some extremely exaggerated expressions. The subfolder named BlendShape contains this part of the source videos. In most cases, the blendshapes are only recorded under uniform illumination.

camera_<camera_direction> shows the camera setting. The <camera_direction> can be: down, front, left, leftfront, right, rightfront, up. Note that the direction up means 'oblique-above' and down indicates 'oblique-below', in line with our paper. To cover all the angles, the actors are asked to turn their heads naturally (only under uniform illumination; see our paper for the reason).

Thus, the name of each source video is in the form of: <ID>_light_<light_direction>_<emotion>_camera_<camera_direction>.mp4 or <ID>_BlendShape_camera_<camera_direction>.mp4.

Extra source videos: we also record some extra source videos for the blendshapes under diverse illuminations, for your reference. Some actor ID folders have a subfolder named BlendShape_extra that contains these extra source videos. We also provide a list of the extra source videos. The name of each extra source video is in the form of: <ID>_BlendShape_extra_light_<light_direction>_camera_<camera_direction>.mp4.
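The three file-name patterns described above can be split back into their labeled fields with a small parser. The sketch below is our own illustration (SOURCE_NAME and parse_source_name are hypothetical names); the regular expression is a direct transcription of the documented patterns.

```python
import re

# The three documented source-video name patterns:
#   <ID>_light_<light_direction>_<emotion>_camera_<camera_direction>.mp4
#   <ID>_BlendShape_camera_<camera_direction>.mp4
#   <ID>_BlendShape_extra_light_<light_direction>_camera_<camera_direction>.mp4
SOURCE_NAME = re.compile(
    r"^(?P<id>[MW]\d{3})_"
    r"(?:light_(?P<light>[a-z]+)_(?P<emotion>[a-z]+)"
    r"|BlendShape(?:_extra_light_(?P<extra_light>[a-z]+))?)"
    r"_camera_(?P<camera>[a-z]+)\.mp4$"
)

def parse_source_name(name: str) -> dict:
    """Split a source video file name into its labeled fields."""
    m = SOURCE_NAME.match(name)
    if m is None:
        raise ValueError("unrecognized source video name: %r" % name)
    return {k: v for k, v in m.groupdict().items() if v is not None}
```

For example, `parse_source_name("M004_light_uniform_happy_camera_front.mp4")` returns the actor ID, light direction, emotion, and camera direction as separate fields.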
Missing source videos: some source videos under a specific light condition or with blendshapes are missing (very few cases compared to the dataset size). We also provide a list of the missing source videos. We have long intended to conduct a supplementary data collection, but due to COVID-19 we have not yet had the chance to do so.
We provide 11,000 manipulated videos in total with good quality and high diversity.
The manipulated videos are generated by our newly proposed many-to-many end-to-end face swapping method, DF-VAE, to improve quality.
7 types of perturbations at 5 intensity levels are applied to improve diversity and better simulate real-world scenarios.
For the 'real' part used in face forgery detection model training, please read the Target videos section below.
We provide the 1,000 raw manipulated videos mentioned in our paper, which are generated by DF-VAE in an end-to-end manner. They are in the end_to_end subfolder.
In addition, we provide another 1,000 raw manipulated videos based on the faces reenacted by DF-VAE. We further manually postprocess the reenacted faces against the original frames by color matching, warping, affine transformation, etc., resulting in these videos. Some videos look better with this method, while others do not; it can be considered an alternative that yields similar results. These videos are in the reenact_postprocess subfolder.
In line with our paper, we apply random-type distortions to the 1,000 raw manipulated videos (end_to_end) at 5 different intensity levels, producing a total of 5,000 manipulated videos. They are in the end_to_end_level_<i> subfolders, where <i> ranges from 1 to 5.
Besides, an additional 1,000 manipulated videos are generated by adding random-type, random-level distortions to the raw manipulated videos (end_to_end). The videos are in the end_to_end_random_level subfolder.
Moreover, 3,000 manipulated videos are generated by adding mixed distortions (<j> distortion types applied at the same time) to the raw manipulated videos (end_to_end). The videos are in the end_to_end_mix_<j>_distortions subfolders, where <j> ranges from 2 to 4.

The name of each manipulated video is in the form of: <target_id>_<source_id>.mp4. <target_id> is the three-digit ID of the target video, and <source_id> is the actor ID of our collected source videos. This means the face of the actor in the source video is swapped onto the target video. Note: to improve quality and keep high fidelity, we intentionally perform face swapping within the corresponding gender (i.e., man-to-man, woman-to-woman). Because the gender can be inferred from <source_id>, this can also serve as a gender annotation for the target videos.

How can we use the DeeperForensics-1.0 dataset for face forgery detection model training?
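The manipulated-video naming scheme, together with the gender-matched swapping noted above, can be decoded as follows. This is an illustrative sketch only (parse_manipulated_name is our own name), following the documented <target_id>_<source_id>.mp4 pattern.

```python
def parse_manipulated_name(name: str) -> dict:
    """Decode a manipulated video name of the form <target_id>_<source_id>.mp4.

    Because swaps are gender-matched, the source ID's leading letter also
    acts as a gender annotation for the target video.
    """
    target_id, sep, rest = name.partition("_")
    if sep != "_" or not rest.endswith(".mp4"):
        raise ValueError("unrecognized manipulated video name: %r" % name)
    source_id = rest[:-len(".mp4")]
    if len(target_id) != 3 or not target_id.isdigit() or not source_id or source_id[0] not in "MW":
        raise ValueError("unrecognized manipulated video name: %r" % name)
    return {
        "target_id": target_id,
        "source_id": source_id,
        "gender": "male" if source_id[0] == "M" else "female",
    }
```

The gender field falls out of the source ID for free, which is exactly the annotation shortcut described in the note above.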
Please be careful: our collected source videos are NOT the 'real' part for detection, although they are very useful for much face-related research, e.g., face generation, illumination transfer, etc. We use the source videos to improve face manipulation quality.
The 'real' part for face forensics models is the target videos, i.e., the 1,000 refined YouTube videos collected by FaceForensics++, in line with our paper.
Please note: in this version, the target videos are NOT part of the DeeperForensics-1.0 dataset. We strictly follow that dataset's non-distribution agreement and thus cannot provide this part. You should download the target videos from FaceForensics++ at the C23 compression rate. We will consider including the target videos in the next version.
The 'real' target videos from FaceForensics++ do not contain any perturbations. To conduct experiments in the new real-world settings, you can use our provided perturbation codes together with the distortion meta files to add corresponding distortions to the target videos by yourself.
We also see this as a way to give users more freedom in applying perturbations. Our setting in this version might not be optimal; we welcome everyone to help improve the benchmark to better simulate real-world scenarios.
The list of all the source videos is named source_videos_list.txt, the list of the extra source videos is named source_videos_extra_list.txt, and the list of the missing source videos is named source_videos_missing_list.txt.
The lists of the different types of variants of the manipulated videos are named manipulated_videos_<type_of_variant>_list.txt, where <type_of_variant> is also the subfolder name of the manipulated videos.
The distortion meta files for the manipulated videos are also provided, named manipulated_videos_<type_of_variant>_meta.txt, where <type_of_variant> is also the subfolder name of the perturbed manipulated videos. Each line in a meta file is in the form of:
<manipulated_video_path> <first_type>:<first_level> <second_type>:<second_level> ...
in line with the meta-file definition in our perturbation codes (you can also find the meaning of the abbreviations in the perturbation code argument explanation). The order of the distortions reflects the order in which they were applied. You can easily reproduce the perturbed videos using the perturbation codes and the distortion meta files.
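A meta line as defined above can be parsed into a path plus an ordered list of (type, level) pairs. The sketch below is our own illustration (parse_meta_line is a hypothetical name); the actual abbreviation strings for the distortion types are defined in the perturbation codes, so the test uses placeholder tokens.

```python
def parse_meta_line(line: str):
    """Parse one distortion meta line of the form
    '<manipulated_video_path> <type>:<level> <type>:<level> ...'.
    The returned pairs keep the order in which the distortions were applied."""
    path, *pairs = line.split()
    distortions = []
    for pair in pairs:
        dtype, _, level = pair.partition(":")
        distortions.append((dtype, int(level)))
    return path, distortions
```

Preserving the pair order matters because, as noted above, it is the order in which the distortions must be re-applied to reproduce a perturbed video.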
We also provide data splits for your reference, named train.txt, val.txt, and test.txt. Each line of these files gives a manipulated ('fake') video name. You can easily find the corresponding 'real' video (i.e., the target video) using the first three-digit ID.
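The fake-to-real lookup described above reduces to reading the leading three-digit ID. The helper below is a sketch under the assumption that the FaceForensics++ target videos are stored one file per three-digit ID (e.g., 000.mp4); real_counterpart is our own name, and you may need to adapt the returned file name to your local FaceForensics++ layout.

```python
def real_counterpart(fake_name: str) -> str:
    """Map a 'fake' split entry like '000_M101.mp4' to the file name of its
    'real' target video, assuming one target video per three-digit ID."""
    target_id = fake_name[:3]
    if not target_id.isdigit():
        raise ValueError("no leading three-digit target ID in %r" % fake_name)
    return target_id + ".mp4"
```

This pairing gives you the 'real' label source for each 'fake' split entry when assembling a detection training set.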
Please note that you might not strictly reproduce the results in our paper, for several reasons.
The video number ratio of our splits for training, validation and testing is 703 : 96 : 21. The swapped source identity ratio is 71 : 9 : 20.
If you are interested in finding a new setting for real-world face forgery detection using our dataset, we encourage you to do so. However, please avoid data leakage (i.e., you should randomly choose non-repeating source-face identities and group all the manipulated videos according to source identity). We welcome everyone to make our benchmark more comprehensive.