arxiv:2408.05366

DeepSpeak Dataset v1.0

Published on Aug 9 · Submitted by matybohacek on Aug 15

Authors: Sarah Barrington, Matyas Bohacek, Hany Farid

Abstract

We describe a large-scale dataset, DeepSpeak, of real and deepfake footage of people talking and gesturing in front of their webcams. The real videos in this first version of the dataset consist of 9 hours of footage from 220 diverse individuals. Constituting more than 25 hours of footage, the fake videos consist of a range of different state-of-the-art face-swap and lip-sync deepfakes with natural and AI-generated voices. We expect to release future versions of this dataset with different and updated deepfake technologies. This dataset is made freely available for research and non-commercial uses; requests for commercial use will be considered.

Community

matybohacek (paper author, paper submitter) · Aug 15

The DeepSpeak dataset contains over 43 hours of real and deepfake footage of people talking and gesturing in front of their webcams. The source data was collected from a diverse set of participants in their natural environments and the deepfakes were generated using state-of-the-art open-source lip-sync and face-swap software. The dataset is available to the digital forensics research community via Hugging Face Datasets.