Extracting Character Images from Videos

(Chinese documentation: https://deepghs.github.io/waifuc/main/tutorials-CN/crawl_videos/index.html)

Install Additional Dependencies

waifuc can also extract images from videos. Before running the examples, install the additional dependencies for video processing, which include the pyav library:

pip install git+https://github.com/deepghs/waifuc.git@main#egg=waifuc[video]
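After installing, it can help to confirm that the packages are importable before starting a long extraction job. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of waifuc. It relies on the fact that PyAV is imported under the module name av.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == '__main__':
    # PyAV is imported as `av`; waifuc is imported as `waifuc`.
    for name in ('waifuc', 'av'):
        status = 'found' if is_installed(name) else 'missing'
        print(f'{name}: {status}')
```

If av is reported as missing, rerun the pip command above with the [video] extra.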

Extract Images from Video Files

In waifuc, you can use VideoSource to process video files, extract frames, and save them as images. Here is an example:

from waifuc.export import SaveExporter
from waifuc.source import VideoSource

if __name__ == '__main__':
    source = VideoSource(
        '/data/videos/[IrizaRaws] Oresuki - 03 (BDRip 1920x1080 x264 10bit FLAC).mkv'
    )
    source.export(
        SaveExporter('/data/dstdataset')
    )

The saved images look like this:

../../_images/video_simple1.png

Extract Images from a Folder Containing Videos

In many cases, you may want to process an entire series of downloaded anime videos stored in the same folder. You can directly extract frames from a folder containing videos, as shown in the following code:

from waifuc.export import SaveExporter
from waifuc.source import VideoSource

if __name__ == '__main__':
    source = VideoSource.from_directory('/data/videos')
    source.export(
        SaveExporter('/data/dstdataset')
    )

This code iterates through all video files in the /data/videos directory, extracts frames from each, and saves them to the /data/dstdataset folder.
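Before starting a long run, you may want to preview which files in a folder look like videos. The sketch below is a standalone illustration and not waifuc's own discovery logic: the helper find_video_files, the extension list, and the recursive search are all assumptions made for this example.

```python
from pathlib import Path

# Assumed extension list for illustration; waifuc's own criteria may differ.
VIDEO_EXTENSIONS = {'.mkv', '.mp4', '.avi', '.mov', '.webm'}


def find_video_files(directory):
    """Return files under `directory` (searched recursively) whose
    extension is in VIDEO_EXTENSIONS, sorted by path."""
    root = Path(directory)
    return sorted(
        path for path in root.rglob('*')
        if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS
    )


if __name__ == '__main__':
    # Prints nothing if the directory does not exist or holds no videos.
    for video in find_video_files('/data/videos'):
        print(video)
```

The extension check is case-insensitive, so a file named episode.MKV is listed as well.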

Extract Character Images from a Folder Containing Videos

To extract character images from a video folder, you just need to add the PersonSplitAction to the code, as shown below:

from waifuc.action import PersonSplitAction
from waifuc.export import SaveExporter
from waifuc.source import VideoSource

if __name__ == '__main__':
    source = VideoSource.from_directory('/data/videos')
    source = source.attach(
        PersonSplitAction(),
    )
    source.export(
        SaveExporter('/data/dstdataset')
    )

The code above extracts frames from videos and saves portraits obtained from those frames, as shown below:

../../_images/video_split1.png

However, some of these images are not suitable for training. When processing real anime videos, you can attach more actions to obtain a higher-quality training dataset, as in the following code:

from waifuc.action import (
    PersonSplitAction, FilterSimilarAction, FileOrderAction,
    MinSizeFilterAction, FaceCountAction,
)
from waifuc.export import SaveExporter
from waifuc.source import VideoSource

if __name__ == '__main__':
    source = VideoSource.from_directory('/data/videos')
    source = source.attach(
        # drop near-duplicate full frames (e.g. OPs, EDs)
        FilterSimilarAction(),

        # split each frame into one image per person
        PersonSplitAction(),

        # keep only images that contain exactly 1 face
        FaceCountAction(1),

        # drop images with min(width, height) < 320
        MinSizeFilterAction(320),

        # drop near-duplicate person images
        FilterSimilarAction(),

        # rename the files in order, using the png format
        FileOrderAction(ext='.png'),
    )
    source.export(
        SaveExporter('/data/dstdataset')
    )

This will result in the following dataset:

../../_images/video_split_better1.png
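The order of the attached actions matters, because each action only sees what the previous one let through. The following is a conceptual sketch in plain Python, not waifuc's internals: items are plain dictionaries, and split_persons, min_size, and run_pipeline are hypothetical stand-ins used only to show how a splitting step (one item in, many out) and a filtering step (one item in, zero or one out) can be chained.

```python
def split_persons(item):
    """Stand-in for a person splitter: yield one item per person box.
    Real detection is replaced by precomputed (width, height) pairs."""
    for index, (width, height) in enumerate(item['persons']):
        yield {
            'name': f"{item['name']}_person{index}",
            'width': width,
            'height': height,
        }


def min_size(limit):
    """Build a filter that keeps items whose shorter side is >= limit."""
    def action(item):
        if min(item['width'], item['height']) >= limit:
            yield item
    return action


def apply_action(stream, action):
    """Feed every item of `stream` through `action`, lazily."""
    for item in stream:
        yield from action(item)


def run_pipeline(items, actions):
    """Chain the actions in order and collect the surviving items."""
    stream = iter(items)
    for action in actions:
        stream = apply_action(stream, action)
    return list(stream)


if __name__ == '__main__':
    frames = [
        {'name': 'frame0', 'persons': [(400, 600), (100, 200)]},
        {'name': 'frame1', 'persons': []},
    ]
    for result in run_pipeline(frames, [split_persons, min_size(320)]):
        print(result['name'])
```

In this model, placing the size filter after the split means it is applied to each cropped person image rather than to the full frame, which mirrors the ordering used in the example above.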