Extracting Character Images from Videos
(Chinese version: https://deepghs.github.io/waifuc/main/tutorials-CN/crawl_videos/index.html)
Install Additional Dependencies
waifuc also provides a way to extract images from videos. Before running the examples below, you need to install the additional dependencies used for video processing, including the pyav library:

```shell
pip install git+https://github.com/deepghs/waifuc.git@main#egg=waifuc[video]
```
Extract Images from Video Files
In waifuc, you can use VideoSource
to process video files, extract frames, and save them as images. Here is an example:
```python
from waifuc.export import SaveExporter
from waifuc.source import VideoSource

if __name__ == '__main__':
    source = VideoSource(
        '/data/videos/[IrizaRaws] Oresuki - 03 (BDRip 1920x1080 x264 10bit FLAC).mkv'
    )
    source.export(
        SaveExporter('/data/dstdataset')
    )
```
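Conceptually, an exporter just drains the source iterator and writes each item to disk. The following stdlib-only stand-in is a hypothetical sketch of that idea; the item shape (`(filename, raw_bytes)` pairs) and the function name are assumptions for illustration, not SaveExporter's real behavior:

```python
import os


def export_items(items, dst_dir):
    """Write (filename, raw_bytes) pairs into dst_dir, creating it if needed.

    Returns the list of paths that were written, in order.
    """
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for filename, data in items:
        path = os.path.join(dst_dir, filename)
        with open(path, 'wb') as f:
            f.write(data)
        written.append(path)
    return written
```

Because the source is consumed lazily, a pipeline like this never needs to hold every extracted frame in memory at once.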
The saved images look like this:
Extract Images from a Folder Containing Videos
In many cases, you may want to process an entire series of downloaded anime videos stored in the same folder. You can directly extract frames from a folder containing videos, as shown in the following code:
```python
from waifuc.export import SaveExporter
from waifuc.source import VideoSource

if __name__ == '__main__':
    source = VideoSource.from_directory('/data/videos')
    source.export(
        SaveExporter('/data/dstdataset')
    )
```
This code iterates over all the video files under the `/data/videos` path, extracts frames from each of them, and saves the frames to the `/data/dstdataset` folder.
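The directory traversal behind this can be pictured with a plain stdlib walk. This is a hypothetical sketch only; the extension list and the helper name are assumptions, not waifuc's actual API:

```python
import os

# Common video container extensions (an assumed list, not waifuc's own).
VIDEO_EXTS = {'.mkv', '.mp4', '.avi', '.webm'}


def list_video_files(root):
    """Recursively collect paths of video files under root, in a stable order."""
    found = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in VIDEO_EXTS:
                found.append(os.path.join(dirpath, name))
    return found
```

Non-video files (subtitles, checksums, notes) in the same folder are simply skipped.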
Extract Character Images from a Folder Containing Videos
To extract character images from a video folder, you only need to attach a PersonSplitAction to the pipeline, as shown below:
```python
from waifuc.action import PersonSplitAction
from waifuc.export import SaveExporter
from waifuc.source import VideoSource

if __name__ == '__main__':
    source = VideoSource.from_directory('/data/videos')
    source = source.attach(
        PersonSplitAction(),
    )
    source.export(
        SaveExporter('/data/dstdataset')
    )
```
The code above extracts frames from videos and saves portraits obtained from those frames, as shown below:
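Conceptually, `attach` chains actions into the item stream: each action consumes one item and yields zero or more items, so a split action can turn one frame into several crops while a filter yields either the item or nothing. The following stdlib-only sketch shows that chaining idea with plain integers standing in for images; the names and shapes are illustrative, not waifuc's internals:

```python
def _apply(stream, action):
    """Feed every item of the stream through one action."""
    for item in stream:
        yield from action(item)


def attach(source, *actions):
    """Chain actions over an item stream, left to right."""
    stream = iter(source)
    for action in actions:
        stream = _apply(stream, action)
    return stream


# Illustrative actions: a 1 -> many split and a keep-or-drop filter.
split_in_two = lambda x: [x, x]
keep_even = lambda x: [x] if x % 2 == 0 else []
```

Because each stage is a generator, items flow through the whole chain one at a time rather than being materialized between stages.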
However, some of these images are not suitable for training. For real anime videos, you can therefore add more actions to obtain a higher-quality training dataset. For example:
```python
from waifuc.action import PersonSplitAction, FilterSimilarAction, \
    FileOrderAction, MinSizeFilterAction, FaceCountAction
from waifuc.export import SaveExporter
from waifuc.source import VideoSource

if __name__ == '__main__':
    source = VideoSource.from_directory('/data/videos')
    source = source.attach(
        # filter out similar full frames (e.g. from OPs and EDs)
        FilterSimilarAction(),

        # split each frame into per-person crops
        PersonSplitAction(),

        # keep only crops containing exactly 1 face
        FaceCountAction(1),

        # drop images with min(width, height) < 320
        MinSizeFilterAction(320),

        # filter out similar person crops
        FilterSimilarAction(),

        # rename the files in order, in png format
        FileOrderAction(ext='.png'),
    )
    source.export(
        SaveExporter('/data/dstdataset')
    )
```
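The near-duplicate dropping that FilterSimilarAction performs can be pictured with a tiny difference-hash scheme: hash each frame, and keep a frame only if its hash is far enough (in Hamming distance) from every hash already kept. This is a toy stdlib sketch of the idea over small grayscale grids, not waifuc's actual similarity metric:

```python
def dhash_bits(pixels):
    """Difference hash of a row-major grayscale grid:
    a 1 bit wherever a pixel is brighter than its right neighbour."""
    bits = []
    for row in pixels:
        for a, b in zip(row, row[1:]):
            bits.append(1 if a > b else 0)
    return tuple(bits)


def filter_similar(frames, max_distance=2):
    """Keep a frame only if its hash differs from every kept frame's hash
    by more than max_distance bits (Hamming distance)."""
    kept, kept_hashes = [], []
    for frame in frames:
        h = dhash_bits(frame)
        if all(sum(x != y for x, y in zip(h, k)) > max_distance
               for k in kept_hashes):
            kept.append(frame)
            kept_hashes.append(h)
    return kept
```

Running the similarity filter twice in the pipeline above makes sense under this view: duplicates among full frames and duplicates among person crops are different populations, so each pass deduplicates its own stage.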
This will result in the following dataset: