Training a LoRA for My Waifu

(Chinese version: https://deepghs.github.io/waifuc/main/tutorials-CN/quick_start/index.html)

Before we dive in, please note the following:

  1. You should have Python installed and a basic understanding of the language. For a Python tutorial, see https://python.swaroopch.com/.

  2. Install the waifuc library; see the installation guide for details.
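
     For reference, installing straight from the project's GitHub main branch looks like the following (this mirrors the gchar install command shown later in this tutorial; consult the installation guide for the recommended method):

     pip install git+https://github.com/deepghs/waifuc.git@main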

What Should I Do?

  1. First, get to know your waifu, especially her name and origin. For example, my waifu is the beautiful red-haired demon girl Surtr from the mobile game Arknights.

    [image: ../../_images/surtr1.png — Surtr from Arknights]
  2. Find her on an image website and note the search keywords. For Surtr, you can use the tag surtr_(arknights) on Danbooru to find her images.

    [image: ../../_images/danbooru_surtr1.png — Danbooru search results for surtr_(arknights)]
  3. Save the following code in a file named crawl.py:

    from waifuc.action import NoMonochromeAction, FilterSimilarAction, \
        TaggingAction, PersonSplitAction, FaceCountAction, FirstNSelectAction, \
        CCIPAction, ModeConvertAction, ClassFilterAction, RandomFilenameAction, AlignMinSizeAction
    from waifuc.export import TextualInversionExporter
    from waifuc.source import DanbooruSource

    if __name__ == '__main__':
        # put your tag here, e.g., surtr_(arknights)
        s = DanbooruSource(['surtr_(arknights)'])

        # crawl images, process them, and save them to a directory in the given format
        s.attach(
            # convert images to RGB, filling transparency with a white background
            ModeConvertAction('RGB', 'white'),

            # pre-filtering for images
            NoMonochromeAction(),  # no monochrome, greyscale or sketch images
            ClassFilterAction(['illustration', 'bangumi']),  # no comic or 3d images
            FilterSimilarAction('all'),  # filter out duplicated images

            # person processing
            FaceCountAction(1),  # drop images with zero or multiple faces
            PersonSplitAction(),  # crop out each person
            FaceCountAction(1),

            # CCIP: filter out characters you don't want in the dataset
            CCIPAction(),

            # if min(height, width) > 800, resize so the short side becomes 800
            AlignMinSizeAction(800),

            # tagging with wd14 v2; if you don't need a character tag, set character_threshold=1.01
            TaggingAction(force=True),

            FilterSimilarAction('all'),  # filter similar images again
            FirstNSelectAction(200),  # stop the pipeline once 200 images reach this step
            RandomFilenameAction(ext='.png'),  # randomly rename files
        ).export(
            # save to the /data/surtr_dataset directory
            # you can change it to your own directory
            TextualInversionExporter('/data/surtr_dataset')
        )
    
  4. Run the following command to execute the code:

    python crawl.py
    
  5. After running the code, open the /data/surtr_dataset folder to find Surtr's training dataset. It contains 200 images, each paired with a .txt file holding its tags (a sample of the resulting layout is sketched after this list).

    [image: ../../_images/danbooru_crawled1.png — the crawled dataset folder]
  6. Congratulations! You can now use the /data/surtr_dataset folder for LoRA training. 🎉
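
As a rough sketch of what the exported dataset looks like (the filenames below are made up, since RandomFilenameAction assigns random names, and the exact tags depend on the tagger output):

    /data/surtr_dataset/
        1a2b3c4d.png
        1a2b3c4d.txt    # e.g., "1girl, surtr (arknights), red hair, horns, ..."
        5e6f7a8b.png
        5e6f7a8b.txt
        ...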

How Does the Code Work?

The code above builds a complete waifuc pipeline from three components, which are also the primary modules of waifuc (a minimal skeleton of this structure is sketched after the list):

  1. Data Source (Source): Responsible for loading data into the pipeline.

    • DanbooruSource: Crawls images from the Danbooru website and loads them into the pipeline. In this example, we search Danbooru with the tag surtr_(arknights).

  2. Data Processing (Action): Processes the loaded image data.

    • ModeConvertAction: Converts image formats; in this example, it converts to the RGB format and adds a white background to transparent images.

    • NoMonochromeAction: Filters monochrome images (greyscale, line art, monochrome comics, etc.).

    • ClassFilterAction: Keeps only the specified types of images; in this example, it retains illustration (illustrations) and bangumi (anime screenshots), while comic (comics) and 3d (3D renders, e.g., from Koikatsu or MikuMikuDance) are filtered out.

    • FilterSimilarAction: Filters similar images to prevent duplicate images from entering the dataset.

    • FaceCountAction: Filters images based on the number of faces; in this example, it keeps only images with exactly one face and filters out images with no faces or multiple faces.

    • PersonSplitAction: Splits images to isolate individual characters within the picture.

    • CCIPAction: Filters out irrelevant characters introduced into the pipeline; in this example, images of characters other than Surtr are filtered out.

    • AlignMinSizeAction: Downscales oversized images; in this example, any image whose short side exceeds 800 pixels is resized so that its short side becomes exactly 800 pixels.

    • TaggingAction: Tags images with the wd14 v2 tagger; in this example, although images from Danbooru already carry tag information, setting force to True instructs the tagger to re-tag them anyway.

    • FirstNSelectAction: Keeps only the first N images; once the set quantity of images (200 in this example) has reached this step, the pipeline stops further crawling and processing.

    • RandomFilenameAction: Randomly renames images and uses .png as the file extension when saving.

  3. Data Export (Exporter): Exports the processed data.

    • TextualInversionExporter: Exports the processed data in the format of images and txt labels to the specified path (/data/surtr_dataset in this case).
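
Every waifuc pipeline follows this Source -> Action -> Exporter shape. As a minimal sketch using only classes from the full example above (the output path /data/surtr_mini is hypothetical):

    from waifuc.action import ModeConvertAction
    from waifuc.export import TextualInversionExporter
    from waifuc.source import DanbooruSource

    s = DanbooruSource(['surtr_(arknights)'])  # Source: loads images into the pipeline
    s.attach(
        ModeConvertAction('RGB', 'white'),     # Action(s): any number of processing steps
    ).export(
        TextualInversionExporter('/data/surtr_mini')  # Exporter: saves the results
    )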

The overall process is illustrated below:

[image: ../../_images/sample_workflow.puml1.svg — diagram of the sample workflow]

Building upon this foundation, you can:

  1. Replace the data source to crawl data from other websites. We support various data sources for different websites, including Pixiv, Sankaku, Rule34, AnimePictures, Zerochan, Konachan, Duitang, Huashi6, and more.

  2. If you have images in a local folder, use LocalSource to load them into the pipeline for processing (see the sketch after this list).

  3. Add or remove Actions to modify the data filtering and processing flow.

  4. Replace the Exporter to save images in different formats to meet diverse data usage requirements.
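
For instance, to process images already on disk instead of crawling a website, swap the Source and keep the rest of the pipeline unchanged. A minimal sketch, assuming LocalSource takes a directory path (the input and output paths here are hypothetical):

    from waifuc.action import FilterSimilarAction, TaggingAction
    from waifuc.export import TextualInversionExporter
    from waifuc.source import LocalSource

    # load images from a local folder instead of a website
    s = LocalSource('/data/my_raw_images')
    s.attach(
        FilterSimilarAction('all'),  # drop near-duplicate images
        TaggingAction(force=True),   # tag images with wd14 v2
    ).export(
        TextualInversionExporter('/data/my_dataset')
    )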

This flexible, modular structure lets you adapt the pipeline to a wide range of scenarios.

Good News!

After reviewing the example, you may notice that the pipeline requires you to look up character tags on image websites before any data can be loaded. This is cumbersome, since it calls for manual intervention.

The good news is, we’ve partially solved this problem! You can now input the character’s name directly (in Chinese, Japanese, or English), and the pipeline will automatically generate data sources from various websites for data crawling.

To use this feature, install the additional dependency:

    pip install git+https://github.com/deepghs/waifuc.git@main#egg=waifuc[gchar]

The gchar library ships a pre-built character database with character tables for several mobile games and their corresponding tags on various image websites. It currently supports a limited set of games, including Genshin Impact, Arknights, Azur Lane, Blue Archive, Girls' Frontline, and more. All supported games are listed here.

Replace s = DanbooruSource(['surtr_(arknights)']) with s = GcharAutoSource('surtr') in the code to automatically select the most suitable data source for your character. The full code looks like this:

    from waifuc.action import NoMonochromeAction, FilterSimilarAction, \
        PersonSplitAction, TaggingAction, FaceCountAction, FirstNSelectAction, \
        CCIPAction, ModeConvertAction, ClassFilterAction, RandomFilenameAction, AlignMinSizeAction
    from waifuc.export import TextualInversionExporter
    from waifuc.source import GcharAutoSource

    if __name__ == '__main__':
        # data source for Surtr from Arknights; images will be crawled from many sites
        # all supported games and sites are listed at
        # https://narugo1992.github.io/gchar/main/best_practice/supported/index.html#supported-games-and-sites
        # ATTENTION: GcharAutoSource requires `git+https://github.com/deepghs/waifuc.git@main#egg=waifuc[gchar]`
        s = GcharAutoSource('surtr')

        # crawl images, process them, and save them to a directory in the given format
        s.attach(
            # convert images to RGB, filling transparency with a white background
            ModeConvertAction('RGB', 'white'),

            # pre-filtering for images
            NoMonochromeAction(),  # no monochrome, greyscale or sketch images
            ClassFilterAction(['illustration', 'bangumi']),  # no comic or 3d images
            FilterSimilarAction('all'),  # filter out duplicated images

            # person processing
            FaceCountAction(1),  # drop images with zero or multiple faces
            PersonSplitAction(),  # crop out each person
            FaceCountAction(1),

            # CCIP: filter out characters you don't want in the dataset
            CCIPAction(),

            # if min(height, width) > 800, resize so the short side becomes 800
            AlignMinSizeAction(800),

            # tagging with wd14 v2; if you don't need a character tag, set character_threshold=1.01
            TaggingAction(force=True),

            FilterSimilarAction('all'),  # filter similar images again
            FirstNSelectAction(200),  # stop the pipeline once 200 images reach this step
            RandomFilenameAction(ext='.png'),  # randomly rename files
        ).export(
            # save to the /data/surtr_dataset directory
            # you can change it to your own directory
            TextualInversionExporter('/data/surtr_dataset')
        )

Now you can enjoy automatic data source selection for data crawling. Happy training! 🚀