Data

The file is adapted from Shikra. You can access shikra's data from Google Driver or Baidu Netdisk. Then download the additional files in Google Drive. This compressed file exclusively contains the annotations in jsonl format. Once you extract this ZIP file, please ensure that you place it under the data folder. After extraction, the directory structure should appear as follows:

|- config
|- mllm
|- data
    |-- blip_laion_cc_sbu_558k.jsonl
    |-- CAP_coco2014_train.jsonl
    |-- CWB_flickr30k_train.jsonl
    ...

Please note that the images can be downloaded separately from their official website. You can update the dataset image_folder configuration in the config/_base_/dataset/DEFAULT_XXXX_XXXXXX.py directory accordingly.

For example, if you are working with the Flickr30k trainset in config/_base_/dataset/DEFAULT_TRAIN_DATASET.py, you can update the image_folder field as follows:

flickr=dict(
    type='FlickrDataset',
    filename=r'{{fileDirname}}/../../../data/CWB_flickr30k_train.jsonl',
    image_folder=r'zz1424:s3://production-public-flickr_image/Flickr_Image/unzip/flickr30k_images/flickr30k_images',
    template_file=r'{{fileDirname}}/template/flickr30k.json',
    ),

to

flickr=dict(
    type='FlickrDataset',
    filename=r'{{fileDirname}}/../../../data/CWB_flickr30k_train.jsonl',
    image_folder=r'path/to/flickr30k_images/on/your/computer',
    template_file=r'{{fileDirname}}/template/flickr30k.json',
    ),

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DATA.md

DATA.md

Data

Files

DATA.md

Latest commit

History

DATA.md

File metadata and controls

Data