To address the challenges of pre-training models for industrial applications, we curated Ind-2M, a large-scale dataset collected specifically from industrial scenarios. Ind-2M contains 2.2 million industrial images, drawn from publicly available industrial datasets and from web crawling. Of these, 1.6 million images show non-defective industrial products and 0.6 million show defects found in industrial settings. Ind-2M is intended to support pre-training of models that learn better industrial image representations.
This project open-sources the web-crawled portion of Ind-2M, called Ind-2M-Crawling, which includes 221,062 industrial product images and 614,002 industrial defect images. Due to copyright restrictions, the portion of Ind-2M that comes from publicly available datasets is not redistributed here; download it from the original sources using the indexes provided in the paper. When downloading and using this dataset, comply with the dataset's license and cite the dataset paper. Ind-2M-Crawling is available at the following link:
https://drive.google.com/drive/folders/19bUh_S114CPiFQQH1_ezMT4A5xgAi3y0?usp=sharing
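After downloading, it can be useful to check that the local copy is complete. The following is a minimal sketch, not part of the official release: it counts image files under a local folder and groups them by top-level subfolder, so the total can be compared against the 221,062 + 614,002 images stated above. The function name `count_images`, the set of file extensions, and the assumption that images sit in subfolders of one root directory are all illustrative; the actual layout of the archive may differ.

```python
from collections import Counter
from pathlib import Path

# Counts stated in this README for Ind-2M-Crawling.
EXPECTED_PRODUCT_IMAGES = 221_062
EXPECTED_DEFECT_IMAGES = 614_002
EXPECTED_TOTAL = EXPECTED_PRODUCT_IMAGES + EXPECTED_DEFECT_IMAGES

# Assumed image formats; adjust to match the files actually in the download.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images(root):
    """Count image files under `root`, grouped by top-level subfolder.

    Files placed directly in `root` are grouped under the key ".".
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            relative = path.relative_to(root)
            group = relative.parts[0] if len(relative.parts) > 1 else "."
            counts[group] += 1
    return counts
```

For example, `counts = count_images("path/to/Ind-2M-Crawling")` followed by `sum(counts.values()) == EXPECTED_TOTAL` indicates whether the number of files found matches the stated total; the path here is a placeholder for wherever the data was extracted.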
If you use Ind-2M in your research, please cite it using the following BibTeX entry:
@article{zhu2024pixel,
title={Pixel-level Contrastive Pre-Trainer for Industrial Image Representation},
author={Zhu, Bingke and Chen, Yingying and Tang, Ming and Wang, Jinqiao},
journal={IEEE Transactions on Instrumentation and Measurement},
year={2024},
publisher={IEEE}
}