The IIIT 5K-word
Last commit: 6e0c379, Jun 28, 2021 12:26 AM (1 commit)

Overview

The IIIT 5K-word dataset was harvested from Google image search, using query words such as billboards, signboards, house numbers, house name plates, and movie posters to collect images. The dataset contains 5000 cropped word images taken from scene text and born-digital images, and it is divided into train and test parts. It can be used for large-lexicon cropped word recognition. We also provide a lexicon of more than 0.5 million dictionary words with this dataset.

Instruction

(Usage: case-insensitive small/medium/large lexicon cropped word recognition)

  1. Open Matlab.

  2. Load testdata.

  3. A structure testdata will be loaded. This structure has four fields:

    (a) ImgName: the cropped word image name.

    (b) GroundTruth: the ground truth text corresponding to the cropped word.

    (c) smallLexi: a lexicon list of 50 words per image (referred to as the small size lexicon in the paper).

    (d) mediumLexi: a lexicon list of 1000 words per image (the medium size lexicon).
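As an illustration of how the per-image lexicons might be used, the Python sketch below snaps a recognizer's raw output to the closest lexicon word by edit distance, ignoring case, and then scores it against the ground truth. This is one common way to do lexicon-constrained recognition, not necessarily the protocol used in the paper. The records are hypothetical stand-ins that only mirror the ImgName, GroundTruth, and smallLexi fields described above; the image names, the raw predictions, and the helper names (`edit_distance`, `snap_to_lexicon`, `accuracy`) are assumptions made for the example.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rows of dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def snap_to_lexicon(raw: str, lexicon: List[str]) -> str:
    """Return the lexicon word closest to the raw prediction, ignoring case."""
    raw_lower = raw.lower()
    return min(lexicon, key=lambda word: edit_distance(raw_lower, word.lower()))


def accuracy(records: List[Dict], raw_predictions: Dict[str, str],
             lexicon_field: str = "smallLexi") -> float:
    """Fraction of records whose snapped prediction equals the ground truth."""
    correct = 0
    for rec in records:
        guess = snap_to_lexicon(raw_predictions[rec["ImgName"]], rec[lexicon_field])
        if guess.lower() == rec["GroundTruth"].lower():
            correct += 1
    return correct / len(records)


# Hypothetical records; real lexicons hold 50 (small) or 1000 (medium) words.
records = [
    {"ImgName": "img_a.png", "GroundTruth": "HOTEL",
     "smallLexi": ["HOTEL", "MOTEL", "HOSTEL"]},
    {"ImgName": "img_b.png", "GroundTruth": "SALE",
     "smallLexi": ["SALT", "SALE", "SAFE"]},
]

# Hypothetical raw recognizer outputs, keyed by image name.
raw_predictions = {"img_a.png": "hotel", "img_b.png": "sqle"}

print(accuracy(records, raw_predictions))
```

Passing `lexicon_field="mediumLexi"` would score against the medium lexicon instead, provided the records carry that field.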

Citation

If you use this dataset, please cite:

@InProceedings{MishraBMVC12,
  author    = "Mishra, A. and Alahari, K. and Jawahar, C.~V.",
  title     = "Scene Text Recognition using Higher Order Language Priors",
  booktitle = "BMVC",
  year      = "2012",
}
Thanks to DL for contributing this dataset.
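Dataset information
Application scenario: N/A
Annotation type: N/A
Task type: N/A
License: Unknown
Updated: 2020-12-31 17:27:11
Data summary
Data format: N/A
Number of items: 5,000
Annotated items: 0
File size: 100.96KB
Copyright owner: CVIT
Annotator: Unknown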