Text in the wild

Overview

We provide details of a newly created dataset of Chinese text: about 1 million Chinese character instances, covering 3850 unique characters, annotated by experts in over 30000 street view images.
It is a challenging and diverse dataset, containing planar text, raised text, text under poor illumination, distant text, partially occluded text, etc.

Data Annotation

The overall information file (../data/annotations/info.json) is UTF-8 (no BOM) encoded JSON.
The data structure of this information file is described below.

information:
{
    train: [image_meta_0, image_meta_1, image_meta_2, ...],
    val: [image_meta_0, image_meta_1, image_meta_2, ...],
    test_cls: [image_meta_0, image_meta_1, image_meta_2, ...],
    test_det: [image_meta_0, image_meta_1, image_meta_2, ...],
}

image_meta:
{
    image_id: str,
    file_name: str,
    width: int,
    height: int,
}

The train, val, test_cls, and test_det keys denote the training set, the validation set, the testing set for classification, and the testing set for detection, respectively.
The resolution of each image is always 2048×2048.
The image ID is a 7-digit string. Its first digit indicates the camera orientation according to the following rule.

'0': back
'1': left
'2': front
'3': right
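
As a minimal sketch of the rule above, the orientation can be read off the first character of the image ID. The function name `camera_orientation` and the constant `ORIENTATION_BY_DIGIT` are illustrative assumptions, not part of the dataset tooling, and the IDs used in any call are hypothetical.

```python
# Mapping copied from the rule above: first digit of image_id -> camera orientation.
ORIENTATION_BY_DIGIT = {"0": "back", "1": "left", "2": "front", "3": "right"}


def camera_orientation(image_id: str) -> str:
    """Return the camera orientation encoded in the first digit of a 7-digit image ID.

    Hypothetical helper: raises ValueError for IDs that do not match the
    documented format or that start with a digit outside 0-3.
    """
    if len(image_id) != 7 or not (image_id.isascii() and image_id.isdigit()):
        raise ValueError(f"expected a 7-digit string, got {image_id!r}")
    try:
        return ORIENTATION_BY_DIGIT[image_id[0]]
    except KeyError:
        raise ValueError(f"unknown orientation digit in {image_id!r}") from None
```

Keeping the image ID as a string matters here: converting it to an integer would drop a leading '0' and lose the "back" orientation.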

The file_name field doesn't contain a directory name, and is always image_id + '.jpg'.
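
The structure described in this section could be loaded and sanity-checked roughly as follows. This is a sketch under the assumptions stated on this page only (four split keys, 2048×2048 images, file_name equal to image_id + '.jpg'). The helper names `load_info`, `count_images`, and `check_image_meta` are hypothetical.

```python
import json

# Split keys as documented for the information file.
SPLITS = ("train", "val", "test_cls", "test_det")


def load_info(path):
    """Read the information file; it is documented as UTF-8 (no BOM) JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_images(info):
    """Return the number of image_meta entries in each split."""
    return {split: len(info[split]) for split in SPLITS}


def check_image_meta(meta):
    """Return a list of ways an image_meta entry deviates from the documented rules."""
    problems = []
    if meta["file_name"] != meta["image_id"] + ".jpg":
        problems.append("file_name is not image_id + '.jpg'")
    if (meta["width"], meta["height"]) != (2048, 2048):
        problems.append("resolution is not 2048x2048")
    return problems


# Example usage (path as given in this section, relative to the working directory):
# info = load_info("../data/annotations/info.json")
# print(count_images(info))
```

An empty list from `check_image_meta` means the entry matches the rules stated here; it says nothing about whether the image file itself exists on disk.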
More information about the data annotation can be found here

Citation

@article{yuan2019ctw,
  author  = {Tai{-}Ling Yuan and Zhe Zhu and Kun Xu and Cheng{-}Jun Li and Tai{-}Jiang Mu and Shi{-}Min Hu},
  title   = {A Large Chinese Text Dataset in the Wild},
  journal = {Journal of Computer Science and Technology},
  volume  = {34},
  number  = {3},
  pages   = {509--521},
  year    = {2019},
}
Thanks to DL for contributing this dataset
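Dataset information
Application scenario: N/A
Annotation type: N/A
Task type: N/A
License: CC BY-NC-SA 4.0
Updated: 2020-12-31 17:33:50
Data summary
Data format: N/A
Number of data items: 0
Number of annotated items: 0
File size: 24.84 MB
Copyright holder: Tsinghua University - Tencent Joint Laboratory
Annotator: Unknown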