The Street View Text

Overview

The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution. Outdoor street-level imagery has two useful characteristics: (1) image text often comes from business signage, and (2) business names are easily available through geographic business searches. These factors make the SVT set uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses.

Data Collection

We used Amazon's Mechanical Turk to harvest and label the images from Google Street View. To build the data set, we created several Human Intelligence Tasks (HITs) to be completed on Mechanical Turk.

Workers were assigned a unique city and asked to acquire 20 images containing text from Google Street View. They were instructed to: (1) perform a Search Nearby:* on their city, (2) examine the businesses in the search results, and (3) look at the associated street view for images containing text from the business name. When words were found, they composed the scene to minimize skew, saved a screenshot, and recorded the business name and address.
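The business names recorded in this step ultimately feed each image's candidate word list. A minimal sketch of turning raw business names into unique uppercase word candidates; the helper name and sample business names below are illustrative assumptions, not part of any released tooling:

```python
def build_lexicon(business_names):
    """Split business names into a sorted list of unique uppercase word candidates."""
    words = set()
    for name in business_names:
        for token in name.split():
            # Strip punctuation such as apostrophes and trailing periods.
            cleaned = "".join(ch for ch in token if ch.isalnum())
            if cleaned:
                words.add(cleaned.upper())
    return sorted(words)

# Hypothetical results of a "Search Nearby" query
names = ["Joe's Coffee", "Coffee & Donuts Co.", "Main Street Books"]
print(build_lexicon(names))
# → ['BOOKS', 'CO', 'COFFEE', 'DONUTS', 'JOES', 'MAIN', 'STREET']
```

Deduplicating across the top business results is what keeps the per-image lexicon to a manageable size (on the order of 50 words, as described below).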

Data Annotation

Workers were presented with an image and a list of candidate words to label with bounding boxes. This contrasts with the ICDAR Robust Reading data set in that we only label words associated with businesses. We used Alex Sorokin's Annotation Toolkit to support bounding box image annotation. For each image, we obtained a list of local business names by running a Search Nearby:* query in Google Maps at the image's address. We stored the top 20 business results for each image, typically resulting in about 50 unique words. To summarize, the SVT data set consists of images collected from Google Street View, where each image is annotated with bounding boxes around words from businesses near where the image was taken.

Citation

@inproceedings{wang2011end,
  title={End-to-end scene text recognition},
  author={Wang, Kai and Babenko, Boris and Belongie, Serge},
  booktitle={2011 International Conference on Computer Vision},
  pages={1457--1464},
  year={2011},
  organization={IEEE}
}
@inproceedings{wang2010word,
  title={Word spotting in the wild},
  author={Wang, Kai and Belongie, Serge},
  booktitle={European Conference on Computer Vision},
  pages={591--604},
  year={2010},
  organization={Springer}
}
Dataset Information

Application Scenario: OCR/Text Detection
Annotation Type: Box2D
License: Unknown
Updated: 2021-03-24 22:49:55

Data Summary

Data Format: Image
Data Count: 0
File Size: 113KB
Annotation Count: 0
Copyright Holder: Department of Computer Science and Engineering, University of California, San Diego
Annotated By: Unknown
Related Datasets

SVHN
COCO-Text
MLT-2019