20 Newsgroups
v0.0.1 · commit 40c0c18 · Jun 29, 2021 1:49 PM · 1 commit: Initial commit

Overview

The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. To the best of my knowledge, it was originally collected by Ken Lang, probably for his paper "Newsweeder: Learning to Filter Netnews", although he does not explicitly mention this collection there. The 20 Newsgroups collection has become a popular data set for experiments in text applications of machine learning, such as text classification and text clustering.
The data is organized into 20 newsgroups, each corresponding to a different topic. Some of the newsgroups are very closely related to each other (e.g. comp.sys.ibm.pc.hardware / comp.sys.mac.hardware), while others are highly unrelated (e.g. misc.forsale / soc.religion.christian).
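
Newsgroup names are hierarchical, so the relatedness described above is partly visible in the names themselves. The sketch below is illustrative only: it assumes the 20 group names as they commonly appear in distributed copies of the data set, groups them by top-level hierarchy, and uses a hypothetical helper, `shared_prefix_depth`, that counts shared leading name components as a rough proxy for topical closeness.

```python
from collections import defaultdict

# The 20 newsgroup names as commonly distributed with the data set
# (listed here for illustration; check them against your own copy).
NEWSGROUPS = [
    "alt.atheism",
    "comp.graphics",
    "comp.os.ms-windows.misc",
    "comp.sys.ibm.pc.hardware",
    "comp.sys.mac.hardware",
    "comp.windows.x",
    "misc.forsale",
    "rec.autos",
    "rec.motorcycles",
    "rec.sport.baseball",
    "rec.sport.hockey",
    "sci.crypt",
    "sci.electronics",
    "sci.med",
    "sci.space",
    "soc.religion.christian",
    "talk.politics.guns",
    "talk.politics.mideast",
    "talk.politics.misc",
    "talk.religion.misc",
]


def group_by_top_level(names):
    """Map each top-level hierarchy (e.g. 'comp') to its newsgroups."""
    groups = defaultdict(list)
    for name in names:
        groups[name.split(".", 1)[0]].append(name)
    return dict(groups)


def shared_prefix_depth(a, b):
    """Count the leading dot-separated components two names share.

    Hypothetical heuristic: a higher value suggests more closely
    related groups, but it is only a rough proxy for topic overlap.
    """
    depth = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    for top, members in sorted(group_by_top_level(NEWSGROUPS).items()):
        print(f"{top}: {len(members)} groups")
    # Closely related pair from the text: shares "comp.sys".
    print(shared_prefix_depth("comp.sys.ibm.pc.hardware", "comp.sys.mac.hardware"))
    # Unrelated pair from the text: shares nothing.
    print(shared_prefix_depth("misc.forsale", "soc.religion.christian"))
```

The heuristic is deliberately simple and misses some relationships: talk.religion.misc and soc.religion.christian are topically close but share no leading component, so name structure should not be taken as a substitute for measuring similarity on the documents themselves.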

Citation

Please use the following citation when referencing the dataset:

@inproceedings{Lang95,
author = {Ken Lang},
title = {Newsweeder: Learning to Filter Netnews},
year = {1995},
booktitle = {Proceedings of the Twelfth International Conference on Machine Learning},
pages = {331--339},
}

Thanks to Data Decorators for their contribution.

Dataset Information

Application domain: NLP
Annotation type: Classification, Text
Task type: None
License: Unknown
Last updated: 2021-03-24 23:26:15

Data Summary

Data format: Text
Number of samples: 57.67K
Number of annotated samples: 59670
File size: 111 MB
Copyright holder: Jason Rennie
Annotator: Unknown