Stanford Sentiment Treebank
Created from Hello Dataset / Robert

Overview

This dataset includes:

  1. original_rt_snippets.txt contains 10,605 processed snippets from the original pool of Rotten Tomatoes HTML files. Note that some snippets may contain multiple sentences.
  2. dictionary.txt contains all phrases and their IDs, separated by a vertical line (|).
  3. sentiment_labels.txt contains all phrase IDs and the corresponding sentiment labels, separated by a vertical line (|).
  4. SOStr.txt and STree.txt encode the structure of the parse trees. STree.txt encodes the trees in a parent-pointer format. Each line corresponds to one sentence in datasetSentences.txt. If you are not familiar with this format, the Matlab code accompanying the paper shows how to read it.
  5. datasetSentences.txt contains the sentence index, followed by the sentence string, separated by a tab. These are the sentences of the train/dev/test sets.
  6. datasetSplit.txt contains the sentence index (corresponding to the index in datasetSentences.txt), followed by the set label, separated by a comma: 1 = train, 2 = test, 3 = dev.
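
The file formats listed above can be read with a few small helpers. The following is a minimal sketch, not the dataset authors' loader: the function names are hypothetical, and several details are assumptions not stated above, namely that a file may start with a header row (skipped here whenever the ID field is not an integer), that sentiment labels are numeric, and that each STree.txt line holds 1-based parent indices separated by a vertical line, with 0 marking the root.

```python
from collections import defaultdict


def read_dictionary(lines):
    """Map phrase text -> phrase ID from 'phrase|id' lines (dictionary.txt)."""
    phrase_to_id = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last '|' in case the phrase itself contains one.
        phrase, _, raw_id = line.rpartition("|")
        if raw_id.isdigit():
            phrase_to_id[phrase] = int(raw_id)
    return phrase_to_id


def read_labels(lines):
    """Map phrase ID -> sentiment label from 'id|label' lines (sentiment_labels.txt)."""
    labels = {}
    for line in lines:
        raw_id, _, raw_label = line.strip().partition("|")
        if raw_id.isdigit():  # skips a header row, if one is present
            labels[int(raw_id)] = float(raw_label)  # assumption: numeric labels
    return labels


def read_split(lines):
    """Map sentence index -> split name from 'index,label' lines (datasetSplit.txt)."""
    names = {1: "train", 2: "test", 3: "dev"}
    split = {}
    for line in lines:
        raw_idx, _, raw_label = line.strip().partition(",")
        if raw_idx.isdigit():  # skips a header row, if one is present
            split[int(raw_idx)] = names[int(raw_label)]
    return split


def children_from_parents(stree_line):
    """Convert one parent-pointer line into (root, parent -> children map).

    Assumption: 1-based node indices separated by '|', with 0 marking the root.
    """
    parents = [int(p) for p in stree_line.strip().split("|")]
    children = defaultdict(list)
    root = None
    for node, parent in enumerate(parents, start=1):
        if parent == 0:
            root = node
        else:
            children[parent].append(node)
    return root, dict(children)
```

Each reader accepts any iterable of lines, so an open file object can be passed directly, for example `read_dictionary(open("dictionary.txt", encoding="utf-8"))`. Joining the result of `read_dictionary` with `read_labels` on the phrase ID gives a phrase-to-sentiment lookup.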

Citation

Please use the following citation when referencing the dataset:

@incollection{SocherEtAl2013:RNTN,
title = {{Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank}},
author = {Richard Socher and Alex Perelygin and Jean Wu and Jason Chuang and Christopher Manning
and Andrew Ng and Christopher Potts},
booktitle = {{EMNLP}},
year = {2013}
}
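Dataset information
Application scenario: NLP
Annotation type: Text
License: Unknown
Updated: 2021-03-24 22:54:02
Data summary
Data format: Text
Number of items: 0
Number of annotated items: 0
File size: 11 KB
Copyright owner
Stanford
Annotator
Unknown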