Syntactically annotated Ngrams for Google Books
[Abstract] In this thesis, we present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it aggregates data from 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part of speech, and head-modifier dependency relationships are recorded. We generate these annotations automatically from the Google Books text, using statistical models that are specifically adapted to the historical text found in these books. The new edition will facilitate the study of linguistic trends, especially those related to the evolution of syntax. We present our initial findings from the annotated Ngrams in the new edition, including studies of how the primary parts of speech of various words have changed over time, and of which words are most closely related to a given set of topics.
[Publication date]  [Publishing institution] Massachusetts Institute of Technology
[Authority level]  [Subject classification] 
[Keywords]  [Validity] 