A Word Elimination Strategy for Learning Document Representation
[Abstract] Word vectors computed with neural networks have motivated new approaches to document representation. Word elimination can improve the quality of the valuable features extracted from a document. In this paper, we propose a model named PV-IDF that eliminates redundancy and refines features to improve the performance of the classification model, retaining the tokens that carry the semantic information of a document. The results show that the PV-IDF model achieves state-of-the-art performance, especially for short-document representation.
[Publication date] [Publishing institution] Hubei Provincial Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, China^1; School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, China^2
[Validity level] Radio electronics [Subject classification] Computer science (general)
[Keywords] Document representation; Semantic information; State-of-the-art performance; Word vectors [Timeliness]
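
The abstract does not spell out how PV-IDF decides which words to eliminate. As a rough illustration only, the sketch below assumes (based on the model's name) that tokens are scored by inverse document frequency and that tokens scoring below a threshold are dropped before the document is represented. The function names `idf_scores` and `eliminate_words`, the smoothed IDF formula, and the threshold value are all hypothetical choices made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def idf_scores(docs):
    """Return a smoothed IDF score for every token in a list of tokenized documents.

    Assumption: IDF is computed as log((1 + N) / (1 + df)) + 1, a common
    smoothed variant; the paper may use a different formula.
    """
    n_docs = len(docs)
    # Count each token once per document it appears in.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def eliminate_words(docs, min_idf):
    """Drop tokens whose IDF is below min_idf (hypothetical elimination rule)."""
    idf = idf_scores(docs)
    return [[tok for tok in doc if idf[tok] >= min_idf] for doc in docs]


docs = [
    "the movie was a great movie".split(),
    "the plot was thin".split(),
    "the acting was great".split(),
]

# Tokens that occur in every document ("the", "was") get IDF 1.0 and are removed.
filtered = eliminate_words(docs, min_idf=1.2)
print(filtered)
```

In a pipeline of the kind the abstract describes, the filtered token lists would presumably then be fed to a paragraph-vector model to learn the document representation; that step is not shown here.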