Authorship Attribution Using Principal Component Analysis and Competitive Neural Networks
[Abstract] Feature extraction is a common problem in statistical pattern recognition. It refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. The transformation is designed, however, so that the data set can be represented by a reduced number of "effective" features while retaining most of its intrinsic information content; in other words, the data set undergoes a dimensionality reduction. Principal component analysis is one such process. In this paper, data were collected by counting selected syntactic characteristics in around a thousand paragraphs from each of the sample books, and these data were then subjected to a principal component analysis. The authors of the texts are identified by competitive neural networks that take the resulting effective features as input.
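The pipeline described in the abstract can be illustrated with a minimal sketch, assuming NumPy and synthetic data in place of the paper's syntactic counts. The function names (`pca_reduce`, `train_competitive`, `assign`), the number of components, the learning rate, and the simple winner-take-all update are all assumptions made for illustration; the abstract does not specify the authors' actual feature set or network configuration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, mean, components

def train_competitive(Z, n_units, epochs=20, lr=0.1, seed=0):
    """Winner-take-all learning: only the closest unit moves toward each sample."""
    rng = np.random.default_rng(seed)
    # Initialise the unit weights from randomly chosen samples.
    W = Z[rng.choice(len(Z), size=n_units, replace=False)].copy()
    for _ in range(epochs):
        for z in Z[rng.permutation(len(Z))]:
            winner = np.argmin(np.linalg.norm(W - z, axis=1))
            W[winner] += lr * (z - W[winner])
    return W

def assign(Z, W):
    """Return, for each sample, the index of its nearest unit."""
    distances = np.linalg.norm(Z[:, None, :] - W[None, :, :], axis=2)
    return distances.argmin(axis=1)

# Hypothetical stand-in data: two "authors", 200 paragraphs each,
# 10 syntactic-feature counts per paragraph.
rng = np.random.default_rng(42)
n_per_author, n_features = 200, 10
centers = rng.normal(scale=5.0, size=(2, n_features))
X = np.vstack([c + rng.normal(size=(n_per_author, n_features)) for c in centers])

Z, mean, components = pca_reduce(X, n_components=2)  # "effective" features
W = train_competitive(Z, n_units=2)                  # one unit per candidate author
labels = assign(Z, W)                                # winning unit per paragraph
```

In this sketch each competitive unit plays the role of one candidate author, and a paragraph is attributed to whichever unit wins for its reduced feature vector. Mapping units to named authors would require labeled samples, which the sketch leaves out.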
[Subject classification] Computational mathematics
[Keywords] principal components; authorship attribution; stylometry; text categorization; stylistic features; syntactic characteristics; multilayer perceptron; competitive learning; artificial neural network