PRINCIPAL COMPONENT ANALYSIS FOR AUTHORSHIP ATTRIBUTION
Authors:
Jamak, Amir [1]
Savatic, Alen [1]
Can, Mehmet [1]
Affiliations:
[1] Int Univ Sarajevo, Fac Engn & Nat Sci, Hrasnicka Cesta 15, Sarajevo 71000, Bosnia & Herceg
Keywords:
principal components;
authorship attribution;
stylometry;
text categorization;
function words;
classification task;
stylistic features;
syntactic characteristics;
DOI:
Not available
Chinese Library Classification:
C93 [Management Science];
O22 [Operations Research];
Discipline classification codes:
070105;
12;
1201;
1202;
120202;
Abstract:
A common problem in statistical pattern recognition is that of feature selection or feature extraction. Feature selection refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed so that the data set can be represented by a reduced number of "effective" features while retaining most of its intrinsic information content; in other words, the data set undergoes a dimensionality reduction. In this paper, data collected by counting words and characters in around a thousand paragraphs of each sample book undergo a principal component analysis performed using neural networks. The first principal component is then used to distinguish the books written by a particular author.
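The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say which neural network was used, so the example below assumes a single linear neuron trained with Oja's Hebbian rule, a standard way to extract the first principal component. The data are synthetic, and the three per-paragraph features (characters per word, words per sentence, function-word fraction), the function names, and all parameter values are hypothetical choices made for illustration; none of them come from the paper.

```python
import numpy as np

def standardize(X):
    """Centre each feature and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def first_pc_oja(Z, lr=0.005, epochs=10, seed=0):
    """Estimate the first principal direction of Z with Oja's rule.

    Assumption: a single linear neuron; the paper's actual network is unknown.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in Z[rng.permutation(len(Z))]:
            y = w @ x                    # output of the linear neuron
            w += lr * y * (x - y * w)    # Hebbian term plus normalising decay
    return w / np.linalg.norm(w)

# Synthetic stand-in for per-paragraph counts from two authors (hypothetical values).
rng = np.random.default_rng(42)
n = 1000  # paragraphs per author
author_a = rng.normal([4.2, 18.0, 0.45], [0.2, 3.0, 0.03], size=(n, 3))
author_b = rng.normal([4.8, 26.0, 0.38], [0.2, 3.0, 0.03], size=(n, 3))
X = np.vstack([author_a, author_b])
labels = np.repeat([0, 1], n)

Z = standardize(X)
w_oja = first_pc_oja(Z)

# Reference direction from a singular value decomposition of the same data.
_, _, vt = np.linalg.svd(Z, full_matrices=False)
alignment = abs(w_oja @ vt[0])   # |cosine| between the two unit directions

# Project every paragraph onto the first principal component and split at zero.
scores = Z @ w_oja
predicted = (scores > 0).astype(int)
agreement = (predicted == labels).mean()
accuracy = max(agreement, 1.0 - agreement)  # the sign of a component is arbitrary

print(f"alignment with SVD direction: {alignment:.4f}")
print(f"separation accuracy on first component: {accuracy:.4f}")
```

The labels are used only to score the split; the component itself is learned without them. Standardizing the features first keeps the constant learning rate stable, since Oja's update can diverge when raw counts have large variance.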