Measuring complexity with multifractals in texts. Translation effects

Cited by: 31
Authors
Ausloos, M. [1]
Affiliation
[1] Univ Liege, B-4000 Liege, Euroland, Belgium
Keywords
LONG-RANGE CORRELATIONS; DETRENDED FLUCTUATION ANALYSIS; TIME-SERIES; LANGUAGE; STATISTICS; NETWORKS; ENGLISH; PHYSICS;
DOI
10.1016/j.chaos.2012.06.016
Chinese Library Classification
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Should quality be almost a synonym of complexity? Measuring quality appears audacious, even very subjective. It is hereby proposed to use a multifractal approach in order to quantify quality through complexity measures. A one-dimensional system is examined. It is known that (all) written texts can be regarded as one-dimensional nonlinear maps. Thus, several written texts by the same author are considered, together with their translation into an unusual language, Esperanto, and, as a baseline, their corresponding shuffled versions. Different one-dimensional time series can be used, e.g. (i) one based on word lengths and (ii) another based on word frequencies; both are used for studying, comparing and discussing the map structure. It is shown that variety in style can be measured through the D(q) and f(alpha) curves characterizing multifractal objects. This allows one to observe, on the one hand, whether natural and artificial languages significantly influence the writing and the translation, and, on the other hand, whether one author's texts differ technically from each other. In fact, the f(alpha) curves of the original texts are similar to each other, but the translated text shows marked differences. However, in each case, the f(alpha) curves are far from being parabolic, in contrast to the shuffled texts. Moreover, the Esperanto text has more extreme values. Criteria are thereby suggested for estimating text quality, treating the text as a time series only. A model is introduced in order to substantiate the findings: it consists in considering a text as a random Cantor set resulting from a binomial cascade of long and short words with appropriate weights. In an appendix, a connection is given with an analysis of turbulence by statistics based on Tsallis generalized entropy. In a second appendix, another view of text (language) complexity is outlined within the copying mistake map concept. (C) 2012 Elsevier Ltd. All rights reserved.
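The ingredients named in the abstract (a word-length series, a shuffled baseline, and a binomial cascade) can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that words are split on whitespace, and it uses the textbook deterministic binomial cascade with weights p and 1 - p, for which D(q) = -log2(p^q + (1 - p)^q) / (q - 1), rather than the random Cantor set of the paper. All function names are hypothetical.

```python
import math
import random


def word_length_series(text):
    """Map a text to a one-dimensional series of word lengths.

    Assumption: words are whitespace-separated tokens; the paper's
    tokenization rules are not given in the abstract.
    """
    return [len(word) for word in text.split()]


def shuffled(series, seed=0):
    """Return a shuffled copy of the series.

    Shuffling destroys the ordering (correlations) but keeps the
    distribution of values, which is why it serves as a baseline.
    """
    rng = random.Random(seed)
    out = list(series)
    rng.shuffle(out)
    return out


def binomial_cascade_dq(p, q):
    """Generalized dimension D(q) of a deterministic binomial cascade.

    Assumption: weights p and 1 - p with 0 < p < 1, split in two at each
    step. This is a standard illustrative model, not the paper's one.
    """
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1 gives the information dimension.
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return -math.log2(p ** q + (1 - p) ** q) / (q - 1)
```

With equal weights (p = 0.5) the cascade is uniform and D(q) = 1 for every q; unequal weights make D(q) decrease with q, which is the signature of multifractality that the D(q) curves are meant to capture.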
Pages: 1349-1357
Number of pages: 9