Ontologies and tag-statistics

Cited by: 4
Authors
Tibely, Gergely [2]
Pollner, Peter [1]
Vicsek, Tamas [1, 2]
Palla, Gergely [1]
Affiliations
[1] Stat & Biol Phys Res Grp HAS, H-1117 Budapest, Hungary
[2] Eotvos Lorand Univ, Dept Biol Phys, H-1117 Budapest, Hungary
Source
NEW JOURNAL OF PHYSICS | 2012 / Vol. 14
Keywords
COLLECTIVE DYNAMICS; COMMUNITY STRUCTURE; NETWORKS; IDENTIFICATION; EMERGENCE
DOI
10.1088/1367-2630/14/5/053009
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
Due to the increasing popularity of collaborative tagging systems, the research on tagged networks, hypergraphs, ontologies, folksonomies and other related concepts is becoming an important interdisciplinary area with great potential and relevance for practical applications. In most collaborative tagging systems the tagging by the users is completely 'flat', while in some cases they are allowed to define a shallow hierarchy for their own tags. However, usually no overall hierarchical organization of the tags is given, and one of the interesting challenges of this area is to provide an algorithm generating the ontology of the tags from the available data. In contrast, there are also other types of tagged networks available for research, where the tags are already organized into a directed acyclic graph (DAG), encapsulating the 'is a sub-category of' type of hierarchy between each other. In this paper, we study how this DAG affects the statistical distribution of tags on the nodes marked by the tags in various real networks. The motivation for this research was the fact that understanding the tagging based on a known hierarchy can help in revealing the hidden hierarchy of tags in collaborative tagging systems. We analyse the relation between the tag-frequency and the position of the tag in the DAG in two large sub-networks of the English Wikipedia and a protein-protein interaction network. We also study the tag co-occurrence statistics by introducing a two-dimensional (2D) tag-distance distribution preserving both the difference in the levels and the absolute distance in the DAG for the co-occurring pairs of tags. Our most interesting finding is that the local relevance of tags in the DAG (i.e. their rank or significance as characterized by, e.g., the length of the branches starting from them) is much more important than their global distance from the root.
Furthermore, we also introduce a simple tagging model based on random walks on the DAG, capable of reproducing the main statistical features of tag co-occurrence. This model has high potential for further practical applications, e.g., it can provide the starting point for a benchmark system in ontology retrieval or it may help pinpoint unusual correlations in the co-occurrence of tags.
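As an illustration of the 2D tag-distance distribution mentioned in the abstract, the following Python sketch counts co-occurring tag pairs by (level difference, distance in the DAG). It is not the authors' code. It assumes that a tag's level is its shortest directed distance from the root and that the "absolute distance" is the shortest path length with link directions ignored; the abstract does not fix either definition. The function names and the toy data are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def levels_from_root(children, root):
    """Assumed level definition: shortest directed distance from the root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        tag = queue.popleft()
        for child in children.get(tag, ()):
            if child not in level:
                level[child] = level[tag] + 1
                queue.append(child)
    return level


def undirected_distance(children, source, target):
    """Assumed distance definition: shortest path ignoring link direction."""
    neighbours = {}
    for parent, kids in children.items():
        for kid in kids:
            neighbours.setdefault(parent, set()).add(kid)
            neighbours.setdefault(kid, set()).add(parent)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        tag = queue.popleft()
        if tag == target:
            return dist[tag]
        for nxt in neighbours.get(tag, ()):
            if nxt not in dist:
                dist[nxt] = dist[tag] + 1
                queue.append(nxt)
    return None  # the two tags are not connected


def tag_distance_histogram(children, root, tagged_nodes):
    """Count co-occurring tag pairs by (level difference, DAG distance)."""
    level = levels_from_root(children, root)
    hist = Counter()
    for tags in tagged_nodes.values():
        for a, b in combinations(sorted(tags), 2):
            level_diff = abs(level[a] - level[b])
            hist[(level_diff, undirected_distance(children, a, b))] += 1
    return hist


# Hypothetical toy DAG (parent -> sub-categories) and tagged nodes.
children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
tagged_nodes = {
    "n1": {"A1", "A2"},  # siblings
    "n2": {"A", "A1"},   # parent and child
    "n3": {"A1", "B1"},  # tags from different branches
    "n4": {"A1", "A2"},
}
hist = tag_distance_histogram(children, "root", tagged_nodes)
```

In this toy case the sibling pair falls into the bin (0, 2) twice, the parent-child pair into (1, 1), and the cross-branch pair into (0, 4). Recomputing the undirected neighbour map for every pair keeps the sketch short; for large DAGs it would be built once and reused.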
Pages: 22
Related papers
50 items in total
  • [1] Improving Tag Clouds with Ontologies and Semantics
    Rinaldi, Antonio M.
    2012 23RD INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA), 2012, : 139 - 143
  • [2] Using Ontologies for Official Statistics: The Istat Experience
    Aracri, Raffaella M.
    Radini, Roberta
    Scannapieco, Monica
    Tosco, Laura
    CURRENT TRENDS IN WEB ENGINEERING, ICWE 2017, 2018, 10544 : 166 - 172
  • [3] Improving Automatic Semantic Tag Recommendation through Fuzzy Ontologies
    Alexopoulos, Panos
    Wallace, Manolis
    2012 SEVENTH INTERNATIONAL WORKSHOP ON SEMANTIC AND SOCIAL MEDIA ADAPTATION AND PERSONALIZATION (SMAP 2012), 2012, : 37 - 41
  • [4] Integrating Tagging into the Web of Data: Overview and Combination of Existing Tag Ontologies
    Kim, Hak-Lae
    Scerri, Simon
    Passant, Alexandre
    Breslin, John G.
    Kim, Hong-Gee
    JOURNAL OF INTERNET TECHNOLOGY, 2011, 12 (04): : 561 - 571
  • [5] Tag refinement in an image folksonomy using visual similarity and tag co-occurrence statistics
    Lee, Sihyoung
    De Neve, Wesley
    Ro, Yong Man
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2010, 25 (10) : 761 - 773
  • [6] US statistics TAG to ISO/TC 69 seeks new members
    Unknown
    QUALITY PROGRESS, 2001, 34 (06) : 24 - 25
  • [7] The Research on Electronic Tag Quantity Estimate Arithmetic Based on Probability Statistics
    Zhou, Lin
    Li, Zhen
    Chen, Yingmei
    Li, Tong
    INTERNET OF THINGS-BK, 2012, 312 : 254 - +
  • [8] Formalized Conflicts Detection Based on the Analysis of Multiple Emails: An Approach Combining Statistics and Ontologies
    Zakaria, Chahnez
    Cure, Olivier
    Salzano, Gabriella
    Smaili, Kamel
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2009, PT 1, 2009, 5870 : 94 - +
  • [9] Spatial signatures for geographic feature types: examining gazetteer ontologies using spatial statistics
    Zhu, Rui
    Hu, Yingjie
    Janowicz, Krzysztof
    McKenzie, Grant
    TRANSACTIONS IN GIS, 2016, 20 (03) : 333 - 355
  • [10] Statistics of protein epitope signature tag design within the Swedish human proteom resource
    Szigyarto, C. Al-Khalili
    Berglund, L.
    Sivertsson, A.
    Lindskog, M.
    Rockberg, J.
    Westberg, J.
    Agaton, L.
    Persson, A.
    Uhlen, M.
    MOLECULAR & CELLULAR PROTEOMICS, 2004, 3 (10) : S91 - S91