CLUO: WEB-SCALE TEXT MINING SYSTEM FOR OPEN SOURCE INTELLIGENCE PURPOSES

Cited by: 4
Authors
Maciolek, Przemyslaw [1, 2]
Dobrowolski, Grzegorz [2]
Affiliations
[1] Luminis Res Sp Zoo, Rzeszow, Poland
[2] AGH Univ Sci & Technol, Krakow, Poland
Source
COMPUTER SCIENCE-AGH | 2013 / Vol. 14 / No. 01
Keywords
Text Mining; Big Data; OSINT; Natural Language Processing; Monitoring
DOI
10.7494/csci.2013.14.1.45
Chinese Library Classification Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
The amount of textual information published on the Internet is estimated to run into billions of web pages, blog posts, comments, social media updates and other items. Analyzing data at this scale requires a high degree of distribution, of both data and computation. This is especially true for the complex algorithms often used in text mining tasks. The paper presents a prototype implementation of CLUO, an Open Source Intelligence (OSINT) system that extracts and analyzes significant quantities of openly available information.
Pages: 45-62
Page count: 18