Big Data Analytics, Text Mining and Modern English Language

Cited by: 4
Authors
Alam, Saqib [1]
Yao, Nianmin [1]
Affiliations
[1] Dalian Univ Technol, Dept Elect Informat & Elect Engn, Black Bldg, Linggong Rd 2, Dalian 116024, Peoples R China
Keywords
Text mining; TF-IDF; English language; Speed of linguistic changes; Document frequency; Evolution
DOI
10.1007/s10723-018-9452-4
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The modern English language took centuries to develop from old English. The old English word 'hath', for example, took centuries to become 'have' in modern English. Had these changes not occurred, modern words could not have emerged. A text written in the fifteenth century can be difficult to read, and a text from a couple of centuries earlier reads like a different language. In this paper, we use text mining techniques to analyze old and modern English. We introduce the Common-Words Counting algorithm, which identifies common words of the 15th century whose use diminishes gradually in later centuries. We compute the speed of linguistic change and identify the reasons behind it. For this purpose, 34,000 books by different authors, dating from the 15th to the 19th century, were downloaded from Project Gutenberg and grouped by century into five categories. We selected the most common words from the 15th-century books and calculated their frequencies in the other centuries. We calculated the sum of the Term Frequency-Inverse Document Frequency (TF-IDF) of these words and showed that their frequencies decreased from the 15th to the 19th century, with some words, such as 'doth', 'hath', 'punt', 'guise' and 'selfe', even disappearing in later centuries. We calculated the speed at which words changed using the slope formula. We showed that words changed during each century, with the speed of change lowest during the 16th-17th centuries and highest during the 18th-19th centuries, which indicates that old words or their spellings were replaced by modern ones during the 18th-19th centuries. Industrialization, modernization, and British Empire invasions were the key factors that changed old English into modern English.
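The pipeline the abstract outlines (pick the most common words of the earliest century, sum their TF-IDF per century, then take the slope between consecutive centuries) might be sketched roughly as follows. This is a minimal illustration, not the authors' Common-Words Counting implementation: the tokenizer, the smoothed IDF variant, the per-book averaging, the function names, and the toy three-century corpus are all assumptions made here for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens; the paper's tokenizer is not specified.
    return re.findall(r"[a-z']+", text.lower())


def top_words(docs, k):
    # The k most frequent tokens across the given documents.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return [word for word, _ in counts.most_common(k)]


def tfidf_by_century(corpus, words):
    # corpus maps a century number to a list of book texts.
    tokenized = {c: [tokenize(d) for d in docs] for c, docs in corpus.items()}
    all_docs = [tokens for docs in tokenized.values() for tokens in docs]
    n_docs = len(all_docs)

    # Document frequency of each tracked word over the whole corpus.
    df = Counter()
    for tokens in all_docs:
        df.update(set(tokens) & set(words))

    # Assumption: smoothed IDF, so words present in every book keep a nonzero weight.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in words}

    totals = {}
    for century, docs in tokenized.items():
        total = 0.0
        for tokens in docs:
            if not tokens:
                continue
            counts = Counter(tokens)
            total += sum(counts[w] / len(tokens) * idf[w] for w in words)
        # Assumption: average per book so centuries with more books stay comparable.
        totals[century] = total / max(len(docs), 1)
    return totals


def slopes(totals):
    # Slope formula (y2 - y1) / (x2 - x1) between consecutive centuries.
    centuries = sorted(totals)
    return {
        (a, b): (totals[b] - totals[a]) / (b - a)
        for a, b in zip(centuries, centuries[1:])
    }


# Toy corpus invented for illustration only; it is not data from the paper.
corpus = {
    15: ["he hath a selfe and doth speak", "she doth know and hath seen"],
    16: ["he hath a self and does speak"],
    19: ["he has a self and does speak"],
}
words = top_words(corpus[15], 3)
totals = tfidf_by_century(corpus, words)
rates = slopes(totals)
```

In this sketch a negative slope between two centuries means the tracked 15th-century words carry less TF-IDF weight in the later century, and a steeper negative slope corresponds to faster change.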
Pages: 357-366
Page count: 10