From subtitles to substantial metadata: examining characteristics of named entities and their role in indexing

Author
Anne-Stine Ruud Husevåg
Affiliation
[1] Oslo Metropolitan University
Keywords
Named entity recognition; Multimedia indexing; Metadata; Audiovisual archives
DOI
Not available
Abstract
This paper explores the possible role of named entities extracted from subtitle text in the automatic indexing of TV programs. It does so by analyzing entity types, name density and name frequencies in subtitles and metadata records from different genres of TV programs. The name density in metadata records is much higher than in subtitles, and named entities with high frequencies in the subtitles are more likely to be mentioned in the metadata records. Further analysis of the metadata records indicates that the use of a named entity in metadata increases with its frequency in the subtitles. The most substantial difference was between frequencies of one and two: named entities with a frequency of two in the subtitles were twice as likely to be present in the metadata records as those with a frequency of one. Personal names, geographical names and names of organizations were the most prominent entity types in both the news subtitles and the news metadata, while persons, creative works and locations were the most prominent in culture programs. It is not possible to extract all the named entities in the manually created metadata records by applying named entity recognition to the subtitles for the same programs, but it is possible to find a large subset of named entities for some categories in certain genres. The results reported in this paper show that subtitles are a good source of personal names for all the genres covered in the study, and of creative works for literature programs. In total, 38% of the named entities in metadata records for news programs were also present in the subtitles for the same programs, compared with 32% for literature programs and 21% for talk shows.
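The measures named in the abstract (name density, name frequency, and the share of metadata entities also found in subtitles) can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the definition of name density as entity mentions per token, the exact string matching of names, and the toy data are all assumptions made for illustration, and the entity lists are assumed to come from an upstream named entity recognition step.

```python
from collections import Counter


def name_density(entity_mentions, token_count):
    """Entity mentions per token (an assumed definition of name density)."""
    if token_count == 0:
        return 0.0
    return len(entity_mentions) / token_count


def metadata_coverage(subtitle_mentions, metadata_entities):
    """Share of distinct metadata entities that also occur in the subtitles."""
    metadata_set = set(metadata_entities)
    if not metadata_set:
        return 0.0
    found = metadata_set & set(subtitle_mentions)
    return len(found) / len(metadata_set)


def presence_by_frequency(subtitle_mentions, metadata_entities):
    """For each subtitle frequency, the share of entities with that
    frequency that also appear in the metadata record."""
    frequencies = Counter(subtitle_mentions)
    metadata_set = set(metadata_entities)
    totals = Counter()
    hits = Counter()
    for entity, count in frequencies.items():
        totals[count] += 1
        if entity in metadata_set:
            hits[count] += 1
    return {count: hits[count] / totals[count] for count in sorted(totals)}


# Hypothetical toy data, not taken from the study.
subtitle_mentions = ["Oslo", "Ibsen", "Ibsen", "NRK", "Bergen", "Bergen"]
metadata_entities = ["Ibsen", "Bergen", "Hamsun", "Oslo"]

density = name_density(subtitle_mentions, token_count=60)
coverage = metadata_coverage(subtitle_mentions, metadata_entities)
by_frequency = presence_by_frequency(subtitle_mentions, metadata_entities)
```

In practice, name matching would likely need normalization (case, spelling variants, name forms) before sets are compared; exact string equality is used here only to keep the sketch short.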
Pages: 241-251
Page count: 10
Related papers
4 in total
  • [1] From subtitles to substantial metadata: examining characteristics of named entities and their role in indexing
    Husevag, Anne-Stine Ruud
    [J]. INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, 2019, 20 (03) : 241 - 251
  • [2] Named Entities as a Metadata Resource for Indexing and Searching Information
    Izo, Flavio
    Oliveira, Elias
    Badue, Claudine
    [J]. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418 : 838 - 848
  • [3] Exploring the Role of Named Entities in Automatic Indexing
    Husevag, Anne-Stine Ruud
    [J]. CHIIR'17: PROCEEDINGS OF THE 2017 CONFERENCE HUMAN INFORMATION INTERACTION AND RETRIEVAL, 2017, : 393 - 394
  • [4] FROM LINGUISTICS TO ONTOLOGIES The Role of Named Entities in the Conceptualisation Process
    Omrane, Nouha
    Nazarenko, Adeline
    Szulman, Sylvie
    [J]. KEOD 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE ENGINEERING AND ONTOLOGY DEVELOPMENT, 2011, : 249 - 254