Heavy-tailed Representations, Text Polarity Classification & Data Augmentation

Authors
Jalalzai, Hamid [1]
Colombo, Pierre [1]
Clavel, Chloe [1]
Gaussier, Eric [2]
Varni, Giovanna [1]
Vignon, Emmanuel [3]
Sabourin, Anne [1]
Affiliations
[1] Inst Polytech Paris, Telecom Paris, LTCI, Paris, France
[2] Univ Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
[3] IBM France, Bois Colombes, France
Keywords
MULTIVARIATE EXTREMES; NOVELTY DETECTION
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The dominant approaches to text representation in natural language rely on learning embeddings on massive corpora; these embeddings have convenient properties such as compositionality and distance preservation. In this paper, we develop a novel method to learn a heavy-tailed embedding with desirable regularity properties regarding the distributional tails, which makes it possible to analyze points far from the distribution bulk using the framework of multivariate extreme value theory. In particular, we obtain a classifier dedicated to the tails of the proposed embedding; it exhibits a scale invariance property, which we exploit in a novel text generation method for label-preserving dataset augmentation. Experiments on synthetic and real text data show the relevance of the proposed framework and confirm that this method generates meaningful sentences with controllable attributes, e.g. positive or negative sentiment.
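The scale invariance property mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' model: the function `angular_classifier`, the weight vector `w`, and the sample `x` are hypothetical, chosen only to show that a classifier which depends on a point solely through its direction x / ||x|| assigns the same label to every positive rescaling of that point. That is the sense in which pushing a point further into the tail can preserve its label.

```python
import numpy as np

def angular_classifier(x, w):
    """Hypothetical tail classifier: uses x only through its angle x / ||x||."""
    theta = x / np.linalg.norm(x)          # project onto the unit sphere
    return 1 if float(theta @ w) >= 0 else -1

# Illustrative values only (assumptions, not taken from the paper).
w = np.array([0.4, -1.0, 0.3, 0.2])        # hypothetical linear weights
x = np.array([3.0, 0.5, 8.0, 1.2])         # a point assumed to lie in the tail

# Rescaling x by any lam > 0 leaves its angle, and hence its label, unchanged.
labels = [angular_classifier(lam * x, w) for lam in (1.0, 2.0, 10.0)]
print(labels)
```

Under these assumptions, the rescaled points `lam * x` play the role of augmented samples that keep the label of the original point.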
Pages: 13