Companion report to the Ph.D. dissertation: "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations"
French title: "Modèles d’Alignement Probabilistes Génératifs pour les Mots et Sous-mots: une Exploration Systématique des Limites et Potentialités des Paramétrisations Neuronales"
Abstract
This is a companion document to the Ph.D. dissertation "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations" [Ngo Ho, 2021]. It contains an exhaustive collection of graphs and tables related to the analysis of various aspects of automatic word alignment, such as aligned/unaligned words, rare/unknown words, function/content words, and word order divergences, for six language pairs: English with French, German, Romanian, Czech, Japanese and Vietnamese. We mostly analyze statistical word alignment models (Giza++ and Fastalign) as well as several variants based on neural models: IBM-style word alignment models, including context-independent, contextual, and character-based models, and variants of a fully generative neural model based on variational autoencoders. We also provide an in-depth analysis of Byte-Pair Encoding, a subword tokenization algorithm. For details on these methods, please refer to the thesis.
Domains
Computer Science [cs]
Main file
Companion report to the PhD dissertation - Generative Probabilistic Alignment Models for Words and Subwords - a Systematic Exploration of the Limits and Potentials of Neural Parametrizations.pdf (67.04 MB)
Origin: Files produced by the author(s)