Unknown Claims: Generation of Fact-Checking Training Examples from Unstructured and Structured Data

Jean-Flavien Bussotti; Luca Ragazzi; Giacomo Frisoni; Gianluca Moro; Paolo Papotti

Conference paper, Year: 2024

Jean-Flavien Bussotti

Role: Author
PersonId : 1436946

Eurecom [Sophia Antipolis]

Luca Ragazzi

Role: Author
PersonId : 1436947

Department of Computer Science and Engineering, University of Bologna

Giacomo Frisoni

Role: Author
PersonId : 1436948

Department of Computer Science and Engineering, University of Bologna

Gianluca Moro

Role: Author
PersonId : 1436949

Department of Computer Science and Engineering, University of Bologna

Paolo Papotti

Role: Author
PersonId : 1436950

Eurecom [Sophia Antipolis]

Abstract

Computational fact-checking (FC) relies on supervised models to verify claims against given evidence, which requires a resource-intensive process of annotating large volumes of training data. We introduce UNOWN, a novel framework that automatically generates training instances for FC systems from both textual and tabular content. UNOWN selects relevant evidence and generates supporting and refuting claims with advanced negation artifacts. Designed to be flexible, UNOWN accommodates various strategies for evidence selection and claim generation. We comprehensively evaluate UNOWN on both text-only and table+text benchmarks, including FEVEROUS, SCIFACT, and MMFC, a new multi-modal FC dataset. Our results show that UNOWN examples are of comparable quality to expert-labeled data, even enabling models to achieve up to 5% higher accuracy. The code, data, and models are available at https://github.com/disi-unibo-nlp/unown

Domains

Computer Science [cs]

Main file

1411_Unknown_Claims_Generation.pdf (6.34 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-04768749

Submitted on: Wednesday, November 6, 2024, 10:00:37

Last modified on: Wednesday, November 13, 2024, 03:33:19

Dates and versions

hal-04768749, version 1 (2024-11-06)

Identifiers

HAL Id: hal-04768749, version 1

Cite

Jean-Flavien Bussotti, Luca Ragazzi, Giacomo Frisoni, Gianluca Moro, Paolo Papotti. Unknown Claims: Generation of Fact-Checking Training Examples from Unstructured and Structured Data. EMNLP 2024, Conference on Empirical Methods in Natural Language Processing, ACL, Nov 2024, Miami, United States. ⟨hal-04768749⟩

Collections

EURECOM ANR
