When adversarial attacks become interpretable counterfactual explanations

Mathieu Serrurier; Franck Mamalet; Thomas Fel; Louis Béthune; Thibaut Boissin

Preprint, working paper. Year: 2022

Mathieu Serrurier

Role: Author
PersonId: 740736
IdHAL: mathieu-serrurier
ORCID: 0000-0002-8959-1091
IdRef: 116980206

Franck Mamalet

Role: Author
PersonId: 751026
IdHAL: franck-mamalet

Thomas Fel

Role: Author
PersonId: 750192
IdHAL: thomas-fel
ORCID: 0000-0002-2202-4615

Louis Béthune

Role: Author
PersonId: 1173856
IdHAL: louis-bethune

Thibaut Boissin

Role: Author
PersonId: 1196816
IdHAL: thibaut-boissin

Abstract

We argue that, when a 1-Lipschitz neural network is trained with the dual loss of an optimal transport problem, the gradient of the model gives both the direction of the transport plan and the direction of the closest adversarial attack. Traveling along the gradient to the decision boundary is then no longer an adversarial attack but a counterfactual explanation that explicitly transports the input from one class to the other. Through extensive experiments on XAI metrics, we find that the simple saliency map method, applied to such networks, becomes a reliable explanation and outperforms state-of-the-art explanation approaches on unconstrained models. These networks were already known to be certifiably robust; we prove that they are also explainable with a fast and simple method.
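The idea of traveling along the gradient to the decision boundary can be illustrated with a minimal sketch. It is not the authors' model or code: the classifier `f` below is a hand-written signed distance to a circle, chosen only because it is 1-Lipschitz with a unit-norm gradient, and the names `f`, `grad`, `counterfactual`, and `overshoot` are hypothetical. For any 1-Lipschitz function, `|f(x)|` is a lower bound on the distance from `x` to the decision boundary; in this toy case the bound is tight, so a step of length slightly more than `|f(x)|` against the gradient crosses to the other class.

```python
import math

def f(x, center=(0.0, 0.0), radius=1.0):
    """Toy 1-Lipschitz classifier (assumption, not a trained network):
    signed distance to a circle. Positive outside, negative inside."""
    return math.hypot(x[0] - center[0], x[1] - center[1]) - radius

def grad(func, x, eps=1e-6):
    """Central finite-difference gradient of func at x."""
    g = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        g.append((func(hi) - func(lo)) / (2 * eps))
    return g

def counterfactual(func, x, overshoot=1.1):
    """Step from x along the normalized gradient direction, toward the
    other class, by overshoot * |func(x)|. With overshoot > 1 and a tight
    Lipschitz bound, the result lies just past the decision boundary."""
    g = grad(func, x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    value = func(x)
    # Subtracting value * direction moves downhill when value > 0
    # and uphill when value < 0, so both classes are handled.
    return [xi - overshoot * value * gi / norm for xi, gi in zip(x, g)]

x = [3.0, 0.0]                 # classified as "outside" (f > 0)
x_cf = counterfactual(f, x)    # lands just inside the circle
saliency = [abs(gi) for gi in grad(f, x)]  # plain gradient saliency map
```

Here the saliency map is simply the absolute value of the same gradient that produces the counterfactual, which is the link the abstract draws between the two.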

Subject areas

Machine Learning [stat.ML]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Computer Vision and Pattern Recognition [cs.CV]

Main file

hkr_explainability_Arxiv.pdf (28.44 MB)

Origin: Files produced by the author(s)

Contributor: Franck Mamalet

https://hal.science/hal-03693355

Submitted on: Friday, June 10, 2022, 14:21:44

Last modified on: Wednesday, June 28, 2023, 03:58:21

Dates and versions

hal-03693355, version 1 (2022-06-10)

hal-03693355, version 2 (2023-06-20)

hal-03693355, version 3 (2024-02-02)

Identifiers

HAL Id: hal-03693355, version 1
arXiv: 2206.06854

Cite

Mathieu Serrurier, Franck Mamalet, Thomas Fel, Louis Béthune, Thibaut Boissin. When adversarial attacks become interpretable counterfactual explanations. 2022. ⟨hal-03693355v1⟩
