Over-Approximating Neural Networks for Verification, Robustness and Explainability

Abstract

In this thesis we present several applications of over-approximations of neural networks to improve trust in AI systems, in three areas: verification, robust training, and (formal) explainability.

Remarks

The template used is Kaobook. My academic sibling Denis Mazzucato also used it; you may find his source files useful.

Long abstract

The success of deep learning since the early 2010s has led to its adoption in a wide range of applications, from image classification to natural language processing. However, deploying AI models — especially in safety-critical applications — requires strong behavioral guarantees as well as means to understand and explain their predictions. Robustness — the ability of a model to maintain performance under small input changes — is an important safety requirement and helps in formally explaining model predictions.

Over-approximation techniques, introduced by the machine learning and formal methods communities, have emerged as a key tool to verify neural networks and to train them with provable robustness goals. By bounding the range of possible network outputs, they make it possible to prove desirable properties for all possible inputs within a specified input domain. In this thesis we investigate how over-approximations can be effectively exploited to further improve the verification, training, and formal explainability of neural networks.
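To make the idea of bounding network outputs concrete, here is a minimal sketch of interval bound propagation (one common over-approximation) through a single linear layer followed by a ReLU. The weights and the input box are toy values chosen for illustration, not from any network in the thesis.

```python
import numpy as np

def linear_bounds(lo, hi, W, b):
    # Split W into positive and negative parts so each output bound is sound:
    # positive weights take the matching bound, negative weights the opposite one.
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    return out_lo, out_hi

def relu_bounds(lo, hi):
    # ReLU is monotone, so it maps an interval to an interval.
    return np.maximum(lo, 0), np.maximum(hi, 0)

# Input box: all x with 0.9 <= x0 <= 1.1 and -0.1 <= x1 <= 0.1.
lo, hi = np.array([0.9, -0.1]), np.array([1.1, 0.1])
W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.0, -0.5])

lo, hi = relu_bounds(*linear_bounds(lo, hi, W, b))
# Any property that holds on [lo, hi] is proved for every input in the box;
# the computed interval may strictly over-approximate the true output range.
```

Any property checked on the resulting output interval then holds for all inputs in the domain, which is exactly what makes such bounds usable for verification and for certified training objectives.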

Such over-approximations are essential to make the analysis tractable for neural networks of moderate scale, but they also tend to be imprecise, in particular for networks trained with no robustness goals. In a first contribution we leverage key information from over-approximations to efficiently tighten bounds when verifying trained networks with low-dimensional input spaces. This technique is implemented in PyRAT, a neural network verification tool, and contributed to its success in several benchmarks of the latest edition of the International Verification of Neural Networks Competition (VNN-COMP 2024).

A second contribution shows that certified training (using over-approximations to minimize a sound bound on the worst-case loss) can be combined with adversarial training (computing the loss over perturbed training samples) to improve empirical robustness and prevent catastrophic overfitting, a failure mode of single-step adversarial training, under specific experimental settings.
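One natural way to combine the two objectives is a convex combination of the certified and adversarial losses; the sketch below shows this shape only. The function name, the weight `beta`, and the convex combination itself are illustrative assumptions, not the thesis's exact scheme.

```python
def mixed_loss(certified_loss, adversarial_loss, beta):
    # beta = 1.0 recovers pure certified training, beta = 0.0 pure adversarial
    # training; intermediate values trade one objective against the other.
    return beta * certified_loss + (1.0 - beta) * adversarial_loss

# Example: a loose certified bound (2.0) mixed with a tighter adversarial
# loss (0.5), weighted 25% towards the certified term.
loss = mixed_loss(2.0, 0.5, 0.25)
```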

Finally, we propose to use over-approximations to train for formal explainability. To avoid the trivial case where robustness voids all explanations, we introduce Feature Subset Certified Training, a new scheme that enforces robustness only over selected subsets of input features. This is a first step towards better trade-offs between the accuracy and conciseness of explanations.

Together, these contributions illustrate how over-approximations can be leveraged towards better robustness and explainability, supporting the development of safer and more trustworthy AI systems.