On Using Certified Training towards Empirical Robustness

Abstract

This paper investigates whether certified training methods—typically designed to provide formal robustness guarantees—can also yield strong empirical robustness against adversarial attacks. Focusing on the expressive-loss variant of Interval Bound Propagation (Exp-IBP), the authors show that certified training can prevent catastrophic overfitting, a common failure mode in single-step adversarial training. Surprisingly, when properly tuned, certified training can match or even outperform certain multi-step adversarial training baselines while remaining computationally efficient. The paper also introduces a lightweight regularizer on network over-approximations that achieves similar empirical robustness improvements at significantly lower computational cost. These findings challenge the conventional belief that certified defenses must sacrifice empirical performance and suggest that certified training can serve as a practical and cost-effective alternative for robust model development.
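As background for readers unfamiliar with certified training: Interval Bound Propagation (IBP) pushes an elementwise box around the input through the network, layer by layer, to obtain sound bounds on the outputs. The following is a minimal NumPy sketch of that idea on a toy two-layer network; the function names and weights are illustrative, not the paper's code.

```python
import numpy as np

def ibp_linear(lb, ub, W, b):
    """Propagate elementwise bounds [lb, ub] through x -> W @ x + b."""
    c, r = (lb + ub) / 2.0, (ub - lb) / 2.0  # center / radius form
    c_out = W @ c + b
    r_out = np.abs(W) @ r                    # radius grows with |W|
    return c_out - r_out, c_out + r_out

def ibp_relu(lb, ub):
    """ReLU is monotone, so intervals map through it elementwise."""
    return np.maximum(lb, 0.0), np.maximum(ub, 0.0)

# Output bounds for a toy 2-layer network under an L-inf ball of radius eps.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x, eps = rng.normal(size=3), 0.1

lb, ub = ibp_linear(x - eps, x + eps, W1, b1)
lb, ub = ibp_relu(lb, ub)
lb, ub = ibp_linear(lb, ub, W2, b2)
```

Certified training minimizes a loss computed from such bounds (or, for expressive losses like Exp-IBP, a combination of the bound-based and standard losses), which is what ties the learned network to the over-approximation the paper regularizes.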

Publication
In Transactions on Machine Learning Research (TMLR).


Related