Detecting adversarial manipulation using inductive Venn-ABERS predictors
[Abstract] Inductive Venn-ABERS predictors (IVAPs) are a type of probabilistic predictor with the theoretical guarantee that their predictions are perfectly calibrated. In this paper, we propose to exploit this calibration property for the detection of adversarial examples in binary classification tasks. By rejecting predictions if the uncertainty of the IVAP is too high, we obtain an algorithm that is both accurate on the original test set and resistant to adversarial examples. This robustness is observed on adversarial examples generated for the underlying model as well as on those generated by taking the IVAP into account. The method appears to offer competitive robustness compared to the state of the art in adversarial defense, yet it is computationally much more tractable. © 2020 The Author(s). Published by Elsevier B.V.
[Publication date] 2020-11-27 [Publishing institution]
[Validity level] [Subject classification]
[Keywords] Adversarial robustness; Conformal prediction; Supervised learning; Deep learning [Timeliness]
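
The rejection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard IVAP construction: the test score is added to a held-out calibration set once with label 0 and once with label 1, an isotonic regression is fitted each time, and the two fitted values at the test score form a probability interval [p0, p1]. The function names, the pure-Python pool-adjacent-violators routine, and the threshold `beta = 0.3` are assumptions made for illustration; a prediction is rejected when the interval width p1 - p0 exceeds `beta`.

```python
from collections import defaultdict


def isotonic_fit(scores, labels):
    """Fit a non-decreasing step function with pool-adjacent-violators.

    Returns a dict mapping each unique score to its fitted value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s, y in zip(scores, labels):
        sums[s] += y
        counts[s] += 1
    xs = sorted(sums)
    blocks = []  # each block: [label sum, weight, number of unique scores]
    for x in xs:
        blocks.append([sums[x], counts[x], 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, w, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += w
            blocks[-1][2] += n
    fitted = []
    for s, w, n in blocks:
        fitted.extend([s / w] * n)
    return dict(zip(xs, fitted))


def ivap_interval(cal_scores, cal_labels, score):
    """Return (p0, p1): fitted values at `score` when it is labelled 0 and 1."""
    xs = list(cal_scores) + [score]
    p0 = isotonic_fit(xs, list(cal_labels) + [0])[score]
    p1 = isotonic_fit(xs, list(cal_labels) + [1])[score]
    return p0, p1


def predict_or_reject(cal_scores, cal_labels, score, beta=0.3):
    """Return 0 or 1, or None when the interval is wider than beta.

    beta is a hypothetical threshold chosen only for this illustration.
    """
    p0, p1 = ivap_interval(cal_scores, cal_labels, score)
    if p1 - p0 > beta:
        return None  # too uncertain: treat the input as suspicious
    p = p1 / (1.0 - p0 + p1)  # common way to merge the interval into one value
    return int(p > 0.5)
```

In this sketch the scores would come from the underlying model (for example its output for the positive class) on calibration data that was not used for training. A score falling in a region where the calibration labels give no clear evidence produces a wide interval and is rejected, which is the behaviour the abstract relies on to flag adversarial inputs.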