
Revisiting Residual Networks for Adversarial Robustness: An Architectural Perspective





Revisiting Residual Networks for Adversarial Robustness: An Architectural Perspective [arXiv]


This work presents a holistic study of the impact of architectural choice on adversarial robustness.

(Left) Impact of architectural components on adversarial robustness on CIFAR-10, relative to that of adversarial training methods. (Right) Chronological progress of SotA robust accuracy against AutoAttack on CIFAR-10, without additional data, under $\ell_{\infty}$ perturbations of $\epsilon=8/255$.

Impact of Block-level Design

The design of a block primarily comprises its topology, type of convolution and kernel size, choice of activation, and normalization. We examine these elements independently through controlled experiments and propose a novel residual block, dubbed RobustResBlock, based on our observations. An overview of RobustResBlock is provided below:


Table 1. White-box adversarial robustness of WRN with RobustResBlock

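| Configuration | $^{\#}\rm{P}$ | $^{\#}\rm{F}$ | $\rm{PGD}^{20}$ | $\rm{CW}^{40}$ | Checkpoint |
| --- | --- | --- | --- | --- | --- |
| $D=4$, $W=10$ | 39.6M | 6.00G | 57.70 | 54.71 | [BaiduDisk] |
| $D=5$, $W=12$ | 70.5M | 10.6G | 58.46 | 55.56 | [BaiduDisk] |
| $D=7$, $W=14$ | 133M | 19.6G | 59.41 | 56.62 | [BaiduDisk] |
| $D=11$, $W=16$ | 270M | 39.3G | 60.48 | 57.78 | [BaiduDisk] |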

Impact of Network-level Design

Independent Scaling by Depth ( $D_{1}$ : $D_2$ : $D_3$ = $2$ : $2$ : $1$ )

We allow the depth of each stage ( $D_{i\in\{1,2,3\}}$ ) to vary among $\{2, 3, 4, 5, 7, 9, 11\}$. Details and pre-trained checkpoints for all $7^{3} = 343$ depth settings are available from here.


Independent Scaling by Width ( $W_{1}$ : $W_2$ : $W_3$ = $2$ : $2.5$ : $1$ )

We allow the width (in terms of widening factors) of each stage ( $W_{i\in\{1,2,3\}}$ ) to vary among $\{4, 6, 8, 10, 12, 14, 16, 20\}$. Details and pre-trained checkpoints for all $8^{3} = 512$ width settings are available from here.
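Both search grids (per-stage depths and per-stage widening factors) are full Cartesian products over three stages. The sketch below enumerates them to confirm the counts of 343 and 512. The names `DEPTH_CHOICES` and `WIDTH_CHOICES` are illustrative and do not come from the repository.

```python
from itertools import product

# Candidate values per stage, as listed in the text above.
DEPTH_CHOICES = [2, 3, 4, 5, 7, 9, 11]
WIDTH_CHOICES = [4, 6, 8, 10, 12, 14, 16, 20]

# Each setting is a tuple (stage 1, stage 2, stage 3).
depth_settings = list(product(DEPTH_CHOICES, repeat=3))
width_settings = list(product(WIDTH_CHOICES, repeat=3))

print(len(depth_settings))  # 343
print(len(width_settings))  # 512
```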


Interplay between Depth and Width ( $\sum D_{i}$ : $\sum W_{i}$ = $7$ : $3$ )



Table 2. Performance of independent scaling ( $D$ or $W$ ) and compound scaling ( $D\&W$ )

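| $^{\#}\rm{F}$ Target | Scale by | $D_{1}$ | $W_{1}$ | $D_{2}$ | $W_{2}$ | $D_{3}$ | $W_{3}$ | $^{\#}\rm{P}$ | $^{\#}\rm{F}$ | $\rm{PGD}^{20}$ | $\rm{CW}^{40}$ | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5G | $D$ | 5 | 10 | 5 | 10 | 2 | 10 | 24.0M | 5.25G | 56.05 | 53.14 | [BaiduDisk] |
| 5G | $W$ | 4 | 11 | 4 | 13 | 4 | 6 | 24.5M | 5.71G | 56.89 | 53.87 | [BaiduDisk] |
| 5G | $D\&W$ | 14 | 5 | 14 | 7 | 7 | 3 | 17.7M | 5.09G | 57.49 | 54.78 | [BaiduDisk] |
| 10G | $D$ | 6 | 12 | 6 | 12 | 3 | 12 | 48.5M | 9.59G | 56.42 | 53.91 | [BaiduDisk] |
| 10G | $W$ | 5 | 13 | 5 | 16 | 5 | 7 | 44.4M | 10.5G | 57.06 | 54.29 | [BaiduDisk] |
| 10G | $D\&W$ | 17 | 7 | 17 | 9 | 8 | 4 | 39.3M | 9.74G | 58.06 | 55.45 | [BaiduDisk] |
| 20G | $D$ | 9 | 14 | 8 | 14 | 4 | 14 | 90.4M | 18.6G | 57.11 | 54.48 | [BaiduDisk] |
| 20G | $W$ | 7 | 16 | 7 | 18 | 7 | 8 | 81.7M | 20.4G | 58.02 | 55.34 | [BaiduDisk] |
| 20G | $D\&W$ | 22 | 8 | 22 | 11 | 11 | 5 | 74.8M | 20.3G | 58.47 | 56.14 | [BaiduDisk] |
| 40G | $D$ | 14 | 16 | 13 | 16 | 11 | 16 | 185M | 38.8G | 57.90 | 55.79 | [BaiduDisk] |
| 40G | $W$ | 11 | 18 | 11 | 21 | 11 | 9 | 170M | 42.7G | 58.48 | 56.15 | [BaiduDisk] |
| 40G | $D\&W$ | 27 | 10 | 28 | 14 | 13 | 6 | 147M | 40.4G | 58.76 | 56.59 | [BaiduDisk] |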
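As a quick sanity check, the compound-scaled ( $D\&W$ ) rows of Table 2 can be compared against the stated $\sum D_{i}$ : $\sum W_{i}$ = $7$ : $3$ rule. The sketch below copies the depth and width values from those rows; the helper name `depth_to_width_ratio` is illustrative and is not part of the repository. The 5G row matches $7/3$ exactly, while the larger configurations come out close to it, presumably because the per-stage values have to be integers.

```python
from fractions import Fraction

# (D1, D2, D3), (W1, W2, W3) copied from the "D&W" rows of Table 2.
COMPOUND_ROWS = {
    "5G": ((14, 14, 7), (5, 7, 3)),
    "10G": ((17, 17, 8), (7, 9, 4)),
    "20G": ((22, 22, 11), (8, 11, 5)),
    "40G": ((27, 28, 13), (10, 14, 6)),
}

def depth_to_width_ratio(depths, widths):
    """Return sum(D_i) / sum(W_i) as an exact fraction."""
    return Fraction(sum(depths), sum(widths))

for target, (depths, widths) in COMPOUND_ROWS.items():
    ratio = depth_to_width_ratio(depths, widths)
    print(f"{target}: sum(D)={sum(depths)}, sum(W)={sum(widths)}, "
          f"ratio={float(ratio):.2f} (target {7 / 3:.2f})")
```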

Adversarially Robust Residual Networks (RobustResNets)

We use the proposed compound scaling rule to scale RobustResBlock and present a portfolio of adversarially robust residual networks.

Table 3. Comparison to SotA methods with additional 500K data

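| Method | Model | $^{\#}\rm{P}$ | $^{\#}\rm{F}$ | $\rm{AA}$ | Checkpoint |
| --- | --- | --- | --- | --- | --- |
| RST | WRN-28-10 | 36.5M | 5.20G | 59.53 | |
| AWP | WRN-28-10 | 36.5M | 5.20G | 60.04 | |
| HAT | WRN-28-10 | 36.5M | 5.20G | 62.50 | |
| Gowal et al. | WRN-28-10 | 36.5M | 5.20G | 62.80 | |
| Huang et al. | WRN-34-R | 68.1M | 19.1G | 62.54 | |
| Ours | RobustResNet-A1 | 19.2M | 5.11G | 63.70 | [BaiduDisk] |
| Ours | WRN-A4 | 147M | 40.4G | 65.79 | [BaiduDisk] |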

How to use

1. Use our RobustResNets

  from models.resnet import PreActResNet

  # D1..D3 and W1..W3 are placeholders for per-stage depths and widening factors
  depth = [D1, D2, D3]
  channels = [16, 16*W1, 32*W2, 64*W3]
  block_types = ['robust_res_block', 'robust_res_block', 'robust_res_block']

  # Syntax
  model = PreActResNet(
    depth_configs=depth,
    channel_configs=channels,
    block_types=block_types,
    num_classes=10,  # for CIFAR-10/SVHN/MNIST
  )

  # See Table 2 "D&W" rows for D1, D2, D3 and W1, W2, W3; examples follow
  # RobustResNet-A1
  robust_resnet_a1 = PreActResNet(
    depth_configs=[14, 14, 7],
    channel_configs=[5, 7, 3],
  )
  # RobustResNet-A2
  robust_resnet_a2 = PreActResNet(
    depth_configs=[17, 17, 8],
    channel_configs=[7, 9, 4],
  )
  # RobustResNet-A3
  robust_resnet_a3 = PreActResNet(
    depth_configs=[22, 22, 11],
    channel_configs=[8, 11, 5],
  )
  # RobustResNet-A4
  robust_resnet_a4 = PreActResNet(
    depth_configs=[27, 28, 13],
    channel_configs=[10, 14, 6],
  )

  # If you prefer to use WRN's block but with our scalings (WRN-A1)
  wrn_a1 = PreActResNet(
    depth_configs=[14, 14, 7],
    channel_configs=[5, 7, 3],
    block_types=['basic_block', 'basic_block', 'basic_block'],
  )
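The `channels = [16, 16*W1, 32*W2, 64*W3]` line above maps widening factors to per-stage channel counts. The sketch below works this out for the A1 and A4 widening factors. `stage_channels` is a hypothetical helper, not a function from the repository, and whether `channel_configs` expects widening factors or the expanded channel counts should be confirmed against `models/resnet.py`.

```python
def stage_channels(widths, base=(16, 32, 64), stem=16):
    """Expand widening factors (W1, W2, W3) into [stem, 16*W1, 32*W2, 64*W3].

    Hypothetical helper that only illustrates the formula in the snippet above.
    """
    if len(widths) != len(base):
        raise ValueError(f"expected {len(base)} widening factors, got {len(widths)}")
    return [stem] + [b * w for b, w in zip(base, widths)]

print(stage_channels([5, 7, 3]))    # [16, 80, 224, 192]
print(stage_channels([10, 14, 6]))  # [16, 160, 448, 384]
```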

2. Use only our block, RobustResBlock

  from models.resnet import RobustResBlock
  # See Table 1 above for the performance of RobustResBlock
  block = RobustResBlock(
    in_chs, out_chs,
  )

3. Use our compound scaling rule, RobustScaling, to scale your custom models

Please see examples/compound_scaling.ipynb

How to evaluate pre-trained models

  • Download the checkpoints, which should contain the following:
      - arch_xxx.log  # training log
      - arch_xxx.yaml  # configuration file
      - arch_xxx.pth  # last epoch checkpoint
      - arch_xxx_best.pth  # checkpoint for best robust acc on valid set
  • Run the following lines to evaluate adversarial robustness
  python \
    --data "path to data" \
    --config_file_path "path to configuration yaml file" \
    --checkpoint_path "path to checkpoint pth file" \
    --save_path "path to file for logging evaluation" \
    --attack_choice [FGSM/PGD/CW/AA] \
    --num_steps [1/20/40/0] \
    --batch_size 100  # batch size for evaluation, adjust according to your GPU memory
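The `--attack_choice` and `--num_steps` options above appear to pair positionally (FGSM with 1 step, PGD with 20, CW with 40, AA with 0), which matches the $\rm{PGD}^{20}$ and $\rm{CW}^{40}$ columns in the tables. A minimal sketch of assembling the flags under that assumption follows. `build_eval_args` and `ATTACK_STEPS` are hypothetical names, the file paths are placeholders, and the evaluation script itself is deliberately left out because it is not named above.

```python
import shlex

# Assumed pairing of attacks with step counts, read off the option lists above.
ATTACK_STEPS = {"FGSM": 1, "PGD": 20, "CW": 40, "AA": 0}

def build_eval_args(data, config, checkpoint, save_path, attack, batch_size=100):
    """Return the evaluation flags as a list of strings (script name not included)."""
    if attack not in ATTACK_STEPS:
        raise ValueError(f"unknown attack {attack!r}; choose from {sorted(ATTACK_STEPS)}")
    return [
        "--data", data,
        "--config_file_path", config,
        "--checkpoint_path", checkpoint,
        "--save_path", save_path,
        "--attack_choice", attack,
        "--num_steps", str(ATTACK_STEPS[attack]),
        "--batch_size", str(batch_size),
    ]

# Placeholder paths; substitute the files from the downloaded checkpoint folder.
args = build_eval_args("./data", "arch_xxx.yaml", "arch_xxx_best.pth", "eval.log", "PGD")
print(shlex.join(args))
```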


Model $^{\#}\rm{P}$ $^{\#}\rm{F}$ Clean $\rm{PGD}^{20}$ $\rm{CW}^{40}$ AA
WRN-28-10 36.5M 5.20G 84.62 55.90 53.15 51.66 [BaiduDisk]
RobNet-large-v2 33.3M 5.10G 84.57 52.79 48.94 47.48 [BaiduDisk]
AdvRush 32.6M 4.97G 84.95 56.99 53.27 52.90 [BaiduDisk]
RACL 32.5M 4.93G 83.91 55.98 53.22 51.37 [BaiduDisk]
RRN-A1 (ours) 19.2M 5.11G 85.46 58.47 55.72 54.42 [BaiduDisk]
WRN-34-12 66.5M 9.60G 84.93 56.01 53.53 51.97 [BaiduDisk]
WRN-34-R 68.1M 19.1G 85.80 57.35 54.77 53.23 [BaiduDisk]
RRN-A2 (ours) 39.0M 10.8G 85.80 59.72 56.74 55.49 [BaiduDisk]
WRN-46-14 128M 18.6G 85.22 56.37 54.19 52.63 [BaiduDisk]
RRN-A3 (ours) 75.9M 19.9G 86.79 60.10 57.29 55.84 [BaiduDisk]
WRN-70-16 267M 38.8G 85.51 56.78 54.52 52.80 [BaiduDisk]
RRN-A4 (ours) 147M 39.4G 87.10 60.26 57.90 56.29 [BaiduDisk]


Model $^{\#}\rm{P}$ $^{\#}\rm{F}$ Clean $\rm{PGD}^{20}$ $\rm{CW}^{40}$ AA
WRN-28-10 36.5M 5.20G 56.30 29.91 26.22 25.26 [BaiduDisk]
RobNet-large-v2 33.3M 5.10G 55.27 29.23 24.63 23.69 [BaiduDisk]
AdvRush 32.6M 4.97G 56.40 30.40 26.16 25.27 [BaiduDisk]
RACL 32.5M 4.93G 56.09 30.38 26.65 25.65 [BaiduDisk]
RRN-A1 (ours) 19.2M 5.11G 59.34 32.70 27.76 26.75 [BaiduDisk]
WRN-34-12 66.5M 9.60G 56.08 29.87 26.51 25.47 [BaiduDisk]
WRN-34-R 68.1M 19.1G 58.78 31.17 27.33 26.31 [BaiduDisk]
RRN-A2 (ours) 39.0M 10.8G 59.38 33.00 28.71 27.68 [BaiduDisk]
WRN-46-14 128M 18.6G 56.78 30.03 27.27 26.28 [BaiduDisk]
RRN-A3 (ours) 75.9M 19.9G 60.16 33.59 29.58 28.48 [BaiduDisk]
WRN-70-16 267M 38.8G 56.93 29.76 27.20 26.12 [BaiduDisk]
RRN-A4 (ours) 147M 39.4G 61.66 34.25 30.04 29.00 [BaiduDisk]


Model $^{\#}\rm{P}$ $^{\#}\rm{F}$ $\rm{PGD}^{20}$ $\rm{CW}^{40}$
WRN-28-10 36.5M 5.20G 52.44 50.97 [BaiduDisk]
RRN-A1 (ours) 19.2M 5.11G 57.62 56.06 [BaiduDisk]
WRN-34-12 66.5M 9.60G 52.85 51.36 [BaiduDisk]
RRN-A2 (ours) 39.0M 10.8G 58.39 56.99 [BaiduDisk]
WRN-46-14 128M 18.6G 53.67 52.95 [BaiduDisk]
RRN-A3 (ours) 75.9M 19.9G 58.81 57.60 [BaiduDisk]
WRN-70-16 267M 38.8G 54.12 50.52 [BaiduDisk]
RRN-A4 (ours) 147M 39.4G 59.01 57.85 [BaiduDisk]


Model $^{\#}\rm{P}$ $^{\#}\rm{F}$ $\rm{PGD}^{20}$ $\rm{CW}^{40}$
WRN-28-10 36.5M 5.20G 57.69 52.88 [BaiduDisk]
RRN-A1 (ours) 19.2M 5.11G 59.34 54.42 [BaiduDisk]
WRN-34-12 66.5M 9.60G 57.40 53.11 [BaiduDisk]
RRN-A2 (ours) 39.0M 10.8G 60.33 55.51 [BaiduDisk]
WRN-46-14 128M 18.6G 58.43 54.32 [BaiduDisk]
RRN-A3 (ours) 75.9M 19.9G 60.95 56.52 [BaiduDisk]
WRN-70-16 267M 38.8G 58.15 54.37 [BaiduDisk]
RRN-A4 (ours) 147M 39.4G 61.88 57.55 [BaiduDisk]

How to train

Baseline adversarial training

# --master_port: use a random port number
# --exp_name: path to where you want to store training stats
# --version: you may also change it to RobustResNet-A1/A2/A3/A4
python -m torch.distributed.launch \
  --nproc_per_node=2 --master_port 24220 \
  --config_path ./configs/CIFAR10 \
  --exp_name ./exps/CIFAR10 \
  --version [WRN-A1/A2/A3/A4] \
  --train \
  --data_parallel

Advanced adversarial training

Please download the additional pseudolabeled data from Carmon et al., 2019.

# --master_port: use a random port number
# --log-dir: path to where you want to store training stats
# --desc: name of the folder for storing training stats
# --adv-eval-freq: evaluation frequency; evaluating too often will significantly slow down training
# --start-eval: start evaluating after N epochs
# --aux-data-filename: location of the downloaded pseudolabeled data
python -m torch.distributed.launch \
  --nproc_per_node=8 --master_port 14226 \
  --log-dir ./checkpoints/ \
  --config-path ./configs/Advanced_CIFAR10 \
  --version [WRN-A1/A2/A3/A4] \
  --desc drna4-basic-silu-apex-500k \
  --apex-amp --adv-eval-freq 5 \
  --start-eval 310 \
  --apex_amp --advnorm --adjust_bn True \
  --num-adv-epochs 400 --batch-size 1024 --lr 0.4 --weight-decay 0.0005 --beta 6.0 \
  --data-dir /datasets/ --data cifar10s \
  --aux-data-filename /datasets/ti_500K_pseudo_labeled.pickle \
  --unsup-fraction 0.7


The code has been implemented and tested with Python 3.8.5, PyTorch 1.8.0, and apex (used for acceleration).

Part of the code is based on the following repos:












