SQuAD2.0

The Stanford Question Answering Dataset

What is SQuAD?

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is either a segment of text, or span, from the corresponding reading passage, or the question is unanswerable.


SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
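The distinction between span answers and unanswerable questions can be sketched in Python. This is a minimal illustration, not the official loader: the field names (`data`, `paragraphs`, `context`, `qas`, `answers`, `answer_start`, `is_impossible`) are assumed to match the released JSON files and should be checked against a downloaded copy, and the passage, questions, and helper names are invented for the example.

```python
# Hypothetical miniature of a SQuAD2.0-style file; the content is invented.
context = "The Eiffel Tower was completed in 1889 in Paris."
sample = {
    "version": "v2.0",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": context,
            "qas": [
                {"id": "q1",
                 "question": "When was the Eiffel Tower completed?",
                 "is_impossible": False,
                 # answer_start is a character offset into the context.
                 "answers": [{"text": "1889",
                              "answer_start": context.index("1889")}]},
                {"id": "q2",
                 "question": "When was the Eiffel Tower demolished?",
                 "is_impossible": True,
                 "answers": []},
            ],
        }],
    }],
}


def iter_questions(dataset):
    """Yield (context, qa) pairs from the nested article/paragraph layout."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa


def gold_span(passage, qa):
    """Return the first gold answer sliced out of the passage, or None if unanswerable."""
    if qa.get("is_impossible", False) or not qa["answers"]:
        return None
    answer = qa["answers"][0]
    start = answer["answer_start"]
    return passage[start:start + len(answer["text"])]


spans = [gold_span(passage, qa) for passage, qa in iter_questions(sample)]
```

Here `spans` holds the span text for the answerable question and `None` for the unanswerable one, which is the case a SQuAD2.0 system is expected to abstain on.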

Explore SQuAD2.0 and model predictions

SQuAD2.0 paper (Rajpurkar & Jia et al. '18)

SQuAD1.1, the previous version of the SQuAD dataset, contains 100,000+ question-answer pairs on 500+ articles.

Explore SQuAD1.1 and model predictions

SQuAD1.0 paper (Rajpurkar et al. '16)

Getting Started

We've built a few resources to help you get started with the dataset.

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):

To evaluate your models, we have also made available the evaluation script we will use for official evaluation, along with a sample prediction file that the script will take as input. To run the evaluation, use python evaluate-v2.0.py <path_to_dev-v2.0> <path_to_predictions>.
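As a rough sketch of producing the predictions input, the snippet below writes a JSON file that maps each question id to a predicted answer string, using an empty string to abstain. That file layout is an assumption based on the sample prediction file and should be verified against it; the helper name `write_predictions` and the ids are hypothetical.

```python
import json


def write_predictions(answers, path):
    """Write {question_id: answer_text} as JSON; None becomes "" (abstain)."""
    predictions = {
        qid: (text if text is not None else "")
        for qid, text in answers.items()
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(predictions, handle, ensure_ascii=False)
    return predictions


# Hypothetical model output: one answered question, one abstention.
example_answers = {"q1": "1889", "q2": None}
```

After calling `write_predictions(example_answers, "predictions.json")`, the resulting file would be passed as `<path_to_predictions>` to the evaluation command above.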

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we require you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through official evaluation of your model:

Submission Tutorial

Because SQuAD is an ongoing effort, we expect the dataset to evolve.

To keep up to date with major changes to the dataset, please subscribe:

Have Questions?

Ask us questions at our Google group or at robinjia@stanford.edu.

Leaderboard

SQuAD2.0 tests the ability of a system to not only answer reading comprehension questions, but also abstain when presented with a question that cannot be answered based on the provided paragraph.

Rank  Model  EM  F1
Human Performance

Stanford University

(Rajpurkar & Jia et al. '18)
86.831 89.452

1

Jun 04, 2021
IE-Net (ensemble)

RICOH_SRCB_DML

90.939 93.214

2

Feb 21, 2021
FPNet (ensemble)

Ant Service Intelligence Team

90.871 93.183

3

May 16, 2021
IE-NetV2 (ensemble)

RICOH_SRCB_DML

90.860 93.100

4

Apr 06, 2020
SA-Net on Albert (ensemble)

QIANXIN

90.724 93.011

5

May 05, 2020
SA-Net-V2 (ensemble)

QIANXIN

90.679 92.948

5

Apr 05, 2020
Retro-Reader (ensemble)

Shanghai Jiao Tong University

http://arxiv.org/abs/2001.09694
90.578 92.978

5

Feb 05, 2021
FPNet (ensemble)

YuYang

90.600 92.899

6

Apr 18, 2021
TransNets + SFVerifier + SFEnsembler (ensemble)

Senseforth AI Research

https://www.senseforth.ai/
90.487 92.894

6

Dec 01, 2020
EntitySpanFocusV2 (ensemble)

RICOH_SRCB_DML

90.521 92.824

6

Jul 31, 2020
ATRLP+PV (ensemble)

Hithink RoyalFlush

90.442 92.877

7

May 16, 2022
LANetV2 (ensemble)

2digit-david

http://2digit.io/
90.420 92.807

8

Mar 12, 2020
ALBERT + DAAF + Verifier (ensemble)

PINGAN Omni-Sinitic

90.386 92.777

9

Feb 05, 2021
MixEnsemble (ensemble)

Anonymous

90.194 92.594

10

Jan 10, 2020
Retro-Reader on ALBERT (ensemble)

Shanghai Jiao Tong University

http://arxiv.org/abs/2001.09694
90.115 92.580

11

Jan 12, 2021
Answer Dependent Classify (single model)

YITU

90.059 92.517

11

Jan 30, 2021
ANet

ensemble

90.081 92.457

12

Nov 06, 2019
ALBERT + DAAF + Verifier (ensemble)

PINGAN Omni-Sinitic

90.002 92.425

13

Apr 04, 2022
LANet (ensemble)

2digit-david

89.923 92.425

14

Sep 18, 2019
ALBERT (ensemble model)

Google Research & TTIC

https://arxiv.org/abs/1909.11942
89.731 92.215

14

Feb 25, 2020
Albert_Verifier_AA_Net (ensemble)

QIANXIN

89.743 92.180

14

Jun 27, 2020
ELECTRA+ATRLP+PV (single model)

Hithink RoyalFlush

89.551 92.366

14

Jan 10, 2021
Span Extract + Classify (single model)

Anonymous

89.562 92.226

14

Mar 28, 2020
Retro-Reader on ELECTRA (single model)

Shanghai Jiao Tong University

http://arxiv.org/abs/2001.09694
89.562 92.052

14

Mar 27, 2020
albert+KD+transfer (ensemble)

Anonymous

89.461 92.134

15

Nov 18, 2020
ROaD-Electra

single model

89.449 92.118

16

Feb 02, 2021
ELECTRA + E-Verifier (ensemble)

Midea NLP Team

89.348 91.985

16

Jan 03, 2021
ELECTRA + ROBERTA + ALBERT (ensemble)

Midea NLP Team

89.325 91.994

16

Jan 12, 2021
2task (single model)

Ted

89.325 91.939

17

Oct 13, 2022
Deberta

single model

89.235 91.900

18

Apr 21, 2020
albert+KD+transfer+twopass (single)

SPPD

89.111 91.877

18

Apr 18, 2020
ALBERT + MTDA + SFVerifier (ensemble model)

Senseforth AI Research

https://www.senseforth.ai/
89.235 91.739

19

Apr 15, 2020
ALBERT + SFVerifier (ensemble model)

Senseforth AI Research

https://www.senseforth.ai/
89.133 91.666

19

Apr 22, 2020
ELECTRA+RL+EV (single model)

Hithink RoyalFlush

89.021 91.765

20

Sep 17, 2021
AE-TEST

ensemble

88.998 91.635

20

Dec 08, 2019
ALBERT+Entailment DA (ensemble)

CloudWalk

88.761 91.745

20

May 02, 2020
ELECTRA+EntitySpanFocus (Single model)

SRCB_DML

88.874 91.546

21

Apr 14, 2020
SA-Net on Electra (single model)

QIANXIN

88.851 91.486

22

Mar 06, 2020
ELECTRA (single model)

Google Brain & Stanford

88.716 91.365

23

Aug 13, 2020
ELECTRA_ATT (single model)

Shanghai Jiao Tong University

88.614 91.303

24

Aug 27, 2021
Deberta+prefix

Tsinghua University

88.603 91.299

25

Feb 24, 2020
ALBERT (Single model)

SRCB_DML

88.592 91.286

25

Feb 20, 2020
Tuned ALBERT (ensemble model)

Group Data & Analytics Cell | Aditya Birla Group

https://www.adityabirla.com/About/group-data-and-analytics
88.637 91.230

25

Jun 24, 2020
ALBERT + IG + NE (single model)

Anonymous

88.569 91.287

26

Jun 24, 2020
ALBERT + IG (single model)

Anonymous

88.524 91.256

26

Jan 19, 2020
Retro-Reader on ALBERT (single model)

Shanghai Jiao Tong University

http://arxiv.org/abs/2001.09694
88.107 91.419

26

Jul 22, 2019
XLNet + DAAF + Verifier (ensemble)

PINGAN Omni-Sinitic

88.592 90.859

26

Mar 13, 2020
aanet_v2.0 (single model)

QIANXIN

88.434 90.918

26

Dec 08, 2019
ALBERT+Entailment DA Verifier (single model)

CloudWalk

87.847 91.265

26

Jan 07, 2020
ALBERT + SFVerifier (single model)

Senseforth AI Research

https://www.senseforth.ai/
88.197 90.830

26

Sep 16, 2019
ALBERT (single model)

Google Research & TTIC

https://arxiv.org/abs/1909.11942
88.107 90.902

26

Mar 30, 2020
MTL (single model)

HAPTIK AI RESEARCH

https://haptik.ai
88.107 90.902

26

Jul 26, 2019
UPM (ensemble)

Anonymous

88.231 90.713

26

Feb 10, 2020
SkERT-Large (single model)

Skelter Labs

87.994 90.944

26

Aug 04, 2019
XLNet + SG-Net Verifier (ensemble)

Shanghai Jiao Tong University & CloudWalk

https://arxiv.org/abs/1908.05147
88.174 90.702

26

May 21, 2020
albert+KD+transfer+twopass (single)

SPPD

87.949 90.818

26

Feb 29, 2020
ALBERT+RL (single model)

Hithink RoyalFlush

87.870 90.823

26

May 22, 2020
albert_xxlarge (single model)

Zheyu Ye

87.802 90.872

26

Nov 15, 2019
XLNet (single model)

Google Brain & CMU

87.926 90.689

27

Feb 12, 2020
Tuned ALBERT (single model)

Group Data & Analytics Cell | Aditya Birla Group

https://www.adityabirla.com/About/group-data-and-analytics
87.847 90.532

27

Feb 10, 2020
ALBERT 1.1 (single model)

Anonymous

87.700 90.588

28

Apr 04, 2020
LUKE (single model)

Studio Ousia & NAIST & RIKEN AIP

https://arxiv.org/abs/2010.01057
87.429 90.163

29

Aug 04, 2019
XLNet + SG-Net Verifier++ (single model)

Shanghai Jiao Tong University & CloudWalk

https://arxiv.org/abs/1908.05147
87.238 90.071

30

Jul 26, 2019
UPM (single model)

Anonymous

87.193 89.934

30

Nov 27, 2019
RoBERTa+Verify (ensemble)

CW

86.933 90.037

30

Mar 20, 2019
BERT + DAE + AoA (ensemble)

Joint Laboratory of HIT and iFLYTEK Research

87.147 89.474

30

Jul 20, 2019
RoBERTa (single model)

Facebook AI

86.820 89.795

31

Nov 12, 2019
RoBERTa+Verify (single model)

CW

86.448 89.586

31

Mar 15, 2019
BERT + ConvLSTM + MTL + Verifier (ensemble)

Layer 6 AI

86.730 89.286

32

Mar 05, 2019
BERT + N-Gram Masking + Synthetic Self-Training (ensemble)

Google AI Language

https://github.com/google-research/bert
86.673 89.147

32

May 29, 2020
Enhanced Albert+Verifier (ensemble)

Microsoft STCA AIC

86.098 89.634

32

Oct 16, 2019
Xlnet+Verifier

single model

86.594 89.082

33

Aug 30, 2019
Xlnet+Verifier (single model)

Ping An Life Insurance Company AI Team

86.572 89.063

33

May 30, 2020
Enhanced Albert+Verifier3 (ensemble)

Microsoft STCA AIC

85.827 89.778

33

Dec 09, 2019
XLNET-V2-123+ (single model)

MST/EOI

http://tia.today
86.403 89.148

34

May 21, 2019
XLNet (single model)

Google Brain & CMU

86.346 89.133

35

May 14, 2019
SG-Net (ensemble)

Shanghai Jiao Tong University

https://arxiv.org/abs/1908.05147
86.211 88.848

35

Apr 13, 2019
SemBERT (ensemble)

Shanghai Jiao Tong University

https://arxiv.org/abs/1909.02209
86.166 88.886

35

Sep 29, 2019
BERTSP (single model)

NEUKG

http://www.techkg.cn/--please
85.838 88.921

35

Sep 22, 2020
RoBERTa-Large (ensemble model)

SAIL

85.872 88.793

35

Mar 16, 2019
BERT + DAE + AoA (single model)

Joint Laboratory of HIT and iFLYTEK Research

85.884 88.621

35

Jul 22, 2019
SpanBERT (single model)

FAIR & UW

85.748 88.709

36

Sep 21, 2020
RoBERTa-Large (single model)

SAIL

85.173 88.425

36

May 14, 2019
SG-Net (single model)

Shanghai Jiao Tong University

https://arxiv.org/abs/1908.05147
85.229 87.926

36

Mar 13, 2019
BERT + ConvLSTM + MTL + Verifier (single model)

Layer 6 AI

84.924 88.204

36

Mar 05, 2019
BERT + N-Gram Masking + Synthetic Self-Training (single model)

Google AI Language

https://github.com/google-research/bert
85.150 87.715

36

Jun 19, 2019
BNDVnet (single model)

PAOS

85.003 87.833

36

Jan 15, 2019
BERT + MMFT + ADA (ensemble)

Microsoft Research Asia

85.082 87.615

36

Apr 11, 2019
SemBERT (single model)

Shanghai Jiao Tong University

https://arxiv.org/abs/1909.02209
84.800 87.864

36

Sep 13, 2019
xlnet (single model)

VerifiedXiaoPAI

84.642 88.000

36

Apr 16, 2019
Insight-baseline-BERT (single model)

PAII Insight Team

84.834 87.644

37

Sep 03, 2019
Hanvon_model (single model)

Hanvon_WuHan

84.721 87.117

38

Jan 10, 2019
BERT + Synthetic Self-Training (ensemble)

Google AI Language

https://github.com/google-research/bert
84.292 86.967

38

Sep 29, 2023
RoberTa+Parallel+Adapters (single model)

Quant Studio

84.123 87.013

38

Nov 08, 2019
BERT + Multiple-CNN (ensemble)

Kyonggi University (ICL) & KISTI

84.202 86.767

39

Jun 02, 2021
SemNet (single model)

JAIST

83.819 86.669

40

Jul 22, 2019
Tuned BERT-1seq Large Cased (single model)

FAIR & UW

83.751 86.594

41

Jun 01, 2021
SynNet (single model)

JAIST

83.525 86.222

41

Mar 20, 2019
Bert-raw (ensemble)

None

83.604 86.036

41

Dec 13, 2018
BERT finetune baseline (ensemble)

Anonymous

83.536 86.096

41

Dec 21, 2018
PAML+BERT (ensemble model)

PINGAN GammaLab

83.457 86.122

41

Dec 16, 2018
Lunet + Verifier + BERT (ensemble)

Layer 6 AI NLP Team

83.469 86.043

42

Dec 15, 2018
Lunet + Verifier + BERT (single model)

Layer 6 AI NLP Team

82.995 86.035

42

Jun 20, 2019
SENSEFORTH + BERT

single

https://senseforth.ai
83.142 85.873

42

Jan 14, 2019
BERT + MMFT + ADA (single model)

Microsoft Research Asia

83.040 85.892

42

May 14, 2019
ATB (single model)

Anonymous

82.882 86.002

42

Feb 16, 2019
Bert-raw (ensemble)

None

83.175 85.635

42

Feb 26, 2019
BERT with Something (ensemble)

Anonymous

83.051 85.737

42

Jan 10, 2019
BERT + Synthetic Self-Training (single model)

Google AI Language

https://github.com/google-research/bert
82.972 85.810

42

Jul 22, 2019
Tuned BERT Large Cased (single model)

FAIR & UW

82.803 85.863

42

Mar 11, 2019
Bert-raw (ensemble)

None

83.119 85.510

42

Feb 15, 2019
BERT + NeurQuRI (ensemble)

2SAH

82.803 85.703

43

Feb 27, 2019
BERT + NeurQuRI (ensemble)

2SAH

82.713 85.584

43

May 13, 2019
BERT-Base + QA Pre-training (single model)

Anonymous

82.724 85.491

43

Dec 16, 2018
PAML+BERT (single model)

PINGAN GammaLab

82.577 85.603

43

Aug 05, 2021
BART + Adapters + Lohfink-Rossi-Leaveout (single-model)

Georgia Institute of Technology

https://adapterhub.ml/adapters/lohfink-rossi/facebook-bart-large_qa_squad2_lohfink-rossi-leaveout/
82.306 85.670

43

Nov 16, 2018
AoA + DA + BERT (ensemble)

Joint Laboratory of HIT and iFLYTEK Research

82.374 85.310

44

Dec 12, 2018
BERT finetune baseline (single model)

Anonymous

82.126 84.820

44

Sep 22, 2020
BERT-Base PMI-Masking Additional Data (single model)

AI21 Labs

82.024 84.854

45

Feb 28, 2019
BERT_s (single model)

Anonymous

81.979 84.846

46

Feb 28, 2019
BERT-large+UBFT (single model)

anonymous

81.573 84.535

47

Feb 15, 2019
BERT + NeurQuRI (single model)

2SAH

81.257 84.342

47

Feb 25, 2019
BERT with Something (single model)

Anonymous

81.110 84.386

47

Nov 16, 2018
AoA + DA + BERT (single model)

Joint Laboratory of HIT and iFLYTEK Research

81.178 84.251

48

Mar 20, 2019
Bert-raw (single)

None

80.693 83.922

48

Mar 07, 2019
BERT + UnAnsQ (single model)

Anonymous

80.749 83.851

48

Sep 21, 2020
BERT-Base PMI-Masking (single model)

AI21 Labs

80.896 83.604

49

Jan 22, 2019
BERT + NeurQuRI (single model)

2SAH

80.591 83.391

49

Mar 11, 2019
Bert-raw (single)

None

80.411 83.457

50

Sep 22, 2020
PMI-Masking Additional Data Random Baseline (single model)

AI21 Labs

80.377 83.262

51

Feb 16, 2019
Bert-raw (single model)

None

80.343 83.243

51

May 28, 2019
Bert

Single Model

https://senseforth.ai
80.422 83.118

51

Sep 22, 2020
PMI-Masking Pure-PMI (single model)

AI21 Labs

80.241 83.175

52

Apr 04, 2019
BISAN-CC (single model)

Seoul National University & Hyundai Motors

80.208 83.149

52

Dec 03, 2018
PwP+BERT (single model)

AITRICS

80.117 83.189

52

Jul 22, 2019
Original BERT Large Cased (single model)

FAIR & UW

79.971 83.266

52

Feb 19, 2019
BERT + UDA (single model)

Anonymous

80.005 83.208

53

Apr 10, 2019
bert (single model)

vinda msqjmxx

79.971 83.184

53

Feb 28, 2019
ST_bl

single model

80.140 82.962

53

Nov 08, 2018
BERT (single model)

Google AI Language

80.005 83.061

54

Sep 22, 2020
PMI-Masking Additional Data Pure-PMI (single model)

AI21 Labs

79.993 83.039

55

Feb 12, 2019
BERT + Sparse-Transformer

single model

79.948 83.023

55

Sep 21, 2020
PMI-Masking Random Baseline (single model)

AI21 Labs

80.038 82.796

55

Mar 07, 2019
BERT uncased (single model)

Anonymous

79.745 83.020

55

Dec 06, 2018
NEXYS_BASE (single model)

NEXYS, DGIST R7

79.779 82.912

56

Feb 01, 2019
{bert-finetuning} (single model)

ksai

79.632 82.852

57

Feb 25, 2020
BERT-Large-Cased

single model

79.610 82.692

58

Nov 09, 2018
L6Net + BERT (single model)

Layer 6 AI

79.181 82.259

58

Mar 14, 2019
{Anonymous} (single model)

Anonymous

78.876 82.524

58

Sep 29, 2023
RoberTa+Fusion+Adapters (single model)

Quant Studio

78.933 81.863

59

Apr 24, 2019
BERT + WIAN (ensemble)

Infosys Limited

78.650 81.497

60

Aug 02, 2020
AMBERT (single model)

ByteDance

78.594 81.445

60

Mar 14, 2019
BISAN (single model)

Seoul National University & Hyundai Motors

78.481 81.531

61

Dec 26, 2019
BERT-Large-Cased

single model

78.357 81.500

62

Dec 14, 2018
BERT+AC (single model)

Hithink RoyalFlush

78.052 81.174

63

Aug 03, 2020
BERT (single model)

ByteDance

77.319 80.310

64

Sep 17, 2023
RoberTa+Adapter (single model)

Quant Studio

77.262 80.258

65

Nov 06, 2018
SLQA+BERT (single model)

Alibaba DAMO NLP

http://www.aclweb.org/anthology/P18-1158
77.003 80.209

66

Aug 03, 2020
AMBERT-H (single model)

ByteDance

76.710 79.659

66

Aug 03, 2020
AMBERT-S (single model)

ByteDance

76.563 79.776

67

Jan 05, 2019
synss (single model)

bert_finetune

76.055 79.329

68

May 21, 2021
mgrc

single model

75.344 78.381

68

Apr 05, 2021
BERT-Base-L (single model)

Anonymous

75.457 78.232

69

Dec 18, 2018
ARSG-BERT (single model)

TRINITI RESEARCH LABS, Active.ai

https://active.ai
74.746 78.227

69

Aug 29, 2020
BERT-Base-V (single model)

Anonymous

75.073 77.805

69

Nov 05, 2018
MIR-MRC(F-Net) (single model)

Kangwon National University, Natural Language Processing Lab. & ForceWin, KP Lab.

74.791 77.988

70

Aug 06, 2020
BERT-Base-DT (single model)

Anonymous

74.769 77.706

71

Dec 03, 2020
BERT-Base-V2

single model

74.656 77.404

71

Feb 25, 2021
BERT-Base-DP (single model)

Anonymous

74.577 77.464

72

Aug 14, 2020
BERT-Base-Add (single model)

Anonymous

74.329 77.396

72

May 23, 2019
{BERTcw} (single model)

private

74.385 77.308

73

Sep 13, 2018
nlnet (single model)

Microsoft Research Asia

74.272 77.052

74

Jan 12, 2020
batch2 (single model)

THU

73.742 76.858

75

Dec 29, 2018
MMIPN

Single

73.505 76.424

76

Aug 09, 2020
BERT-Base-Baseline (single model)

Anonymous

73.302 76.284

77

Apr 20, 2019
BERT-Base (single model)

Dining Philosophers

73.099 76.236

78

Oct 12, 2018
YARCS (ensemble)

IBM Research AI

72.670 75.507

78

Apr 23, 2020
BERT-base

single model

72.072 75.513

78

Apr 25, 2020
BERTBase (single model)

Anonymous

72.072 75.513

79

Nov 14, 2018
BERT+Answer Verifier (single model)

Pingan Tech Olatop Lab

71.666 75.457

80

Sep 17, 2018
Unet (ensemble)

Fudan University & Liulishuo Lab

https://arxiv.org/abs/1810.06638
71.417 74.869

80

Apr 24, 2019
BERT-Base (single)

GreenflyAI

https://greenfly.ai
71.699 74.430

80

Aug 15, 2018
Reinforced Mnemonic Reader + Answer Verifier (single model)

NUDT

https://arxiv.org/abs/1808.05759
71.767 74.295

80

Aug 28, 2018
SLQA+ (single model)

Alibaba DAMO NLP

http://www.aclweb.org/anthology/P18-1158
71.462 74.434

80

Apr 25, 2021
HYDRA_BERT (single model)

JAIST

71.293 74.578

81

Jan 19, 2019
{BERT-base} (single-model)

Anonymous

70.763 74.449

81

Sep 14, 2018
SAN (ensemble model)

Microsoft Business Applications AI Research

https://arxiv.org/abs/1712.03556
71.316 73.704

82

Aug 21, 2018
FusionNet++ (ensemble)

Microsoft Business Applications Group AI Research

https://arxiv.org/abs/1711.07341
70.300 72.484

82

Sep 26, 2018
Multi-Level Attention Fusion(MLAF) (single model)

Chonbuk National University, Cognitive Computing Lab.

69.476 72.857

83

Sep 14, 2018
Unet (single model)

Fudan University & Liulishuo Lab

69.262 72.642

84

Dec 20, 2018
DocQA + NeurQuRI (single model)

2SAH

68.766 71.662

85

Aug 21, 2018
SAN (single model)

Microsoft Business Applications AI Research

https://arxiv.org/abs/1712.03556
68.653 71.439

85

Sep 13, 2018
BiDAF++ with pair2vec (single model)

UW and FAIR

68.021 71.583

85

Jun 24, 2018
KACTEIL-MRC(GFN-Net) (single model)

Kangwon National University, Natural Language Processing Lab.

68.213 70.878

85

Jul 13, 2018
VS^3-NET (single model)

Kangwon National University in South Korea

67.897 70.884

86

Jan 01, 2019
EBB-Net (single model)

Enliple AI

66.610 70.303

87

Jun 25, 2018
KakaoNet2 (single model)

Kakao NLP Team

65.719 69.381

88

Sep 13, 2018
BiDAF++ (single model)

UW and FAIR

65.651 68.866

88

Jul 11, 2018
abcNet (single model)

Fudan University & Liulishuo AI Lab

65.256 69.206

89

Jun 27, 2018
BSAE AddText (single model)

reciTAL.ai

63.338 67.422

90

Aug 14, 2018
eeAttNet (single model)

BBD NLP Team

https://www.bbdservice.com
63.327 66.633

90

May 30, 2018
BiDAF + Self Attention + ELMo (single model)

Allen Institute for Artificial Intelligence [modified by Stanford]

63.372 66.251

91

May 30, 2018
BiDAF + Self Attention (single model)

Allen Institute for Artificial Intelligence [modified by Stanford]

59.332 62.305

92

May 30, 2018
BiDAF-No-Answer (single model)

University of Washington [modified by Stanford]

59.174 62.093

92

Nov 27, 2018
Tree-LSTM + BiDAF + ELMo (single model)

Carnegie Mellon University

57.707 62.341

SQuAD1.1 Leaderboard

Here are the Exact Match (EM) and F1 scores evaluated on the test set of SQuAD v1.1.
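For readers unfamiliar with the two metrics, the following Python sketch shows how EM and token-level F1 are commonly computed for SQuAD-style answers. It is a simplified approximation written for illustration, with the normalization steps (lowercasing, stripping punctuation and English articles) assumed rather than taken from this page; the official evaluation script remains the authority for leaderboard scores, and the function names here are hypothetical.

```python
import collections
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction, golds):
    """Take the best EM and best F1 over all reference answers."""
    em = max(exact_match(prediction, g) for g in golds)
    f1 = max(f1_score(prediction, g) for g in golds)
    return em, f1
```

Averaging the per-question values over a dataset and multiplying by 100 gives percentages on the same scale as the EM and F1 columns.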

Rank  Model  EM  F1
Human Performance

Stanford University

(Rajpurkar et al. '16)
82.304 91.221

1

Jul 24, 2021
{ANNA} (single model)

LG AI Research

90.622 95.719

2

Apr 10, 2020
LUKE (single model)

Studio Ousia & NAIST & RIKEN AIP

https://arxiv.org/abs/2010.01057
90.202 95.379

3

May 21, 2019
XLNet (single model)

Google Brain & CMU

89.898 95.080

4

Dec 11, 2019
XLNET-123++ (single model)

MST/EOI

http://tia.today
89.856 94.903

4

Aug 11, 2019
XLNET-123 (single model)

MST/EOI

89.646 94.930

5

Jul 21, 2019
SpanBERT (single model)

FAIR & UW

88.839 94.635

6

Jul 03, 2019
BERT+WWM+MT (single model)

Xiaoi Research

88.650 94.393

7

Jul 21, 2019
Tuned BERT-1seq Large Cased (single model)

FAIR & UW

87.465 93.294

8

Oct 05, 2018
BERT (ensemble)

Google AI Language

https://arxiv.org/abs/1810.04805
87.433 93.160

9

May 14, 2019
ATB (single model)

Anonymous

86.940 92.641

10

Jul 21, 2019
Tuned BERT Large Cased (single model)

FAIR & UW

86.521 92.617

10

Jul 04, 2019
BERT+MT (single model)

Xiaoi Research

86.458 92.645

11

Feb 14, 2019
KT-NET (single model)

Baidu NLP

85.944 92.425

11

Sep 26, 2018
nlnet (ensemble)

Microsoft Research Asia

85.954 91.677

11

Feb 28, 2019
ST_bl

single model

85.430 91.976

12

Nov 21, 2019
EL-BERT (single model)

YeonTaek Oh

85.335 91.807

13

Mar 14, 2019
BISAN (single model)

Seoul National University & Hyundai Motors

85.314 91.756

13

Jun 03, 2019
DPN (single model)

Anonymous

84.978 92.019

13

Oct 05, 2018
BERT (single model)

Google AI Language

https://arxiv.org/abs/1810.04805
85.083 91.835

13

Jul 10, 2019
BERT-uncased (single model)

Anonymous

84.926 91.932

13

Feb 16, 2019
BERT+Sparse-Transformer

single model

85.125 91.623

13

Sep 09, 2018
nlnet (ensemble)

Microsoft Research Asia

85.356 91.202

13

Jul 21, 2019
Original BERT Large Cased (single model)

FAIR & UW

84.328 91.281

13

Feb 19, 2019
WD (single model)

Anonymous

84.402 90.561

13

Jul 11, 2018
QANet (ensemble)

Google Brain & CMU

84.454 90.490

13

Apr 21, 2019
Common-sense Governed BERT-123 (single model)

Jerry AGI Ragtag

83.930 90.613

14

Feb 21, 2019
WD1 (single model)

Anonymous

83.804 90.429

14

Jul 08, 2018
r-net (ensemble)

Microsoft Research Asia

84.003 90.147

14

May 08, 2019
Common-sense Governed BERT-123 (single model)

MST/EOI

82.943 91.074

14

Jun 20, 2018
MARS (ensemble)

YUANFUDAO research NLP

83.982 89.796

15

Mar 19, 2018
QANet (ensemble)

Google Brain & CMU

83.877 89.737

15

Sep 09, 2018
nlnet (single model)

Microsoft Research Asia

83.468 90.133

16

Sep 01, 2018
MARS (single model)

YUANFUDAO research NLP

83.185 89.547

16

Dec 28, 2020
Pytalk + Stanza + BERT (single model)

University of North Texas

83.426 89.218

16

Jun 21, 2018
MARS (single model)

YUANFUDAO research NLP

83.122 89.224

16

Jul 01, 2020
BERT-Base mod (single model)

Anonymous

82.681 89.379

16

Mar 06, 2018
QANet (ensemble)

Google Brain & CMU

82.744 89.045

16

Jun 20, 2018
QANet (single)

Google Brain & CMU

82.471 89.306

16

Jan 22, 2018
Hybrid AoA Reader (ensemble)

Joint Laboratory of HIT and iFLYTEK Research

82.482 89.281

16

Feb 19, 2018
Reinforced Mnemonic Reader + A2D (ensemble model)

Microsoft Research Asia & NUDT

82.849 88.764

16

May 09, 2018
MARS (single model)

YUANFUDAO research NLP

82.587 88.880

16

Jan 03, 2018
r-net+ (ensemble)

Microsoft Research Asia

82.650 88.493

16

Jan 05, 2018
SLQA+ (ensemble)

Alibaba iDST NLP

82.440 88.607

16

Jul 14, 2019
BERT (single model)

KTNET

82.062 88.947

16

Feb 27, 2018
QANet (single model)

Google Brain & CMU

82.209 88.608

16

Feb 02, 2018
Reinforced Mnemonic Reader (ensemble model)

NUDT and Fudan University

https://arxiv.org/abs/1705.02798
82.283 88.533

16

Dec 23, 2018
MMIPN

Single

81.580 88.948

16

Dec 17, 2017
r-net (ensemble)

Microsoft Research Asia

http://aka.ms/rnet
82.136 88.126

16

Dec 17, 2018
ARSG-BERT (single model)

TRINITI RESEARCH LABS, Active.ai

https://active.ai
81.307 88.909

16

Dec 22, 2017
AttentionReader+ (ensemble)

Tencent DPDAC NLP

81.790 88.163

17

May 09, 2018
Reinforced Mnemonic Reader + A2D (single model)

Microsoft Research Asia & NUDT

81.538 88.130

17

Apr 23, 2018
r-net (single model)

Microsoft Research Asia

81.391 88.170

17

May 09, 2018
Reinforced Mnemonic Reader + A2D + DA (single model)

Microsoft Research Asia & NUDT

81.401 88.122

17

Apr 03, 2018
KACTEIL-MRC(GF-Net+) (ensemble)

Kangwon National University, Natural Language Processing Lab.

81.496 87.557

17

Nov 20, 2020
mBERT + Task Adapter (Single)

TU Darmstadt

80.667 88.169

17

Feb 27, 2018
QANet (single model)

Google Brain & CMU

80.929 87.773

17

Nov 17, 2017
BiDAF + Self Attention + ELMo (ensemble)

Allen Institute for Artificial Intelligence

81.003 87.432

17

Feb 19, 2018
Reinforced Mnemonic Reader + A2D (single model)

Microsoft Research Asia & NUDT

80.919 87.492

17

Mar 11, 2020
batch (single model)

THU

79.859 88.263

17

Feb 12, 2018
Reinforced Mnemonic Reader + A2D (single model)

Microsoft Research Asia & NUDT

80.489 87.454

17

Apr 12, 2018
AVIQA+ (ensemble)

aviqa team

80.615 87.311

18

Jan 13, 2018
SLQA+

single model

80.436 87.021

18

Jan 04, 2018
{EAZI} (ensemble)

Yiwise NLP Group

80.436 86.912

18

Jan 12, 2018
EAZI+ (ensemble)

Yiwise NLP Group

80.426 86.912

18

Jan 22, 2018
Hybrid AoA Reader (single model)

Joint Laboratory of HIT and iFLYTEK Research

80.027 87.288

18

Jan 06, 2020
BERT-INDEPENDENT-DSS-FILTERED (single model)

Brno University of Technology

79.597 87.374

18

Mar 20, 2018
DNET (ensemble)

QA geeks

80.164 86.721

19

Feb 12, 2018
BiDAF + Self Attention + ELMo + A2D (single model)

Microsoft Research Asia & NUDT

79.996 86.711

20

Jan 03, 2018
r-net+ (single model)

Microsoft Research Asia

79.901 86.536

20

Feb 23, 2018
MAMCN+ (single model)

Samsung Research

79.692 86.727

21

Jan 29, 2018
Reinforced Mnemonic Reader (single model)

NUDT and Fudan University

https://arxiv.org/abs/1705.02798
79.545 86.654

21

Dec 05, 2017
SAN (ensemble model)

Microsoft Business AI Solutions Team

https://arxiv.org/abs/1712.03556
79.608 86.496

21

Dec 28, 2017
SLQA+ (single model)

Alibaba iDST NLP

79.199 86.590

22

Oct 17, 2017
Interactive AoA Reader+ (ensemble)

Joint Laboratory of HIT and iFLYTEK

79.083 86.450

22

Nov 05, 2018
KACTEIL-MRC(GF-Net+Distillation) (single model)

Kangwon National University, Natural Language Processing Lab.

79.083 86.288

23

Jun 01, 2018
MDReader

single model

79.031 86.006

23

Oct 24, 2017
FusionNet (ensemble)

Microsoft Business AI Solutions Team

https://arxiv.org/abs/1711.07341
78.978 86.016

24

Oct 22, 2017
DCN+ (ensemble)

Salesforce Research

https://arxiv.org/abs/1711.00106
78.852 85.996

25

Mar 29, 2018
KACTEIL-MRC(GF-Net+) (single model)

Kangwon National University, Natural Language Processing Lab.

78.664 85.780

25

Nov 03, 2017
BiDAF + Self Attention + ELMo (single model)

Allen Institute for Artificial Intelligence

78.580 85.833

26

May 09, 2018
KakaoNet (single model)

Kakao NLP Team

78.401 85.724

27

Nov 30, 2017
SLQA (ensemble)

Alibaba iDST NLP

78.328 85.682

27

Mar 19, 2018
aviqa (ensemble)

aviqa team

78.496 85.469

27

Jan 02, 2018
Conductor-net (ensemble)

CMU

https://arxiv.org/abs/1710.10504
78.433 85.517

27

Sep 18, 2018
BiDAF++ with pair2vec (single model)

UW and FAIR

78.223 85.535

27

Jun 01, 2018
MDReader0

single model

78.171 85.543

27

Jan 03, 2018
MEMEN (single model)

Zhejiang University

https://arxiv.org/abs/1707.09098
78.234 85.344

27

Jan 29, 2018
test

single

78.087 85.348

28

Jul 25, 2017
Interactive AoA Reader (ensemble)

Joint Laboratory of HIT and iFLYTEK Research

77.845 85.297

29

Mar 20, 2018
DNET (single model)

QA geeks

77.646 84.905

30

Sep 18, 2018
BiDAF++ (single model)

UW and FAIR

77.573 84.858

30

Dec 06, 2017
AttentionReader+ (single)

Tencent DPDAC NLP

77.342 84.925

30

Dec 13, 2017
RaSoR + TR + LM (single model)

Tel-Aviv University

https://arxiv.org/abs/1712.03609
77.583 84.163

30

Dec 21, 2017
Jenga (ensemble)

Facebook AI Research

77.237 84.466

30

Nov 06, 2017
Conductor-net (ensemble)

CMU

https://arxiv.org/abs/1710.10504
76.996 84.630

30

Jan 23, 2018
MARS (single model)

YUANFUDAO research NLP

76.859 84.739

31

May 14, 2018
VS^3-NET (single model)

Kangwon National University in South Korea

76.775 84.491

31

Nov 01, 2017
SAN (single model)

Microsoft Business AI Solutions Team

https://arxiv.org/abs/1712.03556
76.828 84.396

31

Sep 26, 2018
{gqa} (single model)

FAIR

77.090 83.931

31

Dec 19, 2017
FRC (single model)

in review

76.240 84.599

31

Oct 13, 2017
r-net (single model)

Microsoft Research Asia

http://aka.ms/rnet
76.461 84.265

32

Oct 22, 2017
Conductor-net (ensemble)

CMU

76.146 83.991

33

Sep 08, 2017
FusionNet (single model)

Microsoft Business AI Solutions team

https://arxiv.org/abs/1711.07341
75.968 83.900

34

Oct 22, 2017
Interactive AoA Reader+ (single model)

Joint Laboratory of HIT and iFLYTEK

75.821 83.843

34

Oct 18, 2018
KAR (single model)

York University

https://arxiv.org/abs/1809.03449
76.125 83.538

35

Jul 14, 2017
smarnet (ensemble)

Eigen Technology & Zhejiang University

75.989 83.475

36

Mar 15, 2018
AVIQA-v2 (single model)

aviqa team

75.926 83.305

37

Aug 18, 2017
RaSoR + TR (single model)

Tel-Aviv University

https://arxiv.org/abs/1712.03609
75.789 83.261

37

Mar 20, 2020
Kbs (single model)

Tsinghua University

75.034 83.405

37

Oct 23, 2017
DCN+ (single model)

Salesforce Research

https://arxiv.org/abs/1711.00106
75.087 83.081

37

Nov 01, 2017
Mixed model (ensemble)

Sean

75.265 82.769

37

May 21, 2017
MEMEN (ensemble)

Eigen Technology & Zhejiang University

https://arxiv.org/abs/1707.09098
75.370 82.658

37

Nov 17, 2017
two-attention-self-attention (ensemble)

guotong1988

75.223 82.716

37

Jul 10, 2017
DCN+ (single model)

Salesforce Research

https://arxiv.org/abs/1711.00106
74.866 82.806

37

Mar 09, 2017
ReasoNet (ensemble)

MSR Redmond

https://arxiv.org/abs/1609.05284
75.034 82.552

37

Oct 31, 2017
SLQA (single model)

Alibaba iDST NLP

74.489 82.815

37

Feb 06, 2018
Jenga (single model)

Facebook AI Research

74.373 82.845

37

Jan 02, 2018
Conductor-net (single model)

CMU

https://arxiv.org/abs/1710.10504
74.405 82.742

37

Aug 14, 2018
eeAttNet (single model)

BBD NLP Team

https://www.bbdservice.com
74.604 82.501

38

Feb 13, 2018
SSR-BiDAF

ensemble model

74.541 82.477

39

Jul 14, 2017
Mnemonic Reader (ensemble)

NUDT and Fudan University

https://arxiv.org/abs/1705.02798
74.268 82.371

40

Dec 23, 2017
S^3-Net (ensemble)

Kangwon National University in South Korea

74.121 82.342

41

Jul 29, 2017
SEDT (ensemble model)

CMU

https://arxiv.org/abs/1703.00572
74.090 81.761

42

Jul 06, 2017
SSAE (ensemble)

Tsinghua University

74.080 81.665

42

Jul 25, 2017
Interactive AoA Reader (single model)

Joint Laboratory of HIT and iFLYTEK Research

73.639 81.931

42

Feb 22, 2017
BiDAF (ensemble)

Allen Institute for AI & University of Washington

https://arxiv.org/abs/1611.01603
73.744 81.525

42

Apr 22, 2017
SEDT+BiDAF (ensemble)

CMU

https://arxiv.org/abs/1703.00572
73.723 81.530

42

Nov 06, 2017
Conductor-net (single)

CMU

https://arxiv.org/abs/1710.10504
73.240 81.933

42

Dec 14, 2017
Jenga (single model)

Facebook AI Research

73.303 81.754

42

Jan 24, 2017
Multi-Perspective Matching (ensemble)

IBM Research

https://arxiv.org/abs/1612.04211
73.765 81.257

42

May 01, 2017
jNet (ensemble)

USTC & National Research Council Canada & York University

https://arxiv.org/abs/1703.04617
73.010 81.517

43

Oct 22, 2017
Conductor-net (single)

CMU

72.590 81.415

43

Apr 12, 2017
T-gating (ensemble)

Peking University

72.758 81.001

43

Nov 16, 2017
two-attention-self-attention (single model)

guotong1988

72.600 81.011

43

Sep 20, 2017
BiDAF + Self Attention (single model)

Allen Institute for Artificial Intelligence

https://arxiv.org/abs/1710.10723
72.139 81.048

43

Mar 03, 2018
AVIQA (single model)

aviqa team

72.485 80.550

43

Dec 15, 2017
S^3-Net (single model)

Kangwon National University in South Korea

71.908 81.023

44

Nov 06, 2017
attention+self-attention (single model)

guotong1988

71.698 80.462

45

Nov 01, 2016
Dynamic Coattention Networks (ensemble)

Salesforce Research

https://arxiv.org/abs/1611.01604
71.625 80.383

45

Apr 13, 2017
QFASE

NUS

71.898 79.989

45

Jul 14, 2017
smarnet (single model)

Eigen Technology & Zhejiang University

https://arxiv.org/abs/1710.02772
71.415 80.160

46

Jul 14, 2017
Mnemonic Reader (single model)

NUDT and Fudan University

https://arxiv.org/abs/1705.02798
70.995 80.146

46

May 23, 2018
AttReader (single)

College of Computer & Information Science, SouthWest University, Chongqing, China

71.373 79.725

46

Apr 22, 2018
MAMCN (single model)

Samsung Research

70.985 79.939

46

Oct 27, 2017
M-NET (single)

UFL

71.016 79.835

47

Mar 24, 2017
jNet (single model)

USTC & National Research Council Canada & York University

https://arxiv.org/abs/1703.04617
70.607 79.821

47

Apr 02, 2017
Ruminating Reader (single model)

New York University

https://arxiv.org/abs/1704.07415
70.639 79.456

47

Mar 14, 2017
Document Reader (single model)

Facebook AI Research

https://arxiv.org/abs/1704.00051
70.733 79.353

47

Mar 08, 2017
ReasoNet (single model)

MSR Redmond

https://arxiv.org/abs/1609.05284
70.555 79.364

47

Dec 28, 2016
FastQAExt

German Research Center for Artificial Intelligence

https://arxiv.org/abs/1703.04816
70.849 78.857

47

May 13, 2017
RaSoR (single model)

Google NY, Tel-Aviv University

https://arxiv.org/abs/1611.01436
70.849 78.741

47

Apr 14, 2017
Multi-Perspective Matching (single model)

IBM Research

https://arxiv.org/abs/1612.04211
70.387 78.784

48

Aug 30, 2017
SimpleBaseline (single model)

Technical University of Vienna

69.600 78.236

48

Feb 05, 2018
SSR-BiDAF

single model

69.443 78.358

49

Apr 12, 2017
SEDT+BiDAF (single model)

CMU

https://arxiv.org/abs/1703.00572
68.478 77.971

50

Jun 25, 2017
PQMN (single model)

KAIST & AIBrain & Crosscert

68.331 77.783

51

Apr 12, 2017
T-gating (single model)

Peking University

68.132 77.569

51

Jul 29, 2017
SEDT (single model)

CMU

https://arxiv.org/abs/1703.00572
68.163 77.527

51

Dec 28, 2016
FastQA

German Research Center for Artificial Intelligence

https://arxiv.org/abs/1703.04816
68.436 77.070

51

Jan 22, 2018
FABIR

Single Model

https://arxiv.org/abs/1810.09580
67.744 77.605

51

Nov 28, 2016
BiDAF (single model)

Allen Institute for AI & University of Washington

https://arxiv.org/abs/1611.01603
67.974 77.323

52

Oct 26, 2016
Match-LSTM with Ans-Ptr (Boundary) (ensemble)

Singapore Management University

https://arxiv.org/abs/1608.07905
67.901 77.022

52

Sep 19, 2017
AllenNLP BiDAF (single model)

Allen Institute for AI

http://allennlp.org/
67.618 77.151

53

Feb 05, 2017
Iterative Co-attention Network

Fudan University

67.502 76.786

54

Jan 03, 2018
newtest

single model

66.527 75.787

54

Nov 01, 2016
Dynamic Coattention Networks (single model)

Salesforce Research

https://arxiv.org/abs/1611.01604
66.233 75.896

55

Oct 26, 2016
Match-LSTM with Bi-Ans-Ptr (Boundary)

Singapore Management University

https://arxiv.org/abs/1608.07905
64.744 73.743

56

Sep 21, 2017
OTF dict+spelling (single)

University of Montreal

https://arxiv.org/abs/1706.00286
64.083 73.056

56

Feb 19, 2017
Attentive CNN context with LSTM

NLPR, CASIA

63.306 73.463

57

Nov 02, 2016
Fine-Grained Gating

Carnegie Mellon University

https://arxiv.org/abs/1611.01724
62.446 73.327

57

Sep 21, 2017
OTF spelling (single)

University of Montreal

https://arxiv.org/abs/1706.00286
62.897 72.016

58

Sep 21, 2017
OTF spelling+lemma (single)

University of Montreal

https://arxiv.org/abs/1706.00286
62.604 71.968

59

Sep 28, 2016
Dynamic Chunk Reader

IBM

https://arxiv.org/abs/1610.09996
62.499 70.956

59

Nov 15, 2019
RQA+IDR (single model)

BUAA & MSRA

https://arxiv.org/abs/2005.02925
61.145 71.389

60

Aug 27, 2016
Match-LSTM with Ans-Ptr (Boundary)

Singapore Management University

https://arxiv.org/abs/1608.07905
60.474 70.695

61

Aug 27, 2016
Match-LSTM with Ans-Ptr (Sentence)

Singapore Management University

https://arxiv.org/abs/1608.07905
54.505 67.748

61

Nov 15, 2019
RQA (single model)

BUAA & MSRA

https://arxiv.org/abs/2005.02925
55.827 65.467

62

Aug 22, 2019
UQA (single model)

Anonymous

53.698 64.036