Corpus Crawler is a tool for Corpus Linguistics.
Modern linguistic research works on language corpora, which are large samples of “real world” text. This crawler helps to build such corpora: it follows links to publicly accessible web pages known to be written in a certain language; it removes boilerplate and HTML markup; finally, it writes its output into plaintext files. The crawler implements the Robots Exclusion Standard, and it is intentionally slow so it does not cause much load on the crawled web sites.
This is not an official Google product. But if you’re a linguistic researcher, or if you’re writing a spell checker (or similar language-processing software) for an “exotic” language, you might find Corpus Crawler useful.
To build corpora for not-yet-supported languages, please read the contribution guidelines and send us GitHub pull requests.
The crawled corpora have been used to compute word frequencies in Unicode’s Unilex project.
IETF BCP47 Code | Language | Tokens¹ |
---|---|---|
aai |
Arifama-Miniafia | 181K 💾 |
aak |
Ankave | 194K 💾 |
aau |
Abau | 313K 💾 |
aaz |
Amarasi | 308K 💾 |
abt |
Ambulas | 297K 💾 |
aby |
Aneme Wake | 233K 💾 |
acd |
Gikyode | 323K 💾 |
ace |
Aceh/Acehnese | 817K 💾 |
acf |
Saint Lucian Creole French | 236K 💾 |
ach |
Acoli | 178K 💾 |
acn |
Achang | 232K 💾 |
acr |
Achi | 239K 💾 |
acu |
Achuar-Shiwiar | 174K 💾 |
ade |
Adele | 267K 💾 |
adh |
Adhola | 166K 💾 |
adj |
Adioukrou | 233K 💾 |
ae |
Avestan | 129K 💾 |
ae-Latn |
Avestan (Latin) | 141K 💾 |
aey |
Amele | 218K 💾 |
agd |
Agarabi | 256K 💾 |
agg |
Angor | 214K 💾 |
agm |
Angaataha | 238K 💾 |
agn |
Agutaynen | 234K 💾 |
agr |
Aguaruna | 149K 💾 |
ahk |
Akha | 367K 💾 |
aia |
Arosi | 223K 💾 |
akb |
Batak Angkola | 220K 💾 |
ake |
Akawaio | 190K 💾 |
akh |
Akha | 408K 💾 |
akp |
Siwu | 191K 💾 |
alj |
Alangan | 185K 💾 |
alp |
Alune | 225K 💾 |
alt |
Southern Altai | 121K 💾 |
alz |
Alur | 160K 💾 |
am |
Amharic | 2,170K 💾 |
ame |
Yanesha' | 221K 💾 |
amf |
Hamer-Banna | 152K 💾 |
amk |
Ambai | 229K 💾 |
amm |
Ama (Papua New Guinea) | 246K 💾 |
amn |
Amanab | 207K 💾 |
amp |
Alamblak | 241K 💾 |
amr |
Amarakaeri | 151K 💾 |
amu |
Guerrero Amuzgo | 202K 💾 |
ann |
Obolo | 236K 💾 |
anv |
Denya | 214K 💾 |
aoj |
Mufian | 217K 💾 |
aom |
Ömie | 231K 💾 |
aon |
Bumbita Arapesh | 294K 💾 |
aoz |
Uab Meto | 197K 💾 |
ape |
Bukiyip | 294K 💾 |
apr |
Arop-Lokep | 373K 💾 |
apz |
Safeyoka | 235K 💾 |
ar |
Arabic | 19,593K 💾 |
arl |
Arabela | 206K 💾 |
asg |
Cishingini | 270K 💾 |
aso |
Dano | 290K 💾 |
ata |
Pele-Ata | 248K 💾 |
atb |
Zaiwa | 291K 💾 |
atg |
Ivbie North-Okpela-Arhe | 229K 💾 |
atq |
Aralle-Tabulahan | 202K 💾 |
auy |
Awiyaana | 164K 💾 |
av |
Avaric | 111K 💾 |
avn |
Avatime | 229K 💾 |
avt |
Au | 263K 💾 |
avu |
Avokaya | 391K 💾 |
awa |
Awadhi | 211K 💾 |
awb |
Awa (Papua New Guinea) | 179K 💾 |
ay |
Aymara | 482K 💾 |
ayo |
Ayoreo | 264K 💾 |
az |
Azerbaijani | 3,413K 💾 |
azg |
San Pedro Amuzgos Amuzgo | 271K 💾 |
azz |
Highland Puebla Nahuatl | 265K 💾 |
ba |
Bashkir | 666K 💾 |
ban |
Balinese | 211K 💾 |
bao |
Waimaha | 232K 💾 |
bav |
Vengo | 250K 💾 |
bba |
Baatonum | 792K 💾 |
bbb |
Barai | 289K 💾 |
bbo |
Northern Bobo Madaré | 211K 💾 |
bbr |
Girawa | 245K 💾 |
bch |
Bariai | 248K 💾 |
bcw |
Bana | 304K 💾 |
bdd |
Bunama | 171K 💾 |
be |
Belarusian | 1,441K 💾 |
be-tarask |
Belarusian (Taraškievica) | 108,431K 💾 |
bef |
Benabena | 239K 💾 |
bep |
Besoa | 204K 💾 |
bex |
Jur Modo | 254K 💾 |
bfd |
Bafut | 276K 💾 |
bfo |
Malba Birifor | 260K 💾 |
bg |
Bulgarian | 10,597K 💾 |
bgr |
Bawm Chin | 213K 💾 |
bgz |
Banggai | 186K 💾 |
bhl |
Bimin | 324K 💾 |
bhw |
Biak | 164K 💾 |
bi |
Bislama | 315K 💾 |
bib |
Bissa | 243K 💾 |
big |
Biangai | 229K 💾 |
bik |
Central Bikol | 183K 💾 |
bim |
Bimoba | 215K 💾 |
biv |
Southern Birifor | 221K 💾 |
bjr |
Binumarien | 226K 💾 |
bjv |
Bedjond | 268K 💾 |
bkl |
Berik | 306K 💾 |
bku |
Buhid | 204K 💾 |
bkv |
Bekwarra | 244K 💾 |
blh |
Kuwaa | 259K 💾 |
blt-Latn |
Tai Dam (Latin) | 262K 💾 |
blz |
Balantak | 199K 💾 |
bm |
Bambara | 30K 💾 |
bmh |
Kein | 253K 💾 |
bmq |
Bomu | 207K 💾 |
bmr |
Muinane | 122K 💾 |
bmu |
Somba-Siawari | 234K 💾 |
bmv |
Bum | 258K 💾 |
bn |
Bangla | 7,258K 💾 |
bnj |
Eastern Tawbuid | 239K 💾 |
bnp |
Bola | 263K 💾 |
bo |
Tibetan | 5,642K 💾 |
boa |
Bora | 133K 💾 |
boj |
Anjam | 255K 💾 |
bon |
Bine | 244K 💾 |
bov |
Tuwuli | 203K 💾 |
box |
Buamu | 274K 💾 |
bpr |
Koronadal Blaan | 204K 💾 |
bps |
Sarangani Blaan | 214K 💾 |
bqc |
Boko | 567K 💾 |
bqj |
Bandial | 175K 💾 |
bqp |
Busa | 162K 💾 |
bru |
Eastern Bru | 261K 💾 |
bs |
Bosnian | 8,993K 💾 |
bsn |
Barasana-Eduria | 225K 💾 |
bss |
Akoose | 199K 💾 |
btd |
Batak Dairi | 192K 💾 |
bts |
Batak Simalungun | 175K 💾 |
btt |
Bete-Bendi | 266K 💾 |
btx |
Batak Karo | 189K 💾 |
bua |
Buriat | 143K 💾 |
bud |
Ntcham | 207K 💾 |
buk |
Bugawac | 264K 💾 |
bus |
Bokobaru | 159K 💾 |
bvc |
Baelelea | 308K 💾 |
bvz |
Bauzi | 509K 💾 |
bwq |
Southern Bobo Madaré | 214K 💾 |
bwu |
Buli | 285K 💾 |
byr |
Baruya | 182K 💾 |
byx |
Qaqet | 387K 💾 |
bzh |
Mapos Buang | 251K 💾 |
bzi |
Bisu | 381K 💾 |
bzj |
Belize Kriol English | 240K 💾 |
ca-valencia |
Valencian | 24,295K 💾 |
caa |
Chortí | 307K 💾 |
cab |
Garifuna | 154K 💾 |
cac |
Chuj | 244K 💾 |
cak |
Kaqchikel | 259K 💾 |
cap |
Chipaya | 154K 💾 |
car |
Galibi Carib | 160K 💾 |
cax |
Chiquitano | 149K 💾 |
cbc |
Carapana | 256K 💾 |
cbi |
Chachi | 187K 💾 |
cbl |
Bualkhaw Chin | 210K 💾 |
cbr |
Cashibo-Cacataibo | 236K 💾 |
cbs |
Cashinahua | 198K 💾 |
cbt |
Chayahuita | 150K 💾 |
cbv |
Cacua | 265K 💾 |
cce |
Chopi | 204K 💾 |
ccp |
Chakma | 79K 💾 |
cdf |
Chiru | 193K 💾 |
ce |
Chechen | 669K 💾 |
ceb |
Cebuano | 1,067K 💾 |
ceg |
Chamacoco | 232K 💾 |
cfm |
Falam Chin | 438K 💾 |
cgc |
Kagayanen | 299K 💾 |
chj |
Ojitlán Chinantec | 305K 💾 |
chm |
Mari | 132K 💾 |
chr |
Cherokee | 119K 💾 |
chz |
Ozumacín Chinantec | 205K 💾 |
cjo |
Ashéninka Pajonal | 141K 💾 |
cjp |
Cabécar | 199K 💾 |
cjv |
Chuave | 286K 💾 |
cko |
Anufo | 272K 💾 |
cle |
Lealao Chinantec | 313K 💾 |
cme |
Cerma | 230K 💾 |
cmr |
Mro-Khimi Chin | 275K 💾 |
cnh |
Hakha Chin | 934K 💾 |
cni |
Asháninka | 122K 💾 |
cnk |
Khumi Chin | 237K 💾 |
cnl |
Lalana Chinantec | 308K 💾 |
cnt |
Tepetotutla Chinantec | 261K 💾 |
coe |
Koreguaje | 181K 💾 |
cof |
Colorado | 183K 💾 |
cok |
Santa Teresa Cora | 230K 💾 |
con |
Cofán | 151K 💾 |
cot |
Caquinte | 128K 💾 |
crh |
Crimean Tatar | 505K 💾 |
cs |
Czech | 3,141K 💾 |
csk |
Jola-Kasa | 177K 💾 |
cso |
Sochiapam Chinantec | 328K 💾 |
ctd-Latn |
Tedim Chin (Latin) | 852K 💾 |
ctu |
Chol | 203K 💾 |
cub |
Cubeo | 220K 💾 |
cuc |
Usila Chinantec | 278K 💾 |
cui |
Cuiba | 292K 💾 |
cuk |
San Blas Kuna | 187K 💾 |
cul |
Culina | 221K 💾 |
cv |
Chuvash | 111K 💾 |
cwe |
Kwere | 144K 💾 |
cwt |
Kuwaataay | 168K 💾 |
cy |
Welsh | 11,519K 💾 |
cya |
Nopala Chatino | 245K 💾 |
czt |
Zotung Chin | 227K 💾 |
da |
Danish | 655K 💾 |
daa |
Dangaléat | 208K 💾 |
dad |
Marik | 197K 💾 |
dah |
Gwahatike | 274K 💾 |
ddn |
Dendi | 210K 💾 |
de |
German | 46,431K 💾 |
ded |
Dedua | 146K 💾 |
des |
Desano | 210K 💾 |
dga |
Southern Dagaare | 458K 💾 |
dgi |
Northern Dagara | 257K 💾 |
dgz |
Daga | 219K 💾 |
din |
Southwestern Dinka | 196K 💾 |
dip |
Northeastern Dinka | 193K 💾 |
djk |
Eastern Maroon Creole | 307K 💾 |
dln |
Darlong | 776K 💾 |
dnw |
Western Dani | 254K 💾 |
dob |
Dobu | 179K 💾 |
dop |
Lukpa | 226K 💾 |
dsh |
Daasanach | 211K 💾 |
dtb |
Labuk-Kinabatangan Kadazan | 248K 💾 |
dtp |
Kadazan Dusun | 1,038K 💾 |
dts |
Toro So Dogon | 202K 💾 |
due |
Umiray Dumaget Agta | 247K 💾 |
dug |
Duruma | 172K 💾 |
duo |
Dupaninan Agta | 266K 💾 |
dwr |
Dawro | 254K 💾 |
dww |
Dawawa | 208K 💾 |
dyi |
Djimini Senoufo | 268K 💾 |
dyo |
Jola-Fonyi | 158K 💾 |
dyu |
Dyula | 1,156K 💾 |
dz |
Dzongkha | 61K 💾 |
ee |
Ewe | 421K 💾 |
eka |
Ekajuk | 213K 💾 |
el |
Greek | 5,470K 💾 |
emi |
Mussau-Emira | 176K 💾 |
emp |
Northern Emberá | 158K 💾 |
enb |
Markweeta | 147K 💾 |
enq |
Enga | 217K 💾 |
enx |
Enxet | 772K 💾 |
eri |
Ogea | 269K 💾 |
es |
Spanish | 32,670K 💾 |
ese |
Ese Ejja | 226K 💾 |
et |
Estonian | 3,658K 💾 |
eu |
Basque | 130K 💾 |
ewo |
Ewondo | 158K 💾 |
eza |
Ezaa | 963K 💾 |
fa |
Persian | 9,114K 💾 |
fa-AF |
Dari | 7,363K 💾 |
faa |
Fasu | 238K 💾 |
fai |
Faiwol | 256K 💾 |
fal |
South Fali | 198K 💾 |
far |
Fataleka | 286K 💾 |
fi |
Finnish | 4,837K 💾 |
fil |
Tagalog | 184K 💾 |
fip |
Fipa | 134K 💾 |
fit |
Tornedalen Finnish | 292K 💾 |
fj |
Fijian | 257K 💾 |
fo |
Faroese | 851K 💾 |
fon |
Fon | 266K 💾 |
for |
Fore | 169K 💾 |
fr |
French | 5,488K 💾 |
fue |
Borgu Fulfulde | 148K 💾 |
fuf |
Pular | 174K 💾 |
fuq |
Central-Eastern Niger Fulfulde | 156K 💾 |
fuv |
Nigerian Fulfulde | 13K 💾 |
ga |
Irish | 7,587K 💾 |
gag |
Gagauz | 245K 💾 |
gah |
Alekano | 210K 💾 |
gam |
Kandawo | 250K 💾 |
gaw |
Nobonob | 246K 💾 |
gbi |
Galela | 288K 💾 |
gd |
Scottish Gaelic | 17,105K 💾 |
gde |
Gude | 217K 💾 |
gdn |
Umanakaina | 306K 💾 |
gdr |
Wipi | 271K 💾 |
gej |
Gen | 236K 💾 |
gfk |
Patpatar | 294K 💾 |
ghs |
Guhu-Samane | 186K 💾 |
gil |
Gilbertese | 228K 💾 |
gkn |
Gokana | 267K 💾 |
gmv-Latn |
Gamo (Latin) | 127K 💾 |
gn |
Guarani | 142K 💾 |
gnd |
Zulgo-Gemzek | 364K 💾 |
gng |
Ngangam | 219K 💾 |
gnw |
Western Bolivian Guaraní | 263K 💾 |
gof |
Gofa | 124K 💾 |
gog |
Gogo | 173K 💾 |
gor |
Gorontalo | 211K 💾 |
gqr |
Gor | 218K 💾 |
grb |
Northern Grebo | 270K 💾 |
grt |
Garo | 141K 💾 |
gso |
Southwest Gbaya | 228K 💾 |
gsw-u-sd-chag |
Swiss German (Aargau) | 99K 💾 |
gsw-u-sd-chbe |
Swiss German (Bern) | 73K 💾 |
gsw-u-sd-chfr |
Swiss German (Fribourg) | 42K 💾 |
gu |
Gujarati | 702K 💾 |
gub |
Guajajára | 997K 💾 |
guc |
Wayuu | 211K 💾 |
gud |
Yocoboué Dida | 216K 💾 |
guh |
Guahibo | 204K 💾 |
gui |
Eastern Bolivian Guaraní | 197K 💾 |
gum |
Guambiano | 186K 💾 |
gun |
Mbyá Guaraní | 176K 💾 |
guo |
Guayabero | 203K 💾 |
guq |
Aché | 184K 💾 |
gur |
Farefare | 240K 💾 |
gux |
Gourmanchéma | 215K 💾 |
gv |
Manx Gaelic | 152K 💾 |
gvc |
Guanano | 241K 💾 |
gvf |
Golin | 276K 💾 |
gvl |
Gulay | 270K 💾 |
gwr |
Gwere | 157K 💾 |
gym |
Ngäbere | 294K 💾 |
gyr |
Guarayu | 176K 💾 |
ha |
Hausa | 1,775K 💾 |
hae |
Eastern Oromo | 163K 💾 |
hag |
Hanga | 202K 💾 |
haw |
Hawaiian | 2,221K 💾 |
hay |
Haya | 112K 💾 |
heh |
Hehe | 136K 💾 |
hi |
Hindi | 10,004K 💾 |
hif |
Fiji Hindi | 204K 💾 |
hig |
Kamwe | 261K 💾 |
hil |
Hiligaynon | 208K 💾 |
hla |
Halia | 273K 💾 |
hne |
Chhattisgarhi | 207K 💾 |
hnn |
Hanunoo | 212K 💾 |
hns |
Caribbean Hindustani | 312K 💾 |
ho |
Hiri Motu | 240K 💾 |
hot |
Hote | 222K 💾 |
hr |
Croatian | 8,188K 💾 |
ht |
Haitian | 1,101K 💾 |
hto |
Minica Huitoto | 182K 💾 |
hu |
Hungarian | 600K 💾 |
hub |
Huambisa | 160K 💾 |
hui |
Huli | 232K 💾 |
hus |
Huastec | 236K 💾 |
huu |
Murui Huitoto | 165K 💾 |
huv |
San Mateo Del Mar Huave | 197K 💾 |
hvn |
Sabu | 312K 💾 |
hy |
Armenian | 25,972K 💾 |
ian |
Iatmul | 224K 💾 |
iba |
Iban | 179K 💾 |
icr |
Islander Creole English | 248K 💾 |
id |
Indonesian | 6,634K 💾 |
ifa |
Amganad Ifugao | 810K 💾 |
ifb |
Batad Ifugao | 835K 💾 |
ife |
Ifè | 300K 💾 |
ifk |
Tuwali Ifugao | 214K 💾 |
ifu |
Mayoyao Ifugao | 258K 💾 |
ify |
Keley-I Kallahan | 863K 💾 |
ig |
Igbo | 13K 💾 |
ign |
Ignaciano | 161K 💾 |
ik |
Inupiaq | 96K 💾 |
ilo |
Iloko | 169K 💾 |
imo |
Imbongu | 280K 💾 |
inb |
Inga | 151K 💾 |
ino |
Inoke-Yate | 236K 💾 |
iou |
Tuma-Irumu | 225K 💾 |
ipi |
Ipili | 312K 💾 |
iri |
Irigwe | 243K 💾 |
irk |
Iraqw | 184K 💾 |
iry |
Iraya | 205K 💾 |
it |
Italian | 13,569K 💾 |
itv |
Itawit | 242K 💾 |
iu |
Inuktitut | 98K 💾 |
iws |
Sepik Iwam | 307K 💾 |
izr |
Izere | 216K 💾 |
izz |
Izii | 908K 💾 |
ja |
Japanese | 2,116K 💾 |
jac |
Popti' | 221K 💾 |
jae |
Yabem | 186K 💾 |
jam |
Jamaican Creole English | 254K 💾 |
jbu |
Jukun Takum | 264K 💾 |
jic |
Tol | 285K 💾 |
jiv |
Shuar | 134K 💾 |
jmc |
Machame | 150K 💾 |
jun |
Juang | 178K 💾 |
jv |
Javanese | 177K 💾 |
jvn |
Caribbean Javanese | 211K 💾 |
ka |
Georgian | 4,978K 💾 |
kaa |
Kara-Kalpak | 135K 💾 |
kab-Arab |
Kabyle (Arabic) | 715K 💾 |
kab-Tfng |
Kabyle (Tifinagh) | 1,338K 💾 |
kab |
Kabyle | 66K 💾 |
kac |
Kachin | 1,057K 💾 |
kao |
Xaasongaxango | 205K 💾 |
kaq |
Capanahua | 164K 💾 |
kbh |
Camsá | 193K 💾 |
kbm |
Iwal | 298K 💾 |
kbp |
Kabiyè | 571K 💾 |
kbq |
Kamano | 156K 💾 |
kbr |
Kafa | 147K 💾 |
kcg |
Tyap | 279K 💾 |
kdc |
Kutu | 140K 💾 |
kdi |
Kumam | 195K 💾 |
kdj |
Karamojong | 163K 💾 |
kdn |
Kunda | 144K 💾 |
kek |
Kekchí | 406K 💾 |
ken |
Kenyang | 200K 💾 |
keo |
Kakwa | 215K 💾 |
ker |
Kera | 267K 💾 |
kew |
West Kewa | 247K 💾 |
kez |
Kukele | 173K 💾 |
kgf |
Kube | 175K 💾 |
kgr |
Abun | 356K 💾 |
khz |
Keapara | 196K 💾 |
kia |
Kim | 525K 💾 |
kij |
Kilivila | 155K 💾 |
kj |
Kuanyama | 1,474K 💾 |
kjb |
Q'anjob'al | 263K 💾 |
kje |
Kisar | 235K 💾 |
kjh |
Khakas | 128K 💾 |
kjs |
East Kewa | 251K 💾 |
kk |
Kazakh | 642K 💾 |
kki |
Kagulu | 125K 💾 |
kkj |
Kako | 263K 💾 |
kln |
Kalenjin | 149K 💾 |
km |
Khmer | 29,110K 💾 |
kma |
Konni | 230K 💾 |
kmg |
Kâte | 127K 💾 |
kmo |
Kwoma | 213K 💾 |
kms |
Kamasau | 293K 💾 |
kmu |
Kanite | 214K 💾 |
kn |
Kannada | 126K 💾 |
kne |
Kankanaey | 230K 💾 |
knf |
Mankanya | 164K 💾 |
knj |
Western Kanjobal | 1,350K 💾 |
knk |
Kuranko | 228K 💾 |
kno |
Kono | 360K 💾 |
knv |
Tabo | 243K 💾 |
kog |
Cogui | 189K 💾 |
kpf |
Komba | 174K 💾 |
kpg |
Kapingamarangi | 967K 💾 |
kpr |
Korafe-Yegha | 262K 💾 |
kpw |
Kobon | 288K 💾 |
kpx |
Mountain Koiali | 190K 💾 |
kpz |
Kupsabiny | 166K 💾 |
kqc |
Doromu-Koki | 209K 💾 |
kqe |
Kalagan | 241K 💾 |
kqp |
Kimré | 254K 💾 |
kqw |
Kandas | 201K 💾 |
kqy |
Koorete | 156K 💾 |
krc |
Karachay-Balkar | 132K 💾 |
kri |
Krio | 256K 💾 |
krj |
Kinaray-A | 228K 💾 |
kru |
Kurukh | 182K 💾 |
ksd |
Kuanua | 228K 💾 |
ksr |
Borong | 233K 💾 |
ktb |
Kambaata | 113K 💾 |
ktj |
Plapo Krumen | 356K 💾 |
kto |
Kuot | 286K 💾 |
ku |
Kurdish | 2,479K 💾 |
kub |
Kutep | 281K 💾 |
kud |
‘Auhelawa | 167K 💾 |
kue |
Kuman (Papua New Guinea) | 230K 💾 |
kum |
Kumyk | 142K 💾 |
kup |
Kunimaipa | 279K 💾 |
kus |
Kusaal | 200K 💾 |
kv |
Komi | 122K 💾 |
kvn |
Border Kuna | 212K 💾 |
kwf |
Kwara'ae | 296K 💾 |
kwi |
Awa-Cuaiquer | 165K 💾 |
kwj |
Kwanga | 290K 💾 |
kxc |
Konso | 148K 💾 |
kxm |
Northern Khmer | 257K 💾 |
ky |
Kyrgyz | 18,597K 💾 |
kyc |
Kyaka | 220K 💾 |
kyf |
Kouya | 215K 💾 |
kyg |
Keyagana | 190K 💾 |
kyq |
Kenga | 250K 💾 |
kyu |
Western Kayah | 466K 💾 |
kyz |
Kayabí | 324K 💾 |
kze |
Kosena | 164K 💾 |
kzf |
Da'a Kaili | 213K 💾 |
kzj |
Coastal Kadazan | 215K 💾 |
la |
Latin | 48K 💾 |
laj |
Lango | 175K 💾 |
las |
Lama | 235K 💾 |
law |
Lauje | 262K 💾 |
lb |
Luxembourgish | 5,173K 💾 |
lcm |
Tungag | 239K 💾 |
lee |
Lyélé | 257K 💾 |
lef |
Lelemi | 211K 💾 |
lem |
Nomaande | 249K 💾 |
leu |
Kara (Papua New Guinea) | 255K 💾 |
lew |
Ledo Kaili | 198K 💾 |
lex |
Luang | 271K 💾 |
lgg |
Lugbara | 188K 💾 |
lhu |
Lahu | 352K 💾 |
lia |
West-Central Limba | 247K 💾 |
lid |
Nyindrou | 308K 💾 |
lif |
Limbu | 138K 💾 |
lip |
Sekpele | 214K 💾 |
lis |
Lisu | 304K 💾 |
ljp |
Lampung Api | 188K 💾 |
lln |
Lele | 291K 💾 |
lme |
Pévé | 245K 💾 |
lmk |
Lamkang | 217K 💾 |
lnd |
Lundayeh | 670K 💾 |
lo |
Lao | 4,384K 💾 |
lob |
Lobi | 192K 💾 |
loe |
Saluan | 220K 💾 |
lok |
Loko | 264K 💾 |
lon |
Malawi Lomwe | 137K 💾 |
lsi |
Lashi | 1,077K 💾 |
lsm |
Saamia | 156K 💾 |
lt |
Lithuanian | 39,575K 💾 |
luc |
Aringa | 242K 💾 |
lus |
Lushai | 204K 💾 |
lv |
Latvian | 1,020K 💾 |
lwo |
Luwo | 255K 💾 |
maa |
San Jerónimo Tecóatl Mazatec | 487K 💾 |
mad |
Madurese | 706K 💾 |
mag |
Magahi | 193K 💾 |
mai |
Maithili | 211K 💾 |
maj |
Jalapa De Díaz Mazatec | 188K 💾 |
mak |
Makasar | 179K 💾 |
mam |
Mam | 834K 💾 |
maw |
Mampruli | 251K 💾 |
maz |
Central Mazahua | 286K 💾 |
mbb |
Western Bukidnon Manobo | 278K 💾 |
mbc |
Macushi | 221K 💾 |
mbh |
Mangseng | 321K 💾 |
mbt |
Matigsalug Manobo | 226K 💾 |
mca |
Maca | 208K 💾 |
mcb |
Machiguenga | 132K 💾 |
mcd |
Sharanahua | 200K 💾 |
mco |
Coatlán Mixe | 217K 💾 |
mcp |
Makaa | 237K 💾 |
mcq |
Ese | 158K 💾 |
mcu |
Cameroon Mambila | 260K 💾 |
mda |
Mada | 312K 💾 |
mdy |
Male | 589K 💾 |
med |
Melpa | 283K 💾 |
mee |
Mengen | 301K 💾 |
mej |
Meyah | 323K 💾 |
mek |
Mekeo | 234K 💾 |
men |
Mende | 210K 💾 |
meq |
Merey | 291K 💾 |
meu |
Motu | 175K 💾 |
mfe |
Morisyen | 172K 💾 |
mfh |
Matal | 238K 💾 |
mfi |
Wandala | 265K 💾 |
mfk |
North Mofu | 248K 💾 |
mfq |
Moba | 232K 💾 |
mfy |
Mayo | 167K 💾 |
mfz |
Mabaan | 237K 💾 |
mg |
Malagasy | 1,623K 💾 |
mgd |
Moru | 192K 💾 |
mgh |
Makhuwa-Meetto | 150K 💾 |
mgo |
Meta' | 251K 💾 |
mh |
Marshallese | 750K 💾 |
mhi |
Ma'di | 192K 💾 |
mhl |
Mauwake | 235K 💾 |
mhx |
Maru | 291K 💾 |
mhy |
Ma'anyan | 190K 💾 |
mi |
Maori | 1,504K 💾 |
mib |
Atatláhuca Mixtec | 263K 💾 |
mif |
Mofu-Gudur | 283K 💾 |
mil |
Peñoles Mixtec | 365K 💾 |
min |
Minangkabau | 242K 💾 |
mio |
Pinotepa Nacional Mixtec | 288K 💾 |
miq |
Mískito | 214K 💾 |
mit |
Southern Puebla Mixtec | 273K 💾 |
mk |
Macedonian | 10,422K 💾 |
mkl |
Mokole | 230K 💾 |
ml |
Malayalam | 118K 💾 |
mlh |
Mape | 235K 💾 |
mlp |
Bargam | 297K 💾 |
mmo |
Mangga Buang | 269K 💾 |
mmx |
Madak | 271K 💾 |
mna |
Mbula | 257K 💾 |
mnb |
Muna | 151K 💾 |
mnf |
Mundani | 241K 💾 |
mnw |
Mon | 1,836K 💾 |
moa |
Mwan | 308K 💾 |
mog |
Mongondow | 220K 💾 |
mop |
Mopán Maya | 296K 💾 |
mor |
Moro | 152K 💾 |
mox |
Molima | 222K 💾 |
mpg |
Marba | 210K 💾 |
mpm |
Yosondúa Mixtec | 336K 💾 |
mps |
Dadibi | 1,270K 💾 |
mpt |
Mian | 256K 💾 |
mpx |
Misima-Panaeati | 227K 💾 |
mqb |
Mbuko | 302K 💾 |
mqj |
Mamasa | 164K 💾 |
mqn |
Moronene | 164K 💾 |
mr |
Marathi | 16,594K 💾 |
mrw |
Maranao | 912K 💾 |
ms |
Malay | 659K 💾 |
msm |
Agusan Manobo | 225K 💾 |
msy |
Aruamu | 229K 💾 |
mt |
Maltese | 3,331K 💾 |
mta |
Cotabato Manobo | 262K 💾 |
mti |
Maiwa (Papua New Guinea) | 166K 💾 |
mtj |
Moskona | 321K 💾 |
mto |
Totontepec Mixe | 233K 💾 |
mtp |
Wichí Lhamtés Nocten | 183K 💾 |
muh |
Mündü | 392K 💾 |
mur |
Murle | 210K 💾 |
mux |
Bo-Ung | 363K 💾 |
muy |
Muyang | 265K 💾 |
mva |
Manam | 231K 💾 |
mvp |
Duri | 174K 💾 |
mwv |
Mentawai | 141K 💾 |
mxb |
Tezoatlán Mixtec | 281K 💾 |
mxt |
Jamiltepec Mixtec | 267K 💾 |
my |
Burmese | 1,007K 💾 |
my-t-d0-zawgyi |
Burmese (Zawgyi encoding) | 593K 💾 |
myb |
Mbay | 192K 💾 |
myk |
Mamara Senoufo | 272K 💾 |
myv |
Erzya | 143K 💾 |
myw |
Muyuw | 150K 💾 |
myx |
Masaaba | 164K 💾 |
myy |
Macuna | 245K 💾 |
mza |
Santa María Zacatepec Mixtec | 316K 💾 |
mzi |
Ixcatlán Mazatec | 190K 💾 |
mzk |
Nigeria Mambila | 283K 💾 |
mzm |
Mumuye | 265K 💾 |
naf |
Nabak | 220K 💾 |
nak |
Nakanai | 333K 💾 |
nan-Latn |
Min Nan Chinese (Latin) | 231K 💾 |
nas |
Naasioi | 168K 💾 |
nca |
Iyo | 203K 💾 |
nch |
Central Huasteca Nahuatl | 195K 💾 |
ncj |
Northern Puebla Nahuatl | 164K 💾 |
ncu |
Chumburung | 312K 💾 |
ndj |
Ndamba | 141K 💾 |
ndy |
Lutos | 216K 💾 |
ndz |
Ndogo | 350K 💾 |
neb |
Toura | 326K 💾 |
new |
Newari | 150K 💾 |
nfr |
Nafaanra | 233K 💾 |
ngp |
Ngulu | 149K 💾 |
nho |
Takuu | 309K 💾 |
nhu |
Noone | 270K 💾 |
nhw |
Western Huasteca Nahuatl | 194K 💾 |
nhy |
Northern Oaxaca Nahuatl | 185K 💾 |
nia |
Nias | 182K 💾 |
nii |
Nii | 316K 💾 |
nij |
Ngaju | 194K 💾 |
nim |
Nilamba | 117K 💾 |
nin |
Ninzo | 267K 💾 |
nkf |
Inpui Naga | 197K 💾 |
nko |
Nkonya | 168K 💾 |
nl |
Dutch | 58,357K 💾 |
nlc |
Nalca | 241K 💾 |
nmz |
Nawdm | 209K 💾 |
nnb |
Nande | 127K 💾 |
nnq |
Ngindo | 137K 💾 |
nnw |
Southern Nuni | 291K 💾 |
noa |
Woun Meu | 275K 💾 |
nog |
Nogai | 104K 💾 |
nop |
Numanggang | 183K 💾 |
not |
Nomatsiguenga | 141K 💾 |
nou |
Ewage-Notu | 266K 💾 |
npl |
Southeastern Puebla Nahuatl | 148K 💾 |
npy |
Napu | 192K 💾 |
nsn |
Nehan | 248K 💾 |
nsu |
Sierra Negra Nahuatl | 170K 💾 |
ntm |
Nateni | 229K 💾 |
ntp |
Northern Tepehuan | 173K 💾 |
ntr |
Delo | 272K 💾 |
nuj |
Nyole | 151K 💾 |
nus |
Nuer | 195K 💾 |
nvm |
Namiae | 290K 💾 |
nwb |
Nyabwa | 316K 💾 |
nwi |
Southwest Tanna | 230K 💾 |
ny |
Nyanja | 356K 💾 |
nyf |
Giryama | 169K 💾 |
nyn |
Nyankole | 120K 💾 |
nyo |
Nyoro | 120K 💾 |
nyy |
Nyakyusa-Ngonde | 138K 💾 |
nzi |
Nzima | 201K 💾 |
obo |
Obo Manobo | 266K 💾 |
oc |
Occitan | 2,706K 💾 |
oku |
Oku | 239K 💾 |
okv |
Orokaiva | 212K 💾 |
old |
Mochi | 151K 💾 |
ong |
Olo | 284K 💾 |
opm |
Oksapmin | 332K 💾 |
or |
Oriya | 175K 💾 |
os |
Ossetic | 135K 💾 |
osa |
Osage | 3K 💾 |
otd |
Ot Danum | 187K 💾 |
ote |
Mezquital Otomi | 251K 💾 |
ozm |
Koonzime | 267K 💾 |
pa |
Punjabi | 59,990K 💾 |
pab |
Parecís | 156K 💾 |
pad |
Paumarí | 242K 💾 |
pag |
Pangasinan | 177K 💾 |
pah |
Tenharim | 268K 💾 |
pam |
Pampanga | 196K 💾 |
pau |
Palauan | 255K 💾 |
pbc |
Patamona | 181K 💾 |
pbi |
Parkwa | 272K 💾 |
pck |
Paite Chin | 770K 💾 |
pcm |
Nigerian Pidgin | 315K 💾 |
pez |
Eastern Penan | 235K 💾 |
pib |
Yine | 114K 💾 |
pir |
Piratapuyo | 229K 💾 |
pis |
Pijin | 263K 💾 |
pjt |
Pitjantjatjara | 237K 💾 |
pkb |
Pokomo | 166K 💾 |
pl |
Polish | 7,148K 💾 |
plw |
Brooke's Point Palawano | 203K 💾 |
pmf |
Pamona | 307K 💾 |
pny |
Pinyin | 247K 💾 |
poh |
Poqomchi' | 266K 💾 |
poi |
Highland Popoluca | 179K 💾 |
poy |
Pogolo | 147K 💾 |
ppk |
Uma | 220K 💾 |
ppo |
Folopa | 258K 💾 |
prf |
Paranan | 203K 💾 |
prk |
Parauk | 1,026K 💾 |
ps |
Pashto | 7,343K 💾 |
pss |
Kaulong | 326K 💾 |
pt |
Portuguese | 20,891K 💾 |
pt-PT |
Portuguese (Portugal) | 666K 💾 |
ptp |
Patep | 294K 💾 |
ptu |
Bambam | 194K 💾 |
pwg |
Gapapaiwa | 208K 💾 |
pww |
Pwo Northern Karen | 345K 💾 |
pxm |
Quetzaltepec Mixé | 720K 💾 |
qu |
Quechua | 580K 💾 |
qub |
Huallaga Huánuco Quechua | 122K 💾 |
quc |
K'iche' | 207K 💾 |
quf |
Lambayeque Quechua | 161K 💾 |
quh |
South Bolivian Quechua | 623K 💾 |
qul |
North Bolivian Quechua | 140K 💾 |
qup |
Southern Pastaza Quechua | 177K 💾 |
quw |
Tena Lowland Quichua | 116K 💾 |
quy |
Ayacucho Quechua | 106K 💾 |
qvc |
Cajamarca Quechua | 166K 💾 |
qve |
Eastern Apurímac Quechua | 168K 💾 |
qvi |
Imbabura Highland Quichua | 146K 💾 |
qvm |
Margos-Yarowilca-Lauricocha Quechua | 132K 💾 |
qvn |
North Junín Quechua | 139K 💾 |
qvo |
Napo Lowland Quechua | 117K 💾 |
qvs |
San Martín Quechua | 153K 💾 |
qvw |
Huaylla Wanca Quechua | 111K 💾 |
qvz |
Northern Pastaza Quichua | 157K 💾 |
qwh |
Huaylas Ancash Quechua | 128K 💾 |
qxh |
Panao Huánuco Quechua | 123K 💾 |
qxl |
Salasaca Highland Quichua | 127K 💾 |
qxn |
Northern Conchucos Ancash Quechua | 150K 💾 |
qxo |
Southern Conchucos Ancash Quechua | 136K 💾 |
qxr |
Cañar Highland Quichua | 509K 💾 |
rai |
Ramoaaina | 273K 💾 |
raj |
Malvi | 198K 💾 |
rav |
Sampang | 138K 💾 |
rej |
Rejang | 178K 💾 |
rim |
Nyaturu | 151K 💾 |
rm-puter |
Romansh (Puter) | 1,068K 💾 |
rm-rumgr |
Romansh (Grischun) | 4,794K 💾 |
rm-surmiran |
Romansh (Surmiran) | 2,540K 💾 |
rm-sursilv |
Romansh (Sursilvan) | 11,678K 💾 |
rm-sutsilv |
Romansh (Sutsilvan) | 1,007K 💾 |
rm-vallader |
Romansh (Vallader) | 5,560K 💾 |
rmc |
Carpathian Romani | 170K 💾 |
rmo |
Sinte Romani | 228K 💾 |
rn |
Rundi | 120K 💾 |
rnl |
Ranglong | 221K 💾 |
ro |
Romanian | 13,962K 💾 |
ro-MD |
Moldavian | 2,694K 💾 |
rom |
Vlax Romani | 186K 💾 |
roo |
Rotokas | 292K 💾 |
rro |
Waima | 177K 💾 |
ru |
Russian | 40,987K 💾 |
ruf |
Luguru | 135K 💾 |
rug |
Roviana | 956K 💾 |
rw |
Kinyarwanda | 605K 💾 |
rwo |
Rawa | 261K 💾 |
sab |
Buglere | 405K 💾 |
sah |
Sakha | 2,457K 💾 |
sas |
Sasak | 196K 💾 |
sat |
Santali | 149K 💾 |
sba |
Ngambay | 246K 💾 |
sbl |
Botolan Sambal | 251K 💾 |
sck |
Sadri | 189K 💾 |
sda |
Toraja-Sa'dan | 154K 💾 |
seh |
Sena | 155K 💾 |
sey |
Secoya | 163K 💾 |
sg |
Sango | 265K 💾 |
sgb |
Mag-antsi Ayta | 233K 💾 |
sgw |
Sebat Bet Gurage | 116K 💾 |
sgz |
Sursurunga | 327K 💾 |
shk |
Shilluk | 189K 💾 |
shn |
Shan | 1,435K 💾 |
shp |
Shipibo-Conibo | 169K 💾 |
si |
Sinhala | 1,046K 💾 |
sig |
Paasaal | 277K 💾 |
sil |
Tumulung Sisaala | 256K 💾 |
sim |
Mende (Papua New Guinea) | 273K 💾 |
sja |
Epena | 194K 💾 |
sk |
Slovak | 70,933K 💾 |
sl |
Slovenian | 10,975K 💾 |
sld |
Sissala | 206K 💾 |
sll |
Salt-Yui | 264K 💾 |
sm |
Samoan | 248K 💾 |
smt |
Simte | 177K 💾 |
sn |
Shona | 2,542K 💾 |
snc |
Sinaugoro | 216K 💾 |
snn |
Siona | 222K 💾 |
snp |
Siane | 237K 💾 |
snw |
Selee | 212K 💾 |
sny |
Saniyo-Hiyewe | 348K 💾 |
so |
Somali | 874K 💾 |
soq |
Kanasi | 213K 💾 |
soy |
Miyobe | 205K 💾 |
spl |
Selepet | 244K 💾 |
spp |
Supyire Senoufo | 251K 💾 |
sps |
Saposa | 324K 💾 |
sq |
Albanian | 10,104K 💾 |
sr |
Serbian | 4,785K 💾 |
sr-Latn |
Serbian (Latin) | 10,143K 💾 |
sri |
Siriano | 166K 💾 |
srm |
Saramaccan | 369K 💾 |
srn |
Sranan Tongo | 232K 💾 |
ssd |
Siroi | 210K 💾 |
ssg |
Seimat | 221K 💾 |
ssx |
Samberigi | 233K 💾 |
stn |
Owa | 263K 💾 |
su |
Sundanese | 172K 💾 |
sua |
Sulka | 458K 💾 |
sue |
Suena | 227K 💾 |
sur |
Mwaghavul | 261K 💾 |
sus |
Susu | 205K 💾 |
suz |
Sunwar | 732K 💾 |
sv |
Swedish | 33,633K 💾 |
sw |
Swahili | 8,817K 💾 |
swp |
Suau | 175K 💾 |
sxn |
Sangir | 209K 💾 |
ta |
Tamil | 1,413K 💾 |
tab |
Tabassaran | 132K 💾 |
taj |
Eastern Tamang | 169K 💾 |
tap |
Taabwa | 145K 💾 |
taq |
Tamasheq | 218K 💾 |
tav |
Tatuyo | 256K 💾 |
taw |
Tai | 268K 💾 |
tbc |
Takia | 278K 💾 |
tbg |
North Tairora | 235K 💾 |
tbo |
Tawala | 198K 💾 |
tby |
Tabaru | 226K 💾 |
tbz |
Ditammari | 692K 💾 |
tca |
Ticuna | 251K 💾 |
tcc |
Datooga | 135K 💾 |
te |
Telugu | 574K 💾 |
ted |
Tepo Krumen | 346K 💾 |
tem |
Timne | 190K 💾 |
teo |
Teso | 118K 💾 |
ter |
Tereno | 187K 💾 |
tfr |
Teribe | 228K 💾 |
tgo |
Sudest | 216K 💾 |
tgp |
Tangoa | 228K 💾 |
thk |
Tharaka | 150K 💾 |
ti |
Tigrinya | 803K 💾 |
tif |
Tifal | 413K 💾 |
tih |
Timugon Murut | 879K 💾 |
tik |
Tikar | 264K 💾 |
tim |
Timbe | 206K 💾 |
tk |
Turkmen | 516K 💾 |
tlb |
Tobelo | 209K 💾 |
tlf |
Telefol | 422K 💾 |
tlj |
Talinga-Bwisi | 159K 💾 |
tmc |
Tumak | 245K 💾 |
tna |
Tacana | 216K 💾 |
tnr |
Ménik | 254K 💾 |
to |
Tonga | 1,214K ��� |
tob |
Toba | 229K 💾 |
toc |
Coyutla Totonac | 218K 💾 |
toh |
Gitonga | 194K 💾 |
top |
Papantla Totonac | 168K 💾 |
tos |
Highland Totonac | 224K 💾 |
tpi |
Tok Pisin | 8,049K 💾 |
tpm |
Tampulma | 892K 💾 |
tpp |
Pisaflores Tepehua | 162K 💾 |
tpt |
Tlachichilco Tepehua | 173K 💾 |
tpz |
Tinputz | 370K 💾 |
tqo |
Toaripi | 215K 💾 |
tr |
Turkish | 13,846K 💾 |
trs |
Chicahuaxtla Triqui | 287K 💾 |
tsz |
Purepecha | 129K 💾 |
tt |
Tatar | 1,356K 💾 |
ttc |
Tektiteko | 231K 💾 |
tte |
Bwanabwana | 198K 💾 |
tue |
Tuyuca | 141K 💾 |
tuf |
Central Tunebo | 237K 💾 |
twb |
Western Tawbuid | 198K 💾 |
twu |
Termanu | 242K 💾 |
txa |
Tombonuo | 224K 💾 |
txu |
Kayapó | 354K 💾 |
tyv |
Tuvinian | 614K 💾 |
tyz |
Tày | 260K 💾 |
tzh |
Tzeltal | 901K 💾 |
tzj |
Tz'utujil | 245K 💾 |
ubr |
Ubir | 222K 💾 |
ubu |
Umbu-Ungu | 308K 💾 |
udm |
Udmurt | 135K 💾 |
udu |
Uduk | 287K 💾 |
ug |
Uyghur | 9,493K 💾 |
uk |
Ukrainian | 12,921K 💾 |
ur |
Urdu | 3,622K 💾 |
ura |
Urarina | 193K 💾 |
urb |
Urubú-Kaapor | 347K 💾 |
urk |
Urak Lawoi' | 368K 💾 |
ury |
Orya | 301K 💾 |
usa |
Usarufa | 171K 💾 |
usp |
Uspanteco | 228K 💾 |
uvl |
Lote | 277K 💾 |
uz |
Uzbek | 131K 💾 |
vag |
Vagla | 221K 💾 |
vec |
Venetian | 2K 💾 |
vec-u-sd-itpd |
Venetian (Padua) | 813K 💾 |
vec-u-sd-itts |
Venetian (Trieste) | 12K 💾 |
vec-u-sd-itvr |
Venetian (Verona) | 16K 💾 |
vid |
Vidunda | 151K 💾 |
viv |
Iduna | 220K 💾 |
vmw |
Makhuwa | 130K 💾 |
vun |
Vunjo | 141K 💾 |
vut |
Vute | 206K 💾 |
waj |
Waffa | 236K 💾 |
wap |
Wapishana | 193K 💾 |
war |
Waray | 208K 💾 |
way |
Wayana | 143K 💾 |
wer |
Weri | 209K 💾 |
wiu |
Wiru | 232K 💾 |
wlx |
Wali | 847K 💾 |
wmw |
Mwani | 139K 💾 |
wnc |
Wantoat | 238K 💾 |
wnu |
Usan | 234K 💾 |
wob |
Wè Northern | 270K 💾 |
wos |
Hanga Hundi | 264K 💾 |
wrs |
Waris | 213K 💾 |
wsk |
Waskia | 239K 💾 |
wuv |
Wuvulu-Aua | 187K 💾 |
wwa |
Waama | 239K 💾 |
xal |
Kalmyk | 135K 💾 |
xav |
Xavánte | 440K 💾 |
xed |
Hdi | 229K 💾 |
xla |
Kamula | 230K 💾 |
xog |
Soga | 127K 💾 |
xrb |
Eastern Karaboro | 286K 💾 |
xsb |
Sambal | 244K 💾 |
xsi |
Sio | 319K 💾 |
xsm |
Kasem | 604K 💾 |
xsr |
Sherpa | 184K 💾 |
xsu |
Sanumá | 408K 💾 |
xtd |
Diuxi-Tilantongo Mixtec | 277K 💾 |
xtm |
Magdalena Peñasco Mixtec | 335K 💾 |
xuo |
Kuo | 306K 💾 |
yaa |
Yaminahua | 204K 💾 |
yad |
Yagua | 142K 💾 |
yal |
Yalunka | 203K 💾 |
yam |
Yamba | 277K 💾 |
yaz |
Lokaa | 222K 💾 |
yby |
Yaweyuha | 219K 💾 |
ycn |
Yucuna | 202K 💾 |
yle |
Yele | 298K 💾 |
yli |
Angguruk Yali | 221K 💾 |
yml |
Iamalele | 245K 💾 |
yo |
Yoruba | 270K 💾 |
yon |
Yongkom | 202K 💾 |
yrb |
Yareba | 184K 💾 |
yre |
Yaouré | 285K 💾 |
yss |
Yessan-Mayo | 227K 💾 |
yua |
Yucateco | 813K 💾 |
yuj |
Karkar-Yuri | 258K 💾 |
yut |
Yopno | 227K 💾 |
yuw |
Yau (Morobe Province) | 243K 💾 |
yva |
Yawa | 250K 💾 |
zaa |
Sierra de Juárez Zapotec | 265K 💾 |
zad |
Cajonos Zapotec | 180K 💾 |
zae |
Yareni Zapotec | 248K 💾 |
zap |
Zapotec | 194K 💾 |
zas |
Santo Domingo Albarradas Zapotec | 184K 💾 |
zaw |
Mitla Zapotec | 157K 💾 |
zca |
Coatecas Altas Zapotec | 236K 💾 |
zia |
Zia | 242K 💾 |
ziw |
Zigula | 140K 💾 |
zlm |
Malay | 664K 💾 |
zne |
Zande | 253K 💾 |
zpc |
Choapan Zapotec | 208K 💾 |
zpi |
Santa María Quiegolani Zapotec | 209K 💾 |
zpq |
Zoogocho Zapotec | 208K 💾 |
zpt |
San Vicente Coatlán Zapotec | 229K 💾 |
zpz |
Texmelucan Zapotec | 281K 💾 |
zyp |
Zyphe Chin | 230K 💾 |
¹ Downloadable files include counts for each token; to get raw text, run the crawler yourself. For breaking text into words, we use an ICU word break iterator and count all tokens whose break status is one of UBRK_WORD_LETTER
, UBRK_WORD_KANA
, or UBRK_WORD_IDEO
.
./corpuscrawler --language=yo --output=./corpus