Changes between Version 11 and Version 12 of SpiderLing
Timestamp: 02/08/21 13:15:30
SpiderLing
--- v11
+++ v12
@@ -19,5 +19,8 @@
 
 == Publications ==
-We presented our results at the following venues:
+Chapter 1 in Vít Suchomel's Ph.D. thesis (defended in 2020):[[BR]]
+[https://is.muni.cz/th/u4rmz/Better_Web_Corpora_For_Corpus_Linguistics_And_NLP.pdf Better Web Corpora For Corpus Linguistics And NLP]
+
+We also presented our results at the following venues:
 
 [http://nlp.fi.muni.cz/~xsuchom2/papers/PomikalekSuchomel_SpiderlingEfficiency.pdf Efficient Web Crawling for Large Text Corpora]\\
@@ -35,6 +38,11 @@
 
 == Large textual corpora built using !SpiderLing ==
-=== Since 2017===
+=== From 2017 to 2020 ===
 Corpora of total size of ca. 200 billion tokens in various languages (mostly English) were built from data crawled by SpiderLing from 2017 to March 2020.
+
+[[Image(crawled_sizes_2019.png, 960px)]]
+{{{#!div style="font-size: 80%"
+Table source: Page 128 of Suchomel, Vít. "Better Web Corpora For Corpus Linguistics And NLP." Dissertation thesis, Masaryk university, 2020.
+}}}
 
 === From 2011 to 2014 ===