Changes between Version 11 and Version 12 of SpiderLing


Timestamp: Feb 8, 2021, 1:15:30 PM
Author: admin
Comment: Ph.D. thesis

  • SpiderLing

    v11  v12
     19   19
     20   20  == Publications ==
     21       We presented our results at the following venues:
          21  Chapter 1 in Vít Suchomel's Ph.D. thesis (defended in 2020):[[BR]]
          22  [https://is.muni.cz/th/u4rmz/Better_Web_Corpora_For_Corpus_Linguistics_And_NLP.pdf Better Web Corpora For Corpus Linguistics And NLP]
          23
          24  We also presented our results at the following venues:
     22   25
     23   26  [http://nlp.fi.muni.cz/~xsuchom2/papers/PomikalekSuchomel_SpiderlingEfficiency.pdf Efficient Web Crawling for Large Text Corpora]\\

     35   38
     36   39  == Large textual corpora built using !SpiderLing ==
     37       === Since 2017 ===
          40  === From 2017 to 2020 ===
     38   41  Corpora of total size of ca. 200 billion tokens in various languages (mostly English) were built from data crawled by SpiderLing from 2017 to March 2020.
          42
          43  [[Image(crawled_sizes_2019.png, 960px)]]
          44  {{{#!div style="font-size: 80%"
          45  Table source: Page 128 of Suchomel, Vít. "Better Web Corpora For Corpus Linguistics And NLP." Dissertation thesis, Masaryk university, 2020.
          46  }}}
     39   47
     40   48  === From 2011 to 2014 ===