Changes between Version 10 and Version 11 of SpiderLing


Timestamp: Jun 18, 2020, 8:58:33 PM
Author: admin

The aim of our work is to focus crawling on the text-rich parts of the web and to maximize the number of words in the final corpus per downloaded megabyte. Nevertheless, the crawler can be configured to ignore the yield rate of web domains (how much usable text a domain produces per byte downloaded from it) and download from low-yield sites too.

{{{
#!html
<a class="lnk" href="http://nlp.fi.muni.cz/~xsuchom2/papers/PomikalekSuchomel_SpiderlingEfficiency.pdf">Paper</a>
|
<a class="lnk" href="/wiki/SpiderLing/Cite">Cite</a>
|
<a class="lnk" href="http://www.gnu.org/licenses/gpl.txt">Licence</a>
}}}
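The yield-rate filtering described at the top of this page can be illustrated with a short sketch. This is a minimal example under stated assumptions, not SpiderLing's actual code: the class `DomainYieldTracker`, the 1 % threshold, the 1 MB "warm-up" before a domain is judged, and the `ignore_yield` switch are all hypothetical. Yield rate is assumed here to mean bytes of clean text extracted from a domain divided by bytes downloaded from it.

{{{
#!python
from collections import defaultdict
from urllib.parse import urlsplit


class DomainYieldTracker:
    """Hypothetical per-domain yield-rate bookkeeping (not SpiderLing's API).

    Assumption: yield rate = clean text bytes extracted / bytes downloaded.
    """

    def __init__(self, min_yield=0.01, min_bytes=1_000_000, ignore_yield=False):
        self.min_yield = min_yield        # assumed cut-off below which a domain is dropped
        self.min_bytes = min_bytes        # assumed warm-up: do not judge a domain too early
        self.ignore_yield = ignore_yield  # True = keep downloading from low-yield domains
        self.downloaded = defaultdict(int)
        self.extracted = defaultdict(int)

    @staticmethod
    def domain(url):
        # Group statistics by host name.
        return urlsplit(url).hostname or ""

    def record(self, url, downloaded_bytes, clean_text_bytes):
        # Accumulate raw download size and extracted clean text size per domain.
        d = self.domain(url)
        self.downloaded[d] += downloaded_bytes
        self.extracted[d] += clean_text_bytes

    def yield_rate(self, domain):
        downloaded = self.downloaded[domain]
        return self.extracted[domain] / downloaded if downloaded else 0.0

    def should_crawl(self, url):
        if self.ignore_yield:
            return True  # yield rate is ignored entirely
        d = self.domain(url)
        if self.downloaded[d] < self.min_bytes:
            return True  # too little evidence yet; keep crawling
        return self.yield_rate(d) >= self.min_yield


if __name__ == "__main__":
    tracker = DomainYieldTracker(min_yield=0.01, min_bytes=1_000_000)
    tracker.record("http://example.com/a", 2_000_000, 5_000)            # 0.25 % yield
    tracker.record("http://text.example.org/b", 2_000_000, 200_000)     # 10 % yield
    print(tracker.should_crawl("http://example.com/c"))        # False
    print(tracker.should_crawl("http://text.example.org/d"))   # True
}}}

The warm-up threshold reflects a design choice: a domain is only dropped once enough data has been downloaded from it for its yield rate to be meaningful. Constructing the tracker with `ignore_yield=True` corresponds to the configuration in which low-yield sites are downloaded too.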
== Get SpiderLing ==