Changes between Version 10 and Version 11 of SpiderLing
- Timestamp: 06/18/20 20:58:33 (5 years ago)
SpiderLing
The aim of our work is to focus the crawling on the text-rich parts of the web and to maximize the number of words in the final corpus per downloaded megabyte. Nevertheless, the crawler can be configured to ignore the yield rate of web domains and download from low-yield sites too.

{{{
#!html
<a class="lnk" href="http://nlp.fi.muni.cz/~xsuchom2/papers/PomikalekSuchomel_SpiderlingEfficiency.pdf">Paper</a>
|
<a class="lnk" href="/wiki/SpiderLing/Cite">Cite</a>
|
<a class="lnk" href="http://www.gnu.org/licenses/gpl.txt">Licence</a>
}}}

== Get SpiderLing ==