Changes between Version 12 and Version 13 of SpiderLing
- Timestamp: 07/23/21 14:06:34
Legend:
- Unmodified
- Added
- Removed
- Modified
--- SpiderLing (v12)
+++ SpiderLing (v13)

  == Get SpiderLing ==
- Download [https://nlp.fi.muni.cz/projects/spiderling/ the latest version]. Please note the software is distributed as is, without guaranteed support.
+ Download [https://corpus.tools/raw-attachment/wiki/Downloads/spiderling-src-2.0.tar.xz the latest version]. Please note the software is distributed as is, without guaranteed support.

  == Publications ==
…
  - pdftotext (from poppler-utils),
  - ps2ascii (from ghostscript-core),
- - antiword (from antiword),
+ - antiword (from antiword + perl-Time-HiRes),
+ - odfpy,
  - nice (coreutils) (optional),
  - ionice (util-linux) (optional),
…
  Recommended hardware configuration (crawling ~30 bn words of English text):
  - 8-32 core CPU (the more CPUs the faster the processing of crawled data),
- - 32-256 GB system memory
+ - 32-512 GB system memory
  (the more RAM the more domains kept in memory and thus more webs visited),
  - lots of storage space,
…

  == Installation ==
- - unpack,
+ - unpack: tar -xJvf spiderling-src-*.tar.xz,
  - install required tools, see install_rpm.sh for rpm based systems
  - check importing the following dependencies by pypy3/python3:
…
  - raise ulimit -n according to MAX_OPEN_CONNS;
  - then increase MAX_OPEN_CONNS and OPEN_AT_ONCE;
- - configure language dependent settings.
+ - configure language and TLD dependent settings.

  == Language models for all recognised languages ==
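The installation notes say to raise `ulimit -n` according to `MAX_OPEN_CONNS`. The following is a minimal Python sketch of that check. It rests on two assumptions: each open connection holds one socket descriptor, and a hypothetical headroom of 1024 descriptors covers logs, pipes and other open files. The helper names `required_nofile`, `check_nofile` and `raise_soft_nofile` are illustrative and are not part of SpiderLing.

```python
import resource

# Assumption: one socket descriptor per open connection plus a fixed headroom
# for logs, pipes and other files. The 1024 figure is hypothetical.
FD_HEADROOM = 1024


def required_nofile(max_open_conns, headroom=FD_HEADROOM):
    """Number of file descriptors needed for max_open_conns connections."""
    return max_open_conns + headroom


def check_nofile(max_open_conns, headroom=FD_HEADROOM):
    """Return (soft, hard, needed, ok) for this process's RLIMIT_NOFILE."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = required_nofile(max_open_conns, headroom)
    ok = soft == resource.RLIM_INFINITY or soft >= needed
    return soft, hard, needed, ok


def raise_soft_nofile(max_open_conns, headroom=FD_HEADROOM):
    """Raise the soft limit to the needed value, capped at the hard limit.

    Like `ulimit -n`, this affects only the calling process and the
    processes it starts afterwards. Returns the resulting soft limit.
    """
    soft, hard, needed, ok = check_nofile(max_open_conns, headroom)
    if ok:
        return soft  # already sufficient, leave it untouched
    if hard == resource.RLIM_INFINITY:
        target = needed
    else:
        target = min(needed, hard)  # unprivileged processes cannot exceed hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    soft, hard, needed, ok = check_nofile(8000)  # hypothetical MAX_OPEN_CONNS
    print(f"soft={soft} hard={hard} needed={needed} ok={ok}")
```

Take a hypothetical `MAX_OPEN_CONNS` of 8000. In that case, `raise_soft_nofile(8000)` would lift the soft limit to 9024 descriptors, capped at the hard limit, and would leave an already higher soft limit unchanged. If the hard limit is too low, an administrator has to raise it, for example in /etc/security/limits.conf. Only after that do increases of `MAX_OPEN_CONNS` and `OPEN_AT_ONCE` make sense.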