source: spiderling

Graph Diff Rev Age Author Log Message
(edit) @7aa1fc0   3 years vit.suchomel v. 0.77
(edit) @9d3fc33   3 years vit.suchomel Catch robot parser failures
(edit) @fae3ce6   3 years vit.suchomel Loading domains -- nicer code
(edit) @0a801f1   3 years vit.suchomel Symlink to robot parser, new robot parser
(edit) @403516f   3 years vit.suchomel Catch all exceptions when decoding html data
(edit) @da8286a   3 years vit.suchomel Failsafe Justext exception logging
(edit) @db6df1f   3 years vit.suchomel Version 0.76
(edit) @3492869   3 years vit.suchomel util/remove_duplicates.py rewritten to fix a serious bug - some …
(edit) @be862ea   3 years vit.suchomel TODO update
(edit) @b29382d   3 years vit.suchomel Version 0.75
(edit) @e4e472e   3 years vit.suchomel Fixed FORCE_ENCODING missing in config
(edit) @eb08f47   3 years vit.suchomel Chared path made configurable
(edit) @3ece6e9   3 years vit.suchomel Agent url, '+' before the url to allow recognizing a bot
(edit) @d5cfc92   3 years vit.suchomel Wrong meta encoding TypeError? caught
(edit) @39d1612   3 years vit.suchomel Better estimate of RAM cost of big crawls
(edit) @a9b773e   3 years vit.suchomel DNS resolver count configurable
(edit) @1fdbcec   3 years vit.suchomel TODO update
(edit) @f6c0108   3 years vit.suchomel Version 0.74
(edit) @3bb6ebd   3 years vit.suchomel Typos/corrections
(edit) @ecdcef3   3 years vit.suchomel simple_decode to force encoding in process.py
(edit) @50a1c93   3 years vit.suchomel Config file notes update
(edit) @6d5947c   3 years vit.suchomel Merge branch 'master' of …
(edit) @952cd37   3 years vit.suchomel Important configuration moved to util/config.py
(edit) @3fb2f9e   3 years vit.suchomel Important configuration moved to util/config.py
(edit) @ae00451   3 years vit.suchomel English TLDs
(edit) @981c414   3 years vit.suchomel TODO added
(edit) @ae502f3   4 years vit.suchomel version 0.73
(edit) @91f5f3e   4 years vit.suchomel Blacklist of web domains
(edit) @728657c   4 years vit.suchomel Correct README and improve usability a bit - thanks to Prof. Nikola …
(edit) @9d63661   4 years vit.suchomel Enable using allowed_non_country_domains
(edit) @0c504ea   4 years vit.suchomel Read savepoint id from sys.argv[2]
(edit) @cf627f4   4 years vit.suchomel version 0.72.1
(edit) @a3b8577   4 years vit.suchomel Catch urlparse invalid URL exceptions also make_whole_url removed …
(add) @0d534b9   4 years vit.suchomel Initial commit