Changes between Version 3 and Version 4 of Justext
- Timestamp: 03/24/15 15:58:29
Justext
--- Justext (v3)
+++ Justext (v4)
@@ -3,10 +3,8 @@
 jusText is a tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pages. It is designed to preserve mainly text containing full sentences and it is therefore well suited for creating linguistic resources such as Web corpora.
 
-== What's new==
-Mišo Belica created a [https://github.com/miso-belica/jusText jusText fork on GitHub] with some tweaks.
+[http://corpus.tools/browser/justext/CHANGES Changelog]
 
-jusText is now also [https://pypi.python.org/pypi/jusText available on PyPi].
-
-[http://corpus.tools/browser/justext/CHANGES Changelog]
+== How it works ==
+See description of the jusText [Justext/Algorithm algorithm].
 
 == Installation ==
…
@@ -51,4 +49,9 @@
 [http://nlp.fi.muni.cz/projects/justext/]
 
+== Related links ==
+Mišo Belica created a [https://github.com/miso-belica/jusText jusText fork on GitHub] with some tweaks.
+
+jusText is also [https://pypi.python.org/pypi/jusText available on PyPi].
+
 == Acknowledgements ==
 This software has been developed at the [http://nlp.fi.muni.cz/en/nlpc Natural Language Processing Centre] of [http://www.muni.cz/ Masaryk University in Brno] with financial support from [http://presemt.eu PRESEMT] and [http://www.sketchengine.co.uk Lexical Computing Ltd.] It also relates to Jan Pomikálek's [http://is.muni.cz/th/45523/fi_d/phdthesis.pdf PhD research].