Changes between Version 3 and Version 4 of Justext


Timestamp:
03/24/15 15:58:29
Author:
admin

  • Justext

    v3  v4
    3   3   jusText is a tool for removing boilerplate content, such as navigation links, headers, and footers, from HTML pages. It is designed to preserve mainly text containing full sentences and it is therefore well suited for creating linguistic resources such as Web corpora.
    4   4
    5       == What's new==
    6       Mišo Belica created a [https://github.com/miso-belica/jusText jusText fork on GitHub] with some tweaks.
        5   [http://corpus.tools/browser/justext/CHANGES Changelog]
    7   6
    8       jusText is now also [https://pypi.python.org/pypi/jusText available on PyPI].
    9
    10      [http://corpus.tools/browser/justext/CHANGES Changelog]
        7   == How it works ==
        8   See the description of the jusText [Justext/Algorithm algorithm].
    11  9
    12  10  == Installation ==

    51  49  [http://nlp.fi.muni.cz/projects/justext/]
    52  50
        51  == Related links ==
        52  Mišo Belica created a [https://github.com/miso-belica/jusText jusText fork on GitHub] with some tweaks.
        53
        54  jusText is also [https://pypi.python.org/pypi/jusText available on PyPI].
        55
    53  56  == Acknowledgements ==
    54  57  This software has been developed at the [http://nlp.fi.muni.cz/en/nlpc Natural Language Processing Centre] of [http://www.muni.cz/ Masaryk University in Brno] with financial support from [http://presemt.eu PRESEMT] and [http://www.sketchengine.co.uk Lexical Computing Ltd.] It also relates to Jan Pomikálek's [http://is.muni.cz/th/45523/fi_d/phdthesis.pdf PhD research].