| 1 | Apr 11 2018 |
| 2 | Norwegian joined wordlist |
| 3 | Apr 11 2018 |
| 4 | More wordlists |
| 5 | Sep 11 2017 |
| 6 | Lowercased stoplist |
| 7 | Aug 24 2017 |
| 8 | New and updated wordlists |
| 9 | Aug 24 2017 |
| 10 | jusText 1.4 |
| 11 | Aug 24 2017 |
| 12 | Web demo |
| 13 | Aug 24 2017 |
| 14 | max_good_distance, a new context classification parameter |
| 15 | Maximum distance (in paragraphs) between a short paragraph and a good |
| 16 | paragraph for the short paragraph to be re-classified as good. |
| 17 | Jun 30 2017 |
| 18 | Minor package updates |
| 19 | Jun 30 2017 |
| 20 | jusText 1.3 |
| 21 | Jun 29 2017 |
| 22 | Split preprocess into get_html_root and preprocess_html_root |
| 23 | Allows using the DOM root before the head (and other possibly useful |
| 24 | elements) are removed. Needed to get the page title from the head. |
| 25 | Apr 12 2017 |
| 26 | new README |
| 27 | Apr 12 2017 |
| 28 | filter out HTML(5) elements |
| 29 | Feb 24 2017 |
| 30 | remove words containing Latin characters from Korean stoplist |
| 31 | Jan 12 2015 |
| 32 | Move * out of trunk/ |
| 33 | Nov 11 2012 |
| 34 | Temporary workaround for issue #2: Remove any text nodes that cannot be decoded. |
| 35 | Jan 26 2012 |
| 36 | Added stoplists for Kazakh, Kyrgyz, Turkmen and Uzbek. |
| 37 | Dec 6 2011 |
| 38 | Fixed inserting spaces between text nodes. Before, content such as "abc<b>efg</b>" became "abc efg" after processing. Now it correctly becomes "abcefg". |
| 39 | Aug 8 2011 |
| 40 | jusText 1.2 |
| 41 | Aug 8 2011 |
| 42 | Edited wiki page Algorithm through web user interface. |
| 43 | Aug 4 2011 |
| 44 | Use character counts instead of word counts where possible (for length-low, length-high, max-heading-distance and link density computation). This makes the algorithm work well in language-independent mode (without a stoplist) for languages where counting words is not easy (Japanese, Chinese, Thai, etc.). The default thresholds have been adjusted accordingly. |
| 45 | Aug 4 2011 |
| 46 | More robust parsing of meta tags that declare the charset in use. |
| 47 | Jun 6 2011 |
| 48 | Bug fix: Corrected decoding of HTML entities &#128; to &#159; |
| 49 | Mar 28 2011 |
| 50 | Edited wiki page Algorithm through web user interface. |
| 51 | Mar 28 2011 |
| 52 | Edited wiki page Algorithm through web user interface. |
| 53 | Mar 23 2011 |
| 54 | Edited wiki page Algorithm through web user interface. |
| 55 | Mar 17 2011 |
| 56 | Edited wiki page Algorithm through web user interface. |
| 57 | Mar 9 2011 |
| 58 | Edited wiki page Algorithm through web user interface. |
| 59 | Mar 9 2011 |
| 60 | Edited wiki page Algorithm through web user interface. |
| 61 | Mar 9 2011 |
| 62 | Edited wiki page Algorithm through web user interface. |
| 63 | Mar 9 2011 |
| 64 | Edited wiki page Algorithm through web user interface. |
| 65 | Mar 9 2011 |
| 66 | Created wiki page through web user interface. |
| 67 | Mar 9 2011 |
| 68 | jusText 1.1 |
| 69 | Mar 9 2011 |
| 70 | Initial import. |
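
The Jun 6 2011 entry concerns numeric character references in the range 128 to 159 (which display as € to Ÿ). In Unicode these code points are C1 control characters, but HTML parsers conventionally read them as Windows-1252 bytes. The sketch below illustrates that mapping. It is not jusText's actual fix, and `decode_numeric_reference` is a hypothetical helper name, not part of the package.

```python
import html


def decode_numeric_reference(code_point: int) -> str:
    """Decode a numeric character reference the way HTML parsers do.

    Hypothetical helper for illustration only; not part of jusText.
    """
    if 128 <= code_point <= 159:
        try:
            # This range is read as Windows-1252 bytes, not as C1 controls.
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # A few bytes (e.g. 0x81) are unassigned in cp1252; keep them as-is.
            return chr(code_point)
    return chr(code_point)


euro = decode_numeric_reference(128)         # EURO SIGN, U+20AC
y_diaeresis = decode_numeric_reference(159)  # LATIN CAPITAL Y WITH DIAERESIS, U+0178

# The standard library applies the same mapping when unescaping references.
matches_stdlib = html.unescape("&#128;") == euro
```

A naive `chr(128)` would instead yield an invisible control character, which is the kind of result a fix for these entities would need to avoid.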