Changes between Version 45 and Version 46 of WikiStart
- Timestamp: 08/21/25 14:07:47
WikiStart
v45 v46
  7   7   <table style="border-spacing: 1em"><tr>
  8   8
  9     - <td class="app" style="background-color:#000080; background-image:url('/chrome/site/justext_nb.png')">
      9 + <td class="app" style="background-color:#DDA0DD ; background-image:url('/chrome/site/justext_nb.png')">
 10  10   <p><a href="/wiki/Justext">
 11  11   JusText is a HTML boilerplate removal tool. It can strip navigation links, headers, footers, etc. from HTML pages and leave just regular text containing full sentences.</a><p>
  …   …
 19  19   </td>
 20  20
 21     - <td class="app" style="background-color:#800000; background-image:url('/chrome/site/chared_nb.png')">
     21 + <td class="app" style="background-color:#87CEEB ; background-image:url('/chrome/site/chared_nb.png')">
 22  22   <p><a href="/wiki/Chared">
 23  23   Chared is a tool for detecting the character encoding of a text in a known language. It contains models for a wide range of languages.</a><p>
  …   …
 33  33   </tr><tr>
 34  34
 35     - <td class="app" style="background-color:#800080; background-image:url('/chrome/site/spiderling_nb.png')">
     35 + <td class="app" style="background-color:#20B2AA ; background-image:url('/chrome/site/spiderling_nb.png')">
 36  36   <p><a href="/wiki/SpiderLing">Spiderling is a web spider for linguistics. It can crawl text-rich parts of the web and collect a lot of data suitable for text corpora.
 37  37   </a><p>
  …   …
 45  45   </td>
 46  46
 47     - <td class="app" style="background-color:#008000; background-image:url('/chrome/site/onion_nb.png')">
     47 + <td class="app" style="background-color:#6B8E23 ; background-image:url('/chrome/site/onion_nb.png')">
 48  48   <p><a href="/wiki/Onion">
 49  49   Onion (ONe Instance ONly) is a de-duplicator for large collections of texts. It can measure the similarity of paragraphs or whole documents and drop duplicate ones based on the threshold you set.</a></p>
  …   …
 59  59   </tr><tr>
 60  60
 61     - <td class="app" style="background-color:#808000; background-image:url('/chrome/site/unitok_nb.png')">
 62     - <p ><a href="/wiki/Unitok">
     61 + <td class="app" style="background-color:#9ACD32 ; background-image:url('/chrome/site/unitok_nb.png')">
     62 + <p style="color:white;"><a href="/wiki/Unitok">
 63  63   Unitok is a universal text tokeniser with specific settings for many languages. It can turn plain text into a sequence of newline-separated tokens (“vertical” format), while preserving XML-like tags containing metadata.</a></p>
 64  64   <p>
  …   …
 71  71   </td>
 72  72
 73     - <td class="app" style="background-color:#008080; background-image:url('/chrome/site/noske_icon_logo_only_white.png')">
 74     - <p ><a href="http://nlp.fi.muni.cz/trac/noske">NoSketch Engine is the open-sourced little brother of the corpus querying system Sketch Engine.
     73 + <td class="app" style="background-color:#9932CC ; background-image:url('/chrome/site/noske_icon_logo_only_white.png')">
     74 + <p style="color:white;"><a href="http://nlp.fi.muni.cz/trac/noske">NoSketch Engine is the open-sourced little brother of the corpus querying system Sketch Engine.
 75  75   </a><p>
 76  76   <p>
 77     - <a class="lnk" href="https://link.springer.com/article/10.1007%2Fs40607-014-0009-9" >Paper</a>
     77 + <a class="lnk" href="https://link.springer.com/article/10.1007%2Fs40607-014-0009-9" style="color:white;">Paper</a>
 78  78   |
 79     - <a class="lnk" href="/wiki/noske_cite" >Cite</a>
     79 + <a class="lnk" href="/wiki/noske_cite" style="color:white;">Cite</a>
 80  80   |
 81     - <a class="lnk" href="http://www.gnu.org/licenses/gpl2.txt" >Licence</a>
     81 + <a class="lnk" href="http://www.gnu.org/licenses/gpl2.txt" style="color:white;">Licence</a>
 82  82   </p>
 83  83   </td>
  …   …
 86  86   <tr>
 87  87
 88     - <td class="app black" style="background-color:#a7d7f9; background-image: url('/chrome/site/w2c_44.png');">
 89     - <p><a href="/wiki/wiki2corpus" >wiki2corpus is a script which downloads Wikipedia articles (for a given language) and outputs them in the form of prevertical which can be further processed by other corpus tools.
     88 + <td class="app black" style="background-color:#A52A2A; background-image: url('/chrome/site/w2c_44.png');">
     89 + <p><a href="/wiki/wiki2corpus" style="color:white;">wiki2corpus is a script which downloads Wikipedia articles (for a given language) and outputs them in the form of prevertical which can be further processed by other corpus tools.
 90  90   </a><p>
 91  91
 92  92   |
 93     - <a class="lnk" href="https://choosealicense.com/licenses/mit/" >Licence</a>
     93 + <a class="lnk" href="https://choosealicense.com/licenses/mit/" style="color:white;">Licence</a>
 94  94
 95  95   </td>
 96  96
 97     - <td class="app black" style="background-color:#ff1493; background-image: url('/chrome/site/noske_nb.png');">
 98     - <p><a href="/wiki/languagefilter" >Language Filter is a language discriminating tool. It works with the vertical format. The language of paragraphs and documents is determined according to pre-defined lists of words with corpus frequency.
     97 + <td class="app black" style="background-color:#191970; background-image: url('/chrome/site/noske_nb.png');">
     98 + <p><a href="/wiki/languagefilter" style="color:white;">Language Filter is a language discriminating tool. It works with the vertical format. The language of paragraphs and documents is determined according to pre-defined lists of words with corpus frequency.
 99  99   </a><p>
100 100   <p>
101     - <a class="lnk" href="https://nlp.fi.muni.cz/raslan/raslan19.pdf#page=137" >Paper</a>
    101 + <a class="lnk" href="https://nlp.fi.muni.cz/raslan/raslan19.pdf#page=137" style="color:white;">Paper</a>
102 102   |
103     - <a class="lnk" href="/wiki/languagefilter/Cite" >Cite</a>
    103 + <a class="lnk" href="/wiki/languagefilter/Cite" style="color:white;">Cite</a>
104 104   |
105     - <a class="lnk" href="http://www.gnu.org/licenses/gpl2.txt" >Licence</a>
    105 + <a class="lnk" href="http://www.gnu.org/licenses/gpl2.txt" style="color:white;">Licence</a>
106 106   </p>
107 107   </td>