| 8 | |
| 9 | <td class="app" style="background-color:#000080 ; background-image:url('/chrome/site/justext_nb.png')"> |
| 10 | <p><a href="/wiki/Justext"> |
| 11 | JusText is an HTML boilerplate removal tool. It can strip navigation links, headers, footers, etc. from HTML pages and leave just regular text containing full sentences.</a></p> |
| 12 | <p> |
| 13 | <a class="lnk" href="http://is.muni.cz/th/45523/fi_d/phdthesis.pdf">Paper</a> |
| 14 | | |
| 15 | <a class="lnk" href="/wiki/Justext/Cite">Cite</a> |
| 16 | | |
| 17 | <a class="lnk" href="http://opensource.org/licenses/BSD-3-Clause">Licence</a> |
| 18 | </p> |
| 19 | </td> |
| 20 | |
| 21 | <td class="app" style="background-color:#800000 ; background-image:url('/chrome/site/_nb.png')"> |
| 22 | <p><a href="/wiki/Chared"> |
| 23 | Chared is a tool for detecting the character encoding of a text in a known language. It contains models for a wide range of languages.</a></p> |
| 24 | <p> |
| 25 | <a class="lnk" href="#">Paper</a> |
| 26 | | |
| 27 | <a class="lnk" href="/wiki/Chared/Cite">Cite</a> |
| 28 | | |
| 29 | <a class="lnk" href="http://opensource.org/licenses/BSD-3-Clause">Licence</a> |
| 30 | </p> |
| 31 | </td> |
| 32 | |
| 33 | </tr><tr> |
| 34 | |
| 35 | <td class="app" style="background-color:#800080 ; background-image:url('/chrome/site/_nb.png')"> |
| 36 | <p><a href="/wiki/SpiderLing">SpiderLing is a web spider for linguistics. It can crawl text-rich parts of the web and collect a lot of data suitable for text corpora. |
| 37 | </a></p> |
| 38 | <p> |
| 39 | <a class="lnk" href="http://nlp.fi.muni.cz/~xsuchom2/papers/PomikalekSuchomel_SpiderlingEfficiency.pdf">Paper</a> |
| 40 | | |
| 41 | <a class="lnk" href="/wiki/SpiderLing/Cite">Cite</a> |
| 42 | | |
| 43 | <a class="lnk" href="http://www.gnu.org/licenses/gpl.txt">Licence</a> |
| 44 | </p> |
| 45 | </td> |
33 | | </tr><tr> |
34 | | |
35 | | <td class="app" style="background-color:#000080 ; background-image:url('/chrome/site/justext_nb.png')"> |
36 | | <p><a href="/wiki/Justext"> |
37 | | JusText is a HTML boilerplate removal tool. It can strip navigation links, headers, footers, etc. from HTML pages and leave just regular text containing full sentences.</a><p> |
38 | | <p> |
39 | | <a class="lnk" href="http://is.muni.cz/th/45523/fi_d/phdthesis.pdf">Paper</a> |
40 | | | |
41 | | <a class="lnk" href="/wiki/Justext/Cite">Cite</a> |
42 | | | |
43 | | <a class="lnk" href="http://opensource.org/licenses/BSD-3-Clause">Licence</a> |
44 | | </p> |
45 | | </td> |
46 | | |
47 | | <td class="app" style="background-color:#800080 ; background-image:url('/chrome/site/_nb.png')"> |
48 | | <p><a href="/wiki/SpiderLing">Spiderling is a web spider for linguistics. It can crawl text-rich parts of the web and collect a lot of data suitable for text corpora. |
49 | | </a><p> |
50 | | <p> |
51 | | <a class="lnk" href="http://nlp.fi.muni.cz/~xsuchom2/papers/PomikalekSuchomel_SpiderlingEfficiency.pdf">Paper</a> |
52 | | | |
53 | | <a class="lnk" href="/wiki/SpiderLing/Cite">Cite</a> |
54 | | | |
55 | | <a class="lnk" href="http://www.gnu.org/licenses/gpl.txt">Licence</a> |
56 | | </p> |
57 | | </td> |
58 | | |
59 | | </tr><tr> |
60 | | |
61 | | <td class="app" style="background-color:#808000 ; background-image:url('/chrome/site/_nb.png')"> |
62 | | <p><a href="/wiki/Chared"> |
63 | | Chared is a tool for detecting the character encoding of a text in a known language. It contains models for a wide range of languages.</a><p> |
64 | | <p> |
65 | | <a class="lnk" href="#">Paper</a> |
66 | | | |
67 | | <a class="lnk" href="/wiki/Chared/Cite">Cite</a> |
68 | | | |
69 | | <a class="lnk" href="http://opensource.org/licenses/BSD-3-Clause">Licence</a> |
70 | | </p> |
71 | | </td> |
72 | | |
73 | | <td class="app" style="background-color:#000000 ; background-image:url('/chrome/site/_nb.png')"> |
| 73 | <td class="app" style="background-color:#008080 ; background-image:url('/chrome/site/_nb.png')"> |