Changes between Version 8 and Version 9 of WikiStart


Timestamp:
02/22/15 13:33:09 (10 years ago)
Author:
admin

  • WikiStart

    v8  v9
    1   1   {{{
    2   2   #!html
    3       <style>
    4       .tydyt {
    5         padding: 1em
    6       }
    7       </style>
    8       <div class="tydyt" style="background-color:#800000 ; width:49% ; color:white ; float:left ; border-radius:20px">
        3   <table style="color:white; border-spacing: 1em"><tr>
        4
        5   <td style="background-color:#008000 ; width:50% ; border-radius:20px ; padding:2em ; margin: .5em">
        6   onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts
        7   </td>
        8
        9   <td style="background-color:#800000 ; width:50% ; border-radius:20px ; padding:2em">
    9   10  unitok is a universal text tokeniser
    10      </div>
    11      <div style="float:clear">&nbsp;</div>
        11  </td>
        12
        13  </tr><tr>
        14
        15  <td style="background-color:#0080ff ; width:50% ; border-radius:20px ; padding:2em">
        16  justext is a tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pages. It is designed to preserve mainly text containing full sentences and it is therefore well suited for creating linguistic resources such as Web corpora
        17  </td>
        18  <td></td>
        19  </tr>
        20  </table>
    12  21  }}}
    13
    14      * [http://corpus.tools/wiki/Unitok unitok] is a universal text tokenizer
    15      * [http://corpus.tools/wiki/Justext jusText] is a tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pages. It is designed to preserve mainly text containing full sentences and it is therefore well suited for creating linguistic resources such as Web corpora.
    16      * [http://corpus.tools/wiki/Onion onion] (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts.