Changes between Version 1 and Version 2 of Unitok


Timestamp:
02/10/15 12:16:50 (10 years ago)
Author:
admin
Comment:

--

  • Unitok

    v1 v2  
    1 1 = unitok =
     2
     3* splits input text into tokens (one token per line)
     4* recognizes URLs, e-mail addresses, DNS domains, IP addresses
     5* recognizes abbreviations and clitics (such as 've or n't in English) for specified languages
     6* preserves XML-like tags
     7* replaces entities with their Unicode equivalents
     8* adds glue (<g/>) tags between tokens not separated by space
     9
     10== Get unitok ==
     11See [wiki:Downloads] for the current version.
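
The glue-tag behaviour in the feature list (one token per line, with <g/> between tokens not separated by a space) can be illustrated with a minimal Python sketch. This is not unitok's own code: the function name `tokenize_with_glue` and the pattern `TOKEN_RE` are hypothetical, and the pattern is deliberately simplistic. It does not handle URLs, abbreviations, clitics, XML-like tags, or entities.

```python
import re

# Hypothetical, simplified token pattern (an assumption, not unitok's rules):
# a run of word characters, or any single non-space, non-word character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_with_glue(text):
    """Return output lines: one token per line, with a <g/> line between
    adjacent tokens that had no whitespace between them in the input."""
    lines = []
    prev_end = None  # end offset of the previous token, if any
    for match in TOKEN_RE.finditer(text):
        # Tokens that touch each other in the input get a glue tag between them.
        if prev_end is not None and match.start() == prev_end:
            lines.append("<g/>")
        lines.append(match.group())
        prev_end = match.end()
    return lines


if __name__ == "__main__":
    print("\n".join(tokenize_with_glue("Hello, world!")))
```

For the input `Hello, world!` the sketch yields the lines `Hello`, `<g/>`, `,`, `world`, `<g/>`, `!`, so the original spacing can be reconstructed from the token stream.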