Changes between Version 1 and Version 2 of Unitok
- Timestamp:
- 02/10/15 12:16:50 (10 years ago)
Unitok
= unitok =

 * splits input text into tokens (one token per line)
 * recognizes URLs, e-mail addresses, DNS domains, IP addresses
 * for specified languages recognizes abbreviations and clitics (such as 've or n't in English)
 * preserves XML-like tags
 * replaces entities with unicode equivalents
 * adds glue (<g/>) tags between tokens not separated by space

== Get unitok ==
See [wiki:Downloads] for the current version.
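
The output format described in the feature list (one token per line, with a glue tag between tokens that were not separated by a space) can be illustrated with a small Python sketch. This is not unitok's code: the pattern `TOKEN_RE`, the function `tokenize`, and the choice to split "don't" into "do" and "n't" are assumptions made purely for illustration, and the real tool's rules are language-specific and far more thorough.

```python
import html
import re

# Hypothetical token pattern for illustration only; not unitok's actual rules.
TOKEN_RE = re.compile(
    r"""
    <[^<>]+>                        # XML-like tag, kept as a single token
  | https?://[^\s<>]+               # URL
  | [\w.+-]+@[\w-]+(?:\.[\w-]+)+    # e-mail address
  | &\#?\w+;                        # character entity such as &amp;
  | \w+(?=n't)                      # word stem in front of the clitic n't
  | n't                             # English clitic n't
  | '\w+                            # English clitics such as 've
  | \w+                             # ordinary word
  | [^\w\s]                         # any other single non-space character
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return output lines: one token per line, <g/> between unspaced tokens."""
    lines = []
    prev_end = None
    for match in TOKEN_RE.finditer(text):
        # Tokens that touch each other in the input get a glue tag between them.
        if prev_end is not None and match.start() == prev_end:
            lines.append("<g/>")
        token = match.group()
        if token.startswith("&"):
            # Replace an entity with its Unicode equivalent.
            token = html.unescape(token)
        lines.append(token)
        prev_end = match.end()
    return lines


if __name__ == "__main__":
    print("\n".join(tokenize("We don't know.")))
```

Under these assumptions, "do" and "n't" come out as separate tokens joined by a glue tag, and the final full stop is glued to "know". Note that the naive `'\w+` rule would also treat an opening single quote as part of a clitic, which is one reason a real tokenizer needs per-language rules.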