source: unitok

Rev Age Author Log Message
@3ac1078   2 years jan.michelfeit Fix hexadecimal HTML entities
@c351917   2 years jan.michelfeit Importable configs + unitok_old for backward compatibility three
@312f632   2 years jan.michelfeit Removed unused PHONE_NUMBER_RE three
@45ad684   2 years jan.michelfeit Remove non-canonical characters three
@25bf963   2 years jan.michelfeit Remove unused REs three
@624aef7   2 years jan.michelfeit Configuration file for each language three
@7dfa151   2 years jan.michelfeit Refactored glue_tokens into print_tokens three
@32ee926   2 years jan.michelfeit Token types and debug mode three
@a291d5c   2 years jan.michelfeit This is unitok v3 expects clean text in UTF-8, stream-based by … three
@5db84df   2 years vit.suchomel Abbreviation matching corrected threetwo
@d9ee163   2 years jan.michelfeit Only import uninorm if necessary threetwo
@6757ff2   2 years vit.suchomel Support for Scottish Gaelic correction threetwo
@053a506   2 years vit.suchomel Support for Scottish Gaelic threetwo
@0379774   2 years jan.michelfeit Slovak data threetwo
@6bbd219   2 years vit.suchomel Abbreviations with ignore case (those from Wikipedia, {en, de, fr} … threetwo
@a32d93e   2 years vit.suchomel Support for the Devanagari script threetwo
@66b8368   2 years vit.suchomel Support for minus-hyphens in abbreviations threetwo
@8985734   2 years jan.michelfeit Longer TLDs take precedence (also updated) threetwo
@166252d   3 years jan.michelfeit More quote-like characters threetwo
@3849ab2   3 years jan.michelfeit Hyphens aren't dashes threetwo
@ccceab4   3 years jan.michelfeit Prepared for distutils threetwo
@9ea037e   3 years jan.michelfeit Normalise function threetwo
@f55a9e7   3 years jan.michelfeit Reasonable B52 switching threetwo
@a38cdba   3 years jan.michelfeit Glue tokens function threetwo
@67d89cc   3 years jan.michelfeit Unitok uses fileinput threetwo
@82e8a10   3 years jan.michelfeit Unitok uses argparse threetwo
@ce3d977   3 years xmichelf Untangled unitok code threetwo
@7796add   3 years xmichelf Too long line broke vim syntax highlighting :) threetwo
@a5e3aa9   3 years xmichelf unitok uses functions from uninorm threetwo
@d770224   3 years xmichelf uninorm added threetwo
@a677a92   3 years xmichelf Unitok from ske-toolkit 2.0.8 threetwo
@d966580   3 years vit.suchomel vertfrac uses read_big_structures threetwo
@0a17100   3 years vit.suchomel Filter text types from an input vertical threetwo
@c2695d1   3 years vit.suchomel arbtokeniser, hebtokeniser made operating streams The input is loaded … threetwo
@4156487   3 years vit.suchomel Unitok Yoruba corrections threetwo
@2414f29   3 years vit.suchomel Yoruba tokenisation support improvement threetwo
@bb83a4f   3 years vit.suchomel Yoruba tokenisation support threetwo
@2ed6775   3 years jan.michelfeit Updated TLD list threetwo
@3b38a87   3 years jan.michelfeit Don't crash on out-of-range entities, remove control chars after threetwo
@4116384   3 years vit.suchomel Maldivian word RE correction threetwo
@20bc390   3 years vit.suchomel Maldivian Thaana tokenisation support threetwo
@a51e655   3 years vit.suchomel Unitok - token matching expressions reordered Abbreviation in URL -- … threetwo
@064b482   3 years jan.michelfeit Noncombining grave and acute normalized to ' threetwo
@5980d3f   3 years jan.michelfeit normal_quotes executable threetwo
@08c7bc3   3 years jan.michelfeit Fixed verthead threetwo
@8c58266   3 years jan.michelfeit Normalize quotes threetwo
@9fcb2cc   3 years vit.suchomel New handling of abbreviations (aka "B52 new") integrated in unitok … threetwo
@61d190a   3 years jan.michelfeit Fix vert2plain imports (Better would be putting unitok in python packages) threetwo
@3900f5f   3 years jan.michelfeit Merge branch 'master' of toad4:/corpora/src/git/ske-toolkit threetwo
@2b8253a   3 years jan.michelfeit Put everything in PATH threetwo
@af214ab   3 years jan.michelfeit Fold vert.py into the scripts using it threetwo
@cfbaf01   3 years jan.michelfeit End sentences on chinese ? and ! threetwo
@c55d56f   3 years vit.baisa vertfrac: you can specify structure of vert threetwo
@7020f92   3 years jan.michelfeit Added cleanup.py threetwo
@4f2c105   3 years vit.suchomel Configurable word definition per language Support for Hindi words threetwo
@3ce72b2   3 years jan.michelfeit Fixed coding in unitok.py threetwo
@b13796b   3 years jan.michelfeit Czech abbreviations threetwo
@7e6b7e0   4 years jan.michelfeit Allow shell commands in vertchain threetwo
@8cdeecc   4 years jan.michelfeit Fixed vertchain.py discarding rest of input buffer threetwo
@39b1011   4 years jan.michelfeit vertchain.py reading more carefully threetwo
@ad3ff58   4 years jan.michelfeit Added vertchain.py threetwo
@4859c7e   4 years jan.michelfeit Handle empty lines in tag_sentences.py threetwo
@9fe2b2e   4 years jan.michelfeit Allow pseudo-apostrophes in english clictics threetwo
@cdeaf92   4 years jan.michelfeit Tokenisers for hebrew and arabic, wrapper scripts threetwo
@25b1cfd   4 years jan.michelfeit Consistent naming threetwo
@a2e07af   4 years jan.michelfeit New vertsplit.py threetwo
@69e6079   4 years jan.michelfeit Updates from alba threetwo
@3b861ac   4 years jan.michelfeit New tag_sentences.py threetwo
@8ab6e6c   4 years xmichelf New, clean repository threetwo
@f7ae51a   4 years xmichelf Clean-up threetwo
@9dbc158   4 years xmichelf Add vert_frac.py threetwo
@902c8eb   4 years xmichelf Move stc_wrapper to /opt threetwo
@48aee14   5 years xpomikal hashws.sh does not recreate hashes if already created threetwo
@23020d1   5 years xpomikal Added hashws.sh threetwo
@88f8048   5 years xpomikal Removing ize2ise.pl threetwo
@8f3665f   5 years xpomikal Added ize2ise.pl (archiving the code, will be removed) threetwo
@5f5e39e   5 years xpomikal Added sconll2wmap.py and sconll2sketch.sh threetwo
@19849b6   5 years xpomikal Removed session2user.py (no longer needed) threetwo
@3b31421   5 years xpomikal Adding session2user.py so that the code is available somewhere in case … threetwo
@eaf054a   5 years xpomikal Removed stoplists. Instead use: … threetwo
@6535ec5   5 years xpomikal Added escape_tags.py and unescape_tags.py threetwo
@908bc1c   5 years xpomikal Added vertfork.py threetwo
@111a020   5 years xpomikal Modified the REGEXP for matching pronouns in make_lempos_tt-portuguese.py threetwo
@8bf938f   5 years xpomikal Made vert2plain.py executable threetwo
@6f5e2b6   5 years honza.pomikalek Removed justext.py from this repository. It's now a stand-alone … threetwo
@fb55897   5 years honza.pomikalek Added stc_wrapper.py (Stanford Chinese Segmenter and Tagger Wrapper) threetwo
@329a03b   5 years jan.pomikalek Added onion_pp.py threetwo
@c650ec7   6 years jan.pomikalek xmlize.py: minor fix in help text threetwo
@3ee18a1   6 years jan.pomikalek added xmlize.py threetwo
@24d2b48   6 years jan.pomikalek Added Czech model to unitok.py tailored to Vojta's and marx's needs. threetwo
@335a92b   6 years jan.pomikalek Minor bug fix: Make sure that a number with thousand separators is a … threetwo
@bd6a67f   6 years jan.pomikalek Added verthead.py: Returns the first n structures from a vertical file. threetwo
@ff158fa   6 years jan.pomikalek Added vert.py threetwo
@6244bc6   6 years jan.pomikalek Added make_lempos for Spanish. threetwo
@ed69daf   6 years jan.pomikalek Added make_lempos for Portuguese. threetwo
@351a004   6 years jan.pomikalek - Inbuilt stoplists can be listed and used. - "None" stoplist can be … threetwo
@4ef7045   6 years jan.pomikalek max-heading-distance can now be specified as a parameter threetwo
@337f63d   6 years jan.pomikalek A few corrections in the documentation. threetwo
@7ecf436   6 years jan.pomikalek - Made the code for classification of headings less confusing. - The … threetwo
@ed264ae   6 years jan.pomikalek - During stoplist lookup, use a simpler way of tokenisation -- just … threetwo