Changes between Version 7 and Version 8 of Onion


Timestamp: 07/30/15 12:31:59
Author: admin
Comment: usage

  • Onion

}}}

== Usage ==
{{{onion [OPTIONS] [FILE]}}}

Mark duplicate text parts in the input vertical file.
{{{
 -f FILE   hashes of duplicate n-grams
 -n NUM    n-gram length (default: 5)
 -t NUM    duplicate content threshold (default: 0.5)
 -d STR    document tag (default: doc)
 -p STR    paragraph tag (default: p)
 -s        strip duplicate parts (rather than mark)
 -m        no smoothing
 -T NUM    trim n-gram hashes to NUM bits (default: 64)
 -l NUM    max stub length (default: 20)
 -b NUM    buffer size, in bytes (default: 16777216)
 -q        quiet; suppress all output except for errors
 -V        print version information and exit
 -h        display this help and exit
}}}
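The `-n` and `-t` options suggest that a part of the text counts as duplicate when a large enough share of its n-grams has been seen before. The following Python sketch illustrates one plausible reading of that idea. It is not onion's actual implementation: the function names, the per-paragraph granularity, and the strict `>` comparison against the threshold are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int = 5) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list (empty if too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mark_duplicates(
    paragraphs: List[List[str]], n: int = 5, threshold: float = 0.5
) -> List[bool]:
    """Flag each paragraph whose share of already-seen n-grams exceeds threshold.

    Assumption: paragraphs are processed in order, and every n-gram of a
    paragraph is remembered after the paragraph has been scored.
    """
    seen: Set[Tuple[str, ...]] = set()
    flags: List[bool] = []
    for tokens in paragraphs:
        grams = ngrams(tokens, n)
        # Paragraphs shorter than n tokens have no n-grams and score 0.0.
        ratio = sum(g in seen for g in grams) / len(grams) if grams else 0.0
        flags.append(ratio > threshold)
        seen.update(grams)
    return flags


if __name__ == "__main__":
    text = "a b c d e f g".split()
    other = "x y z u v w q".split()
    print(mark_duplicates([text, text, other]))
```

In this toy input the second paragraph repeats the first, so all of its 5-grams are already known and it is the only one flagged.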
With no FILE, or when FILE is -, read standard input. Output is written to standard output.
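When onion is driven from a script, the command line can be assembled from the options listed in the help text. The sketch below is a hypothetical helper, not part of onion: the name `build_onion_command` and its parameter names are invented here, and it covers only the `-n`, `-t`, `-s` and `-q` options. It relies on the documented convention that `-` as FILE means standard input.

```python
from typing import List, Optional


def build_onion_command(
    input_file: Optional[str] = None,
    ngram_length: int = 5,
    threshold: float = 0.5,
    strip: bool = False,
    quiet: bool = False,
) -> List[str]:
    """Build an argument list for onion (hypothetical helper).

    Defaults mirror those shown in the help text. When input_file is None,
    "-" is passed so that onion reads standard input.
    """
    cmd = ["onion", "-n", str(ngram_length), "-t", str(threshold)]
    if strip:
        cmd.append("-s")  # strip duplicate parts rather than mark them
    if quiet:
        cmd.append("-q")  # suppress all output except errors
    cmd.append(input_file if input_file is not None else "-")
    return cmd


if __name__ == "__main__":
    # "corpus.vert" is a placeholder file name used only for illustration.
    print(build_onion_command("corpus.vert", ngram_length=7, strip=True))
```

The resulting list can be handed to `subprocess.run`, capturing standard output to collect the processed vertical text, assuming an `onion` binary is available on the `PATH`.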

== Acknowledgements ==
This software has been developed at the [http://nlp.fi.muni.cz/en/nlpc Natural Language Processing Centre] of [http://www.muni.cz/ Masaryk University in Brno] with financial support from [http://presemt.eu PRESEMT] and [http://www.sketchengine.co.uk Lexical Computing Ltd.] It also relates to Jan Pomikálek's [http://is.muni.cz/th/45523/fi_d/phdthesis.pdf PhD research].