= Onion =
Onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts.

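To make "duplicate parts" concrete, the simplest possible deduplication keeps only the first copy of each exactly repeated line, which a well-known awk idiom does in a single pass. This is a hypothetical illustration of the idea, not onion's algorithm. The function name is made up, and the sketch cannot detect text that is repeated only partially or with small changes.

{{{
# drop_exact_duplicate_lines: hypothetical illustration, NOT part of onion.
# Reads stdin and prints each line only the first time it is seen.
drop_exact_duplicate_lines() {
  # seen[$0]++ evaluates to 0 (false) on the first occurrence of a line,
  # so !seen[$0]++ is true exactly once per distinct line and awk prints it
  awk '!seen[$0]++'
}

printf 'a\nb\na\nc\nb\n' | drop_exact_duplicate_lines   # prints a, b and c, one per line
}}}
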
== Installation ==

=== Prerequisites ===
* 64-bit CPU architecture
* libjudy (>=1.0.5)

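Both prerequisites can be checked from a shell before building. The helper below is a hypothetical sketch, not part of the onion package. It treats `getconf LONG_BIT` reporting 64 as a proxy for a 64-bit system (this reflects the installed userland, not the CPU alone) and only looks for Judy.h in the directories you pass, defaulting to /usr/include and /usr/local/include. On Debian and Ubuntu the libjudy headers are typically provided by the libjudy-dev package.

{{{
# check_onion_prereqs: hypothetical helper, NOT part of onion.
# Usage: check_onion_prereqs [include_dir ...]
# Returns 0 if the system is 64-bit and Judy.h is found, 1 otherwise.
check_onion_prereqs() {
  # getconf LONG_BIT prints the width of C's "long" for the installed
  # userland: 64 on typical 64-bit Linux and macOS systems
  if [ "$(getconf LONG_BIT)" != "64" ]; then
    echo "not a 64-bit system" >&2
    return 1
  fi
  # Fall back to the usual system include directories if none are given
  [ "$#" -gt 0 ] || set -- /usr/include /usr/local/include
  for dir in "$@"; do
    if [ -f "$dir/Judy.h" ]; then
      echo "found $dir/Judy.h"
      return 0
    fi
  done
  echo "Judy.h not found; install libjudy or set JUDY_INC/JUDY_LIB" >&2
  return 1
}
}}}

For example, `check_onion_prereqs /opt/judy/include` (a hypothetical path) checks a custom location; if Judy.h is found in a non-standard directory, that directory is presumably what JUDY_INC should point to in Makefile.config.
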
=== Configuration and installation ===
1. Download the sources:
{{{
wget -O onion-1.2.tar.gz 'https://docs.google.com/uc?authuser=0&id=0B4SxKw5O_gLHUXZhOHBzUDNwcXM&export=download'
}}}
2. Extract the downloaded file:
{{{
tar xzvf onion-1.2.tar.gz
}}}
3. Configure the package by editing onion-1.2/Makefile.config:
   * set PREFIX (or INSTALL_BIN and INSTALL_DATA separately) to the location where the executables and data files (documentation) should be installed
   * if libjudy is installed in a non-standard location:
     * set JUDY_INC to where Judy.h is located
     * set JUDY_LIB to where libJudy.a is located
4. Install the package (you may need sudo or a root shell for the last command):
{{{
cd onion-1.2/
make
make install
}}}

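Step 3 can also be scripted, which is handy for a per-user installation that needs no root access. The helper below is a hypothetical sketch, not shipped with onion. It assumes Makefile.config uses ordinary make assignments of the form `NAME = value` (or `NAME := value`, `NAME ?= value`), rewrites every matching assignment, and appends one if the variable is not assigned yet. Check the file afterwards, since its actual layout may differ.

{{{
# set_make_var: hypothetical helper, NOT part of onion.
# Usage: set_make_var FILE NAME VALUE
# Rewrites "NAME = ...", "NAME := ..." or "NAME ?= ..." lines in FILE as
# "NAME = VALUE"; appends the assignment if NAME is not assigned yet.
set_make_var() {
  tmp=$(mktemp) || return 1
  awk -v name="$2" -v value="$3" '
    $0 ~ ("^" name "[ \t]*[?:]?=") { print name " = " value; found = 1; next }
    { print }
    END { if (!found) print name " = " value }
  ' "$1" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat (rather than mv) so FILE keeps its original permissions
  cat "$tmp" > "$1" && rm -f "$tmp"
}
}}}

For example, running `set_make_var onion-1.2/Makefile.config PREFIX "$HOME/local"` before `make` and `make install` should install onion under the home directory without sudo, assuming INSTALL_BIN and INSTALL_DATA are derived from PREFIX, as the alternative in step 3 suggests.
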
== Quick start ==
{{{
onion -s <documents.vert >deduplicated_documents.vert
}}}

There is also a usage example on a sample input.

For usage information, see:
{{{
onion -h
man onion
}}}

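Because the quick-start command reads standard input and writes standard output, onion can presumably sit in a pipeline, for instance to process compressed vertical files without unpacking them to disk. The wrapper below is a hypothetical sketch that assumes onion reads its input in a single sequential pass, so that a pipe works as well as a redirected file; check `onion -h` if in doubt.

{{{
# dedup_gz: hypothetical wrapper, NOT part of onion.
# Usage: dedup_gz INPUT.vert.gz OUTPUT.vert.gz
dedup_gz() {
  # Decompress to stdout, deduplicate with the same flag as in the quick
  # start, and recompress the result.
  # Note: the exit status is that of the final gzip only.
  gzip -dc "$1" | onion -s | gzip -c > "$2"
}
}}}
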
== Acknowledgements ==
This software has been developed at the [http://nlp.fi.muni.cz/en/nlpc Natural Language Processing Centre] of Masaryk University in Brno with financial support from PRESEMT and Lexical Computing Ltd. It also relates to the author's PhD research.