= Onion =
onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts.

== Installation ==

=== Prerequisites ===
  * 64-bit CPU architecture
  * libjudy (>= 1.0.5)

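Before building, the two prerequisites can be checked roughly from a shell. This is a minimal sketch that is not part of the onion distribution: it assumes a C compiler reachable as `cc`, and the helper names `check_64bit` and `check_judy` are hypothetical. It only tests whether the userland is 64-bit and whether Judy.h and libJudy can be found; it does not verify that the installed libjudy is at least version 1.0.5.

```shell
#!/bin/sh
# Pre-flight check for onion's prerequisites (a sketch, not part of onion).
# Assumption: a C compiler is reachable as `cc`. The libjudy version is
# NOT checked, only that Judy.h and the library can be found.

# Succeeds if the userland is 64-bit: `getconf LONG_BIT` prints the width
# in bits of the C `long` type (32 or 64).
check_64bit() {
  [ "$(getconf LONG_BIT)" = "64" ]
}

# Succeeds if a trivial C program including Judy.h compiles and links
# with -lJudy. Extra compiler options (e.g. -I/-L for a libjudy in a
# non-standard path) can be passed as arguments.
check_judy() {
  printf '#include <Judy.h>\nint main(void) { return 0; }\n' |
    cc "$@" -x c - -lJudy -o /dev/null 2>/dev/null
}

if check_64bit; then echo "64-bit: ok"; else echo "64-bit: missing" >&2; fi
if check_judy; then echo "libjudy: ok"; else echo "libjudy: not found" >&2; fi
```

For a libjudy installed in a non-standard location, pass the same directories you would use for JUDY_INC and JUDY_LIB, e.g. `check_judy -I/opt/judy/include -L/opt/judy/lib` (the `/opt/judy` paths are only an example).
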
=== Configuration and installation ===
1. Download the sources:
{{{
wget -O onion-1.2.tar.gz 'https://docs.google.com/uc?authuser=0&id=0B4SxKw5O_gLHUXZhOHBzUDNwcXM&export=download'
}}}
2. Extract the downloaded file:
{{{
tar xzvf onion-1.2.tar.gz
}}}
3. Configure the package by editing onion-1.2/Makefile.config:
  * set PREFIX (or INSTALL_BIN and INSTALL_DATA) to the location where you want the executables and data (documentation) installed
  * if libjudy is installed in a non-standard location:
    * set JUDY_INC to the directory containing Judy.h
    * set JUDY_LIB to the directory containing libJudy.a
4. Build and install the package (the last command may require sudo or a root shell):
{{{
cd onion-1.2/
make
make install
}}}
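
As an alternative to editing Makefile.config in step 3, its variables can usually be set on the make command line, because GNU make gives command-line variable values precedence over ordinary assignments in the makefiles it reads. The helper below is a hypothetical sketch of that approach (the name `build_onion` and the example paths are illustrative). It assumes GNU make, that the makefiles do not protect these variables with `override`, that INSTALL_BIN and INSTALL_DATA are derived from PREFIX, and that JUDY_INC and JUDY_LIB take directory paths, with a libjudy tree laid out as `include/` and `lib/`.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with onion: build and install onion
# without editing Makefile.config by passing its variables to make.
# Assumes GNU make, no `override` on these variables in the makefiles,
# INSTALL_BIN/INSTALL_DATA derived from PREFIX, and directory-valued
# JUDY_INC/JUDY_LIB.
#   $1  install prefix, e.g. "$HOME/.local"
#   $2  optional libjudy root, assumed to contain include/Judy.h and
#       lib/libJudy.a
build_onion() {
  prefix=$1
  judy_root=${2-}
  if [ -n "$judy_root" ]; then
    set -- PREFIX="$prefix" \
      JUDY_INC="$judy_root/include" JUDY_LIB="$judy_root/lib"
  else
    set -- PREFIX="$prefix"
  fi
  # Subshell: the caller's working directory is left unchanged.
  (cd onion-1.2/ && make "$@" && make "$@" install)
}
```

For example, `build_onion "$HOME/.local" /opt/judy`, run from the directory that contains onion-1.2/, would install under `$HOME/.local` without needing root, provided the Makefile honours PREFIX as described in step 3.
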
== Quick start ==
{{{
onion -s <documents.vert >deduplicated_documents.vert
}}}
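
To try the quick-start command without a real corpus, a tiny input can be generated first. This is a hypothetical sketch: it assumes the common vertical layout (one token per line, documents wrapped in `<doc>` and paragraphs in `<p>` structure tags), so check `onion -h` for the structure names onion actually expects; the helper `vert_para` and the file names are illustrative. The second document repeats the paragraph of the first one, making it a candidate for removal; how much is actually removed depends on onion's settings, so no expected output is given here.

```shell
#!/bin/sh
# Hypothetical helper: print one paragraph in vertical format, i.e. one
# token per line wrapped in <p> ... </p> (tokens split on single spaces).
vert_para() {
  echo '<p>'
  printf '%s\n' "$1" | tr ' ' '\n'
  echo '</p>'
}

dup='Onion removes duplicate parts from large collections of texts .'

# Document 2 repeats the paragraph of document 1 and adds a new one.
{
  echo '<doc id="1">'
  vert_para "$dup"
  echo '</doc>'
  echo '<doc id="2">'
  vert_para "$dup"
  vert_para 'This paragraph occurs only once in the sample .'
  echo '</doc>'
} > sample.vert

# Run the quick-start command on it, but only if onion is installed.
if command -v onion >/dev/null 2>&1; then
  onion -s <sample.vert >sample.dedup.vert
fi
```

Comparing sample.vert with sample.dedup.vert shows which parts onion kept.
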

There is also a usage example on a sample input.

For usage information, see:
{{{
onion -h
man onion
}}}

== Acknowledgements ==
This software was developed at the [http://nlp.fi.muni.cz/en/nlpc Natural Language Processing Centre] of Masaryk University in Brno, with financial support from PRESEMT and Lexical Computing Ltd. It is also related to the author's PhD research.