= Onion =

onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts.

== Installation ==

=== Prerequisites ===

 * 64-bit CPU architecture
 * libjudy (>=1.0.5)

=== Configuration and installation ===

 1. Download the sources:
 {{{
wget -O onion-1.2.tar.gz 'https://docs.google.com/uc?authuser=0&id=0B4SxKw5O_gLHUXZhOHBzUDNwcXM&export=download'
 }}}
 2. Extract the downloaded file:
 {{{
tar xzvf onion-1.2.tar.gz
 }}}
 3. Configure the package by editing onion-1.2/Makefile.config:
   * set PREFIX (or INSTALL_BIN and INSTALL_DATA) to the location where you want the executables and data (docs) installed
   * if libjudy is installed in a non-standard path:
     * set JUDY_INC to the directory containing Judy.h
     * set JUDY_LIB to the directory containing libJudy.a
 4. Build and install the package (you may need sudo or a root shell for the last command):
 {{{
cd onion-1.2/
make
make install
 }}}

== Quick start ==

{{{
onion -s deduplicated_documents.vert
}}}

There's also a usage example on a sample input.

For usage information see:
{{{
onion -h
man onion
}}}

== Acknowledgements ==

This software has been developed at the [http://nlp.fi.muni.cz/en/nlpc Natural Language Processing Centre] of Masaryk University in Brno with financial support from PRESEMT and Lexical Computing Ltd. It also relates to the author's PhD research.