= Onion =

onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts.

== Installation ==

=== Prerequisites ===

 * 64-bit CPU architecture
 * libjudy (>=1.0.5)

=== Configuration and installation ===

 1. Download the sources:
    {{{
    wget -O onion-1.2.tar.gz 'https://corpus.tools/attachment/wiki/Downloads/onion-1.2.tar.gz'
    }}}
 2. Extract the downloaded file:
    {{{
    tar xzvf onion-1.2.tar.gz
    }}}
 3. Configure the package by editing onion-1.2/Makefile.config:
    * set PREFIX (or INSTALL_BIN and INSTALL_DATA) to the location where you want the executables and data (docs) installed
    * if libjudy is installed in a non-standard path:
      * set JUDY_INC to the directory containing Judy.h
      * set JUDY_LIB to the directory containing libJudy.a
 4. Build and install the package (you may need sudo or a root shell for the last command):
    {{{
    cd onion-1.2/
    make
    make install
    }}}

== Quick start ==

{{{
onion -s deduplicated_documents.vert
}}}

There's also a [Onion/UsageExample usage example] on a sample input.

For usage information see:
{{{
onion -h
man onion
}}}

== Acknowledgements ==

This software has been developed at the [http://nlp.fi.muni.cz/en/nlpc Natural Language Processing Centre] of [http://www.muni.cz/ Masaryk University in Brno] with financial support from [http://presemt.eu PRESEMT] and [http://www.sketchengine.co.uk Lexical Computing Ltd.] It also relates to Jan Pomikálek's [http://is.muni.cz/th/45523/fi_d/phdthesis.pdf PhD research].