= Onion =
Onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts.
== Licence ==
Onion is licensed under the BSD 3-Clause License.
== Installation ==
=== Prerequisites ===
* 64-bit CPU architecture
* libjudy (>=1.0.5)
=== Configuration and installation ===
1. Download the sources:
{{{
wget -O onion-1.2.tar.gz 'https://corpus.tools/attachment/wiki/Downloads/onion-1.2.tar.gz'
}}}
2. Extract the downloaded file:
{{{
tar xzvf onion-1.2.tar.gz
}}}
3. Configure the package by editing onion-1.2/Makefile.config:
   * set PREFIX (or INSTALL_BIN and INSTALL_DATA) to the locations where you want the executables and data (docs) installed
   * if libjudy is installed in a non-standard path, also:
     * set JUDY_INC to where Judy.h is located
     * set JUDY_LIB to where libJudy.a is located
4. Install the package (you may need sudo or a root shell for the last command):
{{{
cd onion-1.2/
make
make install
}}}
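As an illustration of step 3, the fragment below sketches what the edited variables in onion-1.2/Makefile.config might look like when libjudy has been installed under a home directory. The variable names are the ones listed in step 3; every path is hypothetical, and whether the shipped file expects bare directories or compiler flags for JUDY_INC and JUDY_LIB is an assumption that should be checked against the comments in the file itself.
{{{
# Hypothetical values; adjust every path to your system.
# Install under the home directory, so "make install" needs no root access:
PREFIX=$(HOME)/.local
# Assumed to be the directory that contains Judy.h:
JUDY_INC=$(HOME)/judy/include
# Assumed to be the directory that contains libJudy.a:
JUDY_LIB=$(HOME)/judy/lib
}}}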
== Quick start ==
{{{
onion -s <documents.vert >deduplicated_documents.vert
}}}
There's also an [Onion/UsageExample usage example] on a sample input.
For usage information see:
{{{
onion -h
man onion
}}}
== Acknowledgements ==
This software has been developed at the [http://nlp.fi.muni.cz/en/nlpc Natural Language Processing Centre] of [http://www.muni.cz/ Masaryk University in Brno] with financial support from [http://presemt.eu PRESEMT] and [http://www.sketchengine.co.uk Lexical Computing Ltd.] It also relates to Jan Pomikálek's [http://is.muni.cz/th/45523/fi_d/phdthesis.pdf PhD research].