Onion
onion (ONe Instance ONly) is a tool for removing duplicate parts from large collections of texts.
Installation
Prerequisites
- 64-bit CPU architecture
- libjudy (>=1.0.5)
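Both prerequisites can be checked roughly from a shell before building. The following is a minimal sketch, not part of onion. The helper names `is_64bit` and `find_judy_header` are hypothetical, and the two include directories are assumed typical locations rather than anything the package mandates. It also assumes `getconf LONG_BIT` is available, and it does not verify the libjudy version.

```shell
#!/bin/sh
# Sketch: rough check of the two prerequisites listed above.
# Assumptions: getconf supports LONG_BIT; Judy.h, if installed in a
# standard place, is in one of the directories passed to find_judy_header.

# Succeeds when the system reports a 64-bit long.
is_64bit() {
    [ "$(getconf LONG_BIT)" = "64" ]
}

# Prints the first directory among the arguments that contains Judy.h;
# returns non-zero when none of them does.
find_judy_header() {
    for dir in "$@"; do
        if [ -f "$dir/Judy.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

if is_64bit; then
    echo "64-bit: ok"
else
    echo "64-bit: not detected"
fi

if judy_dir=$(find_judy_header /usr/include /usr/local/include); then
    echo "Judy.h found in $judy_dir"
else
    echo "Judy.h not found in the directories checked"
fi
```

If the header is not found in the directories checked, libjudy may still be installed elsewhere; in that case JUDY_INC and JUDY_LIB need to be set as described in the configuration step.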
Configuration and installation
- Download the sources:
wget -O onion-1.2.tar.gz 'https://docs.google.com/uc?authuser=0&id=0B4SxKw5O_gLHUXZhOHBzUDNwcXM&export=download'
- Extract the downloaded file:
tar xzvf onion-1.2.tar.gz
- Configure the package by editing onion-1.2/Makefile.config:
  - set PREFIX (or INSTALL_BIN and INSTALL_DATA) according to where you want the executables and data (docs) installed
  - if you have libjudy installed in a non-standard path you need to:
    - set JUDY_INC to where Judy.h is located
    - set JUDY_LIB to where libJudy.a is located
- Install the package (you may need sudo or a root shell for the last command):
cd onion-1.2/
make
make install
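When libjudy lives in a non-standard path, the two directories needed for JUDY_INC and JUDY_LIB can be located with `find`. This is a sketch under stated assumptions: the helper `dir_containing`, the variable `JUDY_ROOT`, and the default search root `/opt` are all hypothetical. It only prints candidate directories; the exact assignment syntax expected in Makefile.config is not shown on this page, so check the file itself before editing it.

```shell
#!/bin/sh
# Sketch: locate candidate directories for JUDY_INC and JUDY_LIB.
# Assumption: libjudy was installed somewhere under $JUDY_ROOT
# (a hypothetical variable; /opt is only an illustrative default).

# Prints the directory of the first regular file named $2 found under $1;
# returns non-zero when there is no match.
dir_containing() {
    match=$(find "$1" -name "$2" -type f 2>/dev/null | head -n 1)
    [ -n "$match" ] || return 1
    dirname "$match"
}

judy_root=${JUDY_ROOT:-/opt}
echo "JUDY_INC candidate: $(dir_containing "$judy_root" Judy.h || echo 'not found')"
echo "JUDY_LIB candidate: $(dir_containing "$judy_root" libJudy.a || echo 'not found')"
```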
Quick start
onion -s <documents.vert >deduplicated_documents.vert
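After a run like the one above, it can be useful to see how much the output shrank. The sketch below compares line counts of the input and output files. The helper `reduction_percent` is hypothetical, and line count is assumed to be an acceptable rough proxy for corpus size; it says nothing about which parts were removed.

```shell
#!/bin/sh
# Sketch: report how much smaller the deduplicated file is, by line count.
# Assumption: line count is a good enough proxy for corpus size.

# Prints the integer percentage by which file $2 has fewer lines than file $1.
reduction_percent() {
    # The arithmetic expansion normalizes any padding in wc output.
    before=$(( $(wc -l < "$1") ))
    after=$(( $(wc -l < "$2") ))
    if [ "$before" -eq 0 ]; then
        echo 0
        return 0
    fi
    echo $(( (before - after) * 100 / before ))
}

# File names taken from the quick start command above.
if [ -f documents.vert ] && [ -f deduplicated_documents.vert ]; then
    echo "removed $(reduction_percent documents.vert deduplicated_documents.vert)% of lines"
fi
```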
There is also a usage example on a sample input.
For usage information see:
onion -h
man onion
Acknowledgements
This software was developed at the Natural Language Processing Centre of Masaryk University in Brno with financial support from PRESEMT and Lexical Computing Ltd. It also relates to the author's PhD research.