Welcome to eWRT - Extensible Web Retrieval Toolkit’s documentation!

Knowledge capture in the age of massive Web data requires robust and scalable mechanisms to acquire, consolidate and pre-process large amounts of heterogeneous data. The Extensible Web Retrieval Toolkit (eWRT) is a modular open-source Python API that addresses this requirement. It retrieves social data from Web sources such as Delicious, Flickr, Yahoo! and Wikipedia, and includes various helper classes for effective caching and data management. The toolkit provides components for content acquisition and caching, as well as low-level natural language processing functionality such as language detection, phonetic string similarity measures, and string normalization.
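To give a feel for what "string normalization" and "phonetic string similarity" mean in practice, here is a minimal, standard-library-only sketch. It does not use or reproduce eWRT's API: the function names `normalize`, `soundex` and `sound_alike` are hypothetical, and classic American Soundex is used here only as one well-known example of a phonetic measure, not necessarily the one eWRT implements.

```python
import unicodedata


def normalize(text):
    """Strip accents, lowercase, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


# Classic Soundex consonant groups; vowels and h, w, y carry no code.
_SOUNDEX_CODES = {}
for _letters, _digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for _letter in _letters:
        _SOUNDEX_CODES[_letter] = _digit


def soundex(word):
    """Return the four-character Soundex key of a word ('' if no letters)."""
    letters = [ch for ch in normalize(word) if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    encoded = []
    previous = _SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            encoded.append(code)
        # 'h' and 'w' do not separate letters that share a code; vowels do.
        if ch not in "hw":
            previous = code
    return (first.upper() + "".join(encoded) + "000")[:4]


def sound_alike(a, b):
    """Hypothetical helper: True if both words share a Soundex key."""
    return soundex(a) == soundex(b)
```

With this sketch, `normalize("  Café   Müller ")` yields `"cafe muller"`, and `sound_alike("Robert", "Rupert")` is true because both names map to the key `R163`.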

eWRT has been jointly developed by researchers from MODUL University Vienna, webLyzard technology, the University of Applied Sciences Chur, and the Vienna University of Economics and Business. The library is currently being extended as part of the uComp Project, which investigates Embedded Human Computation for Knowledge Extraction and Evaluation.

Check out eWRT on gitweb.

Indices and tables