conv Package

@package eWRT.input.conv

Classes for converting various input formats (Microsoft Word, HTML, PDF) into text.

doc Module

@package eWRT.input.conv.doc

Converts Microsoft Word documents into text.

class eWRT.input.conv.doc.HtmlToText[source]

Bases: object

Converts Microsoft Word documents into text; requires a converter.

static getText(word_document_content)[source]

@param[in] word_document_content the content of the Word document to convert
@returns the text representation of the Word document

html Module

@package eWRT.input.conv.html

Converts HTML pages into text.

class eWRT.input.conv.html.HtmlToText[source]

Bases: object

Converts HTML into text; requires a converter.

static getText(html_content, encoding='utf8')[source]

@param[in] html_content the content of the HTML page to convert
@param[in] encoding the document encoding
@returns the text representation of the Web page
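To illustrate the call shape documented for getText, here is a minimal sketch built only on Python's standard library. It is not the eWRT implementation, which according to the class description requires a converter. The names HtmlToTextSketch and _TextCollector are hypothetical stand-ins introduced for this example; only the getText(html_content, encoding='utf8') signature comes from the documentation above.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects visible text, skipping script/style bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse whitespace runs into single spaces.
        # Note: text split by inline tags is also separated by a space.
        return " ".join(" ".join(self._chunks).split())


class HtmlToTextSketch:
    """Hypothetical stand-in mirroring the documented static interface."""

    @staticmethod
    def getText(html_content, encoding="utf8"):
        # Accept raw bytes (decoded with `encoding`) or an already decoded str.
        if isinstance(html_content, bytes):
            html_content = html_content.decode(encoding, errors="replace")
        collector = _TextCollector()
        collector.feed(html_content)
        collector.close()
        return collector.text()
```

Calling HtmlToTextSketch.getText(b"<p>Hello &amp; bye</p>") returns the visible text with entities decoded. A real converter would typically also preserve paragraph breaks and handle malformed markup more carefully.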

class eWRT.input.conv.html.TestHtmlToText[source]

Bases: object

testBorderCases()[source]
testConversion()[source]

pdf Module

@package eWRT.input.conv.pdf

Converts PDF documents into text.

class eWRT.input.conv.pdf.HtmlToText[source]

Bases: object

Converts PDF documents into text; requires a converter.

static getText(pdf_content)[source]

@param[in] pdf_content the content of the PDF document to convert
@returns the text representation of the PDF document
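The classes in this package are described as requiring a converter. The documentation does not say which tool is used or how it is invoked, so the following is only a sketch under the assumption that the converter is an external command that reads the document on stdin and writes plain text to stdout. The function name convert_with_external_tool and its parameters are hypothetical and are not part of eWRT.

```python
import subprocess
import sys


def convert_with_external_tool(content, command, encoding="utf8"):
    """Pipe raw document bytes to `command` and return its stdout as text.

    Assumption: `command` reads the document from stdin and writes the
    extracted text to stdout. Raises CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(
        command,
        input=content,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Stand-in "converter" for demonstration only: upper-cases its input.
    demo_command = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    print(convert_with_external_tool(b"hello", demo_command))
```

A static getText(pdf_content) method could delegate to such a helper with the command line of whichever converter is installed. Passing the command as a list avoids shell quoting issues.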