class GreekOCR

Last modified: January 05, 2015

GreekOCR

In module gamera.toolkits.greekocr.greekocr

Provides the functionality for GreekOCR. The following parameters control the recognition process:

cknn
The kNNInteractive classifier.
mode
The mode for dealing with accents: either "wholistic" or "separatistic".

__init__

Signature:

__init__ (mode="wholistic")

where mode can be "wholistic" or "separatistic".

load_trainingdata

Loads the training data. Signature:

load_trainingdata (trainfile)

where trainfile is a Gamera XML file containing training data. Make sure that the training file matches the mode (wholistic or separatistic).
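Because the training file has to match the mode, it can help to select the file by mode rather than passing the two independently. The following is a minimal sketch under stated assumptions, not part of the toolkit: the helper create_trained_ocr, the VALID_MODES tuple, and the trainfiles dictionary are hypothetical names introduced for illustration, and the class to instantiate is passed in as ocr_class (GreekOCR in real use). It also assumes the constructor accepts mode as a keyword argument, as the signature above suggests.

```python
VALID_MODES = ("wholistic", "separatistic")


def create_trained_ocr(ocr_class, mode, trainfiles):
    """Build an OCR object for `mode` and load the training file for that mode.

    ocr_class  -- class to instantiate; GreekOCR in real use
    mode       -- "wholistic" or "separatistic"
    trainfiles -- dict mapping each mode to a training XML path (hypothetical)
    """
    if mode not in VALID_MODES:
        raise ValueError("unknown mode: %r" % (mode,))
    if mode not in trainfiles:
        raise KeyError("no training file configured for mode %r" % (mode,))
    ocr = ocr_class(mode=mode)
    # Looking the file up by mode avoids loading wholistic training data
    # into a separatistic recognizer, or the reverse.
    ocr.load_trainingdata(trainfiles[mode])
    return ocr
```

A misspelled mode such as "holistic" is rejected before any object is created.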

get_page_glyphs

Returns a list of segmented connected components (CCs), obtained by applying the selected segmentation approach to the given image. This list can be used for creating training data. Signature:

get_page_glyphs (image)

where image is a Gamera image.
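When building training data from more than one page, the per-page lists can be concatenated. This is a small sketch only: collect_glyphs is a hypothetical helper, ocr stands for a GreekOCR instance, and images for an iterable of Gamera images that have already been loaded.

```python
def collect_glyphs(ocr, images):
    """Return one flat list with the segmented glyphs of all given images.

    ocr    -- object providing get_page_glyphs(image); a GreekOCR instance
              in real use
    images -- iterable of Gamera images
    """
    glyphs = []
    for image in images:
        # get_page_glyphs returns a list, so extend keeps the result flat.
        glyphs.extend(ocr.get_page_glyphs(image))
    return glyphs
```

The combined list would still have to be labeled and saved as a Gamera XML file before it can serve as input for load_trainingdata; that step is outside this sketch.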

process_image

Recognizes the given image and returns the recognized text as a Unicode string. Signature:

process_image (image)

where image is a Gamera image. The recognized text is additionally stored in the GreekOCR property output, which can subsequently be written to a file with save_text_unicode or save_text_teubner.

Call load_trainingdata before calling process_image.
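The required call order (training data first, recognition second) can be sketched as follows. The helper recognize_pages is hypothetical, and ocr is passed in so that the sketch does not depend on how the object was constructed. The commented lines at the end show how it might be driven with Gamera itself; the file names there are placeholders.

```python
def recognize_pages(ocr, trainfile, images):
    """Load the training data once, then recognize every image in turn.

    ocr       -- a GreekOCR instance (or any object with the same methods)
    trainfile -- Gamera XML training file matching the mode of `ocr`
    images    -- iterable of Gamera images
    Returns a list with one recognized text per image.
    """
    # Training data must be loaded before the first process_image call.
    ocr.load_trainingdata(trainfile)
    texts = []
    for image in images:
        texts.append(ocr.process_image(image))
    return texts


# Possible use with Gamera (file names are placeholders):
#
#   from gamera.core import init_gamera, load_image
#   from gamera.toolkits.greekocr.greekocr import GreekOCR
#   init_gamera()
#   ocr = GreekOCR(mode="wholistic")
#   texts = recognize_pages(ocr, "training.xml", [load_image("page1.png")])
```

Note that the output property holds only the text of the most recent process_image call, so the returned list is the place to look for earlier pages.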

save_debug_images

Saves the following images to the current working directory:

debug_lines.png
Has a frame drawn around each detected line.
debug_chars.png
Has a frame drawn around each detected character.
debug_words.png
Has a frame drawn around each detected word.

save_text_unicode

Writes the recognized text to the given file as a Unicode string. Signature:

save_text_unicode(filename)

Call process_image before calling save_text_unicode.

save_text_teubner

Writes the recognized text to the given file as a LaTeX document that uses the Teubner style to represent Greek characters and accents. Signature:

save_text_teubner(filename)

Call process_image before calling save_text_teubner.
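Since both save methods rely on a preceding process_image call, the three calls can be bundled so that the order cannot be mixed up. This is a sketch under assumptions: recognize_and_save is a hypothetical helper, ocr is a GreekOCR instance with training data already loaded, and the .txt and .tex extensions are a naming choice made here, not something the toolkit prescribes.

```python
def recognize_and_save(ocr, image, basename):
    """Recognize `image` and write the text in both output formats.

    ocr      -- a GreekOCR instance with training data already loaded
    image    -- a Gamera image
    basename -- output path without extension (extensions are assumptions)
    Returns the recognized text and the two file names that were written.
    """
    text = ocr.process_image(image)
    unicode_path = basename + ".txt"
    teubner_path = basename + ".tex"
    # Both writers use the text stored by the process_image call above.
    ocr.save_text_unicode(unicode_path)
    ocr.save_text_teubner(teubner_path)
    return text, unicode_path, teubner_path
```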