Markus Diem: Recognizing Degraded Handwritten Characters
Final presentation of the diploma thesis (Abschlussvortrag DA)
| What | |
|---|---|
| When | Mar 04, 2010 from 02:15 pm to 02:35 pm |
| Where | Sem 183/2 |

In this thesis, a new character recognition system is proposed that can handle degraded manuscript documents discovered at St. Catherine's Monastery. In contrast to state-of-the-art OCR systems, no early decision, namely image binarization, needs to be performed. To this end, an object recognition methodology is adapted for the recognition of ancient manuscripts: interest points are extracted, which allow for the computation of local descriptors. These descriptors are classified directly using an SVM with one-against-all tests.
To localize characters, interest points that represent whole characters are found by means of a scale distribution histogram. The remaining interest points are then clustered using k-means, which is initialized with the previously selected interest points. Finally, a voting scheme is applied in which the class probabilities assigned to the local descriptors during classification are accumulated into a single class probability histogram for each character cluster. This histogram not only allows for a hard decision but can also be presented to human experts, who can decide the character class of hardly readable characters according to the probabilities obtained.
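The localization and voting steps can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `seeded_kmeans` and `vote`, the 2-D point coordinates, and the probability values are all hypothetical, and plain Lloyd-style k-means on point positions is assumed. The sketch only shows how cluster centres can start at pre-selected seed points and how per-descriptor class probabilities can be summed into one normalized histogram per cluster.

```python
import numpy as np

def assign(points, centres):
    """Return, for each point, the index of its nearest centre."""
    dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

def seeded_kmeans(points, seeds, n_iter=20):
    """Lloyd-style k-means whose centres start at the given seed points (assumption)."""
    points = np.asarray(points, dtype=float)
    centres = np.asarray(seeds, dtype=float).copy()
    for _ in range(n_iter):
        labels = assign(points, centres)
        for k in range(len(centres)):
            members = points[labels == k]
            if len(members):  # keep the old centre if a cluster is empty
                centres[k] = members.mean(axis=0)
    return assign(points, centres), centres

def vote(labels, probs, n_clusters):
    """Sum the class probabilities of all descriptors in a cluster, then normalize."""
    probs = np.asarray(probs, dtype=float)
    hist = np.zeros((n_clusters, probs.shape[1]))
    np.add.at(hist, labels, probs)  # unbuffered accumulation per cluster row
    totals = hist.sum(axis=1, keepdims=True)
    return np.divide(hist, totals, out=np.zeros_like(hist), where=totals > 0)

# Hypothetical data: six interest points, two seed points, three character classes.
points = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
seeds = [[0.5, 0.5], [10.5, 10.5]]
probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1],
         [0.1, 0.2, 0.7], [0.2, 0.2, 0.6], [0.1, 0.1, 0.8]]

labels, centres = seeded_kmeans(points, seeds)
hist = vote(labels, probs, n_clusters=len(seeds))
hard_decision = hist.argmax(axis=1)  # one class index per character cluster
```

Taking `hist.argmax(axis=1)` gives the hard decision, while the full rows of `hist` are what could be shown to a human expert for characters that are hard to read.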
The system was evaluated on three different datasets: a synthetic dataset with Latin script, degraded characters, and real-world data. It achieves an F0.5-score of 0.77 on the real-world data.
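The abstract does not define the F0.5-score. Assuming it is the standard F-beta measure with beta = 0.5, which weights precision more heavily than recall, it can be computed as in the sketch below. The function name `f_beta` and the precision and recall values are illustrative only and are not taken from the thesis.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 emphasizes precision over recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was found correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical values, not results from the thesis.
score = f_beta(precision=0.8, recall=0.6)
```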
