In this activity, we aim to extract text (handwritten or typed) from an image using the image processing techniques we have learned. The image in Figure 1 is the source of the text to be extracted.
Figure 1. Image of a document from which text will be extracted
The image is tilted, so I first rotated it using GIMP 2.8. Using the same software, I selected a portion of the image and cropped it (Figure 2a). The cropped patch was then loaded in Scilab 4.1.2 and converted to grayscale for image processing. The grayscale image is shown in Figure 2b.
Figure 2. (a) Cropped portion of the rotated image of the document; (b) grayscale version of the patch in (a)
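The grayscale conversion can be sketched in a few lines. This is only an illustration in Python with NumPy, not the Scilab routine used for Figure 2b; the function name `to_grayscale`, the tiny test patch, and the channel weights (the common ITU-R BT.601 luma weights) are all assumptions.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an 8-bit H x W x 3 RGB array to a 2-D float array in [0, 1]."""
    rgb = np.asarray(rgb, dtype=float) / 255.0
    # Assumed luma weights (ITU-R BT.601); the Scilab routine may weight
    # the channels differently.
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights

# Hypothetical 2 x 3 patch: white "paper" with a single black "ink" pixel.
patch = np.full((2, 3, 3), 255, dtype=np.uint8)
patch[0, 1] = 0
gray = to_grayscale(patch)
```

The result is a single-channel array, which is the form the Fourier filtering in the next step operates on.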
The first task was to remove the lines. To do this, I took the FT of the grayscale image, applied fftshift to center it, and multiplied it by a mask that filters out the higher-order frequency components contributing to the lines. I then took the inverse FT to get the image with the lines removed. Finally, I binarized the image and inverted it so that it could be cleaned using morphological operations. Figure 3 shows (a) the FT of the grayscale image (with the center masked so that the other frequencies are visible), (b) the mask used to remove the lines, and (c) the binarized and inverted version of the filtered image.
Figure 3. (a) FT of the grayscale version of the selected patch (zero order masked for visibility of other frequencies); (b) mask used to remove the lines; (c) binarized and inverted version of the image obtained after applying the mask in (b)
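The line-removal step can be illustrated on a synthetic page. The sketch below uses Python with NumPy rather than Scilab, and it makes several assumptions that are not taken from the actual mask in Figure 3b: the lines are perfectly horizontal and evenly spaced, the mask is a simple strip along the vertical frequency axis with a small window kept around the zero order, and the threshold is 0.5. The names `remove_lines_fft` and `binarize_and_invert` are hypothetical.

```python
import numpy as np

def remove_lines_fft(gray, half_width=0, keep_radius=2):
    """Suppress horizontal periodic lines by masking the shifted FT."""
    rows, cols = gray.shape
    cy, cx = rows // 2, cols // 2  # position of the zero order after fftshift
    spectrum = np.fft.fftshift(np.fft.fft2(gray))

    mask = np.ones((rows, cols))
    # Horizontal lines vary only along y, so their energy lies on the
    # vertical axis of the spectrum (the column through the center).
    mask[:, cx - half_width:cx + half_width + 1] = 0.0
    # Keep the zero order and its nearest neighbours so that the overall
    # brightness of the page survives the filtering.
    mask[cy - keep_radius:cy + keep_radius + 1,
         cx - half_width:cx + half_width + 1] = 1.0

    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real, mask

def binarize_and_invert(gray, threshold=0.5):
    """Dark ink becomes 1 (foreground), bright paper becomes 0."""
    return (gray < threshold).astype(np.uint8)

# Synthetic 64 x 64 page: bright paper, a dark line every 8 rows,
# and one dark block standing in for a piece of text.
page = np.ones((64, 64))
page[4::8, :] = 0.2
page[21:27, 30:40] = 0.0

before = binarize_and_invert(page)       # lines and text are both foreground
cleaned, mask = remove_lines_fft(page)
after = binarize_and_invert(cleaned)     # only the text block remains
```

On a real scan the lines are rarely perfectly horizontal or evenly spaced, so the mask usually has to be drawn by inspecting the spectrum, and strokes that cross a line tend to come out broken after filtering.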
Morphological operations were applied to the binarized and inverted image after line removal to clean it and reconnect the broken text. The images in Figure 4 are the best results I have obtained so far.
Figure 4. Images cleaned using a series of morphological operations: (a) close operation with a rectangular structuring element; (b) dilation of (a) with a diagonal structuring element; (c) close operation applied to (b) with a diagonal structuring element
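The sequence in Figure 4 (close, dilate, close) can be sketched with plain NumPy. This is an illustration only, not the Scilab code behind the figure: the structuring element sizes (a 3 x 5 rectangle and a 3 x 3 diagonal), the helper names, and the broken test stroke are assumptions.

```python
import numpy as np

def _shifted_views(image, se):
    """Yield one shifted copy of the zero-padded image per set pixel of se."""
    ph, pw = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    h, w = image.shape
    for i, j in zip(*np.nonzero(se)):
        yield padded[i:i + h, j:j + w]

def dilate(image, se):
    """Binary dilation: OR of the image shifted by every offset in se."""
    out = np.zeros(image.shape, dtype=bool)
    # The structuring element is reflected, as in the standard definition.
    for view in _shifted_views(image.astype(bool), se[::-1, ::-1]):
        out |= view
    return out

def erode(image, se):
    """Binary erosion; pixels outside the image count as background."""
    out = np.ones(image.shape, dtype=bool)
    for view in _shifted_views(image.astype(bool), se):
        out &= view
    return out

def close(image, se):
    """Closing: dilation followed by erosion with the same element."""
    return erode(dilate(image, se), se)

# Assumed structuring elements; the sizes used for Figure 4 are not stated.
rect = np.ones((3, 5), dtype=bool)
diag = np.eye(3, dtype=bool)

# Hypothetical broken stroke: a horizontal line with a one-pixel gap.
stroke = np.zeros((9, 15), dtype=bool)
stroke[4, 3:7] = True
stroke[4, 8:12] = True

step_a = close(stroke, rect)   # the gap at column 7 is filled
step_b = dilate(step_a, diag)  # the stroke is thickened along the diagonal
step_c = close(step_b, diag)   # small diagonal gaps, if any, are closed
```

Closing fills gaps narrower than the structuring element without growing the stroke, while the dilation thickens it, which is why the text in the final image ends up wider than one pixel.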
I give myself a grade of 8 for this activity because I am not satisfied with what I have done. I wasn't able to reduce the thickness of the text to 1 pixel and separate each of the letters.
I would like to thank Ms. Eloisa Ventura for helpful discussions and Dr. Maricor Soriano for the hints given during class.