Correct A Math Test with Deep Learning - Part I


100-Masu (or 100-Math) is a popular math test for primary school students in Japan. It is simply a set of one-digit addition, multiplication, and division problems laid out in a 100-square table.

As you can see, the math test has a table of 100 cells where students write their answers. Any cell that is left empty or holds a wrong answer counts towards the total number of mistakes, and the tally is marked on the test sheet.
Prof. Hideo Kageyama's book also stresses two points that matter when practicing 100-math calculation tests:
  • Doing it every day
  • Keeping a record every day
With about 30 students in a class, the teacher must spend a lot of time every day correcting these tests by hand. So it is worth automating the correction.

The Approach

Our goal is to build a mobile app that can capture an image of a test sheet, identify the multipliers in each table, identify the numbers written in the answer cells, compare them against the expected answers, and mark the total number of mistakes on the test sheet.
The first step of my approach is detection. We must detect the table in the test sheet and the position of each cell in the table.
The second step is classification. The goal of this step is to identify the printed test numbers and the handwritten numbers in the table.
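Once both steps have produced digits, the final comparison and tally are simple. Here is a minimal sketch of that last stage, assuming the recognized values are already available as plain Python lists. The names `count_mistakes`, `top`, `left`, and `answers` are hypothetical, and multiplication is only the default operation used for illustration.

```python
def count_mistakes(top, left, answers, op=lambda a, b: a * b):
    """Count wrong or empty cells in a 10x10 answer grid.

    top     -- the 10 test numbers printed across the top row
    left    -- the 10 test numbers printed down the left column
    answers -- 10x10 nested list of recognized answers; None means empty
    op      -- the operation of the test (assumed: multiplication)
    """
    mistakes = 0
    for r, a in enumerate(left):
        for c, b in enumerate(top):
            # An empty cell (None) never equals the expected value,
            # so it is counted as a mistake as well.
            if answers[r][c] != op(a, b):
                mistakes += 1
    return mistakes


# Made-up example data: a perfect sheet with two cells spoiled afterwards.
top = [3, 7, 1, 9, 0, 4, 8, 2, 6, 5]
left = [5, 2, 8, 0, 6, 1, 9, 3, 7, 4]
answers = [[a * b for b in top] for a in left]
answers[0][0] = None  # left empty
answers[4][2] = 7     # wrong: 6 * 1 is 6
print(count_mistakes(top, left, answers))  # 2
```

For an addition sheet, the same function could be called with `op=lambda a, b: a + b`.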

A test sheet looks like this:

In the first step, because of a lack of data, I used traditional computer vision algorithms instead of a deep neural network to detect the 100-math table and extract its cells. Each cell is then fed to a deep learning model for recognition in step two.

Detect and Align Table

There are two main problems:
  • Where is the table?
  • Once I have found the table, how do I turn it back into a square table?
Let's look at each of these step by step!

Find the table

The first thing to do in any image processing problem is to reduce the amount of data you are dealing with. We start from the full-color, high-resolution image, so the first thing we can do is throw away the color information. The next step is to throw away even more information: we threshold the image so that every pixel is either background (the paper) or foreground (the printed elements). There are a variety of thresholding techniques available to us.

The naive initial approach is the obvious one. Light pixels are the paper and dark pixels are the ink, so let's pick a number (say, the average pixel value of the image); anything less than that we set as foreground, and anything higher is background. This gives us an image that looks like this:
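The two reduction steps described so far (dropping color, then thresholding at the average pixel value) can be sketched with NumPy alone. This is an illustration on a tiny synthetic image, not the code used for the app. The helper names `to_gray` and `mean_threshold` are hypothetical, and the luma weights are the common ITU-R BT.601 ones, chosen here as an assumption.

```python
import numpy as np


def to_gray(rgb):
    """Collapse an (H, W, 3) RGB image into one brightness channel."""
    weights = np.array([0.299, 0.587, 0.114])  # assumed BT.601 luma weights
    return rgb[..., :3].astype(np.float64) @ weights


def mean_threshold(gray):
    """Return a boolean mask: True where the pixel is darker than the mean (ink)."""
    return gray < gray.mean()


# Tiny synthetic "sheet": white paper with one dark horizontal line.
sheet = np.full((5, 6, 3), 255, dtype=np.uint8)
sheet[2, :, :] = 20
mask = mean_threshold(to_gray(sheet))
print(mask.astype(int))
```

A single global threshold like this tends to struggle when the lighting is uneven across a photographed sheet, which is one reason other thresholding techniques exist.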

Then we applied morphological transformations to extract the lines of the table. A morphological operation needs two inputs: the first is our thresholded image, and the second is a structuring element, or kernel, which decides the nature of the operation. The two basic morphological operators we used are erosion and dilation.
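To make erosion and dilation concrete, here is a minimal NumPy sketch for binary masks with a rectangular kernel. The post does not say which kernels were used, so the long, thin kernels below are an assumption: they are a common way to keep horizontal and vertical lines while removing small ink marks. The function names are hypothetical; in practice OpenCV's `cv2.erode` and `cv2.dilate` do this job.

```python
import numpy as np


def dilate(mask, kh, kw):
    """Binary dilation with a kh x kw rectangle (odd sizes assumed).

    A pixel becomes True if ANY pixel under the kernel is True.
    """
    ph, pw = kh // 2, kw // 2
    padded = np.pad(mask, ((ph, ph), (pw, pw)), constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(kh):
        for dx in range(kw):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out


def erode(mask, kh, kw):
    """Binary erosion with a kh x kw rectangle (odd sizes assumed).

    A pixel stays True only if ALL pixels under the kernel are True.
    The border is padded with True so lines touching the edge survive.
    """
    ph, pw = kh // 2, kw // 2
    padded = np.pad(mask, ((ph, ph), (pw, pw)), constant_values=True)
    out = np.ones_like(mask)
    for dy in range(kh):
        for dx in range(kw):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out


def open_lines(mask, kh, kw):
    """Erosion followed by dilation: keeps only shapes that fit the kernel."""
    return dilate(erode(mask, kh, kw), kh, kw)


# Synthetic mask: one horizontal line, one vertical line, one stray ink dot.
mask = np.zeros((7, 9), dtype=bool)
mask[3, :] = True
mask[:, 4] = True
mask[1, 1] = True

horizontal = open_lines(mask, 1, 5)  # wide, flat kernel
vertical = open_lines(mask, 5, 1)    # tall, narrow kernel
lines = horizontal | vertical
print(lines.astype(int))
```

The stray dot disappears because it is smaller than either kernel, while both lines survive.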

Now we need to extract the table from the image. Ideally, we would know the coordinates of the table's corners; that would let us draw a box around it and know its exact location. There are quite a few ways to approach this. A simple one is to use the findContours function in OpenCV.

A contour can be explained simply as a curve joining all the continuous points along a boundary that have the same color or intensity. Contours are a useful tool for shape analysis and for object detection and recognition.
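Given a list of contours, one simple way to locate the table is to take the contour with the largest area and pick its four extreme points as corners. The sketch below assumes each contour is already an (N, 2) array of (x, y) points; OpenCV's findContours returns arrays of shape (N, 1, 2), so a `reshape(-1, 2)` would be needed first. The hand-written contours and the helper names `polygon_area` and `four_corners` are hypothetical.

```python
import numpy as np


def polygon_area(points):
    """Area of a closed polygon via the shoelace formula."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def four_corners(points):
    """Pick corners in the order top-left, top-right, bottom-right, bottom-left.

    Image coordinates are assumed (y grows downwards):
    top-left has the smallest x + y, bottom-right the largest x + y,
    top-right the largest x - y, bottom-left the smallest x - y.
    """
    s = points[:, 0] + points[:, 1]
    d = points[:, 0] - points[:, 1]
    return np.array([points[s.argmin()], points[d.argmax()],
                     points[s.argmax()], points[d.argmin()]])


# Made-up contours: a small blob and a slightly skewed table outline.
contours = [
    np.array([[1, 1], [3, 1], [3, 3], [1, 3]]),
    np.array([[10, 12], [50, 10], [90, 10], [95, 88], [50, 90], [8, 85]]),
]
largest = max(contours, key=polygon_area)
corners = four_corners(largest)
print(corners)
```

This corner heuristic works when the sheet is only moderately rotated; a sheet rotated by around 45 degrees would need a more careful method.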

Turn it back into a square table

After the image processing and contour search, we can get the corners of the largest contour. But we still need a transform that maps one arbitrary 2D quadrilateral onto another. For this we can use a perspective transform computed from these four corners.
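A perspective transform is a 3x3 matrix H with eight unknowns (the last entry is fixed to 1), and each of the four corner pairs contributes two linear equations, so H can be found by solving an 8x8 system. The sketch below shows that calculation with NumPy. The corner coordinates and the 440-pixel output size are made-up values, and the function names are hypothetical; OpenCV provides `cv2.getPerspectiveTransform` and `cv2.warpPerspective` for the same purpose.

```python
import numpy as np


def perspective_matrix(src, dst):
    """Solve for the 3x3 matrix mapping four src points onto four dst points.

    For each pair (x, y) -> (u, v):
        u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1)
        v = (h21*x + h22*y + h23) / (h31*x + h32*y + 1)
    Multiplying out the denominator gives two linear equations per pair.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_perspective(H, point):
    """Map one (x, y) point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w


# Made-up table corners (top-left, top-right, bottom-right, bottom-left)
# mapped onto a 440 x 440 square, i.e. 11 cells of 40 pixels each.
src = [(10, 12), (90, 10), (95, 88), (8, 85)]
side = 440
dst = [(0, 0), (side, 0), (side, side), (0, side)]

H = perspective_matrix(src, dst)
mapped = [apply_perspective(H, p) for p in src]
print(np.round(mapped, 3))
```

Warping the whole image then amounts to applying this mapping (or its inverse) to every pixel.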

Our result after applying that transform:

The result is an 11x11 square table. By dividing it into 11 columns and 11 rows, we get 20 test-number blocks and 100 written-cell blocks.
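The division into blocks is plain array slicing. The sketch below assumes the test numbers sit in the first row and first column of the aligned table and that the corner cell is not needed; the function name `split_grid` and the blank 440x440 stand-in image are hypothetical.

```python
import numpy as np


def split_grid(table, n=11):
    """Cut an aligned table image into n x n cells.

    Returns (top, left, answers):
      top     -- the 10 header cells of the first row (corner skipped)
      left    -- the 10 header cells of the first column (corner skipped)
      answers -- 10 rows of 10 answer cells each
    """
    h, w = table.shape[:2]
    ys = np.linspace(0, h, n + 1).astype(int)  # row boundaries
    xs = np.linspace(0, w, n + 1).astype(int)  # column boundaries
    cells = [[table[ys[r]:ys[r + 1], xs[c]:xs[c + 1]] for c in range(n)]
             for r in range(n)]
    top = cells[0][1:]
    left = [row[0] for row in cells[1:]]
    answers = [row[1:] for row in cells[1:]]
    return top, left, answers


# Stand-in for the warped table: a blank 440 x 440 grayscale image.
table = np.zeros((440, 440), dtype=np.uint8)
top, left, answers = split_grid(table)
print(len(top) + len(left), sum(len(row) for row in answers))  # 20 100
```

Each returned cell is a view into the table image, ready to be passed on to the recognition model in step two.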

In the end, we have a lightweight detection algorithm that finds the table and turns it back into a square table. I want to implement a deep learning network for that job in the future if I have enough data.
In the next part, we will talk about building a network to recognize handwritten multi-digit numbers.
See ya!

