How to make a simple Optical Character Recognition in 12 lines of code.

Just to motivate you!

Optical character recognition (OCR) is the recognition of typed, handwritten or printed text in images and its conversion into machine-readable text. OCR can automate tasks that otherwise need a person: banks use it to process checks without human involvement, it can generate the content of documents from their scanned images, and it can help visually impaired people.

For this OCR we'll use Microsoft's Computer Vision API. We'll make the API call with a POST request in Python and get the response back in JSON format.

To get started you need a Microsoft account; after that, you can get a free 30-day subscription to the Computer Vision API. You then need your secret subscription key, which looks similar to this: 98f714r6vb2e193018b28fg1u9b3b0d7e7.

#Defining base url for API call.
base_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/"
ocr_url = base_url + "ocr"

#Defining subscription key and headers for subscription key.
sub = "98f714r6vb2e193018b28fg1u9b3b0d7e7"
headers  = {'Ocp-Apim-Subscription-Key': sub}
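Since the subscription key is secret, you may prefer not to paste it into the script. Here is a minimal sketch that reads it from an environment variable instead. The helper name load_subscription_key and the variable name VISION_SUBSCRIPTION_KEY are my own assumptions for illustration, not something the API requires.

```python
import os

def load_subscription_key(env=os.environ, name='VISION_SUBSCRIPTION_KEY'):
    """Return the key stored under `name` (a hypothetical variable name)."""
    key = env.get(name, '').strip()
    if not key:
        raise RuntimeError('Set the %s environment variable first' % name)
    return key

# Usage, once the variable is set in your shell:
# sub = load_subscription_key()
# headers = {'Ocp-Apim-Subscription-Key': sub}
```

The env argument is there only so the helper can be tried with a plain dict; in normal use you call it with no arguments.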

Microsoft's OCR API is quite flexible and we can define many parameters depending on our use case. Here we define two of them: the language, which we set to 'unk' so that the API detects it automatically (our image is in English, which could also be given explicitly as 'en'), and whether to detect the orientation of the text, which we set to true. We also need the URL of the image on which we want to run our OCR (we can also upload a local image instead), so we'll define the URL of the image.

#Defining language and orientation parameters
params   = {'language': 'unk', 'detectOrientation': 'true'}

#Defining image url
img = "https://quotefancy.com/download/18846/original/wallpaper.jpg"
data = {'url': img}
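As mentioned above, a local image can be sent instead of a URL. Here is a rough sketch of how that request could be put together, assuming the API accepts the raw image bytes as the request body with a Content-Type of application/octet-stream. The helper name build_local_ocr_request and the file name sample.jpg are made up for illustration.

```python
def build_local_ocr_request(image_bytes, subscription_key):
    """Return keyword arguments for requests.post() that send raw image bytes."""
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key,
        # Tells the API that the body is binary image data, not JSON.
        'Content-Type': 'application/octet-stream',
    }
    params = {'language': 'unk', 'detectOrientation': 'true'}
    return {'headers': headers, 'params': params, 'data': image_bytes}

# Usage (needs a valid key and network access; sample.jpg is hypothetical):
# import requests
# with open('sample.jpg', 'rb') as f:
#     kwargs = build_local_ocr_request(f.read(), sub)
# response = requests.post(ocr_url, **kwargs)
# response.raise_for_status()
```

Note that the bytes go in data rather than json, because the body is the image itself and not a JSON document.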

Following is the image at the above link:

Optical character recognition sample image

Now we'll import requests and make a POST request, passing ocr_url, headers, params and json.

import requests
response = requests.post(ocr_url, headers=headers, params=params, json=data)
response.raise_for_status()
analysis = response.json()
print(analysis)

The JSON output of the above script contains the detected language, orientation and text angle, followed by the bounding box coordinates of every region, line and word, line by line. Here's the output.

{  
   'language':'en',
   'orientation':'Up',
   'textAngle':0.0,
   'regions':[  
      {  
         'boundingBox':'689,768,2462,1049',
         'lines':[  
            {  
               'boundingBox':'689,768,2462,180',
               'words':[  
                  {  
                     'boundingBox':'689,768,541,158',
                     'text':'Work'
                  },
                  {  
                     'boundingBox':'1293,768,450,158',
                     'text':'hard'
                  },
                  {  
                     'boundingBox':'1816,768,158,156',
                     'text':'in'
                  },
                  {  
                     'boundingBox':'2041,768,771,180',
                     'text':'silence,'
                  },
                  {  
                     'boundingBox':'2889,768,262,158',
                     'text':'Let'
                  }
               ]
            },
            {  
               'boundingBox':'689,1037,2454,181',
               'words':[  
                  {  
                     'boundingBox':'689,1075,399,143',
                     'text':'your'
                  },
                  {  
                     'boundingBox':'1135,1074,722,103',
                     'text':'success'
                  },
                  {  
                     'boundingBox':'1918,1037,217,140',
                     'text':'be'
                  },
                  {  
                     'boundingBox':'2184,1075,399,143',
                     'text':'your'
                  },
                  {  
                     'boundingBox':'2638,1037,505,140',
                     'text':'noise.'
                  }
               ]
            },
            {  
               'boundingBox':'1717,1358,408,52',
               'words':[  
                  {  
                     'boundingBox':'1717,1359,173,51',
                     'text':'Frank'
                  },
                  {  
                     'boundingBox':'1913,1358,212,52',
                     'text':'Ocean'
                  }
               ]
            },
            {  
               'boundingBox':'1782,1765,276,52',
               'words':[  
                  {  
                     'boundingBox':'1782,1765,276,52',
                     'text':'@quoteßancu'
                  }
               ]
            }
         ]
      }
   ]
}
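The nested output above is not very readable on its own, so here is a small sketch that turns it back into plain lines of text. The function names extract_lines and parse_bounding_box are my own, and I am assuming each boundingBox string holds left, top, width and height in that order, which matches the numbers in the output above. The sample dict is a trimmed copy of that output.

```python
def parse_bounding_box(box):
    """Turn a 'left,top,width,height' string into a tuple of four ints."""
    left, top, width, height = (int(v) for v in box.split(','))
    return left, top, width, height

def extract_lines(analysis):
    """Join the words of every line in every region into one string per line."""
    lines = []
    for region in analysis.get('regions', []):
        for line in region.get('lines', []):
            words = [word['text'] for word in line.get('words', [])]
            lines.append(' '.join(words))
    return lines

# Trimmed copy of the response shown above.
sample = {
    'regions': [{
        'boundingBox': '689,768,2462,1049',
        'lines': [
            {'boundingBox': '689,768,2462,180',
             'words': [{'boundingBox': '689,768,541,158', 'text': 'Work'},
                       {'boundingBox': '1293,768,450,158', 'text': 'hard'}]},
            {'boundingBox': '1717,1358,408,52',
             'words': [{'boundingBox': '1717,1359,173,51', 'text': 'Frank'},
                       {'boundingBox': '1913,1358,212,52', 'text': 'Ocean'}]},
        ],
    }],
}

print('\n'.join(extract_lines(sample)))
```

With the full response you would call extract_lines(analysis) instead of passing the trimmed sample.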

Enjoy!
P.S.: If you need any clarification, do post a comment.

Wild CSE
