PowerShell : Extract Text from Image and Convert \ Print in any Language


On 10th March, Xerox announced a new technology (a translation service) that can scan a document, translate it into any language, and print it from the scanner or from the Xerox app.

For a moment I wondered how they would make it work, but then I had a feeling that it could be automated with PowerShell, with no need to buy new hardware for this feature.

BREAKING THE PROBLEM : 

So let's break this problem into steps.

  1. A scanned document is an image; save it as a file.
  2. Extract the text from the image.
  3. Translate the text into the desired language.
  4. Print / email / save the document.

Easy, right? 🙂 At least it looks that way from the points above.

HOW TO ACHIEVE IT : 


STEP 1 Use any tool (like SnagIt) to capture an image of the document; you can also use Print Screen and save the capture as an image. Let's take the screenshot below as our sample image.

[Image: This is your sample image]

STEP 2 Use Microsoft's Optical Character Recognition (OCR) API to extract the text from the image. I've created a function for that, and you can dive deeper into it later.
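As a rough idea of what such a function can look like, here is a minimal sketch. It is not the original function: the names `Get-OcrLine` and `Invoke-ImageOcr` are made up for illustration, the endpoint URL and key must come from your own subscription, and the response shape (regions containing lines containing words) is assumed from the OCR API's JSON output.

```powershell
# Sketch only: function names are hypothetical; endpoint and key are supplied by the caller.

# Flatten an OCR response (regions -> lines -> words) into one string per line.
function Get-OcrLine {
    param([Parameter(Mandatory)] $Response)
    foreach ($region in $Response.regions) {
        foreach ($line in $region.lines) {
            # Join the words of each recognised line back into a sentence
            ($line.words | ForEach-Object { $_.text }) -join ' '
        }
    }
}

# Upload an image file to the OCR endpoint and return the recognised lines.
function Invoke-ImageOcr {
    param(
        [Parameter(Mandatory)] [string] $Path,            # image file on disk
        [Parameter(Mandatory)] [string] $Endpoint,        # full OCR URL from your subscription (assumption)
        [Parameter(Mandatory)] [string] $SubscriptionKey  # your OCR subscription key
    )
    $headers  = @{ 'Ocp-Apim-Subscription-Key' = $SubscriptionKey }
    $response = Invoke-RestMethod -Uri $Endpoint -Method Post -Headers $headers `
                    -ContentType 'application/octet-stream' -InFile $Path
    Get-OcrLine -Response $response
}
```

Keeping the parsing in its own function means it can be tried on a saved JSON response without calling the web service.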


STEP 3 We now have the text extracted from the image as a set of sentences divided into lines, as in our sample image. Now it's time to feed these sentences to the Bing Translation API and get them translated into any desired language.

English to Hindi :

[Screenshot: Hindi translation]

English to German :

[Screenshot: German translation]

You can choose from a range of languages into which to convert the text extracted from your image.
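The translation step can be sketched roughly as below. Note the swap: this sketch assumes the request shape of the Microsoft Translator v3 REST API (a single subscription key), not the ClientID / Client_secret flow used by the script in this post. The function names are made up for illustration, and `hi` and `de` are the language codes for Hindi and German.

```powershell
# Sketch only: function names are hypothetical; assumes the Translator v3 REST request shape.

# Build the JSON request body: one {"Text": "..."} item per line.
function New-TranslationBody {
    param([Parameter(Mandatory)] [string[]] $Line)
    $items = @($Line | ForEach-Object { @{ Text = $_ } })
    # -InputObject keeps a single line as a one-element JSON array
    ConvertTo-Json -InputObject $items -Compress
}

# Pull the first translation of every item out of the response.
function Get-TranslatedText {
    param([Parameter(Mandatory)] $Response)
    foreach ($item in $Response) {
        $item.translations[0].text
    }
}

# Send the lines to the service and return the translated lines.
function Invoke-LineTranslation {
    param(
        [Parameter(Mandatory)] [string[]] $Line,
        [Parameter(Mandatory)] [string] $To,               # e.g. 'hi' or 'de'
        [Parameter(Mandatory)] [string] $SubscriptionKey,
        [string] $Endpoint = 'https://api.cognitive.microsofttranslator.com'
    )
    $uri     = "$Endpoint/translate?api-version=3.0&to=$To"
    # Some subscriptions also need an 'Ocp-Apim-Subscription-Region' header
    $headers = @{ 'Ocp-Apim-Subscription-Key' = $SubscriptionKey }
    # Send UTF-8 bytes so non-ASCII source text survives the request
    $bytes   = [System.Text.Encoding]::UTF8.GetBytes((New-TranslationBody -Line $Line))
    $response = Invoke-RestMethod -Uri $uri -Method Post -Headers $headers `
                    -ContentType 'application/json; charset=utf-8' -Body $bytes
    Get-TranslatedText -Response $response
}
```

A call would then look something like `Invoke-LineTranslation -Line $lines -To 'hi' -SubscriptionKey $key`, where `$lines` holds the text extracted in STEP 2.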

STEP 4 In the previous step we converted the image text into the desired language; now it's time to print, save, or email it.

You can pipe the output to the Out-Printer cmdlet to print it, or pipe the results to Set-Content to save them to a file.

Or convert it to HTML and send it as the body of an email. You can also send the file as an attachment, as in the screenshot below.
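A minimal sketch of both options, using made-up sample lines in place of the real translated output. The file name and the printer name are placeholders, and the Out-Printer lines are commented out so the snippet also runs on a machine without a printer.

```powershell
# Stand-in for the translated lines produced in STEP 3 (made-up sample data)
$translated = 'Hallo Welt', 'Dies ist ein Test'

# Save to a file; UTF8 keeps non-Latin scripts such as Hindi intact
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'translated.txt'
$translated | Set-Content -Path $outFile -Encoding UTF8

# Print on the default printer (uncomment on a machine with a printer installed)
# $translated | Out-Printer

# Or name a specific printer; 'Office Printer' is a placeholder
# $translated | Out-Printer -Name 'Office Printer'
```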

[Screenshot: print]
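The HTML and email route might look like the sketch below. One catch: piping plain strings straight into ConvertTo-Html produces a table of their Length property, so each line is first wrapped in an object with a Text property. The function name `ConvertTo-TranslatedHtml` is made up, and the addresses and SMTP server in the commented call are placeholders.

```powershell
# Sketch only: function name is hypothetical; mail settings below are placeholders.

# Turn the translated lines into a single HTML document string.
function ConvertTo-TranslatedHtml {
    param([Parameter(Mandatory)] [string[]] $Line)
    $html = $Line |
        ForEach-Object { [pscustomobject]@{ Text = $_ } } |   # wrap each string in an object
        ConvertTo-Html -Property Text -Title 'Translated text'
    # ConvertTo-Html returns an array of lines; join them into one string
    $html | Out-String
}

$body = ConvertTo-TranslatedHtml -Line 'Hallo Welt', 'Dies ist ein Test'

# Send as an HTML body (fill in real addresses and server before uncommenting)
# Send-MailMessage -To 'someone@example.com' -From 'me@example.com' `
#     -Subject 'Translated document' -Body $body -BodyAsHtml `
#     -SmtpServer 'smtp.example.com'

# Or attach a saved file instead
# Send-MailMessage -To 'someone@example.com' -From 'me@example.com' `
#     -Subject 'Translated document' -Body 'See attachment.' `
#     -Attachments '.\translated.txt' -SmtpServer 'smtp.example.com'
```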

HOW TO RUN IT:


SCRIPT :

PLEASE NOTE : 

You will need a subscription key for the Microsoft Optical Character Recognition API, and a ClientID and Client_secret for the Microsoft Bing Translation API. Please follow the links below to get an idea of how to obtain these keys.

Microsoft OCR API
Microsoft Translator Services
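Once you have the keys, one way to keep them out of the script itself is to read them from environment variables. This is only a sketch of that idea, not part of the original script: the helper name `Get-RequiredSetting` and the variable names are made up for illustration.

```powershell
# Sketch only: helper and environment variable names are hypothetical.

# Return the value of an environment variable, or fail with a clear message.
function Get-RequiredSetting {
    param([Parameter(Mandatory)] [string] $Name)
    $value = [Environment]::GetEnvironmentVariable($Name)
    if ([string]::IsNullOrEmpty($value)) {
        throw "Environment variable '$Name' is not set."
    }
    $value
}

# Example usage (set the variables first, e.g. $env:OCR_SUBSCRIPTION_KEY = '...'):
# $ocrKey       = Get-RequiredSetting -Name 'OCR_SUBSCRIPTION_KEY'
# $clientId     = Get-RequiredSetting -Name 'TRANSLATOR_CLIENT_ID'
# $clientSecret = Get-RequiredSetting -Name 'TRANSLATOR_CLIENT_SECRET'
```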


Thank you for stopping by, have a nice weekend.


6 thoughts on “PowerShell : Extract Text from Image and Convert \ Print in any Language”

    1. Sorry, I couldn't get your question. Please follow the URL to my GitHub gist to get the code, then copy and paste it into a PowerShell window. Just make sure your server is connected to the internet, as the script uses a web REST API.
