On 10th March, Xerox announced a new translation service that can scan a document, translate it into any language, and print the result from the scanner or from the Xerox app.
For a moment I wondered how they would make it work, but then I got the feeling that it could be automated with PowerShell, with no need to buy new hardware for this feature.
BREAKING THE PROBLEM :
So let's break this problem into steps.
- A scanned document is an image, so save it as a file.
- Extract the text from the image.
- Translate the text into the desired language.
- Print / email / save the document.
Easy, right? 🙂 At least it looks that way from the points above.
HOW TO ACHIEVE IT :
STEP 1 – Use any tool (like SnagIt) to capture an image of the document. You can also use Print Screen and save the capture as an image. Let's take the screenshot below as our sample image.
STEP 2 – Use Microsoft's Optical Character Recognition (OCR) API to extract the text from the image. I've created a function for that, and you can deep dive into it later.
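To give a feel for what Step 2 produces, here is a minimal sketch of how an OCR response (regions, then lines, then words) can be flattened into one sentence per line. The helper name ConvertTo-OcrLine is made up for this example, and the JSON is a hand-written sample in the same shape, not real API output.

```powershell
# Hypothetical helper: flattens an OCR response (regions -> lines -> words)
# into one object per line of text.
function ConvertTo-OcrLine {
    param(
        [Parameter(Mandatory = $true)]
        $Response
    )
    $lineNumber = 0
    foreach ($region in $Response.regions) {
        foreach ($line in $region.lines) {
            $lineNumber++
            [PSCustomObject]@{
                LineNumber   = $lineNumber
                LanguageCode = $Response.language
                # Join the words of the line with single spaces
                Sentence     = ($line.words.text -join ' ')
            }
        }
    }
}

# Hand-written sample in the shape of an OCR response (not real API output)
$sample = @'
{
  "language": "en",
  "regions": [
    { "lines": [
        { "words": [ { "text": "Hello" }, { "text": "world" } ] },
        { "words": [ { "text": "Second" }, { "text": "line" } ] }
    ] }
  ]
}
'@ | ConvertFrom-Json

$ocrLines = ConvertTo-OcrLine -Response $sample
$ocrLines | Format-Table -AutoSize
```

Each object carries a line number, the detected language code and the sentence, which is the same shape Get-ImageText returns in the script further down.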
STEP 3 – We now have the text extracted from the image as a set of sentences, divided into lines just like in our sample image. Now it's time to feed these sentences to the Bing Translation API and get them translated into any desired language.
English to Hindi :
English to German :
You can choose from a range of languages into which the extracted text can be translated.
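As a rough sketch of how a language name turns into a request, the snippet below maps names to codes and builds the translate URI the same way the script does. The helper name Get-TranslateUri is an assumption for this example, and only three of the supported languages are listed to keep it short.

```powershell
# Language names mapped to the codes sent to the Translator API
$LangCodes = @{
    'English' = 'en'
    'German'  = 'de'
    'Hindi'   = 'hi'
}

# Hypothetical helper: builds the v2 Translate request URI used in the script
function Get-TranslateUri {
    param(
        [Parameter(Mandatory = $true)][string] $Text,
        [Parameter(Mandatory = $true)][string] $From,
        [Parameter(Mandatory = $true)][string] $To
    )
    foreach ($name in $From, $To) {
        if (-not $LangCodes.ContainsKey($name)) {
            throw "Unsupported language: $name"
        }
    }
    # Escape spaces and special characters so the text survives the query string
    $encoded = [uri]::EscapeDataString($Text)
    'http://api.microsofttranslator.com/v2/Http.svc/Translate?text={0}&from={1}&to={2}' -f $encoded, $LangCodes[$From], $LangCodes[$To]
}

Get-TranslateUri -Text 'Good morning' -From English -To Hindi
# -> http://api.microsofttranslator.com/v2/Http.svc/Translate?text=Good%20morning&from=en&to=hi
```

Validating the names up front means a typo in a language name fails early instead of producing a request with an empty code.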
STEP 4 – In the previous step we translated the image text into the desired language, so now it's time to print, save or email it.
You can pipe the output to the Out-Printer cmdlet to print it, or pipe the results to Set-Content to save them to a file.
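A small sketch of the save and print options, using placeholder strings in place of real translation results. The file name is made up for this example. Saving with UTF-8 encoding is worth doing explicitly, because translated text in scripts like Devanagari may not survive a legacy default encoding.

```powershell
# Sample translated lines (hand-written placeholders, not real API output)
$translated = @('Hallo Welt', 'Guten Morgen')

# Save as UTF-8 so non-Latin scripts are preserved as well
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'translated.txt'
$translated | Set-Content -Path $outFile -Encoding UTF8

# Read the file back to confirm the round trip
$roundTrip = Get-Content -Path $outFile -Encoding UTF8
$roundTrip

# To print instead, pipe to Out-Printer (Windows only, uses the default printer):
# $translated | Out-Printer
```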
Or convert it to HTML and send it as the body of an email. You can also send the file as an attachment, as in the screenshot below.
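Here is a minimal sketch of the email option, assuming the translated lines are already in a variable. The addresses, the SMTP server and the attachment path are all placeholders, so the actual send is left commented out.

```powershell
# Sample translated lines (hand-written placeholders, not real API output)
$translated = @('Hallo Welt', 'Guten Morgen')

# Wrap each line in an object so ConvertTo-Html renders a one-column table
$htmlBody = $translated |
    ForEach-Object { [PSCustomObject]@{ Sentence = $_ } } |
    ConvertTo-Html -Property Sentence -Title 'Translated text' |
    Out-String

# Parameters for Send-MailMessage; every value here is a placeholder
$mail = @{
    To         = 'recipient@example.com'
    From       = 'sender@example.com'
    Subject    = 'Translated document'
    Body       = $htmlBody
    BodyAsHtml = $true
    SmtpServer = 'smtp.example.com'
    # Attachments = 'C:\Temp\translated.txt'   # optional: attach the saved file
}

# Uncomment once the placeholders point at a real SMTP server
# Send-MailMessage @mail
```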
HOW TO RUN IT:
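A minimal sketch of one way to run it, assuming both functions from the SCRIPT section are already loaded in your session (for example by pasting them into the PowerShell window). The wrapper name Invoke-ImageTranslation and the image path are made up for this example.

```powershell
# Hypothetical wrapper around the two functions from the SCRIPT section
function Invoke-ImageTranslation {
    param(
        [Parameter(Mandatory = $true)][string] $Path,
        [string] $From = 'English',
        [string] $To   = 'Hindi'
    )
    # OCR the image, then translate each extracted sentence in turn
    Get-ImageText -Path $Path | ForEach-Object {
        [PSCustomObject]@{
            LineNumber = $_.LineNumber
            Original   = $_.Sentence
            Translated = Translate-text -Text $_.Sentence -From $From -To $To
        }
    }
}

# Example call (the image path is a placeholder):
# Invoke-ImageTranslation -Path 'C:\Temp\sample.png' -From English -To German
```

Keeping the original sentence next to the translation makes it easier to spot lines where the OCR step misread the image.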
SCRIPT :
Function Get-ImageText()
{
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory=$True,Position=0,ValueFromPipeline=$True)]
        [String] $Path
    )
    Process{
        # Request parameters: upload the image file as a binary stream
        $SplatInput = @{
            Uri         = "https://api.projectoxford.ai/vision/v1/ocr"
            Method      = 'Post'
            InFile      = $Path
            ContentType = 'application/octet-stream'
        }
        # Subscription key for the OCR API
        $Headers = @{
            'Ocp-Apim-Subscription-Key' = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
        }
        Try{
            # Call OCR API and feed the parameters to it; -ErrorAction Stop sends failures to the Catch block
            $Data = Invoke-RestMethod @SplatInput -Headers $Headers -ErrorAction Stop
            $Language = $Data.Language # Detected language
            $i = 0
            foreach($D in $Data.regions.lines){
                $i++
                # Emit one object per line, joining its words with spaces
                [PSCustomObject]@{
                    LineNumber   = $i
                    LanguageCode = $Language
                    Sentence     = ($D.words.text -join ' ')
                }
            }
        }
        Catch{
            "Something went wrong while extracting text from the image, please try running the script again`nError Message : " + $_.Exception.Message
        }
    }
}
Function Translate-text()
{
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory=$True,Position=0,ValueFromPipeline=$True)]
        [String] $Text,
        [ValidateSet('Arabic','Hindi','Japanese','Russian','Spanish','French',
                     'English','Korean','Urdu','Italian','Portuguese','German','Chinese Simplified')]
        [String] $From,
        [ValidateSet('Arabic','Hindi','Japanese','Russian','Spanish','French',
                     'English','Korean','Urdu','Italian','Portuguese','German','Chinese Simplified')]
        [String] $To
    )
    Begin{
        # System.Web provides HttpUtility.UrlEncode and is not always loaded by default
        Add-Type -AssemblyName System.Web
        # Language codes hashtable
        $LangCodes = @{'Arabic'='ar'
            'Chinese Simplified'='zh-CHS'
            'English'='en'
            'French'='fr'
            'German'='de'
            'Hindi'='hi'
            'Italian'='it'
            'Japanese'='ja'
            'Korean'='ko'
            'Portuguese'='pt'
            'Russian'='ru'
            'Spanish'='es'
            'Urdu'='ur'
        }
        # Secret Client ID and Key you get after Subscription
        $ClientID = 'XXXXXXXXXXXXXXXXXXXX'
        $client_Secret = 'XXXXXXXXXXXXXXXXXXXX'
        # If ClientId or Client_Secret has special characters, UrlEncode before sending request
        $clientIDEncoded = [System.Web.HttpUtility]::UrlEncode($ClientID)
        $client_SecretEncoded = [System.Web.HttpUtility]::UrlEncode($client_Secret)
    }
    Process{
        ForEach($T in $Text)
        {
            Try{
                # Azure Data Market URL which provides access tokens
                $TokenUri = "https://datamarket.accesscontrol.windows.net/v2/OAuth2-13"
                # Body and Content Type of the request
                $Body = "grant_type=client_credentials&client_id=$clientIDEncoded&client_secret=$client_SecretEncoded&scope=http://api.microsofttranslator.com"
                $ContentType = "application/x-www-form-urlencoded"
                # Invoke REST method to Azure URI
                $Access_Token = Invoke-RestMethod -Uri $TokenUri -Body $Body -ContentType $ContentType -Method Post
                # Header value with the access_token just received
                $Header = "Bearer " + $Access_Token.access_token
                # Invoke REST request to Microsoft Translator Service
                [string] $EncodedText = [System.Web.HttpUtility]::UrlEncode($T)
                [string] $TranslateUri = "http://api.microsofttranslator.com/v2/Http.svc/Translate?text=" + $EncodedText + "&from=" + $LangCodes.Item($From) + "&to=" + $LangCodes.Item($To)
                $Result = Invoke-RestMethod -Uri $TranslateUri -Headers @{Authorization = $Header}
                # Emit the translated string
                $Result.string.'#text'
            }
            catch
            {
                "Something went wrong while translating text, please try running the script again`nError Message : " + $_.Exception.Message
            }
        }
    }
}
PLEASE NOTE :
You will need a subscription key for the Microsoft OCR API, plus a ClientID and Client_secret for the Microsoft Bing Translation API. Please follow the links below to get an idea of how to obtain these keys.
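Once you have the keys, one option is to keep them out of the script itself. This is only a sketch of that idea: the helper name Get-ApiSecret and the environment variable names are made up for this example, and the script as published simply hardcodes the values.

```powershell
# Hypothetical helper: reads a secret from an environment variable
function Get-ApiSecret {
    param([Parameter(Mandatory = $true)][string] $Name)
    $value = [Environment]::GetEnvironmentVariable($Name)
    if ([string]::IsNullOrWhiteSpace($value)) {
        throw "Environment variable '$Name' is not set."
    }
    $value
}

# The variable names below are made up for this example
# $SubscriptionKey = Get-ApiSecret -Name 'OCR_SUBSCRIPTION_KEY'
# $ClientID        = Get-ApiSecret -Name 'TRANSLATOR_CLIENT_ID'
# $client_Secret   = Get-ApiSecret -Name 'TRANSLATOR_CLIENT_SECRET'
```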
Microsoft OCR API
Microsoft Translator Services
Thank you for stopping by, have a nice weekend.
Hi Prateek,
How do I get the Get-ImageText cmdlet in PowerShell? I have version 5.0, which is running on a 2008 R2 server.
Sorry, I couldn't understand your question. Please follow the URL to my GitHub gist to get the code, then copy and paste it into a PowerShell window. Just make sure your server is connected to the internet, as the script uses a web REST API.
Keep working, impressive job!
amazing work.
Hi Prateek,
The URI = “https://api.projectoxford.ai/vision/v1/ocr” is not present anymore. Can you please have a look and update the script?
Thanks