PowerShell: Using Online Dictionaries to Data Mine Word Meanings in One Shot


“Learners of any language, this one is for you!” What if we could feed a script a list of words and have it give us their meanings and usage?

Well, yes. It is possible to get all the meanings in one shot, so you can avoid searching for them one by one. That saves you lots of time, which you can spend learning other aspects of the language.

So, let the fun begin. There are multiple online dictionaries on the internet that give you the meaning and usage of words. I chose the Merriam-Webster online dictionary to demonstrate the approach and make this script work, because after looking at many online dictionaries, I found this one had the least noise (which means less work ;P).

URL: http://www.merriam-webster.com/

Noise here means the unnecessary information that gets in the way when you are data mining useful information from a web-based dictionary. Noise can be anything: symbols, unnecessary tags, or hyperlinks embedded within your information. Now that we have the URL, let me show you how to pass words to it and get their meanings.

Required URL: http://www.merriam-webster.com/ + dictionary/YourWord

EXAMPLE: http://www.merriam-webster.com/dictionary/banana

[Screenshot: Merriam-Webster entry for "banana"]

Similarly, for the word apple:

[Screenshot: Merriam-Webster entry for "apple"]
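If you want to build this URL for each word yourself, a minimal sketch could look like the following. `Get-DictionaryUrl` is a hypothetical helper name, not part of the original script. It trims the word and URL-encodes it with `[uri]::EscapeDataString`, assuming that multi-word entries such as "ice cream" should be sent with the space encoded as `%20`.

```powershell
# Hypothetical helper (not from the original script): builds the Merriam-Webster
# lookup URL for a single word by appending it to the dictionary path.
function Get-DictionaryUrl {
    param(
        [Parameter(Mandatory)]
        [string]$Word
    )
    $baseUrl = 'http://www.merriam-webster.com/dictionary/'
    # Trim stray spaces and URL-encode the word (e.g. a space becomes %20)
    $baseUrl + [uri]::EscapeDataString($Word.Trim())
}

Get-DictionaryUrl 'banana'    # http://www.merriam-webster.com/dictionary/banana
Get-DictionaryUrl 'apple'     # http://www.merriam-webster.com/dictionary/apple
```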

Now you know what data needs to be captured. The real fun is fetching the highlighted information from this webpage, which is a two-step approach:

STEP 1:

1.1 Identify the HTML tags between which your data is embedded

Hover over the information on the webpage that needs to be captured, then right-click it and select Inspect Element.

This opens the HTML source code on the right side of the web browser (in Chrome) and highlights the HTML tags that hold the information.


Using this approach, note down the HTML tags that hold your data, as we will use these tag names later in our script.

1.2 Get webpage HTML source code

Use the Invoke-WebRequest cmdlet to fetch the webpage's HTML into a PowerShell variable as a string.

# Grab all the web content for the word by passing it in the URL
# ($_ is the current word when this line runs inside the script's ForEach-Object loop)
$webpage = Invoke-WebRequest "http://www.merriam-webster.com/dictionary/$_" -UseBasicParsing

# Capture the raw HTML output as a string
$content = $webpage.Content
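When you loop over a whole list of words, one failed request should not stop the rest. Below is a minimal sketch, assuming you want to skip failures with a warning. `Get-WordPageContent` and its `-BaseUrl` parameter are hypothetical names, not part of the original script.

```powershell
# Hypothetical helper (not from the original script): downloads the HTML for one
# word and returns it as a string, or $null if the request fails
# (for example, an HTTP error status or no network connection).
function Get-WordPageContent {
    param(
        [Parameter(Mandatory)]
        [string]$Word,
        [string]$BaseUrl = 'http://www.merriam-webster.com/dictionary/'
    )
    $url = $BaseUrl + [uri]::EscapeDataString($Word.Trim())
    try {
        # -UseBasicParsing avoids the Internet Explorer engine in Windows PowerShell;
        # PowerShell 6+ accepts the switch but ignores it.
        # -ErrorAction Stop sends any failure to the catch block.
        $response = Invoke-WebRequest -Uri $url -UseBasicParsing -TimeoutSec 30 -ErrorAction Stop
        $response.Content
    }
    catch {
        Write-Warning "Could not fetch '$Word': $($_.Exception.Message)"
        $null
    }
}
```

For example, `Get-WordPageContent -Word 'banana'` returns the page HTML as a string, while a request that fails prints a warning and returns `$null`, so the caller can simply skip that word.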

1.3 Fetch data from these HTML tags

Use a regular expression to find these tags in the PowerShell variable that holds the web content as a string, and store the matched data in a separate variable.

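# Search the HTML output for the useful information between the target HTML tags.
# OPENING_TAG and CLOSING_TAG are placeholders: put the tags you noted in step 1.1 there.
$data = ($content | ForEach-Object { [regex]::Matches($_, '(?<=OPENING_TAG\s+)(.*?)(?=\s+CLOSING_TAG)') } | Select-Object -ExpandProperty Value)
$data = $data.Trim() # Remove white space from the start and end of each string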
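To see how the lookbehind `(?<=...)` and lookahead `(?=...)` pair returns only the text between the tags, here is a self-contained sketch run against a small made-up HTML fragment. The `<span class="def">` tag is an assumption for illustration only, not Merriam-Webster's actual markup; substitute the tags you noted while inspecting the page.

```powershell
# Made-up HTML fragment; <span class="def"> is a hypothetical tag standing in
# for whatever tag holds the definitions on the real page.
$content = @'
<div>
  <span class="def">  an elongated usually tapering tropical fruit  </span>
  <span class="other">ignore me</span>
  <span class="def">  a plant that bears bananas  </span>
</div>
'@

# The lookbehind (?<=...) and lookahead (?=...) check for the tags without including
# them in the match, so each Value holds only the text in between (.*? is non-greedy).
$pattern = '(?<=<span class="def">\s*)(.*?)(?=\s*</span>)'
$data = [regex]::Matches($content, $pattern) | ForEach-Object { $_.Value.Trim() }

$data
```

The `.Trim()` is still needed because the lookbehind's `\s*` can match zero characters, which leaves the leading spaces inside the captured value.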

STEP 2:

2.1 Clean your string for any unwanted noise or useless information.

Remove any unwanted noise from the string using a variety of built-in PowerShell string methods, Replace(), Trim() and Split(), in the following fashion:

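# Strip the leftover tags and separators from the first match, split it into individual
# meanings on ':', and number them. TAG_1 and TAG_2 are placeholders for the leftover
# tags you noted in step 1.1.
$i = 1
($data[0].Replace('TAG_1: ', '').Replace('TAG_2', '').Replace(' : ', ':').Split('<')[0]).Split(':') | ForEach-Object { "$i. $($_.Trim())"; $i++ }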
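Another way to clean the string, sketched below, is to strip every leftover HTML tag with a single `-replace` pattern instead of chaining one `Replace()` per known tag, then split the meanings on the colon separator and number them. The sample string, and the assumption that meanings are separated by `:`, are illustrative only.

```powershell
# Made-up raw string as it might look after extraction: meanings separated by ':'
# with some leftover HTML tags (noise) mixed in.
$raw = ': the fruit of a <a href="/dictionary/tropical">tropical</a> plant : a plant that bears <em>bananas</em>'

# Remove every HTML tag in one go: '<[^>]+>' matches '<', then anything except '>', then '>'.
$clean = $raw -replace '<[^>]+>', ''

# Split into individual meanings, trim each one, and drop the empty pieces.
$meanings = $clean.Split(':') | ForEach-Object { $_.Trim() } | Where-Object { $_ }

# Number the meanings 1, 2, 3, ... ($i is shared because ForEach-Object runs in this scope)
$i = 1
$numbered = $meanings | ForEach-Object { "$i. $_"; $i++ }
$numbered
```

A generic tag pattern like this is less brittle when the page markup differs from word to word, at the cost of also removing any text that legitimately contains `<...>`.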

2.2 Present the data

Now that you have your data, give it a structure and present it in a neat format. You can use the Write-Host cmdlet to color the output if required, like this:

Write-Host "$($grammer[0]): " -ForegroundColor Red
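Besides coloring the console output, you could also give the data structure by emitting one object per word, which can then be piped to Format-Table or Export-Csv. The sketch below does both; `Show-WordMeaning` and its parameters are hypothetical, and it assumes the part of speech and meanings have already been scraped.

```powershell
# Hypothetical helper (not from the original script): prints one word's meanings
# in color and also returns them as an object for further processing.
function Show-WordMeaning {
    param(
        [Parameter(Mandatory)] [string]$Word,
        [string]$PartOfSpeech,          # e.g. 'noun' (assumed to be scraped separately)
        [string[]]$Meaning = @()
    )
    Write-Host $Word -ForegroundColor Green
    # "$($PartOfSpeech):" is needed because "$PartOfSpeech:" would be parsed as a scope/drive prefix
    if ($PartOfSpeech) { Write-Host "$($PartOfSpeech): " -ForegroundColor Red }
    $i = 1
    foreach ($m in $Meaning) {
        Write-Host "  $i. $m"
        $i++
    }
    # Only this object goes to the output stream; Write-Host writes to the console
    [pscustomobject]@{
        Word         = $Word
        PartOfSpeech = $PartOfSpeech
        Meanings     = $Meaning -join '; '
    }
}

$result = Show-WordMeaning -Word 'banana' -PartOfSpeech 'noun' -Meaning 'the fruit of a tropical plant', 'a plant that bears bananas'
```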

That is all of the explanation. Click the link below (highlighted in green) to download the complete script.

COMPLETE SCRIPT: Get-Meaning

NOTE:

1. Remember that this is a function, so you need to pass it a word or an array of words to get any output.
2. You may see some errors or improper data while the script runs, because the HTML source code varies considerably from word to word, and the script may not be robust enough to handle every variation across all the words in the English dictionary 🙂
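For readers wondering how one function can take either a single word or an array of words, here is a hypothetical skeleton (not the original Get-Meaning code) using a `[string[]]` parameter that also accepts pipeline input. To keep it self-contained, it only emits each lookup URL where the real script would download and parse the page.

```powershell
# Hypothetical skeleton (not the original Get-Meaning code) showing how a function
# can accept one word, an array of words, or words piped in from the pipeline.
function Get-MeaningSketch {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string[]]$Word
    )
    process {
        # $Word holds one piped item per call, or the whole array when passed as an argument
        foreach ($w in $Word) {
            # The real script would fetch and parse the page here (the steps described above);
            # this sketch only emits the lookup URL for each word.
            'http://www.merriam-webster.com/dictionary/' + $w.Trim()
        }
    }
}

Get-MeaningSketch -Word 'banana'
Get-MeaningSketch -Word 'banana', 'apple'
'banana', 'apple' | Get-MeaningSketch
```

The `foreach` inside `process` is what makes both call styles work: with pipeline input `process` runs once per word, while with an array argument it runs once and loops over the array.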

RUNNING THE SCRIPT: Store the list of keywords you want meanings for in a Notepad (plain text) file, one per line, and save it.
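Reading that keyword file back into PowerShell can be as simple as Get-Content plus a little cleanup. This is a minimal sketch; `Read-KeywordFile` is a hypothetical helper, and it assumes one word per line.

```powershell
# Hypothetical helper: reads the keyword file (one word per line) and returns the
# cleaned list, trimming surrounding spaces and skipping blank lines.
function Read-KeywordFile {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )
    Get-Content -Path $Path |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -ne '' }
}
```

For example, `$words = Read-KeywordFile -Path .\keywords.txt` gives you an array you can then pass to Get-Meaning; check the complete script for the exact parameter name it expects.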

Once you run the script, it gives you a nice output, somewhat like this. Cool, right? 🙂 😉

[Screenshot: sample output of the script]

Hope you find this useful and fun, as always. Please comment, like and share below. Happy reading, folks 🙂
