INTRODUCTION :
Everybody comes across a word they don’t know how to use in a sentence. I face this often because I do a ton of reading. Normally I would do a simple Google search, say for the word “Elixir”, which returns a few websites with example sentences.
I would open one of these websites and read the example sentences, but on one of them, yourdictionary.com, I noticed that both the data presentation and the URLs followed a uniform pattern. Upon inspecting the page source, I easily traced the HTML tags that enclosed the data.
Hence, I thought: why not harvest this website’s data (data scraping) and get all the sentences for a word?
HOW IT WORKS :
To implement this solution in PowerShell, I identified the HTML tag in which the data resides (a div) and its class (“li_content”), so I could filter out exactly the sentences I want.
Once I had that information, a simple Invoke-WebRequest to the site, with my query word (“Elixir”) appended to the URL, did most of the work:
Invoke-WebRequest "http://sentence.yourdictionary.com/Elixir"
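Because the word is appended directly to the URL path, a word or phrase containing spaces or reserved characters (such as ?, # or /) should be percent-encoded first. Here’s a minimal sketch of that step; the helper name Get-SentenceUrl is hypothetical, and whether the site actually serves multi-word phrases at such URLs is an assumption.

```powershell
# Hypothetical helper (not part of the original script): builds the lookup URL for a word.
function Get-SentenceUrl {
    param(
        [Parameter(Mandatory = $true)] [string] $Word
    )
    # Percent-encode the word so spaces and reserved characters stay inside one path segment
    'http://sentence.yourdictionary.com/' + [uri]::EscapeDataString($Word.Trim())
}
```

For example, `Invoke-WebRequest (Get-SentenceUrl 'ice cream')` requests `http://sentence.yourdictionary.com/ice%20cream`.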
Then some data wrangling on the HTML tag and class extracts the sentences, as shown in the following image.
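The script below does this filtering through the ParsedHtml property, which relies on Internet Explorer’s HTML engine and is only available in Windows PowerShell. As a sketch of the same tag-and-class filter that also runs on PowerShell 6 and later, the raw HTML in the response’s Content property can be matched with a regular expression. The function name Get-SentenceFromHtml is hypothetical, and the pattern assumes each sentence sits in a `<div class="li_content">` with no nested div inside it; the site’s markup may well have changed since this post was written.

```powershell
# Hypothetical helper: extracts sentence text from raw HTML without ParsedHtml.
# Assumption: every sentence is wrapped in <div class="li_content">...</div> with no nested <div>.
function Get-SentenceFromHtml {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)] [string] $Html
    )
    # (?is): case-insensitive, and '.' also matches newlines so multi-line divs are captured
    $pattern = '(?is)<div[^>]*\bclass\s*=\s*"[^"]*\bli_content\b[^"]*"[^>]*>(.*?)</div>'
    foreach ($match in [regex]::Matches($Html, $pattern)) {
        # Drop inner tags such as <b> or <a>, then decode entities such as &amp;
        $text = $match.Groups[1].Value -replace '<[^>]+>', ''
        [System.Net.WebUtility]::HtmlDecode($text).Trim()
    }
}
```

Usage: `Get-SentenceFromHtml -Html (Invoke-WebRequest 'http://sentence.yourdictionary.com/Elixir' -UseBasicParsing).Content`. A regex is fragile against markup changes, but it avoids the dependency on Internet Explorer.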
HOW TO USE IT :
Run the function ‘Get-Sentence’ (or its alias ‘gs’) with your word. Use the -WordLimit parameter to skip sentences longer than a given number of words, and the -Count parameter to set how many sentences are returned (10 by default).
You can also use the -HighlightWord switch to highlight the queried word in each sentence.
The following animation also demonstrates how to run the function.
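To see how -WordLimit and -Count interact without hitting the website, here’s a sketch of the same two steps applied to a plain array of strings: drop sentences longer than the limit, number the ones that remain, then keep the first N. The function Select-SentenceByLength is hypothetical, and unlike the script (which splits on single spaces) it counts words on runs of whitespace.

```powershell
# Hypothetical offline helper mirroring the script's -WordLimit / -Count logic.
function Select-SentenceByLength {
    param(
        [Parameter(Mandatory = $true)] [string[]] $Sentence,
        [int] $Count = 10,
        [int] $WordLimit
    )
    $i = 0
    $Sentence |
        ForEach-Object {
            # Count words by splitting on runs of whitespace
            $wordCount = @($_.Trim() -split '\s+').Count
            # A -WordLimit of 0 (not supplied) means "keep every sentence"
            if (-not $WordLimit -or $wordCount -le $WordLimit) {
                $i++
                [pscustomobject]@{ '#' = $i; WordCount = $wordCount; Sentence = $_ }
            }
        } |
        Select-Object -First $Count
}
```

With the full script, the equivalent call is `Get-Sentence -Word Elixir -WordLimit 12 -Count 5`, which returns up to five sentences of at most 12 words each.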
SCRIPT :
Function Get-Sentence
{
    [CmdletBinding()]
    [Alias('gs')]
    param(
        [Parameter(Mandatory=$true)] [String] $Word,
        [int] $Count = 10,
        [int] $WordLimit,
        [Switch] $HighlightWord
    )
    Try
    {
        Write-Verbose "Sending web request to http://sentence.yourdictionary.com/$Word for sentences"
        $Results = Invoke-WebRequest "http://sentence.yourdictionary.com/$Word" -TimeoutSec 5 -DisableKeepAlive
        $ErrorMsg = "Couldn't find any sentences with the word `"$($Word.ToUpper())`", please try again with another word."
        # Check whether the web request returned any data
        If($Results)
        {
            $i = 0
            Write-Verbose "Harvesting data from web request"
            # Keep only the <div class="li_content"> elements, which hold the sentences.
            # Note: ParsedHtml relies on Internet Explorer's HTML engine and is only
            # available in Windows PowerShell, not in PowerShell 6 and later.
            $Data = $Results.ParsedHtml.getElementsByTagName('div') | Where-Object {$_.ClassName -eq 'li_content'}
            # Check whether any sentences were found
            If($Data)
            {
                Write-Verbose "Populating the output"
                $Sentences = Foreach($Sentence in $Data)
                {
                    $WordCount = $Sentence.textContent.Split(' ').Count
                    # Skip sentences longer than the word limit (no limit if -WordLimit is not set)
                    If(-not $WordLimit -or $WordCount -le $WordLimit)
                    {
                        $i = $i + 1
                        [PSCustomObject]@{
                            '#'       = $i
                            WordCount = $WordCount
                            Sentence  = $Sentence.textContent
                        }
                    }
                }
                $Sentences = $Sentences | Select-Object -First $Count
                # Highlight the queried word in each sentence
                If($HighlightWord)
                {
                    $Sentences.Sentence | ForEach-Object {
                        $_.Split() | ForEach-Object {
                            If($_ -like "*$Word*")
                            {
                                Write-Host $_ -NoNewline -ForegroundColor Black -BackgroundColor Yellow
                                Write-Host " " -NoNewline
                            }
                            else
                            {
                                Write-Host "$_ " -NoNewline
                            }
                        }
                        # End the line after each sentence (without emitting a string to the pipeline)
                        Write-Host ""
                    }
                }
                else
                {
                    $Sentences
                }
            }
            Else
            {
                Write-Host $ErrorMsg -ForegroundColor Red
            }
        }
        else
        {
            Write-Host $ErrorMsg -ForegroundColor Red
        }
    }
    catch
    {
        Write-Host "ERROR: $_" -ForegroundColor Red
    }
}
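Because -HighlightWord writes with Write-Host, the highlighted sentences can’t be captured in a variable or piped on. A possible alternative, sketched here as a hypothetical helper named Format-HighlightedWord, returns each sentence as a string with every word containing the query wrapped in markers. Like the script’s `-like "*$Word*"` test, the match is case-insensitive and covers the whole surrounding word, punctuation included; the bracket markers are an arbitrary choice.

```powershell
# Hypothetical helper: returns the sentence with matching words wrapped in markers.
function Format-HighlightedWord {
    param(
        [Parameter(Mandatory = $true)] [string] $Sentence,
        [Parameter(Mandatory = $true)] [string] $Word,
        [string] $Open = '[',
        [string] $Close = ']'
    )
    # Escape the query so regex metacharacters in it are matched literally
    $pattern = '\S*' + [regex]::Escape($Word) + '\S*'
    # '$0' is the whole match; double any '$' in the markers so .NET does not treat them as substitutions
    $replacement = $Open.Replace('$', '$$') + '$0' + $Close.Replace('$', '$$')
    # -replace is case-insensitive, mirroring the script's -like comparison
    $Sentence -replace $pattern, $replacement
}
```

On a terminal that understands ANSI escape sequences, passing `-Open "$([char]27)[30;43m" -Close "$([char]27)[0m"` gives a black-on-yellow look similar to the script’s Write-Host colors, while the result stays an ordinary string.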
Have fun exploring this script, and enjoy your weekend, PowerShell homies 🙂
Prateek Singh