PowerShell Tip : Parsing HTML from a Local File or a String


INTRODUCTION :

If you are familiar with the Invoke-WebRequest cmdlet, you know that it returns parsed HTML for the requested web URL. The DOM structure of this parsed HTML can be used to access the HTML elements of the web page, as in the animation below:

[Animation: iwr]
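Since the animation is not reproduced here, the following is a minimal sketch of what such a session might look like. It assumes Windows PowerShell 5.1, where the response object carries a `ParsedHtml` property (the property is not available in PowerShell 6 and later). The URL is a placeholder and not taken from the original post.

```powershell
# Assumption: Windows PowerShell 5.1; ParsedHtml is not available in PowerShell 6+.
# https://example.com is a placeholder URL, not one used in the original post.
$response = Invoke-WebRequest -Uri 'https://example.com'

# ParsedHtml exposes the DOM of the downloaded page.
$response.ParsedHtml.title

# List the text and target of every anchor (<a>) element on the page.
$response.ParsedHtml.getElementsByTagName('a') |
    Select-Object innerText, href
```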

PROBLEM :

What if the HTML files are stored locally on your machine, or the HTML content is in the form of a string? Is there any mechanism to parse a local file or a string?

SOLUTION : 

Well, the answer is: yes, we can! 🙂

Microsoft provides an HTML document class in the .NET Framework class library, which has a Write() method for writing an HTML document using DOM 2 (Document Object Model Level 2).

[Image: write]

APPROACH 1 : From a String

Instantiate an HTML document class object as in the animation below, then write the HTML content to it as a string to access the HTML elements.

[Animation: htmlfile]
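A minimal sketch of this approach follows. It assumes the document object is created through the `HTMLFile` COM ProgID, which is Windows-only and is suggested by the image name rather than stated in the text. The sample markup is made up for illustration. On some machines the `IHTMLDocument2_write` method is not exposed, so the sketch falls back to `write()` with the string encoded as Unicode bytes.

```powershell
# Sample markup, made up for illustration.
$htmlString = @'
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Hello</h1>
    <a href="https://example.com">Example link</a>
  </body>
</html>
'@

# Assumption: the HTML document object comes from the 'HTMLFile' COM ProgID (Windows only).
$html = New-Object -ComObject 'HTMLFile'

# Write the string into the document. If IHTMLDocument2_write is not exposed
# on this machine, fall back to write() with the string as Unicode bytes.
try {
    $html.IHTMLDocument2_write($htmlString)
}
catch {
    $html.write([System.Text.Encoding]::Unicode.GetBytes($htmlString))
}

# Access elements through the DOM.
$html.title
$html.getElementsByTagName('a') | Select-Object innerText, href
```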

APPROACH 2 : From a File

Similarly, we can parse an HTML document from a local HTML file:

[Image: fromfile]
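As a rough sketch, the file variant only differs in where the string comes from: the file is read in one piece with `Get-Content -Raw` and then written into the document object. The path is hypothetical, and the `HTMLFile` COM ProgID is again an assumption rather than something stated in the text.

```powershell
# Hypothetical path; point this at an existing HTML file.
$path = 'C:\Temp\sample.html'

# -Raw returns the file as a single string instead of an array of lines.
$content = Get-Content -Path $path -Raw

# Assumption: the 'HTMLFile' COM ProgID (Windows only).
$html = New-Object -ComObject 'HTMLFile'

# Same write-with-fallback pattern as for a plain string.
try {
    $html.IHTMLDocument2_write($content)
}
catch {
    $html.write([System.Text.Encoding]::Unicode.GetBytes($content))
}

# Visible text of the parsed document body.
$html.body.innerText
```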

NOTE : 

Even the parsed HTML returned by Invoke-WebRequest is of the HTML document class type.

[Image: same]
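One way to check this yourself is to compare the type names of the two objects. This is a sketch assuming Windows PowerShell 5.1 and the `HTMLFile` COM ProgID; the URL is a placeholder. No output is shown because the reported names were not verified here and may differ between machines.

```powershell
# Assumption: Windows PowerShell 5.1 (ParsedHtml is not available in PowerShell 6+).
# https://example.com is a placeholder URL.
$response = Invoke-WebRequest -Uri 'https://example.com'
$fromWeb  = $response.ParsedHtml

# Assumption: the 'HTMLFile' COM ProgID (Windows only).
$fromCom = New-Object -ComObject 'HTMLFile'

# Print both type names side by side for comparison.
[pscustomobject]@{
    InvokeWebRequest = $fromWeb.GetType().FullName
    HtmlFileObject   = $fromCom.GetType().FullName
}
```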

That's all for today's #PowerShell tip. Thanks for reading! 🙂
