WHAT IS COMPILED HTML HELP (.CHM)?
Microsoft Compiled HTML Help is a proprietary Microsoft online help format consisting of a collection of HTML pages, an index and other navigation tools. The files are compressed and deployed in a binary format with the extension .CHM, for Compiled HTML. The format is often used for software documentation, such as the help files that ship with the Sysinternals tools.
APPROACH:
Today a friend and I were looking for a way to decompile .chm files into HTML and then parse the HTML DOM to extract some information. After some googling, I found that Windows ships with a command-line utility, hh.exe, which can decompile .chm files to HTML when given the right command-line option.
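As a rough sketch, the underlying call is hh.exe with the -decompile switch, followed by the destination folder and the .chm file. The two paths below are placeholders chosen for illustration (they contain no spaces, since the arguments are passed unquoted), and the folder is created up front so there is somewhere to extract to:

```powershell
# Placeholder paths, assumed for illustration only
$Destination = 'C:\Temp\DecompiledHTML'
$ChmFile     = 'C:\Temp\procmon.chm'

# Make sure the output folder exists before extracting into it
New-Item -ItemType Directory -Path $Destination -Force | Out-Null

# hh.exe -decompile <destination folder> <.chm file>
# -Wait makes PowerShell block until hh.exe has exited
Start-Process -FilePath "$env:windir\hh.exe" -ArgumentList "-decompile $Destination $ChmFile" -Wait

# List a few of the extracted items
Get-ChildItem -Path $Destination -Recurse | Select-Object -First 5 -ExpandProperty FullName
```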
So I wrapped the command in a PowerShell function, like below:
Function Get-DecompiledHTMLHelp
{
    [CmdletBinding()]
    param(
        [String] $Destination,
        [String] $Filename
    )

    $EXE = 'C:\Windows\hh.exe'

    If (-not (Test-Path $Destination))
    {
        "Destination folder doesn't exist"
    }
    ElseIf (-not (Test-Path $Filename))
    {
        "Target .chm file not found, please make sure you're entering the full path and file name"
    }
    Else
    {
        # -Wait blocks until hh.exe exits, so the files exist before they are counted
        Start-Process -FilePath $EXE -ArgumentList "-decompile $Destination $Filename" -Wait

        # Count the extracted folders and files separately
        $Items       = @(Get-ChildItem $Destination -Recurse)
        $FolderCount = @($Items | Where-Object { $_.PSIsContainer }).Count
        $FileCount   = @($Items | Where-Object { -not $_.PSIsContainer }).Count

        Write-Host "Decompiled into $FolderCount Folders and $FileCount Files to Destination $Destination" -ForegroundColor Yellow
    }
}
and then extracted the required information using the following piece of code:
Function Create-HTMLDOMFromFile
{
    Param(
        [String] $FileName,
        [String] $TagName,
        [Int] $OuputCount = 11
    )

    $HTML = New-Object -ComObject 'HTMLFile'
    $Content = Get-Content -Path $FileName -Raw

    # Load the raw content into an HTML DOM (Document Object Model);
    # the "2" in IHTMLDocument2 is part of the COM interface name
    $HTML.IHTMLDocument2_write($Content)

    # Return the inner text of the first $OuputCount elements with the requested tag
    $HTML.all.tags($TagName) | Select-Object -ExpandProperty innerText -First $OuputCount
}
#Create-HTMLDOMFromFile -FileName 'C:\Temp\DecompiledHTML\Command_Line_Options.htm' -TagName 'P'
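One caveat with the function above: as far as I know, the IHTMLDocument2_write method is not exposed on every machine (it is commonly reported missing when Microsoft Office is not installed). Below is a minimal sketch of a widely used fallback, run against an inline HTML string rather than a decompiled file. The sample markup and the variable names are made up for illustration:

```powershell
# Sample markup, invented for illustration
$Markup = '<html><body><p>First paragraph</p><p>Second paragraph</p></body></html>'

$HTML = New-Object -ComObject 'HTMLFile'
try
{
    # Same call as in Create-HTMLDOMFromFile
    $HTML.IHTMLDocument2_write($Markup)
}
catch
{
    # Fallback: pass the markup to write() as Unicode bytes
    $Bytes = [System.Text.Encoding]::Unicode.GetBytes($Markup)
    $HTML.write($Bytes)
}

# Collect the inner text of every <p> element
$Paragraphs = @($HTML.all.tags('P') | Select-Object -ExpandProperty innerText)
$Paragraphs
```

The same try/catch could be dropped into the function in place of the single write call if the first form fails on your machine.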
HOW TO RUN:
Here I chose the Compiled HTML Help file of ProcMon.exe (Process Monitor, a Sysinternals tool) as a sample .chm file.
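A run along these lines might look like the following sketch. It assumes both functions above are already loaded in the session and that the Process Monitor help file has been saved as C:\Temp\procmon.chm, which is a placeholder path. The Command_Line_Options.htm page name is taken from the commented example above.

```powershell
# Placeholder location of the sample .chm file (assumption)
$ChmFile     = 'C:\Temp\procmon.chm'
$Destination = 'C:\Temp\DecompiledHTML'

# The function expects the destination folder to exist
New-Item -ItemType Directory -Path $Destination -Force | Out-Null

# Step 1: decompile the .chm file into HTML pages
Get-DecompiledHTMLHelp -Destination $Destination -Filename $ChmFile

# Step 2: pull the text of the <p> elements from one of the extracted pages
Create-HTMLDOMFromFile -FileName (Join-Path $Destination 'Command_Line_Options.htm') -TagName 'P'
```

Step 2 needs the decompilation to have finished, so run it once the files have appeared in the destination folder.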
Hope you find it useful, happy learning 🙂
Prateek Singh Follow @SinghPrateik
[…] on July 7, 2016 submitted by /u/Prateeksingh1590 [link] [comments] Leave a […]
Thank you good sir.
You just saved me weeks of time getting data definitions for a master data management tool by being able to read in column definitions from a .chm file. After decompiling the .chm file, I used Power Query to feed in the file and get the database schema, table name, column name, and definition via an iterative function in Power Query. Absolutely awesome! Couldn’t have done it without your post!
I have a very large and complex .chm file. Can I get all the data out of this .chm file in a form that I could use with modern databases?