HTML to Excel utility

This is the special program of convertation of the list of URL-s into the list of rows with text and links from webpages. Tool to update and expand of links.

Written in VBScript. MS Internel Explorer and MS Excel are used also.

How it work

The main idea: how to extract the information from webpages? Both text and links are needed.

The manual way:

  1. Open link in a browser
  2. Select all by Ctrl+A
  3. Copy by ctrl+C
  4. Open MS Excel sheet
  5. Paste by Ctrl+V

Our script do it. And what we have? Many rows are empty! Therefore the script delete all empty rows.

Well now we have the set of rows from the specified URL.

We can put them into the other worksheet – “Allpages”. But “Allpages” already have the others rows. The script scan the existing rows to try find the equal. If no equals – it add the row into the Allpages sheet (rows are equal if the text in the cells from “E” to “O” columns are the same).

This thing the script do for all rows.

But how to avoid this process for links to do the optimization? Local html-files. Each time the script save all downloaded files on the hard disk. Next time it compare the new webpage with one saved. If it have the differents — the row equal checking will be run. If the old file is the same – no updates in “AllPages”.

To do the equal checking manually you'll need to delete all html-files.

How to check it's work

More simplest way: you have own website and can to change it manually.

  1. Prepare the “links.ini” with your website in it.
  2. Run “test2xls.vbs”
  3. Save the “Allpages.xls” into other name or subdirectory “before_changes”
  4. Make some changes on your website
  5. Delete all “*.html” files from the directory with script
  6. Run the script again
  7. In the new “Allpages.xls” you'll see the changed rows.

The files

When the script download a webpage – it create the work file “work.txt” and the worksheet “work” in the MS Escel workbook. Later it delete this worksheet.

Before the script running the workbook “allpages.xls” should exists and have the worksheet “Links” inside.

Troubleshootings

When the script is running you should not do anything on the computer. Because the script uses Sendkeys operation to emulate the human work.

How to kill vbscript if any errors was:

  1. Open “Task manager” (Ctrl+Alt+Del or right click on the Windows Panel, then select “Task manager”)
  2. Search “vbscript”, right click, kill application
  3. Then you need to kill also the “EXCEL.EXE” process
  4. Internet Explorer you can close manually. It's not important.

When a webpage is not found or any other network errors – the script not support this problem.

When copying into the “Allpages” worksheet – the webpage will truncated into first 10 columns. Seems it's not problem because most webpages have 2-4 columns in the excel worksheet.

The script remove the background from cells, pictures. But the font color not changed (because the links have the blue color in MS Escel).

html2xls.txt · Last modified: 2011/09/30 11:21 (external edit)
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki