Extracting data from table on a web site dynamically

A

ajapale

Guest
Im finding it difficult to frame my question but here goes...

In the course of my work I visit a website each day and extract a number of figures from a series of tables. The layout and format of the page does not change from day to day.

There are 20 seperate tables each on its own web page. Each table updates once a day.

Is there any way I can set up a 'dynamic link' to a spreadsheet which would automate this process?

Im tired up 'cutting and pasting' several times a day.

The data is a matter of public record and is not subject to copyright.

Any help or advice would be appreciated.


Thanks
ajapale
 
What you want to do is known as "web scraping". You might find some useful tips or utilities using . If the format of the underlying HTML for the page containing this information remains pretty constant then it would be feasible to write, say, a script which retrieves the page, extracts the data (e.g. into CSV - Comma Separated Value format) allowing it to be imported into Excel. Chances are somebody has done just this before and has shared the knowledge (and perhaps the script(s)) via the web. Hope this helps.

P.S. Have you tried saving the HTML for the page from your browser and then opening it in Excel to see if the tabular information is imported correctly in that way? Just another idea.
 
Thanks Clubman,

I'm going to try those approches now.

A quick google threw up the following from google.answers
answers.google.com/answer...?id=413237

I then compiled these data into a single file using a custom-made text-extraction and text-manipulation script, without any manual intervention whatsoever, so as to preclude human error. I saved the result as a CSV (Comma-Separated Value) file which you should be able to import into the spreadsheet of your choosing.

ajapale
 
If you have Excel 2003, you can define a web query (Data, Import External Data, New Web Query) to go out to any html page and pick up selected data items and load them into cells in your spreadsheet. You can then set this query to auto-update each time you open the spreadsheet, or every 10 minutes or whatever you require. Hope this helps.

Conflict of Interest: Microsoft employee & shareholder.
 
Thanks MSFTie,

That worked a treat with excel 2002!

ajapale

I googled and found a few useful turtorials:
[broken link removed]


Google Search Strategy (Data Import External Data New Web Query)
 
Back
Top