Innovative Python web scraping solution

AkaSig has come up with a solution for web scraping that immediately struck me as very nice. The idea is to reverse the use of ZPTs (Zope Page Templates).

ZPTs specify an HTML structure and, within that HTML, contain ZPT instructions that stuff data into it (like a "title" or a list of items).

AkaSig's idea is to reverse the process. You take an example of an existing web page, mark it up with ZPT-like instructions, and use it to get the data out of similar HTML pages. Once you think about it, it seems a natural idea, but of course you need to think of it first :-)
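To make the "reverse ZPT" idea a bit more concrete, here is a minimal sketch in Python. It is not AkaSig's code: the `extract()` function is hypothetical, it only understands `tal:content` and `tal:repeat`, and it assumes both the annotated example page and the scraped page are well-formed XHTML that `xml.etree.ElementTree` can parse (real-world HTML usually needs a more forgiving parser). The `tal:` namespace URI is the one ZPT itself uses.

```python
import xml.etree.ElementTree as ET

# Namespace URI that ZPT uses for its tal: attributes.
TAL = "{http://xml.zope.org/namespaces/tal}"


def extract(template, page, data=None):
    """Hypothetical helper: walk an annotated example page and a real page
    side by side. Wherever the template carries tal:content="name", the text
    of the matching element in the real page is stored under "name"."""
    if data is None:
        data = {}
    name = template.get(TAL + "content")
    if name is not None:
        data[name] = "".join(page.itertext()).strip()
        return data

    page_children = list(page)
    pos = 0
    for t_child in template:
        spec = t_child.get(TAL + "repeat")
        if spec is not None:
            # tal:repeat="item items": consume every consecutive page element
            # with the same tag and collect one value per element.
            var, list_name = spec.split()
            items = []
            while pos < len(page_children) and page_children[pos].tag == t_child.tag:
                found = extract(t_child, page_children[pos])
                # A bare tal:content="item" yields the text itself;
                # otherwise keep the dict of nested values.
                items.append(found[var] if list(found) == [var] else found)
                pos += 1
            data[list_name] = items
        else:
            # Plain elements are matched up by position.
            if pos < len(page_children):
                extract(t_child, page_children[pos], data)
            pos += 1
    return data


# Example page marked up with ZPT-like instructions (illustrative only).
TEMPLATE = """<html xmlns:tal="http://xml.zope.org/namespaces/tal">
  <body>
    <h1 tal:content="title">Example title</h1>
    <ul>
      <li tal:repeat="item items" tal:content="item">Example item</li>
    </ul>
  </body>
</html>"""

# A "similar" page we want to scrape (illustrative only).
PAGE = """<html>
  <body>
    <h1>Weblog entries</h1>
    <ul>
      <li>First entry</li>
      <li>Second entry</li>
    </ul>
  </body>
</html>"""

result = extract(ET.fromstring(TEMPLATE), ET.fromstring(PAGE))
print(result)
# {'title': 'Weblog entries', 'items': ['First entry', 'Second entry']}
```

The nice property, and the reason the idea appeals, is that the "scraping rules" are just an example page with a few attributes added, rather than a pile of regular expressions or XPath queries.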

Read his article: it has examples and a way better explanation. If I ever need to do web scraping, I'll try grabbing his .zip first.

 

Reinout van Rees

My name is Reinout van Rees and I program in Python, I live in the Netherlands, I cycle recumbent bikes and I have a model railway.
