Innovative Python web scraping solution

AkaSig has come up with a solution for web scraping that immediately struck me as being very nice. The idea is to reverse the use of ZPTs (Zope Page Templates).

ZPTs specify an HTML structure and, within that HTML, contain ZPT instructions that stuff data into it (like a "title" or a list of items).

AkaSig's idea is to reverse the process. You take an example of an existing webpage and add ZPT-like instructions to it that pull the data back out of similarly structured HTML pages. Once you think about it, it seems a natural idea, but of course you need to think of it first :-)
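To make the idea a bit more concrete, here is a minimal sketch of a "reverse template" in Python. It is not AkaSig's code and it does not use real ZPT syntax: the `{{name}}` slots, the function names `render` and `extract`, and the sample page are all made up for illustration. The same template is used once to stuff data into HTML and once to get the data back out.

```python
import re

# Hypothetical slot syntax: {{name}}. Real ZPTs use tal: attributes instead.
SLOT = re.compile(r"\{\{(\w+)\}\}")


def render(template, data):
    """Normal direction: fill every {{name}} slot with data[name]."""
    return SLOT.sub(lambda m: data[m.group(1)], template)


def template_to_regex(template):
    """Reverse direction: turn the template into a regex with named groups."""
    # Splitting on a pattern with one capturing group yields alternating
    # pieces: literal text at even indexes, slot names at odd indexes.
    parts = SLOT.split(template)
    pattern = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern.append(re.escape(part))      # literal HTML, matched as-is
        else:
            pattern.append(f"(?P<{part}>.*?)")   # slot, captured lazily
    return re.compile("".join(pattern), re.DOTALL)


def extract(template, html):
    """Return the slot values found in html, or None if it does not match."""
    match = template_to_regex(template).search(html)
    return match.groupdict() if match else None


TEMPLATE = '<h1>{{title}}</h1><p class="author">{{author}}</p>'
DATA = {"title": "Some title", "author": "Jane Doe"}

page = "<html><body>" + render(TEMPLATE, DATA) + "</body></html>"
result = extract(TEMPLATE, page)
print(result)
```

A regex built this way only matches pages whose markup is literally identical to the template around the slots, so it is much more brittle than something that works on the parsed HTML structure. It is only meant to show the "same template, opposite direction" trick.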

Read his article: it has examples and a much better explanation. If I ever need to do web scraping, I'll try and grab his .zip first.


About me

My name is Reinout van Rees and I work a lot with Python (programming language) and Django (website framework). I live in The Netherlands and I'm happily married to Annie van Rees-Kooiman.
