Web scraping with Python [closed]

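Use urllib2 (urllib.request on Python 3) in combination with the brilliant BeautifulSoup library:

from urllib.request import urlopen  # Python 2: from urllib2 import urlopen
from bs4 import BeautifulSoup
# or if you're still on the old BeautifulSoup 3:
# from BeautifulSoup import BeautifulSoup

# Download the page and parse it with the built-in HTML parser.
soup = BeautifulSoup(urlopen('http://example.com').read(), 'html.parser')

# Walk the rows of the first table with class "spad".
for row in soup('table', {'class': 'spad'})[0].tbody('tr'):
    tds = row('td')
    # will print date and sunrise
    print(tds[0].string, tds[1].string)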
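
If installing a third-party parser is not an option, roughly the same row-by-row extraction can be sketched with only the Python 3 standard library. This swaps BeautifulSoup for html.parser, so it is not the approach above, just a rough stand-in. The sample markup, the TableRowParser class, and the extract_rows helper are all made up for illustration, and the sketch assumes the table is not nested inside another table.

```python
from html.parser import HTMLParser

# Hypothetical sample markup standing in for a downloaded page.
# For a live page you would read the HTML with urllib.request.urlopen instead.
SAMPLE_HTML = """
<table class="spad">
  <tbody>
    <tr><td>1 Jan</td><td>08:05</td><td>16:02</td></tr>
    <tr><td>2 Jan</td><td>08:05</td><td>16:03</td></tr>
  </tbody>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the cell text of every row in tables with a given class.

    Illustrative helper only; it does not handle nested tables.
    """

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.rows = []
        self._in_table = False
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.table_class in classes:
                self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag == "td":
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td> of a matching table.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table":
            self._in_table = False


def extract_rows(html, table_class="spad"):
    """Return a list of rows, each a list of stripped cell strings."""
    parser = TableRowParser(table_class)
    parser.feed(html)
    parser.close()
    return parser.rows


# Print the first two columns (date and sunrise) of each row.
for date, sunrise, *_ in extract_rows(SAMPLE_HTML):
    print(date, sunrise)
```

For anything beyond a simple, well-formed table, BeautifulSoup is still the more forgiving choice, since it copes with broken markup that this sketch would silently mishandle.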

I'd really recommend Scrapy.

Quote from a deleted answer:

Scrapy crawls faster than mechanize because it uses asynchronous operations (on top of Twisted).

Scrapy has better and faster support for parsing (X)HTML, built on top of libxml2.

Scrapy is a mature framework with full Unicode support; it handles redirections, gzipped responses, and odd encodings, and has an integrated HTTP cache, etc.

Once you are into Scrapy, you can write a spider in less than 5 minutes that downloads images, creates thumbnails, and exports the extracted data directly to CSV or JSON.
