Python Scrapy: Convert relative paths to absolute paths
Solution 1:
From Scrapy docs:
def parse(self, response):
    # ... code omitted
    next_page = response.urljoin(next_page)
    yield scrapy.Request(next_page, self.parse)
That is, the response object has a urljoin() method that does exactly this: it resolves a relative link against the URL of the page it was extracted from.
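To get a feel for how relative links are resolved, here is a minimal sketch that runs without Scrapy. It relies on the standard library's urljoin, which follows the same resolution rules that response.urljoin applies against the response's URL. The names page_url and to_absolute and the example.com URLs are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page URL, standing in for response.url in a spider.
page_url = "http://example.com/catalog/page1.html"


def to_absolute(base_url, href):
    """Resolve href against base_url using standard URL joining rules."""
    return urljoin(base_url, href)


# A bare file name replaces the last path segment of the base URL.
print(to_absolute(page_url, "page2.html"))      # http://example.com/catalog/page2.html
# A leading slash resolves against the site root.
print(to_absolute(page_url, "/about.html"))     # http://example.com/about.html
# ".." climbs one directory up.
print(to_absolute(page_url, "../index.html"))   # http://example.com/index.html
# A link that is already absolute is returned unchanged.
print(to_absolute(page_url, "http://other.example/x"))  # http://other.example/x
```

Inside a spider, response.urljoin(href) saves you from passing the base URL yourself.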
Solution 2:
What I do is:
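import urlparse  # Python 2; on Python 3 use: from urllib import parse as urlparse
...
def parse(self, response):
    ...
    # Resolve the extracted link against the URL of the current page
    urlparse.urljoin(response.url, extractedLink.strip())
    ...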
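The same call can be tried outside a spider. Below is a minimal sketch for Python 3, where urljoin lives in urllib.parse. The names absolutize, base_url and raw_href, and the example.com URL, are hypothetical; raw_href imitates an href value that was extracted with a newline on each side.

```python
from urllib.parse import urljoin


def absolutize(base_url, raw_href):
    """Strip surrounding whitespace from raw_href, then resolve it against base_url."""
    # strip() removes leading and trailing whitespace, including newlines.
    return urljoin(base_url, raw_href.strip())


# Hypothetical values: the page being parsed and an href padded with newlines.
base_url = "http://example.com/list?page=2"
raw_href = "\n/products/item.html\n"

print(absolutize(base_url, raw_href))  # http://example.com/products/item.html
```

Stripping before joining keeps the result independent of how a given Python version treats stray whitespace inside URLs.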
Note the strip() call: I sometimes run into odd links where the href value is wrapped in line breaks, like:
<a href="
/MID_BRAND_NEW!%c2%a0MID_70006_Google_Android_2.2_7%22%c2%a0Tablet_PC_Silver/a904326516.html
">MID BRAND NEW! MID 70006 Google Android 2.2 7" Tablet PC Silver</a>