Parse the JavaScript returned from BeautifulSoup

Solution 1:

Something like PhantomJS may be more robust, but here's some basic Python code to extract the full menu:

import json
import re
import urllib2

text = urllib2.urlopen('http://dcsd.nutrislice.com/menu/meadow-view/lunch/').read()
menu = json.loads(re.search(r"bootstrapData\['menuMonthWeeks'\]\s*=\s*(.*);", text).group(1))

print menu

After that, you'll want to search through the menu for the date you're interested in.

EDIT: Some overkill on my part:

import itertools
import json
import re
import urllib2

text = urllib2.urlopen('http://dcsd.nutrislice.com/menu/meadow-view/lunch/').read()
menus = json.loads(re.search(r"bootstrapData\['menuMonthWeeks'\]\s*=\s*(.*);", text).group(1))

days = itertools.chain.from_iterable(menu['days'] for menu in menus)

day = next(itertools.dropwhile(lambda day: day['date'] != '2014-01-13', days), None)

if day:
    print '\n'.join(item['food']['description'] for item in day['menu_items'])
else:
    print 'Day not found.'
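
For anyone on Python 3, where urllib2 and the print statement are gone, the same regex-plus-lookup idea might be sketched as follows. This is only a sketch: the sample page text is made up so the snippet runs offline, the helper names are hypothetical, and the real page is assumed to use the same keys as above ('days', 'date', 'menu_items', 'food', 'description').

```python
import json
import re

# Made-up stand-in for the downloaded page source (an assumption for this
# sketch); a real page would be fetched with urllib.request.urlopen(...).
SAMPLE_PAGE = """
<script>
bootstrapData['menuMonthWeeks'] = [{"days": [{"date": "2014-01-13", "menu_items": [{"food": {"description": "Pizza"}}, {"food": {"description": "Salad"}}]}]}];
</script>
"""

PATTERN = re.compile(r"bootstrapData\['menuMonthWeeks'\]\s*=\s*(.*);")


def extract_menus(text):
    """Return the parsed menuMonthWeeks list, or None if it is not found."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return json.loads(match.group(1))


def find_day(menus, date):
    """Return the first day dict whose 'date' equals `date`, else None."""
    days = (day for menu in menus for day in menu['days'])
    return next((day for day in days if day['date'] == date), None)


def descriptions(day):
    """List the food descriptions for one day."""
    return [item['food']['description'] for item in day['menu_items']]


menus = extract_menus(SAMPLE_PAGE)
day = find_day(menus, '2014-01-13')
print('\n'.join(descriptions(day)) if day else 'Day not found.')
```

Returning None from extract_menus when the pattern is missing avoids the AttributeError you would otherwise get from calling .group(1) on a failed search.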

Solution 2:

All you need is a little string slicing:

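import json
import urllib2

from bs4 import BeautifulSoup  # BeautifulSoup 3: from BeautifulSoup import BeautifulSoup

# Same page as in Solution 1
url = 'http://dcsd.nutrislice.com/menu/meadow-view/lunch/'

soup = BeautifulSoup(urllib2.urlopen(url).read())
# The second <script> tag on the page holds the bootstrap data
script = soup.findAll('script')[1].string
# Keep what follows the assignment, up to the last semicolon
data = script.split("bootstrapData['menuMonthWeeks'] = ", 1)[-1].rsplit(';', 1)[0]
data = json.loads(data)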

This works because the value assigned in the script is itself valid JSON. JSON is, after all, a subset of JavaScript.
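
To see the slicing step in isolation, here is a small Python 3 sketch run against a made-up script string (the sample text and the helper name slice_assignment are assumptions for illustration, not taken from the real page). It also shows the catch: the subset relation only goes one way, so a JavaScript literal that isn't valid JSON, such as one with single-quoted keys, will not parse.

```python
import json

# Made-up script body standing in for soup.findAll('script')[1].string
SCRIPT = (
    "var bootstrapData = {};\n"
    "bootstrapData['menuMonthWeeks'] = [{\"days\": []}];"
)


def slice_assignment(script, marker):
    """Return the text between `marker` and the last semicolon."""
    # Everything after the first occurrence of the marker...
    tail = script.split(marker, 1)[-1]
    # ...and before the last semicolon in what remains.
    return tail.rsplit(';', 1)[0]


raw = slice_assignment(SCRIPT, "bootstrapData['menuMonthWeeks'] = ")
data = json.loads(raw)
print(data)

# A JavaScript-only literal (single-quoted key) is rejected by json.loads.
try:
    json.loads("[{'days': []}]")
    js_literal_parsed = True
except ValueError:
    js_literal_parsed = False
print(js_literal_parsed)
```

Note that if more statements follow the assignment in the same script tag, cutting at the last semicolon would drag them along and json.loads would fail, so this approach relies on the assignment being the final statement.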
