Can search engines index JavaScript generated web pages?

Solution 1:

Your suspicion is correct: JS-generated content cannot be relied on to be visible to search bots. It also can't be seen by anyone with JS turned off. The last time I added some tests to a site I was working on (a large, mainstream-audience site with hundreds of thousands of unique visitors per month), approximately 10% of users were not running JavaScript in any form. That includes search bots, PC browsers with JS disabled, many mobiles, blind people using screen readers, and so on.

This is why content generated via JS (with no fallback option) is a Really Bad Idea.

Back to basics. First, create your site using bare-bones (X)HTML, on REST-like principles (at least to the extent of requiring POST requests for state changes). Use simple semantic markup, and forget about CSS and JavaScript for now.

Step one is to get that right, and have your entire site (or as much of it as makes sense) working nicely this way for search bots and Lynx-like user agents.

Then add a visual layer: CSS/graphics/media for visual polish, but don't significantly change your original (X)HTML markup; allow the original text-only site to stay intact and functioning. Keep your markup clean!

Third is to add a behavioural layer: JavaScript (Ajax). Offer things that make the experience faster, smoother, and nicer for users whose browsers have Ajax-capable JS, but only for those users. Users without JavaScript are still welcome, and so are search bots, the visually impaired, many mobiles, etc.

This is called progressive enhancement in web design circles. Do it this way and your site works, in some reasonable form, for everyone.
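As a rough illustration of the layering described above, here is a minimal server-side sketch in Python (my choice of language; the answer does not prescribe one). It assumes a hypothetical `/articles` page, an in-memory `ARTICLES` list, and that the optional script layer marks its requests with an `X-Requested-With: XMLHttpRequest` header. None of these names come from the original answer. The point is only that the same URL serves a complete, script-free page by default, and the Ajax layer is an optional extra on top.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical in-memory data; a real site would load this from storage.
ARTICLES = ["First post", "Second post"]


def render_list(articles):
    """Return the article list as plain semantic HTML."""
    items = "".join(f"<li>{escape(title)}</li>" for title in articles)
    return f'<ul id="articles">{items}</ul>'


def render_page(articles):
    """Return a complete page that works with no CSS or JavaScript."""
    return (
        "<!DOCTYPE html><html><head><title>Articles</title></head><body>"
        "<h1>Articles</h1>"
        + render_list(articles)
        + '<form method="post" action="/articles">'
        '<input name="title"><button type="submit">Add</button></form>'
        "</body></html>"
    )


def app(environ, start_response):
    """Minimal WSGI app: GET only reads, POST changes state."""
    method = environ.get("REQUEST_METHOD", "GET")
    # Assumption: the optional script layer sets this header on its requests.
    is_ajax = environ.get("HTTP_X_REQUESTED_WITH") == "XMLHttpRequest"

    if method == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        title = form.get("title", [""])[0].strip()
        if title:
            ARTICLES.append(title)
        if not is_ajax:
            # Plain form submit: redirect so a page reload does not re-post.
            start_response("303 See Other", [("Location", "/articles")])
            return [b""]
    elif method != "GET":
        start_response("405 Method Not Allowed", [("Allow", "GET, POST")])
        return [b""]

    # Script-less clients and bots get the full page; Ajax gets a fragment.
    body = render_list(ARTICLES) if is_ajax else render_page(ARTICLES)
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```

A search bot or Lynx-like client requesting the page gets the whole document, including a working form. A browser with JS can request just the list fragment and swap it in place, but nothing breaks if it doesn't.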

Solution 2:

"if a search engine also cannot see the generated HTML then there is not much to index"

That about sums it up. Technically, nothing stops a search engine from implementing a JavaScript engine for its bot/spider, but it's just not normally done. They could, but they won't.
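To get a rough feel for what a bot that does not execute scripts has to work with, here is a small sketch using Python's standard `html.parser`. The class and function names (`VisibleText`, `text_without_js`) and the sample pages are made up for illustration, and a real crawler does far more than this. It simply collects the text present in the markup as delivered, ignoring anything inside `script` and `style` elements.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text present in the delivered markup, skipping script/style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_without_js(html):
    """Return the text a client that never runs scripts would receive."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page whose content only exists after a script runs.
js_only = (
    '<html><body><div id="app"></div>'
    '<script>document.getElementById("app").innerHTML = "<p>Hello</p>";</script>'
    "</body></html>"
)

# Hypothetical page with the content in the markup and an optional script.
static = "<html><body><p>Hello</p><script>enhance();</script></body></html>"

print(repr(text_without_js(js_only)))  # ''
print(repr(text_without_js(static)))   # 'Hello'
```

The first page yields nothing to index, because its text never appears in the HTML the server sends. The second yields the same content whether or not the script ever runs.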

On the other hand, you can sniff a search engine's user agent and serve it something readable. But search engines don't usually like this and will penalize you pretty severely if they detect differences from what you send to a normal browser.