There are a lot of cool tools for making powerful "single-page" JavaScript websites nowadays. In my opinion, this is done right by letting the server act as an API (and nothing more) and letting the client handle all of the HTML generation stuff. The problem with this "pattern" is the lack of search engine support. I can think of two solutions:

  1. When the user enters the website, let the server render the page exactly as the client would upon navigation. So if I go to http://example.com/my_path directly the server would render the same thing as the client would if I go to /my_path through pushState.
  2. Let the server provide a special website only for the search engine bots. If a normal user visits http://example.com/my_path the server should give him a JavaScript heavy version of the website. But if the Google bot visits, the server should give it some minimal HTML with the content I want Google to index.

The first solution is discussed further here. I have been working on a website doing this and it's not a very nice experience. It's not DRY and in my case I had to use two different template engines for the client and the server.

I think I have seen the second solution for some good ol' Flash websites. I like this approach much more than the first one and with the right tool on the server it could be done quite painlessly.

So what I'm really wondering is the following:

  • Can you think of any better solution?
  • What are the disadvantages with the second solution? If Google in some way finds out that I'm not serving the exact same content for the Google bot as a regular user, would I then be punished in the search results?

While #2 might be "easier" for you as a developer, it only provides search engine crawling. And yes, if Google finds out your serving different content, you might be penalized (I'm not an expert on that, but I have heard of it happening).

Both SEO and accessibility (not just for disabled person, but accessibility via mobile devices, touch screen devices, and other non-standard computing / internet enabled platforms) both have a similar underlying philosophy: semantically rich markup that is "accessible" (i.e. can be accessed, viewed, read, processed, or otherwise used) to all these different browsers. A screen reader, a search engine crawler or a user with JavaScript enabled, should all be able to use/index/understand your site's core functionality without issue.

pushState does not add to this burden, in my experience. It only brings what used to be an afterthought and "if we have time" to the forefront of web development.

What your describe in option #1 is usually the best way to go - but, like other accessibility and SEO issues, doing this with pushState in a JavaScript-heavy app requires up-front planning or it will become a significant burden. It should be baked in to the page and application architecture from the start - retrofitting is painful and will cause more duplication than is necessary.

I've been working with pushState and SEO recently for a couple of different application, and I found what I think is a good approach. It basically follows your item #1, but accounts for not duplicating html / templates.

Most of the info can be found in these two blog posts:

http://lostechies.com/derickbailey/2011/09/06/test-driving-backbone-views-with-jquery-templates-the-jasmine-gem-and-jasmine-jquery/

and

http://lostechies.com/derickbailey/2011/06/22/rendering-a-rails-partial-as-a-jquery-template/

The gist of it is that I use ERB or HAML templates (running Ruby on Rails, Sinatra, etc) for my server side render and to create the client side templates that Backbone can use, as well as for my Jasmine JavaScript specs. This cuts out the duplication of markup between the server side and the client side.

From there, you need to take a few additional steps to have your JavaScript work with the HTML that is rendered by the server - true progressive enhancement; taking the semantic markup that got delivered and enhancing it with JavaScript.

For example, i'm building an image gallery application with pushState. If you request /images/1 from the server, it will render the entire image gallery on the server and send all of the HTML, CSS and JavaScript down to your browser. If you have JavaScript disabled, it will work perfectly fine. Every action you take will request a different URL from the server and the server will render all of the markup for your browser. If you have JavaScript enabled, though, the JavaScript will pick up the already rendered HTML along with a few variables generated by the server and take over from there.

Here's an example:

<form id="foo">
  Name: <input id="name"><button id="say">Say My Name!</button>
</form>

After the server renders this, the JavaScript would pick it up (using a Backbone.js view in this example)

FooView = Backbone.View.extend({
  events: {
    "change #name": "setName",
    "click #say": "sayName"
  },

  setName: function(e){
    var name = $(e.currentTarget).val();
    this.model.set({name: name});
  },

  sayName: function(e){
    e.preventDefault();
    var name = this.model.get("name");
    alert("Hello " + name);
  },

  render: function(){
    // do some rendering here, for when this is just running JavaScript
  }
});

$(function(){
  var model = new MyModel();
  var view = new FooView({
    model: model,
    el: $("#foo")
  });
});

This is a very simple example, but I think it gets the point across.

When I instante the view after the page loads, I'm providing the existing content of the form that was rendered by the server, to the view instance as the el for the view. I am not calling render or having the view generate an el for me, when the first view is loaded. I have a render method available for after the view is up and running and the page is all JavaScript. This lets me re-render the view later if I need to.

Clicking the "Say My Name" button with JavaScript enabled will cause an alert box. Without JavaScript, it would post back to the server and the server could render the name to an html element somewhere.

Edit

Consider a more complex example, where you have a list that needs to be attached (from the comments below this)

Say you have a list of users in a <ul> tag. This list was rendered by the server when the browser made a request, and the result looks something like:

<ul id="user-list">
  <li data-id="1">Bob
  <li data-id="2">Mary
  <li data-id="3">Frank
  <li data-id="4">Jane
</ul>

Now you need to loop through this list and attach a Backbone view and model to each of the <li> items. With the use of the data-id attribute, you can find the model that each tag comes from easily. You'll then need a collection view and item view that is smart enough to attach itself to this html.

UserListView = Backbone.View.extend({
  attach: function(){
    this.el = $("#user-list");
    this.$("li").each(function(index){
      var userEl = $(this);
      var id = userEl.attr("data-id");
      var user = this.collection.get(id);
      new UserView({
        model: user,
        el: userEl
      });
    });
  }
});

UserView = Backbone.View.extend({
  initialize: function(){
    this.model.bind("change:name", this.updateName, this);
  },

  updateName: function(model, val){
    this.el.text(val);
  }
});

var userData = {...};
var userList = new UserCollection(userData);
var userListView = new UserListView({collection: userList});
userListView.attach();

In this example, the UserListView will loop through all of the <li> tags and attach a view object with the correct model for each one. it sets up an event handler for the model's name change event and updates the displayed text of the element when a change occurs.


This kind of process, to take the html that the server rendered and have my JavaScript take over and run it, is a great way to get things rolling for SEO, Accessibility, and pushState support.

Hope that helps.


I think you need this: http://code.google.com/web/ajaxcrawling/

You can also install a special backend that "renders" your page by running javascript on the server, and then serves that to google.

Combine both things and you have a solution without programming things twice. (As long as your app is fully controllable via anchor fragments.)


So, it seem that the main concern is being DRY

  • If you're using pushState have your server send the same exact code for all urls (that don't contain a file extension to serve images, etc.) "/mydir/myfile", "/myotherdir/myotherfile" or root "/" -- all requests receive the same exact code. You need to have some kind url rewrite engine. You can also serve a tiny bit of html and the rest can come from your CDN (using require.js to manage dependencies -- see https://stackoverflow.com/a/13813102/1595913).
  • (test the link's validity by converting the link to your url scheme and testing against existence of content by querying a static or a dynamic source. if it's not valid send a 404 response.)
  • When the request is not from a google bot, you just process normally.
  • If the request is from a google bot, you use phantom.js -- headless webkit browser ("A headless browser is simply a full-featured web browser with no visual interface.") to render html and javascript on the server and send the google bot the resulting html. As the bot parses the html it can hit your other "pushState" links /somepage on the server <a href="/someotherpage">mylink</a>, the server rewrites url to your application file, loads it in phantom.js and the resulting html is sent to the bot, and so on...
  • For your html I'm assuming you're using normal links with some kind of hijacking (e.g. using with backbone.js https://stackoverflow.com/a/9331734/1595913)
  • To avoid confusion with any links separate your api code that serves json into a separate subdomain, e.g. api.mysite.com
  • To improve performance you can pre-process your site pages for search engines ahead of time during off hours by creating static versions of the pages using the same mechanism with phantom.js and consequently serve the static pages to google bots. Preprocessing can be done with some simple app that can parse <a> tags. In this case handling 404 is easier since you can simply check for the existence of the static file with a name that contains url path.
  • If you use #! hash bang syntax for your site links a similar scenario applies, except that the rewrite url server engine would look out for _escaped_fragment_ in the url and would format the url to your url scheme.
  • There are a couple of integrations of node.js with phantom.js on github and you can use node.js as the web server to produce html output.

Here are a couple of examples using phantom.js for seo:

http://backbonetutorials.com/seo-for-single-page-apps/

http://thedigitalself.com/blog/seo-and-javascript-with-phantomjs-server-side-rendering