In a single-page app, what is the right way to deal with wrong URLs (404 errors)?

I am currently writing a web application using AngularJS, but I think this question applies to any client-side JavaScript framework that does routing on the client side (as Angular does).

In a single-page app, what is the right way to deal with wrong URLs?

Looking at a few major sites, I see that Gmail will redirect to the inbox if you type any random URL below https://mail.google.com/mail/. This happens server-side (with an HTTP 3xx redirect) or client-side, depending on whether the wrong path is before or after the # character. On the other hand, Twitter shows a real HTTP 404 for any invalid URL. A third option would be to show a "soft" 404, a purely client-side error page.

These solutions seem appropriate for different situations. Twitter wants the links to Twitter users and tweets to be real links, so people can share them, post them in news articles, etc., so it is important that invalid links be recognized as such (if I have a broken link to a tweet on my website, a simple crawl will tell me that). In Gmail, on the other hand, you are not expected to share links into your inbox, and I'm not even sure the links are really permanent/persistent: it seems the URL updating mostly serves the purpose of browser history navigation within the single-page app. The third approach of giving soft errors might be appropriate for situations similar to Gmail's, but where there is no reasonable "default" page.

After this long introduction, here are some specific questions:

  • Is it ever acceptable to give a "soft" error page instead of a 404 error, or should a single-page app always redirect to a real 404 if a URL is invalid?
  • Gmail's code may be perfectly bug-free, but if it did have a bug leading to invalid links that end up redirecting back to the inbox, that might be even more confusing for users than an error page. For most web apps out there, which are not as well tested as Gmail, would it be better to show an error page?
  • To implement real 404s for single-page apps, it seems necessary to duplicate the routing logic on the server side. Is there any way around this?
  • When redirecting to a 404, I think the user should be able to see the URL that caused the error, possibly in the URL bar. With the HTML5 History API, I think this can be accomplished by simply triggering a reload of the current page (with the wrong URL), combined with the server-side routing mentioned above. For browsers that do not support this, or when using hashbang notation, this does not seem possible. What's the best way to support all browsers?

Solution 1:

If you care about SEO, one of the ways that angular.io was able to solve this problem (at least with Google, anyway) is by using a noindex meta tag "to indicate soft-404 status which will prevent crawlers from crawling the content of the page". Apparently it can be added to the document via JavaScript.
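As a rough sketch of what "added via JavaScript" could look like (this is not angular.io's actual code): the function below appends a robots noindex meta tag to the document head. The `DocLike`/`MetaLike` types and the name `markAsSoft404` are made up for illustration, so that the snippet also runs outside a browser; in a real page the same three DOM calls would go against the global `document`.

```typescript
// Minimal structural types (hypothetical) covering only the DOM calls used here.
interface MetaLike {
  setAttribute(name: string, value: string): void;
}
interface DocLike {
  createElement(tag: string): MetaLike;
  head: { appendChild(node: MetaLike): unknown };
}

// Adds <meta name="robots" content="noindex"> to the document head.
// Intended to be called from the client-side router's "no route matched" handler.
function markAsSoft404(doc: DocLike): MetaLike {
  const meta = doc.createElement("meta");
  meta.setAttribute("name", "robots");
  meta.setAttribute("content", "noindex");
  doc.head.appendChild(meta);
  return meta;
}
```

Note that the page is still served with a 200 status; the tag only asks crawlers not to index it.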

Alternatively, using JavaScript, you can redirect to a page that will respond with an actual HTTP 404 status code. Google understands JavaScript redirects just fine. Your original /does-not-exist page, when redirected to /404-error?from=does-not-exist, will be associated with the 404 status code returned by the server. The URL structure does not matter, only the status code and the redirect are important here.
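A minimal sketch of that redirect, assuming the server has a route (here `/404-error`, as in the example URL above) that always responds with HTTP 404. The function name `notFoundRedirectUrl` is hypothetical; it only builds the target URL, and the actual navigation is shown in the trailing comment.

```typescript
// Assumed server route that always answers with an HTTP 404 status.
const NOT_FOUND_ROUTE = "/404-error";

// Builds the redirect target, carrying the offending path in the "from" parameter.
function notFoundRedirectUrl(badPath: string): string {
  const from = badPath.replace(/^\/+/, ""); // drop leading slashes
  return `${NOT_FOUND_ROUTE}?from=${encodeURIComponent(from)}`;
}

// In the browser, the router's catch-all handler could then do:
//   window.location.replace(notFoundRedirectUrl(window.location.pathname));
```

Using `location.replace` rather than assigning `location.href` keeps the invalid URL out of the back-button history, which is a design choice rather than a requirement.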

Your other options are SSR (Nuxt.js, Next.js, Angular Universal, etc.) or pre-rendering (prerender.io, puppeteer, etc.). Google calls the latter dynamic rendering: you respond to search bot requests with a pre-rendered version, while human users get your normal client-side rendered app.
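To illustrate the dynamic rendering idea, here is a small sketch of the server-side decision. The names `isSearchBot` and `pickResponse` and the short bot list are assumptions for illustration; a real setup would use a maintained user-agent list or an existing pre-rendering middleware.

```typescript
// Illustrative, deliberately short list of crawler user-agent tokens.
const BOT_PATTERN = /googlebot|bingbot/i;

function isSearchBot(userAgent: string | undefined): boolean {
  return userAgent !== undefined && BOT_PATTERN.test(userAgent);
}

// Decide which variant of the page to serve for a request.
function pickResponse(userAgent: string | undefined): "prerendered" | "spa-shell" {
  return isSearchBot(userAgent) ? "prerendered" : "spa-shell";
}
```

The pre-rendered variant is also where a real 404 status can be returned to the crawler for unknown routes.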

Solution 2:

tl;dr: Drop hashbang support and opt for PJAX-like behavior if you care about SEO.

Are you making an app or a website? If it's a website, you need to return a 404 so that you don't confuse Google. It needs to be a real 404, not just a "page not found" message (i.e. a 200 with the message "page not found" is very bad). Also, what browsers do you care to support?
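To make the "real 404" point concrete, here is a minimal sketch of the server-side decision, assuming the server has access to the same list of route patterns the client router uses. The patterns and the name `statusFor` are hypothetical.

```typescript
// Hypothetical route patterns, ideally shared between client and server.
const knownRoutes: RegExp[] = [/^\/$/, /^\/users\/\d+$/, /^\/about$/];

// The server still sends the same app shell, but with the right status code.
function statusFor(path: string): 200 | 404 {
  return knownRoutes.some((route) => route.test(path)) ? 200 : 404;
}
```

With this in place, `/users/42` would be served with a 200 and `/does-not-exist` with a 404, even though both responses may render through the same single-page app.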

My opinion is that the whole hashbang server-side rendering approach should be avoided (i.e. the nasty Google SEO #! hack). Either use real pushState, or, for browsers that don't support pushState, re-render the whole page when the URL changes (a real URL change, not a hash change).

Now the reason this matters is that a #! URL should never return a 404: it doesn't make sense, and it's impossible to mimic server-side, because the server never receives what comes after the #! without running JavaScript.
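A quick way to see this, using the standard `URL` class: the fragment is a separate component that the browser keeps to itself, so only the path and query reach the server. The helper name `serverVisiblePart` and the example.com URLs are made up for illustration.

```typescript
// Returns the part of a URL that is actually sent in the HTTP request line.
function serverVisiblePart(href: string): string {
  const url = new URL(href);
  // url.hash (everything from "#" on) is never transmitted to the server.
  return url.pathname + url.search;
}

// Hashbang routing: the server only ever sees "/".
const hashbangRequest = serverVisiblePart("https://example.com/#!/users/42");

// pushState routing: the server sees the full path and can answer 404 for it.
const pushStateRequest = serverVisiblePart("https://example.com/users/42");
```

Here `hashbangRequest` is `"/"` while `pushStateRequest` is `"/users/42"`, which is why only the second style can produce a meaningful server-side 404.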

Thus, if you really care about SEO, I would do something like PJAX: only use true pushState for routing, and fall back to plain full-page loads (old web 1.0 style) where it isn't available. Consequently, the links I recommend you share, the ones that can truly be a 404, should not have #! (a traditional # is fine so long as the contents of the page don't change drastically).

Finally, the 404 is mostly not the problem; 30X (redirect) responses are. That's because the browser follows redirects automatically, so your JavaScript AJAX calls never see the 30X itself (they get the response from the redirect target instead, e.g. a 200). To handle 30X responses you will have to send a header back with every response indicating the final URL (i.e. what you were redirected to) so that you don't mess up the pushState history.
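A minimal sketch of the client side of that idea, assuming the server sets a custom header (the name `X-Final-Url` is invented here) on every response. The `HeaderReader` interface and `urlForHistory` are likewise hypothetical; the interface matches the `get` method of a fetch `Headers` object.

```typescript
// Assumed custom header carrying the URL the server finally served.
const FINAL_URL_HEADER = "X-Final-Url";

// Anything with a Headers-style get(), e.g. response.headers from fetch.
interface HeaderReader {
  get(name: string): string | null;
}

// Pick the URL that should go into the pushState history entry.
function urlForHistory(requestedUrl: string, headers: HeaderReader): string {
  return headers.get(FINAL_URL_HEADER) ?? requestedUrl;
}

// In the browser, after loading page content over AJAX:
//   history.pushState(null, "", urlForHistory(href, response.headers));
```

Where available, `Response.url` (for `fetch`) and `XMLHttpRequest.responseURL` also expose the final URL after redirects, which may make the extra header unnecessary.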

Of course, if you need to support hashbang like Twitter used to (and they are the ones who ended up killing hashbang), you can leverage Google Sitemaps and rel=nofollow to try to mitigate bad SEO.