I am looking at implementation strategies for a website, and it has been a while since I was last in i18n land. I have been doing some research, and I am shocked at the state of things.

When web browsers request a page, they send details of the language they would like to receive it in. Accept-Language: en-us,en; is what my Firefox sends with every request. This tells the server that I would like English content (US English, but that is another rant). This is part of the core HTTP standard.

What this means is that when I go to a website, the server can send back a response that depends on my browser's settings. http://www.example.com/ would show English content if I was browsing with an en language set, French if I had fr set, and so on. This is a pretty good user experience. That page represents a single point in the web graph: you would want to link to it from anywhere and not care about the locale of the person following the link.
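To make the mechanism concrete, here is a minimal sketch of server-side negotiation in Python. The function names (parse_accept_language, choose_language), the supported set and the "en" default are all my own assumptions for illustration, not part of any framework, and the matching is deliberately simple (exact tag first, then the primary language).

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language value, best first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        tag = tag.strip().lower()
        params = params.strip()
        q = 1.0  # a tag without an explicit q-value has weight 1
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unparsable weight as "not acceptable"
        if tag and q > 0:
            # Sort by descending weight, then by original position.
            prefs.append((-q, position, tag))
    return [tag for _, _, tag in sorted(prefs)]


def choose_language(header, supported, default="en"):
    """Pick a supported language for the request, or fall back to default."""
    for tag in parse_accept_language(header or ""):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "en-us" falls back to "en"
        if primary in supported:
            return primary
    return default  # also covers clients that send no header at all


print(choose_language("en-us,en;q=0.5", {"en", "fr"}))
```

Note that a client sending no Accept-Language header at all simply gets the default, which is exactly the situation a header-less crawler ends up in.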

So it seems like a nice mechanism for creating a multilingual site, right?

WRONG.

Google makes this next to impossible. Googlebot doesn't send an Accept-Language header, which is valid HTTP, but it means that you only get one language indexed.

In fact, Google even states:

"Keep the content for each language on separate URLs. Don’t use cookies to show translated versions of the page. Consider cross-linking each language version of a page. That way, a French user who lands on the German version of your page can get to the right language version with a single click."
So they are saying: don't track user preferences via cookies (or headers). They also say that they ignore language codes in a page and guess the language based on the content.

What Google would like, apparently, is:

http://www.example.com/ (shows a language picker page with no content)
http://www.example.com/en - shows the English version
http://www.example.com/fr - shows the French version
etc
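
For what it's worth, the URL scheme above is easy enough to handle on the server. Here is a rough sketch, assuming a hypothetical helper called split_language_prefix and a set of supported language codes that you maintain yourself:

```python
def split_language_prefix(path, supported):
    """Split "/fr/about" into ("fr", "/about").

    Returns (None, path) when the first segment is not a supported
    language, which is the case where you would show the picker page.
    """
    segments = path.lstrip("/").split("/", 1)
    first = segments[0].lower()
    if first in supported:
        rest = "/" + (segments[1] if len(segments) > 1 else "")
        return first, rest
    return None, path


print(split_language_prefix("/fr/about", {"en", "fr"}))
```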

Why is this bad?

As a French user I may create a link to the French URL, and as an English user a link to the English one, even though both pages have equivalent content. A German user may not see a link to the page at all, or may assume there is no German translation because there isn't a link in their language.

Remember, Google also doesn't like duplicate content on a website, so you are going against Google's own wishes to do the language dance.

What can you do about it?

If you don't care about your alternate languages being indexed, then you don't need to do anything: you can use the HTTP header (backed with a cookie preference override). If you have a web app (which won't be indexed), this is not a bad idea.
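
The header-plus-cookie approach could look something like the sketch below. The cookie name "lang", the function names and the "en" default are assumptions of mine, and the header parsing ignores q-values for brevity and just takes languages in the order they are listed.

```python
def header_languages(accept_language):
    """Yield primary language codes from an Accept-Language value.

    Simplification: q-values are ignored and the listed order is used.
    """
    for part in (accept_language or "").split(","):
        tag = part.split(";")[0].strip().lower()
        if tag:
            yield tag.split("-")[0]


def resolve_language(cookies, accept_language, supported, default="en"):
    """An explicit cookie choice wins; otherwise fall back to the header."""
    override = cookies.get("lang", "").lower()  # hypothetical cookie name
    if override in supported:
        return override
    for code in header_languages(accept_language):
        if code in supported:
            return code
    return default


print(resolve_language({"lang": "fr"}, "en-us,en;q=0.5", {"en", "fr", "de"}))
```

The cookie only exists so that a user whose browser is set to one language can still pick another and have the choice stick.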

However, a lot of pages do need to be indexed, and there you are stuck with language-based URLs. There isn't even a language option when creating sitemaps.

You can avoid being marked down by Google for duplicate content by using rel="alternate" link tags alongside a canonical link:

<link rel="canonical" href="http://www.example.com/" />

<link rel="alternate" hreflang="en" href="http://www.example.com/" />

<link rel="alternate" hreflang="en-gb" href="http://en-gb.example.com/" />

<link rel="alternate" hreflang="en-us" href="http://en-us.example.com/" />

<link rel="alternate" hreflang="de" href="http://de.example.com/" />
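
If you have more than a couple of languages you will want to generate these rather than write them by hand. A small sketch, assuming a hypothetical alternate_links helper that takes the canonical URL and a mapping of hreflang code to URL:

```python
from html import escape


def alternate_links(canonical, alternates):
    """Build the canonical and rel="alternate" link tags for a page head.

    `alternates` maps an hreflang code to the URL of that language version.
    """
    lines = ['<link rel="canonical" href="%s" />' % escape(canonical, quote=True)]
    for hreflang, url in alternates.items():
        lines.append(
            '<link rel="alternate" hreflang="%s" href="%s" />'
            % (escape(hreflang, quote=True), escape(url, quote=True))
        )
    return "\n".join(lines)


print(alternate_links(
    "http://www.example.com/",
    {
        "en": "http://www.example.com/",
        "de": "http://de.example.com/",
    },
))
```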

I don't often have a go at the big G, but in this case I think they have done the web a disservice.