Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

Working with multilingual websites

Friday, March 19, 2010 at 9:03 AM

Webmaster Level: Intermediate

A multilingual website is any website that offers content in more than one language. Examples of multilingual websites might include a Canadian business with an English and a French version of its site, or a blog on Latin American soccer available in both Spanish and Portuguese.

Usually, it makes sense to have a multilingual website when your target audience consists of speakers of different languages. If your blog on Latin American soccer aims to reach the Brazilian audience, you may choose to publish it only in Portuguese. But if you’d like to reach soccer fans from Argentina also, then providing content in Spanish could help you with that.

Google and language recognition


Google tries to determine the main languages of each one of your pages. You can help to make language recognition easier if you stick to only one language per page and avoid side-by-side translations. Although Google can recognize a page as being in more than one language, we recommend using the same language for all elements of a page: headers, sidebars, menus, etc.

Keep in mind that Google ignores all code-level language information, from “lang” attributes to Document Type Definitions (DTD). Some web editing programs create these attributes automatically, and therefore they aren’t very reliable when trying to determine the language of a webpage.

Someone who comes to Google and does a search in their language expects to find localized search results, and this is where you, as a webmaster, come in: if you’re going to localize, make it visible in the search results with some of our tips below.

The anatomy of a multilingual site: URL structure


There's no need to create special URLs when developing a multilingual website. Nonetheless, your users might like to identify what section of your website they’re on just by glancing at the URL. For example, the following URLs let users know that they’re on the English section of this site:

http://example.ca/en/mountain-bikes.html
http://en.example.ca/mountain-bikes.html

While these other URLs let users know that they’re viewing the same page in French:

http://example.ca/fr/mountain-bikes.html
http://fr.example.ca/mountain-bikes.html


Additionally, this URL structure will make it easier for you to analyze the indexing of your multilingual content.

If you want to create URLs with non-English characters, make sure to use UTF-8 encoding. UTF-8 encoded URLs should be properly escaped when linked from within your content. Should you need to escape your URLs manually, you can easily find an online URL encoder that will do this for you. For example, if I wanted to translate the following URL from English to French,

http://example.ca/fr/mountain-bikes.html

it might look something like this:

http://example.ca/fr/vélo-de-montagne.html

Since this URL contains one non-English character (é), this is what it would look like properly escaped for use in a link on your pages:

http://example.ca/fr/v%C3%A9lo-de-montagne.html
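As a minimal sketch of the escaping step described above, the following Python snippet percent-encodes a URL path as UTF-8 using the standard library. The helper name `escape_path` is an illustrative assumption, not something from this post; the example URL is the one used above.

```python
from urllib.parse import quote, unquote


def escape_path(path: str) -> str:
    """Percent-encode a URL path as UTF-8, leaving the '/' separators intact."""
    return quote(path, safe="/", encoding="utf-8")


# Only the path is escaped here; the scheme and host are plain ASCII already.
escaped = "http://example.ca" + escape_path("/fr/vélo-de-montagne.html")
print(escaped)

# Unquoting recovers the readable form that most browsers display.
print(unquote(escaped))
```

Escaping only the path (rather than the whole URL) avoids accidentally encoding the `:` and `//` of the scheme.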

Crawling and indexing your multilingual website


We recommend that you do not allow automated translations to get indexed. Automated translations don’t always make sense and they could potentially be viewed as spam. More importantly, the point of making a multilingual website is to reach a larger audience by providing valuable content in several languages. If your users can’t understand an automated translation or if it feels artificial to them, you should ask yourself whether you really want to present this kind of content to them.

If you’re going to localize, make it easy for Googlebot to crawl all language versions of your site. Consider cross-linking page by page. In other words, you can provide links between pages with the same content in different languages. This can also be very helpful to your users. Following our previous example, let’s suppose that a French speaker happens to land on http://example.ca/en/mountain-bikes.html; now, with one click he can get to http://example.ca/fr/vélo-de-montagne.html where he can view the same content in French.
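One possible way to implement page-by-page cross-linking is sketched below. It assumes a hypothetical lookup table (`TRANSLATIONS`) that records, for each page, the path of every language version; the table, the labels, and the function name are illustrative assumptions rather than anything prescribed here.

```python
from html import escape
from urllib.parse import quote

# Hypothetical table: for each page, the path of every language version.
TRANSLATIONS = {
    "mountain-bikes": {
        "en": "/en/mountain-bikes.html",
        "fr": "/fr/vélo-de-montagne.html",
    },
}

# Hypothetical link labels, written in the target language.
LABELS = {"en": "English", "fr": "Français"}


def language_links(page_id: str, current_lang: str) -> list[str]:
    """Build plain HTML links to the other language versions of a page."""
    links = []
    for lang, path in sorted(TRANSLATIONS[page_id].items()):
        if lang == current_lang:
            continue  # no need to link a page to itself
        # Escape non-ASCII characters so the href is a valid URL.
        href = "http://example.ca" + quote(path, safe="/")
        links.append(f'<a href="{escape(href)}">{escape(LABELS[lang])}</a>')
    return links


for link in language_links("mountain-bikes", "en"):
    print(link)
```

Because these are ordinary links, both users and crawlers can follow them from any language version to the others.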

To make all of your site's content more crawlable, avoid automatic redirections based on the user's perceived language. These redirections could prevent users (and search engines) from viewing all the versions of your site.

And last but not least, keep the content for each language on separate URLs - don't use cookies to show translated versions.
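To illustrate the idea of keeping each language on its own URL, here is a small sketch in which the language served depends only on the URL path. The supported languages and function names are assumptions for illustration; cookies and the browser's language header are accepted as parameters only to show that they are deliberately ignored.

```python
from typing import Optional

SUPPORTED_LANGUAGES = ("en", "fr")  # assumption: the languages this site offers


def language_from_path(path: str) -> Optional[str]:
    """Return the language named by the first path segment, or None."""
    first_segment = path.lstrip("/").split("/", 1)[0]
    return first_segment if first_segment in SUPPORTED_LANGUAGES else None


def choose_language(path: str, cookies: dict, accept_language: str) -> Optional[str]:
    """Pick the language from the URL alone.

    cookies and accept_language are intentionally unused, so the same URL
    always returns the same language version to users and crawlers alike.
    """
    return language_from_path(path)


print(choose_language("/en/mountain-bikes.html", {"lang": "fr"}, "fr-CA"))
```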

Working with character encodings


Google directly extracts character encodings from HTTP headers, HTML page headers, and content. There isn’t much you need to do about character encoding, other than watching out for conflicting information - for example, between content and headers. While Google can recognize different character encodings, we recommend that you use UTF-8 on your website whenever possible.
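If you want to check your own pages for the kind of conflict mentioned above, a rough sketch might compare the charset in the HTTP Content-Type header with the one declared in the page's meta tag. This is only an approximation under stated assumptions: the function names are made up for this example, the regular expressions are simplistic, and it says nothing about how Google itself detects encodings.

```python
import codecs
import re
from typing import Optional


def _normalize(name: str) -> str:
    """Map aliases such as 'UTF8' and 'utf-8' to one canonical codec name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def charset_from_header(content_type: str) -> Optional[str]:
    """Extract the charset from an HTTP Content-Type header value, if any."""
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type, re.IGNORECASE)
    return _normalize(match.group(1)) if match else None


def charset_from_meta(html_bytes: bytes) -> Optional[str]:
    """Extract the charset declared in a meta tag near the top of the page."""
    head = html_bytes[:1024].decode("ascii", errors="ignore")
    match = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', head, re.IGNORECASE)
    return _normalize(match.group(1)) if match else None


def encodings_conflict(content_type: str, html_bytes: bytes) -> bool:
    """True only when both sources declare a charset and the two disagree."""
    header = charset_from_header(content_type)
    meta = charset_from_meta(html_bytes)
    return header is not None and meta is not None and header != meta


page = b'<html><head><meta charset="utf-8"></head><body>...</body></html>'
print(encodings_conflict("text/html; charset=ISO-8859-1", page))
```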

If your tongue gets twisted...


Now that you know all of this, your tongue may get twisted when you speak many languages, but your website doesn’t have to!

For more information, read our post on multi-regional sites and stay tuned for our next post, where we'll delve into special situations that may arise when working with global websites. Until then, don't hesitate to drop by the Help Forum and join the discussion!

The comments you read here belong only to the person who posted them. We do, however, reserve the right to remove off-topic comments.

47 comments:

Ben Griffiths said...

What if you are using Google Translate? I've found the translations to be pretty accurate (at least for Spanish). If you use Google Translate on your site, could Google index the site in multiple languages?

Tommo said...

Regarding characters like é in URLs such as http://example.ca/fr/vélo-de-montagne.htm: do you recommend this, or should the URL read http://example.ca/fr/velo-de-montagne.htm?

Jose Nobile said...

In my personal experience, Google doesn't rank encoded URLs well; it's better not to encode URLs and to configure the web server properly so that it recognizes them.

And always use UTF-8 for everything.

Nick said...

Ben if you think google translate is accurate, not sure your spanish is so good...

Azhar Iqbal said...

It's great to optimize a multilingual website; I now realize that Google Translate or any automated translation is not too good.

SEO Pakistan, Dubai

Mike Unwalla, TechScribe said...

@Nick: Ben if you think google translate is accurate, not sure your spanish is so good...

Ben is correct. Google Translate is "pretty accurate (at least for Spanish)".

English text was translated into Spanish by Google Translate. Six professional translators evaluated the translation for fluency and for accuracy. The translation is satisfactory (http://www.international-english.co.uk/mt-evaluation-en-es.html).

Allen Brown said...

If Google "ignores all code-level language information", how can I indicate that my page is English and is not multi-language? I have a series of pages where a significant minority of the content consists of names and towns of birth. Google has decided some of these pages are French, German or even, bizarrely, Slovenian. How can I ensure that they are treated as English?

ZenCocoon said...

To add to Tommo's and Jose Nobile's comments on international URLs, what would be the best way to handle Greek URLs?

Having in English:
http://example.com/en/contact

What would you recommend :

1) written with Latin letters
http://example.com/el/epikoinonia

2) encoded to support Greek letters (is that even possible to be displayed this way in the browser's address bar?)
http://example.com/el/Επικοινωνία
--
Sébastien Grosjean - Sivota, Lefkada

budisakty said...

I really enjoyed your article. It has been extremely helpful, and the information you provide is excellent. I am very grateful for the information you share through your blog. Good luck to you.

Ahmad said...

Nice

David said...

In a country like Belgium, with more than one official language and many people capable of understanding (reading) more than one of those languages, what would be recommended?

I have http://www.poureva.be/ for French and http://www.vooreva.be/ for Dutch. Both websites share the same database, with all the articles reachable. However, the heading/menu/footer are in the language of the domain and not the language of the article.

Should I force a domain switch when a user wants to see content in the other language?

What about letting a bilingual audience know about things available in the other language?

bob said...

In all honesty Google translate is not very professional at least for French. BUT translating a site or having a bilingual site is very costly and where I live it's actually the law ( Quebec, Canada.) Many clients simply cannot afford having two versions of a site and choose to use Google translate instead.

OnTheGoSystems said...

I need to frame and keep your explanation about avoiding automatic visitor language detection.

Folks keep asking us to add it to our multilingual WordPress plugin (WPML) and we keep explaining it's going to cause indexing problems.

Great to have something to reference back to.

Robert Neville said...

Talking about languages, I go to Google Webmasters as usual and everything is in German. I'm not German, I don't live in Germany, I don't speak German, my computer/IP has nothing in German.

Could you please turn off those stupid language detection settings that never work and just leave it in plain old English until the user decides to change? Or at the very least put a dropdown menu to choose languages from. This is MIGHTY annoying and an inexcusable error of judgement in your UI.

Sergio said...

Ben: Google Translate is pretty good for getting the general idea of what a page is about. As you say, it does a pretty good job; however, it is not always completely correct. You would not want to have your marketing material developed by someone who writes "pretty" well but can make an incorrect choice of words and render the material useless. Just try to translate this text into Spanish and back into English; you will be enlightened.

BGiffuni said...

@Ben, @Mike, I just read a little bit of the study about Google Translate and just by the review of the first translator the tool is totally unacceptable to provide good information and somewhat decent user experience for the site.
As per Reviewer: "Overall, I can more or less understand the general idea and some of the detail. However, some of the key words, examples and sentences do not make sense, appear to be contradictory, do not follow on logically or are unclear, so the point being made is somewhat lost and the reader gets a bit confused. Therefore, as a result, doubt creeps in as to the overall understanding." T1.

So I tried Google translate on my site www.blueadvertising.biz and if you want to laugh a little try it yourself.

Janine Libbey said...

"We recommend that you do not allow automated translations to get indexed. Automated translations don’t always make sense and they could potentially be viewed as spam."

I think it's hilarious that employees at Google don't recommend automated translations because they don't make sense considering the poor quality of what Google Translate produces.

http://www.pandltranslations.com

pascal said...

Over the last weekend, I had a go at Apache Multiviews, to make it more accessible, user and SEO friendly.

Most important, tweaking MultiViews to work with the recommended site structure such as mysite.com/en/whatever.

Read the article at http://www.fellerich.lu/articles/multiviews
(currently only in english, sorry guys :-)

dvdroest said...

I am also very curious how you can best show non-ASCII characters in the URL, especially for the Asian market, where you cannot use a Latin fallback character.

About automatically translating content for your site. I believe that if you are serious about your website then automatic translations are not an option.

I think that some website owners sometimes forget that the content on the site is the reason the site exists, if the translations will be bad, you might as well not publish it. Also, badly written content will not give a good impression of your company.

Arvind said...

Nice Article.
Thanks for Great Post sharing with us.

Smile said...

I reached this post after google webmaster login page was presented in a language that I could not understand, with no way of changing it! the irony...

Affar said...

Great post,

But in some scenarios changing the language is done at session level and not by the URLs. So how can I tell Google to index the other languages?

CHELSEA BLOGGER said...

It's great to optimize a multilingual website; I now realize that Google Translate or any automated translation is not too good.

http://balibestjegeg.blogspot.com/

tikoim said...

Referring to Ben's question: what about automated translation, really? I mean, if Google is indexing this auto-translated content, then this could also be an easy way of generating content, not to call it spam, but in a way... OK, it would be really interesting to know how Google handles this and whether they can distinguish between human translation and 'their own' translation. Cheers

Adnan said...

You should really try to make a visual distinction between your post titles and body h3 tags. It's very confusing.

Bart Cuylaerts said...

Why encourage properly escaped URLs? Users should not be confronted with strange-looking URLs. URLs should be user-friendly, not Google-friendly.

BTW: Have you ever seen an escaped Russian URL? It’s scary!

Hakan said...

In my personal experience, Google doesn't rank encoded URLs well; it's better not to encode URLs and to configure the web server properly so that it recognizes them.

John Mueller said...

@Tommo - Either é or e in URLs is fine for us. While more focused on the query, this blog post might also be interesting in that regard: http://googlewebmastercentral.blogspot.com/2006/08/how-search-results-may-differ-based-on.html

@Allen Brown - Keep in mind that Google can recognize more than one language on a page, so even if we see a lot of non-English content on those pages, we will likely still see enough English content to know that the page is in English as well. Minimizing non-English content makes it easier though.

@ZenCocoon - Both are fine for us and both kinds can be found in the search results. Which one would users search for and click on?

@David - I'd recommend making multilingual content available on just one URL and redirecting the other domains to that URL. You could expose that to users by making it easy for them to switch between languages with normal links (text or images).

@bob - Allowing users to use Google Translate is fine (there are widgets that make that easy), we just recommend making sure that those automated translations are not crawled and indexed. If you are doing this for your site, one way to work on translations could be to use the Google Translator Toolkit, which lets you & trusted helpers polish translations for your pages: http://translate.google.com/toolkit

@Robert Neville - I agree that automated language choice can be frustrating, sorry to hear that you're seeing this on our sites. One quick way to get around that for most of our sites is to add a "&hl=en" to the URL, though ultimately it would be better if this weren't necessary.

@pascal - do you have a sample site to try out? :)

@Affar - Googlebot generally does not use cookies, so session-level information will generally be dropped for future accesses. If you have content in different languages, you need to make sure that it is on different URLs, not just dependent on sessions.

@Bart Cuylaerts - Encoded URLs are only for the browser; most modern browsers automatically show URLs in their unencoded form. If a URL is not encoded on the page, it may point to a different URL on the server if the content on the page is not encoded in the same way (eg a URL in UTF-8 on a page with ISO 8859-5 content may result in an incorrect URL on the server side).

Jose Nobile said...

--
@Bart Cuylaerts - Encoded URLs are only for the browser; most modern browsers automatically show URLs in their unencoded form. If a URL is not encoded on the page, it may point to a different URL on the server if the content on the page is not encoded in the same way (eg a URL in UTF-8 on a page with ISO 8859-5 content may result in an incorrect URL on the server side).
--
OK, but if you encode your web page, URLs, server scripts, database content, everything, in UTF-8, there are zero problems and everything works fine. Don't encode URLs. It isn't good for SEO.

pascal said...

@John Mueller: Yes, the same site that has the multiviews article. I actually extended the article slightly.

Up to now, it seems to work great. One caveat, though: MSIE behind a proxy. The proxy in question modifies and partially strips the response headers, so that MSIE is left with the Vary header, which is poorly supported. The language selection won't work as advertised unless you reload each page manually. Or you need to disable cookies to make it work behind this particular proxy.

info said...

Great post. I am using English and Greek for my website. I've set it to UTF-8. Do you think that Google ignores the Greek URLs? Thanks

ZenCocoon said...

@info I believe that @John Mueller meant that Greek URLs are fine for Google:

"@ZenCocoon - Both are fine for us and both kinds can be found in the search results."

@John Mueller : "Which one would users search for and click on?"

I guess they would search in native Greek, but would click on and expect Latin-converted URLs. I believe most people are not used to encoded domains and URLs and usually don't even know they are possible. (Like me a few days ago for the URLs ;-))

The other factor to consider as a webmaster when using encoded URLs is visitors without a native keyboard.
For example, a Greek visitor who is traveling finds himself using a QWERTY keyboard in a cybercafe. He remembers the URL in Greek, http://example.com/el/Επικοινωνία, but can't type it directly in the address bar as he doesn't have access to Greek characters.

I think of this as a user experience issue. Sure, he can go back to http://example.com/el/ and move through the pages again, but does this feel natural to you?

P.S.: Encoded domains are even worse: without a native keyboard, you must either search for the site or remember the ASCII version of the URL.
For example, Bücher.ch (in German) is xn--bcher-kva.ch in ASCII. Do you find this user-friendly?

Thanks a lot for this great article and comments; they bring detailed answers to complicated problems, and it's much appreciated.

Allen Brown said...

@John Mueller - thanks for that, but how can I mark the different sections of the page in a way that Google won't ignore? Why work it out (and potentially get it wrong) if the author already knows the language he's using?

adwords said...

@John Mueller and @Jose Nobile: Thx, good point ;-)

Samuca Joe said...

"And last but not least, keep the content for each language on separate URLs - don't use cookies to show translated versions." // YouTube doesn't follow this recommendation anymore. Why?

@----------------- said...

I would like to know whether setting the language to "pt-pt" rather than "pt-br" ('xml:lang="pt-pt" lang="pt-pt"') has much influence on how my site is ranked if it is aimed at a Brazilian audience.

My site is in 14 languages, and the Brazilian version is at www.portugues.xxx.fr

Thanks in advance

Bridget said...

We have a site with multiple country extensions (.com, .fr, etc). When a user from a country outside the US goes to our .com site, we want them to be redirected to their correct country site (for purchasing in the correct currency). What is the best way to do this without interfering with the search engine crawls? We are considering a meta refresh with content=0. Does the French googlebot have a French IP address? Or are all googlebot's IPs from the US?

Thanks!!

sudip said...

Good suggestion indeed, but what can be done if I have to optimize a site in Turkish or French only? In this case the keywords will be in the local language and thus can't be submitted to general directories. So what should our strategy be in this case?

fredp said...

You don't mention the hreflang attribute of links.
I think this is an important attribute, as it indicates which language the linked page is in.

KBSD said...

Hi,

in the specific case of a Multilingual International Website - www.domain.com - with languages in sub-folders such as /fr/ or /ru/, i have got ranking and indexing issues.

Quite tricky and advanced, question is posted here :
http://www.google.com/support/forum/p/Webmasters/thread?tid=4d5267ac05688d7b&hl=en&fid=4d5267ac05688d7b00048e3ee351baed

Please drop in if you have got info, i can't find that around !

thanks :)

freddy said...

I translated the static content on my site into 17 languages, and use visual communication where possible. For some dynamic sections, where users post the content, I use Google Translate. I limit the text to 1000 characters too. It is not perfect, but by and large it does succeed in communicating the info, even if sometimes in very bizarre ways.

chpo said...

I should probably post this as a separate thread, but having seen so much useful information regarding multilingual websites posted here, I was tempted to place my question here.
I have my home page in 3 languages with cross-links indexed by Google (I can search text from all three versions).
But in Google Webmaster Tools – Keywords there are only English keywords (I downloaded all 120 of them). Even though the content of the Russian and Hebrew versions of my home page was indexed, the crawler did not pick up any keywords from these versions.
Do I have to do something special to have the crawler consider keywords in languages other than English?

Alice said...

Is it preferable to use lang=en or lang=fr in the URL, i.e. passing the language as a query string, or is a static URL preferable?

e.g.
www.domain.com/countryname/en/
www.domain.com/countryname?lang=en

Which one is better to use from a search engine's point of view?

Ryan and Barbs said...

To all those praising Google Translate: it is definitely not there yet. It will, however, get there. I found that the conversion rates on my site were horrible when using Google Translate (Spanish and Portuguese). I had it professionally translated by pallavicini translation (http://www.pallavicinitranslation.com) and the change was incredible.

Eric LeRiche said...

From an SEO perspective do you recommend creating the home page to offer languages and then redirect them to the proper pages?

My issue here is that the index page ends up with no content...

Comments?

ronzo said...

Hi all, I would appreciate some advice on how to best implement multilingual sites to boost our presence in foreign countries and raise the profile on localised Google etc

So

We already have two domain names

graphskill.com and graphskill.co.uk

I built the website on graphskill.co.uk, then the boss registered the .com, so I just put a redirect on the .com.

I have Google Translate on our current site, so if anyone does get there, they can change the language, but that is not sufficient.

If an English-language user goes to Google and types in one of our products (say, 20 mm stainless steel U-bolt), we appear fairly high up the rankings (could still do better). If someone from Germany does the same and enters the same phrase, we appear. BUT they won't; they will type in the German version, so Google will not return our results.

So how do I go about this?

I am guessing a .de website will appear higher up the google.de search results, etc.

Do I register a .de website and place a whole new site there (obviously there are costs involved with the new name, but we can go with that), or can I register a subdomain (free), say www.de.graphskill.co.uk, and place the German site there?

Perhaps I could register lots of subdomains (de.graphskill, fr.graphskill, etc.) and put a new foreign-language site in each of them, then perhaps have

graphskill.com > main page for the company, where users can choose their local site, which directs them to de., fr., etc.

Any ideas what would be best (SEO-wise) to do?

Google Webmaster Central said...

Hi everyone,

Since over a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Central Help Forum.

Thanks and take care,
The Webmaster Central Team