Virtual OS/2 International Consumer Education
VOICE Home Page: http://www.os2voice.org
December 2001

editor@os2voice.org


Multi-language websites with Apache web server

By Christian Langanke © December 2001


Introduction

I have been using OS/2 since 1990 and have developed several freeware programs for it during that time. Some years ago I decided that it would be nice to have my own website, where I could offer my programs as well as some documentation. Just as with most of my programs, I wanted my website to support at least two languages: German, my native language, and of course English, to reach the many OS/2 users who do not speak German. Until a few weeks ago, my website used HTML hyperlinks to distinguish between the two languages; in other words, I hardcoded the language information in the filenames and directory names used in the HTML links.

After gaining more experience with the Apache web server, I learned about a dynamic technique called MultiViews, which makes it unnecessary to hardcode language information in the HTML code. MultiViews are an implementation of so-called "content negotiation" within Apache, as defined in the HTTP/1.1 specification as well as in some RFCs. This means that the server and the browser negotiate which variant of a requested document the server should return, if several variants are available.

MultiViews are designed to distinguish not only between languages but also between other types of content (for example HTML, plain text and PostScript). Automatic language selection is probably the most common use of this feature, however, and it is the only one addressed here. When I found out that my internet provider also uses Apache, I began testing how MultiViews work and very quickly changed my website to use this feature. With this article I want to share my experience with you.



Do I really need Apache for this?

The Apache web server is of course not the only web server that supports content negotiation. Other servers that do not support it yet may well be extendable to provide a similar function. For example, I recently coded a filter extension for the free GoServe web server from the IBM Employee Written Software program. With this extension GoServe supports at least the part of content negotiation that concerns language selection.

This article, however, deals with the Apache approach only. In order to use Apache MultiViews on your website, you need either a provider that uses the Apache webserver or your own machine running Apache. Of these two cases this article covers only the first: it explains in detail how you can make use of MultiViews on your provider's server, and what you could ask your provider for in case this feature is not yet configured there. With this information an advanced Apache user should also be able to configure their own server without problems.



What webserver is your provider using?

First of all, it is very likely that providers today are using Apache webservers, mostly on BSD or Linux. If you don't know what kind of webserver your provider is using, there is a simple way to find out: request an invalid URL on that webserver (just append any character to a valid URL). The webserver will, of course, complain that the requested file could not be found, and this message will contain the name of the webserver being used.

By the way, in rare cases Apache is configured to display a modified error message, for example to support error messages in different languages; in that case the admin of your server is using exactly the technique described in this article. If such a message does not mention the webserver being used, you have to send the provider's admin an email and just ask.

[Editor's note: An alternative method is to go to http://uptime.netcraft.com/up/graph and enter the URL and it will tell you the server and operating system being used.]

If your provider does not use the Apache server, and you do not want to use conventional HTML links to distinguish between languages, there are other ways to support automatic language selection on the server side. These are based on other mechanisms such as CGI and PHP and are not discussed here.



What Apache MultiViews can do for you - from the user's perspective

Do you know multilingual web sites where you first arrive at an entry page and have to click on a flag to get the site in your preferred language? Having to do this every time annoys me quite a lot, and I always think "hey, can't this site remember what language I want?".

With Apache-driven websites supporting MultiViews this is no longer an issue. All you have to do as a user in order to take advantage of MultiViews is install a browser in your native language. The website then automatically and seamlessly takes care of your language. But how is that accomplished?

Did you ever notice the language settings dialog in the preferences of your browser and think "well, what is this used for anyway?". You may have guessed it already: it tells your browser which language or list of languages you prefer to receive from web servers. Usually you do not even have to maintain this list, as a web browser adds its own language to the list during installation. So if you install a web browser in your native language, the browser automatically treats that language as your preferred one.

The browser sends the list of preferred languages to a web server with every request. The bad news is that most servers and/or the websites hosted on them just ignore this list. The good news is that websites supporting Apache MultiViews make use of it: if the requested page is available in one of your preferred languages, the page is returned in the first preferred language supported. If not, the page is returned in a fallback language (usually English).
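
The selection rule just described can be sketched in a few lines of Python. The browser transmits its list in the HTTP Accept-Language request header, where an optional q value ranks each entry. The function names below are hypothetical, and the treatment of regional tags such as de-AT is an assumption of this sketch; it illustrates the idea only and is not Apache's actual negotiation algorithm.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # unreadable q value: treat the entry as unwanted
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            entries.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(entries)]


def choose_language(header, available, fallback="en"):
    """Pick the first preferred language the site offers, else the fallback."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # assumption: "de-at" may be served as "de"
        if primary in available:
            return primary
    return fallback


if __name__ == "__main__":
    # A French-speaking user who also reads German and, less willingly, English.
    print(choose_language("fr, de;q=0.8, en;q=0.5", {"en", "de"}))
```

For a site offering only English and German, the example request ends up with the German page, because French is not available and German ranks above English.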

As a result, as a user you will perhaps not even notice that a given web site supports multiple languages; you simply see the language that fits your needs best. No more clicking on flags, no more chasing links to read a website in your favourite language: just enter the web site and start to read...

The rest of this article deals with the implementation of that neat feature on the server side. If you are only a user and/or don't care about hosting a multilingual website, you are done now and will possibly just be glad about every website that serves you well using MultiViews. If you own a multilingual website or are just interested further, here is the rest of the story.



What Apache MultiViews can do for you - from the webmaster's perspective

MultiViews also help the webmaster a lot. The other techniques used so far to support multiple languages all require language identifiers that are either hardcoded within the links in your HTML code or passed as additional parameters to scripts (although a dynamic scripting method is far better than hardcoding language information).

If you use MultiViews instead, managing content in different languages is completely transparent to the HTML code, as the server does all the work required to distinguish between content in different languages. Here is how it works.



MultiViews in short

Just as with any other technique, content in different languages has to be provided in separate files, one file for each language. The trick is that language identifiers are not embedded within the filename or pathname of a file, but appended as additional file extensions to the names of files containing the web pages.

The huge difference from other techniques is that the browser does not select between files with content in several languages according to the links within your HTML code. Instead the server checks for the additional filename extension to distinguish between language variants. Moreover, this method is completely transparent to the web browser, as the language identifiers are not even sent to the browser. This results in a very convenient side effect: you can link within your files without specifying the language identifiers, having exactly the same links for every language and thus avoiding errors during translation.

At first this sounds as if one could add any kind of file extension to a filename, but that is not true - Apache only supports known or additionally configured MIME types and language identifiers.

In the sample coming up now, we use en as the identifier for the English language and de for the identifier for the German language, as defined in RFC 1766. Refer to the language selection dialog within your browser settings to see more language identifiers.

Now let's think of a single file named index.html to make the explanation easier. In order to support English and German, you would normally have either two files on your web space whose names contain the language, or two files named index.html in separate directories, one directory per language.

This means that the language identifiers have to be hardcoded in the links to your files. If you translate your website into an additional language, you have to adjust the links to your own pages for the new language, otherwise you get links to the wrong language.

Now prepare for MultiViews by naming the files like this, where English is the fallback language:

  index.html.en
  index.html.de
  index.html.html

Note:

While the purpose of the first two files is obvious, why do we need the third file? This one is returned by Apache if the user has set neither English nor German as a preferred language, but some other language; unfortunately Apache does not cooperate in a sensible way here. To provide a fallback variant we cannot use the name index.html and, for obvious reasons, we also cannot use another language identifier as the additional filename extension. Instead we may only use a valid MIME type extension here, and logically we reuse the one that properly describes the contents of the file, which is .html again. The same scheme applies to PHP files: in this example the fallback variant would be named index.php.php.

On your own Unix server you will most likely use a symbolic link within the filesystem to access index.html.en through index.html.html, so that you don't have to provide the English version twice. On an OS/2 machine, or when you don't have telnet access to your Unix-based server to create such links, you have to use a copy for the fallback variant or use server-side includes, unfortunately wasting some hard disk space.
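
If you prepare the files on a local machine, the choice between link and copy can be left to a small script. The following is a minimal sketch in Python, assuming the naming scheme from above; the function name make_fallback and its parameters are hypothetical. It tries to create a symbolic link first and falls back to a plain copy where links are not available.

```python
import os
import shutil
import tempfile


def make_fallback(directory, page="index.html", fallback_language="en"):
    """Create the fallback variant (e.g. index.html.html) for one page.

    Returns "symlink" or "copy", depending on what the filesystem allowed.
    """
    extension = page.rsplit(".", 1)[-1]          # "html" for index.html
    source = f"{page}.{fallback_language}"       # e.g. index.html.en
    target_path = os.path.join(directory, f"{page}.{extension}")
    if os.path.lexists(target_path):
        os.remove(target_path)                   # replace an outdated fallback
    try:
        # Relative link target, so the link survives moving the directory.
        os.symlink(source, target_path)
        return "symlink"
    except (OSError, NotImplementedError, AttributeError):
        # No symbolic links here: provide the content twice instead.
        shutil.copyfile(os.path.join(directory, source), target_path)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, "index.html.en"), "w") as handle:
            handle.write("<html>Hello</html>")
        print(make_fallback(folder), sorted(os.listdir(folder)))
```

Note that a symbolic link created locally is usually not preserved by an FTP upload, so in that case the link has to be created on the server itself or the copy has to be uploaded.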

But all in all: isn't that simple?
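
To make the lookup described in this section more tangible, here is a simplified sketch in Python. It only mirrors the behaviour as described in this article (a language variant if one matches, otherwise the variant with the repeated MIME type extension); the function name pick_variant is hypothetical, and Apache's real negotiation considers more criteria than this.

```python
def pick_variant(requested, filenames, preferred_languages):
    """Choose the file to serve for `requested` from the names in one directory.

    `preferred_languages` is the user's language list, best first.
    Returns None if the directory holds no variant of the requested page.
    """
    base_extension = requested.rsplit(".", 1)[-1]   # "html" for index.html
    variants = {}
    for name in filenames:
        if name.startswith(requested + "."):
            suffix = name[len(requested) + 1:]      # "en", "de" or "html"
            variants[suffix] = name
    for language in preferred_languages:
        if language in variants:
            return variants[language]
    # None of the preferred languages is available: use the fallback variant.
    return variants.get(base_extension)


if __name__ == "__main__":
    files = ["index.html.en", "index.html.de", "index.html.html"]
    for languages in (["de", "en"], ["en"], ["fr"]):
        print(languages, "->", pick_variant("index.html", files, languages))
```

The links in the pages always say index.html; which of the three files is delivered depends only on the language list sent by the browser.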



Are MultiViews already available on your web server?

The Apache server package is usually distributed with sample configuration files that already specify the MultiViews option. So it is very likely that your provider has also configured his Apache to support MultiViews and that you don't have to deal with any Apache configuration issues at all.

To find out if that is the case, just upload test files according to the above sample onto your webspace and request index.html within your browser (without appending the language identifier to the filename!). Then play around with the language settings within the preferences of your browser and reload the document after each change. With German first in the list you should receive the German page, with English first the English page, and with neither of the two languages in the list the fallback variant.

If MultiViews work for you, be happy - you can skip the next two sections, which describe what to do when it does not work at all.
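
Instead of changing the browser preferences again and again, the same test can be scripted. The following Python sketch sends requests with different Accept-Language headers and prints the Content-Language header of each reply, which Apache normally sets for a negotiated language variant. The URL is a placeholder and the function names are hypothetical; replace the URL with the address of your own test page.

```python
import urllib.request


def build_request(url, languages):
    """Build a GET request that announces the given language preference list."""
    header = ", ".join(languages)
    return urllib.request.Request(url, headers={"Accept-Language": header})


def fetch_variant(url, languages, timeout=10):
    """Return (Content-Language header, first 200 bytes) of the returned page."""
    request = build_request(url, languages)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Language"), response.read(200)


if __name__ == "__main__":
    # Placeholder address: use the index.html of your own test directory.
    url = "http://www.example.com/index.html"
    for languages in (["de", "en"], ["en", "de"], ["fr"]):
        print(languages, fetch_variant(url, languages))
```

If the printed page fragment changes with the language list, MultiViews are active for your webspace.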



Configuring your webspace for MultiViews

So the above example did not work for you? At this point it seems that the admin of your provider has not allowed MultiViews in the main configuration of Apache. Since the usual sample files for the Apache main configuration include this option, he has very likely removed it (the question is whether on purpose or by accident).

In this situation there is hopefully another option left for you: you can modify Apache's behaviour for the files residing in your web space, provided that at least this is allowed within the main configuration of Apache. You do that by creating an additional configuration file named .htaccess and uploading it to your webspace. You can have one per directory, but since settings are passed on to subdirectories, one .htaccess file in the top level directory of your web tree is sufficient in most cases. A common exception to this rule is a .htaccess file that restricts access to a certain subdirectory; such a file is placed only in that directory. Note that the value of a directive in the .htaccess file of a subdirectory overrides the value from the .htaccess file in the parent directory, and so forth.

The .htaccess file may contain many different directives, for example to restrict access to a directory or, as discussed here, to set options such as MultiViews.

Now let's see whether a .htaccess file does the MultiViews trick for you. This requires a line with the directive Options and at least the parameter MultiViews, like

  Options MultiViews

If you define other options as well, specify them together on one line, as in one of these examples:

  Options MultiViews <other_option> <other_option>

or

  Options <other_option> MultiViews <other_option>

Now upload a file named .htaccess containing the Options line to the directory where the files of the above sample reside.

IMPORTANT: Do not edit the .htaccess file on your local system with an editor that appends an end-of-file character (ASCII code 26) to the file, like the Tiny Editor (TEDIT.EXE) and the OS/2 system editor (E.EXE) do. Apache will ignore such a file even when uploaded in ASCII mode!
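
If you are not sure whether your editor appends such a character, you can check and repair the file before uploading it. This is a minimal sketch in Python with hypothetical function names; it removes only trailing end-of-file characters and leaves everything else, including the line endings, untouched.

```python
EOF_CHAR = b"\x1a"  # ASCII code 26, appended by some DOS and OS/2 editors


def strip_eof(data):
    """Remove trailing end-of-file characters from the raw file contents."""
    return data.rstrip(EOF_CHAR)


def clean_file(path):
    """Rewrite the file without trailing EOF characters; return True if changed."""
    with open(path, "rb") as handle:
        original = handle.read()
    cleaned = strip_eof(original)
    if cleaned != original:
        with open(path, "wb") as handle:
            handle.write(cleaned)
        return True
    return False


if __name__ == "__main__":
    # Demonstration on in-memory data, as an editor might have saved it.
    print(strip_eof(b"Options MultiViews\r\n\x1a"))
```

Calling clean_file(".htaccess") in the directory where you edited the file reports whether an end-of-file character had to be removed.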

Now test MultiViews by requesting index.html in the directory containing the sample files above. The following may occur:

  1. Apache correctly loads the language variants.
    Bingo! You're done!

  2. Apache complains about a configuration error.
    This means that the admin does not allow you to use your own .htaccess files. Delete the .htaccess file from your webspace again and read on in the next section.

  3. Apache does not load the language variants, but just reports that index.html could not be found.
    This means that the admin changed the default name of .htaccess to a different value. You have to ask him what filename you have to use instead.



My provider does not allow MultiViews, what do I do now?

If you find out that the admin of your provider has configured the Apache server neither to allow MultiViews by default nor to allow you to use your own .htaccess configuration files, you might want to send him an email explaining your intention and kindly asking him to allow either method.

If he agrees, very good! If not, I can think of only two reasons why he might not want to do this:

  1. He is not sure whether this could be a security issue, because he does not understand the feature. You can safely assure him that this is not the case. To do so, point him to the Apache documentation about content negotiation, which describes the MultiViews feature in detail.

  2. He might think that this could decrease the performance of the server, as stated in the Apache documentation about content negotiation. Assure him that you won't have more than a few files within one subdirectory, so that the webserver will not spend much time searching for language variants - only a very large number of files within one directory will slow down Apache when MultiViews are used in exactly that directory. This point probably becomes clearer when reading further within the documentation about how MultiViews work in detail.

Note:



Migrating multilingual websites to MultiViews

Now that MultiViews are working for you, you may already have a multilingual website that uses HTML links containing hardcoded language information. If so, migrating to MultiViews first of all means work for you before you can benefit from this method in the future, which requires less effort to maintain your website and offers a more convenient interface to the user.

The migration method depends on how you have used language identifiers embedded within file and/or directory names until now:

There is one option left for you to get started with MultiViews when you already have a multilingual website: just replace the main index file with a MultiViews version, but link to your other web pages, which have not yet been migrated.

Advantages are:

Disadvantages are:



Migrating monolingual websites to MultiViews

If your website currently supports only one language and you want to add another language now, you can manage that easily using MultiViews, and moreover this allows a very smooth migration. Since you do not already have language identifiers embedded in filenames and directory names, you can migrate your pages one by one without changing any links.

You will most likely convert complete subdirectories, but in the following example we have migrated and non-migrated files in one subdirectory in order to make clear that this really can be done file by file. Here the first two pages are available in two languages plus the fallback variant, while the last two pages are available in only one language. This really allows a smooth migration:

Again it must be stressed that none of the links to these files needs to be changed, since the MultiViews method is completely transparent to the browser! None of the links includes the language identifier.
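
Which files a single migrated page needs can be derived mechanically from its name. The following Python sketch lists them according to the naming scheme described earlier; the function name and the page names in the demonstration are hypothetical examples, not files from my website.

```python
def multiviews_names(page, languages):
    """Return the file names to provide for one page, fallback variant last."""
    extension = page.rsplit(".", 1)[-1]              # "html" for index.html
    names = [f"{page}.{language}" for language in languages]
    names.append(f"{page}.{extension}")              # e.g. index.html.html
    return names


if __name__ == "__main__":
    # Hypothetical pages: the first is migrated, the second stays as it is.
    print(multiviews_names("index.html", ["en", "de"]))
    print(["download.html"])
```

A page that has not been migrated yet simply keeps its single file, and links to either kind of page look the same.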




Conclusion

I have been really excited about implementing the MultiViews feature on my own homepage ever since I read about it, and was very happy to find out that my provider, although it does not allow me to use my own .htaccess files, already has its web server configured to support MultiViews anyway.

From a webmaster's point of view: whenever I have to support a multilingual website in the future, I will try to avoid any other method at all costs, because MultiViews are flexible, intuitive and far less error-prone than any other method that I know. They can also easily be combined with other techniques such as PHP scripts, in which case no language parameter is required anymore.

From a user's point of view: I also greatly appreciate that MultiViews take care of my language preferences, and it is very simple for users to configure the language list within the preferences of their web browser (in the rare cases where that is required at all).

Now that I know the MultiViews technique, which can seamlessly and automatically integrate multiple languages into a website, I would use flags and other hardcoded links only if I did not have an Apache web server at all. I think that most users do not care how many languages a website supports, as long as their preferred language or the best alternative to it is served automatically. So in my opinion the best integration of multiple languages into a website is the one that you hardly notice, if at all.

References:

Christian Langanke's home page: http://www.clanganke.de/
Apache documentation (content negotiation): http://httpd.apache.org/docs/content-negotiation.html
Apache web server home page: http://httpd.apache.org/
IBM GoServe web server home page: http://www2.hursley.ibm.com/goserve/
GoServe filter extension for content negotiation regarding language selection: http://www.clanganke.de/os2/archive/gosmv100.zip


Christian Langanke has been professionally involved with OS/2 for 10 years and deals with topics ranging from installation and software distribution through network integration of OS/2 systems and applications to intra-/internet and host connectivity.

On his homepage he provides several self-created OS/2 programs for free. He is also the author of the forthcoming Team OS/2 Internet Assistant for OS/2 and eComStation and supports OS/2 Netlabs in providing CVS services for OS/2 internet projects such as Odin and Everblue with his Netlabs Open Source Archive Client and Administrator packages.

