Give your web app international appeal with PHP, Part II

In part one we covered the basics of how to get your website internationalised. In this part we're going to take things a stage further and look at some of the real world problems you might encounter when working with other languages.

Plurals

Dealing with plurals is the first challenge when internationalising a website. Let's imagine you're building a blogging engine and want a simple label at the bottom of each post to say how many comments have been posted. You could begin by creating two entries in the PO file, one for 'comment' and one for 'comments', then choosing one based on how many comments there are. The problem with this approach is that it assumes every language has exactly two forms, singular and plural, as English does. This is not always the case with other languages.

Eastern European languages, for example, quite often have three plural forms, while some Asian languages such as Japanese don't use plural forms at all. The plural forms section of the gettext manual has the following example of how they work in Polish when describing a number of files (plik):

1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w

As you can see, Polish has three different forms (plik, pliki, pliko'w). Other issues arise in languages like Spanish, where other words change depending on plurality, for example:

the red car
el coche rojo

the red cars
los coches rojos

In this case just altering the Spanish word for car (coche) wouldn't be enough since the article and adjective need to be changed as well.

So how do we compensate for this? It turns out gettext already has a way of dealing with it: the function ngettext, which we use instead of gettext wherever we need plurals. It works the same as gettext but takes two extra parameters: the first is the sentence in its plural form and the second is the number the plural relates to, so this might be the number of comments or, in this case, the number of cars.

Our PHP will look like this:


  //Old singular form
  echo gettext("the red car");

  //New plural form
  echo ngettext("the red car", "the red cars", $numberOfCars);

When we generate our PO file using xgettext it will see that we've used ngettext and generate a place for our translator to type in the singular and plural forms of the whole sentence, like so:


  msgid "the red car"
  msgid_plural "the red cars"
  msgstr[0] "el coche rojo"
  msgstr[1] "los coches rojos"

The number in square brackets indicates the type of plural, so for Polish we would have:


  msgid "file"
  msgid_plural "files"
  msgstr[0] "plik"
  msgstr[1] "pliki"
  msgstr[2] "pliko'w"

But if we wanted to print '42 files' how would gettext know which of the three plurals to pick? We have to tell it through a formula. Somewhere near the top of your PO file (you may need to open it in a plain text editor for this) you need to add something similar to the following to describe your language. In this example we'll just use English:


  "Plural-Forms: nplurals=2; plural=n != 1;\n"

So what on earth does that mean? nplurals tells gettext how many plural forms the language has (English has two). plural then defines the formula, where n is the number passed to ngettext as its third parameter. In this case if n == 1 then plural evaluates to 0 (use msgstr[0], the singular), otherwise it evaluates to 1 (use msgstr[1], the plural). Thankfully we can reuse this for languages such as German, Spanish, Italian and Dutch. For other languages such as Polish and Czech the formula becomes a bit more complicated:


  Plural-Forms: nplurals=3;
                plural=n==1 ? 0 :
                n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2; //Polish

  Plural-Forms: nplurals=3;
                plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;  //Czech
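To see how such a formula picks a form, it can help to write it out as ordinary PHP. The following is only a sketch: polish_plural_index is a made-up helper name, and it simply mirrors the Polish expression from the gettext manual (n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20)). In a real site gettext evaluates the formula for you.

```php
<?php
// Hypothetical helper: mirrors the Polish Plural-Forms expression from the gettext manual.
// gettext does this internally; this function exists only to illustrate the logic.
function polish_plural_index(int $n): int {
    if ($n == 1) {
        return 0;    // msgstr[0]: "plik"
    }
    if ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) {
        return 1;    // msgstr[1]: "pliki"
    }
    return 2;        // msgstr[2]: "pliko'w"
}

// Print the form chosen for a few sample counts.
$forms = array("plik", "pliki", "pliko'w");
foreach (array(1, 3, 5, 12, 22, 25, 42) as $n) {
    echo $n . " " . $forms[polish_plural_index($n)] . "\n";
}
```

Under this formula 42 ends in 2 and is not in the 12 to 14 range, so '42 files' uses the second form, pliki.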

Fortunately the plurals section of the gettext manual has a list (they start about half way down) of most of the formulas you're ever likely to need, so you can just copy and paste them into your PO file header. This saves much banging of your head against the wall whilst trying to work out terrifying formulas for languages like Russian:


  Plural-Forms: nplurals=3;
                plural=n%10==1 && n%100!=11 ? 0 :
                n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;

Yikes!

Dynamic Data

One of the problems mentioned briefly in part one was dynamic data. For example: what if you want to place a user's name into a sentence: “John likes apples”. This could be approached like this:


  echo $name . " " . gettext("likes apples");

What happens when our new language needs to put something in front of the name? For example in Spanish this sentence would be:

A John le gustan las manzanas

An 'a' has been added to the start of the sentence. It's not optional and we need a way to allow our translator to add it. The simplest approach to this is to use something like sprintf e.g.


  echo gettext("%s likes apples");

Our translator can then provide the following translation in the PO file:


  msgid "%s likes apples"
  msgstr "A %s le gustan las manzanas"

Our PHP then becomes:


  echo sprintf(gettext("%s likes apples"), $name);
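The same pattern combines with ngettext from the plurals section, so the comment counter imagined earlier could be written along these lines. This is a minimal sketch: comment_label is a made-up helper name, and the fallback branch is only there so the example still runs on a PHP build without the gettext extension.

```php
<?php
// Hypothetical helper for the blog example: builds the "N comments" label.
function comment_label(int $count): string {
    if (function_exists("ngettext")) {
        // With the gettext extension loaded, the PO file's plural rules pick the form.
        $template = ngettext("%d comment", "%d comments", $count);
    } else {
        // Fallback so the sketch runs without the extension (English rules only).
        $template = ($count == 1) ? "%d comment" : "%d comments";
    }
    // The count is inserted after translation, so translators only ever see %d.
    return sprintf($template, $count);
}

echo comment_label(1) . "\n";
echo comment_label(3) . "\n";
```

The translator then sees "%d comment" and "%d comments" as a msgid and msgid_plural pair and can place the number wherever their language needs it.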

Sprintf also allows for numbered arguments, which means the translator can swap their order if they need to. Numbered arguments can look a bit confusing to non-techies, so you could write your own version of sprintf that uses a simpler numbered syntax. On Diarised this is exactly what we did, which means instead of seeing something like %2$s, our translators see {2}.
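As a rough illustration of the two syntaxes, here is a sketch of a {n}-style formatter next to standard sprintf numbered arguments. The function name simple_format and its behaviour for unknown placeholders are assumptions made for this example; it is not the Diarised code.

```php
<?php
// Hypothetical helper (not the Diarised implementation):
// replaces {1}, {2}, ... with the matching argument, counting from 1.
function simple_format(string $template, ...$args): string {
    return preg_replace_callback('/\{(\d+)\}/', function ($m) use ($args) {
        $index = (int)$m[1] - 1;
        // Leave unknown placeholders untouched rather than failing.
        return array_key_exists($index, $args) ? (string)$args[$index] : $m[0];
    }, $template);
}

// Standard sprintf numbered arguments, for comparison.
echo sprintf('%2$s was invited by %1$s', "John", "Maria") . "\n";
// The simpler syntax a translator would see.
echo simple_format("{2} was invited by {1}", "John", "Maria") . "\n";
```

Both calls produce the same sentence; the only difference is what the translator has to read in the PO file.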

While sprintf is good for situations like the one above it's not so great for links. At present you may need to stick something like this in your PO file:


  msgid "<a href=\"www.example.com\">click here</a> to visit our home page"

Just as we didn't want to give our translator PHP code, we shouldn't be giving them HTML either. While we could break it down and add entries for both 'click here' and 'to visit our home page', this isn't a very good solution since your translator will almost certainly need to see the whole sentence in context in order to produce an accurate translation.

For Diarised we wrote a sprintf style function to look for square brackets and turn the text inside them into links e.g.:


  echo linkprintf("[click here] to visit our home page", "www.example.com");

This means we can hide HTML from our translator and also keep the URLs out of the translation file, so our translator sees:


  msgid "[click here] to visit our home page"
  msgstr "[haz clic aquí] para visitar nuestra página de inicio"
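The Diarised function itself isn't shown here, so the following is only a guess at how a linkprintf-style helper might work. It assumes each [bracketed] phrase is paired with the next URL passed in, and that a phrase with no URL left is printed as plain text.

```php
<?php
// Hypothetical sketch of a linkprintf-style helper; the real Diarised function may differ.
// Each [bracketed] phrase is wrapped in a link to the next URL in the argument list.
function linkprintf(string $text, string ...$urls): string {
    $i = 0;
    return preg_replace_callback('/\[([^\]]+)\]/', function ($m) use (&$i, $urls) {
        if (!isset($urls[$i])) {
            return $m[1];    // no URL left: output the phrase without brackets
        }
        $href = htmlspecialchars($urls[$i++], ENT_QUOTES);
        return '<a href="' . $href . '">' . htmlspecialchars($m[1], ENT_QUOTES) . '</a>';
    }, $text);
}

echo linkprintf("[click here] to visit our home page", "http://www.example.com") . "\n";
```

In practice the first argument would be the translated string returned by gettext, so the brackets travel with the translation while the URL stays in the code. Note that this sketch escapes only the link text and the URL, not the rest of the sentence.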

Localising other content

As we've seen gettext is great for localising text but what about things like images, videos and audio? For Diarised we needed localised images for our Diary graphic on the home page as well as for some of the buttons. We approached this by writing a function that emulated gettext but would return localised files if available, dropping back to English by default.


  function get_localised_resource($localeFolder, $normalPath, $filename){
    $locale = $_SESSION["locale"];    //get the current locale, e.g. es_ES
    $localePath = "locale/$locale/$localeFolder/$filename";    //path to localised resource
    if(file_exists($localePath)){    //if our localised version exists, return it
      return $localePath;
    }else{
      $normalPath = $normalPath."/".$filename;    //if not check for our fallback version
      if(file_exists($normalPath)){    //if it exists, return it
        return $normalPath;
      }else{
        return null;    //otherwise return null
      }
    }
  }

  function get_localised_image($filename){
    return get_localised_resource("images", "templates/_img", $filename);
  }

Here we have a generic function that checks for a localised resource, and an example of the kind of function you could write to use it. The first parameter we pass ("images") tells our function which folder our localised content is in, so if our locale is Spanish our path will be locale/es_ES/images/.

Our second parameter is where to look if it can't find the localised version. We keep our regular images in a folder called templates/_img, but you'll obviously need to update this for your own projects. This means we can put something like this into our HTML:


  <img src="<?php echo get_localised_image("…"); ?>">

From this point, adding localised video, for example, is just a case of creating a new function get_localised_video that passes different directory names to our main get_localised_resource function. For Diarised we even went as far as doing this for our e-mails. Since these are text we could have used gettext, but it's always neater to keep content like e-mails separate from code, and this allowed us to do it.
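As a sketch of what those extra wrappers might look like, the example below uses a compact variant of the lookup that takes the locale as a parameter instead of reading the session, purely so it can run on its own. The names localised_resource_for, get_localised_video and get_localised_email, and the folder names "videos", "emails", "templates/_video" and "templates/_email", are all assumptions for illustration rather than the real Diarised layout.

```php
<?php
// Sketch only: a compact variant of the lookup that takes the locale as a
// parameter (instead of reading the session) so the example is self-contained.
function localised_resource_for($locale, $localeFolder, $normalPath, $filename){
    $localised = "locale/$locale/$localeFolder/$filename";
    if(file_exists($localised)){
        return $localised;    // localised version found
    }
    $fallback = "$normalPath/$filename";
    return file_exists($fallback) ? $fallback : null;    // default version, or null
}

// Hypothetical wrappers: only the folder names differ between resource types.
function get_localised_video($locale, $filename){
    return localised_resource_for($locale, "videos", "templates/_video", $filename);
}

function get_localised_email($locale, $filename){
    return localised_resource_for($locale, "emails", "templates/_email", $filename);
}
```

Each new resource type then costs one short wrapper, and the fallback behaviour stays in a single place.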

Automatically setting the locale

A nice touch for a localised website is to try and work out what language the user might want the website to appear in. Although there's no way of knowing for certain, we can make an educated guess that will probably get it right for most of your visitors. The way to do this is by checking the Accept-Language header sent by the user's browser when it requests a page. For example mine looks like this:


  en-gb,en-us;q=0.7,en;q=0.3

This lists three preferred languages: en-gb (British English), en-us (US English) and en (English). The last two are followed by q= and a number known as the quality value; the higher this value, the more the user prefers that language. If the value is omitted (as with en-gb) the default of 1 is used. Let's look at another example:


  es-es,de;q=0.7,en;q=0.3

This particular user is saying: send me Spanish if you've got it, if not German, and if you haven't got that then English. So with this in mind, how can we go about sending a page in the user's preferred language if it's available? In PHP you can read the header from $_SERVER["HTTP_ACCEPT_LANGUAGE"]. From there you can write a function that uses regular expressions to work out the user's preferred language, or if you're feeling lazy you can just use the one we wrote for Diarised:


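  function get_accept_language(){
    $header = isset($_SERVER["HTTP_ACCEPT_LANGUAGE"]) ? $_SERVER["HTTP_ACCEPT_LANGUAGE"] : "";
    $matches = preg_split('/,\s*/', $header, -1, PREG_SPLIT_NO_EMPTY);
    $results = array();
    foreach($matches as $match){
      if(preg_match('/;\s*q=(\d(?:\.\d+)?)/', $match, $scoreArr)){
        $score = (float)$scoreArr[1];
        $match = str_replace($scoreArr[0], "", $match);    //strip the ;q=... part
      }else{
        $score = 1.0;    //no q value means the default of 1
      }
      $results[trim($match)] = $score;    //key by language so equal scores can't overwrite each other
    }
    arsort($results);    //highest quality value first
    return array_keys($results);    //language codes in order of preference
  }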

This will return the language codes in an array sorted in the order the user wants, which allows you to search through the array until you find a language you support. It's worth remembering you should only do this as a last resort when you don't know for certain which language your user wants. So if they've chosen to see the website in French, don't force it to appear in German just because their browser says German is their preferred language.
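Searching the array for a supported language could look something like the sketch below. The function name choose_locale, the $supported mapping and the "en_GB" default are all assumptions for illustration. It takes the sorted list of browser codes as input and lets an explicit choice by the user win over the browser header, as described above.

```php
<?php
// Hypothetical helper: picks a site locale. $explicitChoice is whatever the user
// selected on the site (or null); $preferred is the sorted list of browser codes;
// $supported maps browser language codes to the locales the site actually has.
function choose_locale($explicitChoice, array $preferred, array $supported, $default){
    // An explicit choice always beats the browser header.
    if ($explicitChoice !== null && in_array($explicitChoice, $supported, true)) {
        return $explicitChoice;
    }
    foreach ($preferred as $code) {
        $code = strtolower(trim($code));
        if (isset($supported[$code])) {
            return $supported[$code];    // exact match, e.g. "es-es"
        }
        $language = explode("-", $code)[0];
        if (isset($supported[$language])) {
            return $supported[$language];    // fall back to the bare language, e.g. "es"
        }
    }
    return $default;    // nothing matched
}

$supported = array("en" => "en_GB", "es" => "es_ES", "fr" => "fr_FR");
echo choose_locale(null, array("es-es", "de", "en"), $supported, "en_GB") . "\n";
echo choose_locale("fr_FR", array("de", "en"), $supported, "en_GB") . "\n";
```

Matching on the bare language as well as the full code means a visitor asking for es-mx still gets Spanish rather than the default.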

Conclusion

That's it. Hopefully, with the help of these articles, you'll be able to start producing your own internationalised websites. We've covered a lot but actually this is just the tip of the iceberg. Other issues include formatting dates, times, numbers, names and telephone numbers. There are also region-specific differences even between countries that speak the same language; for example, the US has zip codes while in the UK we have postcodes. I'll leave you to figure out some of this for yourselves!

If you'd like to learn more about this, the internationalisation section of the W3C website is a great start. They have lots of useful information on things such as date formats, forms and character sets, as well as a nice list of tips that cover most of the issues you might run into. I'd also highly recommend reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) from Joel on Software. Character encoding issues can be a complete nightmare, so it's well worth trying to understand how they work so you can avoid them.
