Wednesday, October 31, 2007

Google Tech Talk: Plurilingualism on the internet

Google hosts a surprising number of really interesting tech talks about language. Back in July I attended a particularly good one by Stephanie Booth, about plurilingualism on the internet. Here's the abstract:

More people are multilingual than purely monolingual. Yet the internet is a collection of monolingual silos. Where are the multilingual spaces? How can online applications assist the people who bridge the linguistic chasms, instead of hindering them? How do present applications decide what language to present? IP address or keyboard locale detection are clearly bad solutions. How could this be done better? This talk addresses some localization issues, but beyond that, questions the very way languages are dealt with on the internet.

It's definitely worth watching in full, but if you want the highlights, these are some of the more interesting ideas I took away from it:

  • Code-switching! I'd forgotten there was a term for it. Code-switching is awesome, especially as a form of word-play. Everyone should try it.
  • Stephanie is multilingual, and when she blogs she prefaces each post with a short summary in the language in which it wasn't posted (that is, posts in English get a synopsis in French, and vice versa). This lets her reach two language communities at once, without the tedium and mess of double-posting each post in full. Check out these recent examples.
  • "Some people really resent being shown languages they don't understand."
    Google develops software with a global reach, and we put a lot of care into making sure users get our products in the right languages; but this quote was an interesting reminder that getting it wrong can create a very negative experience for a particular user. Right now, for example, we use the IP address as one factor in determining which version of Google search to show. If you're browsing from a US IP, we'll show you Google in English; if you're browsing from a French IP, we'll show you Google in French. But what if you're browsing in Switzerland? Should we show you the German, French, Italian, or Rumantsch version? We generally default to German, which is statistically the right answer, but for all the French, Italian, and Rumantsch speakers it is clearly the wrong one. And what about someone from China who's road-tripping across Europe? She's probably going to want to see Google in Chinese, rather than being served a different language every time she logs on.
  • The lang and hreflang attributes are underutilized, and they have real potential for helping software understand documents and hyperlinks. The most common use of lang is in the <html> tag, to define the language of an entire webpage: <html lang="en-US">. But you could also use it to mark smaller subsections: put it on a <blockquote> tag when you're quoting a different language, or on a <div> or a <p> if you have a section of text in a different language (for example, a summary at the top of a blog post!).

    The hreflang attribute is even more interesting to me, since I'd never heard of it before. From W3C:
    The hreflang attribute provides user agents with information about the language of a resource at the end of a link, just as the lang attribute provides information about the language of an element's content or attribute values.
    So if you link to a cool website in Spanish, you could add hreflang="es" to the <a> tag: <a hreflang="es" href="">. The thing about these attributes, though (especially hreflang), is that they're underutilized because no technology takes advantage of them, and no technology takes advantage of them because they're underutilized. If we ever find a way to break out of this Catch-22, I could imagine some cool opportunities (visualizations for language targets, applications in search and social networking... the sky's the limit!).
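
To make the two attributes concrete, here is a minimal sketch of a page that uses both. The page text, the French summary, and the example.com URL are placeholders invented for illustration; they are not taken from the talk or from any real post.

```html
<!DOCTYPE html>
<!-- lang on <html> sets the default language for the whole page -->
<html lang="en-US">
<head>
  <meta charset="utf-8">
  <title>Example post</title>
</head>
<body>
  <!-- A short summary in the other language, marked with its own lang -->
  <p lang="fr">Résumé : ce billet parle des attributs lang et hreflang.</p>

  <p>The main text of the post is in English.</p>

  <!-- A quotation in a different language from the surrounding text -->
  <blockquote lang="es">
    <p>Hola, mundo.</p>
  </blockquote>

  <!-- hreflang describes the language of the link target;
       the link text itself is still English here -->
  <p>
    See <a href="https://example.com/" hreflang="es">this site in Spanish</a>.
  </p>
</body>
</html>
```

The attributes mostly carry information for software (screen readers, search tools, stylesheets) rather than changing what a reader sees on the page.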

1 comment:

JohnMu said...

How about using the hreflang attribute together with CSS to show the user which language the link target's content is in:

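a[hreflang~=es] {
  /* flag icon to the left of the link text */
  background: url('/flags/es.gif') no-repeat center left;
  /* leave room for the icon; adjust to the flag's width */
  padding-left: 17px;
}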

You'd have to add styles for all the languages and adjust the padding to match the size of the flag. Most web-stats packages have a collection of flags that could be re-used (obviously check the license first).