Translating taxonomies and categories

So as I mentioned, Livia, Jorge and I have been looking into international information architecture. What happens when you run a site in multiple languages/locales and need to manage the information architecture of that site? Can you just translate a taxonomy from one language to another? We are gathering a lot of material, and we’ll start sharing it and opening up the conversation. I plan to write a series of blog posts on international or global IA, of which this is the first.

Let me move this along – I’d like to talk about translatability of categories a bit today. There is much more to come, later. This is kind of a long post, and rambling, too, so bear with me.

Say I run a website with recipes. We have an extensive soup section. Now, the word “soup” isn’t just a word, it is also a category, in which you can classify many particular soups, for example, the soup you ate today (a specific instance of a soup). “Tomato soup” is also a category; on our site it’s a subcategory of “soups”. So you can see there is a difference between a word and a category. A category might be “everything about this company”, in which we can put all the information about a company, and it has a label “About us”, consisting of two words.

Information architects like to group things together into categories to make them easy to find. We’ve done research with our users and it turns out many of them would like, on a recipe site, to have a look at various chunky soups. Chunky soups are soups with big bits floating in them. Some people like chunks in their soup – I certainly do, most of the time.

In the US, “chunky soups” is a well-known category (ask anyone what it is, yay). So we’d like a link on our site saying “chunky soups”, under which you can then find various types and examples of those soups.

First, we have a labeling problem in English: “chunky soups” is actually a trademark of Campbell (a big soup maker). You’re legally not allowed to use it. But we strike a deal with them and they let us use it.

So “chunky soups” is a category, and even better, a category (and a label) that our users understand. Some particular soups will be part of the category, others won’t. We start using it on our site, our users find what they need. Peachy.

Then we start developing a Spanish version of our site. And a French one, at the same time. We want to get into these markets.

Can we just translate the category “chunky soups”, and if so, can we use the same soups within that category? Is the category even relevant to Spanish users? I asked some Colombian Spanish friends (a wholly unscientific survey) how they classify soups. They said, a soup is either a “sopa”, a “caldo” or a “crema”. Chunky soup as a category doesn’t seem relevant for this user group.

Let’s be clear: it’s not that chunky soups don’t exist in Colombia. Colombians have chunks in their soups, big ones, I’ve tried them. It’s that the category “chunky soups” just isn’t used in daily life, so it isn’t relevant. In practice, it doesn’t exist. I don’t think our Spanish-speaking users (if my little survey extends to all Spanish speakers) will look for chunky soups on our website.

Let me be even more clear. I am not talking about dictionary definitions here. I am not trying to find out what the “real” meaning of chunky soup is, or what the real meaning of a “caldo” is. I ask my users – the way they classify things is what matters. Looking it up in a dictionary only helps so much. Asking a chef for the “correct” translation is problematic too: you want the category used and understood by users, not by a domain expert like a chef.

So we seem to have an example of a category in one language/culture that doesn’t really exist (or isn’t useful) in another language/culture. A similar example is “chowder”, an English category of soup that I honestly don’t know a Dutch equivalent for. Let me stress this: it is not just that I don’t know the translation of the word. I don’t know of the existence of the category in Dutch. I’ve never heard anyone mention anything like a “chowder” soup. And I like soup! A “clam chowder” soup, for example, would as far as I know just be a “clam soup” (translated) in Dutch. No chowder involved. Through Google, I find this definition for chowder: “A thick American soup made of meat or fish and vegetables with spices. It is almost like a stew.” Note how they explain the term to English speakers from other cultures by mentioning it is “almost like a stew”.

A second issue with translating categories is something one might call “semantic overlap”. I didn’t invent the term, it seems to be well known when discussing language and words (although I am still searching for a definition). The only difference here is that I am talking about categories, not words.

Anyway, a category in one language might have an applicable translation, but that category often doesn’t mean completely the same thing. For example, the Spanish word for “house” is “casa”. But the meaning of the category “casa” might not be 100% identical to the category “house”. It is conceivable that, if you ask Spanish speakers to point out “casas” in a city, they’ll point to some building at some point that, as an English speaker, you would never classify in the English language category “house”. (I am not sure this is a valid example, my friends kick me when I start asking again “what kinds of X exist in Spanish” so I haven’t actually tested this. Better examples are welcome.)

In other words, categories in different languages often don’t mean 100% the same thing. And that missing overlap can create problems for our website. If we categorize products in one language within a category, and then translate that category, we can’t automatically assume that all the same products will be categorized under the same category.
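
To make the missing overlap concrete, here is a rough sketch that treats each category as the set of items native speakers put in it and compares the two sets. The data is an assumption for illustration only: the living room being left out of “habitación” comes from the informal asking-around described in this post, and every other membership is invented. The function name `overlap_report` is hypothetical too.

```python
# Category membership as sets of items. Apart from "living room" being
# excluded on the Spanish side, these memberships are invented examples.
room_en = {"bedroom", "guest room", "living room"}
habitacion_es = {"bedroom", "guest room"}


def overlap_report(source: set[str], target: set[str]) -> dict[str, set[str]]:
    """Split two category memberships into shared and one-sided items."""
    return {
        "shared": source & target,        # safe to carry over as-is
        "source_only": source - target,   # needs a new home in the target site
        "target_only": target - source,   # target users expect these here
    }


report = overlap_report(room_en, habitacion_es)
for part, items in report.items():
    print(part, sorted(items))
```

Anything that ends up in `source_only` or `target_only` is an item you can’t assume will sit under the translated category, which is exactly the part you would want to check with native speakers.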

I am still working out examples of this stuff. I’m not even sure I’m right with all these statements. The only way to get examples I’ve found is to ask native speakers, so it takes some time. Comments are appreciated!

A third problem with translating categories lies in the relationships between categories. Categories are often grouped in taxonomies, in trees (with varying structures). You click “soups” first, then you get subcategories like “vegetable soups” or “meat soups” or whatever.

I am not sure that you can always assume that every category in the taxonomy can be translated. Some languages might have less granularity in how they classify things in a certain domain. (I won’t mention 100 words for snow, don’t worry. I don’t think that is exactly what this is about). In other words, in English, you might have category A, subcategory AB and a subcategory of that, ABC. It is conceivable that in Spanish, there is no word for AB, just for A and ABC. I haven’t found an example yet though. Again, comments appreciated!
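
The A / AB / ABC situation can be sketched as a small tree with a label table where one entry is simply missing. This is only an illustration under assumptions: the labels “A-es” and “ABC-es” are placeholders rather than real Spanish words, and collapsing the gap (hanging ABC directly under A) is just one possible way to handle it.

```python
from typing import Optional

# Parent links for the source taxonomy from the text: A > AB > ABC.
PARENT: dict[str, Optional[str]] = {"A": None, "AB": "A", "ABC": "AB"}

# Placeholder target-language labels; None marks a category for which
# native speakers have no usable equivalent.
LABELS_ES: dict[str, Optional[str]] = {"A": "A-es", "AB": None, "ABC": "ABC-es"}


def untranslatable() -> list[str]:
    """List the source categories that have no target-language label."""
    return [cat for cat, label in LABELS_ES.items() if label is None]


def translated_path(category: str) -> list[str]:
    """Build the root-to-leaf path in the target language, skipping gaps."""
    path: list[str] = []
    node: Optional[str] = category
    while node is not None:
        label = LABELS_ES[node]
        if label is not None:
            path.append(label)
        node = PARENT[node]  # walk up towards the root
    path.reverse()
    return path


print(untranslatable())        # categories that need research
print(translated_path("ABC"))  # navigation path shown to target users
```

The point of the sketch is that the target-language tree ends up with a different shape, so the translated site can’t reuse the source navigation level for level.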

So this is all very interesting: culture-specific categories, semantic overlap of translations, translating relationships between categories. But is it practical? Have you encountered problems like this in practice? Just because it’s intellectually interesting doesn’t mean this path in our research will also turn out to be particularly practical.

A final note: translatability of categories seems to be closely related to the ambiguity of your taxonomy. In a taxonomy of countries (although Tibetans might disagree) or a taxonomy of products, there is little ambiguity, and translation should be fairly straightforward. In a subject category that helps people find stuff, there might be a lot of ambiguity and translation might be harder. Ambiguous taxonomies are also the ones that require the most research by the information architect, so you could say that, if you need a lot of research to develop a category, you’ll also need to work hard to translate it.

Comments and such are very welcome. Remember, this is our thinking in the very early stages. Also, the soup example I used is just an example. It may not even be correct. Here are some other examples and thoughts I’ve been playing with over the last few days. Access to native speakers is crucial with this work, and it’s hard work finding good examples, so if you can shoot down my examples please do. If you can provide better ones, that’s even more appreciated.

  • “Habitación” in Spanish means, pretty much, “room”. But not entirely: if you ask a Spanish speaker to count the habitaciones in a house, they won’t count the living room. That is a problem of semantic overlap. There are other translations for “room” in Spanish, but I don’t think there exists an equivalent of “habitación” in English, at least not one that’s as commonly used. A funny thing happened when I was asking native speakers about this, by the way. They wouldn’t hesitate in saying: “there are 2 habitaciones in this house”, but if I pressed on (to get all the info), they’d start doubting and say: “Maybe I was mistaken.” They’re not. It’s like usability testing – the user is right.
  • “Vaso” is a decent translation for “cup”. But again, I think there are differences. I didn’t have a chance to explore them much though.
  • Does the basic-levelness of a category have something to do with its translatability? You would expect a basic level category to be universal.
  • I don’t think the example we used is invalid just because it is a category introduced by a company (or was it?). But I’d like to find better examples.

36 thoughts on "Translating taxonomies and categories"

  1. On the aifia members list, Alan Gilchrist gives an example of differences in granularity between German and English for the word skidding – in a car. Germans don’t have a word for skidding, but they do have two words, Rutschen and Schleudern, for skidding forwards and skidding sideways.

  2. HaHa-soup. Funny anecdote about soup in Latin America. I was on a trek. I reached a remote hostel and was feeling sick. I asked for, and ordered, the quintessential American get-well dish, chicken soup. Maybe it was the chef’s specialty or maybe healthier hikers demand heartier meals, but the chicken soup I ordered came back like a swamp of chicken, long flat noodles, vegetables, and oily broth.

    The chef had no concept of simple soup. I asked for the “claro soupa”, and received only a blank stare. I had a power bar and went to bed. (>_<)

  3. Hi Peter,
    One of your Colombia PBHers here. I hafta say I don’t venture out of PBH, but I was bored and decided to venture to the rest of your site and started reading this. What a great post about the complications of X-language communication. It is so true, and I think it even exists *within* languages that are spoken in many parts, e.g. Quebec French and French French. Or for your chunky soup example – how widely known would chunky soup be to people in South Africa or New Zealand?

  5. I have not read the six articles thoroughly, so I apologise if I have missed or misinterpreted something. That is because you have already made your point clear and are simply bloating the posts with needless information. This is not a debate; what you are stating is not denied and is fairly obvious. This is the type of information that goes into a one-line dot point in a broader topic about i18n.

    Keep the following in mind when developing i18n supporting applications:
    * Many words do not have a direct or suitable translation to other languages [maybe an example or two here]

    If anyone takes i18n seriously, the solution is obvious. I have coded a framework with i18n support and found that this problem is overcome by a simple method:

    Each resource (eg: article, post) can have categor(ies) in each language. While this may seem like a lot of output and work, it is simple when it comes to the application frontend. When a user accesses the site, they view it in the default language or choose another language. Then, whenever they browse the resources, they see it listed by categories for their language. So while US English may have a category for chowder with a recipe for clam chowder in it, Australian English will have the recipe filed under soup. When resources are added, categories are added for each language where the categories are not the same. Assuming in most cases, the category it is filed under is the same for another language (with the category name translated), it usually suffices and does not involve any work.

    As for folksonomy and i18n, that is a problem with the application developer’s laziness or the fact that they don’t care – being more realistic, they may not have time or not have noted the importance.

    Again, the problem fits into one or two lines, and the solution is to have a word bank to get the base word from a stem and provide translations. This is not as simple as with taxonomy: the lack of direct or suitable translations means that a tag a user applies to a resource in English may not be suitably translated to French. I am still working on a solution here other than human staff editing.

    Feel free to send an email to alert me to a post if you would like a reply. On a quick review of this post, I noticed my tone and I ask that you do not take this the wrong way.
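
The per-language categorization described in comment 5 could be sketched roughly as follows. This is not the commenter’s framework, which isn’t shown; the names `Resource`, `categories_for`, `browse` and `DEFAULT_LOCALE` are all hypothetical, and the sketch leaves out translating the category labels themselves. It assumes a locale without its own entry falls back to the default locale’s categories.

```python
from dataclasses import dataclass, field

DEFAULT_LOCALE = "en-US"  # assumed default; not named in the comment


@dataclass
class Resource:
    title: str
    # locale -> category names; only locales that differ need an entry
    categories: dict[str, list[str]] = field(default_factory=dict)

    def categories_for(self, locale: str) -> list[str]:
        """Use the locale's own categories, else fall back to the default."""
        fallback = self.categories.get(DEFAULT_LOCALE, [])
        return self.categories.get(locale, fallback)


def browse(resources: list[Resource], locale: str) -> dict[str, list[str]]:
    """Group resource titles under the categories used in one locale."""
    listing: dict[str, list[str]] = {}
    for resource in resources:
        for category in resource.categories_for(locale):
            listing.setdefault(category, []).append(resource.title)
    return listing


clam = Resource("Clam chowder", {"en-US": ["chowder"], "en-AU": ["soup"]})
tomato = Resource("Tomato soup", {"en-US": ["soup"]})

print(browse([clam, tomato], "en-US"))
print(browse([clam, tomato], "en-AU"))
```

With this data the US listing keeps a separate “chowder” category, while the Australian listing files both recipes under “soup”, matching the example in the comment. The cost is that somebody still has to decide, per resource and per locale, where the categories differ.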

