Joho the Blog: Berkman Lunch: National Health Information Infrastructure: “Alan Goldberg of Goulston & Storrs (and HealthLawyer) is giving a Tuesday lunchtime talk on the national health information infrastructure.

He says it’s a big deal: Medicare has 1 million providers who are involved in 1 billion claims per year. NHII crosses political boundaries; everyone from Bush to Hillary, from Ted to Newt, all support having an infrastructure that enables electronic record sharing. The NHII will require technologies, standards, systems, values, applications, and laws.”

Standards are the invisible structure that keeps society running, and the people creating the standards are often just as invisible. It’s all messy, political and hard work, just like information architecture.

A Look at “Guided Navigation” for Enterprise Search — Featured Product — CMS Watch: A good article on Endeca’s “guided navigation” (their marketing term for using a faceted classification system).

“Indeed, the success of multifaceted taxonomies in the commerce space has raised substantial expectations that similar clarity could be brought to bear on enterprise content repositories. It turns out, of course, that enterprise content is not so neatly structured.”

AIFIA | IA Progress Grants: “Two grants will be awarded in February, 2005. Applicants must be AIfIA members. Each grant is for US$1000, with $500 awarded upon project initiation and $500 awarded upon completion. […] Applications should propose work that has the potential to benefit information architecture practitioners in a practical way. This includes, for example, original research, a new synthesis of important existing research, or development of an innovative new technique.”

Staying up to date with IA

Here is my list of resources for staying up to date if you are into information architecture. I haven’t included the general UX lists; these are specific to IA, metadata, search and the like. Add your own in the comments and I’ll add them to this list.

I am especially trying to include not-well-known but useful resources.

Must-haves: don’t miss these two:

Mailing lists for discussion:

  • Sigia-L: the original IA list, and still good, although some people unsubscribed because of occasional noise and nastiness. I still find it very valuable.
  • AIfIA Members: members only list similar to Sigia-L, but definitely worth the AIfIA subscription fee.
  • IA-CMS: low volume, focused on IA for CMS.
  • Faceted classification list. Active on and off.
  • Searchloggers: low volume but specialized.


Blogs and RSS feeds. I only included blogs that are almost exclusively about IA, or the list would be endless. I didn’t include any of the numerous library blogs.

Other languages.

Want more? More mailing lists and newsletters on the IAWiki (most of the ones here are also there, but not all).

iaslash A-Z Indexes for Web Sites: Usage and Implementation

iaslash A-Z Indexes for Web Sites: Usage and Implementation. IASlash asks why IAs don’t implement more A-Z indexes, and the answer seems to be that it is a specialized skill that we’re not used to.

Maybe the answer is really much simpler. A-Z indexes are particularly effective for known-item searching (when you know what you’re looking for and what it’s called). But there is a technology that is much cheaper to implement and also very effective for known-item searching. It’s called a search engine.

For an example of an A-Z index on the web, check the BBC A-Z index. Notice the value it adds by having human editors choose terms (like “Accidents”) and group BBC sites and pages under those terms (like “First Aid”). This way they help users with the paraphrase problem. It’s definitely added value, but a search engine can deliver similar value by suggesting additional search terms to users and by using best bets.
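To make the comparison concrete, here is a minimal Python sketch of both approaches built from the same editorial work. Every term, title, URL, best bet and suggested term below is hypothetical and illustrative; none of it is taken from the BBC site. The A-Z index groups editor-chosen terms by letter. The search function shows an editor-picked best bet first, then falls back to naive substring matching, and also returns related terms to offer the user.

```python
from collections import defaultdict

# Hypothetical editor-curated entries: (index term, page title, URL).
ENTRIES = [
    ("Accidents", "First Aid", "/health/first_aid"),
    ("Accidents", "Road Safety", "/news/road_safety"),
    ("Weather", "Forecasts", "/weather"),
]

# Hypothetical best bets: an anticipated query -> the page to show first.
# This handles the paraphrase problem: "injury" still finds First Aid.
BEST_BETS = {"first aid": "/health/first_aid", "injury": "/health/first_aid"}

# Hypothetical suggested terms, offered alongside the results.
RELATED_TERMS = {"injury": ["Accidents", "First Aid"]}


def build_az_index(entries):
    """Letter -> index term -> list of (title, url), like a printed A-Z index."""
    index = defaultdict(lambda: defaultdict(list))
    for term, title, url in entries:
        index[term[0].upper()][term].append((title, url))
    return index


def search(query, entries=ENTRIES, best_bets=BEST_BETS, related=RELATED_TERMS):
    """Return (urls, suggested_terms): best bet first, then substring matches."""
    q = query.strip().lower()
    urls = [best_bets[q]] if q in best_bets else []
    for term, title, url in entries:
        if (q in term.lower() or q in title.lower()) and url not in urls:
            urls.append(url)
    return urls, related.get(q, [])
```

In both cases the value comes from the same human judgment: which terms users actually type, and which pages should answer them. Only the presentation differs.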

Reducing the Cost of Translation through Reuse. A decent talk by Ann Rockley. You can just sign up (use any email address, no confirmation required) and watch the talk. There is some marketing material at the beginning and end of the talk; you can skip it by jumping to “Speaker 2” in the index dropdown. (Thanks to Liv.)

Ann talks about the translation lifecycle and its costs, and about her approach to a unified content strategy. Reusable “content objects” are small chunks of content, like a paragraph, that only have to be created (and translated!) once. They sound like a good fit for fairly structured content (press releases, product info), but less so for fairly unstructured content. Any experiences with an approach like this?
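Here is a rough Python illustration of why reuse saves translation cost. The documents, content-object IDs and word counts are all made up. Each document is assembled from content objects. Without reuse, every document's text is sent to translators. With reuse, each distinct object is translated once.

```python
# Hypothetical documents, each assembled from reusable content-object IDs.
DOCUMENTS = {
    "press_release_1": ["boilerplate_about", "pr1_body"],
    "press_release_2": ["boilerplate_about", "pr2_body"],
    "product_sheet": ["boilerplate_about", "product_specs"],
}

# Hypothetical word counts per content object.
WORD_COUNTS = {
    "boilerplate_about": 120,
    "pr1_body": 400,
    "pr2_body": 350,
    "product_specs": 200,
}


def words_to_translate(documents, word_counts, reuse):
    """Total words sent to translators, with or without content-object reuse."""
    if reuse:
        # Each distinct object is translated once, however often it appears.
        unique = {obj for objs in documents.values() for obj in objs}
        return sum(word_counts[obj] for obj in unique)
    # Without reuse, repeated text is translated again in every document.
    return sum(word_counts[obj] for objs in documents.values() for obj in objs)
```

With these made-up numbers, the total drops from 1310 to 1070 words. The difference is the shared 120-word boilerplate, which no longer gets translated a second and third time. The savings depend entirely on how much text really repeats, and that is why the approach favours structured content.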

Searching for a cheap VoIP telephone service

I am looking to get a phone number. I spend time in Belgium, New York, and sometimes in other countries in Europe, so a VoIP number I can take with me would be good. I also want to be able to receive faxes. I already have Skype, so I can use that for outgoing calls if needed; I don’t need unlimited outgoing calls. I mainly need incoming calls. It should of course have voicemail. Finally, I’m working for myself now, so it should be affordable.

  1. BroadVoice
    $20: unlimited free calling to Belgium, the US, Canada and a bunch of other countries. Or, for $9.95, unlimited calls within New York State only.

  2. Lingo is similar: $19.95 for unlimited calls with anyone in Canada, the US and western Europe. The basic plan is $15 and gives you 500 worldwide minutes.
  3. Sunrocket: for $25 you get unlimited US calls and free equipment.
  4. Vonage is $24 for the unlimited plan (US and Canada only), $15 for the basic plan (500 minutes in US or Canada). It doesn’t include Western Europe, so for me, that’s bad.
  5. A regular landline in NYC. How much would that be?
  6. Get an extra line on my girlfriend’s cellphone: $10.

Can I get fax service with any of these? Experiences?

Note that charges are always more than you expect in the US. BroadVoice has a good table explaining the other charges. The BroadVoice service ends up costing $312 a year, or $26 a month, not $20.

If you want a Gmail invite (I have 3), add a comment. Invites go to the first three commenters. Don’t forget to leave your email address.

Some stats, since it’s getting to be the end of the year after all. Russell Beattie started it.

On the domain, which serves both my Colombia site and this weblog (and some other, less popular stuff like my new India site), I got an average of 3,800 visits a day during the last month, viewing 20,000 pages (with an average of 3 hits per page). I served 15 GB over the month, half a gig a day. My host has limitations (no mod_rewrite on subdomains), but for 5 bucks a month, who am I to complain? And service is decent. The Google ads on the Colombia site make me about $200 a month, so that’s pretty nice.

There were 57,056 404 errors (I should do something about that). My RSS feeds get 25,000 hits a month, about 900 a day. The top three entry pages are my RSS feed, my Colombia homepage and the homepage of this blog.
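A first step toward fixing the 404s is finding which URLs cause most of them. Here is a small Python sketch. It assumes the host provides raw access logs in Apache's common or combined log format; the sample lines are invented. It tallies 404 responses by requested path. The most frequent paths are the ones worth redirecting first.

```python
import re
from collections import Counter

# Assumes Apache "common"/"combined" log format, where each line contains
# "METHOD /path PROTOCOL" followed by the three-digit status code.
REQUEST_STATUS = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) ')

# Invented sample lines for illustration.
SAMPLE = [
    '1.2.3.4 - - [10/Dec/2004:13:55:36 -0500] "GET /old-page.html HTTP/1.1" 404 209 "-" "Mozilla/4.0"',
    '1.2.3.4 - - [10/Dec/2004:13:55:37 -0500] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '5.6.7.8 - - [10/Dec/2004:13:56:01 -0500] "GET /old-page.html HTTP/1.0" 404 209 "-" "Googlebot"',
]


def top_404s(lines, n=10):
    """Return the n most frequently requested paths that returned 404."""
    counts = Counter()
    for line in lines:
        match = REQUEST_STATUS.search(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

On a real log you would pass in the lines of the log file instead of `SAMPLE`.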

Here’s a view on how my traffic has been evolving in 2004. There’s a strange peak in hits this month that isn’t reflected in pageviews or visitors, so that’s probably some redesign I did that I already forgot about, adding more css files or pictures or something.

Each month, about 30,000 people come from Google searches, and 1,500 click through from Bloglines.

Interesting search queries: “sorry everybody” (my site is the 10th result on Google). I pointed to the Sorry Everybody website, and 278 people came through that Google results page. “How to make a documentary” is still going strong: I’m the second result, with a post from January. I almost feel a responsibility there, so I decided to add some navigation within that post to help people find more related stuff.

WorldChanging: Another World Is Here: Wireless Cities: “If cities evolve, what will shape their evolution over the next few decades? Salon has an interesting article today about the use of wireless technologies as the drivers for urban change.”

Good question. Cities are slow to change, though. European city centers are still shaped around the same streets they had 500 years ago.

Since I am starting a new discussion site, I have to seed it. There should always be some new content to pull people in. Until there is enough conversation going on (this might take six months), I have to put a lot of writing work into it. This is one reason to host community sites only about topics you’re passionate about. Luckily, Drupal lets you schedule future posts, so I just uploaded a few hundred pictures to my new India community site, and most of them are scheduled to appear gradually over the next six months. Yay for Drupal.
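Spacing a few hundred scheduled posts evenly over six months is simple arithmetic. Here is a Python sketch of it. It assumes nothing about Drupal's internals; it only computes the publish times you would then enter when scheduling each post. The picture filenames and start date are hypothetical.

```python
from datetime import datetime, timedelta


def spread_schedule(items, start, days):
    """Pair each item with a publish time, spaced evenly over `days` days."""
    if not items:
        return []
    step = timedelta(days=days) / len(items)  # timedelta / int -> timedelta
    return [(item, start + i * step) for i, item in enumerate(items)]


# Hypothetical usage: 300 pictures over roughly six months.
pictures = [f"picture_{i:03d}.jpg" for i in range(300)]
schedule = spread_schedule(pictures, datetime(2004, 12, 15), days=182)
```

Even spacing keeps something new on the front page at all times. A fixed daily slot, or a little randomness, would work just as well.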

Day 5: training

  1. Six days with the Akshaya project: day 1: overview
  2. Day 2: technology
  3. Day 3: entrepreneurs
  4. Day 4: promotion
  5. Day 5: training
  6. Day 6: conclusions

Once people were convinced the training could be worthwhile, it took a few months for each entrepreneur to train one person in each of the families they were responsible for. Each center trained between 1,000 and 3,000 people. Most centers used two or three trainers to get the job done, and many centers stayed open late to accommodate working people: rickshaw drivers, fishermen, farmers, businessmen, housewives.

Translating taxonomies and categories

So, as I mentioned, Livia, Jorge and I have been looking into international information architecture. What happens when you run a site in multiple languages/locales and need to manage its information architecture? Can you just translate a taxonomy from one language to another? We are gathering a lot of material, and we’ll start sharing it and opening up the conversation. I plan to write a series of blog posts on international or global IA, of which this is the first.

  1. Translating taxonomies and categories
  2. Translating categories, translating terms
  3. Translating the Dewey Decimal Classification system
  4. Designing the relationship between content and locales
  5. Emergent i18n effects in folksonomies
  6. The Maori versus Dewey, and why limiting access can be culturally appropriate.

Let me move this along – I’d like to talk about translatability of categories a bit today. There is much more to come, later. This is kind of a long post, and rambling, too, so bear with me.

Say I run a website with recipes. We have an extensive soup section. Now, the word “soup” isn’t just a word; it is also a category, in which you can classify many particular soups, for example the soup you ate today (a specific instance of a soup). “Tomato soup” is also a category; on our site it’s a subcategory of “soups”. So you can see there is a difference between a word and a category. A category might be “everything about this company”, in which we put all the information about a company, and it has the label “About us”, consisting of two words.
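One way to keep the word/category distinction explicit in a site's data model is to give each category its own identity, a parent, and members, and to treat labels as separate per-locale strings. This is a hypothetical Python sketch. The `Category` class and its field names are my own, not from any particular CMS.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    """A category has an identity, a place in the tree and members.
    Labels are just per-locale words pointing at it."""
    id: str
    labels: dict                                 # locale -> label text
    parent: Optional["Category"] = None
    members: set = field(default_factory=set)    # IDs of items classified here

    def label(self, locale):
        # None means: no label (yet) for this locale.
        return self.labels.get(locale)


soups = Category("soups", {"en": "Soups"})
tomato = Category("tomato_soup", {"en": "Tomato soup"}, parent=soups)
tomato.members.add("todays_lunch")                      # a specific instance of a soup
about = Category("about_company", {"en": "About us"})   # one category, two-word label
```

A missing label for a locale is then a visible, checkable fact, rather than something that surfaces as a bad machine translation of the English word.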

Information architects like to group things together (into categories) to make them easy to find. We’ve done research with our users, and it turns out many of them would like, on a recipe site, to browse various chunky soups. Chunky soups are soups with big bits floating in them. Some people like chunks in their soup; I certainly do, most of the time.

In the US, “chunky soups” is a well-known category (ask anyone what it is, yay). So we’d like a link on our site saying “chunky soups”, under which you can then find various types and examples of those soups.

First, we have a labeling problem in English: “chunky soups” is actually a trademark of Campbell (a big soup maker), so you’re legally not allowed to use it. But we strike a deal with them and they let us use it.

So “chunky soups” is a category, and even better, a category (and a label) that our users understand. Some particular soups will be part of the category, others won’t. We start using it on our site, our users find what they need. Peachy.

Then we start developing a Spanish version of our site. And a French one, at the same time. We want to get into these markets.

Can we just translate the category “chunky soups”, and if so, can we use the same soups within that category? Is the category even relevant to Spanish users? I asked some Colombian Spanish friends (a wholly unscientific survey) how they classify soups. They said, a soup is either a “sopa”, a “caldo” or a “crema”. Chunky soup as a category doesn’t seem relevant for this user group.

Let’s be clear: it’s not that chunky soups don’t exist in Colombia. Colombians have chunks in their soups, big ones; I’ve tried them. It’s that the category “chunky soups” just isn’t used in daily life and isn’t relevant. In practice, it doesn’t exist. I don’t think our Spanish-speaking users (if my little survey extends to all Spanish speakers) will look for chunky soups on our website.

Let me be even more clear. I am not talking about dictionary definitions here. I am not trying to find out what the “real” meaning of chunky soup is, or what the real meaning of a “caldo” is. I ask my users: the way they classify things is what matters. Looking it up in a dictionary only helps so much. Asking a chef for the “correct” translation is problematic too; you want the category used and understood by users, not by a domain expert like a chef.

So we seem to have an example of a category in one language/culture that doesn’t really exist (or isn’t useful) in another language/culture. A similar example is “chowder”, an English category of soup that I honestly don’t know a Dutch equivalent for. Let me stress this: it is not just that I don’t know the translation of the word. I don’t know of the existence of the category in Dutch. I’ve never heard anyone mention anything like a “chowder” soup. And I like soup! A “clam chowder”, for example, would, as far as I know, just be a “clam soup” (translated) in Dutch. No chowder involved. Through Google, I find this definition for chowder: “A thick American soup made of meat or fish and vegetables with spices. It is almost like a stew.” Note how they explain the term to English speakers from other cultures by mentioning it is “almost like a stew”.

A second issue with translating categories is something one might call “semantic overlap”. I didn’t invent the term; it seems to be well known in discussions of language and words (although I am still searching for a definition). The only difference here is that I am talking about categories, not words.

Anyway, a category in one language might have an applicable translation, but that category often doesn’t mean completely the same thing. For example, the Spanish word for “house” is “casa”. But the meaning of the category “casa” might not be 100% identical to the category “house”. It is conceivable that, if you ask Spanish speakers to point out “casas” in a city, they’ll point to some building at some point that, as an English speaker, you would never classify in the English language category “house”. (I am not sure this is a valid example, my friends kick me when I start asking again “what kinds of X exist in Spanish” so I haven’t actually tested this. Better examples are welcome.)

In other words, categories in different languages often don’t mean 100% the same thing. And that missing overlap can create problems for our website. If we categorize products in one language within a category, and then translate that category, we can’t automatically assume that all the same products will be categorized under the same category.
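To put a number on semantic overlap, you could run the same card sort with users in both locales and compare which items end up in the category and in its translation. Here is a hypothetical Python sketch. The item IDs are invented. The Jaccard similarity (shared items divided by all items) is my choice of measure, not an established IA method. The items that appear on only one side are the ones to re-review before reusing the source classification.

```python
def overlap_report(source, target):
    """Compare the members users put in a category vs. in its translation."""
    shared = source & target
    union = source | target
    return {
        # Jaccard similarity: 1.0 means identical membership.
        "jaccard": len(shared) / len(union) if union else 1.0,
        "only_in_source": sorted(source - target),
        "only_in_target": sorted(target - source),
    }


# Hypothetical card-sort results for one category and its translation.
en_members = {"item_a", "item_b", "item_c", "item_d"}
es_members = {"item_a", "item_b", "item_c", "item_e"}
report = overlap_report(en_members, es_members)
```

Here the overlap is 3 of 5 items. So copying the English classification into the Spanish site would misfile `item_d` and miss `item_e`.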

I am still working out examples of this stuff. I’m not even sure I’m right with all these statements. The only way to get examples I’ve found is to ask native speakers, so it takes some time. Comments are appreciated!

A third problem with translating categories lies in the relationships between categories. Categories are often grouped in taxonomies, in trees (with varying structures). You click “soups” first, then you get subcategories like “vegetable soups” or “meat soups” or whatever.

I am not sure that you can always assume that every category in the taxonomy can be translated. Some languages might have less granularity in how they classify things in a certain domain. (I won’t mention 100 words for snow, don’t worry. I don’t think that is exactly what this is about). In other words, in English, you might have category A, subcategory AB and a subcategory of that, ABC. It is conceivable that in Spanish, there is no word for AB, just for A and ABC. I haven’t found an example yet though. Again, comments appreciated!
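If a locale does turn out to have A and ABC but no AB, the localized tree needs some rule for where ABC goes. One possible rule (my assumption, not a recommendation from research) is to attach each category to its nearest ancestor that exists in the target locale. Here is a Python sketch using the abstract A/AB/ABC labels from above.

```python
def localize_tree(parent_of, available):
    """Rebuild parent links for a target locale.

    parent_of: category -> parent category (None for roots), source locale.
    available: categories that exist in the target locale.
    Each available category is attached to its nearest available ancestor,
    or becomes a root if none of its ancestors exist in the target locale.
    """
    localized = {}
    for cat, parent in parent_of.items():
        if cat not in available:
            continue
        while parent is not None and parent not in available:
            parent = parent_of[parent]
        localized[cat] = parent
    return localized


en_tree = {"A": None, "AB": "A", "ABC": "AB"}
print(localize_tree(en_tree, {"A", "ABC"}))  # {'A': None, 'ABC': 'A'}
```

Items filed directly under AB would still need a home, and only native-speaker research can tell whether flattening like this matches how users classify.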

So this is all very interesting: culture-specific categories, semantic overlap of translations, translating relationships between categories. But is it practical? Have you encountered problems like this in practice? Just because this path in our research is intellectually interesting doesn’t mean it will turn out to be practical.

A final note: the translatability of categories seems to be closely related to the ambiguity of your taxonomy. In a taxonomy of countries (almost no ambiguity, although Tibetans might disagree) or a taxonomy of products, there is little ambiguity, and translation should be fairly straightforward. In a subject taxonomy that helps people find stuff, there might be a lot of ambiguity, and translation might be harder. Ambiguous taxonomies are also the ones that require the most research by the information architect, so you could say that if you need a lot of research to develop a category, you’ll also need to work hard to translate it.

Comments and such are very welcome. Remember, this is our thinking in its very early stages. Also, the soup example I used is just an example; it may not even be correct. Here are some other examples and thoughts I’ve been playing with over the last few days. Access to native speakers is crucial in this work, and finding good examples is hard, so if you can shoot down my examples, please do. If you can provide better ones, that’s even more appreciated.

  • “Habitacion” in Spanish means, pretty much, “room”. But not entirely: if you ask a Spanish speaker to count the habitaciones in a house, they won’t count the living room. That’s a problem of semantic overlap. There are other translations for “room” in Spanish, but I don’t think English has an equivalent of “habitacion”, at least not one that’s as commonly used. A funny thing happened when I was asking native speakers about this, by the way. They wouldn’t hesitate to say “there are 2 habitaciones in this house”, but if I pressed on (to get all the info), they’d start doubting and say: “Maybe I was mistaken.” They weren’t. It’s like usability testing: the user is right.
  • “Vaso” is a decent translation for “cup”. But again, I think there are differences. I didn’t have a chance to explore them much though.
  • Does the basic-levelness of a category have something to do with its translatability? You would expect a basic level category to be universal.
  • I don’t think the soup example is invalid just because it uses a category introduced by a company (or was it?). But I’d like to find better examples.

I just spent 10 minutes deleting more than 1,000 spam comments on my site. Luckily WordPress makes it fairly easy to catch and delete them, and I haven’t even installed any special anti-spam plugins.

One thing that could help is a simple algorithm along these lines:

IF many similar posts are posted in a short period of time (similar defined as: one of the post fields has identical text, or the post contains the same url), then it should flag them all and let me delete them all with one click. Now I have to scroll through pages of spam posts and delete them all.

A characteristic of spam is that it’s automated, and they try to spam a lot of your older posts at the same time, whereas a normal commenter never starts commenting on dozens of old posts at the same time.
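Here is a minimal Python sketch of that rule, assuming comments are available as simple records. The `Comment` class, its fields, and the 30-minute and 5-post thresholds are my own hypothetical choices, not anything WordPress provides. The sketch groups comments by normalized author, normalized body text, and each URL they contain. It then slides a time window over each group and flags bursts that hit many different posts, which normal commenters don't do.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Comment:
    # Hypothetical comment record; not WordPress's actual schema.
    id: int
    post_id: int
    posted_at: datetime
    author: str
    body: str


URL_RE = re.compile(r"https?://\S+")


def flag_spam_bursts(comments, window=timedelta(minutes=30), min_posts=5):
    """Return IDs of comments that share a normalized author, body or URL with
    comments on at least `min_posts` different posts within `window`."""
    groups = defaultdict(list)
    for c in comments:
        keys = {("author", c.author.strip().lower()),
                ("body", c.body.strip().lower())}
        keys |= {("url", u) for u in URL_RE.findall(c.body)}
        for key in keys:
            groups[key].append(c)

    flagged = set()
    for group in groups.values():
        group.sort(key=lambda c: c.posted_at)
        start = 0
        for end in range(len(group)):
            # Shrink the window from the left until it spans at most `window`.
            while group[end].posted_at - group[start].posted_at > window:
                start += 1
            burst = group[start:end + 1]
            # Spam hits many (often old) posts at once; a real commenter doesn't.
            if len({c.post_id for c in burst}) >= min_posts:
                flagged.update(c.id for c in burst)
    return flagged
```

The flagged IDs could then feed a single "delete all" action. Requiring distinct posts, and not just many comments, is what keeps a lively discussion in one thread from being flagged.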

But kudos to WordPress, it works pretty well out of the box.

What is Word sense disambiguation?: “In computational linguistics word sense disambiguation (WSD) is the problem of determining in which sense a word having a number of distinct senses is used in a given sentence. For example, consider the word “bass”, two distinct senses of which are:

1. a type of fish
2. tones of low frequency

and the sentences “The bass part of the song is very moving” and “I went fishing for some sea bass”. To a human it is obvious the first sentence is using the word “bass” in sense 2 above, and in the second sentence it is being used in sense 1. But although this seems obvious to a human, developing algorithms to replicate this human ability is a difficult task.”
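A classic baseline for this problem is the simplified Lesk approach: pick the sense whose descriptive words overlap most with the words around the ambiguous term. Here is a toy Python sketch for the “bass” example. Real Lesk implementations use dictionary glosses. The tiny hand-made word lists below are invented for illustration.

```python
import re

# Toy sense inventory for "bass"; the signature words are invented.
SENSES = {
    "fish": {"fish", "fishing", "sea", "river", "caught", "catch", "boat"},
    "low_tones": {"music", "song", "sound", "low", "frequency", "guitar", "part", "play"},
}


def disambiguate(sentence, senses=SENSES):
    """Simplified Lesk: choose the sense whose signature shares the most words
    with the sentence. Ties go to the first sense listed."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(context & words) for sense, words in senses.items()}
    return max(scores, key=scores.get)


print(disambiguate("The bass part of the song is very moving"))  # low_tones
print(disambiguate("I went fishing for some sea bass"))          # fish
```

The sketch also shows why this is hard in practice. Everything hinges on the sense inventory and its word lists, and those are themselves language-specific.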

What does word-sense disambiguation have to do with international information architecture?

Livia, Jorge and I are doing research on international information architecture. I will be posting more about our research here over the coming weeks and months.

SimpleBits | CSS Bug of the Day: “You know those desk calendars where you tear off a page for each day of the year? Typically, each day comes with a little nugget of useless info to start your day. Someone should create one based on CSS bugs, where each day talks about a different bug and its workaround.”

They sure should. CSS bugs are what stop truly widespread adoption. I would design entirely in CSS if I didn’t have to test things so much (there’s always some browser in which it breaks). So I go back to tables.

Internet Archive: “Freecache is shelved for now. We have not installed the right redirects, but yes, it is being shelved. We did not have a good proposition or mature enough tool to attract other freecache sites. The Coral system, using the planetlab system, does have a critical mass. I talked with the leader of that project and hopefully they will be supporting large files better.”

Too bad, Freecache was a promising experiment. I hope they try again.

Yahoo! Groups: podcasting in the early stages: “I have some very interesting statistics and am finding out that there are a few people out there that are either using broken software or their software is not smart enough to pull the podcast only once. I am on the verge of banning about a dozen IP’s as they have all pulled over 500 megs each as the two shows combined are only 30 megs.”

Not sure how long this has been going on, but Bloglines shows, at the top of the screen, how many people within their system are subscribed to a feed. From now on, I’m going to make an effort to send people to some of the less popular blogs I’m subscribed to. Be nice to the long tail.