ITworld.com – Filing system taxonomy blues: “All those pieces of paper. They all need to be filed somewhere. But where? Is this piece of paper best filed under ‘insurance’, or ‘house’ or ‘Acme Insurances Inc.’ or ‘bills’? So many options, so many incorrect taxonomies to choose from. So much desire to find the one that is perfect. I should know better of course. There is no perfect taxonomy for a filing system.” (via Jon Udell)

SiteLines – Ideas About Web Searching: May 2005 Archives: “In a March 29 2005 paper, Ira Machefsky and John Fernandez of search engine Accoona, clue us in to some key differences between searchers in China and the US. The authors compared search terms used in the US and Chinese versions of Accoona. Whereas US searchers focus on news, gossip, and entertainment, Chinese searches show a strong focus on business information, particularly manufacturing.”

I’ve been interested in finding out differences in search habits in different cultures. Any other pointers?

The RSS wars

Here’s what might happen: Apple, Google, Yahoo et al will start expanding RSS (with namespaces) or creating their own. Their tools will use these namespaces, so developers will start supporting them as well (to get their stuff into iTunes and such). So brace for the RSS wars.
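
To make the namespace idea concrete, here is a minimal sketch of a plain RSS 2.0 item extended with one element from Apple's iTunes podcast namespace, read with Python's standard library. The feed content (show title, episode, author) is made up for illustration. A tool that knows nothing about the namespace can still read the core RSS fields, which is what makes this kind of expansion so easy for each vendor to do.

```python
import xml.etree.ElementTree as ET

# A made-up feed: core RSS 2.0 plus one namespaced extension element.
FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <itunes:author>Example Author</itunes:author>
    </item>
  </channel>
</rss>"""

# Prefix-to-URI map used when looking up namespaced elements.
NS = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

root = ET.fromstring(FEED)
items = []
for item in root.iter("item"):
    title = item.findtext("title")  # core RSS field
    # Extension field: returns None if the feed does not use the namespace.
    author = item.findtext("itunes:author", default=None, namespaces=NS)
    items.append((title, author))

print(items)
```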

Anthropology at work: “Brenda runs through the mundane minutiae of her daily life: how her three sons like her to iron their T-shirts and tracksuits, but hate it when she gets them mixed up – so she has created a labelling system to tell the identical, perfectly pressed T-shirts apart.”

The Application of Weblike Design to Data: Designing Data for Reuse: “Every episode uniquely identifiable and addressable, forever!”

Simple yet long-term useful metadata is the best metadata. I am often surprised by how ambitious some people are when trying to do enterprise-wide metadata: “We’ll collect dozens of fields of metadata for each piece of content!” Yeah right. The BBC didn’t make that mistake: simplicity first, yet great ambition at the same time. Brilliant.

Drupal is a great CMS but it was always kinda hard to follow what was going on with it. So finally, a Drupal newsletter! With an unfortunate name though. Drupal Drops

I use Bloglines, and I read about 300 feeds. I wish it would make it easier to remove feeds from my list, since there are quite a lot I don’t follow anymore.

I find myself writing series on this blog fairly often: related articles that can be read together. The problem is that blogging software doesn’t provide an easy way to create navigation to make sure the individual posts of a series hang together. I have been using numbered bulleted lists of links at the beginning of a post, as in my Akshaya project series, or the series about global IA.

I wish WordPress (my blog software) would provide some way of indicating a series and then automatically generate navigation for them. In general, I wish blogging software would let me bring much more structure in my blog entries, when needed.

Apart from the fact that it doesn’t work on Firefox yet, Yahoo’s new PhotoMail is brilliant. Let the email wars begin! I think Yahoo might seduce me back to their service, after years of neglect.

By the way, if you hadn’t noticed, Yahoo is totally the new hip company with cool new products. They’ve been hiring IAs like crazy, too.

If you are moving in NYC and need someone to help you carry that heavy sofa and stuff upstairs/downstairs, Greg at 917 257 23 17 is experienced, very helpful and has super reasonable rates. He helped me move and I can heartily recommend him.

The Dewey Decimal people are trying to figure out how to classify graphic novels (i.e. comic books). Discussion here. The kinds of questions they’re trying to answer are:

– Should graphic novels go in the 700s (arts) or the 800s (literature)? (Answer: like comic books, 700s)

– Should they be lumped together with comic books? (Answer: yes – “separating graphic novels would be difficult for classifiers to do consistently”)

– How do you distinguish between comics, graphic novels, …? Answer: we can’t. (“We have tentatively decided to treat everything from single-frame caricatures to three-frame newspaper comic strips to comic books to graphic novels all in the same way. Although this is a broad range of material, we have found no good places to break the continuum so as to separate the material usefully into different categories.”)

– How do we subdivide? Answer: by country of writer/artist. This makes sense, because in the world of comics, styles kind of follow geographical boundaries (the Belgian/French school, the US school, the Japanese school). How long this will last I’m not sure about. Meanwhile, the Dewey editors are considering subarranging by country of original publication, rather than country of artist or writer. The rationale is that artists and writers of different nationalities may collaborate on the same work, and a single artist or writer may contribute to works originally published in different countries, but the artists and writers will aim for the style of the country in which the work is to be published. Mmm…

Meanwhile, on the international front, discussions are underway on a new Arabic translation of DDC 22. Right now, they are considering an additional optional arrangement in the famously biased 200 Religion schedule. This proposal will be discussed at the ALA Annual Conference in June 2005, and at the IFLA Conference in August 2005.

Enterprise search still a technology conversation

In short, best bets (where an editor selects the top results for certain search queries) is seen by many information professionals as about the cheapest and best way to improve your search engine, but the enterprise search industry doesn’t have much of a clue. Many enterprise search products don’t explicitly support this. More generally, most companies seem to think of search as a technology problem, whereas most of the consultants and experts understand the importance of adding people to the mix.
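
To show how little is needed, here is a minimal sketch of best bets in Python. It assumes the editor's picks live in a simple lookup table keyed by a normalized query, and that the engine hands back an ordered list of URLs. The table contents, URLs and function names are all hypothetical, not taken from any product.

```python
def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match the same entry."""
    return " ".join(query.lower().split())

# Hypothetical editor-maintained table: normalized query -> hand-picked URLs.
BEST_BETS = {
    "vacation policy": ["/hr/vacation-policy"],
    "expense report": ["/finance/expenses/how-to", "/finance/expenses/form"],
}

def search_with_best_bets(query, engine_results, best_bets=BEST_BETS):
    """Put the editor's picks first, then the engine's results minus duplicates."""
    picks = best_bets.get(normalize(query), [])
    rest = [url for url in engine_results if url not in picks]
    return picks + rest

# engine_results stands in for whatever the real search engine returned.
results = search_with_best_bets(
    "  Vacation   POLICY ",
    ["/news/2005/picnic", "/hr/vacation-policy", "/hr/benefits"],
)
print(results)
```

The hard part is not the code. It is giving the person who maintains that table a decent interface for doing so.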

In 2003, I started an article at Onlamp like this: “A useful search engine is more than a search algorithm. This article explains how to create a search query analysis tool, a best bets feature, and a basic controlled vocabulary.”

The idea was to write for the techies who are building the tools about what we, information architects, think are the things missing from most search engines. Onlamp is O’Reilly’s publication for open source hackers, and I was on a mission to spread the word about IA (also to other groups, like designers). My point was: there are easy things you can add to your search engine that let humans add value to it, like best bets, or a search log analysis tool. It’s not rocket science – if I could write a techie how-to article, the search vendors should be able to figure this out.

Last week, at the 2005 Enterprise Search Summit, I did a little unscientific survey with the vendors about best bets. I asked them if they had such functionality in their product (I had to explain it to most), and what they called it. The results were in line with my overall impression of enterprise search. Most of the products work like this:

  1. Spider content and rank
  2. Auto-generate and auto-populate taxonomies to add value to search

Notice the absence of humans in that process.

The control panels of the products tend to contain a section with sysadmin-like functionality, and some analytics (most allow you to see what search queries people have been using). Most of them assume that the person using it has been trained to use this tool. There is surprisingly little functionality aimed at the person whose job it might be to tune the engine with best bets and such. The people I spoke with who actually do that job use things like Perl scripts or open source software to analyse search queries. (For example, I was told Googlebox doesn’t handle logging multilingual search queries, although it searches them fine, so one person used Webalizer instead.)
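
The kind of script those people write is short. Here is a rough sketch in Python rather than Perl, assuming the simplest possible log: one raw query per line. Real log formats differ per product, so the parsing would need adapting, and the sample queries are invented.

```python
from collections import Counter

def top_queries(log_lines, n=10):
    """Count normalized queries and return the n most frequent as (query, count)."""
    counts = Counter()
    for line in log_lines:
        query = " ".join(line.lower().split())  # lowercase, collapse whitespace
        if query:  # skip empty lines
            counts[query] += 1
    return counts.most_common(n)

# Invented sample; in practice these lines would come from the engine's query log.
sample_log = [
    "Vacation policy",
    "expense report",
    "vacation  policy",
    "",
    "holiday schedule",
    "vacation policy",
    "Expense Report",
]

top = top_queries(sample_log, n=2)
print(top)
```

The top of that list is where best bets pay off first.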

When I asked the best bets question (“does your product do best bets, defined as …”), even after explaining the functionality, I got surprisingly many blank stares. Best whats? Why would you want to do that?

Some products have best bets, but the closest a lot of them could come was to say you could create rules to improve the result of certain documents. That’s like saying, sure, you can do HTML with Word. In theory perhaps, but it’s not really useful.

Here is an incomplete list of products that do best bets, and what they call it. This is an unscientific and incomplete survey, which may have mistakes in it. Don’t use it to judge a particular product; use it to get a sense of the field.

  • Autonomy: you can kinda do them through rules.
  • BA-insight: yes, through SharePoint.
  • FAST: yes (although I have doubts here).
  • IBM: yes, they’re called Quick Links.
  • ISYS: not really.
  • Mondosoft: yes, they’re called Top Hits.
  • Open Text: it’s coming up in their next release.
  • SER Solutions: no.
  • Verity: yes, calls them Sponsored Links.
  • Vivisimo: yes, kind of.

I didn’t have time to ask the other vendors – feel free to add in the comments.

By the way, to work well with users, best bets should appear in-line with the other search results, not separate from them. If I were to do a more complete survey, I’d add that in as a criterion, together with an easy-to-use admin interface, CV functionality and an easy-to-use search analysis tool that includes analysis of suddenly popular queries.
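
As for suddenly popular queries, one simple way to spot them is to compare query counts between two periods. The sketch below is only one possible heuristic. The thresholds (`min_count`, `ratio`) and the sample data are assumptions for illustration, not something any vendor showed me.

```python
from collections import Counter

def suddenly_popular(previous, current, min_count=3, ratio=2.0):
    """Return (query, old_count, new_count) for queries that jumped between periods.

    A query qualifies if it was searched at least min_count times in the current
    period and at least ratio times as often as before (new queries are compared
    against a baseline of 1).
    """
    prev = Counter(previous)
    cur = Counter(current)
    rising = []
    for query, count in cur.items():
        baseline = prev.get(query, 0)
        if count >= min_count and count >= ratio * max(baseline, 1):
            rising.append((query, baseline, count))
    # Largest absolute jump first.
    rising.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rising

# Invented, already-normalized queries for two consecutive weeks.
last_week = ["benefits"] * 5 + ["parking"] * 1
this_week = ["benefits"] * 6 + ["parking"] * 4 + ["office move"] * 5

rising = suddenly_popular(last_week, this_week)
print(rising)
```

A query that shows up here is a good candidate for a new best bet.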

This is probably the event that we’ll look back at when we think “when did Google lose its sexiness again?”

Yahoo is the new darling of the technorati (the people, not the company). The circle is round.

I just got to Belgium (for 2 weeks, with a workshop in Edinburgh thrown in). Long flight, crazy jetlag. And lots of paperwork to get done.

The Garamond Agency: “The Garamond Agency represents authors of non-fiction exclusively. Our clients are academics, scholars, journalists, business people, and writers whose books make important ideas accessible to a wide audience of readers.”

Linux radio show – LugRadio: the first LugRadio podcast has an audio review of my book. I’m posting this before hearing it, so I don’t know if it’s positive or negative. Whatever, I got the Japanese translation (just out) the other day, so I’m happy. Anything in Japanese is cool.