Data cleanup, and allowing for ambiguity.

In my recent talk about IA, I also mentioned the cost of data entry, and the advantages of keeping it simple for users (or I may have skipped that slide).

Related story about LinkedIn:

“The secret ingredient that most surprised me about their pipeline was Peter’s use of Mechanical Turk. A lot of their headaches come from the fact that users are allowed to enter free-form text into all the fields, so figuring out that strings like "I.B.M.", "IBM" and "IBM UK" are all the same company can be a real challenge. You can get a long way with clever algorithms, but Peter told me that when these sort of recognition problems get too hairy, he reaches for his ‘algorithmic Swiss Army knife’: the human brain-power of thousands of Turks.”

Nice example.
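To make the idea concrete, here is a rough sketch in Python of how that kind of cleanup might look: normalize the free-form string, look it up in a table of known names, and hand anything unrecognized to a human. This is not LinkedIn's actual pipeline. The noise-word list, the alias table, and the function names are all made up for illustration, and the review queue is just a list standing in for something like Mechanical Turk.

```python
import re
from typing import List, Optional

# Hypothetical noise words to ignore when comparing names (a real list would be longer).
NOISE_WORDS = {"inc", "corp", "corporation", "ltd", "llc", "uk", "usa"}

# Hypothetical alias table: normalized key -> canonical company name.
KNOWN_COMPANIES = {
    "ibm": "IBM",
    "international business machines": "IBM",
}


def normalize(raw: str) -> str:
    """Lowercase, drop dots, turn other punctuation into spaces, drop noise words."""
    text = raw.lower().replace(".", "")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    words = [w for w in text.split() if w not in NOISE_WORDS]
    return " ".join(words)


def resolve(raw: str, review_queue: List[str]) -> Optional[str]:
    """Return the canonical name, or queue the raw string for a human to look at."""
    canonical = KNOWN_COMPANIES.get(normalize(raw))
    if canonical is None:
        # The algorithm gives up here; a person decides what this string means.
        review_queue.append(raw)
    return canonical


review_queue: List[str] = []
for entry in ["I.B.M.", "IBM", "IBM UK", "Big Blue"]:
    print(entry, "->", resolve(entry, review_queue))
print("Needs a human:", review_queue)
```

Note that users still get to type whatever they like; the cleanup cost is paid afterwards, and the human step only kicks in for the strings the simple rules can't handle. Dropping words like "UK" is lossy, so whether that is acceptable depends on what you need the data for.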

Related to the ease of data entry, by the way, is the fact that open text fields allow for ambiguity, a point I also tried to make in my talk.

When you ask users to classify themselves, it’s good to allow for ambiguity, because you’ll never come up with categories that everyone feels comfortable with. (Even for seemingly binary choices like man/woman, it’s good to leave some room, with options like “Other” or “Would rather not share”.) Another good example of allowing for ambiguity comes from Google Health:

