Metacrap: there are indeed real

Metacrap: there are indeed real and often unrecognized reasons why metadata hasn’t taken off.

“People are lazy” and “People are stupid” are bad analyses of these reasons: people aren’t lazy, they just aren’t interested in attaching metadata to their stuff if they don’t see an advantage in it. What’s in it for the authors?

“People lie” is an obvious one. “Know thyself” is where it gets good. Another good one, which many people starting out in the metadata world don’t understand, is that schemas aren’t neutral. The conclusion (implicit metadata, like Google’s analysis of links, is more useful) surely has some truth to it. Interesting. Maybe the whole idea of having people tag their stuff with metadata is deeply flawed?

0 thoughts on “Metacrap: there are indeed real

  1. Hi Peter,
    I recently posted my own response to the ‘metacrap’ argument to xml-dev, in a thread talking about RDF, though I note now that the specific section of the mail (pasted below) doesn’t mention RDF at all. (you’ll have to substitute “blog comment” for “email” etc for it to make sense here ;-)

    The “metacrap” argument is a complete red herring, because it makes the
    assumption that the creation of metadata must involve extra effort on the
    part of the system/application users. In most circumstances there are stacks
    of metadata on hand, and I don’t have far to look for an example. Nearby
    there’s all the mail header date/time & routing material etc, there’s a
    thread in the archives. Without any extra effort on my part, there’s a sig
    with the addresses of my web space and some of the material I’m working on.
    Linked to that there is biographical information about me. Your address is
    here too, which may or may not be used to get to biographical information
    about yourself, but it does describe a communication channel to you through
    which more information could be obtained.
    Ok, so the mail client I’m using doesn’t take advantage of all this, and
    recalling any of this pile of information requires post-mortem scraping.
    This client (Outlook) is smart enough that if I didn’t have defenses
    then a bit of kiddy code from a third party could make it spam everyone in
    my address book. Why shouldn’t the client use this information in a
    consistent, secure and useful fashion? C’mon mail dude, the spell checker
    knows this is in UK English, why don’t you?
    Hopefully Chandler [an RDF-based mail client project] will.

    Full message archived at:

  2. Hi Danny,
    I disagree. The metadata already available is generally unambiguous metadata like date-of-creation, author and such. That is the *least* interesting type of metadata.

    The really interesting stuff happens when you classify things as being about a certain topic, or of a certain quality and such. The more interesting metadata happens to be mostly metadata that can only be created by humans. Software can do some of the work (clustering, collaborative filtering), but is still light-years away from the value that human indexing, however ambiguous, can add.

  3. It’s silly to make a judgement about metadata as a whole based just on search engines that attempt to index the entire web and the disparate quality of metadata found there. The real power is when you control both the search engine and the documents it’s indexing.

    And whoever said good metadata = neutral metadata? It’s not possible, and your users’ views aren’t neutral either.

  4. Hi Peter,
    Actually I probably agree with you – the more human side of metadata can be a lot more interesting. But the metacrap argument seemed largely targeted at the more mundane side. If you did want to gather e.g. personal categorization data then some extra effort would be required on the part of the author. But even something like saving a file to a folder called /shopping/vegetables provides information that could be collected automatically.

    btw, let’s note that there is a whole lot more you can do with metadata in addition to categorization, indexing & searching of documents, and there are a whole lot of things that metadata can talk about outside of the web.

  5. Much of Google’s automatically generated metadata relies on a complete index of the web, or at least a complete index of relevant documents. This is only possible in special cases, or if you are Google. Other Google metadata relies on analysis of the page itself, which is much easier for the rest of us.

    But self-supplied metadata and auto-generated metadata are not the only two kinds. Annotations are an example of third-party metadata, as are weblog postings (and mix the two:

    I submit that if you combine third-party metadata with a mechanism for assigning trust to those third parties, you can create a source with value. Trust could come from the fact that you know the person, or trust could come from FOAF info… The value does come from the bias, but bias without trust is not nearly as valuable.
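
The idea in the last comment, third-party metadata weighted by how much you trust the third party, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(source, tag)` pair format, the function names `score_tags` and `trusted_tags`, the numeric trust weights and the threshold are all hypothetical, and how the weights would actually be obtained (personal acquaintance, FOAF data, something else) is left open.

```python
from collections import defaultdict


def score_tags(annotations, trust, default_trust=0.0):
    """Sum, per tag, the trust of every third party that applied it.

    annotations: iterable of (source, tag) pairs (hypothetical format).
    trust: dict mapping source -> weight, e.g. derived from personal
           acquaintance or FOAF data (how is an open assumption here).
    """
    scores = defaultdict(float)
    seen = set()
    for source, tag in annotations:
        # Count each source at most once per tag, so repeating a tag
        # does not make it more credible.
        if (source, tag) in seen:
            continue
        seen.add((source, tag))
        scores[tag] += trust.get(source, default_trust)
    return dict(scores)


def trusted_tags(annotations, trust, threshold=0.5):
    """Return tags whose combined trust reaches the threshold, best first."""
    scores = score_tags(annotations, trust)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, score in ranked if score >= threshold]


if __name__ == "__main__":
    annotations = [
        ("alice", "metadata"),
        ("bob", "metadata"),
        ("spammer", "cheap-pills"),
        ("spammer", "cheap-pills"),
        ("alice", "rdf"),
    ]
    # Sources missing from the trust table get default_trust (0.0).
    trust = {"alice": 0.9, "bob": 0.4}
    print(trusted_tags(annotations, trust))
```

An unknown source contributes nothing here, so a tag applied only by strangers never reaches the threshold, however often it is repeated. That is one way to read “bias without trust is not nearly as valuable”; a real system would need a more careful trust model.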
