XML, trees and lattices.

I’ve been wondering about XML and highlighting text. If I have some text, and I highlight a piece and tag it with some subject, I can easily express that in XML.

But what if I then highlight another, partly overlapping part of the sentence and want to tag that with a different subject? How am I supposed to express that in XML? I don’t want to do what HTML does with overlapping italic and bold tags, because I want to remember what I highlighted as one section, not cut it up into separate sections that are tagged the same way. (Does this make sense?)

So what is the solution to this? Is there one? I think the fundamental problem is that, at a data modeling level, XML lets you build trees very easily, but not lattices (overlapping structures).
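
To make the problem concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. The sample sentences and the topic1/topic2 tag names are made up for illustration. It only shows that nested highlights parse as XML while partly overlapping ones are rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Nested highlights form a tree, so the parser accepts them.
nested = "<s>I <topic1>highlight <topic2>a piece</topic2> of</topic1> this.</s>"

# Partly overlapping highlights: topic1 closes while topic2 is still open.
overlapping = "<s>I <topic1>highlight <topic2>a piece</topic1> of</topic2> this.</s>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```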

Help.

0 thoughts on “XML, trees and lattices.”

  1. So two ideas here, neither of which is clean:
    1) Serialize the text: <bold>Peter has</bold> <bold + italics>a tough problem</bold + italics> <bold>here</bold>
    2) Use non-closing tags, like the IMG tag in HTML: <bold char_length="29">Peter has <italics char_length="15">a tough problem here.

    Frankly, XML is not meant for this stuff, so I think anything is going to be a hack.

    Thanks to Jan (http://www.hikingviking.org) for the serialization idea.

  2. This is one of the classic problems of embedded markup. Ted Nelson “solved” the problem for collaborative commentary by indexing into a raw text stream, which remained unchanged. Others, like Mike Vulpe (developer of the i4i S4 SGML/XML markup engine) have also sought to dissociate markup from the content.

    But, as others have noted here, embedded XML markup is fundamentally unfriendly to your requirements.

    We do need a usable way of referring to arbitrary, often overlapping spans of text — if only in order to associate indexing concepts (topics) with those arbitrary spans.

  3. I think my student, Huan Gao, did this using the Multivalent Document Model from UCB. But I’m at a conference and she isn’t. Also since she completed her degree a couple of weeks ago she may be a bit hard for me to find right away. Write to me for more details if you like.
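
The idea raised in the comments of keeping the raw text unchanged and indexing into it can be sketched roughly as follows. This is a minimal illustration in Python, not any of the systems mentioned above: the Highlight class, the helper functions and the start/end marker element names are all hypothetical. Each highlight is stored as one record of character offsets, so overlapping highlights stay whole. If XML output is still needed, the spans can be written out as empty marker elements, in the spirit of the non-closing tag idea.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Highlight:
    start: int  # offset of the first highlighted character
    end: int    # offset one past the last highlighted character
    tag: str    # subject attached to the span


def make_highlight(text: str, phrase: str, tag: str) -> Highlight:
    """Build a highlight for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Highlight(start, start + len(phrase), tag)


def overlaps(a: Highlight, b: Highlight) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def to_milestones(text: str, highlights: list[Highlight]) -> str:
    """Serialize text plus spans as XML using empty start/end markers."""
    events = []
    for n, h in enumerate(highlights):
        events.append((h.start, 1, f'<start id="h{n}" tag={quoteattr(h.tag)}/>'))
        events.append((h.end, 0, f'<end id="h{n}"/>'))
    # At the same offset, emit end markers before start markers.
    events.sort(key=lambda e: (e[0], e[1]))
    out, pos = [], 0
    for offset, _, marker in events:
        out.append(escape(text[pos:offset]))
        out.append(marker)
        pos = offset
    out.append(escape(text[pos:]))
    return "<s>" + "".join(out) + "</s>"


text = "Peter has a tough problem here."
first = make_highlight(text, "Peter has a tough", "subject-a")
second = make_highlight(text, "tough problem here", "subject-b")

print(overlaps(first, second))            # True
print(text[second.start:second.end])      # tough problem here

markup = to_milestones(text, [first, second])
# The marker form is well-formed XML, and the text survives unchanged.
recovered = "".join(ET.fromstring(markup).itertext())
print(recovered == text)                  # True
```

The trade-off is that the XML tree no longer carries the highlight structure itself. A consumer has to pair the markers by id, or work from the offset records directly.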
