I’ve been wondering about XML and highlighting text. If I have some text, and I highlight a piece and tag it with some subject, I can easily express that in XML.
But what if I then highlight another, partly overlapping part of the sentence and want to tag that with another tag? How am I supposed to express that in XML? I don’t want to do what HTML does with overlapping italic and bold tags, because I want to remember what I highlighted as one section, not cut it up into several sections that are tagged the same way. (Does this make sense?)
So what is the solution to this? Is there one? I think the fundamental problem is that, at a data modeling level, XML lets you build trees very easily, but not lattices (overlapping structures).
Help.
So two ideas here, neither of which is clean:
1) Serialize the text: <bold>Peter has</bold> <bold + italics>a tough problem</bold + italics> <bold>here</bold>
2) Use non-closing tags, like the IMG tag in HTML: <bold char_length="29">Peter has <italics char_length="15">a tough problem here.
Frankly, XML is not meant for this stuff, so I think anything is going to be a hack.
Thanks to Jan (http://www.hikingviking.org) for the serialization idea.
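A rough sketch of the serialization idea in Python, for anyone who wants to try it. It cuts the text at every highlight boundary and labels each resulting run with all the tags that cover it. The function name `segment` and the (start, end, tag) tuple format are made up for illustration, and the offsets assume end-exclusive character positions.

```python
def segment(text, spans):
    """Split text into non-overlapping runs, each labelled with the tags covering it.

    spans: list of (start, end, tag) tuples, end exclusive (an assumed format).
    """
    # Every span boundary is a cut point, plus the two ends of the text.
    cuts = sorted({0, len(text)} | {p for s, e, _ in spans for p in (s, e)})
    runs = []
    for a, b in zip(cuts, cuts[1:]):
        # A tag applies to a run if its span fully contains the run.
        tags = sorted(tag for s, e, tag in spans if s <= a and b <= e)
        runs.append((text[a:b], tags))
    return runs


text = "Peter has a tough problem here"
nested = segment(text, [(0, 30, "bold"), (10, 25, "italics")])
overlapping = segment(text, [(0, 17, "topic_a"), (10, 30, "topic_b")])
```

Note that this shows exactly the drawback from the original question: each highlight gets chopped into several runs, so you would need to carry an extra id per highlight to stitch it back together.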
Here’s another approach:
http://www.livejournal.com/users/urbansheep/1059435.html
This is one of the classic problems of embedded markup. Ted Nelson “solved” the problem for collaborative commentary by indexing into a raw text stream, which remained unchanged. Others, like Mike Vulpe (developer of the i4i S4 SGML/XML markup engine) have also sought to dissociate markup from the content.
But, as others have noted here, embedded XML markup is fundamentally unfriendly to your requirements.
We do need a usable way of referring to arbitrary, often overlapping spans of text — if only in order to associate indexing concepts (topics) with those arbitrary spans.
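To make the "index into an unchanged text stream" idea concrete, here is a minimal Python sketch of standoff markup. It is not Nelson's or anyone else's actual system. The element names (`document`, `text`, `highlight`), the attribute names, and the `Highlight` class are all assumptions for illustration. The text is stored once, untouched, and each highlight is a separate record holding character offsets into it, so overlapping highlights stay whole.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Highlight:
    id: str
    start: int  # inclusive character offset into the raw text
    end: int    # exclusive character offset
    topic: str


def to_standoff_xml(text, highlights):
    """Store the raw text once and list highlights beside it, not inside it."""
    root = ET.Element("document")
    ET.SubElement(root, "text").text = text
    container = ET.SubElement(root, "highlights")
    for h in highlights:
        ET.SubElement(container, "highlight", id=h.id,
                      start=str(h.start), end=str(h.end), topic=h.topic)
    return ET.tostring(root, encoding="unicode")


def from_standoff_xml(xml):
    """Read the text and the highlight records back out."""
    root = ET.fromstring(xml)
    text = root.findtext("text")
    highlights = [Highlight(e.get("id"), int(e.get("start")),
                            int(e.get("end")), e.get("topic"))
                  for e in root.iter("highlight")]
    return text, highlights


def covering(highlights, offset):
    """Return every highlight that includes the character at this offset."""
    return [h for h in highlights if h.start <= offset < h.end]


text = "Peter has a tough problem here"
marks = [Highlight("h1", 0, 17, "people"), Highlight("h2", 10, 30, "problems")]
xml = to_standoff_xml(text, marks)
text_back, marks_back = from_standoff_xml(xml)
```

The resulting XML is an ordinary tree, because the overlap lives in the offsets rather than in the nesting. The price is that the offsets break as soon as the text is edited, which is why this approach relies on the text staying unchanged.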
I think my student, Huan Gao, did this using the Multivalent Document Model from UCB. But I’m at a conference and she isn’t. Also, since she completed her degree a couple of weeks ago, she may be a bit hard for me to find right away. Write to me for more details if you like.