An XML Clipboard for Semi-Structured Data

The problem: Data is still frequently transmitted as semi-structured ASCII text, but applications require structured information. For example, I often want to enter an address into an electronic address book based on text from an email or web page, especially a signature. (Yes, there's vCard, but I think consistent usage is still rare, and using it requires extra steps.) Another example is entering financial transactions into a finance program from an on-line banking page - I now do much of my bill paying on-line using a web interface from my bank, and I hate having to laboriously (and with plenty of errors) copy each section of the transaction's output (HTML) into GnuCash.

The solution I'm playing with has two parts: a) a general-purpose data recognizer clipboard that converts semi-structured text to XML, and b) support by applications for recognizing XML on the clipboard. The thought is that by converting to XML (i.e., semi-structured data that has been marked up), we've done the hard low-level data recognition work, leaving it to the applications to 'take what they can use'.

Usage scenario: consider copying the following email signature (from this example) to the clipboard:

Jim Frazier, President
The Gadwall Group, Ltd. - IT and Ebusiness Strategies
Batavia, Illinois 630-406-5861 jfrazier@gadwall.com
http://www.gadwall.com http://www.cynicalcio.com
Seminars and Training - Consulting - Publications

(I don't know him - it's just the first public signature I found.) It's easy to envision a straightforward regular expression-based tool that pulls out the following:

<clip>
  <name>
    <first>Jim</first>
    <last>Frazier</last>
    <title>President</title>
  </name>
  <location>
    <town>Batavia</town>
    <state>Illinois</state>
  </location>
  <phone>630-406-5861</phone>
  <email>jfrazier@gadwall.com</email>
  <url>http://www.gadwall.com</url>
  <url>http://www.cynicalcio.com</url>
</clip>
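
To make the idea concrete, here is a minimal Python sketch of such a regular expression-based recognizer. It is only an illustration, not a finished tool: the function name recognize_signature and every pattern in it are assumptions made up for this example, and they are tuned to this one signature (name and title on the first line, a "Town, State" line, US-style phone numbers). It builds a clip element in the same shape as the example above.

```python
import re
import xml.etree.ElementTree as ET

SIGNATURE = """\
Jim Frazier, President
The Gadwall Group, Ltd. - IT and Ebusiness Strategies
Batavia, Illinois 630-406-5861 jfrazier@gadwall.com
http://www.gadwall.com http://www.cynicalcio.com
Seminars and Training - Consulting - Publications"""

# Illustrative patterns only; real signatures vary far more than this.
NAME_TITLE = re.compile(r"^(\w+) (\w+), (.+)$")               # "First Last, Title"
TOWN_STATE = re.compile(r"^([A-Z][a-z]+), ([A-Z][a-z]+)\b")   # "Town, State ..."
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")                  # US-style number
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL = re.compile(r"https?://\S+")


def recognize_signature(text):
    """Return a <clip> element holding whatever the patterns recognize."""
    clip = ET.Element("clip")
    lines = text.splitlines()

    # Assumption: the first line carries the name and title.
    m = NAME_TITLE.match(lines[0]) if lines else None
    if m:
        name = ET.SubElement(clip, "name")
        ET.SubElement(name, "first").text = m.group(1)
        ET.SubElement(name, "last").text = m.group(2)
        ET.SubElement(name, "title").text = m.group(3)

    # Take the first later line that starts with "Town, State".
    for line in lines[1:]:
        m = TOWN_STATE.match(line)
        if m:
            location = ET.SubElement(clip, "location")
            ET.SubElement(location, "town").text = m.group(1)
            ET.SubElement(location, "state").text = m.group(2)
            break

    # Phone numbers, email addresses and URLs can appear anywhere.
    for tag, pattern in (("phone", PHONE), ("email", EMAIL), ("url", URL)):
        for value in pattern.findall(text):
            ET.SubElement(clip, tag).text = value
    return clip


clip = recognize_signature(SIGNATURE)
print(ET.tostring(clip, encoding="unicode"))
```

Anything the patterns miss is simply left out of the clip, which fits the 'take what they can use' idea: the recognizer never has to understand the whole signature.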

Here's what the text copied from a bank's on-line statement might look like:

04/27/05 | Checking | Check 2067 | 2067 | $-45.00

Where the columns are: Date, Account, Description, Check #, and Amount. In this case the XML on the clipboard might be:

<clip>
  <date>
    <month>04</month>
    <day>27</day>
    <year>2005</year>
  </date>
  <text>Checking</text>
  <text>Check 2067</text>
  <number type="integer">2067</number>
  <currency unit="dollars">-45.00</currency>
</clip>
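
A rough Python sketch of how a recognizer might get from that statement line to this kind of XML follows. The names recognize_row and recognize_field are hypothetical, and the sketch assumes fields are separated by "|", dates are MM/DD/YY with the year in the 2000s, and amounts are written with a leading dollar sign.

```python
import re
import xml.etree.ElementTree as ET

ROW = "04/27/05 | Checking | Check 2067 | 2067 | $-45.00"

DATE = re.compile(r"^(\d{2})/(\d{2})/(\d{2})$")   # MM/DD/YY
CURRENCY = re.compile(r"^\$(-?\d+\.\d{2})$")      # e.g. $-45.00
INTEGER = re.compile(r"^-?\d+$")


def recognize_field(parent, field):
    """Append the most specific element that matches one field."""
    m = DATE.match(field)
    if m:
        month, day, yy = m.groups()
        date = ET.SubElement(parent, "date")
        ET.SubElement(date, "month").text = month
        ET.SubElement(date, "day").text = day
        # Assumption: two-digit years belong to the 2000s.
        ET.SubElement(date, "year").text = "20" + yy
        return
    m = CURRENCY.match(field)
    if m:
        ET.SubElement(parent, "currency", unit="dollars").text = m.group(1)
        return
    if INTEGER.match(field):
        ET.SubElement(parent, "number", type="integer").text = field
        return
    # Fall back to plain text when nothing more specific matches.
    ET.SubElement(parent, "text").text = field


def recognize_row(row):
    """Split a '|'-separated row and recognize each field in order."""
    clip = ET.Element("clip")
    for field in row.split("|"):
        recognize_field(clip, field.strip())
    return clip


row_clip = recognize_row(ROW)
print(ET.tostring(row_clip, encoding="unicode"))
```

Note that the recognizer knows nothing about column meanings; it only labels each field by its low-level type.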

You get the point - basically it's just a set of lower-level recognized data. It's up to the application to put the pieces together in a more specific and meaningful way.
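
On the application side, 'putting the pieces together' could look something like the following Python sketch. It imagines a finance program reading the clip above and building a transaction record; the function transaction_from_clip and the dictionary keys are invented for illustration and do not correspond to any real GnuCash interface.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

CLIP_XML = """
<clip>
  <date><month>04</month><day>27</day><year>2005</year></date>
  <text>Checking</text>
  <text>Check 2067</text>
  <number type="integer">2067</number>
  <currency unit="dollars">-45.00</currency>
</clip>
"""


def transaction_from_clip(xml_text):
    """Take what a finance app can use from a clip; ignore the rest."""
    clip = ET.fromstring(xml_text)
    d = clip.find("date")
    number = clip.find("number")
    amount = clip.find("currency")
    return {
        # Each piece is optional: a missing element just yields None.
        "date": date(int(d.findtext("year")),
                     int(d.findtext("month")),
                     int(d.findtext("day"))) if d is not None else None,
        "check_number": int(number.text) if number is not None else None,
        "amount": Decimal(amount.text) if amount is not None else None,
        # The app decides what the free-text pieces mean, e.g. a memo.
        "memo": [t.text for t in clip.findall("text")],
    }


tx = transaction_from_clip(CLIP_XML)
print(tx)
```

An address book given the same clip would find no name or email elements and could simply decline the paste.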

Thoughts:
  • It would be great to program custom rules using a nice scripting language, such as JavaScript (used to program Konfabulator widgets).
  • As a work-around to requiring applications to recognize XML, we might try the XML clipboard plus a mouse/keyboard macro-playing program (e.g., QuicKeys).
  • There would need to be a standard DTD for this. (I'm sure one exists somewhere.)
  • Hasn't this been done (partly) by Apple's old Data Recognizers idea? A few references here and here.

