The problem: Data is still frequently transmitted as semi-structured ASCII text, but applications require structured information. For example, I often want to enter an address into an electronic address book from the text of an email or web page, especially a signature. (Yes, there's vCard, but I think consistent usage is still rare, and using it requires extra steps.) Another example is entering financial transactions into a finance program from an on-line banking page. I now do much of my bill paying on-line through my bank's web interface, and I hate having to copy each field of every transaction from the bank's HTML page into GnuCash by hand, which is laborious and error-prone.
The solution I'm playing with has two parts: a) a general-purpose, data-recognizing clipboard that converts semi-structured text to XML, and b) support in applications for recognizing XML on the clipboard. The idea is that by converting to XML (i.e., marking up the semi-structured text), the clipboard has already done the hard, low-level data recognition work, leaving it to the applications to 'take what they can use'.
Usage scenario: consider copying the following email signature (from this example) to the clipboard:
Jim Frazier, President
The Gadwall Group, Ltd. - IT and Ebusiness Strategies
Batavia, Illinois 630-406-5861 jfrazier@gadwall.com
http://www.gadwall.com http://www.cynicalcio.com
Seminars and Training - Consulting - Publications
(I don't know him - it's just the first public signature I found.) It's easy to envision a straightforward, regular-expression-based tool that pulls out the following:
<clip>
<name>
<first>Jim</first>
<last>Frazier</last>
<title>President</title>
</name>
<location>
<town>Batavia</town>
<state>Illinois</state>
<country>USA</country>
</location>
<phone>630-406-5861</phone>
<email>jfrazier@gadwall.com</email>
<url>http://www.gadwall.com</url>
<url>http://www.cynicalcio.com</url>
</clip>
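Here is a minimal Python sketch of such a recognizer, using only the standard library (`re` and `xml.etree.ElementTree`). The function name `recognize_signature`, the patterns, and the short `US_STATES` list are all hypothetical and deliberately naive: US-style phone numbers, a "First Last, Title" first line, and a country inferred as USA only because the sketch matches US state names.

```python
import re
import xml.etree.ElementTree as ET

# Deliberately simple, US-centric patterns (illustrative only).
NAME_RE = re.compile(r"^([A-Z][a-z]+) ([A-Z][a-z]+), (.+)$", re.MULTILINE)  # "First Last, Title"
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URL_RE = re.compile(r"https?://\S+")
# Abbreviated, hypothetical list; a real tool would need every state (and other countries).
US_STATES = ["Illinois", "Indiana", "Iowa", "Michigan", "Ohio", "Wisconsin"]
PLACE_RE = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*), (" + "|".join(US_STATES) + r")\b")


def recognize_signature(text: str) -> ET.Element:
    """Convert a plain-text signature into a <clip> of low-level recognized data."""
    clip = ET.Element("clip")
    m = NAME_RE.search(text)
    if m:
        name = ET.SubElement(clip, "name")
        ET.SubElement(name, "first").text = m.group(1)
        ET.SubElement(name, "last").text = m.group(2)
        ET.SubElement(name, "title").text = m.group(3).strip()
    m = PLACE_RE.search(text)
    if m:
        location = ET.SubElement(clip, "location")
        ET.SubElement(location, "town").text = m.group(1)
        ET.SubElement(location, "state").text = m.group(2)
        ET.SubElement(location, "country").text = "USA"  # inferred: only US states are matched
    # Phones, emails, and URLs can repeat, so emit one element per match.
    for tag, pattern in (("phone", PHONE_RE), ("email", EMAIL_RE), ("url", URL_RE)):
        for value in pattern.findall(text):
            ET.SubElement(clip, tag).text = value
    return clip


SIGNATURE = """Jim Frazier, President
The Gadwall Group, Ltd. - IT and Ebusiness Strategies
Batavia, Illinois 630-406-5861 jfrazier@gadwall.com
http://www.gadwall.com http://www.cynicalcio.com
Seminars and Training - Consulting - Publications"""

clip = recognize_signature(SIGNATURE)
ET.indent(clip)  # pretty-printing only; requires Python 3.9+
print(ET.tostring(clip, encoding="unicode"))
```

Note that the company name is not extracted: that is the kind of pattern that is hard to capture with a regular expression, which is why the clip sticks to low-level items.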
Here's what the text copied from a bank's on-line statement might look like:
04/27/05 | Checking | Check 2067 | 2067 | $-45.00
The columns are Date, Account, Description, Check #, and Amount. In this case the recognized clip might be:
<clip>
<date>
<month>04</month>
<day>27</day>
<year>2005</year>
</date>
<text>Checking</text>
<text>Check 2067</text>
<number type="integer">2067</number>
<currency unit="dollars">-45.00</currency>
</clip>
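The bank row is easier because the columns are already delimited. A minimal sketch, assuming `|` separates the fields and that two-digit years mean 20xx: each field is classified on its own as a date, a dollar amount, an integer, or plain text. The names `recognize_row` and `add_field` and the patterns are illustrative, not an existing tool.

```python
import re
import xml.etree.ElementTree as ET

DATE_RE = re.compile(r"^(\d{2})/(\d{2})/(\d{2}|\d{4})$")  # MM/DD/YY or MM/DD/YYYY
MONEY_RE = re.compile(r"^\$(-?\d+(?:\.\d{2})?)$")          # e.g. $-45.00
INT_RE = re.compile(r"^-?\d+$")


def add_field(clip: ET.Element, field: str) -> None:
    """Classify one column value and append the matching element to the clip."""
    m = DATE_RE.match(field)
    if m:
        month, day, year = m.groups()
        if len(year) == 2:
            year = "20" + year  # assumption: two-digit years are in the 2000s
        date = ET.SubElement(clip, "date")
        ET.SubElement(date, "month").text = month
        ET.SubElement(date, "day").text = day
        ET.SubElement(date, "year").text = year
        return
    m = MONEY_RE.match(field)
    if m:
        ET.SubElement(clip, "currency", unit="dollars").text = m.group(1)
        return
    if INT_RE.match(field):
        ET.SubElement(clip, "number", type="integer").text = field
        return
    ET.SubElement(clip, "text").text = field  # anything unrecognized stays plain text


def recognize_row(line: str) -> ET.Element:
    """Split a '|'-delimited statement row and recognize each field independently."""
    clip = ET.Element("clip")
    for field in line.split("|"):
        field = field.strip()
        if field:
            add_field(clip, field)
    return clip


row = recognize_row("04/27/05 | Checking | Check 2067 | 2067 | $-45.00")
print(ET.tostring(row, encoding="unicode"))
```

The recognizer never uses the column headings; like the clip above, it only reports what each value looks like.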
You get the point - basically it's just a set of lower-level recognized data. It's up to the application to put the pieces together in a more specific and meaningful way.
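On the application side, a finance program might pick out only the elements it understands and ignore the rest. This sketch assumes a hypothetical `Transaction` record and a heuristic that the first `<text>` element is the account and the remaining ones form the description; a clip without both a date and an amount is simply ignored.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class Transaction:
    """Hypothetical record type inside a finance application."""
    when: date
    account: str
    description: str
    amount: Decimal
    check_number: Optional[int] = None


def transaction_from_clip(xml_text: str) -> Optional[Transaction]:
    """Take what a finance app can use from a clip; return None if it isn't a transaction."""
    clip = ET.fromstring(xml_text)
    date_el = clip.find("date")
    amount = clip.findtext("currency")
    if date_el is None or amount is None:
        return None  # missing the pieces a transaction needs, so ignore the clip
    texts = [t.text for t in clip.findall("text") if t.text]
    number = clip.findtext("number")
    return Transaction(
        when=date(int(date_el.findtext("year")), int(date_el.findtext("month")),
                  int(date_el.findtext("day"))),
        # Heuristic assumption: first <text> is the account, the rest is the description.
        account=texts[0] if texts else "",
        description=" ".join(texts[1:]),
        amount=Decimal(amount),  # Decimal avoids binary floating-point rounding for money
        check_number=int(number) if number is not None else None,
    )


CLIP = """<clip>
<date><month>04</month><day>27</day><year>2005</year></date>
<text>Checking</text>
<text>Check 2067</text>
<number type="integer">2067</number>
<currency unit="dollars">-45.00</currency>
</clip>"""

print(transaction_from_clip(CLIP))
```

An address book would run the same kind of code against the same clipboard, looking for `<name>`, `<phone>`, and `<email>` instead.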
Thoughts:
- It would be great to program custom rules in a nice scripting language, such as JavaScript (used to program Konfabulator widgets).
- As a work-around for applications that don't recognize XML, we might pair the XML clipboard with a mouse/keyboard macro program (e.g., QuicKeys).
- There would need to be a standard DTD for this. (I'm sure one exists somewhere.)
- Hasn't this already been (partly) done by Apple's old Data Detectors idea? A few references here and here.
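The custom-rules idea in the first bullet could be as simple as a registry of pattern/builder pairs. In this sketch Python stands in for JavaScript, and the `rule` decorator, `leaf` helper, `recognize` function, and example ZIP-code rule are all hypothetical; matches from every registered rule are emitted in the order they appear in the text.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# Hypothetical rule format: a compiled regex plus a function turning a match into an element.
Rule = Tuple[re.Pattern, Callable[[re.Match], ET.Element]]
RULES: List[Rule] = []


def rule(pattern: str):
    """Decorator that registers a user-defined recognizer rule."""
    compiled = re.compile(pattern)

    def register(build: Callable[[re.Match], ET.Element]):
        RULES.append((compiled, build))
        return build

    return register


def leaf(tag: str, value: str, **attrs: str) -> ET.Element:
    """Build a leaf element such as <email>...</email>."""
    element = ET.Element(tag, attrs)
    element.text = value
    return element


@rule(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
def email(m: re.Match) -> ET.Element:
    return leaf("email", m.group(0))


@rule(r"\b\d{5}(?:-\d{4})?\b")
def zip_code(m: re.Match) -> ET.Element:
    # A custom rule a user might add: US ZIP and ZIP+4 codes.
    return leaf("postcode", m.group(0), country="US")


def recognize(text: str) -> ET.Element:
    """Run every registered rule and emit the matches in text order."""
    hits = []
    for pattern, build in RULES:
        for m in pattern.finditer(text):
            hits.append((m.start(), build(m)))
    hits.sort(key=lambda hit: hit[0])  # key-only sort, so Elements are never compared
    clip = ET.Element("clip")
    clip.extend([element for _, element in hits])
    return clip


clip = recognize("ZIP 12345-6789, write to jane@example.com")
print(ET.tostring(clip, encoding="unicode"))
```

Rules here are independent, so if two rules matched the same span both elements would be emitted; sorting out conflicts is left to the application, in keeping with the 'take what they can use' approach.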