
“Character Sets and Conversant (and the Eudora Problem)”

From: Seth Dillingham
In Response To: Top of Thread.
Date Posted: Thursday, April 20, 2006 12:14:40 PM
Replies: 3
Enclosures: None.

We recently decided to 'standardize' all Conversant text (everything from templates to messages) on UTF-8. I've been quite happy with this decision, as it made it possible (well, easier... it was always possible) to host truly international and multi-lingual sites, and made it a lot easier to deal with content coming in from a variety of sources like Microsoft Word. For example, we no longer bat an eye at ‘fancy’ characters like “curly quotes” or — for another example — long dashes.

This hasn't come without some pain on our end, though. We have to figure out what character set was used for the text being sent when a new message is created. That's supposed to be pretty easy: email, for example, generally includes a header called "Content-Type" whose charset parameter names the character set (something like text/html; charset="us-ascii").
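To make that concrete, here is a minimal Python sketch of reading the declared character set from a message. It is not Conversant's code; the sample message and the declared_charset helper are made up for illustration, and it only uses the standard library's email package.

```python
from email import message_from_bytes

# Hypothetical raw message, shaped like the kind described in this post.
RAW_MESSAGE = (
    b"From: someone@example.com\r\n"
    b"Subject: test\r\n"
    b'Content-Type: text/html; charset="us-ascii"\r\n'
    b"\r\n"
    b"<p>Hello</p>\r\n"
)

def declared_charset(raw_message, default="us-ascii"):
    """Return the charset named in Content-Type, or `default` if absent."""
    msg = message_from_bytes(raw_message)
    # get_content_charset() lowercases the value and returns the
    # fallback when there is no Content-Type header or charset parameter.
    return msg.get_content_charset(default)

print(declared_charset(RAW_MESSAGE))
```

Note that this only reports what the sender claims. It says nothing about whether the bytes in the body actually match that claim.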

The problem is when the software that sent the email lies to us. This is where I'm stumped at the moment.

One of my clients uses Qualcomm's Eudora for all of his mail. He sends HTML messages to his Conversant site, and the messages always contain curly quotes.

Here's the problem: Eudora claims the message's character set is us-ascii. This is the simplest character set in use today, and converting it to UTF-8 should not be a problem... but those of you who have any experience with character sets already know what I'm going to say, right?

US-ASCII (a.k.a. ASCII) doesn't have curly quotes. It is a seven-bit set (codes 0 through 127), and the bytes arriving for the curly quotes all have the eighth bit set (128 or above), so Eudora must be lying about the character set, right?
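The seven-bit claim is easy to check mechanically. This is a small sketch, assuming the message body is available as raw bytes; the helper name and the sample bytes are hypothetical (0x93 and 0x94 are where windows-1252 puts the double curly quotes, which may or may not be what Eudora sends).

```python
from typing import List, Tuple

def non_ascii_bytes(body: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte value) for every byte outside the 7-bit range."""
    return [(offset, value) for offset, value in enumerate(body) if value > 0x7F]

# Hypothetical body: plain ASCII text wrapped in two high bytes.
SAMPLE_BODY = b"He said \x93hello\x94"

offenders = non_ascii_bytes(SAMPLE_BODY)
for offset, value in offenders:
    print("offset %d: 0x%02X is not valid US-ASCII" % (offset, value))
```

If this list is non-empty for a message labelled us-ascii, the label cannot be correct, whatever the real character set turns out to be.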

This has plagued me for days, and the client is beginning to think I'm being lazy by blaming his Eudora. (Surely a big company would never do anything so obviously wrong with 'established' software, right?)

Anybody have any suggestions? I haven't had much luck looking for answers on Google. What character set is Eudora really using when it sends "above ASCII" text but claims it's all ASCII? Is it the platform-native character set, like windows-1252 (a.k.a. Windows Latin 1) or iso-8859-1, or something else entirely?
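One way to narrow it down is to decode the same bytes under each suspect and see which result looks right. The sketch below is only that, a sketch: the candidate list is my guess (Python's codec names for windows-1252, iso-8859-1, and Mac Roman), the cp1252 fallback in to_utf8 is an assumption rather than a known fact about Eudora, and both function names are made up.

```python
# Candidate character sets to try; this list is an assumption, not a
# statement about what Eudora actually uses.
CANDIDATES = ("cp1252", "iso-8859-1", "mac_roman")

def decode_candidates(body: bytes) -> dict:
    """Decode `body` under each candidate; None means the bytes are invalid there."""
    results = {}
    for name in CANDIDATES:
        try:
            results[name] = body.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

def to_utf8(body: bytes, declared: str, fallback: str = "cp1252") -> bytes:
    """Trust the declared charset if the bytes fit it, else use the fallback."""
    try:
        text = body.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # The label was wrong (or unknown), so fall back to a guess and
        # substitute U+FFFD for anything the guess cannot represent.
        text = body.decode(fallback, errors="replace")
    return text.encode("utf-8")

SAMPLE_BODY = b"\x93hi\x94"
for name, text in decode_candidates(SAMPLE_BODY).items():
    print(name, repr(text))
```

With those sample bytes, cp1252 yields real curly quotes, while iso-8859-1 yields invisible control characters (U+0093 and U+0094), so the two are easy to tell apart by eye once the decoded text is displayed.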





TruerWords
is Seth Dillingham's
personal web site.
More than the sum of my parts.