UTF-7 with JavaMail
Java has a pretty cool API for handling email called JavaMail. It has classes for constructing and sending multipart email, as well as classes for connecting to mailboxes and processing the messages in them.
I hit a bit of a problem, however, when some of the emails in the POP3 mailbox I was trying to read threw java.io.UnsupportedEncodingException: utf-7. The problem was that the JavaMail API, while supporting a whole host of character encodings, didn’t support UTF-7. I hadn’t heard of this encoding before and thought that, as these were spam emails I was reading through, the sender might have set a silly value in the header to try to throw off simple mail readers such as mine. However, a quick visit to Wikipedia informed me that UTF-7 is a real encoding, albeit a rarely used one.
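The exception here is the JDK’s own java.io.UnsupportedEncodingException, so one way to spot such messages before trying to decode them is to ask the running JVM whether it knows the declared charset, using java.nio.charset.Charset.isSupported. The sketch below is a hypothetical helper (the class CharsetProbe and method jvmCanDecode are my own names, not part of JavaMail). It is only an approximation: JavaMail may first map MIME charset names to Java names via MimeUtility.javaCharset, which this does not do. Since spam can carry junk header values, illegal names are treated as unsupported instead of being allowed to throw.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper: checks whether the running JVM has a decoder for a
// charset name taken from a message header (e.g. the Content-Type charset).
public final class CharsetProbe {
    private CharsetProbe() {}

    public static boolean jvmCanDecode(String name) {
        if (name == null) {
            return false; // Charset.isSupported(null) would throw
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // Syntactically invalid names (spaces, '!', etc.) can't be decoded either.
            return false;
        }
    }
}
```

On a stock JDK I would expect jvmCanDecode("UTF-7") to return false, matching the exception above, so a reader could use a check like this to route those messages to a fallback instead of failing.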
It appears that if you don’t want UTF-7-encoded email to break your mail reader, you have to decode it yourself. The encoding seems a little awkward, but there is a solution on this mailing list.
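To give a feel for what decoding involves (this is not the mailing-list solution, just a minimal sketch of the rules in RFC 2152): outside a shifted run, each ASCII character stands for itself; a “+” starts a run of modified base64 (no “=” padding) carrying big-endian UTF-16 code units; the run ends at the first non-base64 character, and a “-” terminator is swallowed; “+-” encodes a literal “+”. The class name Utf7 and method decode are hypothetical, the sketch assumes the body has already been transfer-decoded and read into a String as US-ASCII (UTF-7 text is pure 7-bit ASCII), and it does not reject malformed input.

```java
/**
 * Minimal UTF-7 (RFC 2152) decoder sketch. Hypothetical helper, not part of JavaMail.
 * Assumes the input is already plain 7-bit text; malformed input is not rejected.
 */
public final class Utf7 {
    private static final String BASE64 =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    private Utf7() {}

    public static String decode(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i++);
            if (c != '+') {
                out.append(c); // direct character: stands for itself
                continue;
            }
            if (i < in.length() && in.charAt(i) == '-') {
                out.append('+'); // "+-" encodes a literal '+'
                i++;
                continue;
            }
            // Shifted run: modified base64 carrying big-endian UTF-16 code units.
            int bits = 0;
            int bitCount = 0;
            while (i < in.length()) {
                int v = BASE64.indexOf(in.charAt(i));
                if (v < 0) {
                    break; // first non-base64 character ends the run
                }
                i++;
                bits = (bits << 6) | v;
                bitCount += 6;
                if (bitCount >= 16) {
                    bitCount -= 16;
                    out.append((char) ((bits >>> bitCount) & 0xFFFF));
                    bits &= (1 << bitCount) - 1; // keep only the unused low bits
                }
            }
            // Leftover bits (fewer than 16) are padding and are discarded.
            // A '-' terminator is absorbed; any other terminator is handled
            // as an ordinary character on the next loop iteration.
            if (i < in.length() && in.charAt(i) == '-') {
                i++;
            }
        }
        return out.toString();
    }
}
```

For example, Utf7.decode("Hi Mom -+Jjo--!") returns "Hi Mom -\u263A-!" (a white smiling face), which is one of the examples given in RFC 2152. Because Java strings are UTF-16, surrogate pairs for characters outside the BMP come out correctly without any extra handling.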
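For my purposes, however, it was sufficient to extract the text in an extremely quick-and-dirty way: read the raw bytes from the message input stream and only write out those which fall within the ASCII character range (outputting a “?” otherwise). This works tolerably for UTF-7 because UTF-7 text is itself 7-bit ASCII: plain English comes through untouched, and only non-ASCII characters show up as short base64 runs such as “+Jjo-”.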
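```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

private String cheapBodyExtract(InputStream input)
    throws IOException
{
    // Read the whole stream into memory.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    int n;
    while((n = input.read(buffer)) != -1)
    {
        baos.write(buffer, 0, n);
    }
    byte[] bytes = baos.toByteArray();
    // Replace anything outside printable ASCII with '?'. Java bytes are signed,
    // so mask with 0xFF to see values 128-255. Assignment must go through the
    // array index: reassigning a for-each loop variable changes nothing.
    // Line breaks and tabs are kept so the text stays readable.
    for(int i = 0; i < bytes.length; i++)
    {
        int v = bytes[i] & 0xFF;
        boolean whitespace = v == '\n' || v == '\r' || v == '\t';
        if(!whitespace && (v < 32 || v > 126))
        {
            bytes[i] = '?';
        }
    }
    // Every byte is now ASCII, so decode explicitly rather than using the platform default.
    return new String(bytes, StandardCharsets.US_ASCII);
}
```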