Thursday, February 26, 2009

Word HTML to real HTML

Got a mail from a customer who wanted me to use the HTML from his mail in the emails that our system is going to generate. Fine, I thought when I saw it. It took only three seconds of looking at the email's source before it hit me: this is f#¤!ing Word-generated HTML. After the short setback I remembered listening to one of Stack Overflow's podcasts where Jeff talked about the problems they had with their WYSIWYG editor and how similar they were to decoding Word HTML. A quick Google search gave me http://www.textism.com/wordcleaner . Perfect, it did the job. Two caveats: it strips all class declarations and styling, so that part I have to redo myself, and it's only free for files below 20 KB.
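
For the curious, here is a rough sketch in Python of what this kind of cleanup might look like. It is not how the tool above or Jeff's parser actually works; the function name clean_word_html and the three regex rules are my own assumptions about the usual Word debris (conditional comments, <o:p> tags, class and style attributes). Like the tool, it throws away all classes and styling.

```python
import re


def clean_word_html(html: str) -> str:
    """Strip common Word-specific markup (hypothetical helper, not a full parser)."""
    # Drop HTML comments, including conditional blocks such as
    # <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Drop namespaced Office tags such as <o:p> and </o:p>, keeping any inner text
    html = re.sub(r"</?\w+:[^>]*>", "", html)
    # Drop class and style attributes (quoted or unquoted values)
    html = re.sub(
        r"""\s+(?:class|style)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
        "",
        html,
        flags=re.IGNORECASE,
    )
    return html


sample = (
    '<p class=MsoNormal style="margin:0cm">'
    "<!--[if gte mso 9]><xml>x</xml><![endif]-->"
    "Hello<o:p></o:p></p>"
)
print(clean_word_html(sample))
```

Regexes are a blunt instrument here: the attribute rule would also eat something like " class=foo" if it showed up in the visible text, so for anything beyond a one-off mail template a real HTML parser is probably the safer choice.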

BTW, Jeff wrote his own parser, though it only works with the 2003 version of Word.