Every now and then I receive a Word document that I have to display as a web page. I'm currently using Django's flatpages for this, pasting in the HTML that MS Word generates. That HTML is quite messy. Is there a better way, using Python, to generate very simple HTML from the document?
A good solution is to upload the document to Google Docs and export the HTML version from there. (There must be an API for that?)
Google Docs does a lot of clean-up on its own; Beautiful Soup can then be used to make any further changes, as appropriate. It is a powerful and elegant HTML parsing library.
This is a well-known workflow at news organizations.
I found this web page: http://www.textfixer.com/html/convert-word-to-html.php
It converts formatted text to simple HTML markup, preserving bold, italics, links and paragraphs, but without adding tags for font sizes and font faces. Exactly what I needed to save some time.
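Since the question asks for Python, the same idea (keep bold, italics, links and paragraphs, drop everything else) can be roughly approximated with the standard library alone. This is only a minimal sketch, not how the site above works: the names `ALLOWED_TAGS`, `SKIPPED_TAGS`, `WordHtmlSimplifier` and `simplify_word_html` are made up for illustration, and the tag whitelist is an assumption you would adjust to your own documents.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tags that survive the clean-up (an assumption, adjust as needed).
ALLOWED_TAGS = {"p", "b", "strong", "i", "em", "a", "ul", "ol", "li", "br"}
# Elements whose entire content is thrown away (Word puts a lot of CSS in <style>).
SKIPPED_TAGS = {"head", "style", "script"}


class WordHtmlSimplifier(HTMLParser):
    """Re-emits only whitelisted tags, without attributes (except href on links)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            href = dict(attrs).get("href") if tag == "a" else None
            if href:
                self.parts.append(f'<a href="{escape(href, quote=True)}">')
            else:
                self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag != "br":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is kept even when its enclosing tag (span, font, div, ...) is dropped.
        if self.skip_depth == 0:
            self.parts.append(escape(data, quote=False))


def simplify_word_html(html):
    parser = WordHtmlSimplifier()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

For example, `simplify_word_html('<p class="MsoNormal"><span style="font-size:12pt"><b>Hi</b></span></p>')` gives `<p><b>Hi</b></p>`. Real Word exports contain more oddities (conditional comments, `o:p` tags, list markup built from paragraphs), so treat this as a starting point rather than a complete converter.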