How do you convert a Word Document into very simple html in Python?


Every now and then I receive a Word document that I have to display as a web page. I'm currently using Django's flatpages for this, taking the HTML that MS Word generates. That HTML is quite messy. Is there a better way to generate very simple HTML from a Word document using Python?

10/20/2009 7:52:43 PM

Accepted Answer

One good approach is to upload the document to Google Docs and export the HTML version from there. (There must be an API for that?)

The export already does a lot of cleanup, and Beautiful Soup can be used afterwards to make any further changes as needed. It is about the most powerful and elegant HTML parsing library available.

This is a well-known workflow at news organizations.

10/20/2009 8:20:46 PM

I found this web page:

It converts formatted text to simple HTML markup, preserving bold, italic, links, and paragraphs, but without adding tags for font sizes and faces. Exactly what I needed to save some time.
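
For anyone who wants the same kind of cleanup in Python, here is a minimal sketch of the idea: keep a small whitelist of tags (paragraphs, bold, italic, links) and drop everything else, including font tags and style attributes. It uses only the standard library's `html.parser` rather than Beautiful Soup, and the names `ALLOWED`, `SKIPPED`, `SimpleHtml`, and `simplify` are made up for this example. It is not how the page above works internally, and real Word output may need a longer whitelist (lists, headings, tables).

```python
from html import escape
from html.parser import HTMLParser

# Tags to keep, mapped to the attributes allowed on each (an assumed whitelist).
ALLOWED = {"p": (), "b": (), "strong": (), "i": (), "em": (), "a": ("href",)}
# Elements whose text content is discarded entirely.
SKIPPED = {"style", "script", "head", "title"}


class SimpleHtml(HTMLParser):
    """Re-emit only whitelisted tags and the text between them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIPPED element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag in ALLOWED and not self.skip_depth:
            # Keep only whitelisted attributes, e.g. href on links.
            kept = "".join(
                f' {name}="{escape(value or "")}"'
                for name, value in attrs
                if name in ALLOWED[tag]
            )
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def simplify(html):
    """Return html reduced to the whitelisted tags plus text."""
    parser = SimpleHtml()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    messy = ('<p class="MsoNormal" style="margin:0"><span style="font-size:12pt">'
             '<b>Hi</b> <font face="Arial">there</font></span></p>')
    print(simplify(messy))  # <p><b>Hi</b> there</p>
```

Tags that are not on the whitelist are dropped but their text is kept, so content wrapped in `<span>` or `<font>` survives without the styling.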

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow