4 April 2004

Paginate HTML from Word

After ribbing David about open sourcing his content management system, I decided to put up myself. So here’s my first Free Software project: pagify is a Perl script that takes the output of Microsoft Word’s “Save as Web Page...” and

  • cleans out the cruft and proprietary XML gunk,
  • splits the file into HTML pages wherever a Heading 1 style appears, and
  • converts endnotes into footnotes on the appropriate pages.
I personally use it as the first step in formatting Human Rights Watch’s many long and footnoted reports for the Web. Pagify is released under the GNU General Public License and will live at http://backspace.com/pagify/.
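
For the curious, here is a rough sketch of the page-splitting idea from the list above. It is written in Python rather than Perl and is not pagify's actual code. The function name split_at_h1 is made up for illustration, and the sketch assumes that Word writes its Heading 1 style out as an <h1> tag. Cleaning out the cruft and moving the notes are left out entirely.

```python
import re

# Assumption: a Heading 1 paragraph shows up in the saved HTML as an <h1> tag.
H1_OPEN = re.compile(r"<h1\b", re.IGNORECASE)


def split_at_h1(body):
    """Split an HTML body into chunks, starting a new chunk at every <h1>.

    Anything before the first <h1> becomes its own leading chunk.
    Chunks that contain only whitespace are dropped.
    """
    # Offsets where each new page should begin.
    starts = [m.start() for m in H1_OPEN.finditer(body)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    # Each page ends where the next one starts; the last runs to the end.
    ends = starts[1:] + [len(body)]
    return [body[s:e] for s, e in zip(starts, ends) if body[s:e].strip()]


if __name__ == "__main__":
    sample = "<p>intro</p><h1>One</h1><p>a</p><H1 class=x>Two</H1><p>b</p>"
    for number, page in enumerate(split_at_h1(sample), start=1):
        print(number, page)
```

Each chunk would then be wrapped in its own page template and written to a separate file. A regular expression is good enough for a sketch, but a real HTML parser would be the safer choice for messy input.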