How do you convert long Word documents with footnotes into Web-friendly chunks?

Write a Perl script!

What does it do?

Pagify is a simple Perl script that:

  1. removes the cruft and proprietary XML gunk from Word’s “Save as Web Page...” HTML output,
  2. splits the file into separate HTML pages wherever a Heading 1 style appears,
  3. and converts endnotes (if any) to footnotes on the appropriate pages.

You can use it to clean single-page documents, too.
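
To make the first two steps concrete, here is a minimal sketch in Python. It is not Pagify's own Perl code, only an illustration of the idea. The function names (clean_word_html, split_at_h1) are made up for this example, the cleanup patterns cover only a few common kinds of Word markup, and it assumes that Word writes each Heading 1 as an <h1> tag. Endnote conversion is left out.

```python
import re

# Matches a class or style attribute, whether quoted or unquoted.
ATTR = re.compile(r"""\s+(?:class|style)=(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.I)


def clean_word_html(html):
    """Strip a few common kinds of Word-specific markup (not exhaustive)."""
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=re.S)
    # Namespaced Office tags such as <o:p> and </o:p>
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.I)
    # class and style attributes, but only inside tags, never in body text
    return re.sub(r"<[^>]+>", lambda m: ATTR.sub("", m.group(0)), html)


def split_at_h1(body):
    """Split body HTML into chunks, starting a new chunk at each <h1>."""
    starts = [m.start() for m in re.finditer(r"<h1\b", body, flags=re.I)]
    bounds = [0] + starts + [len(body)]
    chunks = [body[a:b] for a, b in zip(bounds, bounds[1:])]
    # Drop empty chunks, e.g. when the document begins with a heading.
    return [c for c in chunks if c.strip()]
```

Any text before the first heading becomes its own chunk here. Whether Pagify treats that leading text the same way is an assumption.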

Download it here.

How do you use it?

You must have access to a computer with perl installed to run this script.

  1. Download the script.
  2. Save your Microsoft Word document using “Save as Web Page...”
    If you are using Word for Windows XP, select “Save As...” and file type “Web Page, Filtered.”
  3. Create (or locate) an output directory and put your HTML files there.
  4. Edit the file configuration information at the top of the pagify script.
  5. Run!

Pagify puts the cleaned and chopped files into your output directory.

Depending on how many Heading 1s are in your file, it names them index.htm, 1.htm, 2.htm, 3.htm, and so on.

If files with the same names exist in your output directory, pagify will overwrite them without warning.
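
Because of the silent overwrite, it can be worth checking the output directory before a run. The following Python sketch is not part of Pagify. The helper names (page_names, existing_outputs) are hypothetical, and the naming scheme is assumed from the description above: index.htm first, then 1.htm, 2.htm, and so on.

```python
import os


def page_names(count):
    """Return the assumed output names for `count` pages."""
    return ["index.htm" if i == 0 else "%d.htm" % i for i in range(count)]


def existing_outputs(out_dir, count):
    """List the page files in out_dir that a run would overwrite."""
    return [
        name
        for name in page_names(count)
        if os.path.exists(os.path.join(out_dir, name))
    ]
```

For example, existing_outputs("out", 10) returns whichever of the first ten page names are already present in a directory called "out", so you can move them aside first.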

There’s still plenty of work to be done, but it does make a useful first pass when coding big documents.

If you use it, find a bug, or have a patch, please get in touch.

How do you pronounce “pagify”?

Say: “page if eye.”

Release Notes

0.1.3 - released April 29, 2004

0.1.2 - released April 8, 2004

0.1 - released April 4, 2004

Things to Do


This software is copyright (C) 2004 John Emerson. It is distributed under the terms of the GNU General Public License (GPL). Because it is licensed free of charge, there is NO WARRANTY; it is provided AS IS. The author cannot be held liable for any damage that might arise from the use of this software. Use it at your own risk.

This page was last updated on January 19, 2005.