Converting Scanned Documents to Editable Text with Office 2003
If you have Microsoft Office 2003 installed and have access to a scanner, you can scan a paper document and edit its text in Microsoft Word using Microsoft Office Document Imaging. This component uses Optical Character Recognition (OCR) to translate scanned images into computer-editable text. It has some limitations, and complex jobs may require purchasing software with more features, but many tasks can be accomplished by following these instructions.
- Obtain the scanned document for OCR:
  - Scan the document as a .tif (or .tiff) file
  - Send it to your email address
  - Save the attachment to your desktop
- Open the document in Microsoft Office Document Imaging (you may be prompted to install this feature if it has not already been installed; go ahead and do so):
  - Start | All Programs | Microsoft Office | Microsoft Office Tools | Microsoft Office Document Imaging
  - File | Open | Browse to the .tif file on your desktop
- Complete the conversion process:
  - Tools | Send Text to Word
  - Choose "All pages for conversion" OR
  - "Selected pages"
    - Shift-click each page you want in the pane on the left
  - Deselect "Maintain pictures in output" (unless you wish to keep pictures)
  - Press OK
  - Press OK when you see a dialog box telling you that "MS Office Document Imaging must recognize the text in this document (OCR) before you can perform this operation. This may take awhile."
- Fix your file in Word:
  - Save the newly created .htm as a Word file:
    - File | Save As | change "Save as type" to Word Document (*.doc)
  - Change the view for better readability:
    - View | Print Layout
  - Clean it up:
    - Select and delete any stray characters, which usually come from handwriting or formatting marks on the original document
    - Change the manual line breaks to paragraph returns with Word's Replace command (you may wish to turn on Show/Hide so that you can see the non-printing characters):
      - Edit | Replace
      - In "Find what:" type ^l (that's shift-6, then lower case L)
      - In "Replace with:" type ^p (that's shift-6, then lower case P)
      - Click Replace All
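For readers who would rather script the clean-up than do it by hand, the two clean-up steps above can be sketched in Python. This is only an illustration under stated assumptions: the OCR text is already available as a Python string, manual line breaks are still present as character code 11 (the vertical tab that Word's ^l matches), a plain newline stands in for Word's paragraph return, and the set of characters treated as legitimate is a hypothetical whitelist that you would adjust for your own documents.

```python
import re

# Word stores a manual line break (^l) as character code 11 (vertical tab).
MANUAL_LINE_BREAK = "\x0b"

# Hypothetical whitelist: letters, digits, whitespace and common punctuation.
# Anything else is treated as OCR debris and removed.
STRAY_CHARACTERS = re.compile(r"""[^\w\s.,;:!?'"()/&%$-]""")


def clean_ocr_text(text):
    """Remove stray characters and turn manual line breaks into newlines."""
    # Drop characters outside the whitelist (for example marks left by handwriting).
    text = STRAY_CHARACTERS.sub("", text)
    # Removing debris can leave doubled spaces behind; collapse them.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Equivalent in spirit to replacing ^l with ^p in Word.
    return text.replace(MANUAL_LINE_BREAK, "\n")


if __name__ == "__main__":
    sample = "Dear Sir,\x0bThank you \u00a6 for your letter.\x0b~~Regards"
    print(clean_ocr_text(sample))
```

The whitelist approach is deliberately blunt: it will also remove legitimate symbols that are not listed, so review the result rather than trusting it blindly.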
IMPORTANT: Proofread your document carefully to make sure the text was converted correctly. It's not unusual to see a dropped letter here and there, which could dramatically change the meaning of the document.
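A simple script can help direct your attention while proofreading, though it cannot replace a careful read. The sketch below flags words that do not appear in a word list you supply, which catches many dropped letters. The function name and the tiny word list are hypothetical; in practice you would load a full dictionary suited to your documents.

```python
import re


def flag_unknown_words(text, known_words):
    """Return words not found in known_words, in order of first appearance."""
    seen = set()
    flagged = []
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower().strip("'")
        # Skip empty tokens, known words, and words already reported.
        if key and key not in known_words and key not in seen:
            seen.add(key)
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    # Hypothetical word list for illustration only.
    known = {"the", "contract", "is", "not", "binding"}
    print(flag_unknown_words("The contrat is not binding", known))
```

Note the limitation: if a dropped letter turns one valid word into another (for example "now" becoming "no"), a word-list check will not notice, so the manual proofread remains essential.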

