Converting Scanned Documents to Editable Text with Office 2003

If you have Microsoft Office 2003 installed and have access to a scanner, you can scan a document and edit it in Microsoft Word using Microsoft Office Document Imaging.  Its Optical Character Recognition (OCR) feature translates scanned images into computer-editable text.  This Office component has some limitations, and complex jobs may require purchasing software with more features, but many tasks can be accomplished by following these instructions.


  1. Obtain the scanned document for OCR:
    1. Scan the document as a .tif (or .tiff) file
    2. Send the file to your email address
    3. Save the attachment to your desktop
  2. Open the document in Microsoft Office Document Imaging (you may be prompted to install this feature if it has not already been installed - go ahead and do so):
    1. Start | All Programs | Microsoft Office | Microsoft Office Tools | Microsoft Office Document Imaging
    2. File | Open | Browse to the .tif file on your desktop
  3. Complete conversion process:
    1. Tools | Send Text to Word
    2. Choose "All pages for conversion" OR
    3. "Selected pages"
      1. Shift-click each page you want in the pane on the left
    4. Deselect "Maintain pictures in output" (unless you wish to keep pictures)
    5. Press OK
  4. Press OK when you see a dialog box telling you that "MS Office Document Imaging must recognize the text in this document (OCR) before you can perform this operation.  This may take awhile."
  5. Fix your file in Word:
    1. Save the newly-created .htm as a Word file:
      1. File | Save As | change "save as type" to Word Document (*.doc)
      2. Change the View for better readability
        1. View | Print Layout
    2. Clean it up
      1. Highlight and delete any stray characters that the OCR produced from things such as handwriting or formatting marks on the original document
      2. Change the manual line breaks to paragraph returns with Word's Replace command (you may wish to turn on Show/Hide so that you can see the non-printing characters)
        1. Edit | Replace
        2. In "Find what:" type ^l (that's shift-6, then lower case L)
        3. In "Replace with:" type ^p (that's shift-6, then lower case P)
        4. Click Replace All
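
For text that has been copied out of Word into a plain-text file, the same clean-up can be sketched in a short script. This is only an illustration under stated assumptions: it assumes the manual line breaks survive as the vertical-tab character (code 11, which is how Word represents a manual line break internally), and the function name clean_ocr_text is hypothetical, not part of Office.

```python
import re

# Assumption: manual line breaks (Word's ^l) appear as vertical tabs in the text.
MANUAL_LINE_BREAK = "\x0b"
PARAGRAPH_BREAK = "\n"

# Control characters other than tab, newline, and carriage return.
STRAY_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0c\x0e-\x1f\x7f]")


def clean_ocr_text(text: str) -> str:
    """Turn manual line breaks into paragraph breaks and drop stray control characters."""
    text = text.replace(MANUAL_LINE_BREAK, PARAGRAPH_BREAK)
    return STRAY_CONTROL_CHARS.sub("", text)


if __name__ == "__main__":
    sample = "First line\x0bSecond line\x0bThird\x07 line"
    print(clean_ocr_text(sample))
```

The script only handles characters that are unambiguous to remove. Marks that OCR produced from handwriting usually look like ordinary letters or punctuation, so they still need to be deleted by hand as described above.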

IMPORTANT:  Proofread your document carefully to make sure the text was converted correctly.  It's not unusual to see a dropped letter here and there, which could dramatically change the meaning of the document.
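
As a rough aid to proofreading (not a substitute for it), a script can flag words that look like typical OCR slips. The sketch below uses one simple heuristic chosen here for illustration, not a rule from Word or Document Imaging: a word that mixes letters and digits, such as "c0py", is worth a second look. The function name suspicious_tokens is hypothetical. The heuristic cannot detect a dropped letter that leaves a valid word behind, and it will also flag legitimate tokens such as "3rd".

```python
def suspicious_tokens(text: str) -> list[str]:
    """Return words that contain both letters and digits (a heuristic, not a guarantee)."""
    flagged = []
    for token in text.split():
        # Strip common surrounding punctuation before inspecting the word.
        word = token.strip(".,;:!?\"'()")
        has_letter = any(ch.isalpha() for ch in word)
        has_digit = any(ch.isdigit() for ch in word)
        if has_letter and has_digit:
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    sample = "Please rev1ew the c0py filed in 2003."
    for word in suspicious_tokens(sample):
        print("Check:", word)
```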
