Sunday, March 27, 2011

Pulling Text from a PDF

Last week, after hunting high and low for a short story to read in my English 9 class, I discovered a great one, but it was only available as a PDF. I wanted to manipulate it a bit and highlight some of the vocabulary and the sections that students should be looking out for. I also wanted to add comprehension questions at the end to launch our in-class discussion the following day. Long story short, I needed to convert the PDF into editable text.

I had needed to do this every once in a while before, but I had never come up with a good method. Apple claims that with Snow Leopard you can do it in Preview. I have tried this a few times, and it doesn't work reliably. Plus, it only works on some PDFs, not all of them.

After looking for a foolproof solution, and trying the many software packages out there that promised this and didn't deliver, I realized that I had been looking too far afield. Google Docs did exactly what I was looking for with its "Upload" feature. I had used this before to transfer Word and Pages documents into Google Docs, but never a PDF. Thanks to Google's back-end technology, it can convert a PDF to text for you quickly and easily.

Here’s how you would do it:

1.) Go to your Gmail account and click on the Docs tab at the top.

2.) Once in Docs, you will notice a button called "Upload" next to the "Create new" drop-down. Click on it.

3.) This brings you to an Upload files page. Click the blue text that says "Upload files" and follow the steps to upload your file. This is very similar to attaching a file to an e-mail.

4.) After you attach the file, the page will refresh and list your document in the box. Upload more documents if you want.

5.) Google uses Optical Character Recognition (OCR) to complete the conversion. There are, however, limits and restrictions; Google's Docs help pages outline them and explain how OCR works.

6.) At the bottom of the page, be sure to check the box titled "Convert text from PDF to Google Documents" so that the conversion can happen.

7.) After you click "Start Upload," the page will list each file it created as a link to your new Google Document. Click the link to open it.

You will notice that each scanned page from the PDF appears above the block of text extracted from it. The OCR is not perfect, and it will not transfer some tables and diagrams, but it does a great job with the text. Now that the text is in a Google Document, you can edit it, rearrange it, or even export it as a Word document.

Google Docs saved the day for me and let my class work very productively from a single file. Most importantly, this technology is free with a Google account. Happy converting!