Reading a Doc file using Java

Hi, we can't read the contents of a .doc file using plain Java IO alone, because the file is stored in a binary format rather than as plain text.
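To see why plain IO is not enough, it helps to look at the first bytes of the file. Legacy .doc files are stored in an OLE2 compound container, which starts with a fixed eight-byte signature. Here is a minimal sketch that checks for that signature using only java.io; the class name DocSignature and its method names are my own, made up for illustration, and are not part of POI.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class DocSignature {
    // First eight bytes of an OLE2 compound file (the container used by legacy .doc files)
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the given bytes begin with the OLE2 signature
    static boolean isOle2(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads up to eight bytes from the stream and checks them against the signature
    static boolean startsWithOle2(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                break; // stream ended before eight bytes were available
            }
            read += n;
        }
        return read == header.length && isOle2(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocSignature <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(startsWithOle2(in)
                    ? "OLE2 container (binary Word format)"
                    : "not an OLE2 file");
        }
    }
}
```

If the check succeeds, the bytes that follow are structured binary data, not readable text, so a library that understands the format is needed to get the text out.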

Apache POI (formerly Jakarta POI) provides utilities to read the contents of such a file through its API.

The extractor returns the text as an array of paragraphs; in a simple document, every line of the doc file is treated as one paragraph.

import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class ReadandWrite {

    public static void readDocFile(String filePath) {
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream in = new FileInputStream(filePath)) {
            POIFSFileSystem fs = new POIFSFileSystem(in);
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            String[] paragraphs = we.getParagraphText();
            System.out.println("Word Document has " + paragraphs.length + " paragraphs");
            for (int i = 0; i < paragraphs.length; i++) {
                // \cM is Ctrl-M (carriage return); strip it from each paragraph
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?", "");
                System.out.println(paragraphs[i]);
            }
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it
            System.err.println("Could not read " + filePath + ": " + e);
        }
    }

    public static void main(String[] args) {
        readDocFile("yourfile.doc");
    }
}

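For this I used poi-3.0-FINAL.jar. Note that the HWPF classes are packaged in POI's scratchpad jar, so you may also need the matching poi-scratchpad jar on the classpath.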
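The replaceAll call in the listing is the only part that can be tried without POI or a real .doc file, so here is a small stand-alone sketch of that clean-up step. The class ParagraphCleaner and its methods are hypothetical helpers of mine, and dropping empty paragraphs is an extra choice I made here, not something the original code does.

```java
import java.util.ArrayList;
import java.util.List;

class ParagraphCleaner {
    // Removes carriage returns and surrounding whitespace from one paragraph.
    // "\\cM" is the regex escape for Ctrl-M, i.e. the carriage return '\r'.
    static String clean(String paragraph) {
        return paragraph.replaceAll("\\cM", "").trim();
    }

    // Cleans every paragraph and skips the ones that end up empty
    // (an assumption for this sketch; keep them if blank lines matter to you).
    static List<String> cleanAll(String[] paragraphs) {
        List<String> result = new ArrayList<>();
        for (String p : paragraphs) {
            String cleaned = clean(p);
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input shaped like paragraph text with trailing line endings
        String[] raw = {"First paragraph\r\n", "\r\n", "Second paragraph\r"};
        for (String p : cleanAll(raw)) {
            System.out.println(p);
        }
    }
}
```

In the reader above, the array passed to cleanAll would be the result of we.getParagraphText().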
