Reading a Doc file using Java

Hi, plain Java IO alone can't read the text content of a .doc file, because the file is stored in a binary format rather than as plain text.
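To see why plain IO is not enough, it helps to look at the first bytes of the file. A .doc file is an OLE2 compound document, and such files start with a fixed 8-byte signature instead of readable text. The sketch below checks for that signature using only the JDK; the class name OleSignatureCheck and its method names are made up for this illustration and are not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class OleSignatureCheck {

    // First 8 bytes of an OLE2 compound file (the container used by .doc)
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true when the given bytes start with the OLE2 signature
    static boolean hasOle2Signature(byte[] header) {
        return header.length >= OLE2_MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads the first 8 bytes of the file and checks them against the signature
    static boolean isOle2File(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2_MAGIC.length];
            int off = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (off < header.length) {
                int n = in.read(header, off, header.length - off);
                if (n < 0) {
                    break;
                }
                off += n;
            }
            return off == header.length && hasOle2Signature(header);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes the file path is passed as the first command-line argument
        Path file = Paths.get(args[0]);
        System.out.println(file + " looks like an OLE2 file: " + isOle2File(file));
    }
}
```

If the check returns true, the file needs a parser that understands the container format, which is what the library used in this post provides.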

Jakarta POI (now Apache POI) provides utilities in its API to read the contents of a .doc file.

Each line of the doc file is treated as a paragraph.

import java.io.FileInputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class ReadandWrite {

    public static void readDocFile(String filePath) {
        // try-with-resources closes the input stream even if parsing fails
        try (FileInputStream in = new FileInputStream(filePath)) {
            POIFSFileSystem fs = new POIFSFileSystem(in);
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            String[] paragraphs = we.getParagraphText();
            System.out.println("Word Document has " + paragraphs.length + " paragraphs");
            for (int i = 0; i < paragraphs.length; i++) {
                // Remove the carriage return (Ctrl-M) from each paragraph
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?", "");
                System.out.println(paragraphs[i]);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Throwable {
        // Assumes the .doc path is passed as the first command-line argument
        readDocFile(args[0]);
    }
}

For this I used poi-3.0-FINAL.jar.
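The replaceAll call in the loop is easy to misread, so here is a small standalone sketch of what it does. In a Java regex, \cM means the control character Ctrl-M, which is the carriage return, so the call removes carriage returns from the paragraph text. The class name ParagraphCleanup and the sample strings are invented for this illustration, and it assumes the paragraph strings contain a trailing carriage return.

```java
class ParagraphCleanup {

    // Removes every carriage return (Ctrl-M, U+000D) from the text,
    // using the same pattern as the POI example
    static String stripCarriageReturns(String text) {
        return text.replaceAll("\\cM?", "");
    }

    public static void main(String[] args) {
        // Hypothetical paragraph strings, each ending with a carriage return
        String[] paragraphs = { "First paragraph\r", "Second paragraph\r\n" };
        for (String p : paragraphs) {
            String cleaned = stripCarriageReturns(p);
            // Show the length before and after to make the removed character visible
            System.out.println(p.length() + " -> " + cleaned.length());
        }
    }
}
```

The trailing ? in the pattern also allows empty matches, which are replaced with nothing, so the simpler pattern "\\cM" would give the same result.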

    Sunil Agarwal
    May 24th, 2010

    Very Nice Information….

