Reading a Doc file using Java
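Hi, plain Java IO alone can't read the contents of a .doc file, because the Word .doc format is binary rather than plain text.
Apache POI (formerly Jakarta POI) has utilities to read the contents of such a file through its API.
The extractor returns the text as an array of paragraphs; every line of the doc file is treated as a paragraph.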
import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class ReadandWrite {

    public static void readDocFile(String filePath) {
        // try-with-resources (Java 7+) closes the stream even if parsing fails
        try (FileInputStream in = new FileInputStream(filePath)) {
            POIFSFileSystem fs = new POIFSFileSystem(in);
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            String[] paragraphs = we.getParagraphText();
            System.out.println("Word Document has " + paragraphs.length + " paragraphs");
            for (int i = 0; i < paragraphs.length; i++) {
                // Strip Ctrl-M (carriage return) characters from the paragraph
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?", "");
                System.out.println(paragraphs[i]);
            }
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Throwable {
        readDocFile("yourfile.doc");
    }
}
For this I used poi-3.0-FINAL.jar.
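The cleanup step inside the loop can be tried on its own, without POI on the classpath. The sketch below is only an illustration: the class name ParagraphCleanup and the method stripCarriageReturns are hypothetical names, not part of POI. It applies the same regular expression, where \cM stands for the Ctrl-M (carriage return) character.

```java
public class ParagraphCleanup {

    // Hypothetical helper (not part of POI): removes every Ctrl-M
    // (carriage return) character from the given paragraph text.
    static String stripCarriageReturns(String paragraph) {
        return paragraph.replaceAll("\\cM?", "");
    }

    public static void main(String[] args) {
        // Sample text ending in a Windows-style line break (CR + LF)
        String raw = "First paragraph\r\n";
        String cleaned = stripCarriageReturns(raw);
        // Only the line feed remains after the cleanup
        System.out.print(cleaned);
    }
}
```

The trailing ? in the pattern makes the carriage return optional, so the expression also matches empty positions; since those are replaced with an empty string, the net effect is just that carriage returns disappear.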