org.apache.pdfbox.pdfparser
Class PDFParser

java.lang.Object
  extended by org.apache.pdfbox.pdfparser.BaseParser
      extended by org.apache.pdfbox.pdfparser.PDFParser

public class PDFParser
extends BaseParser

This class handles the parsing of a PDF document.

Version:
$Revision: 1.53 $
Author:
Ben Litchfield

Field Summary
 
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
DEF, document, ENDOBJ, ENDSTREAM, pdfSource
 
Constructor Summary
PDFParser(InputStream input)
          Constructs a parser that reads the PDF document from the given input stream.
PDFParser(InputStream input, RandomAccess rafi)
          Constructor to allow control over RandomAccessFile.
PDFParser(InputStream input, RandomAccess rafi, boolean force)
          Constructor to allow control over RandomAccessFile and, optionally, to skip corrupt objects.
 
Method Summary
 COSDocument getDocument()
          This will get the document that was parsed.
 FDFDocument getFDFDocument()
          This will get the FDF document that was parsed.
 PDDocument getPDDocument()
          This will get the PD document that was parsed.
 void parse()
          This will parse the stream and populate the COSDocument object.
 void setTempDirectory(File tmpDir)
          Sets the directory where PDFBox will create a temporary file for storing the PDF document stream.
 
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSStream, parseCOSString, parseDirObject, readExpectedString, readInt, readLine, readString, readString, setDocument, skipSpaces
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PDFParser

public PDFParser(InputStream input)
          throws IOException
Constructs a parser that reads the PDF document from the given input stream.

Parameters:
input - The input stream that contains the PDF document.
Throws:
IOException - If there is an error initializing the stream.

PDFParser

public PDFParser(InputStream input,
                 RandomAccess rafi)
          throws IOException
Constructor to allow control over RandomAccessFile.

Parameters:
input - The input stream that contains the PDF document.
rafi - The RandomAccessFile to be used in the internal COSDocument.
Throws:
IOException - If there is an error initializing the stream.

PDFParser

public PDFParser(InputStream input,
                 RandomAccess rafi,
                 boolean force)
          throws IOException
Constructor to allow control over RandomAccessFile. Also enables the parser to skip corrupt objects in an attempt to force parsing.

Parameters:
input - The input stream that contains the PDF document.
rafi - The RandomAccessFile to be used in the internal COSDocument.
force - When true, the parser skips corrupt PDF objects and continues parsing at the next object in the file.
Throws:
IOException - If there is an error initializing the stream.
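
The effect of the force flag can be pictured with a toy example. The sketch below is not PDFBox code and does not reflect how PDFParser recovers internally; the class LenientParseSketch and its methods are hypothetical, and an "object" here is simply a decimal integer. It only illustrates the documented contract: in strict mode the first corrupt entry aborts the run, while in forced mode that entry is skipped and parsing resumes at the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy illustration of strict vs. forced parsing; hypothetical, not PDFBox code. */
public class LenientParseSketch {

    /** Parses one "object"; in this sketch an object is just a decimal integer. */
    static int parseObject(String raw) {
        // Throws NumberFormatException when the entry is corrupt.
        return Integer.parseInt(raw.trim());
    }

    /**
     * Parses every entry in order.
     * With force == false the first corrupt entry aborts the whole run.
     * With force == true the corrupt entry is skipped and parsing continues.
     */
    public static List<Integer> parseAll(List<String> rawObjects, boolean force) {
        List<Integer> parsed = new ArrayList<Integer>();
        for (String raw : rawObjects) {
            try {
                parsed.add(parseObject(raw));
            } catch (NumberFormatException e) {
                if (!force) {
                    throw e;
                }
                // Forced mode: drop this entry and move on to the next one.
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1", "oops", "3");
        System.out.println(parseAll(input, true)); // prints [1, 3]
    }
}
```

A document parsed in forced mode may therefore be missing whatever the skipped objects contained, so the result is best treated as a partial recovery.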

Method Detail

setTempDirectory

public void setTempDirectory(File tmpDir)
Sets the directory where PDFBox will create a temporary file for storing the PDF document stream. By default, this directory is the value of the system property java.io.tmpdir.

Parameters:
tmpDir - The directory in which to create the scratch files needed to store PDF document streams.
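
Because the default scratch location is taken from the java.io.tmpdir system property, a caller who wants a different location will usually want to make sure the directory exists and is writable before passing it to setTempDirectory. The following is a minimal sketch using only the JDK; the class ScratchDirs, its methods, and the directory name "pdfbox-scratch" are hypothetical and not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper (not part of PDFBox) for choosing a scratch directory. */
public class ScratchDirs {

    /** The directory used by default, i.e. the value of java.io.tmpdir. */
    public static File defaultScratchDir() {
        return new File(System.getProperty("java.io.tmpdir"));
    }

    /** Returns a writable subdirectory of parent, creating it if it does not exist. */
    public static File prepare(File parent, String name) throws IOException {
        File dir = new File(parent, name);
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create scratch directory: " + dir);
        }
        if (!dir.canWrite()) {
            throw new IOException("Scratch directory is not writable: " + dir);
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        File scratch = prepare(defaultScratchDir(), "pdfbox-scratch");
        System.out.println("Scratch directory: " + scratch.getAbsolutePath());
        // With PDFBox on the classpath, the directory would then be handed
        // to the parser before parsing starts:
        // parser.setTempDirectory(scratch);
    }
}
```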

parse

public void parse()
           throws IOException
This will parse the stream and populate the COSDocument object. This will close the stream when it is done parsing.

Throws:
IOException - If there is an error reading from the stream or corrupt data is found.

getDocument

public COSDocument getDocument()
                        throws IOException
This will get the document that was parsed. parse() must be called before this is called. When you are done with this document you must call close() on it to release resources.

Returns:
The document that was parsed.
Throws:
IOException - If there is an error getting the document.

getPDDocument

public PDDocument getPDDocument()
                         throws IOException
This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.

Returns:
The document at the PD layer.
Throws:
IOException - If there is an error getting the document.

getFDFDocument

public FDFDocument getFDFDocument()
                           throws IOException
This will get the FDF document that was parsed. When you are done with this document you must call close() on it to release resources.

Returns:
The document at the FDF layer.
Throws:
IOException - If there is an error getting the document.


Copyright © 2002-2010 The Apache Software Foundation. All Rights Reserved.