PHP-XPDF API
Class

XPDF\PdfToText

class PdfToText

The PdfToText object.

This binary adapter extracts text from PDF files using the `pdftotext`
binary provided by XPDF.
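
A typical call sequence, going only by the method list on this page, is open, extract, close. The sketch below wraps that sequence in a helper so the file is closed even when extraction throws. `extractText()` is a hypothetical name, not part of the library, and the code assumes the class is autoloaded (for example through Composer) and that `open()` and `close()` act on the instance they are called on.

```php
<?php
// Sketch only: assumes XPDF\PdfToText is autoloadable.
use XPDF\PdfToText;

/**
 * Extract all text from one PDF and always close the file afterwards.
 * extractText() is a hypothetical helper, not part of the library.
 */
function extractText(PdfToText $pdfToText, string $pathfile): string
{
    $pdfToText->open($pathfile);
    try {
        // No page arguments: extract every page.
        return $pdfToText->getText();
    } finally {
        // Runs whether getText() returned or threw.
        $pdfToText->close();
    }
}
```

An instance to pass in can come from `PdfToText::load($logger)` or from the constructor. This page does not say which Logger class is expected, so that choice is left open here.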

Methods

__construct(string $binary, Logger $logger)

Constructor

__destruct()

Destructor

PdfToText setPageQuantity(integer $pages)

Set the default number of pages to extract. When extracting text, if no end page is provided and this value has been set, the number of pages extracted is limited to it.

PdfToText open(string $pathfile)

Opens a PDF file in order to extract text

PdfToText close()

Close the currently open file

PdfToText setOuputEncoding(string $charset)

Set the output encoding.

string getOuputEncoding()

Get the output encoding; the default is UTF-8

string getText(integer $page_start = null, integer $page_end = null)

Extract the text of the currently open PDF file. If no start or end page is provided, all pages are extracted.

static PdfToText load(Logger $logger)

Look for the `pdftotext` binary and return a new PdfToText object

Details

at line 47
public __construct(string $binary, Logger $logger)

Constructor

Parameters

string $binary The path to the `pdftotext` binary
Logger $logger A logger which will log the events

at line 56
public __destruct()

Destructor

at line 76
public PdfToText setPageQuantity(integer $pages)

Set the default number of pages to extract. When extracting text, if no end page is provided and this value has been set, the number of pages extracted is limited to it.

Set this value to null to reset it.

Parameters

integer $pages The number of pages

Return Value

PdfToText

Exceptions

InvalidArgumentException
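
The rule described above (an explicit end page wins, otherwise the page quantity caps the range) can be sketched as a standalone function. This is one plausible reading, not the library's code: `resolvePageRange()` is a hypothetical name, and the arithmetic (end = start + quantity - 1, with a missing start counted as page 1) is an assumption this page does not confirm.

```php
<?php
/**
 * Work out the effective [start, end] page pair.
 * Hypothetical illustration; the library may compute the end differently.
 */
function resolvePageRange(?int $pageStart, ?int $pageEnd, ?int $pageQuantity): array
{
    // Only apply the default quantity when the caller gave no end page.
    if ($pageEnd === null && $pageQuantity !== null) {
        $first = $pageStart ?? 1; // assumption: no start means page 1
        $pageEnd = $first + $pageQuantity - 1;
    }

    // A null entry means "no limit on this side".
    return [$pageStart, $pageEnd];
}
```

Under these assumptions, a quantity of 2 with a start page of 3 gives the range 3 to 4, while passing an explicit end page leaves the quantity unused.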

at line 95
public PdfToText open(string $pathfile)

Opens a PDF file in order to extract text

Parameters

string $pathfile The path to the PDF file to extract

Return Value

PdfToText

Exceptions

InvalidArgumentException

at line 114
public PdfToText close()

Close the currently open file

Return Value

PdfToText

at line 129
public PdfToText setOuputEncoding(string $charset)

Set the output encoding.

If the charset is invalid, the getText method
will fail.

Parameters

string $charset The output charset

Return Value

PdfToText

at line 141
public string getOuputEncoding()

Get the output encoding; the default is UTF-8

Return Value

string

at line 157
public string getText(integer $page_start = null, integer $page_end = null)

Extract the text of the currently open PDF file. If no start or end page is provided, all pages are extracted.

Parameters

integer $page_start The starting page number (first is 1)
integer $page_end The ending page number

Return Value

string The extracted text

Exceptions

LogicException
RuntimeException
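
Since `getText` can throw two kinds of exception, callers may want to tell them apart. The sketch below is a hypothetical wrapper, `tryGetText()`, that returns null on failure. It assumes the listed exceptions are, or extend, PHP's built-in `LogicException` and `RuntimeException`; the causes named in the comments are guesses, because this page does not state when each one is thrown.

```php
<?php
// Sketch only: assumes XPDF\PdfToText is autoloadable.
use XPDF\PdfToText;

/**
 * Extract a page range, returning null instead of throwing.
 * tryGetText() is a hypothetical helper, not part of the library.
 */
function tryGetText(PdfToText $pdfToText, ?int $pageStart = null, ?int $pageEnd = null): ?string
{
    try {
        return $pdfToText->getText($pageStart, $pageEnd);
    } catch (\LogicException $e) {
        // Assumed cause: the object is misused, e.g. no file is open.
        error_log('pdftotext misuse: ' . $e->getMessage());
    } catch (\RuntimeException $e) {
        // Assumed cause: the binary failed while running.
        error_log('pdftotext failed: ' . $e->getMessage());
    }

    return null;
}
```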

at line 223
static public PdfToText load(Logger $logger)

Look for the `pdftotext` binary and return a new PdfToText object

Parameters

Logger $logger The logger

Return Value

PdfToText

Exceptions

BinaryNotFoundException
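
When automatic detection fails, one option is to fall back to an explicitly configured binary path through the constructor. The sketch below shows that pattern. `createPdfToText()` is a hypothetical helper, and the `XPDF\Exception` namespace for `BinaryNotFoundException` is an assumption, since this page names the exception but not where it lives.

```php
<?php
// Sketch only: assumes XPDF\PdfToText is autoloadable.
use XPDF\PdfToText;
// Assumed namespace; adjust to wherever the library defines the exception.
use XPDF\Exception\BinaryNotFoundException;

/**
 * Try automatic detection first, then a caller-supplied binary path.
 * createPdfToText() is a hypothetical helper, not part of the library.
 */
function createPdfToText($logger, string $fallbackBinary): PdfToText
{
    try {
        return PdfToText::load($logger);
    } catch (BinaryNotFoundException $e) {
        // Detection failed: use the explicitly configured path instead.
        return new PdfToText($fallbackBinary, $logger);
    }
}
```

The `$logger` parameter is left untyped on purpose, because the concrete Logger class is not specified here.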