class PdfToText
The PdfToText object.
This binary adapter extracts text from PDF files using the pdftotext
binary provided by XPDF.
Methods
__construct(string $binary, Logger $logger)
Constructor
__destruct()
Destructor
PdfToText
setPageQuantity(integer $pages)
Set the default number of pages to extract. When extracting text, if no end page is provided and this value has been set, the number of extracted pages is limited to this quantity.
PdfToText
open(string $pathfile)
Opens a PDF file in order to extract text
PdfToText
close()
Close the currently open file.
PdfToText
setOuputEncoding(string $charset)
Set the output encoding.
string
getOuputEncoding()
Get the output encoding (default: UTF-8).
string
getText(integer $page_start = null, integer $page_end = null)
Extract the text of the currently open PDF file. If neither a start nor an end page is provided, all pages are extracted (unless a default page quantity has been set with setPageQuantity()).
static PdfToText
load(Logger $logger)
Look for the pdftotext binary and return a new PdfToText object.
Details
at line 47
public
__construct(string $binary, Logger $logger)
Constructor
at line 56
public
__destruct()
Destructor
at line 76
public PdfToText
setPageQuantity(integer $pages)
Set the default number of pages to extract. When extracting text, if no end page is provided and this value has been set, the number of extracted pages is limited to this quantity.
Set this value to null to reset it.
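The interplay between setPageQuantity() and the page bounds of getText() can be pictured as a small pure function. This is a minimal sketch under stated assumptions: it assumes that an explicit end page always wins and that the quantity is counted from the start page (or from page 1 when no start page is given). `resolvePageRange()` is a hypothetical helper, not part of the PdfToText API, and the adapter's exact arithmetic may differ.

```php
<?php

/**
 * Hypothetical helper (not part of PdfToText): resolve the first/last page
 * that would be requested from pdftotext. A null first page means "from the
 * beginning"; a null last page means "up to the end of the document".
 *
 * @return array{0: ?int, 1: ?int} [first page, last page]
 */
function resolvePageRange(?int $pageStart, ?int $pageEnd, ?int $pageQuantity): array
{
    // An explicit end page always takes precedence over the default quantity.
    if ($pageEnd !== null) {
        return [$pageStart, $pageEnd];
    }

    // No end page and no quantity (or a quantity reset to null): read to the end.
    if ($pageQuantity === null) {
        return [$pageStart, null];
    }

    // Assumption: the quantity is counted from the start page, or from page 1.
    $first = $pageStart ?? 1;

    return [$pageStart, $first + $pageQuantity - 1];
}
```

Under these assumptions, `resolvePageRange(3, null, 2)` returns `[3, 4]`, while passing null as the quantity leaves the end page unbounded, which mirrors resetting the value with null.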
at line 95
public PdfToText
open(string $pathfile)
Opens a PDF file in order to extract text
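Before a path is handed to open(), a caller may want to check that it points at a readable PDF. The sketch below is a hypothetical pre-flight check (`looksLikeReadablePdf()` is not part of the PdfToText API, and what open() itself validates is not documented here). It relies on PDF files starting with a `%PDF-` header followed by the version number; the check is deliberately strict and requires that header at byte 0.

```php
<?php

/**
 * Hypothetical pre-flight check (not part of PdfToText): the path must be a
 * readable regular file whose first bytes are the "%PDF-" header.
 */
function looksLikeReadablePdf(string $pathfile): bool
{
    if (!is_file($pathfile) || !is_readable($pathfile)) {
        return false;
    }

    $handle = fopen($pathfile, 'rb');
    if ($handle === false) {
        return false;
    }

    // A PDF header looks like "%PDF-1.4"; only the fixed "%PDF-" prefix is compared.
    $header = fread($handle, 5);
    fclose($handle);

    return $header === '%PDF-';
}
```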
at line 114
public PdfToText
close()
Close the currently open file.
at line 129
public PdfToText
setOuputEncoding(string $charset)
Set the output encoding.
If the charset is invalid, the getText method will fail.
at line 141
public string
getOuputEncoding()
Get the output encoding (default: UTF-8).
at line 157
public string
getText(integer $page_start = null, integer $page_end = null)
Extract the text of the currently open PDF file. If neither a start nor an end page is provided, all pages are extracted (unless a default page quantity has been set with setPageQuantity()).
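XPDF's pdftotext accepts `-f` (first page), `-l` (last page) and `-enc` (output text encoding), and writes the text to standard output when the output file name is `-`. The sketch below shows how getText()'s page bounds and the output encoding could map onto those options. `buildPdfToTextCommand()` is a hypothetical helper; which flags the adapter really passes, and whether it writes to a temporary file instead of stdout, is not documented here.

```php
<?php

/**
 * Hypothetical helper (not part of PdfToText): build a pdftotext command line.
 * Every user-supplied string is shell-escaped; a null page bound is omitted.
 */
function buildPdfToTextCommand(
    string $binary,
    string $pathfile,
    ?int $pageStart,
    ?int $pageEnd,
    string $charset = 'UTF-8'
): string {
    $parts = [escapeshellarg($binary)];

    if ($pageStart !== null) {
        $parts[] = '-f ' . $pageStart;   // first page to convert
    }
    if ($pageEnd !== null) {
        $parts[] = '-l ' . $pageEnd;     // last page to convert
    }

    $parts[] = '-enc ' . escapeshellarg($charset);  // output text encoding
    $parts[] = escapeshellarg($pathfile);
    $parts[] = '-';                                 // "-" sends the text to stdout

    return implode(' ', $parts);
}
```

The resulting string could be run with `exec()`, whose third argument receives the exit status, so that a failing pdftotext call (for instance one with an invalid encoding) is reported as an error rather than returning empty text.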
at line 223
static public PdfToText
load(Logger $logger)
Look for the pdftotext binary and return a new PdfToText object.
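Looking for a binary usually means scanning the directories listed in the PATH environment variable. The sketch below assumes that strategy; `findBinary()` is a hypothetical helper, and the lookup that load() actually performs (and what it does when pdftotext is missing) is not documented here. The optional `$path` argument exists only so the search can be pointed at a specific directory.

```php
<?php

/**
 * Hypothetical helper (not part of PdfToText): search a PATH-style list of
 * directories for an executable named $name. Returns its full path or null.
 */
function findBinary(string $name, ?string $path = null): ?string
{
    // getenv() returns false when PATH is unset; cast it to an empty string.
    $path = $path ?? (string) getenv('PATH');

    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null;
}
```

On Windows the executable would normally be named `pdftotext.exe`, so a portable lookup would also try that suffix.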