Retrieving the PDF Text in Array. #16

Open
manuelos opened this issue Apr 16, 2017 · 10 comments

@manuelos

Hi there, is there a way to retrieve the text as separate values in an array?
I have a PDF with a table like this:

Asesor  Emisor  Carpeta   Cis
13315   29036   20001310  20001178

But I get this output instead:
AsesorEmisorCarpetaCis13315290362000131020001178

I want to store the data in a database, but that concatenated output is useless for that. I would like to get an array like this:

Array(
[0] => "Asesor",
[1] => "Emisor",
[2] => "Carpeta",
[3] => "Cis",
[4] => "13315",
[5] => "29036",
[6] => "20001310",
[7] => "20001178"
)

Any help will be appreciated, thanks in advance.

@manuelos
Author

I solved it by setting the value of BlockSeparator in the main class like this:
public $BlockSeparator = '#$';

and then using that value to explode the string:

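$file = 'sample2';
$pdf  = new PdfToText("$file.pdf");

$texts = [];   // $texts[page][line] => list of blocks found on that line
foreach ($pdf->Pages as $page_number => $page_contents) {
    // Split the page into lines, then each line on the '#$' block separator
    $lines = explode(PHP_EOL, $page_contents);
    foreach ($lines as $line_number => $line) {
        $texts[$page_number][$line_number] = explode('#$', $line);
    }
}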
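The splitting step can be tried without a PDF at hand. The sketch below assumes each page string uses `#$` between blocks (as configured above) and any line ending between lines; `split_page_blocks()` is a hypothetical helper, not part of PdfToText. It returns one flat array, like the one asked for in the question.

```php
<?php
// Hypothetical helper, not part of PdfToText: flattens one page of extracted
// text into a list of cell values, assuming BlockSeparator was set to '#$'.
function split_page_blocks(string $page_contents, string $separator = '#$'): array
{
    $cells = [];
    // '\R' matches any line ending (\n, \r\n or \r), independent of PHP_EOL
    foreach (preg_split('/\R/', $page_contents) as $line) {
        foreach (explode($separator, $line) as $block) {
            $block = trim($block);
            // Skip empty pieces, e.g. from leading or trailing separators
            if ($block !== '') {
                $cells[] = $block;
            }
        }
    }
    return $cells;
}

$page = 'Asesor#$Emisor#$Carpeta#$Cis' . "\n" . '13315#$29036#$20001310#$20001178';
print_r(split_page_blocks($page));
```

Because empty pieces are dropped, an empty table cell simply disappears from the result, so this approach alone cannot tell which column a missing value belonged to.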

@christian-vigh-phpclasses
Owner

@manuelos
Author

Thanks for your quick response. I'm fairly new to your powerful library, and so far it's the best one I've tried.
I was also wondering whether it's possible to get the text from specific areas of a document, for example by giving the coordinates of two points that define a rectangle and then getting all the text inside it.
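The rectangle idea can be sketched as a filter over positioned text. This assumes that text items with known (x, y) coordinates are available; the `TextItem` class and the `text_in_area()` function are hypothetical and are not PdfToText's API.

```php
<?php
// Hypothetical: PdfToText is not assumed to expose positions in this form.
// Each item is a piece of text plus the (x, y) coordinates of its origin.
final class TextItem
{
    public function __construct(
        public float $x,
        public float $y,
        public string $text
    ) {}
}

// Return the text of every item whose origin lies inside the rectangle spanned
// by two opposite corners (x1, y1) and (x2, y2), edges included.
// The corners may be given in any order.
function text_in_area(array $items, float $x1, float $y1, float $x2, float $y2): array
{
    $xMin = min($x1, $x2);
    $xMax = max($x1, $x2);
    $yMin = min($y1, $y2);
    $yMax = max($y1, $y2);

    $found = [];
    foreach ($items as $item) {
        if ($item->x >= $xMin && $item->x <= $xMax
            && $item->y >= $yMin && $item->y <= $yMax) {
            $found[] = $item->text;
        }
    }
    return $found;
}

$items = [
    new TextItem(50, 700, 'Asesor'),
    new TextItem(50, 680, '13315'),
    new TextItem(200, 680, '29036'),
];
// Rectangle covering only the first column: Asesor, 13315
print_r(text_in_area($items, 40, 670, 100, 710));
```

Normalizing the corners with `min`/`max` means the caller does not need to know whether the y axis points up (as in PDF user space) or down.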

@christian-vigh-phpclasses
Owner

@manuelos
Author

Yes, you couldn't have explained it better. That is exactly what I'm looking for, and it's nice to know I'm not the only one who has thought about it. It would be a really useful feature: we could design templates for our document pages that expect certain information in given areas, so if an area yields nothing we could be fairly sure the value is empty, instead of having to infer it by other means. This matters especially for tables where some column values may be empty.
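The template idea for tables can be sketched by giving each column a horizontal range and assigning every cell on a row to the column whose range contains its x coordinate; a column that receives nothing is reported as empty (`null`) instead of the following values shifting left. The column ranges, the `[x, text]` cell format and `map_row_to_columns()` are all hypothetical, chosen only to illustrate the approach.

```php
<?php
// Hypothetical sketch, not PdfToText's API.
// $columns: column name => [xMin, xMax], treated as half-open [xMin, xMax)
// $cells:   list of [x, text] pairs found on one table row (assumed format)
function map_row_to_columns(array $columns, array $cells): array
{
    // Every column starts out empty
    $row = array_fill_keys(array_keys($columns), null);
    foreach ($cells as [$x, $text]) {
        foreach ($columns as $name => [$xMin, $xMax]) {
            if ($x >= $xMin && $x < $xMax) {
                $row[$name] = $text;
                break;
            }
        }
    }
    return $row;
}

$columns = [
    'Asesor'  => [40, 120],
    'Emisor'  => [120, 220],
    'Carpeta' => [220, 320],
    'Cis'     => [320, 420],
];
// A row in which the "Emisor" cell is empty
$cells = [[50, '13315'], [230, '20001310'], [330, '20001178']];
print_r(map_row_to_columns($columns, $cells));
```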

@christian-vigh-phpclasses
Owner

@manuelos
Author

Hi Christian,

I have sent you an email with the sample PDF files; I hope they work for the tests you need to run.

Thanks for your support.

@christian-vigh-phpclasses
Owner

@christian-vigh-phpclasses
Owner

@manuelos
Author

manuelos commented Sep 6, 2017

Hi Christian.

How are you? I'm back on my project that uses your library, and I have some issues with empty values.
I have sent you an email with the details; I hope you received it.

Thanks in advance for your support.

Greetings.
