I'm trying to extract text from a PDF document, but I have run into some strange cases with the Tj and TJ operators. Normally I deal with cases like this:
Tc (SOME_TEXT) Tj
Then I encountered a case like this:
Tm [
( )1.828
(5)1.841
(2)1.828
(2)1.828
(4)1.841
(9)1.828
(.)1.828
(6)1.841
(4)
]
TJ
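A minimal sketch of how an array operand like this one can be collapsed into text: keep the parenthesized strings and skip the numbers between them, which are glyph position adjustments. The helper name `tj_array_to_text` is made up for illustration, and the sketch assumes simple literal strings without escapes or nested parentheses.

```python
import re

# Hypothetical helper, not part of any PDF library.
# Keeps the (...) strings of a TJ array and drops the numeric adjustments.
# Assumption: no escape sequences or nested parentheses inside the strings.
def tj_array_to_text(operand: str) -> str:
    return "".join(re.findall(r"\(([^()\\]*)\)", operand))

sample = "[( )1.828(5)1.841(2)1.828(2)1.828(4)1.841(9)1.828(.)1.828(6)1.841(4)]"
print(repr(tj_array_to_text(sample)))  # ' 52249.64' (note the leading space)
```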
The array above converts to the string '52249.64' (ignoring the leading space). Now I have encountered yet another strange case, shown below.
The only information I could find is this: a string passed to Tj is always to be interpreted according to the Encoding or CMap for the font. (In this case I expect it is a CIDFont with a CMap.)
Td (
\t\004\007\020\007\016\016\026\020
)
Tj
I still don't understand how to interpret this string. Are these values indexes into some kind of character array, or do I have to decode them in some other way? Thanks!
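To make the question concrete, here is a rough sketch of the first step as I understand it: turning the escape sequences in the string into raw character codes. The function names are hypothetical, the escape handling is simplified (unknown escapes and line continuations are not covered), and the lookup assumes single-byte codes, which may not hold for a composite font. The code-to-Unicode table itself would have to come from the font (for example its ToUnicode CMap or Encoding); none is supplied here.

```python
import re

# Escapes defined for PDF literal strings, besides \ddd octal codes.
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
           "(": "(", ")": ")", "\\": "\\"}

def decode_pdf_literal(body: str) -> bytes:
    """Turn the text between ( and ) of a literal string into raw bytes."""
    def repl(match):
        esc = match.group(1)
        if esc[0] in "01234567":
            # Octal escape: one to three digits, high-order overflow ignored.
            return chr(int(esc, 8) & 0xFF)
        return _SIMPLE[esc]
    text = re.sub(r"\\([0-7]{1,3}|[nrtbf()\\])", repl, body)
    return text.encode("latin-1")

def codes_to_text(codes: bytes, to_unicode: dict) -> str:
    """Look up each code in a table taken from the font (assumed single-byte)."""
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)

body = r"\t\004\007\020\007\016\016\026\020"
print(list(decode_pdf_literal(body)))  # [9, 4, 7, 16, 7, 14, 14, 22, 16]
```

The printed numbers are only character codes. They become readable text only once a mapping for this particular font is passed to `codes_to_text`.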