I read a PDF file using Python, and part of the content is displayed as a string of garbled text. How should I restore it?
```python
import fitz  # PyMuPDF

doc = fitz.open("2303.11366v4.pdf")  # download from https://arxiv.org/pdf/2303.11366
print(doc[2].get_text().split('Figure 1')[0])
```
I got this text:
```text
@7DVNFOHDQVRPHSDQDQGSXW
LWLQFRXQWHUWRS
'HFLVLRQPDNLQJ
7DVN@RIRSHQ
RUFORVH
SDUHQWKHVHVRQO\>@
3URJUDPPLQJ
7DVN:KDWSURIHVVLRQGRHV-RKQ
/DQFKHVWHUDQG$ODQ'HDQ)RVWHU
KDYHLQFRPPRQ"
5HDVRQLQJ
>@
$FWLRQWDNHSDQIURPVWRYHEXUQHU
2EV1RWKLQJKDSSHQV>@
$FWLRQFOHDQSDQZLWKVLQNEDVLQ
2EV1RWKLQJKDSSHQV>@
7KLQN>@QRYHOLVWMRXUQDOLVW
FULWLF>@QRYHOLVW
VFUHHQZULWHU>@FRPPRQLV
QRYHOLVWDQGVFUHHQZULWHU
$FWLRQ²QRYHOLVWVFUHHQZULWHU³
GHIPDWFKBSDUHQVOVW
LIVFRXQW
VFRXQW
VFRXQW
VFRXQW
>@
UHWXUQ
1R
6HOIJHQHUDWHGXQLWWHVWVIDLO
DVVHUWPDWFKBSDUHQV
(QYLURQPHQW%LQDU\5HZDUG
5XOH/0+HXULVWLF
+DOOXFLQDWLRQ
>@IDLOHGEHFDXVH,LQFRUUHFWO\
DVVXPHGWKDWWKH\ERWKKDGWKH
VDPHPXOWLSOHSURIHVVLRQV>@
DFFXUDWHO\LGHQWLI\LQJWKHLU
SURIHVVLRQV
>@ZURQJEHFDXVHLWRQO\FKHFNV
LIWKHWRWDOFRXQWRIRSHQDQG
FORVHSDUHQWKHVHVLVHTXDO>@
RUGHURIWKHSDUHQWKHVHV>@
>@WULHGWRSLFNXSWKHSDQLQ
VWRYHEXUQHU>@EXWWKHSDQ
ZDVQRWLQVWRYHEXUQHU>@
>@
UHWXUQ
@6RWKHSURIHVVLRQ
-RKQ/DQFKHVWHUDQG$ODQ'HDQ
)RVWHUKDYHLQFRPPRQLVQRYHOLVW
$FWLRQ²QRYHOLVW³
>@$FWLRQWDNHSDQIURP
VWRYHEXUQHU
>@2EV
```
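The garbled lines appear to follow a constant pattern: each character's code point is 29 below the intended one. For example, '7' (0x37) becomes 'T' (0x54) and 'D' becomes 'a', so `7DVN` reads "Task". One plausible cause is a font embedded without a ToUnicode map, whose character codes are glyph indices in the standard Macintosh glyph order (glyph 3 is the space, code 32). That cause is an assumption and has not been checked against this PDF. The minimal sketch below undoes the shift; the offset of 29 is inferred from the sample, and `decode_shifted` is a hypothetical helper name:

```python
# Assumption: each garbled character is the intended character minus 29 code
# points ('7' -> 'T', 'D' -> 'a'); the offset is inferred from the sample output.
SHIFT = 29


def decode_shifted(text: str, shift: int = SHIFT) -> str:
    """Undo a constant code-point offset on printable-ASCII characters.

    A character is shifted only if it is printable (code >= 0x21) and the
    result is still printable ASCII (<= 0x7E); whitespace, newlines and
    non-ASCII characters are returned unchanged.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code >= 0x21 and code + shift <= 0x7E:
            out.append(chr(code + shift))
        else:
            out.append(ch)
    return "".join(out)


# Lines copied from the extracted output above.
samples = [
    "LWLQFRXQWHUWRS",
    "'HFLVLRQPDNLQJ",
    "7DVN:KDWSURIHVVLRQGRHV-RKQ",
    "6HOIJHQHUDWHGXQLWWHVWVIDLO",
]
for line in samples:
    print(decode_shifted(line))
# itincountertop
# Decisionmaking
# TaskWhatprofessiondoesJohn
# Selfgeneratedunittestsfail
```

The sketch shifts only characters whose decoded form is printable ASCII. Newlines and non-ASCII characters such as `²`/`³` are left as they are; these two are probably quotation marks in the figure, but that is a guess. Spaces, digits and parentheses would shift into the control-character range, which may explain why the sample contains none of them. They may have been dropped during extraction and cannot be recovered from this string alone.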
I tried several other Python packages, but all of them produced garbled text.
An AI model, however, seems able to recognize what this text says:
1. - You are in the middle of a room.
2. [TASKCLEANSOMEDANANDPUTITINCOUNTERTOP] - The task is to clean a pan and put it on the countertop.
3. [DECISIONMAKING] - This section is about decision making.
4. [TASKYOUAREGIVENALISTOFTWOSTRINGSOFOPEN ORCLOSEPARENTHESESONLY[]] - The task is to work with a list of two strings of open or close parentheses.
5. [PROGRAMMING] - This section is about programming.
6. [TASKWHATPROFESSIONDOESJOHNLANCHESTERANDALANDEANFOSTERHAVEINCOMMON?] - The task is to determine what profession John Lanchesterand Alan Dean Foster have in common.
.....
This output is from Claude Haiku.
However, I need to convert the text into readable form with Python.
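Shifting the whole page would also corrupt text that PyMuPDF already extracts correctly. One option is to work span by span instead. `page.get_text("dict")` reports the font of every span, so only spans set in the affected font(s) need decoding. The sketch below uses the same +29 assumption. The helper names (`font_samples`, `restore_text`, `restore_pdf_page`) are hypothetical, and the garbled font names must be read off your own PDF because they are not known here:

```python
from collections import defaultdict

SHIFT = 29  # inferred from the sample output; an assumption, not read from the PDF


def decode_shifted(text, shift=SHIFT):
    """Add `shift` to printable-ASCII code points; leave everything else alone."""
    return "".join(
        chr(ord(ch) + shift) if ord(ch) >= 0x21 and ord(ch) + shift <= 0x7E else ch
        for ch in text
    )


def font_samples(page_dict, max_len=40):
    """Map each font name on the page to a short sample of its text."""
    samples = defaultdict(str)
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line["spans"]:
                if len(samples[span["font"]]) < max_len:
                    samples[span["font"]] += span["text"]
    return {font: text[:max_len] for font, text in samples.items()}


def restore_text(page_dict, garbled_fonts, shift=SHIFT):
    """Rebuild the page text, decoding only spans whose font is in garbled_fonts."""
    lines_out = []
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):
            parts = []
            for span in line["spans"]:
                text = span["text"]
                if span["font"] in garbled_fonts:
                    text = decode_shifted(text, shift)
                parts.append(text)
            lines_out.append("".join(parts))
    return "\n".join(lines_out)


def restore_pdf_page(path, page_no, garbled_fonts, shift=SHIFT):
    """Open the PDF with PyMuPDF and restore one page (page_no is 0-based)."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    with fitz.open(path) as doc:
        page_dict = doc[page_no].get_text("dict")
    return restore_text(page_dict, garbled_fonts, shift)
```

To find the font names, print `font_samples(doc[2].get_text("dict"))` and look for the font(s) whose sample resembles `7DVN`. Then pass those names as a set, for example `restore_pdf_page("2303.11366v4.pdf", 2, {...})`. Selecting spans by font leaves correctly extracted text intact, such as the "Figure 1" caption used in the split.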
