

It’s a curse because it’s used for things other than what it’s intended for. It does a good job of representing printed material, but unfortunately people very commonly expect it to be something more akin to a word processor file.
I know the pain. While there are definitely solutions that work sometimes, there’s just no “one size fits all” that I’m aware of. PDFs can represent text very differently internally.
What I did for one project where extracting the text produced a complete mess was to convert the PDF pages to images and then OCR them…
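For anyone wanting to try the render-then-OCR route, here is a minimal sketch of the idea. The comment above doesn't say which tools were used, so `pdf2image` (which needs Poppler installed) and `pytesseract` (which needs the Tesseract binary) are assumptions here, and the function names `ocr_pdf` and `ocr_pdf_with_tesseract` are made up for illustration. The rendering and recognition steps are passed in as callables, so either one can be swapped out.

```python
from typing import Callable, Iterable, List


def ocr_pdf(
    pdf_path: str,
    render_pages: Callable[[str], Iterable[object]],
    recognize: Callable[[object], str],
) -> List[str]:
    """Render each page to an image, OCR it, and return one string per page."""
    return [recognize(image).strip() for image in render_pages(pdf_path)]


def ocr_pdf_with_tesseract(pdf_path: str, dpi: int = 300) -> List[str]:
    """Hypothetical default wiring; the library choice is an assumption."""
    # Imported lazily so ocr_pdf() itself has no third-party dependencies.
    from pdf2image import convert_from_path  # requires Poppler
    import pytesseract  # requires the Tesseract binary

    return ocr_pdf(
        pdf_path,
        render_pages=lambda path: convert_from_path(path, dpi=dpi),
        recognize=pytesseract.image_to_string,
    )
```

A higher `dpi` usually helps recognition on small print, at the cost of speed and memory. The result is plain text only: layout, tables, and reading order may still need manual cleanup.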
Hate? Digital decluttering feels really good, for me anyway.
I watched it, was mid. I was hoping for something more adult oriented.
Interesting, I’ll keep it in mind next time I have to deal with this problem (hopefully never but who knows).
A few years ago I was in contact with researchers who were developing an AI tool to parse PDFs (I think they didn’t care about converting to editable formats, only about extracting data). From their material I got the impression that it’s extremely difficult to do right using traditional algorithms.