What’s my best option to put about 100 photocopied, typewritten pages in German online with #OCR? I think my very old CanoScan LiDE 25 scanner came with some OCR software. Has there been a significant improvement since then that is freely available? How cool is the workflow with pictures taken on a phone and tesseract? Any other free options?
I thought perhaps orientation is to blame but apparently not.
Current status: apt install yagf tesseract-ocr-deu. "YAGF is a graphical interface for cuneiform and tesseract text recognition tools on the Linux platform. With YAGF you can scan images via XSane, import pages from PDF documents, perform images preprocessing and recognize texts using cuneiform from a single command centre. YAGF also makes it easy to scan and recognize several images sequentially." Looking forward to cuneiform OCR!
#ocr #tesseract
Current status: as soon as xsane is finished scanning the page, #yagf crashes. I think I'm going to use tesseract directly, from the command line. Or at least try this for one page and, if it works, find a workflow to scan all those 100 pages using my old scanner, and then use a Python script like the one suggested by @vickysteeves for all the nitty-gritty details. Better than improvised bash hacking!
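A minimal sketch of the "tesseract directly, for one page" idea, wrapped in Python instead of bash. It assumes the `tesseract` binary and the German language data (`tesseract-ocr-deu`) are installed; the function names and the file name `page-001.png` are made up for illustration and are not from the script mentioned above.

```python
import subprocess
from pathlib import Path


def tesseract_command(image: Path, lang: str = "deu") -> list[str]:
    """Build the command line for OCR-ing one image with Tesseract."""
    # Tesseract takes an output *base* name and appends ".txt" itself,
    # so page-001.png becomes page-001.txt.
    out_base = image.with_suffix("")
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_page(image: Path, lang: str = "deu") -> Path:
    """Run Tesseract on one image and return the path of the text file."""
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(tesseract_command(image, lang), check=True)
    return image.with_suffix(".txt")
```

Calling `ocr_page(Path("page-001.png"))` should leave the recognized text in `page-001.txt`, which makes it easy to check one page by eye before committing to all 100.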
Current status: running a perl script which loops through scanimage and calls tesseract on every image. Also my status: nine instances of tesseract running simultaneously and load > 25. 😂
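One way to keep the load down is to scan pages one after another and cap the number of tesseract processes running at the same time. The following is a rough Python sketch of that idea, not the perl script itself. It assumes `scanimage` can write PNG to standard output and that the scanner backend accepts a `--resolution` option; the function names, the directory layout and the default of two workers are all assumptions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def scan_command(resolution: int = 300) -> list[str]:
    """Command that scans one page and writes the image to stdout."""
    # Assumption: this scanimage build supports PNG output and the
    # backend offers a --resolution option.
    return ["scanimage", "--format=png", f"--resolution={resolution}"]


def ocr_command(image: Path, lang: str = "deu") -> list[str]:
    """Command that OCRs one image; tesseract adds the .txt suffix."""
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def scan_page(target: Path) -> None:
    """Scan the page currently on the flatbed into the target file."""
    with target.open("wb") as out:
        subprocess.run(scan_command(), stdout=out, check=True)


def scan_and_ocr(pages: int, outdir: Path, workers: int = 2) -> None:
    """Scan pages sequentially; OCR them with at most `workers` processes."""
    outdir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = []
        for n in range(1, pages + 1):
            input(f"Place page {n} on the scanner and press Enter ")
            image = outdir / f"page-{n:03d}.png"
            scan_page(image)
            # OCR runs in the background while the next page is scanned.
            jobs.append(pool.submit(subprocess.run, ocr_command(image), check=True))
        for job in jobs:
            job.result()  # re-raises if any tesseract run failed
```

With `scan_and_ocr(100, Path("scans"))`, scanning stays strictly sequential because there is only one flatbed, and the pool limits how many tesseract processes are started at once, so OCR jobs queue up instead of piling onto the CPU.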
@kensanata Are you planning to publish it?
@seanl Sure, I'll post it on my website. Like the story of my other grandfather. https://alexschroeder.ch/wiki/Roland_Li-Marchetti