Remove JPEG artifacts from scanned text

I have a scanned PDF of a textbook, but the PDF is aggressively compressed, so lots of JPEG artifacts are present and they hurt its readability. I tried a variety of methods to fix it, but the outcome is not great:

waifu2x: Looks better but still has weird artifacts. Also very slow.

convert -threshold 70% in.jpg out.png

Is there a fast and effective way to get rid of these artifacts?

Solution 1:

A PDF is not an image format; it is just a container that holds images. You have to extract those images and save them in a lossless format (or at least with lower compression, otherwise you will add new artefacts). Afterwards you can try to remove the artefacts manually or use existing automatic filters, but these have to be configured by hand for each specific image. The last step is to reintegrate the images into a PDF.
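As a rough illustration of a filter that is tuned per image, the sketch below picks a black/white cutoff from the image's own histogram instead of using a fixed value such as the 70% in the question. It uses Otsu's method, which is my choice for the example and not something named above. It assumes the page has already been extracted and loaded as rows of 0-255 grayscale values (for instance with an image library, not shown here); the names `otsu_threshold` and `binarize` are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values, 0 = black, 255 = white


def otsu_threshold(pixels: Gray) -> int:
    """Return the cutoff t that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels with value <= t
    sum_dark = 0  # sum of those pixel values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue  # no dark pixels yet
        if w_light == 0:
            break     # everything is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels: Gray, threshold: int) -> Gray:
    """Map every pixel at or below the threshold to black, the rest to white."""
    return [[0 if value <= threshold else 255 for value in row] for row in pixels]


# Tiny made-up "page": dark ink (20-35), light paper (235-252),
# and two grey ringing artifacts (190, 200) next to the ink.
page = [
    [250, 245, 30, 240],
    [235, 25, 20, 200],
    [248, 190, 35, 252],
]
cutoff = otsu_threshold(page)
clean = binarize(page, cutoff)
```

On this toy input the grey artifact pixels end up on the paper side of the cutoff, so they turn white. On a real scan, artifacts that are as dark as the ink will survive any global threshold, which is why the result still has to be checked page by page.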

However, there is no "fast, universal" way to remove those artefacts. If there were, the artefacts, simply speaking, would not have been created in the first place just to reduce the file size.

The only way to get rid of the artefacts completely would be to recognize the symbols (letters, numbers, etc.) and discard everything else, which can be done with OCR software. There is advanced OCR software that can work with low-resolution documents, but it is often not free. You don't have to buy the software, though; check for an online service (there are dozens out there). Keep in mind that this essentially converts your image files into text files.
