How can I instantly extract text from a screen area using OCR tools?
There may already be a tool that does this, but you can also write a simple script with a screenshot tool and tesseract, which is what you are trying to use.
Take this script as an example (on my system I saved it as /usr/local/bin/screen_ts):
#!/bin/bash
# Dependencies: tesseract-ocr imagemagick scrot
select tesseract_lang in eng rus equ ;do break;done
# Quick language menu, add more if you need other languages.
SCR_IMG=`mktemp`
trap "rm $SCR_IMG*" EXIT
scrot -s $SCR_IMG.png -q 100
# increase quality with option -q from default 75 to 100
# Typo "$SCR_IMG.png000" does not continue with same name.
mogrify -modulate 100,0 -resize 400% $SCR_IMG.png
#should increase detection rate
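# pass the language chosen in the menu to tesseract (falls back to eng)
tesseract $SCR_IMG.png $SCR_IMG -l "${tesseract_lang:-eng}" &> /dev/null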
cat $SCR_IMG.txt
exit
And with clipboard support:
#!/bin/bash
# Dependencies: tesseract-ocr imagemagick scrot xsel
select tesseract_lang in eng rus equ ;do break;done
# quick language menu, add more if you need other languages.
SCR_IMG=`mktemp`
trap "rm $SCR_IMG*" EXIT
scrot -s $SCR_IMG.png -q 100
# increase image quality with option -q from default 75 to 100
mogrify -modulate 100,0 -resize 400% $SCR_IMG.png
#should increase detection rate
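# pass the language chosen in the menu to tesseract (falls back to eng)
tesseract $SCR_IMG.png $SCR_IMG -l "${tesseract_lang:-eng}" &> /dev/null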
cat $SCR_IMG.txt | xsel -bi
exit
It uses scrot to capture the selected screen area, mogrify (from imagemagick) to convert the image to grayscale and enlarge it, tesseract to recognize the text and cat to display the result. The clipboard version additionally uses xsel to pipe the output into the clipboard.
NOTE: scrot, xsel, imagemagick and tesseract-ocr are not installed by default but are available from the default repositories.
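Since the script fails in unhelpful ways when one of these tools is missing, a small check at the top can help. This is only a minimal sketch and not part of the original script; the function name missing_deps is made up for illustration. Note that it checks command names (mogrify, tesseract), which differ from the package names (imagemagick, tesseract-ocr).

```bash
# Hypothetical helper: print every command from the argument list
# that cannot be found in PATH, one per line.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Possible use at the top of the script:
# missing=$(missing_deps scrot mogrify tesseract xsel)
# [ -z "$missing" ] || { echo "Missing: $missing" >&2; exit 1; }
```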
You may be able to replace scrot with gnome-screenshot, but it may take a lot of work. As for the output, you can use anything that can read a text file (open it in a text editor, show the recognized text as a notification, etc.).
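As a rough sketch of the scrot/gnome-screenshot swap: scrot is an X11 tool, so one option is to pick the capture tool from the session type. The function name capture_tool is hypothetical, and relying on XDG_SESSION_TYPE being set is an assumption about your desktop session.

```bash
# Hypothetical helper: print the name of the screenshot tool to use.
# Assumes XDG_SESSION_TYPE is set by the login session (x11 or wayland);
# anything other than "wayland" is treated as X11.
capture_tool() {
    case "${XDG_SESSION_TYPE:-x11}" in
        wayland) echo gnome-screenshot ;;
        *)       echo scrot ;;
    esac
}

# Possible use in place of the plain scrot call:
# if [ "$(capture_tool)" = scrot ]; then
#     scrot -s "$SCR_IMG.png" -q 100
# else
#     gnome-screenshot -a -f "$SCR_IMG.png"
# fi
```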
GUI version of the script
Here's a simple graphical version of the OCR script including a language selection dialog:
#!/bin/bash
# DEPENDENCIES: tesseract-ocr imagemagick scrot yad xsel
# AUTHOR: Glutanimate 2013 (http://askubuntu.com/users/81372/)
# NAME: ScreenOCR
# LICENSE: GNU GPLv3
#
# BASED ON: OCR script by Salem (http://askubuntu.com/a/280713/81372)
TITLE=ScreenOCR # set yad variables
ICON=gnome-screenshot
# - tesseract won't work if LC_ALL is unset so we set it here
# - you might want to delete or modify this line if you
# have a different locale:
export LC_ALL=en_US.UTF-8
# language selection dialog
LANG=$(yad \
--width 300 --entry --title "$TITLE" \
--image=$ICON \
--window-icon=$ICON \
--button="ok:0" --button="cancel:1" \
--text "Select language:" \
--entry-text \
"eng" "ita" "deu")
# - You can modify the list of available languages by editing the line above
# - Make sure to use the same ISO codes tesseract does (man tesseract for details)
# - Languages will of course only work if you have installed their respective
# language packs (https://code.google.com/p/tesseract-ocr/downloads/list)
RET=$? # check return status
if [ "$RET" = 252 ] || [ "$RET" = 1 ] # WM-Close or "cancel"
then
    exit
fi
echo "Language set to $LANG"
SCR_IMG=$(mktemp) # create tempfile
trap "rm $SCR_IMG*" EXIT # make sure tempfiles get deleted afterwards
scrot -s "$SCR_IMG".png -q 100 #take screenshot of area
mogrify -modulate 100,0 -resize 400% "$SCR_IMG".png # postprocess to prepare for OCR
tesseract -l "$LANG" "$SCR_IMG".png "$SCR_IMG" # OCR in given language
xsel -bi < "$SCR_IMG".txt # pass to clipboard
exit
Aside from the dependencies listed above you will need to install the Zenity fork YAD from the webupd8 PPA to make the script work.
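Instead of hard-coding "eng" "ita" "deu", the list for the dialog could be built from the language packs that are actually installed. This is only a sketch: it assumes tesseract --list-langs prints one header line followed by one language code per line, and the function name installed_langs is made up for illustration.

```bash
# Hypothetical helper: read `tesseract --list-langs` output on stdin and
# print the language codes on one line, separated by spaces.
# Assumes the first line is a header and every other line is one code.
installed_langs() {
    tail -n +2 | tr '\n' ' ' | sed 's/ *$//'
}

# Possible use in the script (2>&1 because some tesseract versions
# print the list on stderr):
# LANGS=$(tesseract --list-langs 2>&1 | installed_langs)
# then pass $LANGS unquoted after --entry-text so it splits into entries
```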
I created a free and open source program for this purpose:
https://danpla.github.io/dpscreenocr/
I don't know if anyone needs my solution, but here is one that runs on Wayland.
It shows the recognized text in a text editor, and if you pass the parameter "yes" you get a translation from the Google Translate command-line tool (an Internet connection is required). Before you can use it, install tesseract-ocr, imagemagick and google-trans. Start the script, e.g. in GNOME with Alt+F2, while the text you want to recognize is visible, then drag the cursor around the text. That's it. This script was tested only on GNOME; for other desktop environments it must be adapted. To translate the text into another language, replace the language ID (:de) in the trans line of the script.
#!/bin/bash
# Dependencies: tesseract-ocr imagemagick google-trans
# translate only when the first argument is "yes" (defaults to "no")
translate="${1:-no}"
SCR_IMG=$(mktemp)
trap 'rm -f "$SCR_IMG"*' EXIT
# interactively select an area and save it as a PNG
gnome-screenshot -a -f "$SCR_IMG.png"
# convert to grayscale and upscale; should increase detection rate
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"
tesseract "$SCR_IMG.png" "$SCR_IMG" &> /dev/null
if [ "$translate" = "yes" ]; then
    # :de is the target language ID
    trans :de "file://$SCR_IMG.txt" -o "$SCR_IMG.translate.txt"
    gnome-text-editor "$SCR_IMG.translate.txt"
else
    gnome-text-editor "$SCR_IMG.txt"
fi
exit
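Rather than editing the script to change the target language, the language ID could come from an optional second argument. A minimal sketch, assuming the trans syntax used in the script (:de for German); the function name target_lang and the second argument are hypothetical additions, not part of the original script.

```bash
# Hypothetical helper: turn an optional language code into the
# ":code" form that the trans call expects; defaults to German.
target_lang() {
    printf ':%s\n' "${1:-de}"
}

# Possible use inside the script:
# trans "$(target_lang "$2")" "file://$SCR_IMG.txt" -o "$SCR_IMG.translate.txt"
```

With this, calling the script with "yes fr" would request a French translation, while "yes" alone keeps the German default.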
I just wrote a blog post about how to use screenshots nowadays. Although I target Chinese readers, the screencast and code are in English. OCR is merely one of the features.
Features of my OCR function:
Opens the result in konsole+vimx OR gedit for further editing.
For vimx+English, spell checking is enabled.
Supports dynamic language selection without hard-coding.
Shows a progress dialog while converting and running tesseract, which is slow.
Function code:
function ocr () {
    tmpj="$1"
    tmpocr="$2"
    tmpocr_p="$3"
    # build the radiolist rows: a FALSE/language pair per language, first one preselected
    atom="$(tesseract --list-langs 2>&1)"
    atom=(`echo "${atom#*:}"`)
    atom=(`echo "$(printf 'FALSE\n%s\n' "${atom[@]}")"`)
    atom[0]='True'
    ans=(`yad --center --height=200 --width=300 --separator='|' --on-top --list --title '' --text='Select Languages:' --radiolist --column '✓' --column 'Languages' "${atom[@]}" 2>/dev/null`) &&
        ans="$(echo "${ans:5:-1}")" &&
        convert "$tmpj[x2000]" -unsharp 15.6x7.8+2.69+0 "$tmpocr_p" | yad --on-top --title '' --text='Converting ...' --progress --pulsate --auto-close 2>/dev/null &&
        tesseract "$tmpocr_p" "$tmpocr" -l "$ans" 2>>/tmp/tesseract.log | yad --percentage=50 --on-top --title '' --text='Tesseracting ...' --progress --pulsate --auto-close 2>/dev/null &&
        if [[ "$ans" == 'eng' ]]; then konsole -e "vimx -c 'setlocal spell spelllang=en_us' -n $tmpocr.txt" 2>/dev/null; else gedit "$tmpocr.txt"; fi
    rm "$tmpocr_p"
}
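The ${ans:5:-1} step cuts the language out of the selected row by fixed offsets, which presumably means yad prints something like TRUE|eng|. Here is a sketch of an offset-free alternative that strips everything up to the first separator plus the trailing separator. The function name lang_from_row is hypothetical, and the row format is an assumption inferred from the offsets used in the function.

```bash
# Hypothetical helper: extract the language code from a yad radiolist row.
# Assumes the row looks like "TRUE|eng|" (flag, separator, code, separator).
lang_from_row() {
    row=${1#*|}               # drop the flag column and the first "|"
    printf '%s\n' "${row%|}"  # drop the trailing "|"
}

# Possible use in place of ans="$(echo "${ans:5:-1}")":
# ans=$(lang_from_row "$ans")
```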
Caller code:
for cmd in "mktemp" "convert" "tesseract" "gedit" "konsole" "vimx" "yad"; do
command -v $cmd >/dev/null 2>&1 || { LANG=POSIX; xmessage "Require $cmd but it's not installed. Aborting." >&2; exit 1; }; :;
done
tmpj="$(mktemp /tmp/`date +"%s_%Y-%m-%d"`_XXXXXXXXXX.png)"
tmpocr="$(mktemp -u /tmp/`date +"%s_%Y-%m-%d"`_ocr_XXXXX)"
tmpocr_p="$tmpocr.png"
gnome-screenshot -a -f "$tmpj" 2>&1 >/dev/null | ts >>/tmp/gnome_area_PrtSc_error.log
ocr "$tmpj" "$tmpocr" "$tmpocr_p" &
Combine these two pieces of code into a single shell script to run it.