4.3.1 Text OCR ("tocr")

DESCRIPTION

The Text OCR method (code "tocr") allows calling an external Optical Character Recognition (OCR) engine to extract text from the remote desktop image. The preferred OCR engine must be selected and configured in the preferences. Robot 5.x supports:

  • Tesseract OCR is the default engine. It is free and supports all major platforms (MS Windows, Linux/Unix, Mac OS X). Its disadvantages are lower speed and accuracy.
  • ABBYY Fine Reader is a commercial OCR engine available at an extra cost. It is faster and more accurate than Tesseract. It is supported on MS Windows only.
  • Google Vision is a cloud service performing OCR. Because the desktop image is uploaded to Google over a secure connection (HTTPS), please consider your privacy and security requirements before choosing this engine. The OCR process is fairly fast and accurate. The biggest factor in the overall performance is your network speed.

The recognized text may be optionally tested for the presence of a phrase using the "text" (plain text search), "text" and "distance" (tolerant text search) or "pattern" (regular expression matching) parameters.

Regular expression matching tests the recognized text against the specified java.util.regex.Pattern compliant regular expression. Up to version 4.1.3 the expression must match the whole text; no search for a matching substring is performed. Since version 4.1.4 the method searches the text for matching locations and returns their text and coordinates the same way as for the text parameter.
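The difference between the two behaviors can be illustrated in plain Java, independent of Robot. The sketch below assumes that whole-text matching corresponds to Matcher.matches() and that the location search corresponds to repeated Matcher.find() calls; the class and method names are hypothetical and are not part of the Robot API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Robot: contrasts whole-text matching
// with searching the text for matching locations.
class RegexMatchModes {

    // Whole-text semantics (as described for versions up to 4.1.3):
    // the expression must cover the entire recognized text.
    static boolean matchesWholeText(String regex, String recognizedText) {
        return Pattern.compile(regex).matcher(recognizedText).matches();
    }

    // Search semantics (as described since version 4.1.4): collect the
    // 0-based start index of every location the expression matches.
    static List<Integer> findMatchIndices(String regex, String recognizedText) {
        List<Integer> indices = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(recognizedText);
        while (m.find()) {
            indices.add(m.start());
        }
        return indices;
    }

    public static void main(String[] args) {
        String text = "Start the Application";
        System.out.println(matchesWholeText("[aA]pplication", text));     // false
        System.out.println(matchesWholeText(".*[aA]pplication.*", text)); // true
        System.out.println(findMatchIndices("[aA]pplication", text));     // [10]
    }
}
```

In java.util.regex the dot does not match line terminators unless the DOTALL flag is set, so an expression such as ".*[aA]pplication.*" may fail to match multi-line text as a whole. Whether Robot sets that flag is not stated here.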

The tolerant (fuzzy) text search is based on the Levenshtein distance, defined as the minimum number of edits needed to transform one string into the other, where the allowed edit operations are insertion, deletion, or substitution of a single character. The metric is enabled through the "distance" parameter, an integer specifying the maximum number of characters that may be omitted or misrecognized for a string to still be considered equivalent to the sample text provided in the "text" parameter. Unlike regular expressions up to version 4.1.3, the tolerant matching always searches the recognized text for any occurrence of a string matching the given text and distance. There is no need to specify that the string is preceded or followed by other text.
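For readers unfamiliar with the metric, the following is a minimal Java sketch of the standard dynamic-programming computation of the Levenshtein distance. It illustrates the definition only; the class name is hypothetical and nothing here is taken from Robot's implementation. The comparison is case sensitive, so 'A' versus 'a' counts as one substitution.

```java
// Hypothetical illustration of the Levenshtein distance, not Robot code.
class Levenshtein {

    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn string a into string b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning an empty prefix of a into the first j characters of b
        // takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,  // insertion
                                 prev[j] + 1),     // deletion
                        prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Apple", "apple")); // 1
        System.out.println(distance("Appls", "apple")); // 2
        System.out.println(distance("Appis", "apple")); // 3
    }
}
```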

The method stores the OCR results into a set of _TOCR prefixed variables as follows:

Variable Name

Description

_TOCR_ERROR=<errorText>

Contains the error text thrown by the OCR engine.

_TOCR_TEXT=<text>

The recognized text (all lines separated by new line characters).

_TOCR_TEXT_X=<X-coordinate>
_TOCR_TEXT_Y=<Y-coordinate>
_TOCR_TEXT_W=<width>
_TOCR_TEXT_H=<height>

The bounding rectangle of the recognized text (since 3.4).

_TOCR_LINE_COUNT=<number>

Number of recognized text lines (rows).

_TOCR_LINE<n>=<lineText>

Text of the n-th line where <n> is between 1 and _TOCR_LINE_COUNT.

_TOCR_LINE_X<n>=<X-coordinate>
_TOCR_LINE_Y<n>=<Y-coordinate>
_TOCR_LINE_W<n>=<width>
_TOCR_LINE_H<n>=<height>

The bounding rectangle of the n-th line (since 3.4).

When the text or pattern parameter is specified to search the recognized text for the given string, the method creates result variables as follows:

Variable Name

Description

_TOCR_MATCH_COUNT=<number>

The number of locations (strings) in the recognized text that match the pattern expression or the text and distance (if specified) parameters.

_TOCR_MATCH=<matchingString>

The first matching string. If text is employed and distance is set to 0 or it is not specified the variable will contain the value of the text parameter.

_TOCR_MATCH_<n>=<matchingString>

The n-th matching string where <n> is between 1 and _TOCR_MATCH_COUNT. If text is employed and distance is set to 0 or it is not specified the variable will contain the value of the text parameter (since 3.5).

_TOCR_MATCH_INDEX=<index>

Index (position) of the first match within the recognized text. Indexing starts with 0 which indicates the beginning of the recognized text.

_TOCR_MATCH_INDEX_<n>=<index>

Index of the n-th match where <n> is between 1 and _TOCR_MATCH_COUNT. Indexing starts with 0 which indicates the beginning of the recognized text (since 3.5).

_TOCR_MATCH_X=<X-coordinate>
_TOCR_MATCH_Y=<Y-coordinate>
_TOCR_MATCH_W=<width>
_TOCR_MATCH_H=<height>

The bounding rectangle of the first matching string (since 3.4).

_TOCR_MATCH_X_<n>=<X-coordinate>
_TOCR_MATCH_Y_<n>=<Y-coordinate>
_TOCR_MATCH_W_<n>=<width>
_TOCR_MATCH_H_<n>=<height>

The bounding rectangle of the n-th matching string where <n> is between 1 and _TOCR_MATCH_COUNT.

_TOCR_MATCH_CLICK_X=<X-coordinate>
_TOCR_MATCH_CLICK_Y=<Y-coordinate>

Center coordinates of the first matching string (since 3.4). They may be used for tasks such as "find a string using OCR and click it". A typical example:

Compareto method="tocr" cmparea="x:33,y:2,w:200,h:22" text="Cancel"
if ({_EXIT_CODE} > 0) {
  Exit 1
} else {
  Mouse click to=x:{_TOCR_MATCH_CLICK_X},y:{_TOCR_MATCH_CLICK_Y}
}

Since version 4.1 it is recommended to use the Click command instead of the above code block:

Click ocr cmparea="x:33,y:2,w:200,h:22" text="Cancel"

_TOCR_MATCH_CLICK_X_<n>=<X-coordinate>
_TOCR_MATCH_CLICK_Y_<n>=<Y-coordinate>

Center coordinates of the n-th matching string where <n> is between 1 and _TOCR_MATCH_COUNT.
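To make the relationship between the match variables more concrete, here is a small Java sketch of how 0-based match indices and click coordinates could be derived for a plain text search. It rests on several assumptions that the text above does not confirm: matches are taken as non-overlapping, line separators count as characters of the recognized text, and the center is computed with integer division. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the match variables, not Robot code.
class MatchVariables {

    // 0-based indices of all non-overlapping exact occurrences of text,
    // analogous to _TOCR_MATCH_INDEX_<n>; the list size plays the role
    // of _TOCR_MATCH_COUNT.
    static List<Integer> matchIndices(String recognizedText, String text) {
        List<Integer> result = new ArrayList<>();
        if (text.isEmpty()) {
            return result; // nothing sensible to search for
        }
        int from = 0;
        int idx;
        while ((idx = recognizedText.indexOf(text, from)) >= 0) {
            result.add(idx);
            from = idx + text.length(); // continue after this match
        }
        return result;
    }

    // Center point of a bounding rectangle (assumed integer division),
    // analogous to _TOCR_MATCH_CLICK_X and _TOCR_MATCH_CLICK_Y.
    static int[] center(int x, int y, int w, int h) {
        return new int[] { x + w / 2, y + h / 2 };
    }

    public static void main(String[] args) {
        System.out.println(matchIndices("OK Cancel\nCancel all", "Cancel")); // [3, 10]
        int[] c = center(33, 2, 200, 22);
        System.out.println(c[0] + "," + c[1]); // 133,13
    }
}
```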

OPTIONS

The method accepts no template images. The "passrate" parameter does not apply to OCR and is ignored. The "cmparea" parameter is optional and defaults to the full screen when omitted.

Supported OCR parameters:

text=<textToSearchFor>

Optional text to search the recognized text for. The hosting command will then return 0 when the OCR was performed correctly and the recognized text contains the given string, or 1 otherwise. Indices and screen coordinates of the text locations are then stored in the _TOCR script variables and may be further processed. This parameter cannot be used together with the pattern parameter.

distance=<0-[textLength]>

Optional Levenshtein distance used in conjunction with the "text" parameter to perform tolerant text matching. See the DESCRIPTION section above for details.

pattern=<regularExpression>

Optional java.util.regex.Pattern compliant regular expression to test the recognized text against. This parameter cannot be used together with the text parameter.

    • Up to version 4.1.3 the expression must match the whole text (no searching for a matching substring is performed).
    • Since version 4.1.4 the parameter will search the text for matching locations the same way as the text parameter. Indices and screen coordinates of the matching text locations are then stored in the _TOCR script variables and may be further processed. This enhancement allows creating actions like "find a matching text location and click it" within a single "Click ocr" command.


See the Tesseract OCR help page for details on its specific parameters.

RETURNS

The method makes the hosting command throw a runtime error on misconfiguration or an I/O error. The error text is made available through the _TOCR_ERROR variable.

Otherwise the return code depends on the input parameters. If the method is called to perform just OCR with no result testing, it always returns 0 even if no text was recognized. If the method is called with the "text" or "pattern" parameter, it returns 0 if the recognized text matches the given string or regular expression, or 1 otherwise.

EXAMPLES

Var _TOCR_LINE_COUNT=0
Compareto method="tocr" cmparea="x:33,y:2,w:200,h:22"
for (i=1; {i}<{_TOCR_LINE_COUNT}+1; i={i}+1) {
   Typeline "{_TOCR_LINE{i}}"
}
 

- Recognize text in the specified desktop area and type it onto the desktop.

Compareto method="tocr" cmparea="x:33,y:2,w:200,h:22" pattern="[aA]pplication"
if ({_EXIT_CODE} > 0) {
  Exit 1
}

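- Exit the script when the recognized text doesn't match the word 'Application' or 'application'. Up to version 4.1.3 the whole recognized text must match the expression; since version 4.1.4 the text is searched for a matching location.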

Compareto method="tocr" cmparea="x:33,y:2,w:200,h:22" pattern=".*[aA]pplication.*"
if ({_EXIT_CODE} > 0) {
  Exit 1
}

- The previous example modified to exit the script when the recognized text doesn't contain 'Application' or 'application'.

Compareto method="tocr" cmparea="x:33,y:2,w:200,h:22" text="apple" distance="2"

- Recognize text on the screen and verify whether it contains a string similar to 'apple':

  • If the OCR recognizes text like 'There is an apple', the command will report success (the exit code of 0) because an exact match is found.
  • If the OCR recognizes text like 'There is an Apple', the command will report success because the word 'Apple' can be changed to 'apple' by substitution of one character (A->a), which is within the tolerated distance of 2.
  • If the OCR recognizes text like 'There are Appls', the command will report success because the word 'Appls' can be changed to 'apple' in two operations, (1) substitution (A->a) and (2) addition (+e) or substitution (s->e), which is still within the tolerated distance of 2.
  • If the OCR recognizes text like 'There is Doppler', the command will report success because the word 'Doppler' contains a sequence matching 'apple' after a single substitution (o->a).
  • If the OCR recognizes text like 'There are Appis', the command will report failure (the exit code of 1) because the word 'Appis' requires at least three operations (A->a, i->l, s->e) to be turned into 'apple', which is more than the tolerated distance of 2.

When the text is matched, the index of the match location in the text is stored in the _TOCR_MATCH_INDEX variable. The index starts from 0, which represents the beginning of the text. The matching substring is saved in the _TOCR_MATCH variable. For example, the command above with the recognized text of 'There are Appls' will create the variables _TOCR_MATCH=Appls and _TOCR_MATCH_INDEX=10.
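The pass and fail outcomes listed above can be reproduced with a standard approximate substring search. The Java sketch below uses the well-known dynamic-programming variant of the Levenshtein computation in which a match may start at any position of the recognized text. It is only an illustration under the assumption that Robot's tolerant search behaves equivalently. It does not show which substring Robot reports in _TOCR_MATCH, and the class and method names are hypothetical.

```java
// Hypothetical illustration of tolerant substring search, not Robot code.
class TolerantSearch {

    // Smallest Levenshtein distance between text and any substring of
    // the recognized text (case sensitive).
    static int bestSubstringDistance(String recognized, String text) {
        int m = text.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        // Against an empty substring, i characters of text need i edits.
        for (int i = 0; i <= m; i++) {
            prev[i] = i;
        }
        int best = prev[m];
        for (int j = 1; j <= recognized.length(); j++) {
            curr[0] = 0; // a match may start at any position at no cost
            for (int i = 1; i <= m; i++) {
                int cost = text.charAt(i - 1) == recognized.charAt(j - 1) ? 0 : 1;
                curr[i] = Math.min(
                        Math.min(curr[i - 1] + 1,  // character missing in recognized text
                                 prev[i] + 1),     // extra character in recognized text
                        prev[i - 1] + cost);       // substitution or exact match
            }
            best = Math.min(best, curr[m]); // best match ending at position j
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return best;
    }

    // True when some substring lies within the tolerated distance.
    static boolean containsWithin(String recognized, String text, int distance) {
        return bestSubstringDistance(recognized, text) <= distance;
    }

    public static void main(String[] args) {
        System.out.println(containsWithin("There is an apple", "apple", 2)); // true
        System.out.println(containsWithin("There is an Apple", "apple", 2)); // true
        System.out.println(containsWithin("There are Appls", "apple", 2));   // true
        System.out.println(containsWithin("There is Doppler", "apple", 2));  // true
        System.out.println(containsWithin("There are Appis", "apple", 2));   // false
    }
}
```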