4.3.2 Image-Based Text Recognition ("text")


The Image-Based Text Recognition method (code "text", available since v3.0) recognizes text and its coordinates on the screen based on a collection of pre-saved character images. Compared to OCR:

  • OCR engines typically recognize characters by shape patterns. They work out of the box for numerous fonts, text and background colours. However, when OCR is applied to a font it was not calibrated for, it may fail. Such a situation either has no workaround or requires a lengthy manual process to train the engine for the particular font (such as the Tesseract training process).
  • Robot's Image-Based Text Recognition recognizes characters by images. It works only for the particular font, text colour and background the image collection was built upon. In return it provides the text coordinates, or even the location of a substring identified through the input text search criteria.

Character image collections can be comfortably created and maintained in the Character Capture Wizard. See its help page for details.

As with the "tocr" method, the recognized text can be verified in three ways:

  1. The "text" parameter alone activates plain text search for the specified string and makes the calling command return 0 (success) when found and 1 (fail) when not found.
  2. A combination of the "text" and "distance" parameters performs tolerant (fuzzy) text search. The "distance" parameter is the Levenshtein distance, defined as the minimum number of edits needed to transform one string into the other, where the allowed edit operations are insertion, deletion, or substitution of a single character. The parameter is an integer specifying the maximum number of characters that may be omitted or recognized incorrectly for the recognized text to still be considered equivalent to the sample text provided in the "text" parameter. Unlike regular expressions, tolerant matching always searches the recognized text for any occurrence of a string matching the given text and distance; there is no need to specify that the string is preceded or followed by other text.
  3. Regular expression matching tests the recognized text against the specified java.util.regex.Pattern compliant regular expression. The expression must match the whole text; no searching for a matching substring is performed.
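The Levenshtein distance used by the tolerant search in item 2 can be sketched as follows. This is a minimal Python illustration of the metric; the helper name is ours and is not part of the Robot scripting language.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a
    # and the first j characters of b (classic dynamic programming).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("Appls", "apple"))  # 2: substitute A->a and s->e
```

Note that the comparison is case sensitive, so 'Appls' is two edits away from 'apple' even though only one letter differs in spelling.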

The method stores the recognition results in a set of _TEXT-prefixed variables as follows:

  • _TEXT_ERR=<error_text>: Error message. It gets populated only if the method fails to perform text recognition, for example when there is no desktop connection or when the specified character image collection doesn't exist or cannot be read.
  • _TEXT=<text>: The recognized text (full multiline format). The variable is always created when the method executes successfully (meaning that it doesn't fail for a missing desktop connection or an invalid character image collection).
  • _TEXT_X, _TEXT_Y, _TEXT_WIDTH, _TEXT_HEIGHT=<number>: The X, Y coordinates and the width and height of the recognized text.
  • _TEXT_LINE_COUNT=<number>: Number of recognized text lines.
  • _TEXT_LINE<n>=<text>: The n-th text line where <n> is the line number between 1 and _TEXT_LINE_COUNT.
  • _TEXT_LINE<n>_X, _TEXT_LINE<n>_Y, _TEXT_LINE<n>_WIDTH, _TEXT_LINE<n>_HEIGHT=<number>: The X, Y coordinates and the width and height of the n-th text line where <n> is the line number between 1 and _TEXT_LINE_COUNT.
  • _TEXT_MATCH=<text>: The part of the recognized text that matches the "text" parameter (plain text search) or a combination of "text" and "distance" (tolerant text search).
  • _TEXT_MATCH_INDEX=<number>: Position (index) of the matching text referred to by _TEXT_MATCH. Indexing starts from 0, which corresponds to the beginning of the recognized text.
  • _TEXT_MATCH_X, _TEXT_MATCH_Y, _TEXT_MATCH_WIDTH, _TEXT_MATCH_HEIGHT=<number>: The X, Y coordinates and the width and height of the matching text referred to by the _TEXT_MATCH variable.

The method requires one character image collection on the input. In the context of the "text" method, the "passrate" parameter defines tolerance to minor differences in the rendering of characters. As this tolerance is currently experimental and works only for a few larger fonts, it is recommended to keep the pass rate at 100%. The "cmparea" standard parameter can be used to limit text recognition to a particular rectangular area of the screen; it defaults to the full screen when not specified.

Supported specific parameters:

  • text=<string>: Optional text to search the recognized text for. The hosting command will then return 0 when the recognition was performed correctly and the recognized text contains the given string, or 1 otherwise. This parameter cannot be used together with the "pattern" one.
  • distance=<number>: Optional Levenshtein distance used in conjunction with the "text" parameter to perform tolerant text matching. See the method specification above for details.
  • pattern=<regular expression>: Optional java.util.regex.Pattern compliant regular expression to test the recognized text against. The expression must match the whole text (no searching for a matching substring is performed). This parameter cannot be used together with the "text" one.


When none of the "text", "distance" and "pattern" parameters is used the calling command returns 0 (success) to indicate a successful execution even when no text gets actually recognized. If the text matching parameters are used the calling command will return either success (0) or fail (non-zero value) depending on whether the recognized text matches or not.


To deal with a failing "text" comparison:

  • Make sure that the background colour and the font type, size and colour you are trying to recognize on the screen are the same as on the character images in the character image collection.
  • As the internal algorithm derives the height of the text line and the space size from the collection images, do not mix images of characters of different font type and/or size into a single collection. Mixing of character images of different font and background colours is OK as long as the characters are of the same font type and size.
  • If the text is anti-aliased and/or is not rendered consistently, try lowering the pass rate. This will, however, work only for larger fonts.


Compareto "C:\MyAutomation\chars"   method="text" cmparea="x:33,y:2,w:200,h:22"
for (i=1; {i}<{_TEXT_LINE_COUNT}+1; i={i}+1) {
   Typeline "{_TEXT_LINE{i}}"
}

- Recognize text in the specified desktop area and type it line by line onto the desktop.

Compareto "C:\MyAutomation\chars"   method="text" cmparea="x:33,y:2,w:200,h:22" pattern="[aA]pplication"
if ({_EXIT_CODE} > 0) {
  Exit 1
}

- Exit the script when the recognized text doesn't match the 'Application' or 'application' word.

Compareto "C:\MyAutomation\chars"   method="text" cmparea="x:33,y:2,w:200,h:22" pattern=".*[aA]pplication.*"
if ({_EXIT_CODE} > 0) {
  Exit 1
}

- The previous example modified to exit the script when the recognized text doesn't contain 'Application' or 'application'.

Compareto "C:\MyAutomation\chars"   method="text" cmparea="x:33,y:2,w:200,h:22" text="apple" distance="2"

- Recognize text on the screen and verify whether it contains a string equal or similar to 'apple':

  • If the method recognizes text like 'There is an apple', the command will report success (the exit code of 0) because an exact match is found.
  • If the method recognizes text like 'There is an Apple', the command will report success because word 'Apple' can be changed to 'apple' by substitution of one character (A->a) which is within the tolerated distance of 2.
  • If the method recognizes text like 'There are Appls', the command will report success because word 'Appls' can be changed to 'apple' in two operations of (1) substitution (A->a) and (2) insertion (+e) or substitution (s->e), which are still within the tolerated distance of 2.
  • If the method recognizes text like 'There is Doppler', the command will report success because word 'Doppler' contains a sequence matching 'apple' after a single substitution (o->a).
  • If the method recognizes text like 'There are Appis', the command will report failure (the exit code of 1) because word 'Appis' requires at least three operations (A->a, i->l, s->e) to be turned into 'apple' which is more than the tolerated distance of 2.
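The tolerant substring search behaviour illustrated above can be approximated by the following naive Python sketch. This is a brute-force illustration under our own assumptions, not Robot's actual implementation; in particular, which matching substring the tool reports may differ.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def fuzzy_find(text: str, sample: str, distance: int) -> int:
    """Return the index of the first substring of `text` within the given
    Levenshtein distance of `sample`, or -1 when there is none."""
    for start in range(len(text)):
        # A candidate substring can be at most `distance` characters
        # shorter or longer than the sample and still be within range.
        for length in range(max(0, len(sample) - distance),
                            min(len(text) - start, len(sample) + distance) + 1):
            if levenshtein(text[start:start + length], sample) <= distance:
                return start
    return -1

print(fuzzy_find("There are Appls", "apple", 2))  # 10 (match inside 'Appls')
print(fuzzy_find("There are Appis", "apple", 2))  # -1 ('Appis' needs 3 edits)
```

Because every starting position is tried, the search also matches strings embedded in longer words, as in the 'Doppler' case above.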

When the text is successfully matched, the index of the match location in the text is stored in the _TEXT_MATCH_INDEX variable and its X, Y coordinates and width/height in _TEXT_MATCH_X, _TEXT_MATCH_Y, _TEXT_MATCH_WIDTH and _TEXT_MATCH_HEIGHT. The matching substring itself is saved in the _TEXT_MATCH variable.
For example, the example above with the recognized text of 'There are Appls' will create the variables _TEXT_MATCH=Appls and _TEXT_MATCH_INDEX=10.