File

3.4.2 File

DESCRIPTION

File - Read from and/or write to text files. The command can open or create plain text files, read and write text, search for strings or regular expressions, and parse values from individual text lines using either the Comma Separated Values (CSV) format or regular expressions.

Files are opened and saved in UTF-8 encoding by default. The file content is read line by line using Java character streams and stored in a string buffer in memory. All subsequent I/O operations are performed on the buffer, and the content is written to the file (or output file) only when the file is explicitly closed or when the script finishes normally (that is, not through unexpected or abnormal program termination). Because the whole content is held in memory, the command is suitable for smaller files (up to tens of kB) and its performance degrades significantly with large data files.

Since v6.1 the File command can read MS Word (*.doc, *.docx) and PDF (*.pdf) files. The content is converted to plain text and then processed as if the input were a text file. It is not possible to modify these files and save them in the original format. The content can only be saved as a plain text file.

Since v6.2.3 the command improves handling of BOM characters. If an existing file with BOM is opened, modified and saved in the UTF-8 encoding the command inserts the appropriate BOM character into the file. No effort is made to insert the BOM character to newly created files.
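
The BOM rule above can be modelled outside the tool. The following Python sketch is not the command's implementation; the helper names load_text, save_text and demo are hypothetical. It shows one way to remember whether a UTF-8 file started with a BOM and to write the BOM back only in that case.

```python
import codecs
import os
import tempfile


def load_text(path):
    """Return (text, had_bom) for a UTF-8 encoded file."""
    with open(path, "rb") as f:
        raw = f.read()
    had_bom = raw.startswith(codecs.BOM_UTF8)
    if had_bom:
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8"), had_bom


def save_text(path, text, with_bom):
    """Write text as UTF-8, prefixing the BOM only when requested."""
    data = text.encode("utf-8")
    if with_bom:
        data = codecs.BOM_UTF8 + data
    with open(path, "wb") as f:
        f.write(data)


def demo():
    """Round-trip a BOM-prefixed file and report whether the BOM survived."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        save_text(path, "first line\n", with_bom=True)
        text, had_bom = load_text(path)
        # Modify the content and keep the BOM only if the file had one.
        save_text(path, text + "second line\n", with_bom=had_bom)
        with open(path, "rb") as f:
            return f.read().startswith(codecs.BOM_UTF8), text
    finally:
        os.remove(path)
```

A newly created file would be saved with with_bom=False in this model, which mirrors the statement that no BOM is inserted into new files.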

The command defines this set of actions:

  • Open - open an existing text file in read/write mode.
  • Create - create a new file in memory.
  • Append - append text to the end of the file.
  • Insert - insert text to a position in the file.
  • Find - search the file for a text value.
  • Read - read text from a position in the file.
  • Parse - parse values from a line using CSV format or regular expression.
  • Delete - delete text and/or line.
  • Close - close and save the file.

A typical workflow consists of the following logical steps:

  1. Open an existing file ("File open") or create a new one ("File create"). This step populates variables with the file name, path and structure. If the script works with multiple files at a time, each file must be tagged by an ID and all subsequent File commands must reference it.
  2. Read a value from a known position in the file using "File read". If the position is unknown, use "File find" to locate a text value. If the file contains data in a particular format, parse it with "File parse", which supports values in the CSV format as well as parsing based on a custom regular expression.
  3. Write to the file using either "File append" or "File insert". The append action adds the text to the end of the file; the insert action places the text at a particular position in the file.
  4. Close the file with "File close" to save or discard the changes. This step is optional because all files are closed, and saved if necessary, automatically when the script finishes.

Navigation through the file and retrieval of parsed text is enabled through variables:

File Group (created by File open and File create):

  • _FILE_FILE - File path (full/absolute).
  • _FILE_FILENAME - File name.
  • _FILE_OUTFILE - Output file path (full/absolute) if it was explicitly specified.
  • _FILE_OUTFILENAME - Output file name if it was explicitly specified.

Counter Group (created by File open, File create, File append and File insert):

  • _FILE_LENGTH - File length in characters including the new line ones. Note that it doesn't have to match the file size because some UTF-8 characters may be encoded in multiple bytes.
  • _FILE_LINE_COUNT - Number of text lines in the file.

Line Group (created by File find, File read, File parse and File insert):

  • _FILE_LINE_NUMBER - Number of the currently processed line. Lines are numbered from 1 (one).
  • _FILE_LINE_LENGTH - Length of the current line in characters excluding the new line character.
  • _FILE_LINE_TEXT - Text of the current line (full length without the new line character).
  • _FILE_LINE_COLUMN - Column (character) number on the line. Column numbering starts from 1 (one) which represents the beginning of the line. This variable is populated by the "find" action to indicate the position of the searched text on the line. Other commands either mimic the "column" parameter or use its default value of 1 (one).

Created by File read:

  • _FILE_READ - Text read by the "File read" command.

Created by File delete:

  • _FILE_DELETED - Text deleted by the "File delete" command.

Parse Group (created by File parse):

  • _FILE_PARSE_COUNT - Number of values parsed by "File parse".
  • _FILE_PARSE_VALUE<N> - N-th value parsed from the text line where <N> is between 1 and _FILE_PARSE_COUNT.
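
To illustrate why a length in characters may differ from the file size, here is a minimal Python sketch. It is only an illustration of UTF-8 encoding, not of the command itself, and the sample text is made up.

```python
# Sample text with two non-ASCII characters (i with diaeresis, e with acute).
text = "na\u00efve caf\u00e9\n"

# Length in characters, including the newline (what a character counter reports).
char_length = len(text)

# Size in bytes once encoded as UTF-8 (what the file system reports).
byte_size = len(text.encode("utf-8"))
```

Each of the two accented characters occupies two bytes in UTF-8, so the byte size is larger than the character count.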

The command in general returns 0 on success or one of the following exit codes:

  • 0 (SUCCESS) - Successful completion.
  • 1 (FAILED_TO_OPEN) - Failed to open the input file. Returned just by "File open".
  • 2 (FAILED_TO_SAVE) - Failed to save to the file. Returned just by "File close".
  • 3 (FAILED_TO_FIND) - Failed to find a value. Returned just by "File find" to indicate failed text search.
  • 4 (INVALID_POSITION) - The line and/or column parameters do not point to an existing position in the file. Returned by all commands supporting "line" and "column".

The syntax and parameters of each action are described in detail below.

SYNOPSIS

File open [file=<file>]  [outfile=<output_file>]  [id=<identifier>]
File create [file=<file>]  [id=<identifier>]
* Red colour indicates obligatory parameters

OPTIONS

file=<file>

- The file to open or create. A relative path is resolved against the calling script location (meaning the folder containing the calling script). If the file being opened is an MS Word (*.doc or *.docx) or a PDF (*.pdf) file, the content is converted to plain text. Be aware that PDF files often wrap images of text (screenshots, scanned documents) which cannot be extracted.

outfile=<output_file>

- Optional output file. When specified, the command creates a copy of the file and applies all changes to it rather than to the source file. If the output file already exists it will be overwritten. A relative path is resolved against the calling script location (meaning the folder containing the calling script). If the parameter is omitted, the input file is opened in read/write mode.

id=<identifier>

- An identifier (name) for the file. This parameter can be omitted if the script opens/creates just one file at a time. If multiple files are being opened, the identifier is mandatory and identifies the file in the subsequent File commands. 

RETURNS

The open command returns either 0 (SUCCESS) or 1 (FAILED_TO_OPEN). The create command always returns 0 because it creates the file just in memory. If the command exits with 0 it populates variables from the File and Counter variable groups.

EXAMPLES

File open file="data.csv"

- Open a CSV file located in the same directory as the script in read/write mode.

File open file="C:\Data\data.csv" outfile="newdata.csv"

- Open a CSV file located in the specified directory in read-only mode. When the file is closed, the content, including any changes, is saved to the specified output file in the script directory. If the output file exists it will be overwritten.

File create file="C:\Data\log.txt"

- Create a new file content buffer in the memory and associate it with the specified file for output.

File append  [text=<text>]  [id=<identifier>]

* Red colour indicates obligatory parameters

OPTIONS

text=<text>

- Text to append to the end of the file. It may contain any UTF-8 characters. To indicate a line break use "\n". If you need to use the "\n" sequence in the normal text, double the backslash character ("\\n").

id=<identifier>

- Identifier of the file to append the text to. It must be equal to the ID specified in the File open or create command. The parameter doesn't have to be specified if the script opens/creates just one file and no ID is specified in the open/create command.

RETURNS

The command always returns 0 (SUCCESS). As it changes the file size and possibly the number of lines, the command updates variables from the Counter group.

EXAMPLES

File append text="This is one line\nwhile this is another one"

- Append two lines of text, "This is one line" and "while this is another one" to the end of the file.

File append text="screws\\nails"

- Append one line of text, "screws\nails". The backslash character must be doubled in this case because the "\n" sequence would otherwise be interpreted as a line break.
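
The escaping rule used in the two examples can be sketched in Python as follows. This is an assumption about the behaviour based only on the description above: the function name unescape is hypothetical, and how the command treats other backslash sequences is not covered.

```python
def unescape(value):
    # Backslash followed by "n" becomes a line break;
    # a doubled backslash becomes one literal backslash.
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        nxt = value[i + 1] if i + 1 < len(value) else ""
        if ch == "\\" and nxt == "n":
            out.append("\n")
            i += 2
        elif ch == "\\" and nxt == "\\":
            out.append("\\")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Raw strings keep the backslashes exactly as they appear in the script.
two_lines = unescape(r"This is one line\nwhile this is another one")
one_line = unescape(r"screws\\nails")
```

The first call yields two lines of text, the second yields a single line containing a literal backslash.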

File insert  [text=<text>]  [line=<line_number>]  [column=<column_number>]  [id=<identifier>]
* Red colour indicates obligatory parameters

OPTIONS

text=<text>

- Text to insert into the file. It may contain any UTF-8 characters. To indicate a line break use "\n". If you need to use the "\n" sequence in the normal text, double the backslash character ("\\n").

line=<line_number>

- The line number to insert the text to. Numbering starts at 1. If the line number is out of the range the command fails with the exit code of 4 (INVALID_POSITION).

column=<column_number>

- The column (character number) to insert the text into. Numbering starts at 1. If not specified the command inserts by default to the line beginning (column=1).  If the column is greater than the number of characters on the line the command fails with the exit code of 4 (INVALID_POSITION).

id=<identifier>

- Identifier of the file to insert the text into. It must be equal to the ID specified in the File open or create command.  The parameter doesn't have to be specified if the script opens/creates just one file and no ID is specified in the open/create command.

RETURNS

The command returns 0 (SUCCESS) or 4 (INVALID_POSITION) if the line and column parameters do not point to an existing position in the file. As it changes the file size and possibly the number of lines, the command updates variables from the Counter group. The command also updates the Line variable group to provide information about the line pointed to by the [line, column] coordinates.

EXAMPLES

File read line=2
File insert text=" and potatoes" line=2 column={_FILE_LINE_LENGTH}+1

- Append " and potatoes" to the end of the second line. The "File read" command is called to get the line length (the _FILE_LINE_LENGTH variable).

File find text="bananas"
File insert text=" and potatoes" line={_FILE_LINE_NUMBER} column={_FILE_LINE_COLUMN}+7

- Search for "bananas" and insert the text to create "bananas and potatoes". Note that the example doesn't test whether the find command succeeds.
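
The 1-based [line, column] addressing used by the insert examples can be modelled with a short Python sketch. It is a simplified model, not the command's implementation: the function name insert_at is hypothetical, and it assumes that a column of line length + 1 is accepted (as the first example above relies on) while anything beyond that is an invalid position.

```python
def insert_at(text, line, column, new_text):
    """Insert new_text at a 1-based (line, column) position.

    Raises ValueError where the command would report INVALID_POSITION.
    Assumption: column may be at most line length + 1 (append to the line).
    """
    lines = text.split("\n")
    if not 1 <= line <= len(lines):
        raise ValueError("INVALID_POSITION: line out of range")
    current = lines[line - 1]
    if not 1 <= column <= len(current) + 1:
        raise ValueError("INVALID_POSITION: column out of range")
    lines[line - 1] = current[:column - 1] + new_text + current[column - 1:]
    return "\n".join(lines)


buffer = "apples\nI like bananas\npears"
# 1-based column of the match, comparable to _FILE_LINE_COLUMN after a find.
column = buffer.split("\n")[1].index("bananas") + 1
# "bananas" is 7 characters long, so column + 7 is the position right after it.
result = insert_at(buffer, 2, column + 7, " and potatoes")
```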

File find  [text=<text>]  [line=<line_number>]  [column=<column_number>]  [direction=<forward|backward>]  [scope=<line|file>]  [id=<identifier>]
* Red colour indicates obligatory parameters

OPTIONS

text=<text>

- The text to search for. It may contain any UTF-8 characters. To indicate a line break use "\n". If you need to use the "\n" sequence as normal text, double the backslash character ("\\n").

line=<line_number>

- The line number to start the search from. Numbering starts at 1. If the line number is out of the range the command fails with exit code of 4 (INVALID_POSITION).

column=<column_number>

- The column (character number) to start the search from (to be used together with "line"). Numbering starts at 1. If not specified the command searches by default from the line beginning (column=1).  If the column is greater than the number of characters on the line the command fails with exit code of 4 (INVALID_POSITION).

direction=<forward|backward>

- The search mode. The default one is forward and searches from the position specified by [line, column] towards the end of file or line. The backward mode searches in the opposite direction from the given position.

scope=<file|line>

- The search scope. The default one is file and searches the whole file or its part. The line scope searches just the specified line or its part; in this case the searched text should not contain the new line character.

id=<identifier>

- Identifier of the file which is to be searched. It must be equal to the ID specified in the File open or create command. The parameter doesn't have to be specified if the script opens/creates just one file and no ID is specified in the open/create command.

RETURNS

The command returns 0 (SUCCESS) if the text is found, 3 (FAILED_TO_FIND) if the text is not found or 4 (INVALID_POSITION) if the line and column parameters do not point to a valid position in the file. If the search succeeds, the Line variable group is updated to provide the target [line, column] coordinates.

EXAMPLES

File find text="bananas"
if ({_EXIT_CODE} == 3) {
  Exit 3 
}
File insert text=" and potatoes" line={_FILE_LINE_NUMBER} column={_FILE_LINE_COLUMN}+7

- Search the file forwards for "bananas" and insert the text to create "bananas and potatoes". If the word is not found the script will be terminated with the exit code of 3.
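
As a rough model of a forward search with file scope, the following Python sketch converts a 1-based start position to a buffer offset and maps the match back to 1-based coordinates. The function name find_forward is hypothetical and the column validation (up to line length + 1) is an assumption; the sketch does not cover the backward direction or the line scope.

```python
def find_forward(text, needle, line=1, column=1):
    """Return the 1-based (line, column) of needle at or after the start
    position, or None when it is not found (exit code 3 in the command)."""
    lines = text.split("\n")
    if not 1 <= line <= len(lines):
        raise ValueError("INVALID_POSITION")
    if not 1 <= column <= len(lines[line - 1]) + 1:
        raise ValueError("INVALID_POSITION")
    # Convert the 1-based start position into an offset in the whole buffer.
    start = sum(len(ln) + 1 for ln in lines[:line - 1]) + column - 1
    offset = text.find(needle, start)
    if offset < 0:
        return None
    found_line = text.count("\n", 0, offset) + 1
    line_start = text.rfind("\n", 0, offset) + 1
    return found_line, offset - line_start + 1


buf = "apples\nI like bananas\npears"
position = find_forward(buf, "bananas")
```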

File read  [line=<line_number>]  [column=<column_number>]  [length=<length_in_chars>]  [id=<identifier>]
* Red colour indicates obligatory parameters

OPTIONS

line=<line_number>

- The number of the line to read from. Numbering starts at 1. If the line number is out of the range the command fails with exit code of 4 (INVALID_POSITION).

column=<column_number>

- The column (character number) to read from. Numbering starts at 1. If not specified the command reads from the line beginning (column=1).  If the column is greater than the number of characters on the line the command fails with exit code of 4 (INVALID_POSITION).

length=<length_in_chars>

- Optional length specifying how many characters should be read. The range may not exceed the text line bounds. If the length parameter is not specified the command reads to the end of the specified line (excluding the new line character).

id=<identifier>

- Identifier of the file to read from. It must be equal to the ID specified in the File open or create command. The parameter doesn't have to be specified if the script opens/creates just one file and no ID is specified in the open/create command.

RETURNS

The command returns 0 (SUCCESS) if the text is located and read successfully or 4 (INVALID_POSITION) if the line and column parameters do not point to a valid position in the file. If successful the extracted text is stored into the _FILE_READ variable. The command also updates the Line variable group to provide information about the line pointed to by the [line, column] coordinates.

EXAMPLES

File find text="bananas" line=2 scope=line
File read line=2 length={_FILE_LINE_COLUMN}-1
Type "{_FILE_READ}" 

- Find the "bananas" word on the second line, read the text before the word and type it.

File parse  [line=<line_number>]  [delimeter=<delimeter_char>]  [separator=<separator_char>]  [trim=<true|false>]  [id=<identifier>]
File parse  [line=<line_number>]  [pattern=<regular_expression>]  [trim=<true|false>]  [id=<identifier>]
* Red colour indicates obligatory parameters

The command by default reads values from the specified line according to the Comma Separated Values (CSV) specification. The command is compatible with rules specified in the Comma-Separated Values article at Wikipedia and supports multi line values. The parsing mechanism may be in addition customized with optional custom text delimiter, value separator and trimming mode.

When the "pattern" parameter is specified, the command parses the line based on the provided Java-compatible regular expression. This approach takes advantage of the java.lang.String.split() method and it is fundamentally different from the CSV mechanism. For example, to parse individual words separated by a space use regular expression "\s". This mode may not be mixed with CSV parsing and "pattern" cannot be specified at the same time as "delimiter" and/or "separator".

The parsed values are made available through a set of numbered variables (_FILE_PARSE_VALUE1, _FILE_PARSE_VALUE2 ...) and a counter (_FILE_PARSE_COUNT) and may be retrieved in the script through a "for" loop with nested variable names (see the examples section). The command also modifies the line variables and sets the line number to the last processed line. This is an important feature which makes it possible to iterate correctly over lines which may contain multiline values.

OPTIONS

line=<line_number>

- The number of the line to parse. Numbering starts from 1. If the line number is out of the range the command fails with exit code of 4 (INVALID_POSITION).

delimeter=<delimeter_char>

- The character which will serve as text delimiter. If the parameter is not specified it defaults to CSV compatible double quote ( " ).

separator=<separator_char>

- The character which will serve as value separator. If the parameter is not specified it defaults to CSV compatible comma ( , )

pattern=<regular_expression>

- A regular expression expressing the value to separate the values by. This parse mode internally relies on the java.lang.String.split() method. The expression must comply with the Java Pattern specification.

trim=<true|false>

- The value of true trims white spaces from the beginning and end of each parsed value. Be aware that this mode is not CSV compatible and doesn't meet the requirements of RFC 4180. The default value is false (do not trim).

id=<identifier>

- Identifier of the file to read & parse from. It must be equal to the ID specified in the File open or create command. The parameter doesn't have to be specified if the script opens/creates just one file and no ID is specified in the open/create command.

RETURNS

The command returns 0 (SUCCESS) if the line is located and parsed successfully or 4 (INVALID_POSITION) if the line parameter does not point to a valid line in the file. On success, the command populates the Parse variable group and also updates the Line one with information about the processed line.

EXAMPLES

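Consider the set of data listed as an example on Wikipedia:

1997 | Ford  | E350                                   | ac, abs, moon                     | 3000.00
1999 | Chevy | Venture "Extended Edition"             |                                   | 4900.00
1999 | Chevy | Venture "Extended Edition, Very Large" |                                   | 5000.00
1996 | Jeep  | Grand Cherokee                         | MUST SELL! air, moon roof, loaded | 4799.00

The fourth value of the last row spans two lines ("MUST SELL!" followed by "air, moon roof, loaded").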

The corresponding CSV file looks as follows:

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

The following script parses the lines one by one and prints out the individual CSV values (to see the results open a text editor on the connected remote desktop). It also calculates and prints out the sum of all prices, which are stored in the fifth value of each record. Note that we cannot simply iterate over the number of lines in the file because the second-to-last line starts a multiline value.

File open file="data.csv"

# We declare the fifth variable just to suppress a compiler error in the Eval command below
Var sum=0 _FILE_PARSE_VALUE5=0

# The +1 makes the loop cover the last line of the file as well
for (i=1; {i}<{_FILE_LINE_COUNT}+1; i={i}+1) {
  File parse line={i}
  Typeline "Line #{i}:" 
  for (j=1; {j}<{_FILE_PARSE_COUNT}+1; j={j}+1) {
    Typeline " Value #{j}: {_FILE_PARSE_VALUE{j}}" 
  }

  # Add the car price from column 5 to the sum
  Eval sum={sum}+{_FILE_PARSE_VALUE5}

  # As the parse command updates the Line var group with the number of the last
  # processed line, this will allow us to skip lines with multiline values
  Var i={_FILE_LINE_NUMBER}
}
Typeline "Summary value: ${sum}" 


When the script is executed it types the following output on the desktop:

Line #1:
  Value #1: 1997
  Value #2: Ford
  Value #3: E350
  Value #4: ac, abs, moon
  Value #5: 3000.00
Line #2:
  Value #1: 1999
  Value #2: Chevy
  Value #3: Venture "Extended Edition"
  Value #4: 
  Value #5: 4900.00
Line #3:
  Value #1: 1999
  Value #2: Chevy
  Value #3: Venture "Extended Edition, Very Large"
  Value #4: 
  Value #5: 5000.00
Line #4:
  Value #1: 1996
  Value #2: Jeep
  Value #3: Grand Cherokee
  Value #4: MUST SELL!
air, moon roof, loaded
  Value #5: 4799.00
Summary value: $17699
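
The same data can be cross-checked with a general CSV parser. The sketch below uses Python's standard csv module rather than the File command, so it only illustrates the CSV rules involved (quoted separators, doubled quotes, multiline values). The reader's line_num attribute plays a role comparable to _FILE_LINE_NUMBER: it reports the last physical line consumed by a record.

```python
import csv
import io

DATA = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""","",4900.00\n'
    '1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\n'
    'air, moon roof, loaded",4799.00\n'
)

reader = csv.reader(io.StringIO(DATA, newline=""))
records = []
total = 0.0
for row in reader:
    # line_num is the number of physical lines read so far, so a record
    # with a multiline value advances it by more than one.
    records.append((reader.line_num, row))
    total += float(row[4])

last_lines = [line_no for line_no, _ in records]
```

Five physical lines produce four records, which is why the script above resets its loop counter to the last processed line after each parse.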

Another example: consider a text file with numbers separated by one or more spaces or tabs:

1  14   23  9   100
117   5  7

To calculate the sum of all numbers into a variable called "count" one would typically use the following script. As the data file is not CSV, a Java regular expression is needed. Because the numbers are separated by one or more whitespace characters, the expression is "\s+"; a plain "\s" would split on every single whitespace character and produce empty values between consecutive spaces.

File open file="C:\numbers.txt" 
Eval count=0
Var _FILE_PARSE_COUNT=0 _FILE_PARSE_VALUE1=0

for (i=1; {i}<{_FILE_LINE_COUNT}+1; i={i}+1) {
  File parse line={i} pattern="\s+"
  for (j=1; {j}<{_FILE_PARSE_COUNT}+1; j={j}+1) {
     Eval count={count}+{_FILE_PARSE_VALUE{j}}
  }
}
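
The effect of the regular expression on runs of whitespace can be checked with a small Python sketch. Python's re.split is used here only as a stand-in for java.lang.String.split(); the two differ in some edge cases (for example trailing empty strings) but behave the same on this input.

```python
import re

lines = ["1  14   23  9   100", "117   5  7"]

# Splitting on a single whitespace character leaves empty strings
# between consecutive spaces.
single = re.split(r"\s", lines[0])

# Splitting on one or more whitespace characters yields only the numbers.
total = 0
for line in lines:
    for value in re.split(r"\s+", line):
        total += int(value)
```

The empty strings in the first result are the values that would break a numeric sum, which is the reason to match one or more whitespace characters.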

File delete  [line=<line_number>]  [column=<column_number>]  [length=<length_in_chars>]  [id=<identifier>]
* Red colour indicates obligatory parameters

OPTIONS

line=<line_number>

- The line number to delete. Numbering starts at 1. If the line number is out of the range the command fails with exit code of 4.

column=<column_number>

- The column (character number) to delete the text from. Numbering starts at 1. If not specified the command deletes from the beginning of the line (column=1). If the column is greater than the number of characters on the line the command fails with exit code of 4.

length=<length_in_chars>

- Optional length specifying how many characters should be deleted. The resulting delete area may exceed the text line bounds and in such a case the delete operation is applied to the terminating newline character and then to the following line or lines. If the length exceeds the file size, all the file content after the specified position is deleted. If the length parameter is not specified the command deletes just to the end of the specified line including the new line character.

id=<identifier>

- Identifier of the file to delete from. It must be equal to the ID specified in the File open or create command. The parameter doesn't have to be specified if the script opens/creates just one file and no ID is specified in the open/create command.

RETURNS

The command returns 0 (SUCCESS) if the text is located and deleted successfully or 4 (INVALID_POSITION) if the line and column parameters do not point to a valid position in the file. The command saves the deleted text to the _FILE_DELETED variable. As the delete operation changes the file size and possibly the number of lines, it updates variables from the Counter group as well as the Line one.

EXAMPLES

File delete line="1"

- Delete the first line (including the newline character).

File delete line="2" length={_FILE_LENGTH}

- Delete everything from the second line to the end of the file and leave just the first line.

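for (i=1; {i}<{_FILE_LINE_COUNT}+1; i={i}+1) {
  File delete line={i} length=10
}

- Delete the first 10 characters of each line. The loop bound is _FILE_LINE_COUNT+1 so that the last line is processed as well. Note that on a line shorter than 10 characters the deletion continues past the newline character into the following line.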

File read line=1
File delete line="1" column={_FILE_LINE_LENGTH}+1 length=1

- Remove the newline character located at the end of the first line and join the first and second line.
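
The delete semantics described above (default range to the end of the line including the newline, and an explicit length that may run across line boundaries) can be modelled in Python as follows. The function name delete_at is hypothetical and the column validation (up to line length + 1, as the last example relies on) is an assumption; this is a sketch of the described behaviour, not the command's implementation.

```python
def delete_at(text, line, column=1, length=None):
    """Delete from a 1-based (line, column); return (new_text, deleted_text).

    Without length, delete to the end of the line including its newline.
    With length, the range may run across newlines into following lines.
    """
    lines = text.split("\n")
    if not 1 <= line <= len(lines):
        raise ValueError("INVALID_POSITION")
    if not 1 <= column <= len(lines[line - 1]) + 1:
        raise ValueError("INVALID_POSITION")
    start = sum(len(ln) + 1 for ln in lines[:line - 1]) + column - 1
    if length is None:
        # Rest of the line plus one character for the newline.
        end = start + len(lines[line - 1]) - (column - 1) + 1
    else:
        end = start + length
    end = min(end, len(text))  # never run past the end of the buffer
    return text[:start] + text[end:], text[start:end]


sample = "one\ntwo\nthree"
first_line_removed = delete_at(sample, 1)
joined = delete_at(sample, 1, column=4, length=1)
only_first_left = delete_at(sample, 2, length=len(sample))
```

The second element of each result corresponds to what the command stores in _FILE_DELETED.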

File close  [id=<identifier>]  [save=<true|false>]
* Red colour indicates obligatory parameters.

OPTIONS

save=<true|false>

- True saves the file to the file system, false discards any changes. The default value is "true". The file is saved only if it has been modified by the script and/or if another output file was specified.

id=<identifier>

- Identifier of the file to close. It must be equal to the ID specified in the previous File open or create command. The parameter doesn't have to be specified if the script opens/creates just one file and no ID is specified in the open/create command.

RETURNS

The close command returns either 0 (SUCCESS) or 2 (FAILED_TO_SAVE) on an I/O error. It also clears all File specific variables from the context.

EXAMPLES

File open file=test.txt
...
File close 

- Close the file. If the content has been modified, save the changes to the test.txt file.

File open file=test.txt outfile=test2.txt
...
File close 

- Close the file. The content loaded from test.txt will be written to test2.txt regardless of whether it has been modified or not.

File open file=test.txt id="testfile"
...
File close id="testfile" save=false

- Close the file and discard any changes. As the "testfile" ID was assigned to the file in "File open", it must be specified in "File close" as well as in any other File call between these two commands.