Excel
3.4.1 Excel
DESCRIPTION
Excel - Read from and/or write to Microsoft Excel (.xls) and (.xlsx). The command also supports files with macros (.xlsm) since 6.2.2. Macros can't be updated but they are preserved on file save.
The command allows to navigate through XLS and XLSX documents, read, write and search for cell values and deal with a subset of Excel supported formats. The command takes advantage of the Apache POI 4.0.1 framework and as such, it is subject to its limitations. The POI libraries are placed in the <installDir>/libs/poi
folder and may be upgraded to a custom version eventually.
The command defines this set of actions:
- Open - opens an existing Excel document in read/write mode.
- Create - create a new Excel document and/or create a sheet.
- Select - select a sheet and/or select a cell and retrieve its value and format.
- Find - search the selected sheet for a cell by its value or a regular expression.
- Set - set sheet name or cell value (format).
- Copy - mark cells for copying.
- Paste - paste cells marked with Copy into an Excel document.
- Close - close and save the Excel document.
A typical workflow consists of the following logical steps:
- Open an existing Excel document ("Excel open") or create a new one ("Excel create"). If the script works with multiple documents at a time, each file must be tagged by an ID and all subsequent Excel commands must reference it. This step populates variables with the file name, path and structure.
- Select (open) or create a spreadsheet. Opening of an Excel document through "Excel open" automatically selects the first spreadsheet in the workbook unless the sheet is explicitly specified by the parameter. Creating a new document through "Excel create" automatically creates and selects one default spreadsheet with the default "Sheet1" name unless another name is explicitly specified. To change the spreadsheet selection later on call "Excel select". To create and select a new spreadsheet use "Excel create". Each change in spreadsheet selection populates the Sheet variable group describing the number of spreadsheets available and the name and number of the currently selected one.
- To read a cell called "Excel select" or search for a cell value using "Excel find". If the cell is found successfully, its value, type and format are provided in the form of script variables. If the cell contains a formula you may retrieve either its text or the resulting value depending on the "evaluate" parameter. Like sheets, the cell reference is cached and a sequence of commands operating over a single cell doesn't have to specify the coordinates (either row/cell number or reference) in each command instance.
- To modify a cell using the "Excel set" command. If the format is not explicitly specified, the command checks for the number or Boolean values and eventually defaults to string. If the coordinates point to a cell which doesn't exist, it is created.
- To select cells and paste them into an Excel document (the same or another one) use a sequence of "Excel copy" and "Excel paste" commands.
- To close a document and save or discard changes call the "Excel close" command. This step is optional and all documents are closed and eventually saved automatically when the script finishes.
Excel documents can be read from or written to shared network drives. A special attention should be paid to permissions. For example, to create a file in a shared folder called "data" on a network drive called "macbook.local" use:
Excel create file="\\\\macbook.local\\data\\myfile.xls"
A typical example showing both read and write operations follows. It opens an existing file Excel 2007 file, selects the second sheet and reads a value from the first cell and writes it to the second one for every row.
// Open the Excel file. Exit on any I/O error.
Excel open file="C:\data\test.xlsx"
if ({_EXIT_CODE} > 0) {
Exit {_EXIT_CODE} desc="{_EXCEL_ERROR}"
}
// Select the second sheet
Excel select sheet=1
// Iterate over all data lines
for (index=1; {index}<{_EXCEL_SHEET_ROWS}+1;index={index}+1) {
# Read value from the first cell
Excel select row={index} column="1"
# Set value of the second cell
Excel set row={index} column="2" value="{_EXCEL_CELL_VALUE}"
}
The Excel command populates these variables:
Who Creates and When | Variable Name | Description |
---|---|---|
Excel open, | _EXCEL_FILE | Document file path (full/absolute). |
_EXCEL_FILENAME | Document file name. | |
_EXCEL_OUTFILE | Output file path (full/absolute) if it was explicitly specified. | |
_EXCEL_OUTFILENAME | Output file name if it was explicitly specified. | |
_EXCEL_SHEET_COUNT | Number of spreadsheets available in the document. | |
Excel open | _EXCEL_SHEET_NAME | Name of the currently selected sheet. |
_EXCEL_SHEET_NUMBER | Number of the currently selected sheet. | |
_EXCEL_SHEET_ROWS | Number of rows available in the currently selected sheet. | |
Excel set | _EXCEL_CELL_COLUMN | Ordinary column number of the selected/found/modified cell. |
_EXCEL_CELL_ROW | Ordinary row number of the cell. | |
_EXCEL_CELL_TYPE | Cell type; one of [BLANK, BOOLEAN, ERROR, FORMULA, NUMERIC or STRING] | |
_EXCEL_CELL_VALUE | Cell value (as a string). Since 4.1.3 the value is formatted if the cell is configured so (date/number/currency format, ...) | |
_EXCEL_CELL_VALUE_RAW | Raw cell value (unformatted). Supported since 4.1.3. | |
_EXCEL_CELL_REF | Excel compatible cell reference, for example, "A1" or "J10". | |
_EXCEL_CELL_FORMAT | Cell number or date format string (since 6.2.3). | |
_EXCEL_CELL_WIDTH | Cell (column) width as a number of visible characters. The resulting pixel value is subject to the font and eventual cell margins. | |
_EXCEL_CELL_HEIGHT | Cell (row) height in points (since 6.3.3). The resulting pixel value is subject | |
_EXCEL_CELL_FOREGROUND _EXCEL_CELL_BACKGROUND | Cell font (foreground) and fill (background) colors in the HTML format (since 6.3.3). For example, red font on white background will show as "ff0000" and "ffffff". | |
All Excel commands | _EXCEL_ERROR | Text of the error thrown by the last Excel command execution or compilation. This variable is being created since v2.3 and allows to track I/O errors resulting from Excel processing. |
The command in general returns 0 on success or one of the following exit codes:
Exit Code | Pseudocode | Description |
---|---|---|
0 | SUCCESS | Successful completion. |
1 | FAILED_TO_OPEN | Failed to open the input file. Returned just by "Excel open". |
2 | FAILED_TO_SAVE | Failed to save to the file. Returned just by "Excel close". |
3 | FAILED_TO_CREATE | Failed to create a new document or sheet. Returned just by "Excel create". |
4 | SHEET_NOT_FOUND | The row and/or column parameters do not point to an existing cell. Returned by all cell-reading commands supporting "row" and "column". |
5 | CELL_NOT_FOUND | Failed to find a cell with the given coordinates or with the specified value ("Excel find"). |
Syntax and parameters of each action are described in details below.
SYNOPSIS
Excel open [file=<input_Excel _file>] [outfile=<output_Excel_file>] [id=<identifier>] [sheet=<sheet_name_or_index>] [password=<password>]
* Red colour indicates obligatory parameters
OPTIONS
file=<Excel_file>
- The Excel file to open. The relative path is resolved against the script directory (meaning the folder containing the calling script).
outfile=<output_Excel_file>
- Optional output file to save the Excel data to (regardless of whether it has been changed or not). When this parameter is specified, the input file specified by file is opened in read-only mode, its data is loaded into the memory and saved to outfile unless the file is closed in the discard mode (Excel close save=false). The input and output files must be of the same format (both must be either XLS or XLSX).
id=<identifier>
- An identifier (name) for the Excel document. This parameter doesn't have to be specified if the script opens/creates just one Excel document at a time. If multiple documents are being opened the identifier identifies the document in the subsequent Excel read/write commands.
sheet=<sheet_name_or_index>
- The spreadsheet to select. It may be specified either by its name or ordinary number starting from 1. This parameter is optional and if it is not present, the command selects (opens) the first spreadsheet by default.
password=<password>
- The password to unlock and/or lock the file (since 6.1). If the file is not protected and the password is specified it will become protected after on save.
RETURNS
The open command returns either 0 (SUCCESS) or 1 (FAILED_TO_OPEN). On success, it populates variables from the File and Sheet variable group.
EXAMPLES
Excel open file="data.xls"
- Open an MS Excel document located in the same directory as the script in read/write mode.
Excel open file="data.xls" outfile="newdata.xls"
- Open an MS Excel document located in the specified directory in the read-only mode. When the document is closed, save the content and eventually all changes into the specified output file in the script directory. If the output file exists it will be overwritten.
Excel create file=<Excel _file> [id=<identifier>] [sheet=<sheet_name>] [password=<password>]
Excel create sheet=<sheet_name> [id=<identifier>]
* Red colour indicates obligatory parameters
OPTIONS
file=<Excel_file>
- The Excel file to create. The relative path is resolved against the script directory (meaning the folder containing the calling script). If the file exists, it is overwritten.
id=<identifier>
- An identifier (name) for the created document. This parameter doesn't have to be specified if the script opens/creates just one Excel document at a time. If multiple documents are being opened the identifier identifies the document in the subsequent Excel read/write commands.
sheet=<sheet_name>
- Name of the spreadsheet to create. If the command is called with the file parameter specified to create a new file, the sheet parameter is optional. If it is not present, the command behaves the same way as MS Excel and creates a default spreadsheet called "Sheet1" in the newly created file.
password=<password>
- The password to lock the file on save (since 6.1).
RETURNS
The open command returns either 0 (SUCCESS) or 3 (FAILED_TO_CREATE) on failure, for example when a sheet of the specified name already exists. The command populates variables from the File and Sheet variable groups.
EXAMPLES
Excel create file="C:\Data\log.xls"
- Create a new Excel document in the memory and associate it with the specified file for output. The command will also create a sheet called "Sheet1" and select it.
Excel create file="C:\Data\log.xls" sheet="Data"
- Same as the previous example but the sheet will be named "Data".
Excel create sheet="Data"
- Create a new sheet named "Data" in the currently opened document.
Excel select [sheet=<sheet_name_or_index>] [id=<identifier>]
Excel select [row=<row_number>] [column=<column_number_or_id>] [sheet=<sheet_name_or_index>] [id=<identifier>]
Excel select [ref=<cell_id>] [sheet=<sheet_name_or_index>] [id=<identifier>]
* Red colour indicates obligatory parameters
OPTIONS
row=<row_number>
column=<column_number_or_id>
- Row and column ordinary numbers identifying the cell to select. Numbering starts at 1. The column may be in addition referenced by alphabetical ID used in MS Excel ("A"-"ZZ"). For example, "column=C" is equal to "column=3". The "row" and "column" parameters must be always present together. The parameters are mandatory in the first cell operation to specify the "current" cell. The reference is then cached and sequences of commands operating over a single cell do not have to repeat the "column"/"row" or "ref" coordinates.
ref=<cell_id>
- Cell ID compatible with MS Excel referencing, for example "A1" or "J10". It is provided as an alternative way of cell referencing. The parameter should not be used together with the row/column parameter pair.
sheet=<sheet_name_or_index>
- The spreadsheet to select (open). It may be specified either by its name or ordinary number starting from 1. If the command is being used just to select a sheet (the first form in the list above), this parameter is mandatory. Other command forms targeting a particular cell do not require it and if it is not present, the command will operate over the last selected sheet.
evaluate=<true|false>
- A flag controlling how FORMULA cells are handled. If it is "true" the cell value will be the calculated formula result. Otherwise, the cell value stored under the _EXCEL_CELL_VALUE will contain the formula text. If the targeted cell is not a FORMULA one, this parameter is ignored. The default value is "false".
id=<identifier_or_name>
- The document identifier to apply the operation to. This parameter doesn't have to be specified if the script opens/creates just one Excel document and no ID was specified in the open/create command.
RETURNS
The open command returns either 0 (SUCCESS), 4 (SHEET_NOT_FOUND) or 5 (CELL_NOT_FOUND). The command populates variables from the Sheet and Cell variable groups.
EXAMPLES
Excel select sheet="2"
- Select second sheet in the document and store its properties into the Sheet variables.
Excel select sheet="Data"
- Select sheet called "Data" in the document.
Excel select row="2" column="4"
- Select cell on the specified position in the current sheet and retrieve its properties into the Cell variables.
Excel select sheet="Results" ref="A5"
- Select the "Results" sheet and then select the cell on the specified position (fifth row, first column). Both the Sheet and Cell variable groups will be updated with the sheet/cell properties.
Excel find [value=<value>] [type=<type>] [sheet=<sheet_name_or_index>] [evaluate=<true|false>] [id=<identifier>]
Excel find [pattern=<regular_expression>] [type=<type>] [sheet=<sheet_name_or_index>] [evaluate=<true|false>] [id=<identifier>]
* Red colour indicates obligatory parameters
OPTIONS
value=<value>
- The string representation of the value to search for. The current sheet will be searched row by row for a cell having the specified value and the first matching one will be selected.
pattern=<regularExpression>
- Search for a cell value which matches a regular expression. The expression must comply with the Java Pattern specification.
type=<type>
- Type of cell to limit the search to. It must be one of \[BLANK, BOOLEAN, ERROR, FORMULA, NUMERIC, STRING\]. If the parameter is not specified all cell types are searched.
sheet=<sheet_name_or_index>
- The spreadsheet to select (open). It may be specified either by its name or ordinary number starting from 1. This parameter is optional and if it is not present the command will operate over the last selected sheet.
evaluate=<true|false>
- Flag controlling how FORMULA cells are handled. If it is "true" the cell value will be the calculated formula result. Otherwise the specified value will be compared to the formula text. The default value is "false".
id=<identifier_or_name>
- The document identifier to search. This parameter doesn't have to be specified if the script opens/creates just one Excel document and no ID was specified in the open/create command.
RETURNS
The open command returns either 0 (SUCCESS), 4 (SHEET_NOT_FOUND) or 5 (CELL_NOT_FOUND). If the cell is located successfully the command populates variables from the Sheet and Cell variable groups.
EXAMPLES
Excel find value="Test data"
if ({_EXIT_CODE} > 0) {
Warning "The \"Test data\" cell was not found."
Exit 1
}
Excel select row={_EXCEL_CELL_ROW}+1 column={_EXCEL_CELL_COLUMN}
- Search the current spreadsheet for a cell containing the text "Test data". If successful select the cell below the found one. If the search fails, record a warning and exit the script with the exit code of 1.
Excel find value="2" evaluate="true"
- Search the current sheet for any numeric cell containing the number of 2 or any text cell containing "2". As the evaluate flag is on, each cell of the FORMULA type will be calculated (evaluated) and compared to the value of 2.
Excel find sheet="Data" pattern="boo.*"
- Select sheet called "Data" in the document and search it for the first value starting with "boo".
Excel find type=FORMULA pattern="SUM\(.*\)"
- Search the current sheet for the first cell of FORMULA type containing the summary formula and retrieve its properties into the Cell variables. As the "evaluate" flag is not specified and defaults to false, the _EXCEL_CELL_VALUE will be populated with the formula text.
Excel find type=FORMULA pattern="SUM\(.*\)"
evaluate="true"
- Same as the previous example. The _EXCEL_CELL_VALUE will, however, contain the result of the formula, i.e. a number equal to the sum of the cells specified in the formula text.
Excel set [row=<row_number>] [column=<column_number_or_id>] [sheet=<sheet_name_or_index>] [id=<identifier>] [type=<type>] [value=<value>] [informat=<javaFormat>] [format=<excelFormat>]
Excel set [ref=<cell_id>] [sheet=<sheet_name_or_index>] [id=<identifier>] [type=<type>] [value=<value>] [informat=<javaFormat>] [format=<excelFormat>] [width=<columnWidth>] [height=<rowHeight>] [fg=<htmlColor>] [bg=<htmlColor>]
* Red colour indicates obligatory parameters
OPTIONS
row=<row_number>
column=<column_number_or_id>
- Row and column ordinary numbers identifying the cell to select. Numbering starts at 1. The column may be in addition referenced by alphabetical ID used in MS Excel ("A"-"ZZ"). For example, "column=C" is equal to "column=3". The "row" and "column" parameters must be always present together. The parameters are mandatory in the first cell operation to specify the "current" cell. The reference is then cached and sequences of commands operating over a single cell do not have to repeat the "column"/"row" or "ref" coordinates.
ref=<cell_id>
- The cell ID compatible with MS Excel, for example, "A1" or "J10". It is provided as an alternative way of cell referencing. The parameter should not been used together with the row/column parameter pair.
sheet=<sheet_name_or_index>
- The spreadsheet to select (open). It may be specified either by its name or ordinary number starting from 1. If the command is being used just to select a sheet (the first command syntax form in the list above), this parameter is mandatory. Other command forms targeting a particular cell do not require it and if it is not present, the command will operate over the last selected sheet.
type=<type>
- Type of cell to set. It must be one of [BLANK, BOOLEAN, ERROR, FORMULA, NUMERIC, STRING]. If the parameter is not specified, the type defaults to STRING.
value=<value>
- Cell value to set. It must be aligned with the cell type. For example, when the cell type is NUMERIC, the value should be a valid number. If the "type" parameter is not specified, the command guesses whether it is a number, a Boolean value or a string and sets the cell type accordingly. To override this behaviour specify the type explicitly.
The value may contain a numeric expression but the type must be explicitly set to NUMERIC to tell the compiler that numeric evaluation is needed. To create and populate a cell of the FORMULA type use the type="FORMULA" parameter and specify the formula with or without the leading equals character as the value. For example, both "SUM(A1:A4)" and "=SUM(A1:A4)" are valid formulas.
informat=<javaFormat>
- Input Java date format to parse the value with (since 6.2.3). This parameter indicates that the value parameter contains a date and/or time and should be parsed appropriately.
MS Excel stores date values to NUMERIC cells as the number of days elapsed since 1 January 1900. To populate a date cell you may calculate the number on your own and specify it in the value parameter. An easier approach is use a combination of value and informat to populate the date from a human readable date string. For example, the parameters of value="21 Jan 1900" and informat="dd MMM yyyy" will populate the cell with the number of 21.
To make MS Excel display the cell as a formatted number or a date/time use the format parameter.
format=<excelFormat>
- The cell number format as specified by the MS Excel documentation (since 6.2.3) It allows to change the appearance of numbers, including dates and times, without changing the actual number. The format does not affect the cell value that Excel uses to perform calculations.
width=<columnWidth>
- The target column width as a number of visible characters (since 6.3.3). The resulting pixel size on the screen is then subject to the font and eventual cell margins. The default Excel applied column width is set to 8.43 characters.
height=<rowHeight>
- The target row height in points (since 6.3.3).
fg=<htmlColor>
- The cell font (foreground) color to be set (since 6.3.3). The value may be an HTML-style 6-character RGB hexadecimal format with or without the leading hash character (white is '000000' of '#000000', black is 'ffffff' of '#ffffff') or a semicolon separated list of decimal RGB components ('0;0;0' for white, '255;255;255' for black).
bg=<htmlColor>
- The cell fill (background) color to be set (since 6.3.3). The value may be an HTML-style 6-character RGB hexadecimal format with or without the leading hash character (white is '000000' of '#000000', black is 'ffffff' of '#ffffff') or a semicolon separated list of decimal RGB components ('0;0;0' for white, '255;255;255' for black). Setting of a background color also resets the cell pattern to a solid color.
id=<identifier_or_name>
- The document identifier to apply the set operation to. This parameter doesn't have to be specified if the script opens/creates just one Excel document and no ID was specified in the open/create command.
RETURNS
The open command returns either 0 (SUCCESS), 4 (SHEET_NOT_FOUND) or 5 (CELL_NOT_FOUND). If the cell is located successfully the command sets the value and/or cell type and populates variables from the Sheet and Cell variable groups.
EXAMPLES
Excel set ref=A5 value="Test data"
- Set value of the A5 cell in the current sheet to "Test data". The cell type will be STRING.
Excel set row=A5 column="A" value="2" sheet="Results"
- Select the "Results" sheet and set value of the A1 cell to 2. As the value is apparently a numeric one the cell type will be automatically set to NUMERIC.
# Declare the variable as numeric to supress compiler error
Var _EXCEL_CELL_VALUE=1
Excel set row=1 column=A value="2"
Excel set value={_EXCEL_CELL_VALUE}+1 type=NUMERIC
- Set the A2 cell value to 2 and then increment it to 3. Declaration of the _EXCEL_CELL_VALUE is needed only to suppress the compiler error when trying to evaluate the value.
Excel set row=5 column=1 type=FORMULA value="SUM(A1:A4)"
Excel select evaluate=true
- Set value of the cell at A5 to the summary formula of the A1-A4 cells and get the result.
Excel copy [rows=<row(s)>] [columns=<column(s)>] [sheet=<sheet_name_or_index>] [id=<identifier>]
* Red colour indicates obligatory parameters
OPTIONS
rows=<row(s)>
columns=<column(s)>
- The row(s) and column(s) identifying the cells to be selected for the subsequent Excel paste operation. Rows can be specified as a single number ("1"), a range ("1-5") or a comma or semicolon separated list of numbers and/or ranges ("1;3;5-6"). Columns support the same syntax where letters may be used instead of numbers ("A;C;E-F"). Numbering starts from 1.
The "rows" and "columns" parameters must be always present together. The command selects all cells at the specified rows and columns. For example, the parameters of "rows=1-2" and "columns=A-C" will select 6 cells at A1, A2, B1, B2, C1 and C2. As empty positions are ignored the specified ranges may exceed the actual sheet size.
The command does not make any internal copy of the data from the selected cells. It merely saves a reference to the selection. If the selected cells get changed between the copy and paste operations the updated data will be copied.
sheet=<sheet_name_or_index>
- The spreadsheet to select the cells from. It may be specified either by its name or ordinary number starting from 1. If it is not specified the command will fall back to the last selected (active) sheet.
id=<identifier_or_name>
- An identifier of the document to copy from. This parameter doesn't have to be specified if the script opens/creates just one Excel document and no ID was specified in the open/create command.
RETURNS
The copy command returns either 0 (SUCCESS) or throws a syntax error when the operation can not be completed for a runtime error.
EXAMPLES
Excel open file="source.xls" id="source"
Excel copy rows=1-3 columns=A id="source"
Excel open file="target.xls" id="target"
Excel paste ref=B1 id="target"
- Copy cells A1, A2 and A3 from the source.xls file into the B1, B2 and B3 cells of the target.xls file.
Excel paste [row=<row_number>] [column=<column_number_or_id>] [sheet=<sheet_name_or_index>] [id=<identifier>]
Excel paste [ref=<cell_id>] [sheet=<sheet_name_or_index>] [id=<identifier>]
* Red colour indicates obligatory parameters
OPTIONS
row=<row_number>
column=<column_number_or_id>
- Target position to paste the cells previously selected with Excel copy to. Numbering starts from 1. For example, if cells A1 and A2 were copied and the target position is set to "row=2" and "column=3" the data will be pasted to the C2 and C3.
If a pasted cell contains a formula its references are shifted with regards to the target location. For example, if you copy a column of numbers starting at A1 and their sum ("=SUM(A1:A5)") and paste it into the B2 cell the formula gets updated to "=SUM(B2:B6)". Note that the command doesn't make any effort to recognize external references to other sheets or cells outside of the selected data which will lead to broken references. To copy such cells use the Excel set command.
ref=<cell_id>
- The target cell ID compatible with MS Excel to paste to, for example, "A1" or "J10". It is provided as an alternative way of cell referencing. The parameter should not be used together with the row/column parameter pair.
sheet=<sheet_name_or_index>
- The spreadsheet to paste to It may be specified either by its name or ordinary number starting from 1. If it is not specified the command will fall back to the last selected (active) sheet.
id=<identifier_or_name>
- An identifier of the document to copy from. This parameter doesn't have to be specified if the script opens/creates just one Excel document and no ID was specified in the open/create command.
RETURNS
The paste command returns either 0 (SUCCESS) or throws a syntax error when the operation can not be completed for a runtime problem.
EXAMPLES
See the Excel copy command for an example.
Excel close [id=<identifier>] [save=<true|false>]
* Red colour indicates obligatory parameters
OPTIONS
id=<identifier_or_name>
- The identifier of the document which is to be closed. This parameter doesn't have to be specified if the script opens/creates just one Excel document and no ID was specified in the open/create command.
save=<true|false>
- True saves the document to the file system, false discards any changes. The default value is "true" and the file will be saved automatically after the script execution finishes provided that there were any changes.
RETURNS
The open command returns either 0 (SUCCESS) or 2 (FAILED_TO_SAVE) on an I/O error. It also clears up all Excel specific variables from the context.
EXAMPLES
Excel open file=test.xls
...
Excel close
- Close the file. If the content has been modified, save the changes to the test.xls file.
Excel open file=test.xls outfile=test2.xls
...
Excel close
- Close the file. The content loaded from test.xls will be written to test2.xls regardless of whether it has been modified or not.
Excel open file=test.xls id="testfile"
...
Excel close id="testfile" save=false
- Close the file and discard any eventual changes. As the "testfile" ID was assigned to the file in "Excel open", it must be specified in the "Excel close" one as well as in any other Excel call between these two commands.