Qore FixedLengthUtil Module Reference
1.4
The FixedLengthUtil module provides functionality for parsing files with fixed length lines. This means that we have at least one line type and each line type is described as several data items with fixed length.
To use this module, use "%requires FixedLengthUtil" in your code.
All the public symbols in the module are defined in the FixedLengthUtil namespace.
Currently the module provides the following classes:
Furthermore, the following specialized classes are implemented based on the above and are provided for convenience and backwards-compatibility:
Valid options are:
"date_format": the default date format for "date" fields (see date formatting for the value in this case)
"encoding": the output encoding for strings parsed or returned
"eol": the end-of-line characters for parsing or generation
"file_flags": additional writer File Open Constants; Qore::O_WRONLY | Qore::O_CREAT are used by default. Use e.g. Qore::O_EXCL to avoid overwriting the target, or Qore::O_TRUNC to replace any existing file
"ignore_empty": if True, then empty lines are ignored
"number_format": the default number format for "float" or "number" fields (see Qore::parse_number() and Qore::parse_float() for the value in these cases)
"timezone": a string giving a time zone region name or an integer offset in seconds east of UTC
"truncate": controls whether to truncate an output field value if it is longer than its specified length; the default is False
"tab2space": controls whether to replace tabs with spaces; its value determines how many spaces to output in place of one tab character

The fixed length specification is a hash where each key is the name of a record and each value is a record description hash describing that record; see the following example:
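A minimal sketch of such a specification hash, consistent with the record and field names discussed in the text. The field lengths, the fields of the "line" record, and the "value" strings "001" and "003" are illustrative assumptions:

```qore
%new-style
%requires FixedLengthUtil

# illustrative layout: "header" and "trailer" are both 14 characters long,
# so a "value" on "flow_type" tells them apart; "line" is 36 characters long
const Specs = {
    "header": {
        "flow_type": {"length": 3, "type": "string", "value": "001"},
        "record_type": {"length": 3, "type": "string"},
        "number_of_records": {"length": 8, "type": "int"},
    },
    "line": {
        "flow_type": {"length": 3, "type": "string"},
        "record_type": {"length": 3, "type": "string"},
        "processing_id": {"length": 10, "type": "int"},
        "processing_name": {"length": 10, "type": "string"},
        "po_number": {"length": 10, "type": "int"},
    },
    "trailer": {
        "flow_type": {"length": 3, "type": "string", "value": "003"},
        "record_type": {"length": 3, "type": "string"},
        "number_of_records": {"length": 8, "type": "int"},
    },
};
```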
In the example above, "header", "line", and "trailer" are record names, and the values of each key are record description hashes.
Each record will have a number of fields described in the record description hash. The record description hash keys represent the names of the fields, and the values are field specification hashes.
In the "header" record in the example above, the fields are "flow_type", "record_type", and "number_of_records", and the values of each of those keys are field specification hashes for the given fields. As the "header" and "trailer" records have equal line lengths, extra configuration is required to resolve the record type; in the example above this is configured using the "value" key of the field specification hashes for the "flow_type" fields.
The field specification hash has the following format:
Key | Type | Description |
length | integer | the size of the field in bytes |
type | string | the type of data bound to the field; see Field Data Types |
format | string | a date mask if the type of the field is "date"; see date formatting for more information |
timezone | string | overrides the global timezone for the current "date" field |
padding | string | the padding side for the field, "left" or "right"; used only in writers; if not given, then the default padding depends on the field's type: "int" fields get left padding (right justification) and all others get right padding (left justification) |
padding_char | string | a string of size 1 to use for padding; the default is " " (space); used only in writers |
value | string | the value to compare to input data when determining the record type; if "value" is defined for a field, then "regex" cannot be defined |
regex | string | the regular expression to apply to input data lines when determining the record type |
default | string | in writers, the default output value used when no value is specified in the record data |
truncate | boolean | controls whether to truncate an output field value if it is longer than the specified length; the default is False |
tab2space | integer | controls whether to replace tabs with spaces; its value determines how many spaces to output in place of one tab character |
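As a sketch of how these keys combine, a hypothetical "payment" record could be described as follows. The record name, field names, lengths, and values are all assumptions made for illustration:

```qore
%new-style
%requires FixedLengthUtil

# hypothetical record description using the keys from the table above
const PaymentSpec = {
    "payment": {
        # compared against input data to identify the record type
        "record_id": {"length": 2, "type": "string", "value": "PY"},
        # writers pad on the left with "0" instead of the default space
        "amount": {"length": 8, "type": "int", "padding": "left", "padding_char": "0"},
        # date mask plus a per-field time zone overriding the global option
        "booked": {"length": 8, "type": "date", "format": "DDMMYYYY", "timezone": "Europe/Prague"},
        # writers output "EUR" when the record data has no "currency" value
        "currency": {"length": 3, "type": "string", "default": "EUR"},
        # writers truncate values longer than 10 characters to the field length
        "note": {"length": 10, "type": "string", "truncate": True},
    },
};
```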
The following values can be used as a field type:
"date"
"float"
"int"
"number"
"string"
If no record type resolution rules or logic is defined, then record types are resolved automatically based on their unique line lengths. If the record line lengths are not unique (i.e. two or more records have the same number of characters), then a rule must exist to resolve the record type.
Typically the value of the first field determines the record type; however, any field in the record can be used to determine the record type, and multiple fields can be used as well. Record type detection configuration is supplied by the "value" (field value equality test) or "regex" (regular expression test) keys in the field specification hash for the record in question. If multiple fields in a record definition have "value" or "regex" keys, then all of those fields must match the input data in order for the input line to match the record.
The above record type resolution logic is executed in FixedLengthAbstractIterator::identifyTypeImpl(), which executes any "regex" or "value" tests on the input line in the order of the field definitions in the record description hash.
Record type resolution is performed as follows:
"value": matches the full value of the field; if an integer "value" is used, then integer comparisons are done, otherwise string comparisons are performed
"regex": matches the input line string starting at the first character in the field to the rest of the line (i.e. not truncated for the current record); this enables regular expression matching against multiple columns if needed

When there are no record-matching keys in the field hashes for any record and the input record character lengths are not unique, then FixedLengthAbstractIterator::identifyTypeImpl() must be overridden in a subclass to provide custom record matching logic.

Note:
a field specification hash cannot contain both "regex" and "value" keys
if multiple fields in a record are configured for input line matching (i.e. with "regex" and "value" keys), then all fields with this configuration must match for the record to be matched

Input and output data are formatted in a hash with two mandatory keys:
"type": a string with the name of the record type
"record": a hash with the line data as a map of field names to values

Example of reading:
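A minimal reading sketch, assuming the module's FixedLengthDataIterator class (not named in the text above) with a constructor taking the input string, the specification hash, and an optional options hash. The record names, field names, and sample data are hypothetical:

```qore
%new-style
%requires FixedLengthUtil

# the two record lengths differ (7 and 12 characters), so record types are
# resolved by line length and no "value" or "regex" keys are needed
const Specs = {
    "type1": {
        "col1": {"length": 5, "type": "int"},
        "col2": {"length": 2, "type": "string"},
    },
    "type2": {
        "col3": {"length": 1, "type": "string"},
        "col4": {"length": 3, "type": "string"},
        "col5": {"length": 8, "type": "date", "format": "DDMMYYYY"},
    },
};

# global options as described in the options list
const Opts = {
    "encoding": "UTF-8",
    "eol": "\n",
    "ignore_empty": True,
    "timezone": "Europe/Prague",
};

string data = "11111bb\ncddd31122014\n";

FixedLengthDataIterator i(data, Specs, Opts);
while (i.next()) {
    # each value is a hash with "type" and "record" keys
    printf("%y\n", i.getValue());
}
```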
Example of writing:
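A minimal writing sketch, assuming the module's FixedLengthDataWriter class (not named in the text above), whose write() method is assumed to accept a list of record hashes and return the generated string. The record names, field names, and values are hypothetical:

```qore
%new-style
%requires FixedLengthUtil

const Specs = {
    "type1": {
        "col1": {"length": 5, "type": "int"},
        "col2": {"length": 2, "type": "string"},
    },
    "type2": {
        "col3": {"length": 1, "type": "string"},
        "col4": {"length": 3, "type": "string"},
        "col5": {"length": 8, "type": "date", "format": "DDMMYYYY"},
    },
};

# each element has the two mandatory keys: "type" and "record"
list<auto> records = (
    {"type": "type1", "record": {"col1": 11111, "col2": "bb"}},
    {"type": "type2", "record": {"col3": "c", "col4": "ddd", "col5": 2014-12-31}}
);

FixedLengthDataWriter w(Specs);
# write() is assumed to return all records as one fixed-length string
string output = w.write(records);
print(output);
```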
FileLocationHandler
module (issue 4456)