Qore CsvUtil Module Reference
1.7
The AbstractCsvIterator class is an abstract base class that allows abstract CSV data to be iterated.
Public Member Methods
Method | Description |
constructor (AbstractLineIterator li, *hash opts) | Creates the AbstractCsvIterator with an option hash in single-type mode. |
constructor (AbstractLineIterator li, hash spec, hash opts) | Creates the AbstractCsvIterator with an option hash in multi-type mode. |
*list<string> getHeaders () | Returns the current record headers or NOTHING if no headers have been detected or saved yet. |
*list<string> getHeaders (string type) | Returns a list of headers for the given record or NOTHING if the record is not recognized. |
string getQuote () | Returns the current quote string. |
string getRawLine () | Returns the current line as is, i.e. the original string. |
list<*string> getRawLineValues () | Returns the list of raw string values of the current line. |
hash<auto> getRecord (bool extended) | Returns the current record as a hash. |
hash<auto> getRecord () | Returns the current record as a hash. |
auto getRecordList () | Returns the current record as a list. |
*hash<string, AbstractDataField> getRecordType () | Returns the description of the record type, if any. |
string getSeparator () | Returns the current separator string. |
hash<auto> getValue () | Returns the current record as a hash. |
string identifyType (list<auto> rec) | Identifies a fixed-length line type using identifyTypeImpl(); may be overridden if necessary. |
int index () | Returns the row index being iterated, which does not necessarily correspond to the line number when there are header rows and blank lines are skipped. |
int lineNumber () | Returns the current iterator line number in the file (the first line is line 1) or 0 if not pointing at a valid element. |
auto memberGate (string name) | Returns the given column value for the current row. |
bool next () | Moves the current line / record position to the next line / record; returns False if there are no more lines to iterate. |
peek () | Reads a single row without moving the index position. |
Private Member Methods
Method | Description |
list<*string> getLineAndSplit () | Reads a line and splits it by separator/quote into a list. |
*string identifyTypeImpl (list<auto> rec) | Identifies an input record, given the raw line string. This method performs a lookup in a precalculated table based on number of records (see constructor()). If different criteria are needed, e.g. when two line types in a spec have the same record number and no unique resolving rule is specified, this method needs to be overridden; otherwise it will throw an exception because the precalculated mapping will be empty. |
hash<auto> parseLine () | Parses a line in the file and returns a processed list of the fields. |
prepareFieldsFromHeaders (*list headers) | Matches headers provided in the CSV header or in options; never called in multi-type mode because "header_names" is False. |
processCommonOptions (*hash opts, int C_OPTx) | Processes common options and assigns internal fields. |
processSpec (hash spec) | Processes the specification and assigns internal data for resolving. |
Private Attributes
Attribute | Description |
const Options | valid options for the object (a hash for quick lookups of valid keys) |
The AbstractCsvIterator class constructor takes an optional hash with possible keys given in the following table. Note that key names are case-sensitive, and data types are soft (conversions are made when possible).
AbstractCsvIterator Options
Option | Data Type | Description |
"date_format" | string | the default date format for "date" fields (see date formatting for the value in this case) |
"encoding" | string | the character encoding for the file (and for tagging string data read); if the value of this key is not a string then it will be ignored |
"eol" | string | the end of line character(s) (default: auto-detect); if the value of this key is not a string then it will be ignored |
"header_lines" | int | the number of header lines in the file (must be > 0 if "header_names" is True) |
"header_names" | bool | if True, the object parses the header names from the first header row; in this case "header_lines" must be > 0. For multi-type lines, "header_names" must be False. |
"header_reorder" | bool | if True (the default), then when "headers" are provided by options or read from the file, data fields are reordered to follow the headers. This has a major effect on the return value of AbstractCsvIterator::getRecordList() and a minor effect on the hash result of AbstractCsvIterator::getRecord() when a program depends on the order of keys. If this value is False, then fields not yet specified are appended to the end of the field definition. |
"ignore_empty" | bool | if True (the default) then empty lines will be ignored; this option is processed with parse_boolean() |
"ignore_whitespace" | bool | if True (the default) then leading and trailing whitespace will be stripped from non-quoted fields; this option is processed with parse_boolean() |
"number_format" | string | the default format for "int" , "float" , and "number" fields as a string giving the thousands separator character followed by the decimal separator character (ex: ".," for continental-European-style numbers) |
"quote" | string | the field quote character (default: '"' ) |
"separator" | string | the string separating the fields in the file (default: "," ) |
"timezone" | string | the timezone to use when parsing dates (will be passed to Qore::TimeZone::constructor()) |
"tolwr" | bool | if True then all header names will be converted to lower case letters |
"verify_columns" | bool | if True (the default is False) then if a line is parsed with a different column or field count than other lines, a CSVFILEITERATOR-DATA-ERROR exception is thrown |
AbstractCsvIterator Single-type-only Options
Option | Data Type | Description |
"headers" | list of strings | list of header / column names for the data iterated; if this is present, then "header_names" must be False. |
"fields" | hash | the keys are field names as given by the "header_names" or "headers" option (if neither of these options is used, then field names are numbers starting with "0") and the values are either strings (one of the Option Field Types giving the data type for the field) or an Option Field Hash describing the field; also sets headers if not set automatically with "header_names"; if no field type is given, the default is "*string"; note that invalid field names given in this option are ignored |
AbstractCsvIterator Multi-type-only Options
Option | Data Type | Description |
"extended_record" | bool | if True, then get functions return an extended hash with "type" and "record" members to provide the record type to the caller (default: False) |
Note: the following option names with dashes are supported for backwards compatibility: "date-format", "ignore-empty", "ignore-whitespace", "header-names", "header-lines", "verify-columns".
Fields are defined in the order in which the user program expects the data, and the get functions return data in that order. There are two exceptions: first, when the "headers" option is given, the field definitions are sorted so that the data correspond to the "headers" field order; second, when header names are read from the CSV file header.
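The options above can be combined as in the following minimal sketch for a headerless, semicolon-separated input. It assumes CsvDataIterator, a concrete subclass from the same module that iterates string data and accepts the same option hash; the data and column names are hypothetical.

```qore
%new-style
%requires CsvUtil

# hypothetical semicolon-separated data with no header row
string data = "widget;3;1.234,50\ngadget;;20,00\n";

hash<auto> opts = {
    "separator": ";",
    # thousands separator "." followed by decimal separator ","
    "number_format": ".,",
    # column names supplied by the caller; "header_names" is left unset
    "headers": ("name", "qty", "price"),
    "fields": {
        "qty": "*int",
        "price": "number",
    },
};

# assumption: CsvDataIterator (string data, *hash opts) is available in CsvUtil
CsvDataIterator i(data, opts);
while (i.next()) {
    printf("row %d (line %d): %y\n", i.index(), i.lineNumber(), i.getValue());
}
```

Because "headers" is given here, "header_names" must stay False, as the single-type options table requires.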
AbstractCsvIterator Option Field Types
Name | Description |
"int" | the value will be unconditionally converted to an integer using the Qore::int() function |
"*int" | the value will be converted to NOTHING if empty, otherwise it will be converted to an integer using the Qore::int() function |
"float" | the value will be unconditionally converted to a floating-point value using the Qore::float() function |
"*float" | the value will be converted to NOTHING if empty, otherwise it will be converted to a floating-point value using the Qore::float() function |
"number" | the value will be unconditionally converted to an arbitrary-precision number value using the Qore::number() function |
"*number" | the value will be converted to NOTHING if empty, otherwise it will be converted to an arbitrary-precision number value using the Qore::number() function |
"string" | (the default) the value remains a string; no transformation is done on the input data |
"*string" | the value will be converted to NOTHING if empty, otherwise, it remains a string |
"date" | in this case dates are parsed directly with the Qore::date() function (and therefore are tagged automatically with the current time zone); to specify another date format, use the hash format documented below |
"*date" | the value will be converted to NOTHING if empty, otherwise dates are parsed directly with the Qore::date() function (and therefore are tagged automatically with the current time zone); to specify another date format, use the hash format documented below |
See here for an example of using the hash field description in the constructor().
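As a rough illustration of mixing plain type strings with the hash form of a field description, the sketch below types three columns of a small in-memory file. CsvDataIterator, the data, and the field names are assumptions made for this example only; the "format" and "code" keys are the ones described in the Option Field Hash table.

```qore
%new-style
%requires CsvUtil

# hypothetical data with one header row
string data = "id,created,code\n1,2024-01-31,ab\n2,2024-02-29,cd\n";

hash<auto> opts = {
    "header_names": True,
    "header_lines": 1,
    "fields": {
        # plain string form: only the type is given
        "id": "int",
        # hash form: type plus a date/time format mask
        "created": {"type": "date", "format": "YYYY-MM-DD"},
        # hash form: the closure receives the value and returns the output value
        "code": {"type": "string", "code": string sub (string v) { return v.upr(); }},
    },
};

# assumption: CsvDataIterator (string data, *hash opts) is available in CsvUtil
CsvDataIterator i(data, opts);
while (i.next()) {
    printf("%y\n", i.getValue());
}
```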
AbstractCsvIterator Option Field Hash and Spec Hash
The field specification is provided via the "fields" option for the old-style constructor or as a separate parameter in the new-style constructor supporting multi-type mode.
Key | Value Description |
"type" | one of the option type values giving the field type |
"format" | when used with "date", this is a date/time format mask for parsing dates; when used with the "int", "float", or "number" types, this is a number format as in format_number() |
"timezone" | used only with the "date" type; this value is passed to Qore::TimeZone::constructor() and the resulting timezone is used to parse the date (this value overrides any default time zone for the object; use only in the rare case that date/time values from different time zones are present in different columns of the same file) |
"code" | this is a closure or call reference that takes a single argument of the value (after formatting with any optional "type" formats) and returns the value that will be output for the field |
Extra AbstractCsvIterator Spec Hash Options
Key | Data Type | Value Description |
"value" | string | the value to compare to input data when determining the record type; if "value" is defined for a field, then "regex" cannot be defined (for iterators only) |
"regex" | string | the regular expression to apply to input data lines when determining the record type (for iterators only) |
"header" | string | the field name as defined in the CSV header line; enables remapping from the CSV name to a custom name |
"index" | int | the index of the field in the CSV file; enables mapping when the CSV file has no header |
"default" | any | the default output value (for writers only) |
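The sketch below shows how a multi-type spec using the "value" key might look. It is written under several assumptions: that the spec hash is keyed by record type name with a field hash per type, that CsvDataIterator offers a (data, spec, opts) constructor mirroring the one documented here, and that the record names, field names, and data are purely illustrative.

```qore
%new-style
%requires CsvUtil

# hypothetical data: "H" lines are invoice headers, "I" lines are items
string data = "H,INV-1,20240131\nI,A100,3\n";

# assumed layout: record type name -> field name -> field description
hash<auto> spec = {
    "header": {
        "rectype": {"type": "string", "value": "H"},
        "invoice_no": "string",
        "date": {"type": "date", "format": "YYYYMMDD"},
    },
    "item": {
        "rectype": {"type": "string", "value": "I"},
        "item_no": "string",
        "qty": "int",
    },
};

# "extended_record" asks for hashes with "type" and "record" keys
hash<auto> opts = {"extended_record": True};

# assumption: CsvDataIterator (string data, hash spec, hash opts) exists
CsvDataIterator i(data, spec, opts);
while (i.next()) {
    printf("%y\n", i.getValue());
}
```

Both record types here have three fields, so a field count alone could not tell them apart; the "value" rule on the first field is what distinguishes them.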
CsvUtil::AbstractCsvIterator::constructor (AbstractLineIterator li, *hash opts)
creates the AbstractCsvIterator with an option hash in single-type mode
li | source line iterator |
opts | an optional hash of options; see AbstractCsvIterator Constructor Option Hash Overview for more information |
ABSTRACTCSVITERATOR-ERROR | invalid or unknown option; invalid data type for option; "header_names" is True and "header_lines" is 0 or "headers" is also present; unknown field type |
CsvUtil::AbstractCsvIterator::constructor (AbstractLineIterator li, hash spec, hash opts)
creates the AbstractCsvIterator with an option hash in multi-type mode
li | source line iterator |
spec | a hash of field and type definitions; see Option Field Hash for more information |
opts | a hash of options; see AbstractCsvIterator Constructor Option Hash Overview for more information |
*list<string> CsvUtil::AbstractCsvIterator::getHeaders ()
Returns the current record headers or NOTHING if no headers have been detected or saved yet.
*list<string> CsvUtil::AbstractCsvIterator::getHeaders (string type)
Returns a list of headers for the given record or NOTHING if the record is not recognized.
string CsvUtil::AbstractCsvIterator::getQuote ()
Returns the current quote string.
string CsvUtil::AbstractCsvIterator::getRawLine ()
Returns the current line as is, i.e. the original string.
list<*string> CsvUtil::AbstractCsvIterator::getRawLineValues ()
Returns the list of raw string values of the current line.
hash<auto> CsvUtil::AbstractCsvIterator::getRecord (bool extended)
Returns the current record as a hash.
extended | specifies whether the result is an extended hash including "type" and "record" |
If extended is True, the return value is a hash with the following keys:
"type" | the record type |
"record" | a hash of the current record |
hash<auto> CsvUtil::AbstractCsvIterator::getRecord ()
Returns the current record as a hash.
If the "extended_record" option is set, then the return value is a hash with the following keys:
"type" | the record type |
"record" | a hash of the current record |
auto CsvUtil::AbstractCsvIterator::getRecordList ()
Returns the current record as a list.
If the "extended_record" option is set, then the return value is a hash with the following keys:
"type" | the record type |
"record" | a list of field values for the current record |
INVALID-ITERATOR | this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method |
string CsvUtil::AbstractCsvIterator::getSeparator ()
Returns the current separator string.
virtual hash<auto> CsvUtil::AbstractCsvIterator::getValue ()
Returns the current record as a hash.
If the "extended_record" option is set, then the return value is a hash with the following keys:
"type" | the record type |
"record" | a hash of the current record |
INVALID-ITERATOR | this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method |
Implements Qore::AbstractIterator.
string CsvUtil::AbstractCsvIterator::identifyType (list<auto> rec)
Identifies a fixed-length line type using identifyTypeImpl(); may be overridden if necessary.
rec | input line record to be identified |
ABSTRACTCSVITERATOR-ERROR | input line cannot be matched to a known record |
*string CsvUtil::AbstractCsvIterator::identifyTypeImpl (list<auto> rec)
Identifies an input record, given the raw line string. This method performs a lookup in a precalculated table based on number of records (see constructor()). If different criteria are needed, e.g. when two line types in a spec have the same record number and no unique resolving rule is specified, this method needs to be overridden; otherwise it will throw an exception because the precalculated mapping will be empty.
rec | input line record to be identified |
ABSTRACTCSVITERATOR-ERROR | input line cannot be matched to a known record |
int CsvUtil::AbstractCsvIterator::index ()
Returns the row index being iterated, which does not necessarily correspond to the line number when there are header rows and blank lines are skipped.
int CsvUtil::AbstractCsvIterator::lineNumber ()
Returns the current iterator line number in the file (the first line is line 1) or 0 if not pointing at a valid element.
auto CsvUtil::AbstractCsvIterator::memberGate (string name)
Returns the given column value for the current row.
name | the name of the field (header name) in the record |
INVALID-ITERATOR | this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method |
ABSTRACTCSVITERATOR-FIELD-ERROR | invalid or unknown field name given |
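A short sketch of reading single columns and handling an unknown field name follows. It calls memberGate() explicitly and assumes CsvDataIterator as the concrete iterator; the data and the column name "colour" are hypothetical.

```qore
%new-style
%requires CsvUtil

# hypothetical data with a header row
string data = "name,qty\nwidget,3\n";

# assumption: CsvDataIterator (string data, *hash opts) is available in CsvUtil
CsvDataIterator i(data, {"header_names": True, "header_lines": 1});

while (i.next()) {
    # safe here because next() has just returned True
    printf("name=%y qty=%y\n", i.memberGate("name"), i.memberGate("qty"));
    try {
        # "colour" is not a column in the data above
        i.memberGate("colour");
    } catch (hash<ExceptionInfo> ex) {
        # expected to report ABSTRACTCSVITERATOR-FIELD-ERROR per the table above
        printf("%s: %s\n", ex.err, ex.desc);
    }
}
```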
virtual bool CsvUtil::AbstractCsvIterator::next ()
Moves the current line / record position to the next line / record; returns False if there are no more lines to iterate.
If the file being iterated has data that can be iterated, this method returns True again on the call after it has returned False once; otherwise it always returns False. The iterator object should not be used to retrieve a value after this method returns False.
Implements Qore::AbstractIterator.
CsvUtil::AbstractCsvIterator::peek ()
Reads a single row without moving the index position.
This method can be used, for example, to read headers without reading any data.