Qore CsvUtil Module Reference  1.10
CsvUtil::AbstractCsvIterator Class Reference

The AbstractCsvIterator class is an abstract base class that allows abstract CSV data to be iterated.

Public Member Methods

 constructor (AbstractLineIterator li, *hash< auto > opts)
 creates the AbstractCsvIterator with an option hash in single-type mode
 
 constructor (AbstractLineIterator li, hash< auto > spec, hash< auto > opts)
 creates the AbstractCsvIterator with an option hash in multi-type mode
 
*list< string > getHeaders ()
 Returns the current record headers or NOTHING if no headers have been detected or saved yet.
 
*list< string > getHeaders (string type)
 Returns a list of headers for the given record or NOTHING if the record is not recognized.
 
string getQuote ()
 Returns the current quote string.
 
string getRawLine ()
 Returns the current line 'as it is', i.e. the original string.
 
list< *string > getRawLineValues ()
 Returns the list of raw string values of the current line.
 
hash< auto > getRecord ()
 Returns the current record as a hash.
 
hash< auto > getRecord (bool extended)
 Returns the current record as a hash.
 
auto getRecordList ()
 Returns the current record as a list.
 
*hash< string, AbstractDataField > getRecordType ()
 Returns the description of the record type, if any.
 
string getSeparator ()
 Returns the current separator string.
 
hash< auto > getValue ()
 Returns the current record as a hash.
 
string identifyType (list< auto > rec)
 Identifies the record type of an input line using identifyTypeImpl(); may be overridden if necessary.
 
int index ()
 Returns the row index being iterated, which does not necessarily correspond to the line number when there are header rows and blank lines are skipped.
 
int lineNumber ()
 Returns the current iterator line number in the file (the first line is line 1) or 0 if not pointing at a valid element.
 
auto memberGate (string name)
 Returns the given column value for the current row.
 
bool next ()
 Moves the current line / record position to the next line / record; returns False if there are no more lines to iterate.
 
 peek ()
 Reads a single row without moving the index position.
 

Private Member Methods

list< *string > getLineAndSplit ()
 Reads a line and splits it by separator/quote into a list.
 
*string identifyTypeImpl (list< auto > rec)
 Identifies an input record, given the raw line string. This method performs a lookup in a precalculated table based on the number of records (see constructor()). If different criteria are needed, e.g. when two line types in a spec have the same record number and no unique resolving rule is specified, this method needs to be overridden; otherwise it will throw an exception because the precalculated mapping will be empty.
 
hash< auto > parseLine ()
 Parses a line in the file and returns a processed list of the fields.
 
 prepareFieldsFromHeaders (*list< auto > headers)
 matches headers provided in the CSV header or in options; never called in multi-type mode because "header_names" is False
 
 processCommonOptions (*hash< auto > opts, int C_OPTx)
 processes common options and assigns internal fields
 
 processSpec (hash< auto > spec)
 processes the specification and assigns internal data for resolving
 

Private Attributes

*string eol
 the eol marker, if any
 
const Options
 valid options for the object (a hash for quick lookups of valid keys)
 

Detailed Description

The AbstractCsvIterator class is an abstract base class that allows abstract CSV data to be iterated.

AbstractCsvIterator Constructor Option Hash Overview

The AbstractCsvIterator class constructor takes an optional hash with possible keys given in the following table. Note that key names are case-sensitive, and data types are soft (conversions are made when possible).

AbstractCsvIterator Options

Option Data Type Description
"date_format" string the default date format for "date" fields (see date formatting for the value in this case)
"encoding" string the character encoding for the file (and for tagging string data read); if the value of this key is not a string then it will be ignored
"eol" string the end of line character(s) (default: auto-detect); if the value of this key is not a string then it will be ignored
"header_lines" int the number of header lines in the file (if not present and "header_names" is True, assumed to be 1)
"header_names" bool if True then the object will parse the header names from the first header row; in this case, if "header_lines" is not set explicitly, it is assumed to be 1. In case of multi-type lines "header_names" must be False.

"header_reorder" bool if True (default value) then if "headers" are provided by options or read from file then data fields are reordered to follow headers. It has a major effect on the return value of AbstractCsvIterator::getRecordList() and also a minor effect on the hash result of AbstractCsvIterator::getRecord() when a program depends on the order of keys. If this value is False then fields not yet specified are pushed at the end of the field definition.
"ignore_empty" bool if True (the default) then empty lines will be ignored; this option is processed with parse_boolean()
"ignore_whitespace" bool if True (the default) then leading and trailing whitespace will be stripped from non-quoted fields; this option is processed with parse_boolean()
"number_format" string the default format for "int", "float", and "number" fields as a string giving the thousands separator character followed by the decimal separator character (ex: ".," for continental-European-style numbers)
"quote" string the field quote character (default: '"')
"separator" string the string separating the fields in the file (default: ",")
"timezone" string the timezone to use when parsing dates (will be passed to Qore::TimeZone::constructor())
"tolwr" bool if True then all header names will be converted to lower case letters
"verify_columns" bool if True (the default is False) then if a line is parsed with a different column or field count than other lines, a CSVFILEITERATOR-DATA-ERROR exception is thrown

AbstractCsvIterator Single-type-only Options

Option Data Type Description
"headers" list of strings list of header / column names for the data iterated; if this is present, then "header_names" must be False.
"fields" hash the keys are field names as given by the "header_names" or "headers" option (if neither of these options is used, then field names are numbers starting with "0") and the values are either strings (one of Option Field Types giving the data type for the field) or an Option Field Hash describing the field; also sets headers if not set automatically with "header_names"; if no field type is given, the default is "*string"; note that invalid field names given in this option are ignored

AbstractCsvIterator Multi-type-only Options

Option Data Type Description
"extended_record" bool if True then the get methods return an extended hash with "type" and "record" members so that the caller can determine the record type (default: False)
Note
the following option names, written with dashes instead of underscores, are still supported for backwards compatibility:
  • "date-format"
  • "ignore-empty"
  • "ignore-whitespace"
  • "header-names"
  • "header-lines"
  • "verify-columns"
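
As a rough illustration of the option hash described above, the following sketch iterates a small in-memory CSV string in single-type mode. It assumes the concrete subclass CsvDataIterator from the same module, which is not described in this section; the sample data and variable names are made up for the example.

```qore
#!/usr/bin/env qore
%new-style
%requires CsvUtil

# hypothetical sample data: one header row, ";" as separator, one blank line
const Data = "Name;City\nAlice;Prague\n\nBob;Vienna\n";

# option keys as listed in the option tables above
hash<auto> opts = {
    "header_names": True,   # read column names from the first row
    "separator": ";",
    "tolwr": True,          # convert header names to lower case
    "ignore_empty": True,   # the default; shown here only for clarity
};

# CsvDataIterator is assumed here as the concrete subclass for string input
CsvDataIterator i(Data, opts);
while (i.next()) {
    # index() is the row index; lineNumber() is the line number in the data
    printf("row %d (line %d): %y\n", i.index(), i.lineNumber(), i.getValue());
}
```

Printing both index() and lineNumber() should make the difference between the two visible when header rows and blank lines are skipped.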

Option Field Types

Fields are defined in the order in which the user program expects the data, and the get methods return data in this order. There are two exceptions: when the "headers" option is given, the field definitions are sorted so that the data corresponds to the order of "headers"; the same applies when header names are read from the CSV file header.

AbstractCsvIterator Option Field Types

Name Description
"bool" the value will be unconditionally converted to a bool using the Qore::parse_boolean() function
"*bool" the value will be converted to NOTHING if empty, otherwise it will be converted to a bool using the Qore::parse_boolean() function
"int" the value will be unconditionally converted to an integer using the Qore::int() function
"*int" the value will be converted to NOTHING if empty, otherwise it will be converted to an integer using the Qore::int() function
"float" the value will be unconditionally converted to a floating-point value using the Qore::float() function
"*float" the value will be converted to NOTHING if empty, otherwise it will be converted to a floating-point value using the Qore::float() function
"number" the value will be unconditionally converted to an arbitrary-precision number value using the Qore::number() function
"*number" the value will be converted to NOTHING if empty, otherwise it will be converted to an arbitrary-precision number value using the Qore::number() function
"string" (the default) the value remains a string; no transformation is done on the input data
"*string" the value will be converted to NOTHING if empty, otherwise, it remains a string
"date" in this case dates are parsed directly with the Qore::date() function (and therefore are tagged automatically with the current time zone); to specify another date format, use the hash format documented below
"*date" the value will be converted to NOTHING if empty, otherwise dates are parsed directly with the Qore::date() function (and therefore are tagged automatically with the current time zone); to specify another date format, use the hash format documented below

Option Field Hash

See here for an example of using the hash field description in the constructor().

AbstractCsvIterator Option Field Hash and Spec Hash

The field specification is provided via the "fields" option for the old-style constructor or as a separate parameter in the new-style constructor supporting multi-type.

Key Value Description
"type" one of the option type values giving the field type
"format" when used with "date", this is a date/time format mask for parsing dates; when used with "int", "float", or "number" types, this is a number format as in format_number()
"timezone" used only with the "date" type; this value is passed to Qore::TimeZone::constructor() and the resulting timezone is used to parse the date (this value overrides any default time zone for the object; use only in the rare case that date/time values from different time zones are present in different columns of the same file)
"code" this is a closure or call reference that takes a single argument of the value (after formatting with any optional "type" formats) and returns the value that will be output for the field
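
The field hash keys above might be combined as in the following sketch, which mixes the plain string form with the hash form in a single "fields" option. CsvDataIterator is assumed as the concrete subclass (it is not described in this section), and the column names and data are hypothetical.

```qore
#!/usr/bin/env qore
%new-style
%requires CsvUtil

# hypothetical sample data with a header row
const Data = "id,created,name\n1,2024-01-15,alice\n2,2024-02-01,bob\n";

hash<auto> opts = {
    "header_names": True,
    "fields": {
        # string form: just the type name
        "id": "int",
        # hash form: type plus a date/time format mask used for parsing
        "created": {"type": "date", "format": "YYYY-MM-DD"},
        # hash form with "code": a closure applied to the converted value
        "name": {"type": "string", "code": string sub (string v) { return v.upr(); }},
    },
};

# CsvDataIterator is assumed here as the concrete subclass for string input
CsvDataIterator i(Data, opts);
while (i.next()) {
    printf("%y\n", i.getValue());
}
```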

Extra AbstractCsvIterator Spec Hash Options

Key Data Type Value Description
value string the value to use to compare to input data when determining the record type; if "value" is defined for a field, then "regex" cannot be defined (for iterator only)
regex string the regular expression to use to apply to input data lines when determining the record type (for iterator only)
header string the field name as defined in the CSV header line; enables remapping from the CSV name to a custom name
index int the index of the field in the CSV file; enables mapping when the CSV has no header
default any the default output value (for writers only)
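
A minimal sketch of a multi-type spec follows, using the "value" key to tell record types apart. The record names ("header", "item"), the field names, and the data are invented for illustration, and CsvDataIterator is assumed as the concrete subclass accepting the same spec and option arguments as the multi-type constructor.

```qore
#!/usr/bin/env qore
%new-style
%requires CsvUtil

# hypothetical spec: the first field of each line identifies the record type
const Spec = {
    "header": {
        "kind": {"type": "string", "value": "H"},
        "invoice_no": "string",
        "date": {"type": "date", "format": "YYYYMMDD"},
    },
    "item": {
        "kind": {"type": "string", "value": "I"},
        "item_no": "string",
        "item": "string",
        "pcs": "int",
        "price": "number",
    },
};

# hypothetical sample data: one header line followed by one item line
const Data = "H,INV-001,20240115\nI,A1,widget,3,9.99\n";

# "header_names" must be False in multi-type mode
CsvDataIterator i(Data, Spec, {"header_names": False, "extended_record": True});
while (i.next()) {
    # with "extended_record" set, each value has "type" and "record" keys
    hash<auto> rec = i.getValue();
    printf("%s: %y\n", rec."type", rec."record");
}
```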

Member Function Documentation

◆ constructor() [1/2]

CsvUtil::AbstractCsvIterator::constructor ( AbstractLineIterator  li,
*hash< auto >  opts 
)

creates the AbstractCsvIterator with an option hash in single-type mode

Parameters
li: source line iterator
opts: a hash of options; see AbstractCsvIterator Constructor Option Hash Overview for more information
Exceptions
ABSTRACTCSVITERATOR-ERROR: invalid or unknown option; invalid data type for option; "header_names" is True and "headers" is also present; unknown field type

◆ constructor() [2/2]

CsvUtil::AbstractCsvIterator::constructor ( AbstractLineIterator  li,
hash< auto >  spec,
hash< auto >  opts 
)

creates the AbstractCsvIterator with an option hash in multi-type mode

Parameters
li: source line iterator
spec: a hash of field and type definitions; see Option Field Hash for more information
opts: a hash of options; see AbstractCsvIterator Constructor Option Hash Overview for more information

◆ getHeaders() [1/2]

*list<string> CsvUtil::AbstractCsvIterator::getHeaders ( )

Returns the current record headers or NOTHING if no headers have been detected or saved yet.

Example:
*list l = i.getHeaders();
Note
if headers are not saved against the object in the constructor(), then they are written to the object after the first call to next()

◆ getHeaders() [2/2]

*list<string> CsvUtil::AbstractCsvIterator::getHeaders ( string  type)

Returns a list of headers for the given record or NOTHING if the record is not recognized.

Example:
*list l = i.getHeaders(my_type);

◆ getQuote()

string CsvUtil::AbstractCsvIterator::getQuote ( )

Returns the current quote string.

Example:
string quote = i.getQuote();
Returns
the current quote string

◆ getRawLine()

string CsvUtil::AbstractCsvIterator::getRawLine ( )

Returns the current line 'as it is', i.e. the original string.

Example:
string s = i.getRawLine();
Returns
the current raw line, i.e. the original string before parsing
Since
CsvUtil 1.6.3

◆ getRawLineValues()

list<*string> CsvUtil::AbstractCsvIterator::getRawLineValues ( )

Returns the list of raw string values of the current line.

Example:
list<*string> l = i.getRawLineValues();
Returns
the list of raw string values of the current line. Parsing is done only to split the fields, not to interpret their contents according to their types.
Since
CsvUtil 1.6.3

◆ getRecord() [1/2]

hash<auto> CsvUtil::AbstractCsvIterator::getRecord ( )

Returns the current record as a hash.

Example:
hash h = i.getRecord();
Returns
the current record as a hash; when the "extended_record" option is set, then the return value is a hash with the following keys:
  • "type": the record type
  • "record": a hash of the current record
Exceptions
INVALID-ITERATOR: this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method

◆ getRecord() [2/2]

hash<auto> CsvUtil::AbstractCsvIterator::getRecord ( bool  extended)

Returns the current record as a hash.

Example:
hash h = i.getRecord(True);
Parameters
extended: specifies whether the result is an extended hash including "type" and "record"
Returns
the current record as a hash; if extended is True, then the return value is a hash with the following keys:
  • "type": the record type
  • "record": a hash of the current record
Exceptions
INVALID-ITERATOR: this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method

◆ getRecordList()

auto CsvUtil::AbstractCsvIterator::getRecordList ( )

Returns the current record as a list.

Example:
list l = i.getRecordList();

When the "extended_record" option is set, the result is an extended hash including "type" and "record".

Returns
the current record as a list of field values; when the "extended_record" option is set, then the return value is a hash with the following keys:
  • "type": the record type
  • "record": a list of field values for the current record
Exceptions
INVALID-ITERATOR: this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method

◆ getSeparator()

string CsvUtil::AbstractCsvIterator::getSeparator ( )

Returns the current separator string.

Example:
string sep = i.getSeparator();
Returns
the current separator string

◆ getValue()

hash<auto> CsvUtil::AbstractCsvIterator::getValue ( )
virtual

Returns the current record as a hash.

Example:
hash h = i.getValue();
Returns
the current record as a hash; when the "extended_record" option is set, then the return value is a hash with the following keys:
  • "type": the record type
  • "record": a hash of the current record
Exceptions
INVALID-ITERATOR: this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method

Implements Qore::AbstractIterator.

◆ identifyType()

string CsvUtil::AbstractCsvIterator::identifyType ( list< auto >  rec)

Identifies the record type of an input line using identifyTypeImpl(); may be overridden if necessary.

Parameters
rec: the input line record to be identified
Returns
the name of the record corresponding to the input line
Exceptions
ABSTRACTCSVITERATOR-ERROR: input line cannot be matched to a known record

◆ identifyTypeImpl()

*string CsvUtil::AbstractCsvIterator::identifyTypeImpl ( list< auto >  rec)
private

Identifies an input record, given the raw line string. This method performs a lookup in a precalculated table based on the number of records (see constructor()). If different criteria are needed, e.g. when two line types in a spec have the same record number and no unique resolving rule is specified, this method needs to be overridden; otherwise it will throw an exception because the precalculated mapping will be empty.

Parameters
rec: the input line record to be identified
Returns
the record name or NOTHING if the input cannot be matched
Exceptions
ABSTRACTCSVITERATOR-ERROR: input line cannot be matched to a known record

◆ index()

int CsvUtil::AbstractCsvIterator::index ( )

Returns the row index being iterated, which does not necessarily correspond to the line number when there are header rows and blank lines are skipped.

Example:
int index = i.index();
Returns
the row index being iterated, which does not necessarily correspond to the line number when there are header rows and blank lines are skipped
See also
lineNumber()
Since
CsvUtil 1.1

◆ lineNumber()

int CsvUtil::AbstractCsvIterator::lineNumber ( )

Returns the current iterator line number in the file (the first line is line 1) or 0 if not pointing at a valid element.

Example:
while (i.next()) {
    printf("+ line %d: %y\n", i.lineNumber(), i.getValue());
}
Returns
the current iterator line number in the data (the first line is line 1) or 0 if not pointing at a valid element
See also
index()
Since
CsvUtil 1.1

◆ memberGate()

auto CsvUtil::AbstractCsvIterator::memberGate ( string  name)

Returns the given column value for the current row.

Parameters
name: the name of the field (header name) in the record
Returns
the value of the given header for the current record
Exceptions
INVALID-ITERATOR: this error is thrown if the iterator is invalid; make sure that the next() method returns True before calling this method
ABSTRACTCSVITERATOR-FIELD-ERROR: invalid or unknown field name given
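
Because memberGate() is the method Qore calls for otherwise unknown members, column values can presumably be read with plain member syntax on the iterator. The sketch below assumes this, with CsvDataIterator as the concrete subclass and hypothetical column names "name" and "city".

```qore
#!/usr/bin/env qore
%new-style
%requires CsvUtil

# hypothetical sample data with a header row
const Data = "name,city\nAlice,Prague\nBob,Vienna\n";

# CsvDataIterator is assumed here as the concrete subclass for string input
CsvDataIterator i(Data, {"header_names": True});
while (i.next()) {
    # i.name and i.city are not declared members; the accesses are expected
    # to be routed through memberGate("name") and memberGate("city")
    printf("%s lives in %s\n", i.name, i.city);
}
```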

◆ next()

bool CsvUtil::AbstractCsvIterator::next ( )
virtual

Moves the current line / record position to the next line / record; returns False if there are no more lines to iterate.

This method will return True again after it returns False once if the file being iterated has data that can be iterated, otherwise it will always return False. The iterator object should not be used to retrieve a value after this method returns False.

Returns
False if there are no lines / records to iterate (in which case the iterator object is invalid and should not be used); True if successful (meaning that the iterator object is valid)
Note
if headers are not given as an option to the constructor, then they are detected and set the first time AbstractCsvIterator::next() is run on a file (see getHeaders())

Implements Qore::AbstractIterator.

◆ peek()

CsvUtil::AbstractCsvIterator::peek ( )

Reads a single row without moving the index position.

This method can be used, for example, to read headers without reading any data.