Qore Mapper Module Reference  1.4
Mapper Module

Mapper Module Introduction

This module provides classes that help with structured data mapping, meaning the transformation of data in one or more input formats to a different output format.

Mapper Examples

The following is an example map hash with comments:

const DataMap = (
    # output field: "id": mapped from the "Id" element of any "^attributes^" hash in the input record
    "id": "^attributes^.Id",
    # output field: "name": maps from an input field with the same name (no translations are made)
    "name": True,
    # output field: "explicit_count": maps from the input "Count" field; if any value is present, it is converted to an integer
    "explicit_count": ("type": "int", "name": "Count"),
    # output field: "implicit_count": runs the given code on the input record and returns the result; the code returns the number of "Products" sub-records
    "implicit_count": int sub (any ignored, hash rec) { return rec.Products.size(); },
    # output field: "order_date": converts the "OrderDate" string input field to a date in the specified format
    "order_date": ("name": "OrderDate", "date_format": "DD.MM.YYYY HH:mm:SS.us"),
);

If this map is applied to the following data in the following way:

const MapInput = ((
    "^attributes^": ("Id": 1),
    "name": "John Smith",
    "Count": 1,
    "OrderDate": "02.01.2014 10:37:45.103948",
    "Products": ((
        "ProductName": "Widget 1",
        "Quantity": 1,
    ),
    )), (
    "^attributes^": ("Id": 2),
    "name": "Steve Austin",
    "Count": 2,
    "OrderDate": "04.01.2014 19:21:08.882634",
    "Products": ((
        "ProductName": "Widget X",
        "Quantity": 4,
    ), (
        "ProductName": "Widget 2",
        "Quantity": 2,
    ),
    )),
);

Mapper mapv(DataMap);
list l = mapv.mapAll(MapInput);
printf("%N\n", l);

The result will be:

list: (2 elements)
  [0]=hash: (5 members)
    id : 1
    name : "John Smith"
    explicit_count : 1
    implicit_count : 1
    order_date : 2014-01-02 10:37:45.103948 Thu +01:00 (CET)
  [1]=hash: (5 members)
    id : 2
    name : "Steve Austin"
    explicit_count : 2
    implicit_count : 2
    order_date : 2014-01-04 19:21:08.882634 Sat +01:00 (CET)
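
The same kind of map can also be applied to column-oriented input. The v1.1 release notes state that mapAll() supports hashes of lists (as returned by SQLStatement::fetchColumns()). The following is a minimal sketch under that assumption. ColumnMap, ColumnInput and colmap are hypothetical names, the data is invented for illustration, and no output is shown because it has not been verified here:

```qore
%new-style
%requires Mapper

# hypothetical map reusing two of the field forms from DataMap
const ColumnMap = (
    "name": True,
    "explicit_count": ("type": "int", "name": "Count"),
);

# hypothetical column-oriented input: each key holds a list with one value per row
const ColumnInput = (
    "name": ("John Smith", "Steve Austin"),
    "Count": (1, 2),
);

Mapper colmap(ColumnMap);
# assumption: mapAll() treats a hash of lists as a set of rows, per the v1.1 release notes
list rows = colmap.mapAll(ColumnInput);
printf("%N\n", rows);
```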

Mapper Specification Format

The mapper hash is keyed by target (i.e. output) field names, each assigned a field specification in one of the forms listed below. Note that dotted output field names result in nested hash output unless the allow_output_dot option is set.

  • True: this is a shortcut meaning map from an input field with the same name
  • a string: giving the input field name directly (equivalent to a hash with the "name" key)
  • a closure or call reference: meaning map from a field of the same name and apply the given code to give the value for the mapping (equivalent to a hash with the "code" key); the closure or call reference must accept the following arguments:
    • any value: the input field value (with the same name as the output field; to use a different name, see the code hash option below)
    • hash rec: the current input record
  • a hash describing the mapping; the following keys are all optional (an empty hash means map from an input field with the same name with no translations):
    • "code": a closure or call reference to process the field data; cannot be used with the "constant" or "index" keys
    • "constant": the value of this key will be returned as a constant value; this key cannot be used with the "name", "struct", "code", "index" or "default" keys
    • "index": gives the current index (row count) of the record being mapped; the integer value assigned to this key is the start offset, so 0 means that mapped values will be 0, 1, ..., N; 1 means 1, 2, ..., N; etc.
    • "date_format": gives the format for converting an input string to a date; see Date Formatting Codes for the format of this string; note that this also implies "type" = "date"
    • "default": gives a default value for the field in case no input or translated value is provided
    • "mand": assign to boolean True if the field is mandatory and an exception should be thrown if no input data is supplied
    • "maxlen": an integer giving the maximum output string field length in bytes
    • "name": the value of this key gives the name of the input field; only use this if the input field name is different from the output field name; note that if this value contains "." characters and the allow_dot option is not set (see Mapper Options), then the value will be treated like "struct" (the "struct" key value will be created automatically); cannot be used with the "constant" or "index" keys
    • "number_format": gives the format for converting an input string to a number; see Qore::parse_number() for the format of this string; note that this also implies "type" = "number"
    • "runtime": references a value in the current Mapper Runtime Options; the value of this key is the name of a key in the current runtime structure
    • "struct": the value of this key gives the location of the input field in an input hash in dot notation, ex: "element.name" would look for the field's value in the "name" key of the "element" hash in the input record; cannot be used with the "constant" or "index" keys; this option is only necessary in place of the "name" option if the allow_dot option is set, otherwise use "name" instead
    • "trunc": assign to boolean True if the field should be truncated if over the maximum field length; this key can only be set to True if the "maxlen" key is also given
    • "type": this gives the output field type, can be:
      • "date": date/time field
      • "int": field accepts only integer values (any non-integer values on input will cause an exception to be thrown when mapping; note: "integer" is also accepted as an alias for "int")
      • "number": field accepts only numeric values (any non-numeric values on input will cause an exception to be thrown when mapping); numeric values are left in their original types, any other type is converted to an arbitrary-precision numeric value
      • "string": field accepts string values; in this case any other value will be converted to a string in the output
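
The forms described above can be combined in a single mapper hash. The following is a minimal sketch, not a definitive reference. SpecInput, SpecMap, specmap and all field names and values are hypothetical and chosen only to exercise one form per line. The behavior noted in the comments is taken from the descriptions above rather than from a verified run, so no output is shown:

```qore
%new-style
%requires Mapper

# hypothetical input record; all field names and values are illustrative only
const SpecInput = (
    "sku": "A-100",
    "Price": "12.50",
    "customer": ("name": "John Smith", "city": "Prague"),
    "note": "a fairly long free-text comment",
);

const SpecMap = (
    # hash with "mand": same-named input field; an exception is thrown if no value is supplied
    "sku": ("mand": True),
    # string: input field name; the "." implies a structured lookup because allow_dot is not set
    "customer_name": "customer.name",
    # dotted output name: expected to produce a nested "address" hash (allow_output_dot is not set)
    "address.city": "customer.city",
    # closure: receives the same-named input value and the whole input record
    "note": string sub (any v, hash rec) { return sprintf("%s: %s", rec.sku, v); },
    # "name" with "code": the closure receives the value of the "sku" input field
    "sku_lower": ("name": "sku", "code": string sub (any v, hash rec) { return v.lwr(); }),
    # "type": a string input value is expected to be converted to an arbitrary-precision number
    "price": ("name": "Price", "type": "number"),
    # "maxlen" with "trunc": values longer than 10 bytes are truncated instead of raising an error
    "short_note": ("name": "note", "maxlen": 10, "trunc": True),
    # "default": used because the input record has no "status" field
    "status": ("default": "new"),
    # "constant": always returns the given value
    "source": ("constant": "web"),
    # "index": row counter starting at 1
    "row": ("index": 1),
);

Mapper specmap(SpecMap);
hash out = specmap.mapData(SpecInput);
printf("%N\n", out);
```

Note that "constant" and "index" are used on their own here, since the descriptions above state that they cannot be combined with "name", "struct" or "code".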

Mapper Options

Mapper objects accept the following options in the option hash:

  • "allow_dot": if True (as evaluated by parse_boolean()) then field names with "." characters do not imply a structured internal element lookup; in this case input field names may have "." characters in them, use the "struct" key to use structured internal element lookups (see Mapper Specification Format "struct" docs for more info)
  • "allow_output_dot": if True (as evaluated by parse_boolean()) then output field names with "." characters do not imply a structured/hash output element; in this case output field names may have "." characters in them
  • "date_format": gives the global format for converting a string to a date; see Date Formatting Codes for the format of this string; this is applied to all fields of type "date" unless the field has a "date_format" value that overrides this global setting
  • "encoding": the output character encoding; if not present then "UTF-8" is assumed
  • "info_log": an optional info logging callback; must accept a string format specifier and sprintf()-style arguments
  • "input": an optional hash describing the input records where each key is a possible input field name (where dot notation indicates a multi-level hash) and each value is a hash describing the field with the following optional keys:
    • "desc": this gives the description of the input field
  • "input_log": an optional input data logging callback; must accept a hash giving the input data hash
  • "input_timezone": an optional string or integer (giving seconds east of UTC) giving the time zone for parsing input data (ex: "Europe/Prague"), if not set defaults to the current TimeZone (see Qore::TimeZone::get())
  • "name": the name of the mapper for use in logging and error strings
  • "number_format": gives the global format for converting a string to a number; see Qore::parse_number() for the format of this string; this is applied to all fields of type "number" unless the field has a "number_format" value that overrides this global setting
  • "output": an optional hash describing the output data structure; each hash key is an output field name (where dot notation indicates a multi-level hash) and each value is an optional hash describing the output field taking an optional "desc" key and a subset of mapper field hash keys as follows:
    • "desc": a description of the output field
    • "mand": True if the field is mandatory and an exception should be thrown if no input data is supplied
    • "maxlen": an integer giving the maximum length of a string field in bytes
    • "type": this gives the output field type, can be:
      • "date": date/time field
      • "int": field accepts only integer values (any non-integer values on input will cause an exception to be thrown when mapping; note: "integer" is also accepted as an alias for "int")
      • "number": field accepts only numeric values (any non-numeric values on input will cause an exception to be thrown when mapping); numeric values are left in their original types, any other type is converted to an arbitrary-precision numeric value
      • "string": field accepts string values; in this case any other value will be converted to a string in the output
  • "output_log": an optional output data logging callback; must accept a hash giving the output data hash
  • "runtime": an initial runtime structure for Mapper Runtime Options
  • "timezone": an optional string or integer (giving seconds east of UTC) giving the time zone definition for output data (ex: "Europe/Prague"), if not set defaults to the current TimeZone (see Qore::TimeZone::get())
  • "trunc_all": if True (as evaluated by parse_boolean()) then any field without a "trunc" key (see Mapper Specification Format "trunc" description) will automatically be truncated if a "maxlen" attribute is set for the field
Note
  • if the "input" option is given, then only those defined fields can be referenced as input fields in the mapper hash; all possible input fields should be defined here if this option is used
  • if the "output" option is given, then only those defined fields can be referenced as output fields, additionally the types given in the output definition cannot be overridden in the mapper hash; all possible output fields should be defined here if this option is used
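
To illustrate how the option hash is passed to the constructor, here is a minimal sketch assuming the option semantics described above. OrderMap, OrderOpts, ordermap and every field name and value are hypothetical. The logging closure is one possible way to satisfy the "string format specifier and sprintf()-style arguments" requirement, not the only one:

```qore
%new-style
%requires Mapper

# hypothetical field map; every input name used here is declared in the "input" option below
const OrderMap = (
    "id": "order_id",
    "customer": "cust_name",
    "created": "created_at",
);

const OrderOpts = (
    # name used in logging and error strings
    "name": "order-mapper",
    # global date format, applied to all output fields of type "date"
    "date_format": "DD.MM.YYYY HH:mm:SS",
    # time zones for parsing input data and for output data
    "input_timezone": "Europe/Prague",
    "timezone": "Europe/Prague",
    # truncate any field with a "maxlen" attribute and no explicit "trunc" key
    "trunc_all": True,
    # only these input fields may be referenced in the mapper hash
    "input": (
        "order_id": ("desc": "order number"),
        "cust_name": ("desc": "customer name"),
        "created_at": ("desc": "creation timestamp as a string"),
    ),
    # only these output fields may be produced; the types given here cannot be overridden
    "output": (
        "id": ("type": "int", "mand": True, "desc": "order ID"),
        "customer": ("type": "string", "maxlen": 10, "desc": "customer name"),
        "created": ("type": "date", "desc": "creation date"),
    ),
    # info logging callback: format string plus sprintf()-style arguments
    "info_log": sub (string fmt) { vprintf(fmt + "\n", argv); },
);

Mapper ordermap(OrderMap, OrderOpts);
hash out = ordermap.mapData((
    "order_id": 1001,
    "cust_name": "John Smith Jr.",
    "created_at": "02.01.2014 10:37:45",
));
printf("%N\n", out);
```

Declaring the types once in "output" keeps the mapper hash itself short, which is why OrderMap uses only the plain string form for each field.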

Mapper Runtime Options

Runtime options for Mapper objects allow the programmer to use constant values provided at runtime in the Mapper output.

For example, runtime options can be useful in the following cases:

  • storing one date/time value for all output hashes of the Mapper
  • using a value from a database sequence value for the lifetime of the Mapper object
Example:
hash mapv = (
    "foo": ("constant": "bar"),
    # ...
    "date_begin": ("runtime": "start_date"), # references runtime option "start_date"
    "group": ("runtime": "group_id"), # references runtime option "group_id"
);
hash opts = (
    "timezone": "Europe/Prague",
    # ...
    "runtime": (
        "start_date": now_us(), # set runtime option "start_date"
        "group_id": 0, # set runtime option "group_id" to 0
    ),
);
Mapper m(mapv, opts); # runtime options are active now
m.mapData(input1); # output record hash date_begin = start_date = timestamp of the opts creation and group = 0

The runtime options are basically the same as setting constants in the mapper before providing runtime data to the mapper. As such, the runtime options can be changed only before the first input hash is processed by a Mapper.

Note that the Mapper::setRuntime() and Mapper::replaceRuntime() methods are deprecated - please use Mapper construction options to set runtime values instead. The methods are deprecated since runtime options duplicate existing functionality and are confusing and error-prone to use.

Release Notes

Mapper v1.4

  • added support for complex types

Mapper v1.3.1

  • fixed bugs handling mapper fields with no input records in list mode as passed from the TableMapper module (issue 1736)

Mapper v1.3

  • internal updates to allow for TableMapper insert performance improvements (issue 1626)

Mapper v1.2

  • significantly improved mapper performance with identity (i.e. 1:1) and constant mappings (issue 1620)

Mapper v1.1

  • implemented "constant" field tag giving a constant value for the output of a field
  • implemented structured output for dotted output field names and the "allow_output_dot" option to suppress this behavior
  • implemented "default" field tag giving a default value if no input value is specified
  • moved field length checks after all transformations have been applied
  • implemented a global "date_format" mapper option
  • implemented the "number_format" field option and a global option of the same name
  • fixed bugs in the "timezone" and "input_timezone" options, documented those options
  • changed the behavior of the "number" field type: now leaves numeric values in their original type, converts all other types to a number
  • removed the deprecated "crec" option
  • implemented the "input" option with input record validation
  • implemented the "output" option with output record validation
  • implemented the "info_log" option and removed the "trunc" option
  • added runtime option handling (see Mapper Runtime Options)
  • implemented "index" field tag for current row index
  • improved the Mapper::Mapper::mapAll() method by adding support for hashes of lists to better support input from bulk DML (SQLStatement::fetchColumns())

Mapper v1.0

  • Initial release