Qore Mapper Module Reference  1.6
Mapper Module

Mapper Module Introduction

This module provides classes that help with structured data mapping, meaning the transformation of data in one or more input formats to a different output format.

Classes provided by this module:

Mapper Examples

The following is an example map hash with comments:

const DataMap = (
    # output field: "id" is mapped from the "Id" element of any "^attributes^" hash in the input record
    "id": "^attributes^.Id",
    # output field: "name" maps from an input field with the same name (no translations are made)
    "name": True,
    # output field: "explicit_count" maps from the input "Count" field; if any value is present, it is converted to an integer
    "explicit_count": ("type": "int", "name": "Count"),
    # output field: "implicit_count" runs the given code on the input record and returns the result; the code returns the number of "Products" sub-records
    "implicit_count": int sub (any ignored, hash rec) { return rec.Products.size(); },
    # output field: "order_date" converts the "OrderDate" string input field to a date in the specified format
    "order_date": ("name": "OrderDate", "date_format": "DD.MM.YYYY HH:mm:SS.us"),
);

If this map is applied to the following data in the following way:

const MapInput = ((
    "^attributes^": ("Id": 1),
    "name": "John Smith",
    "Count": 1,
    "OrderDate": "02.01.2014 10:37:45.103948",
    "Products": ((
        "ProductName": "Widget 1",
        "Quantity": 1,
    ),
    )), (
    "^attributes^": ("Id": 2),
    "name": "Steve Austin",
    "Count": 2,
    "OrderDate": "04.01.2014 19:21:08.882634",
    "Products": ((
        "ProductName": "Widget X",
        "Quantity": 4,
    ), (
        "ProductName": "Widget 2",
        "Quantity": 2,
    ),
    )),
);
Mapper mapv(DataMap);
list l = mapv.mapAll(MapInput);
printf("%N\n", l);

The result will be:

list: (2 elements)
  [0]=hash: (5 members)
    id : 1
    name : "John Smith"
    explicit_count : 1
    implicit_count : 1
    order_date : 2014-01-02 10:37:45.103948 Thu +01:00 (CET)
  [1]=hash: (5 members)
    id : 2
    name : "Steve Austin"
    explicit_count : 2
    implicit_count : 2
    order_date : 2014-01-04 19:21:08.882634 Sat +01:00 (CET))

Mapper Specification Format

The mapper hash is made up of target (i.e. output) field names as keys, each assigned a field specification. Note that dotted output field names result in nested hash output unless the allow_output_dot option is set. A field specification can be one of the following:

  • True: this is a shortcut meaning map from an input field with the same name
  • a string: giving the input field name directly (equivalent to a hash with the "name" key)
  • a closure or call reference: meaning map from a field of the same name and apply the given code to give the value for the mapping (equivalent to a hash with the "code" key); the closure or call reference must accept the following arguments:
    • any value: the input field value (with the same name as the output field; to use a different name, see the code hash option below)
    • hash rec: the current input record
  • a hash describing the mapping; the following keys are all optional (an empty hash means map from an input field with the same name with no translations):
    • "code": a closure or call reference to process the field data; cannot be used with the "constant" or "index" keys
    • "constant": the value of this key will be returned as a constant value; this key cannot be used with the "name", "struct", "code", "index" or "default" keys
    • "index": gives the current index (count) of the row; the integer value assigned to this key is the start offset, so 0 means that mapped values will be 0, 1, ..., N; 1 means 1, 2, ..., N; etc.
    • "date_format": gives the format for converting an input string to a date; see Date Formatting Codes for the format of this string; note that this also implies "type" = "date"
    • "default": gives a default value for the field in case no input or translated value is provided
    • "mand": assign to boolean True if the field is mandatory and an exception should be thrown if no input data is supplied
    • "maxlen": an integer giving the maximum output string field length in bytes
    • "name": the value of this key gives the name of the input field; only use this if the input field name is different from the output field name; note that if this value contains "." characters and the allow_dot option is not set (see Mapper Options), then the value will be treated like "struct" (the "struct" key value will be created automatically); cannot be used with the "constant" or "index" keys
    • "number_format": gives the format for converting an input string to a number; see Qore::parse_number() for the format of this string; note that this also implies "type" = "number"
    • "output_key_path": gives the output path for hash output values; each element in the list is a string key name
    • "runtime": references a value in the Mapper Runtime Options; the value of this key is the name of a key in the current runtime structure
    • "struct": the value of this key gives the location of the input field in an input hash in dot notation, ex: "element.name" would look for the field's value in the "name" key of the "element" hash in the input record; cannot be used with the "constant" or "index" keys; this option is only necessary in place of the "name" option if the allow_dot option is set, otherwise use "name" instead
    • "trunc": assign to boolean True if the field should be truncated if over the maximum field length; this key can only be set to True if the "maxlen" key is also given
    • "type": this gives the output field type, can be:
      • "date": date/time field
      • "int": field accepts only integer values (any non-integer values on input will cause an exception to be thrown when mapping; note: "integer" is also accepted as an alias for "int")
      • "number": field accepts only numeric values (any non-numeric values on input will cause an exception to be thrown when mapping); numeric values are left in their original types, any other type is converted to an arbitrary-precision numeric value
      • "string": field accepts string values; in this case any other value will be converted to a string in the output
      • "hash": field accepts hash values
      • "any": field accepts any value
    • "type_options": a hash of type options to set or override type options for the output field
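
The field specification forms above can be combined in a single mapper hash. The following is a minimal sketch rather than an example from the module itself: every output and input field name (Quantity, UnitPrice, Label, address.city and so on) and the sample record are hypothetical, and it assumes default options, in particular that allow_dot is not set.

```qore
%new-style
%requires Mapper

# all field names below are hypothetical and chosen for illustration only
const FieldMap = (
    # constant value; "constant" cannot be combined with "name", "struct", "code", "index" or "default"
    "source": ("constant": "web"),
    # row counter with start offset 1
    "row": ("index": 1),
    # dotted "name" is treated like "struct" because allow_dot is not set
    "city": ("name": "address.city"),
    # "default" supplies a value when the input has no "status" field
    "status": ("default": "new"),
    # mandatory field with the "int" output type
    "qty": ("name": "Quantity", "type": "int", "mand": True),
    # string limited to 10 bytes; longer values are truncated
    "label": ("name": "Label", "type": "string", "maxlen": 10, "trunc": True),
    # "code" with "name": the closure receives the "Quantity" value and the whole input record
    "line_total": ("name": "Quantity", "code": any sub (any qty, hash rec) { return qty * rec.UnitPrice; }),
);

# hypothetical input record
const Rec = (
    "address": ("city": "Prague"),
    "Quantity": 3,
    "UnitPrice": 2.5,
    "Label": "a label longer than ten bytes",
);

Mapper m(FieldMap);
printf("%N\n", m.mapData(Rec));
```

Keeping each key's restrictions next to its use (for example, "constant" on its own and "trunc" only together with "maxlen") makes it easier to spot invalid combinations before the mapper is constructed.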

Mapper Options

Mapper objects accept the following options in the option hash:

  • "info_log": an optional info logging callback; must accept a string format specifier and sprintf()-style arguments
  • "input": this should be a description of the input fields with type hash<string, DataProvider::AbstractDataField>; for backwards compatibility, this option also accepts a hash describing the input fields where each key is a possible input field name (and where dot notation indicates a multi-level hash) and each value is a hash describing the field with the following optional keys:
    • "desc": this gives the description of the input field

This option is mutually exclusive with the input_provider option

  • "input_log": an optional input data logging callback; must accept a hash giving the input data hash
  • "input_provider": gives the input provider with an AbstractDataProvider object which defines the type of input data and also the data itself. The use of this option enables the use of the Mapper::getOutputIterator() API. This option is mutually exclusive with the input option. If an "output_provider" is also provided, the Mapper::runAutonomous() method can be used to map from input to output in a single call
  • "input_provider_search": the search criteria for the input provider; see the where_cond option of AbstractDataProvider::searchRecords() for more information on this option
  • "input_request": the arguments for input providers using the request/response API
  • "input_request_options": any options to input providers using the request/response API
  • "input_response_error": a string indicating that input providers using the request/response API should use the given error response message for the record format
  • "input_search_options": the search options for the input provider; see the search_options option of AbstractDataProvider::searchRecords() for more information on this option
  • "name": the name of the mapper for use in logging and error strings
  • "output": this should be a description of the output fields with type hash<string, DataProvider::AbstractDataField>; for backwards compatibility, this option also accepts a hash describing the output data structure where each hash key is an output field name (and where dot notation indicates a multi-level hash) and each value is an optional hash describing the output field taking a subset of mapper field hash keys as follows:
    • "desc": a description of the output field
    • "mand": True if the field is mandatory and an exception should be thrown if no input data is supplied
    • "maxlen": an integer giving the maximum length of a string field in bytes
    • "type": this gives the output field type, can be:
      • "date": date/time field
      • "int": field accepts only integer values (any non-integer values on input will cause an exception to be thrown when mapping; note: "integer" is also accepted as an alias for "int")
      • "number": field accepts only numeric values (any non-numeric values on input will cause an exception to be thrown when mapping); numeric values are left in their original types, any other type is converted to an arbitrary-precision numeric value
      • "string": field accepts string values; in this case any other value will be converted to a string in the output
      • "hash": field accepts hash values
      • "any": field accepts any value
  • "output_log": an optional output data logging callback; must accept a hash giving the output data hash
  • "output_nullable": set all output fields as nullable
  • "output_provider": gives the output provider with an AbstractDataProvider object which defines the type of output data and also location where the output data will be written. If this option is set, then every mapped record will be written to the output data provider automatically (unless output_provider_passive is set).
    • record-based output providers: The mapped output is written to the output provider as records. If the output provider supports transaction management, Mapper::commit() and Mapper::rollback() can be used.
    • request/reply output providers: The mapped output is used as the request data for the output provider, and a request is made for each output record.

This option is mutually exclusive with the output option. If an "input_provider" is also provided, the Mapper::runAutonomous() method can be used to map from input to output in a single call

  • "output_provider_bulk": if this option is used with a record-based output provider, then bulk operations are used with the output provider, and the Mapper::flushOutput() method must be called after all mapping is done to flush the output buffer to the output provider, or Mapper::discardOutput() must be called to discard any data left in the bulk output buffer if the results should be discarded (ex: the output provider requires transaction management and an error occurs causing the transaction to be rolled back)
  • "output_provider_passive": if this option is set and a record-based or request-reply output provider is set, then nothing will be written to the output provider when mapping; the output provider will only be used to provide type information for the output record
  • "output_provider_upsert": set to True if upsert operations instead of creation APIs should be used with the output provider. If output_provider_bulk is also set, this indicates if the AbstractDataProviderBulkOperation object will use upsert operations instead of insert operations
  • "runtime": an initial runtime structure for Mapper Runtime Options
  • "trunc_all": if True (as evaluated by parse_boolean()) then any field without a "trunc" key (see Mapper Specification Format "trunc" description) will automatically be truncated if a "maxlen" attribute is set for the field
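
The interaction between the "output_provider" and "output_provider_bulk" options and the flush and discard calls described above could look like the following. This is only a sketch: map_bulk() is a hypothetical helper, the provider is assumed to be a record-based provider that supports transaction management, and feeding the records with mapAll() is an assumption. Only the option names and the flushOutput(), discardOutput(), commit() and rollback() methods come from the descriptions above.

```qore
%new-style
%requires Mapper
%requires DataProvider

# hypothetical helper: maps all input records to a record-based output provider in bulk mode
sub map_bulk(hash<auto> map, AbstractDataProvider prov, list<auto> input) {
    Mapper m(map, (
        "output_provider": prov,
        "output_provider_bulk": True,
    ));
    try {
        # mapped records are buffered for bulk output
        m.mapAll(input);
        # flush whatever is left in the bulk output buffer
        m.flushOutput();
        # assumes the provider supports transaction management
        m.commit();
    } catch (hash<ExceptionInfo> ex) {
        # drop any buffered records and roll back the transaction
        m.discardOutput();
        m.rollback();
        rethrow;
    }
}
```

Wrapping the whole sequence in a single try/catch keeps the flush and discard paths mutually exclusive, which matches the requirement that one of the two is called once mapping is finished.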

The following deprecated options are also accepted:

  • "allow_dot": if True (as evaluated by parse_boolean()) then field names with "." characters do not imply a structured internal element lookup; in this case input field names may have "." characters in them; use the "struct" key for structured internal element lookups (see Mapper Specification Format "struct" docs for more info)
  • "allow_output_dot": if True (as evaluated by parse_boolean()) then output field names with "." characters do not imply a structured/hash output element; in this case output field names may have "." characters in them
  • "date_format": gives the global format for converting a string to a date; see Date Formatting Codes for the format of this string; this is applied to all fields of type "date" unless the field has a "date_format" value that overrides this global setting
  • "encoding": the output character encoding; if not present then "UTF-8" is assumed
  • "input_timezone": an optional string or integer (giving seconds east of UTC) giving the time zone for parsing input data (ex: "Europe/Prague"), if not set defaults to the current TimeZone (see Qore::TimeZone::get())
  • "number_format": gives the global format for converting a string to a number; see Qore::parse_number() for the format of this string; this is applied to all fields of type "number" unless the field has a "number_format" value that overrides this global setting
  • "timezone": an optional string or integer (giving seconds east of UTC) giving the time zone definition for output data (ex: "Europe/Prague"), if not set defaults to the current TimeZone (see Qore::TimeZone::get())
Note
  • if the "input" option is given, then only those defined fields can be referenced as input fields in the mapper hash; all possible input fields should be defined here if this option is used
  • if the "output" option is given, then only those defined fields can be referenced as output fields, additionally the types given in the output definition cannot be overridden in the mapper hash; all possible output fields should be defined here if this option is used
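
As an illustration of the options and notes above, the following sketch passes an option hash with the backwards-compatible "input" and "output" descriptions. The mapper name, the field names, and the log_info() helper are hypothetical; the types are declared only in the "output" option because, as noted above, they cannot be overridden in the mapper hash.

```qore
%new-style
%requires Mapper

# hypothetical info logging callback: receives a format string and sprintf()-style arguments
sub log_info(string fmt) {
    vprintf("INFO: " + fmt + "\n", argv);
}

# every output field used here is declared in "output" below
const OrderMap = (
    "id": "OrderId",
    "customer": "Customer",
);

hash<auto> opts = (
    "name": "order-mapper",
    # every input field referenced in the mapper hash must be declared here
    "input": (
        "OrderId": ("desc": "the order ID"),
        "Customer": ("desc": "the customer name"),
    ),
    "output": (
        "id": ("desc": "the order ID", "type": "int", "mand": True),
        "customer": ("desc": "the customer name", "type": "string", "maxlen": 20),
    ),
    # truncate fields with a "maxlen" attribute and no explicit "trunc" key
    "trunc_all": True,
    "info_log": \log_info(),
);

# hypothetical input record
hash<auto> rec = (
    "OrderId": 42,
    "Customer": "a customer name longer than twenty bytes",
);

Mapper m(OrderMap, opts);
printf("%N\n", m.mapData(rec));
```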

Mapper Runtime Options

Runtime options for Mapper objects allow the programmer to use constant values provided at runtime in the Mapper output.

For example, runtime options can be useful in the following cases:

  • storing one date/time value for all output hashes of the Mapper
  • using a value from a database sequence value for the lifetime of the Mapper object
Example:
hash<auto> mapv = (
    "foo": ("constant": "bar"),
    # ...
    "date_begin": ("runtime": "start_date"), # references runtime option "start_date"
    "group": ("runtime": "group_id"), # references runtime option "group_id"
);
hash<auto> opts = (
    "timezone": "Europe/Prague",
    # ...
    "runtime": (
        "start_date": now_us(), # set runtime option "start_date"
        "group_id": 0, # set runtime option "group_id" to 0
    ),
);
Mapper m(mapv, opts); # runtime options are active now
m.mapData(input1); # output record hash date_begin = start_date = timestamp of the opts creation and group = 0

The runtime options are basically the same as setting constants in the mapper before providing runtime data to the mapper. As such, the runtime options can be changed only before the first input hash is processed by a Mapper.

Note that the Mapper::setRuntime() and Mapper::replaceRuntime() methods are deprecated - please use Mapper construction options to set runtime values instead. The methods are deprecated since runtime options duplicate existing functionality and are confusing and error-prone to use.

Release Notes

Mapper v1.6

  • implemented options supporting suppressing data provider calls on input and output (issue 4462)

Mapper v1.5.9

  • fixed an error mapping bulk data with custom output field handlers (issue 4460)

Mapper v1.5.8

  • fixed an error mapping bulk data with custom output field handlers (issue 4460)

Mapper v1.5.7

  • fixed a bug where it was not possible to use a Mapper with an output provider only for the output data type (issue 4369)

Mapper v1.5.6

  • respect escaped dots in field names (\.) when separating field names (issue 4315)

Mapper v1.5.5

  • fixed handling automatically-acquired input and output data structures with dots in field names; un-deprecated the allow_dot and allow_output_dot options (issue 4309)

Mapper v1.5.4

  • fixed a bug handling the struct key in mappers (issue 4189)

Mapper v1.5.3

  • fixed a bug handling external runtime keys with bulk input for keys that do not require the current input value (issue 3931)

Mapper v1.5.2

  • added the Mapper::mapAutoInput() method
  • added the output_create_ignore_duplicates option
  • fixed a bug where mapper output data was not logged in case of an error in an output provider (issue 3909)
  • added support for mapper context in mapper field key handlers (issue 3893)
  • fixed Mapper::mapAuto() to return NOTHING with no input (issue 3872)
  • implemented support for nested mappers including the submappers option (issue 3414)

Mapper v1.5.1

Mapper v1.5

Mapper v1.4.1

  • fixed a bug where list values could not be passed as a value in non-bulk mode (issue 3611)
  • added support for types "any" and "hash" (issue 3453)
  • added support for dot notation in output fields for the "hash" output type (issue 3413)

Mapper v1.4

  • added support for complex types
  • fixed a bug in the STRING-TOO-LONG exception (issue 2405)

Mapper v1.3.1

  • fixed bugs handling mapper fields with no input records in list mode as passed from the TableMapper module (issue 1736)

Mapper v1.3

  • internal updates to allow for TableMapper insert performance improvements (issue 1626)

Mapper v1.2

  • significantly improved mapper performance with identity (i.e. 1:1) and constant mappings (issue 1620)

Mapper v1.1

  • implemented "constant" field tag giving a constant value for the output of a field
  • implemented structured output for dotted output field names and the "allow_output_dot" option to suppress this behavior
  • implemented "default" field tag giving a default value if no input value is specified
  • moved field length checks after all transformations have been applied
  • implemented a global "date_format" mapper option
  • implemented the "number_format" field option and a global option of the same name
  • fixed bugs in the "timezone" and "input_timezone" options, documented those options
  • changed the behavior of the "number" field type: now leaves numeric values in their original type, converts all other types to a number
  • removed the deprecated "crec" option
  • implemented the "input" option with input record validation
  • implemented the "output" option with output record validation
  • implemented the "info_log" option and removed the "trunc" option
  • added runtime option handling (Mapper Runtime Options):
  • implemented "index" field tag for current row index
  • improved the Mapper::Mapper::mapAll() method by adding support for hashes of lists to better support input from bulk DML (SQLStatement::fetchColumns())

Mapper v1.0

  • Initial release