Qore Programming Language Reference Manual 1.17.0
Methods in this pseudo-class can be executed on strings.
#include <Pseudo_QC_String.dox.h>
Public Member Methods
int | comparePartial (string ostr)
Compares the argument string with the beginning of the current string; returns -1, 0, or 1 if the argument string is less than, equal to, or greater than the beginning of the current string. Also returns -1 if the argument string is equal to the beginning of the current string but is longer than the current string.
bool | empty ()
Returns True if the string is empty, False if not.
string | encoding ()
Returns the name of the string's character encoding.
bool | equalPartial (string ostr)
Compares the beginning of the current string with a shorter string passed as an argument, for equality only; returns True if the string argument matches the beginning of the current string, False if not.
bool | equalPartialPath (string ostr)
Compares the beginning of the current string, assumed to be a path, with a shorter string passed as an argument, for equality only. Returns True if the string argument matches the beginning of the current string and either both strings are the same size or the current string has a '/' or '?' character immediately after the point where the argument string stops; returns False otherwise.
int | find (softstring substr, softint pos=0)
Retrieves the character position of a substring within a string.
string | getDecoded (int code=CD_ALL)
Returns a string based on the string value, decoded as per the code argument.
string | getEncoded (int code=CE_XHTML)
Returns a string based on the string value with encodings as per the code argument.
*string | getLine (int offset=0, *string eol, bool trim=True, *reference< int > size)
Returns a string for the next line in the string buffer starting at the given offset (or at the beginning if no offset is given).
int | getUnicode (int offset=0)
Returns the Unicode code for the given character offset in the string.
bool | intp ()
Returns True if the string can be converted to an integer, False if not; this depends on the first (or possibly second) character of the string: if it is 0 - 9 (possibly preceded by "-"), then the method returns True.
bool | isDataAscii ()
Returns True if the string is empty or has no characters with the high bit set (i.e. all characters < 128).
bool | isDataPrintableAscii ()
Returns True if the string is empty or only contains printable non-control ASCII characters (i.e. all characters > 31 && < 127).
int | length ()
Returns the number of characters in the string; may not be equal to the byte length (returned by <string>::strlen() and <string>::size()) for multi-byte character encodings.
string | lwr ()
Returns the string in lower case.
bool | regex (string regex, int options=0)
Returns True if the regular expression matches the string passed, otherwise returns False.
*list< *string > | regexExtract (string regex, int options=0)
Returns a list of substrings in a string based on matching patterns defined by a regular expression.
int | rfind (softstring substr, softint pos=-1)
Retrieves the character position of a substring within a string, starting the search from the end of the string.
int | size ()
Returns the number of bytes in the string (not including the terminating null character ('\0')).
bool | sizep ()
Returns True since strings can return a non-zero size.
list< string > | split (string sep, bool with_separator=False)
Splits a string into a list of components based on a separator string.
list< string > | split (string sep, string quote, bool trim_unquoted=False)
Splits a string into a list of components based on a separator string and a quote character.
list< string > | splitRegex (string regex_sep, int options=0, bool with_separator=False)
Splits a string into a list of components based on a separator regular expression.
list< string > | splitRegex (string regex_sep, bool with_separator=False)
Splits a string into a list of components based on a separator regular expression.
int | strlen ()
Returns the number of bytes in the string (not including the terminating null character ('\0')).
bool | strp ()
Returns True by default.
string | substr (softint start)
Returns a portion of a string starting from an integer offset.
string | substr (softint start, softint len)
Returns a portion of a string starting from an integer offset, with a length parameter.
string | toBase64 (softint maxlinelen=-1)
Returns the base64-encoded representation of the string.
binary | toBinary ()
Returns a binary value with the string's data.
string | toHex ()
Returns a string of hexadecimal digits corresponding to the contents of the string.
int | toInt (int base=10)
Converts the string to an integer value with respect to the base.
string | toMD5 ()
Returns the MD5 message digest of the string as a hex string.
string | toSHA1 ()
Returns the SHA1 message digest of the string as a hex string.
string | toSHA224 ()
Returns the SHA-224 message digest (a variant of SHA-2) of the string as a hex string.
string | toSHA256 ()
Returns the SHA-256 message digest (a variant of SHA-2) of the string as a hex string.
string | toSHA384 ()
Returns the SHA-384 message digest (a variant of SHA-2) of the string as a hex string.
string | toSHA512 ()
Returns the SHA-512 message digest (a variant of SHA-2) of the string as a hex string.
int | typeCode ()
Returns Qore::NT_STRING.
string | unaccent ()
Returns a string with all accented characters removed.
string | upr ()
Returns the string in upper case.
bool | val ()
Returns False if the string is empty, True if not.
int | width ()
Returns the width of characters in the string; some Unicode characters take up multiple spaces on output.
Public Member Methods inherited from <value>
bool | callp ()
Returns False; this method is reimplemented in other types and will return True if the given expression is a callable value (i.e. closures or call references).
bool | complexType ()
Returns True if the value has a complex type, False if not.
bool | empty ()
Returns True; this method will be reimplemented in container types where it may return False.
string | fullType (*bool with_namespaces)
Returns the full type name, which differs from the simple type name in case of complex types and objects.
bool | intp ()
Returns False; this method is reimplemented in other types and will return True if the given expression can be converted to an integer.
AbstractIterator | iterator ()
Returns an iterator object for the value; the default iterator object returned is SingleValueIterator.
int | lsize ()
Returns 1; the return value of this method should give the list size of the value, which is normally 1 for non-lists (except for NOTHING where the size will be 0) and the number of the elements in the list for lists; this method will be reimplemented in other types where it may return other values.
int | size ()
Returns zero; this method will be reimplemented in container types where it may return a non-zero value.
bool | sizep ()
Returns True if the type can return a non-zero size (True for containers including binary objects and strings, False for everything else).
bool | strp ()
Returns False; this method is reimplemented in other types and will return True if the given expression can be converted to a string.
bool | toBool ()
Returns the boolean representation of the value; the default is False.
float | toFloat ()
Returns the floating-point representation of the value; the default is 0.0.
int | toInt ()
Returns the integer representation of the value; the default is 0.
number | toNumber ()
Returns the arbitrary-precision numeric representation of the value; the default is 0.
string | toString ()
Returns the string representation of the value; the default is an empty string.
string | type ()
Returns the string type for the value.
int | typeCode ()
Returns the type code for the value.
bool | val ()
Returns False; this method is reimplemented in other types and will return True if the given expression has a non-empty value.
int <string>::comparePartial (string ostr)
Compares the argument string with the beginning of the current string; returns -1, 0, or 1 if the argument string is less than, equal to, or greater than the beginning of the current string. Also returns -1 if the argument string is equal to the beginning of the current string but is longer than the current string.
ostr | the partial string to compare the current string to |
bool <string>::empty ()
Returns True if the string is empty, False if not.
string <string>::encoding ()
Returns the name of the string's character encoding.
bool <string>::equalPartial (string ostr)
Compares the beginning of the current string with a shorter string passed as an argument, for equality only; returns True if the string argument matches the beginning of the current string, False if not.
This pseudo-method is slightly faster than comparePartial() since the length of the substring can be used to determine if the strings can match or not.
ostr | the partial string to compare the current string to |
bool <string>::equalPartialPath (string ostr)
Compares the beginning of the current string, assumed to be a path, with a shorter string passed as an argument, for equality only. Returns True if the string argument matches the beginning of the current string and either both strings are the same size or the current string has a '/' or '?' character immediately after the point where the argument string stops; returns False otherwise.
ostr | the partial string to compare the current string to |
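The path-prefix rule described for equalPartialPath() can be modelled outside Qore. The following Python sketch is illustrative only: the function name equal_partial_path is hypothetical, and the logic is written from the description above, not taken from the Qore implementation.

```python
def equal_partial_path(current: str, ostr: str) -> bool:
    """Model of the documented rule: prefix match that ends on a path boundary."""
    # The argument must match the beginning of the current string.
    if not current.startswith(ostr):
        return False
    # Both strings are the same size: an exact match.
    if len(current) == len(ostr):
        return True
    # Otherwise the character right after the prefix must be '/' or '?'.
    return current[len(ostr)] in "/?"


print(equal_partial_path("/api/v1/users", "/api/v1"))  # boundary is '/'
print(equal_partial_path("/api/v10", "/api/v1"))       # boundary is '0'
```

The second call shows why this differs from a plain prefix test: "/api/v1" is a prefix of "/api/v10", but the match does not end on a path boundary.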
int <string>::find (softstring substr, softint pos = 0)
Retrieves the character position of a substring within a string.
The pos argument and the return value are in character positions; byte offsets may differ from the character offsets with multi-byte character encodings.
substr | the substring to find in the string; if the character encoding of this argument does not match that of the current string, then it will be converted to the current string's character encoding before processing |
pos | the starting character position for the search |
ENCODING-CONVERSION-ERROR | this exception could be thrown if the string arguments have different character encodings and an error occurs during encoding conversion |
INVALID-ENCODING | this exception could be thrown if a character offset calculation fails due to invalid encoding of multi-byte character data |
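The difference between character positions and byte offsets noted for find() can be seen in any language that separates text from its encoded bytes. The following Python sketch is not Qore code; it assumes UTF-8 as the byte encoding and uses a sample string chosen for illustration.

```python
# Sample text containing two non-ASCII characters (U+00EF and U+00E9).
text = "na\u00efve caf\u00e9"
needle = "caf\u00e9"

# Position counted in characters, the unit the find() description uses.
char_pos = text.find(needle)

# Position counted in bytes of the UTF-8 encoded data.
byte_pos = text.encode("utf-8").find(needle.encode("utf-8"))

print(char_pos, byte_pos)
```

The two positions differ by one because the character U+00EF before the match occupies two bytes in UTF-8.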
string <string>::getDecoded (int code = CD_ALL)
Returns a string based on the string value, decoded as per the code argument.
code | a decoding bitfield argument; see String Concatenation Decoding Codes for more information |
string <string>::getEncoded (int code = CE_XHTML)
Returns a string based on the string value with encodings as per the code argument.
code | an encoding bitfield argument; see String Concatenation Encoding Codes for more information |
*string <string>::getLine (int offset = 0, *string eol, bool trim = True, *reference< int > size)
Returns a string for the next line in the string buffer starting at the given offset (or at the beginning if no offset is given).
offset | the offset in bytes from the beginning of the string; negative numbers give an offset from the end of the string |
eol | the optional end of line character(s) to use to detect lines in the buffer; if this string is not passed, then the end of line character(s) are detected automatically, and can be either "\n" , "\r" , or "\r\n" ; if this string is passed and has a different character encoding from this object's (as determined by the encoding parameter), then it will be converted to the string's character encoding |
trim | if True, the returned line is trimmed of the eol bytes |
size | an optional reference to an integer that returns the number of bytes in the line including the end of line characters |
int <string>::getUnicode (int offset = 0)
Returns the Unicode code for the given character offset in the string.
offset | the offset in characters in the string; negative numbers give offsets from the end of the string |
bool <string>::intp ()
Returns True if the string can be converted to an integer, False if not; this depends on the first (or possibly second) character of the string: if it is 0 - 9 (possibly preceded by "-"), then the method returns True.
bool <string>::isDataAscii ()
Returns True if the string is empty or has no characters with the high bit set (i.e. all characters < 128).
bool <string>::isDataPrintableAscii ()
Returns True if the string is empty or only contains printable non-control ASCII characters (i.e. all characters > 31 && < 127).
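The two byte-range tests stated for isDataAscii() and isDataPrintableAscii() can be written directly from their descriptions. The following Python sketch is a model only; the function names are hypothetical and the functions operate on raw bytes, which is an assumption about how the checks are applied.

```python
def is_data_ascii(data: bytes) -> bool:
    # True for empty input or when no byte has the high bit set (all < 128).
    return all(b < 128 for b in data)


def is_data_printable_ascii(data: bytes) -> bool:
    # True for empty input or when every byte is > 31 and < 127.
    return all(31 < b < 127 for b in data)


print(is_data_ascii(b"tab\there"), is_data_printable_ascii(b"tab\there"))
```

A tab character (byte value 9) is ASCII but not printable, so the two checks disagree on that input.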
int <string>::length ()
Returns the number of characters in the string; may not be equal to the byte length (returned by <string>::strlen() and <string>::size()) for multi-byte character encodings.
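How far the character count and the byte count diverge depends on the character encoding. The following Python sketch is not Qore code; it only illustrates the point with one sample string and three encodings chosen for illustration.

```python
text = "h\u00e9llo"  # five characters, one of them non-ASCII

# Number of characters, independent of encoding.
char_length = len(text)

# Number of bytes needed to store the same text in different encodings.
byte_sizes = {
    name: len(text.encode(name))
    for name in ("utf-8", "iso-8859-1", "utf-16-le")
}

print(char_length, byte_sizes)
```

With a single-byte encoding such as ISO-8859-1 the two counts coincide; with UTF-8 or UTF-16 they do not.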
string <string>::lwr ()
Returns the string in lower case.
This pseudo-method operates on a very wide range of non-ASCII characters using a Unicode lookup table for mapping Latin, Cyrillic, Greek, Armenian, Georgian, etc. characters.
bool <string>::regex (string regex, int options = 0)
Returns True if the regular expression matches the string passed, otherwise returns False.
Strings are converted to UTF-8 for pattern-matching; if any invalid encodings are encountered, an ENCODING-CONVERSION-ERROR is raised.
regex | the regular expression pattern |
options | regular expression options; see Regular Expression Constants for possible values |
REGEX-COMPILATION-ERROR | There was an error compiling the regular expression |
REGEX-OPTION-ERROR | the option argument contains invalid option bits |
ENCODING-CONVERSION-ERROR | this exception could be thrown if an encoding error is encountered when converting the given strings to UTF-8 |
*list< *string > <string>::regexExtract (string regex, int options = 0)
Returns a list of substrings in a string based on matching patterns defined by a regular expression.
Strings are converted to UTF-8 for pattern-matching; if any invalid encodings are encountered, an ENCODING-CONVERSION-ERROR is raised.
regex | the regular expression to use for matching, elements should be given in parentheses |
options | regular expression options; see Regular Expression Constants for possible values |
REGEX-COMPILATION-ERROR | There was an error compiling the regular expression |
REGEX-OPTION-ERROR | the option argument contains invalid option bits |
ENCODING-CONVERSION-ERROR | this exception could be thrown if an encoding error is encountered when converting the given strings to UTF-8 |
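The shape of the regexExtract() result (a list of the parenthesized sub-matches, or no value when nothing matches) can be modelled with Python's re module. This is a sketch, not Qore code: the helper name regex_extract is hypothetical, Python's regular expression dialect and option flags differ from Qore's, and the behavior for patterns without parentheses is not covered by the description above.

```python
import re
from typing import List, Optional


def regex_extract(text: str, pattern: str, flags: int = 0) -> Optional[List[Optional[str]]]:
    """Return the parenthesized groups of the first match, or None if no match."""
    match = re.search(pattern, text, flags)
    if match is None:
        return None
    # Groups that did not take part in the match are reported as None,
    # mirroring the *list< *string > return type.
    return list(match.groups())


print(regex_extract("user=bob id=42", r"user=(\w+) id=(\d+)"))
```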
int <string>::rfind (softstring substr, softint pos = -1)
Retrieves the character position of a substring within a string, starting the search from the end of the string.
The pos argument and the return value are in character positions; byte offsets may differ from the character offsets with multi-byte character encodings.
substr | the substring to find in the string; if the character encoding of this argument does not match that of the current string, then it will be converted to the current string's character encoding before processing |
pos | the starting character position for the search, -1 means start from the end of the string |
ENCODING-CONVERSION-ERROR | this exception could be thrown if the string arguments have different character encodings and an error occurs during encoding conversion |
INVALID-ENCODING | this exception could be thrown if a character offset calculation fails due to invalid encoding of multi-byte character data |
int <string>::size ()
Returns the number of bytes in the string (not including the terminating null character ('\0')).
bool <string>::sizep ()
Returns True since strings can return a non-zero size.
list< string > <string>::split (string sep, bool with_separator = False)
Splits a string into a list of components based on a separator string.
sep | the separator string; if the separator string is not found in the string to split, then a list with only one element containing the entire string is returned; if this argument has a different character encoding than the current string, then it will be converted to the current string's character encoding |
with_separator | include the separator string in every element |
ENCODING-CONVERSION-ERROR | this exception could be thrown if the string arguments have different character encodings and an error occurs during encoding conversion |
list< string > <string>::split (string sep, string quote, bool trim_unquoted = False)
Splits a string into a list of components based on a separator string and a quote character.
The quote character can appear as the first part of a field, in which case it is assumed to designate the entire field. If instances of the quote character are found in the field preceded by a backslash character ("\"), then these quote characters are included as part of the field's text and not treated as quote characters. The separator character can also appear as a part of a field with this variant.
This variant is useful for parsing CSV files, for example.
Code Flags: RET_VALUE_ONLY
sep | the separator string; if the separator string is not found in the string to split, then a list with only one element containing the entire string is returned; if this argument has a different character encoding than the current string, then it will be converted to the current string's character encoding |
quote | the quote character |
trim_unquoted | remove leading and trailing whitespace from unquoted fields |
Returns a list of each component of a string separated by a separator string, with the separator and any enclosing quote characters removed.
Example:
    # returns ("some", "text with spaces, and commas, here is another one! ,", "here")
    list<string> list = "some,'text with spaces, and commas, here is another one! ,',here".split(",", "'");
ENCODING-CONVERSION-ERROR | this exception could be thrown if the string arguments have different character encodings and an error occurs during encoding conversion |
SPLIT-ERROR | field missing closing quote character; extra text following quoted field |
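Quote-aware splitting of this kind is the same idea that CSV parsers implement. As a point of comparison only, the following Python sketch feeds the example line above to Python's csv module with a single quote as the quote character; it is not Qore code, and Python's rules for escaping and whitespace are not the same as those of split().

```python
import csv

line = "some,'text with spaces, and commas, here is another one! ,',here"

# csv.reader accepts any iterable of lines; quotechar plays the role of
# the quote argument and delimiter the role of sep.
fields = next(csv.reader([line], delimiter=",", quotechar="'"))

print(fields)
```

The separators inside the quoted field are kept as text, and the enclosing quote characters are removed from the result.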
list< string > <string>::splitRegex (string regex_sep, bool with_separator = False)
Splits a string into a list of components based on a separator regular expression.
regex_sep | the separator regular expression; if the separator regular expression is not matched in the string to split, then a list with only one element containing the entire string is returned; if this argument has a different character encoding than the current string, then it will be converted to the current string's character encoding |
with_separator | include the separator string in every element |
ENCODING-CONVERSION-ERROR | this exception could be thrown if the string arguments have different character encodings and an error occurs during encoding conversion |
list< string > <string>::splitRegex (string regex_sep, int options = 0, bool with_separator = False)
Splits a string into a list of components based on a separator regular expression.
regex_sep | the separator regular expression; if the separator regular expression is not matched in the string to split, then a list with only one element containing the entire string is returned; if this argument has a different character encoding than the current string, then it will be converted to the current string's character encoding |
options | regular expression options; see Regular Expression Constants for possible values |
with_separator | include the separator string in every element |
ENCODING-CONVERSION-ERROR | this exception could be thrown if the string arguments have different character encodings and an error occurs during encoding conversion |
int <string>::strlen ()
Returns the number of bytes in the string (not including the terminating null character ('\0')).
bool <string>::strp ()
Returns True by default.
string <string>::substr (softint start)
Returns a portion of a string starting from an integer offset.
Arguments can be negative, giving offsets from the end of the string. All offsets are character positions, not byte positions.
start | The starting character for the substring where the first character is at offset 0; if the offset is negative, it designates the number of characters from the end of the string. If the offset is 0, then the entire string is returned. |
INVALID-ENCODING | this exception could be thrown if a character offset calculation fails due to invalid encoding of multi-byte character data |
string <string>::substr (softint start, softint len)
Returns a portion of a string starting from an integer offset, with a length parameter.
Arguments can be negative, giving offsets from the end of the string. All offsets are character positions, not byte positions.
start | The starting character for the substring where the first character is at offset 0; if the offset is negative, it designates the number of characters from the end of the string |
len | The maximum number of characters to copy; if this value is negative, the rest of the string from start is copied, excluding the last -len characters of the string |
INVALID-ENCODING | this exception could be thrown if a character offset calculation fails due to invalid encoding of multi-byte character data |
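The offset rules for both substr() variants can be modelled with slicing. The following Python sketch is written from the parameter descriptions above and is not Qore code; the function is a hypothetical model, and its handling of out-of-range offsets (clamping to the string bounds) is an assumption that the text does not confirm.

```python
from typing import Optional


def substr(text: str, start: int, length: Optional[int] = None) -> str:
    """Model of the documented start/len rules, in character positions."""
    n = len(text)
    if start < 0:
        # A negative start counts characters from the end of the string.
        start = max(n + start, 0)
    if length is None:
        # Single-argument variant: everything from start onwards.
        return text[start:]
    if length < 0:
        # A negative len drops -len characters from the end of the string.
        end = n + length
        return text[start:end] if end > start else ""
    # A non-negative len is the maximum number of characters to copy.
    return text[start:start + length]


print(substr("hello there", -5))     # last five characters
print(substr("hello there", 2, -2))  # from offset 2, without the last two
```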
string <string>::toBase64 (softint maxlinelen = -1)
Returns the base64-encoded representation of the string.
Implementation based on RFC-1421 and RFC-2045
maxlinelen | the maximum length of a line in the resulting output string in bytes; if this value is > 0 then output lines will be separated by CRLF characters |
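The effect of the maxlinelen argument can be sketched with Python's base64 module. This is a model, not Qore code: the helper name to_base64 is hypothetical, the text is assumed to be encoded as UTF-8 before conversion, and whether Qore also emits a CRLF after the final line is not stated above and is not modelled.

```python
import base64


def to_base64(text: str, max_line_len: int = -1) -> str:
    """Base64-encode text; wrap lines with CRLF when max_line_len > 0."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    if max_line_len <= 0:
        # No line length limit: return one unbroken string.
        return encoded
    # Cut the output into chunks and separate them with CRLF.
    lines = [encoded[i:i + max_line_len] for i in range(0, len(encoded), max_line_len)]
    return "\r\n".join(lines)


print(to_base64("hello"))
print(repr(to_base64("hello world", 8)))
```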
binary <string>::toBinary ()
Returns a binary value with the string's data.
string <string>::toHex ()
Returns a string of hexadecimal digits corresponding to the contents of the string.
int <string>::toInt (int base = 10)
Converts the string to an integer value with respect to the base.
If a character is out of range for the given base, only the substring preceding that character is taken into account.
Base 0 means to interpret the string as a code literal, so that the actual base is 8, 10, or 16.
base | the base of the integer in the string; this value must be 0 or 2 - 36 inclusive or an INVALID-BASE exception will be thrown |
INVALID-BASE | the base is invalid; must be 0 or 2 - 36 inclusive |
UNSUPPORTED-ENCODING | only ASCII-compatible encodings are currently supported |
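The rule that parsing stops at the first character outside the range of the base can be modelled as a prefix parser. The following Python sketch is not Qore code: the function name to_int_prefix is hypothetical, the handling of a leading sign is an assumption, base 0 (code-literal detection) is deliberately left out, and a ValueError stands in for the INVALID-BASE exception.

```python
import string

# Digit characters for bases up to 36: 0-9 followed by a-z.
DIGITS = string.digits + string.ascii_lowercase


def to_int_prefix(text: str, base: int = 10) -> int:
    """Convert the leading run of valid digits in text to an integer."""
    if not 2 <= base <= 36:
        raise ValueError("INVALID-BASE: base must be 2 - 36 in this model")
    sign = 1
    start = 0
    if text[:1] in ("-", "+"):
        # Assumption: an optional leading sign is accepted.
        sign = -1 if text[0] == "-" else 1
        start = 1
    value = 0
    for ch in text[start:].lower():
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            # Out-of-range character: only the preceding part counts.
            break
        value = value * base + digit
    return sign * value


print(to_int_prefix("123abc"), to_int_prefix("ff", 16), to_int_prefix("102", 2))
```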
string <string>::toMD5 ()
Returns the MD5 message digest of the string as a hex string.
The trailing null character is not included in the digest returned.
Example return value: "5d41402abc4b2a76b9719d911017c592"
MD5-DIGEST-ERROR | error calculating digest (should not normally happen) |
string <string>::toSHA1 ()
Returns the SHA1 message digest of the string as a hex string.
The trailing null character is not included in the digest returned.
Example return value: "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"
SHA1-DIGEST-ERROR | error calculating digest (should not normally happen) |
string <string>::toSHA224 ()
Returns the SHA-224 message digest (a variant of SHA-2) of the string as a hex string.
The trailing null character is not included in the digest returned.
Example return value: "ea09ae9cc6768c50fcee903ed054556e5bfc8347907f12598aa24193"
SHA224-DIGEST-ERROR | error calculating digest (should not normally happen) |
string <string>::toSHA256 ()
Returns the SHA-256 message digest (a variant of SHA-2) of the string as a hex string.
The trailing null character is not included in the digest returned.
Example return value: "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
SHA256-DIGEST-ERROR | error calculating digest (should not normally happen) |
string <string>::toSHA384 ()
Returns the SHA-384 message digest (a variant of SHA-2) of the string as a hex string.
The trailing null character is not included in the digest returned.
Example return value: "59e1748777448c69de6b800d7a33bbfb9ff1b463e44354c3553bcdb9c666fa90125a3c79f90397bdf5f6a13de828684f"
SHA384-DIGEST-ERROR | error calculating digest (should not normally happen) |
string <string>::toSHA512 ()
Returns the SHA-512 message digest (a variant of SHA-2) of the string as a hex string.
The trailing null character is not included in the digest returned.
Example return value: "9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043"
SHA512-DIGEST-ERROR | error calculating digest (should not normally happen) |
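The hex strings listed for toMD5() through toSHA512() are the digests of the five-byte ASCII string "hello", computed without a trailing null byte. They can be cross-checked independently of Qore; the following Python sketch uses the standard hashlib module and is offered only as a verification aid.

```python
import hashlib

# The digest input is the raw string data, without a terminating null byte.
data = "hello".encode("ascii")

digests = {
    "md5": hashlib.md5(data).hexdigest(),
    "sha1": hashlib.sha1(data).hexdigest(),
    "sha224": hashlib.sha224(data).hexdigest(),
    "sha256": hashlib.sha256(data).hexdigest(),
    "sha384": hashlib.sha384(data).hexdigest(),
    "sha512": hashlib.sha512(data).hexdigest(),
}

for name, value in digests.items():
    print(name, value)
```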
int <string>::typeCode ()
Returns Qore::NT_STRING.
string <string>::unaccent ()
Returns a string with all accented characters removed.
The returned string has the same encoding as the original input.
string <string>::upr ()
Returns the string in upper case.
This pseudo-method operates on a very wide range of non-ASCII characters using a Unicode lookup table for mapping Latin, Cyrillic, Greek, Armenian, Georgian, etc. characters.
bool <string>::val ()
Returns False if the string is empty, True if not.
int <string>::width ()
Returns the width of characters in the string; some Unicode characters take up multiple spaces on output.