\hypertarget{character_encoding_character_encoding_overview}{}\doxysection{Overview}\label{character_encoding_character_encoding_overview}
The Qore language is character-\/encoding aware. All strings are assumed to have the default character encoding unless the program explicitly specifies another encoding for certain objects and operations. Every Qore string has a character encoding ID attached to it, so when another encoding is required, Qore will attempt to convert the string to that encoding.

Qore uses the operating system\textquotesingle{}s {\ttfamily iconv} library functions to perform any encoding conversions.

Qore supports character encodings that are backwards compatible with 7-\/bit {\ttfamily ASCII}. This includes all {\ttfamily ISO-\/8859-\/$\ast$} character encodings, {\ttfamily UTF-\/8}, {\ttfamily KOI8-\/R}, {\ttfamily KOI8-\/U}, and {\ttfamily KOI7}, among others (see the table below\+: \mbox{\hyperlink{character_encoding_known_encodings}{Known Character Encodings}}).

However, multibyte character encodings are currently only properly supported for {\ttfamily UTF-\/8}. For {\ttfamily UTF-\/8} strings, length(), index(), rindex(), substr(), reverse(), the \mbox{\hyperlink{operators_splice}{splice operator}}, \mbox{\hyperlink{group__string__functions_string_formatting}{print formatting}} functions and methods taking format strings (regarding field lengths), and the regular expression operators and functions all work with character offsets, which may differ from byte offsets. For all character encodings other than {\ttfamily UTF-\/8}, a one-\/byte-\/per-\/character relationship is assumed.
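
The difference between character offsets and byte offsets can be illustrated with a short sketch. It is written in Python rather than Qore, purely as a language-neutral illustration; the sample word and the variable names are assumptions chosen for the example and are not part of the Qore API.

```python
# Illustration in Python (not Qore): character length versus byte length.
text = "na\u00efve"  # "naive" with a diaeresis on the i: 5 characters

utf8_bytes = text.encode("utf-8")         # the non-ASCII letter takes 2 bytes
latin1_bytes = text.encode("iso-8859-1")  # every character takes 1 byte

char_count = len(text)
utf8_byte_count = len(utf8_bytes)
latin1_byte_count = len(latin1_bytes)

# "v" follows the 2-byte letter, so its two offsets differ in UTF-8.
char_offset = text.index("v")
byte_offset = utf8_bytes.index(b"v")

print(char_count, utf8_byte_count, latin1_byte_count, char_offset, byte_offset)
# prints: 5 6 5 3 4
```

For the single-byte encoding, both counts agree, which matches the one-byte-per-character assumption made for every encoding other than {\ttfamily UTF-8}.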

Qore will accept any encoding name given to it, even if it is not a known encoding name or alias. In this case, Qore will tag the strings with this encoding and pass the user-\/defined encoding name to the {\ttfamily iconv} library when encodings must be converted. This allows programmers to use encodings known by the system\textquotesingle{}s {\ttfamily iconv} library but unknown to Qore. For such encodings, Qore will assume that the strings are backwards compatible with {\ttfamily ASCII}, meaning that one character is represented by one byte and that the strings are null-\/terminated.

Note that when Qore matches an encoding name to a code or alias in the following table, the comparison is not case-\/sensitive.\hypertarget{character_encoding_known_encodings}{}\doxysection{Character Encodings Known to Qore}\label{character_encoding_known_encodings}
 \tabulinesep=1mm
\begin{longtabu}spread 0pt [c]{*{3}{|X[-1]}|}
\hline
{\bfseries{Code}} &{\bfseries{Aliases}} &{\bfseries{Description}}  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/1}  &{\ttfamily ISO88591}, {\ttfamily ISO8859-\/1}, {\ttfamily ISO-\/88591}, {\ttfamily ISO8859\+P1}, {\ttfamily ISO81}, {\ttfamily LATIN1}, {\ttfamily LATIN-\/1}  &latin-\/1, Western European character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/2}  &{\ttfamily ISO88592}, {\ttfamily ISO8859-\/2}, {\ttfamily ISO-\/88592}, {\ttfamily ISO8859\+P2}, {\ttfamily ISO82}, {\ttfamily LATIN2}, {\ttfamily LATIN-\/2}  &latin-\/2, Central European character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/3}  &{\ttfamily ISO88593}, {\ttfamily ISO8859-\/3}, {\ttfamily ISO-\/88593}, {\ttfamily ISO8859\+P3}, {\ttfamily ISO83}, {\ttfamily LATIN3}, {\ttfamily LATIN-\/3}  &latin-\/3, Southern European character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/4}  &{\ttfamily ISO88594}, {\ttfamily ISO8859-\/4}, {\ttfamily ISO-\/88594}, {\ttfamily ISO8859\+P4}, {\ttfamily ISO84}, {\ttfamily LATIN4}, {\ttfamily LATIN-\/4}  &latin-\/4, Northern European character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/5}  &{\ttfamily ISO88595}, {\ttfamily ISO8859-\/5}, {\ttfamily ISO-\/88595}, {\ttfamily ISO8859\+P5}, {\ttfamily ISO85}  &Cyrillic character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/6}  &{\ttfamily ISO88596}, {\ttfamily ISO8859-\/6}, {\ttfamily ISO-\/88596}, {\ttfamily ISO8859\+P6}, {\ttfamily ISO86}  &Arabic character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/7}  &{\ttfamily ISO88597}, {\ttfamily ISO8859-\/7}, {\ttfamily ISO-\/88597}, {\ttfamily ISO8859\+P7}, {\ttfamily ISO87}  &Greek character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/8}  &{\ttfamily ISO88598}, {\ttfamily ISO8859-\/8}, {\ttfamily ISO-\/88598}, {\ttfamily ISO8859\+P8}, {\ttfamily ISO88}  &Hebrew character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/9}  &{\ttfamily ISO88599}, {\ttfamily ISO8859-\/9}, {\ttfamily ISO-\/88599}, {\ttfamily ISO8859\+P9}, {\ttfamily ISO89}, {\ttfamily LATIN5}, {\ttfamily LATIN-\/5}  &latin-\/5, Turkish character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/10}  &{\ttfamily ISO885910}, {\ttfamily ISO8859-\/10}, {\ttfamily ISO-\/885910}, {\ttfamily ISO8859\+P10}, {\ttfamily ISO810}, {\ttfamily LATIN6}, {\ttfamily LATIN-\/6}  &latin-\/6, Nordic character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/11}  &{\ttfamily ISO885911}, {\ttfamily ISO8859-\/11}, {\ttfamily ISO-\/885911}, {\ttfamily ISO8859\+P11}, {\ttfamily ISO811}  &Thai character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/13}  &{\ttfamily ISO885913}, {\ttfamily ISO8859-\/13}, {\ttfamily ISO-\/885913}, {\ttfamily ISO8859\+P13}, {\ttfamily ISO813}, {\ttfamily LATIN7}, {\ttfamily LATIN-\/7}  &latin-\/7, Baltic rim character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/14}  &{\ttfamily ISO885914}, {\ttfamily ISO8859-\/14}, {\ttfamily ISO-\/885914}, {\ttfamily ISO8859\+P14}, {\ttfamily ISO814}, {\ttfamily LATIN8}, {\ttfamily LATIN-\/8}  &latin-\/8, Celtic character set  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/15}  &{\ttfamily ISO885915}, {\ttfamily ISO8859-\/15}, {\ttfamily ISO-\/885915}, {\ttfamily ISO8859\+P15}, {\ttfamily ISO815}, {\ttfamily LATIN9}, {\ttfamily LATIN-\/9}  &latin-\/9, Western European with euro symbol  \\\cline{1-3}
{\ttfamily ISO-\/8859-\/16}  &{\ttfamily ISO885916}, {\ttfamily ISO8859-\/16}, {\ttfamily ISO-\/885916}, {\ttfamily ISO8859\+P16}, {\ttfamily ISO816}, {\ttfamily LATIN10}, {\ttfamily LATIN-\/10}  &latin-\/10, Southeast European character set  \\\cline{1-3}
{\ttfamily KOI7}  &n/a &Russian\+: Kod Obmena Informatsiey, 7 bit characters  \\\cline{1-3}
{\ttfamily KOI8-\/R}  &{\ttfamily KOI8R}  &Russian\+: Kod Obmena Informatsiey, 8 bit  \\\cline{1-3}
{\ttfamily KOI8-\/U}  &{\ttfamily KOI8U}  &Ukrainian\+: Kod Obmena Informatsiey, 8 bit  \\\cline{1-3}
{\ttfamily US-\/\+ASCII}  &{\ttfamily ASCII}, {\ttfamily USASCII}  &7-\/bit ASCII character set  \\\cline{1-3}
{\ttfamily UTF-\/8}  &{\ttfamily UTF8}  &variable-\/width universal character set  \\\cline{1-3}
{\ttfamily UTF-\/16}  &{\ttfamily UTF16}  &variable-\/width universal character set based on a fundamental 2-\/byte character encoding; not backwards-\/compatible with ASCII and therefore not supported universally in Qore; it\textquotesingle{}s recommended to convert these strings to UTF-\/8 in Qore; do not use UTF-\/16 as the default character encoding in Qore  \\\cline{1-3}
{\ttfamily UTF-\/16\+BE}  &{\ttfamily UTF16\+BE}  &variable-\/width universal character set based on a fundamental 2-\/byte character encoding with big-\/endian encoding; not backwards-\/compatible with ASCII and therefore not supported universally in Qore; it\textquotesingle{}s recommended to convert these strings to UTF-\/8 in Qore; do not use UTF-\/16\+BE as the default character encoding in Qore  \\\cline{1-3}
{\ttfamily UTF-\/16\+LE}  &{\ttfamily UTF16\+LE}  &variable-\/width universal character set based on a fundamental 2-\/byte character encoding with little-\/endian encoding; not backwards-\/compatible with ASCII and therefore not supported universally in Qore; it\textquotesingle{}s recommended to convert these strings to UTF-\/8 in Qore; do not use UTF-\/16\+LE as the default character encoding in Qore  \\\cline{1-3}
{\ttfamily WINDOWS-\/874}  &{\ttfamily WINDOWS874}, {\ttfamily CP-\/874}, {\ttfamily CP874}  &Windows 874\+: character encoding for Latin/\+Thai, very similar to ISO-\/8859-\/11  \\\cline{1-3}
{\ttfamily WINDOWS-\/936}  &{\ttfamily WINDOWS936}, {\ttfamily CP-\/936}, {\ttfamily CP936}  &Windows 936\+: character encoding for simplified Chinese  \\\cline{1-3}
{\ttfamily WINDOWS-\/1250}  &{\ttfamily WINDOWS1250}, {\ttfamily CP-\/1250}, {\ttfamily CP1250}  &Windows 1250\+: character encoding for Central/\+Eastern European languages  \\\cline{1-3}
{\ttfamily WINDOWS-\/1251}  &{\ttfamily WINDOWS1251}, {\ttfamily CP-\/1251}, {\ttfamily CP1251}  &Windows 1251\+: character encoding for Cyrillic\+: Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Macedonian, and others  \\\cline{1-3}
{\ttfamily WINDOWS-\/1252}  &{\ttfamily WINDOWS1252}, {\ttfamily CP-\/1252}, {\ttfamily CP1252}  &Windows 1252\+: character encoding for Western European languages\+: Spanish, French, German  \\\cline{1-3}
{\ttfamily WINDOWS-\/1253}  &{\ttfamily WINDOWS1253}, {\ttfamily CP-\/1253}, {\ttfamily CP1253}  &Windows 1253\+: character encoding for Greek  \\\cline{1-3}
{\ttfamily WINDOWS-\/1254}  &{\ttfamily WINDOWS1254}, {\ttfamily CP-\/1254}, {\ttfamily CP1254}  &Windows 1254\+: character encoding for Turkish  \\\cline{1-3}
{\ttfamily WINDOWS-\/1255}  &{\ttfamily WINDOWS1255}, {\ttfamily CP-\/1255}, {\ttfamily CP1255}  &Windows 1255\+: character encoding for Hebrew  \\\cline{1-3}
{\ttfamily WINDOWS-\/1256}  &{\ttfamily WINDOWS1256}, {\ttfamily CP-\/1256}, {\ttfamily CP1256}  &Windows 1256\+: character encoding for Arabic  \\\cline{1-3}
{\ttfamily WINDOWS-\/1257}  &{\ttfamily WINDOWS1257}, {\ttfamily CP-\/1257}, {\ttfamily CP1257}  &Windows 1257\+: character encoding for Baltic languages  \\\cline{1-3}
{\ttfamily WINDOWS-\/1258}  &{\ttfamily WINDOWS1258}, {\ttfamily CP-\/1258}, {\ttfamily CP1258}  &Windows 1258\+: character encoding for Vietnamese  \\\cline{1-3}
\end{longtabu}
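
The name matching described for this table (case-insensitive lookup of codes and aliases, with unknown names accepted as given) can be sketched as follows. This is a Python illustration under stated assumptions, not Qore\textquotesingle{}s actual implementation: the table excerpt, the {\ttfamily LOOKUP} dictionary, and the function name {\ttfamily resolve\_encoding} are all hypothetical.

```python
# Hypothetical excerpt of the table above: code -> aliases.
ALIASES = {
    "ISO-8859-1": ["ISO88591", "ISO8859-1", "ISO-88591", "ISO8859P1",
                   "ISO81", "LATIN1", "LATIN-1"],
    "US-ASCII": ["ASCII", "USASCII"],
    "UTF-8": ["UTF8"],
    "KOI8-R": ["KOI8R"],
}

# Build a case-insensitive lookup from every code and alias to its code.
LOOKUP = {}
for code, aliases in ALIASES.items():
    for name in [code, *aliases]:
        LOOKUP[name.upper()] = code


def resolve_encoding(name: str) -> str:
    """Return the known code for a name, or the name unchanged if unknown."""
    return LOOKUP.get(name.upper(), name)


print(resolve_encoding("latin1"))  # prints: ISO-8859-1
print(resolve_encoding("EUC-JP"))  # prints: EUC-JP (unknown, passed through)
```

Returning an unknown name unchanged mirrors the behaviour described in the overview, where such a name is handed to the conversion library as given.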
\hypertarget{character_encoding_utf16_in_qore}{}\doxysubsection{UTF-\/16 Support in Qore}\label{character_encoding_utf16_in_qore}
UTF-\/16 is currently not well supported in Qore because Qore\textquotesingle{}s string support is based on the assumption that all strings are backwards-\/compatible with ASCII. UTF-\/16 is not, due to its minimum 2-\/byte character width and the possibility of embedded null bytes.

It\textquotesingle{}s possible to generate string data in UTF-\/16 encoding (using \mbox{\hyperlink{group__string__functions_gab1555ebefd40327741bb93177a6c8ec0}{Qore\+::convert\+\_\+encoding()}}); note, however, that all strings so generated will have a BOM (byte order mark) at the beginning of the string data (this is performed by libiconv).
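
The BOM and the embedded null bytes can be seen in a small sketch. It uses Python\textquotesingle{}s own UTF-16 codec rather than libiconv or Qore, so it only illustrates the byte layout under that assumption; the variable names are illustrative.

```python
import codecs

text = "Qore"

with_bom = text.encode("utf-16")       # generic UTF-16: a BOM is prepended
big_endian = text.encode("utf-16-be")  # explicit byte order: no BOM

# The first two bytes of the generic form are one of the two BOM values.
has_bom = with_bom[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# ASCII letters carry a zero high byte, so the data contains null bytes.
has_nulls = b"\x00" in big_endian

print(len(with_bom), len(big_endian), has_bom, has_nulls)
# prints: 10 8 True True
```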

The following classes support parsing UTF-\/16 data by converting it to UTF-\/8 and processing the UTF-\/8 data\+:
\begin{DoxyItemize}
\item \mbox{\hyperlink{class_qore_1_1_data_line_iterator}{Qore\+::\+Data\+Line\+Iterator}}
\item \mbox{\hyperlink{class_qore_1_1_file_line_iterator}{Qore\+::\+File\+Line\+Iterator}}
\end{DoxyItemize}

The following classes support processing UTF-\/16 data natively\+:
\begin{DoxyItemize}
\item \mbox{\hyperlink{class_qore_1_1_buffered_stream_reader}{Qore\+::\+Buffered\+Stream\+Reader}}
\item \mbox{\hyperlink{class_qore_1_1_input_stream_line_iterator}{Qore\+::\+Input\+Stream\+Line\+Iterator}}
\item \mbox{\hyperlink{class_qore_1_1_stream_reader}{Qore\+::\+Stream\+Reader}}
\end{DoxyItemize}

Many string operations on UTF-\/16 data will provide invalid results due to the embedded nulls.

\begin{DoxyRefDesc}{Bug}
\item[\mbox{\hyperlink{bug__bug000001}{Bug}}]With the exception of the classes above that explicitly support UTF-\/16 data, BOMs are ignored and all UTF-\/16 data is assumed to be big-\/endian; little-\/endian UTF-\/16-\/encoded data, even with a correct BOM, will not be processed correctly in Qore (in this case use the {\ttfamily UTF-\/16\+LE} encoding specifically)\end{DoxyRefDesc}
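
The effect of reading little-endian data as big-endian can be sketched as follows. This is a Python illustration of the byte-order problem in general, not a reproduction of Qore\textquotesingle{}s internals; the sample string is an assumption.

```python
text = "Hi"

little_endian = text.encode("utf-16-le")  # bytes: 48 00 69 00

# Decoding with the wrong byte order succeeds but yields other characters.
misread = little_endian.decode("utf-16-be")

# Naming the byte order explicitly recovers the original text.
correct = little_endian.decode("utf-16-le")

print([hex(ord(c)) for c in misread])  # prints: ['0x4800', '0x6900']
print(correct)                         # prints: Hi
```

No error is raised in the misread case, which is why the wrong byte order can go unnoticed until the text is inspected.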
\hypertarget{character_encoding_default_encoding}{}\doxysection{Default Character Encoding}\label{character_encoding_default_encoding}
The default character encoding for Qore is determined by environment variables.

First, the {\ttfamily QORE\+\_\+\+CHARSET} environment variable is checked. If it is set, then this character encoding will be the default character encoding for the process. If not, then the {\ttfamily LANG} environment variable is checked. If a character encoding is specified in the {\ttfamily LANG} environment variable, then it will be used as the default character encoding. Otherwise, if no character encoding can be derived from the environment, {\ttfamily UTF-\/8} is assumed.
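
The lookup order above can be sketched as a small function. This is a Python illustration, not Qore\textquotesingle{}s implementation. Two details are assumptions that the text does not state: that {\ttfamily LANG} follows the common {\ttfamily language\_territory.codeset@modifier} layout, and that an empty variable counts as unset. The function name {\ttfamily default\_encoding} is hypothetical.

```python
import os


def default_encoding(env=None) -> str:
    """Resolve a default encoding from QORE_CHARSET, then LANG, then UTF-8."""
    env = os.environ if env is None else env

    # 1. QORE_CHARSET wins when set (assumption: empty counts as unset).
    charset = env.get("QORE_CHARSET")
    if charset:
        return charset

    # 2. Assumed LANG layout: language[_territory][.codeset][@modifier]
    lang = env.get("LANG", "")
    if "." in lang:
        codeset = lang.split(".", 1)[1].split("@", 1)[0]
        if codeset:
            return codeset

    # 3. Nothing could be derived from the environment.
    return "UTF-8"


print(default_encoding({"QORE_CHARSET": "ISO-8859-2", "LANG": "en_US.UTF-8"}))
# prints: ISO-8859-2
print(default_encoding({"LANG": "de_DE.ISO-8859-15@euro"}))
# prints: ISO-8859-15
print(default_encoding({"LANG": "C"}))
# prints: UTF-8
```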

Character encodings are automatically converted by the Qore language when necessary. Encoding conversion errors will cause a Qore exception to be thrown. The character encoding conversions supported by Qore depend on the operating system\textquotesingle{}s {\ttfamily iconv} library function.
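
A conversion fails when the target encoding has no representation for a character in the source string. The sketch below shows such a failure in Python, as an analogy only: the exception type is Python\textquotesingle{}s, not the Qore exception, and the helper name {\ttfamily to\_latin1} is hypothetical.

```python
def to_latin1(text: str) -> bytes:
    """Convert text to ISO-8859-1, raising if a character has no mapping."""
    return text.encode("iso-8859-1")


ok = to_latin1("caf\u00e9")  # e-acute exists in ISO-8859-1

try:
    to_latin1("5 \u20ac")    # the euro sign does not exist in ISO-8859-1
    failed = False
except UnicodeEncodeError:
    failed = True

print(ok, failed)  # prints: b'caf\xe9' True
```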

\begin{DoxyNote}{Note}
The get\+\_\+default\+\_\+encoding() function will return the default encoding for the Qore process.
\end{DoxyNote}
\hypertarget{character_encoding_encoding_examples}{}\doxysection{Character Encoding Usage Examples}\label{character_encoding_encoding_examples}
The following is a non-\/exhaustive list of examples in Qore where character encoding processing is performed.

Character encoding conversions can be performed explicitly with the convert\+\_\+encoding() function, and the encoding attached to a string can be checked with the get\+\_\+encoding() function. If you have a string with an incorrect encoding tag and want to change the tag (without changing the actual bytes of the string), use the force\+\_\+encoding() function.
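
The distinction between converting a string and merely retagging it can be modelled with a small sketch. It is written in Python and does not call the Qore functions: the {\ttfamily TaggedString} class and the {\ttfamily convert} and {\ttfamily force} helpers are hypothetical stand-ins that mimic the described behaviour of convert\_encoding() and force\_encoding().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Hypothetical model of a string: raw bytes plus an encoding tag."""
    data: bytes
    encoding: str


def convert(s: TaggedString, target: str) -> TaggedString:
    """Re-encode the bytes so the same characters are stored in `target`."""
    return TaggedString(s.data.decode(s.encoding).encode(target), target)


def force(s: TaggedString, target: str) -> TaggedString:
    """Change only the tag; the bytes are left untouched."""
    return TaggedString(s.data, target)


original = TaggedString("\u00e9".encode("utf-8"), "utf-8")  # bytes c3 a9
converted = convert(original, "iso-8859-1")                 # bytes e9
retagged = force(original, "iso-8859-1")                    # bytes c3 a9

print(converted.data, retagged.data)  # prints: b'\xe9' b'\xc3\xa9'
```

After conversion the string still represents the same character; after retagging, the same two bytes are interpreted as two single-byte characters, which is only correct if the original tag was wrong.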

get\+\_\+default\+\_\+encoding() returns the default encoding for the Qore process.

The \mbox{\hyperlink{class_qore_1_1_s_q_l_1_1_datasource}{Qore\+::\+SQL\+::\+Datasource}}, \mbox{\hyperlink{class_qore_1_1_s_q_l_1_1_datasource_pool}{Qore\+::\+SQL\+::\+Datasource\+Pool}}, and \mbox{\hyperlink{class_qore_1_1_s_q_l_1_1_s_q_l_statement}{Qore\+::\+SQL\+::\+SQLStatement}} classes will translate character encodings to the encoding required by the database if necessary as well (this is actually the responsibility of the DBI driver for the database in question).

The \mbox{\hyperlink{class_qore_1_1_file}{Qore\+::\+File}} and \mbox{\hyperlink{class_qore_1_1_socket}{Qore\+::\+Socket}} classes translate character encodings to the encoding specified for the object if necessary, as well as tagging strings received or read with the object\textquotesingle{}s encoding.

The \mbox{\hyperlink{class_qore_1_1_h_t_t_p_client}{Qore\+::\+HTTPClient}} class will translate character encodings to the encoding specified for the object if necessary, as well as tag strings received with the object\textquotesingle{}s encoding. Additionally, if an HTTP server response specifies a specific encoding to use, the encoding of strings read from the server will be automatically set to this encoding as well. 
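
The fallback between a server-specified encoding and the object\textquotesingle{}s own encoding can be sketched as follows. This Python illustration assumes that the server announces its encoding through the {\ttfamily charset} parameter of a {\ttfamily Content-Type} header, which the text above does not state explicitly; the function name {\ttfamily response\_encoding} is hypothetical and is not part of the Qore API.

```python
from email.message import Message


def response_encoding(content_type: str, object_encoding: str) -> str:
    """Prefer the charset named in Content-Type, else the object's encoding."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased name, or None
    return charset.upper() if charset else object_encoding


print(response_encoding("text/html; charset=iso-8859-2", "UTF-8"))
# prints: ISO-8859-2
print(response_encoding("text/plain", "UTF-8"))
# prints: UTF-8
```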