ML Reference
mlConvertUTF.h File Reference

Internal conversion file between UTF-32, UTF-16, and UTF-8, adapted for usage in mlUtils.

#include "mlTypeDefs.h"

Macros

Some fundamental constants
#define UNI_REPLACEMENT_CHAR   (static_cast<UTF32>(0x0000FFFD))
 Used instead of invalid characters on lenient conversion.
 
#define UNI_MAX_BMP   (static_cast<UTF32>(0x0000FFFF))
 
#define UNI_MAX_UTF16   (static_cast<UTF32>(0x0010FFFF))
 
#define UNI_MAX_UTF32   (static_cast<UTF32>(0x7FFFFFFF))
 

Typedefs

typedef MLuint32 UTF32
 at least 32 bits
 
typedef MLuint16 UTF16
 at least 16 bits
 
typedef MLuint8 UTF8
 typically 8 bits
 

Enumerations

enum  ConversionResult { conversionOK , sourceExhausted , targetExhausted , sourceIllegal }
 Enum to describe conversion results.
 
enum  ConversionFlags { strictConversion = 0 , lenientConversion }
 Enum to describe conversion strictness.
 

Functions

Standard conversion routines.
ConversionResult ConvertUTF32toUTF16 (const UTF32 **sourceStart, const UTF32 *sourceEnd, UTF16 **targetStart, UTF16 *targetEnd, ConversionFlags flags)
 Converts UTF32 string to UTF16 string.
 
ConversionResult ConvertUTF16toUTF32 (const UTF16 **sourceStart, const UTF16 *sourceEnd, UTF32 **targetStart, UTF32 *targetEnd, ConversionFlags flags)
 Converts UTF16 string to UTF32 string.
 
ConversionResult ConvertUTF16toUTF8 (const UTF16 **sourceStart, const UTF16 *sourceEnd, UTF8 **targetStart, UTF8 *targetEnd, ConversionFlags flags)
 Converts UTF16 string to UTF8 string.
 
ConversionResult ConvertUTF8toUTF16 (const UTF8 **sourceStart, const UTF8 *sourceEnd, UTF16 **targetStart, UTF16 *targetEnd, ConversionFlags flags)
 Converts UTF8 string to UTF16 string.
 
ConversionResult ConvertUTF32toUTF8 (const UTF32 **sourceStart, const UTF32 *sourceEnd, UTF8 **targetStart, UTF8 *targetEnd, ConversionFlags flags)
 Converts UTF32 string to UTF8 string.
 
ConversionResult ConvertUTF8toUTF32 (const UTF8 **sourceStart, const UTF8 *sourceEnd, UTF32 **targetStart, UTF32 *targetEnd)
 Converts UTF8 string to UTF32 string.
 
MeVis Extra Code
ConversionResult ConvertUTF8toLatin1 (const UTF8 *sourceStart, char *targetStart, char *targetEnd, ConversionFlags flags)
 Converts a UTF-8 string to Latin-1 characters.
 
ConversionResult CalculateNumCharsInUTF8 (const UTF8 *sourceStart, unsigned int *length, ConversionFlags flags)
 Calculates the number of characters in a UTF-8 string (may be less than strlen because of multibyte encoded characters).
 
ConversionResult CalculateUTF16BufferSizeForUTF8 (const UTF8 *sourceStart, unsigned int *length, ConversionFlags flags)
 Calculates the number of UTF-16 code units required to hold the given UTF-8 string. The result may be less than strlen because of multibyte encoded characters, but more than the result of CalculateNumCharsInUTF8() because of UTF-16 surrogates.
 

Detailed Description

Internal conversion file between UTF-32, UTF-16, and UTF-8, adapted for usage in mlUtils.

Definition in file mlConvertUTF.h.

Macro Definition Documentation

◆ UNI_MAX_BMP

#define UNI_MAX_BMP   (static_cast<UTF32>(0x0000FFFF))

Definition at line 136 of file mlConvertUTF.h.

◆ UNI_MAX_UTF16

#define UNI_MAX_UTF16   (static_cast<UTF32>(0x0010FFFF))

Definition at line 137 of file mlConvertUTF.h.

◆ UNI_MAX_UTF32

#define UNI_MAX_UTF32   (static_cast<UTF32>(0x7FFFFFFF))

Definition at line 138 of file mlConvertUTF.h.

◆ UNI_REPLACEMENT_CHAR

#define UNI_REPLACEMENT_CHAR   (static_cast<UTF32>(0x0000FFFD))

Used instead of invalid characters on lenient conversion.

Definition at line 135 of file mlConvertUTF.h.
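
The range constants above split the code point space into three parts: values up to UNI_MAX_BMP fit into one 16-bit unit, values up to UNI_MAX_UTF16 need a surrogate pair, and anything larger cannot be written in UTF-16. The following is a minimal standalone sketch of that classification. The constants kMaxBmp and kMaxUtf16 and the helper utf16UnitsForCodePoint() are hypothetical stand-ins and not part of mlConvertUTF.h.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins that mirror the documented macro values; real code would
// use UNI_MAX_BMP and UNI_MAX_UTF16 from mlConvertUTF.h instead.
const std::uint32_t kMaxBmp   = 0x0000FFFFu;
const std::uint32_t kMaxUtf16 = 0x0010FFFFu;

// Hypothetical helper (not part of ML): number of UTF-16 code units needed
// for one code point, or 0 if the value cannot be encoded in UTF-16.
inline unsigned utf16UnitsForCodePoint(std::uint32_t cp)
{
  if (cp >= 0xD800u && cp <= 0xDFFFu) { return 0u; } // surrogate values are not characters
  if (cp <= kMaxBmp)   { return 1u; }                // fits into a single 16-bit unit
  if (cp <= kMaxUtf16) { return 2u; }                // needs a surrogate pair
  return 0u;                                         // beyond the UTF-16 range
}

// Usage sketch.
inline void utf16UnitsForCodePointExample()
{
  assert(utf16UnitsForCodePoint(0x0041u)   == 1u); // 'A'
  assert(utf16UnitsForCodePoint(0x1F600u)  == 2u); // outside the BMP
  assert(utf16UnitsForCodePoint(0x110000u) == 0u); // above UNI_MAX_UTF16
}
```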

Typedef Documentation

◆ UTF16

typedef MLuint16 UTF16

at least 16 bits

Definition at line 128 of file mlConvertUTF.h.

◆ UTF32

typedef MLuint32 UTF32

Note: This file has been adapted to the purpose of ML and MeVisLab usage.

That includes

  • changing file type to .cpp,
  • changes in documentation,
  • added tracing code,
  • checks for NULL pointers,
  • added exception handlers to get more stability and information in case of crashes,
  • changes of signed integer to unsigned if sign is not needed,
  • removal of warnings and
  • source code reformatting.

The following 4 definitions are compiler-specific. The C standard does not guarantee that wchar_t has at least 16 bits, so wchar_t is no less portable than unsigned short! All should be unsigned values to avoid sign extension during bit mask & shift operations.

at least 32 bits

Definition at line 127 of file mlConvertUTF.h.

◆ UTF8

typedef MLuint8 UTF8

typically 8 bits

Definition at line 129 of file mlConvertUTF.h.

Enumeration Type Documentation

◆ ConversionFlags

Enum to describe conversion strictness.

Enumerator
strictConversion 

Strict conversion, error on conversion problem.

lenientConversion 

Non-strict conversion, invalid characters are replaced by '?' or UNI_REPLACEMENT_CHAR.

Definition at line 154 of file mlConvertUTF.h.
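
To illustrate the difference between the two flags, here is a minimal standalone sketch of how a converter might treat a single invalid UTF-32 value. The types Strictness and CheckStatus and the function checkCodePoint() are hypothetical stand-ins; the actual ML routines may apply different or additional checks.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for ConversionFlags and for two ConversionResult values.
enum class Strictness  { Strict, Lenient };
enum class CheckStatus { Ok, Illegal };

// Mirrors the documented value of UNI_REPLACEMENT_CHAR.
const std::uint32_t kReplacementChar = 0x0000FFFDu;

// Hypothetical helper: validates one UTF-32 value in place.
// Strict mode reports the problem and leaves the value untouched;
// lenient mode substitutes the replacement character and carries on.
inline CheckStatus checkCodePoint(std::uint32_t& cp, Strictness mode)
{
  const bool isSurrogate = (cp >= 0xD800u && cp <= 0xDFFFu);
  const bool isTooLarge  = (cp > 0x0010FFFFu);
  if (!isSurrogate && !isTooLarge) { return CheckStatus::Ok; }
  if (mode == Strictness::Strict)  { return CheckStatus::Illegal; }
  cp = kReplacementChar;
  return CheckStatus::Ok;
}

// Usage sketch.
inline void checkCodePointExample()
{
  std::uint32_t strictValue = 0xD800u;
  assert(checkCodePoint(strictValue, Strictness::Strict) == CheckStatus::Illegal);

  std::uint32_t lenientValue = 0xD800u;
  assert(checkCodePoint(lenientValue, Strictness::Lenient) == CheckStatus::Ok);
  assert(lenientValue == kReplacementChar);
}
```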

◆ ConversionResult

Enum to describe conversion results.

Enumerator
conversionOK 

conversion successful

sourceExhausted 

partial character in source, but the end of the source was hit

targetExhausted 

insufficient room in target for conversion

sourceIllegal 

source sequence is illegal/malformed

Definition at line 144 of file mlConvertUTF.h.
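
Callers usually branch on the result code. The sketch below maps each enumerator to the description given above. The enum is a local copy of the documented enumerator list so that the example compiles on its own, and describeResult() is a hypothetical helper that is not part of mlConvertUTF.h.

```cpp
#include <cassert>
#include <string>

// Local copy of the documented enumerators; real code would include
// mlConvertUTF.h instead of redefining the type.
enum ConversionResult { conversionOK, sourceExhausted, targetExhausted, sourceIllegal };

// Hypothetical helper: turns a result code into a readable message.
inline std::string describeResult(ConversionResult result)
{
  switch (result) {
    case conversionOK:    return "conversion successful";
    case sourceExhausted: return "partial character at end of source";
    case targetExhausted: return "insufficient room in target";
    case sourceIllegal:   return "source sequence is illegal or malformed";
  }
  return "unknown result";
}

// Usage sketch.
inline void describeResultExample()
{
  assert(describeResult(conversionOK) == "conversion successful");
  assert(describeResult(targetExhausted) == "insufficient room in target");
}
```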

Function Documentation

◆ CalculateNumCharsInUTF8()

ConversionResult CalculateNumCharsInUTF8 ( const UTF8 * sourceStart,
unsigned int * length,
ConversionFlags flags )

Calculates the number of characters in a UTF-8 string (may be less than strlen because of multibyte encoded characters).

If any pointer is NULL, sourceIllegal is returned and, if length is non-NULL, *length is set to 0. The return value reports errors or success of conversion.
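
The reason the count can be smaller than strlen is that every UTF-8 continuation byte has the bit pattern 10xxxxxx and does not start a new character. The following standalone sketch counts code points on that basis. It assumes a NUL-terminated, well-formed input and performs no validation; countUtf8CodePoints() is a hypothetical helper and not the ML implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts code points in a NUL-terminated UTF-8 string
// by counting every byte that is not a continuation byte (10xxxxxx).
// Assumes well-formed input; malformed sequences are not detected.
inline std::size_t countUtf8CodePoints(const char* text)
{
  std::size_t count = 0;
  for (; *text != '\0'; ++text) {
    const unsigned char byte = static_cast<unsigned char>(*text);
    if ((byte & 0xC0u) != 0x80u) { ++count; }
  }
  return count;
}

// Usage sketch: U+00E4 occupies two bytes, so the string below has
// four bytes but only three code points.
inline void countUtf8CodePointsExample()
{
  assert(countUtf8CodePoints("\xC3\xA4" "bc") == 3u);
}
```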

◆ CalculateUTF16BufferSizeForUTF8()

ConversionResult CalculateUTF16BufferSizeForUTF8 ( const UTF8 * sourceStart,
unsigned int * length,
ConversionFlags flags )

Calculates the number of UTF-16 code units required to hold the given UTF-8 string. The result may be less than strlen because of multibyte encoded characters, but more than the result of CalculateNumCharsInUTF8() because of UTF-16 surrogates.

If any pointer is NULL, sourceIllegal is returned and, if length is non-NULL, *length is set to 0. The return value reports errors or success of conversion.
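
The difference to a plain character count comes from four-byte UTF-8 sequences: they encode code points outside the BMP, which need two UTF-16 units each. The following standalone sketch computes the unit count on that basis. It assumes a NUL-terminated, well-formed input, and it does not add room for a terminating null; whether the ML function does is not stated here. utf16UnitsForUtf8() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of UTF-16 code units needed for a
// NUL-terminated, well-formed UTF-8 string (terminator not included).
inline std::size_t utf16UnitsForUtf8(const char* text)
{
  std::size_t units = 0;
  for (; *text != '\0'; ++text) {
    const unsigned char byte = static_cast<unsigned char>(*text);
    if ((byte & 0xC0u) == 0x80u) { continue; } // continuation byte, already counted
    units += (byte >= 0xF0u) ? 2u : 1u;        // four-byte lead needs a surrogate pair
  }
  return units;
}

// Usage sketch: 'a' needs one unit, U+1F600 needs a surrogate pair.
inline void utf16UnitsForUtf8Example()
{
  assert(utf16UnitsForUtf8("a" "\xF0\x9F\x98\x80") == 3u);
}
```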

◆ ConvertUTF16toUTF32()

ConversionResult ConvertUTF16toUTF32 ( const UTF16 ** sourceStart,
const UTF16 * sourceEnd,
UTF32 ** targetStart,
UTF32 * targetEnd,
ConversionFlags flags )

Converts UTF16 string to UTF32 string.

For detailed parameter and function description see header comments in ConvertUTF.h.

◆ ConvertUTF16toUTF8()

ConversionResult ConvertUTF16toUTF8 ( const UTF16 ** sourceStart,
const UTF16 * sourceEnd,
UTF8 ** targetStart,
UTF8 * targetEnd,
ConversionFlags flags )

Converts UTF16 string to UTF8 string.

For detailed parameter and function description see header comments in ConvertUTF.h.

◆ ConvertUTF32toUTF16()

ConversionResult ConvertUTF32toUTF16 ( const UTF32 ** sourceStart,
const UTF32 * sourceEnd,
UTF16 ** targetStart,
UTF16 * targetEnd,
ConversionFlags flags )

Converts UTF32 string to UTF16 string.

For detailed parameter and function description see header comments in ConvertUTF.h.
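
The pointer-to-pointer parameters serve as in/out arguments. In the upstream Unicode ConvertUTF.h, the end pointers point one past the last element, and *sourceStart and *targetStart are advanced past the text that was converted successfully; the sketch below assumes that the ML version keeps this convention. sketchUTF32toUTF16() is a simplified, always-strict stand-in with the same parameter shape, written only to show the calling pattern. It is not the ML implementation.

```cpp
#include <cassert>
#include <cstdint>

enum SketchResult { sketchOK, sketchTargetExhausted, sketchSourceIllegal };

// Stand-in with the parameter shape of ConvertUTF32toUTF16(), minus the flags.
inline SketchResult sketchUTF32toUTF16(const std::uint32_t** sourceStart,
                                       const std::uint32_t*  sourceEnd,
                                       std::uint16_t**       targetStart,
                                       std::uint16_t*        targetEnd)
{
  assert(sourceStart && *sourceStart && sourceEnd);
  assert(targetStart && *targetStart && targetEnd);
  const std::uint32_t* src = *sourceStart;
  std::uint16_t*       dst = *targetStart;
  SketchResult result = sketchOK;
  while (src < sourceEnd) {
    const std::uint32_t cp = *src;
    if ((cp >= 0xD800u && cp <= 0xDFFFu) || cp > 0x10FFFFu) {
      result = sketchSourceIllegal;
      break;
    }
    const int units = (cp <= 0xFFFFu) ? 1 : 2;
    if (targetEnd - dst < units) {
      result = sketchTargetExhausted;
      break;
    }
    if (units == 1) {
      *dst++ = static_cast<std::uint16_t>(cp);
    } else {
      const std::uint32_t offset = cp - 0x10000u;  // 20 significant bits
      *dst++ = static_cast<std::uint16_t>(0xD800u + (offset >> 10));    // high surrogate
      *dst++ = static_cast<std::uint16_t>(0xDC00u + (offset & 0x3FFu)); // low surrogate
    }
    ++src;
  }
  // Report progress back to the caller through the in/out pointers.
  *sourceStart = src;
  *targetStart = dst;
  return result;
}

// Usage sketch: keep the buffer starts, pass copies that get advanced.
inline void sketchUTF32toUTF16Example()
{
  const std::uint32_t input[] = { 0x41u, 0x1F600u };
  std::uint16_t output[3] = { 0, 0, 0 };
  const std::uint32_t* src = input;
  std::uint16_t*       dst = output;
  const SketchResult result = sketchUTF32toUTF16(&src, input + 2, &dst, output + 3);
  assert(result == sketchOK);
  assert(dst - output == 3); // number of UTF-16 units written
}
```

Because the start pointers are modified, a caller that needs the beginning of the buffers afterwards keeps separate copies, as the usage function does with input and output.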

◆ ConvertUTF32toUTF8()

ConversionResult ConvertUTF32toUTF8 ( const UTF32 ** sourceStart,
const UTF32 * sourceEnd,
UTF8 ** targetStart,
UTF8 * targetEnd,
ConversionFlags flags )

Converts UTF32 string to UTF8 string.

For detailed parameter and function description see header comments in ConvertUTF.h.

◆ ConvertUTF8toLatin1()

ConversionResult ConvertUTF8toLatin1 ( const UTF8 * sourceStart,
char * targetStart,
char * targetEnd,
ConversionFlags flags )

Converts a UTF-8 string to Latin-1 characters.

If sourceStart is NULL then sourceIllegal is returned. If targetStart or targetEnd is NULL then targetExhausted is returned. The return value reports errors or success of conversion.
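
Latin-1 covers only U+0000 to U+00FF, so in UTF-8 terms only single bytes below 0x80 and two-byte sequences with lead byte 0xC2 or 0xC3 can be represented. The following standalone sketch shows a lenient-style mapping that writes '?' for everything else. utf8ToLatin1Lenient() is a hypothetical helper built on std::string; it is not the ML implementation, which works on raw buffers and reports a ConversionResult.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: lenient-style UTF-8 to Latin-1 mapping.
inline std::string utf8ToLatin1Lenient(const std::string& utf8)
{
  std::string latin1;
  std::size_t i = 0;
  while (i < utf8.size()) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    if (lead < 0x80u) {                       // ASCII maps to itself
      latin1 += static_cast<char>(lead);
      ++i;
      continue;
    }
    if ((lead == 0xC2u || lead == 0xC3u) && i + 1 < utf8.size()) {
      const unsigned char next = static_cast<unsigned char>(utf8[i + 1]);
      if ((next & 0xC0u) == 0x80u) {          // U+0080..U+00FF
        latin1 += static_cast<char>(((lead & 0x1Fu) << 6) | (next & 0x3Fu));
        i += 2;
        continue;
      }
    }
    // Not representable or malformed: emit one '?' and skip the lead byte
    // together with any continuation bytes that follow it.
    latin1 += '?';
    ++i;
    while (i < utf8.size() &&
           (static_cast<unsigned char>(utf8[i]) & 0xC0u) == 0x80u) {
      ++i;
    }
  }
  return latin1;
}

// Usage sketch: U+00E4 becomes the single byte 0xE4, U+20AC becomes '?'.
inline void utf8ToLatin1LenientExample()
{
  assert(utf8ToLatin1Lenient("\xC3\xA4") == std::string("\xE4"));
  assert(utf8ToLatin1Lenient("a" "\xE2\x82\xAC" "b") == "a?b");
}
```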

◆ ConvertUTF8toUTF16()

ConversionResult ConvertUTF8toUTF16 ( const UTF8 ** sourceStart,
const UTF8 * sourceEnd,
UTF16 ** targetStart,
UTF16 * targetEnd,
ConversionFlags flags )

Converts UTF8 string to UTF16 string.

For detailed parameter and function description see header comments in ConvertUTF.h.

◆ ConvertUTF8toUTF32()

ConversionResult ConvertUTF8toUTF32 ( const UTF8 ** sourceStart,
const UTF8 * sourceEnd,
UTF32 ** targetStart,
UTF32 * targetEnd )

Converts UTF8 string to UTF32 string.

For detailed parameter and function description see header comments in ConvertUTF.h.