ML Reference
mlConvertUTF.h File Reference

Internal conversion routines between UTF-32, UTF-16, and UTF-8, adapted for use in mlUtils.

#include "mlTypeDefs.h"


Macros

Some fundamental constants
#define UNI_REPLACEMENT_CHAR   (static_cast<UTF32>(0x0000FFFD))
 Used instead of invalid characters on lenient conversion.
 
#define UNI_MAX_BMP   (static_cast<UTF32>(0x0000FFFF))
 
#define UNI_MAX_UTF16   (static_cast<UTF32>(0x0010FFFF))
 
#define UNI_MAX_UTF32   (static_cast<UTF32>(0x7FFFFFFF))
 

Typedefs

typedef MLuint32 UTF32
 at least 32 bits
 
typedef MLuint16 UTF16
 at least 16 bits
 
typedef MLuint8 UTF8
 typically 8 bits
 

Enumerations

enum  ConversionResult { conversionOK , sourceExhausted , targetExhausted , sourceIllegal }
 Enum to describe conversion results.
 
enum  ConversionFlags { strictConversion = 0 , lenientConversion }
 Enum to describe conversion strictness.
 

Functions

Standard conversion routines.
ConversionResult ConvertUTF32toUTF16 (const UTF32 **sourceStart, const UTF32 *sourceEnd, UTF16 **targetStart, UTF16 *targetEnd, ConversionFlags flags)
 Converts UTF-32 string to UTF-16 string.
 
ConversionResult ConvertUTF16toUTF32 (const UTF16 **sourceStart, const UTF16 *sourceEnd, UTF32 **targetStart, UTF32 *targetEnd, ConversionFlags flags)
 Converts UTF-16 string to UTF-32 string.
 
ConversionResult ConvertUTF16toUTF8 (const UTF16 **sourceStart, const UTF16 *sourceEnd, UTF8 **targetStart, UTF8 *targetEnd, ConversionFlags flags)
 Converts UTF-16 string to UTF-8 string.
 
ConversionResult ConvertUTF8toUTF16 (const UTF8 **sourceStart, const UTF8 *sourceEnd, UTF16 **targetStart, UTF16 *targetEnd, ConversionFlags flags)
 Converts UTF-8 string to UTF-16 string.
 
ConversionResult ConvertUTF32toUTF8 (const UTF32 **sourceStart, const UTF32 *sourceEnd, UTF8 **targetStart, UTF8 *targetEnd, ConversionFlags flags)
 Converts UTF-32 string to UTF-8 string.
 
ConversionResult ConvertUTF8toUTF32 (const UTF8 **sourceStart, const UTF8 *sourceEnd, UTF32 **targetStart, UTF32 *targetEnd)
 Converts UTF-8 string to UTF-32 string.
 
MeVis Extra Code
ConversionResult ConvertUTF8toLatin1 (const UTF8 *sourceStart, char *targetStart, char *targetEnd, ConversionFlags flags)
 Converts a UTF-8 string to Latin-1 characters.
 
ConversionResult CalculateNumCharsInUTF8 (const UTF8 *sourceStart, unsigned int *length, ConversionFlags flags)
 Calculates the number of characters (code points) in a UTF-8 string. This may be less than strlen because of multibyte-encoded characters.
 
ConversionResult CalculateUTF16BufferSizeForUTF8 (const UTF8 *sourceStart, unsigned int *length, ConversionFlags flags)
 Calculates the number of UTF-16 code units required to hold the given UTF-8 string. This may be less than strlen because of multibyte-encoded characters, but more than the result of CalculateNumCharsInUTF8() because of UTF-16 surrogates.
 

Detailed Description

Internal conversion routines between UTF-32, UTF-16, and UTF-8, adapted for use in mlUtils.

Definition in file mlConvertUTF.h.

Macro Definition Documentation

◆ UNI_MAX_BMP

#define UNI_MAX_BMP   (static_cast<UTF32>(0x0000FFFF))

Definition at line 136 of file mlConvertUTF.h.

◆ UNI_MAX_UTF16

#define UNI_MAX_UTF16   (static_cast<UTF32>(0x0010FFFF))

Definition at line 137 of file mlConvertUTF.h.

◆ UNI_MAX_UTF32

#define UNI_MAX_UTF32   (static_cast<UTF32>(0x7FFFFFFF))

Definition at line 138 of file mlConvertUTF.h.

◆ UNI_REPLACEMENT_CHAR

#define UNI_REPLACEMENT_CHAR   (static_cast<UTF32>(0x0000FFFD))

Used instead of invalid characters on lenient conversion.

Definition at line 135 of file mlConvertUTF.h.
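
The constants above partition the code point range: values up to UNI_MAX_BMP fit in one UTF-16 unit, values up to UNI_MAX_UTF16 need a surrogate pair, and anything larger cannot be expressed in UTF-16. The following is a minimal sketch of that partition, not code from mlConvertUTF.h. The constants kReplacementChar, kMaxBmp, and kMaxUtf16 are local stand-ins for the macros, and the helpers utf16UnitsFor() and replaceIfOutOfRange() are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins mirroring the macro values documented above
// (the real macros are defined in mlConvertUTF.h).
constexpr std::uint32_t kReplacementChar = 0x0000FFFDu;  // UNI_REPLACEMENT_CHAR
constexpr std::uint32_t kMaxBmp          = 0x0000FFFFu;  // UNI_MAX_BMP
constexpr std::uint32_t kMaxUtf16        = 0x0010FFFFu;  // UNI_MAX_UTF16

// Hypothetical helper: number of UTF-16 code units needed for one code point.
// Returns 0 for values beyond the UTF-16 range. Surrogate values
// (0xD800..0xDFFF) are not rejected here.
inline unsigned utf16UnitsFor(std::uint32_t cp)
{
  if (cp <= kMaxBmp)   { return 1; }  // fits in a single 16-bit unit
  if (cp <= kMaxUtf16) { return 2; }  // needs a surrogate pair
  return 0;                           // not representable in UTF-16
}

// Hypothetical helper: lenient-style substitution for out-of-range values.
inline std::uint32_t replaceIfOutOfRange(std::uint32_t cp)
{
  return (cp <= kMaxUtf16) ? cp : kReplacementChar;
}

// Small usage example.
inline void utf16UnitsExample()
{
  assert(utf16UnitsFor(0x0041u) == 1u);    // 'A'
  assert(utf16UnitsFor(0x1F600u) == 2u);   // outside the BMP
  assert(replaceIfOutOfRange(0x110000u) == kReplacementChar);
}
```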

Typedef Documentation

◆ UTF16

typedef MLuint16 UTF16

at least 16 bits

Definition at line 128 of file mlConvertUTF.h.

◆ UTF32

typedef MLuint32 UTF32

Note: This file has been adapted for use in ML and MeVisLab. The adaptations include

  • changing the file type to .cpp,
  • changes in documentation,
  • added tracing code,
  • checks for NULL pointers,
  • added exception handlers to get more stability and information in case of crashes,
  • changing signed integers to unsigned where the sign is not needed,
  • removal of warnings, and
  • source code reformatting.

The following four definitions are compiler-specific. The C standard does not guarantee that wchar_t has at least 16 bits, so wchar_t is no less portable than unsigned short! All should be unsigned values to avoid sign extension during bit mask & shift operations.

at least 32 bits

Definition at line 127 of file mlConvertUTF.h.

◆ UTF8

typedef MLuint8 UTF8

typically 8 bits

Definition at line 129 of file mlConvertUTF.h.

Enumeration Type Documentation

◆ ConversionFlags

Enum to describe conversion strictness.

Enumerator
strictConversion 

Strict conversion, error on conversion problem.

lenientConversion 

Non-strict conversion; invalid characters are replaced with '?' or UNI_REPLACEMENT_CHAR.

Definition at line 154 of file mlConvertUTF.h.

◆ ConversionResult

Enum to describe conversion results.

Enumerator
conversionOK 

conversion successful

sourceExhausted 

partial character in source, but hit end

targetExhausted 

insufficient room in target for conversion

sourceIllegal 

source sequence is illegal/malformed

Definition at line 144 of file mlConvertUTF.h.

Function Documentation

◆ CalculateNumCharsInUTF8()

ConversionResult CalculateNumCharsInUTF8 ( const UTF8 *  sourceStart,
unsigned int *  length,
ConversionFlags  flags 
)

Calculates the number of characters (code points) in a UTF-8 string. This may be less than strlen because of multibyte-encoded characters.

If any pointer is NULL, then sourceIllegal is returned and *length is set to 0 if non-NULL. The return value reports errors or success of conversion.
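
To illustrate why the character count can be smaller than strlen, here is a minimal sketch that counts code points by skipping UTF-8 continuation bytes. It is not the ML implementation: countUtf8CodePoints() is a hypothetical helper, it assumes a NUL-terminated and well-formed input, and it performs none of the validation or NULL checks described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not the ML implementation: counts code points in a
// NUL-terminated UTF-8 string by skipping continuation bytes (10xxxxxx).
// Assumes well-formed input; no validation is done.
inline std::size_t countUtf8CodePoints(const unsigned char* s)
{
  std::size_t count = 0;
  for (; *s != 0; ++s) {
    if ((*s & 0xC0u) != 0x80u) { ++count; }  // ASCII byte or lead byte
  }
  return count;
}

// Small usage example.
inline void countUtf8CodePointsExample()
{
  // U+0041, U+00E4, U+20AC encoded as UTF-8: 1 + 2 + 3 bytes.
  const unsigned char text[] = { 0x41, 0xC3, 0xA4, 0xE2, 0x82, 0xAC, 0x00 };
  assert(std::strlen(reinterpret_cast<const char*>(text)) == 6);
  assert(countUtf8CodePoints(text) == 3);
}
```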

◆ CalculateUTF16BufferSizeForUTF8()

ConversionResult CalculateUTF16BufferSizeForUTF8 ( const UTF8 *  sourceStart,
unsigned int *  length,
ConversionFlags  flags 
)

Calculates the number of UTF-16 code units required to hold the given UTF-8 string. This may be less than strlen because of multibyte-encoded characters, but more than the result of CalculateNumCharsInUTF8() because of UTF-16 surrogates.

If any pointer is NULL, then sourceIllegal is returned and *length is set to 0 if non-NULL. The return value reports errors or success of conversion.
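
The relation between byte count, character count, and UTF-16 size can be sketched as follows. This is an illustration only, not the ML implementation: utf16UnitsForUtf8() is a hypothetical helper that assumes NUL-terminated, well-formed input, and whether the real function also counts a terminating unit is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the ML implementation: number of UTF-16 code units
// needed for a NUL-terminated, well-formed UTF-8 string (terminator excluded).
inline std::size_t utf16UnitsForUtf8(const unsigned char* s)
{
  std::size_t units = 0;
  for (; *s != 0; ++s) {
    if ((*s & 0xC0u) == 0x80u) { continue; }   // continuation byte, no new character
    // A 4-byte lead byte (11110xxx) encodes a code point above U+FFFF,
    // which needs a surrogate pair in UTF-16.
    units += ((*s & 0xF8u) == 0xF0u) ? 2 : 1;
  }
  return units;
}

// Small usage example.
inline void utf16UnitsForUtf8Example()
{
  // U+0041 followed by U+1F600: 5 bytes, 2 characters, 3 UTF-16 units.
  const unsigned char text[] = { 0x41, 0xF0, 0x9F, 0x98, 0x80, 0x00 };
  assert(utf16UnitsForUtf8(text) == 3);
}
```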

◆ ConvertUTF16toUTF32()

ConversionResult ConvertUTF16toUTF32 ( const UTF16 **  sourceStart,
const UTF16 *  sourceEnd,
UTF32 **  targetStart,
UTF32 *  targetEnd,
ConversionFlags  flags 
)

Converts UTF-16 string to UTF-32 string.

For detailed parameter and function description, see header comments in ConvertUTF.h.

◆ ConvertUTF16toUTF8()

ConversionResult ConvertUTF16toUTF8 ( const UTF16 **  sourceStart,
const UTF16 *  sourceEnd,
UTF8 **  targetStart,
UTF8 *  targetEnd,
ConversionFlags  flags 
)

Converts UTF-16 string to UTF-8 string.

For detailed parameter and function description, see header comments in ConvertUTF.h.

◆ ConvertUTF32toUTF16()

ConversionResult ConvertUTF32toUTF16 ( const UTF32 **  sourceStart,
const UTF32 *  sourceEnd,
UTF16 **  targetStart,
UTF16 *  targetEnd,
ConversionFlags  flags 
)

Converts UTF-32 string to UTF-16 string.

For detailed parameter and function description, see header comments in ConvertUTF.h.
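
The pointer-to-pointer parameters suggest a calling pattern in which the caller passes the addresses of its read and write cursors and inspects them after the call. The sketch below shows that pattern with a simplified stand-in converter. Everything inside the namespace sketch is an assumption made for illustration: it mirrors the documented types and result codes, omits the flags parameter (strict behavior only), and assumes that both cursors are advanced to the point where conversion stopped, which this page does not state explicitly.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Local stand-ins for the types documented on this page.
typedef std::uint32_t UTF32;
typedef std::uint16_t UTF16;
enum ConversionResult { conversionOK, sourceExhausted, targetExhausted, sourceIllegal };

// Simplified stand-in, not the ML implementation. Converts [*sourceStart,
// sourceEnd) into [*targetStart, targetEnd) and advances both cursors.
inline ConversionResult convertUTF32toUTF16(const UTF32** sourceStart, const UTF32* sourceEnd,
                                            UTF16** targetStart, UTF16* targetEnd)
{
  const UTF32* src = *sourceStart;
  UTF16* dst = *targetStart;
  ConversionResult result = conversionOK;
  while (src < sourceEnd) {
    const UTF32 cp = *src;
    // Strict behavior: reject surrogate values and values above 0x10FFFF.
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu)) { result = sourceIllegal; break; }
    const int units = (cp <= 0xFFFFu) ? 1 : 2;
    if (targetEnd - dst < units) { result = targetExhausted; break; }
    if (units == 1) {
      *dst++ = static_cast<UTF16>(cp);
    } else {
      const UTF32 v = cp - 0x10000u;                        // 20 payload bits
      *dst++ = static_cast<UTF16>(0xD800u + (v >> 10));     // high surrogate
      *dst++ = static_cast<UTF16>(0xDC00u + (v & 0x3FFu));  // low surrogate
    }
    ++src;
  }
  *sourceStart = src;  // first unconverted source element
  *targetStart = dst;  // one past the last written target element
  return result;
}

// Small usage example: the caller keeps cursors and checks the result code.
inline void convertExample()
{
  const UTF32 source[] = { 0x41u, 0x1F600u };
  UTF16 target[3] = { 0, 0, 0 };
  const UTF32* readPos = source;
  UTF16* writePos = target;
  const ConversionResult r = convertUTF32toUTF16(&readPos, source + 2, &writePos, target + 3);
  assert(r == conversionOK);
  assert(writePos - target == 3);  // one unit plus one surrogate pair
}

}  // namespace sketch
```

If the target buffer is too small, the stand-in returns targetExhausted and leaves the read cursor at the first element it could not convert, so a caller could enlarge the buffer and resume from there.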

◆ ConvertUTF32toUTF8()

ConversionResult ConvertUTF32toUTF8 ( const UTF32 **  sourceStart,
const UTF32 *  sourceEnd,
UTF8 **  targetStart,
UTF8 *  targetEnd,
ConversionFlags  flags 
)

Converts UTF-32 string to UTF-8 string.

For detailed parameter and function description, see header comments in ConvertUTF.h.

◆ ConvertUTF8toLatin1()

ConversionResult ConvertUTF8toLatin1 ( const UTF8 *  sourceStart,
char *  targetStart,
char *  targetEnd,
ConversionFlags  flags 
)

Converts a UTF-8 string to Latin-1 characters.

If sourceStart is NULL, then sourceIllegal is returned. If targetStart or targetEnd is NULL, then targetExhausted is returned. The return value reports errors or success of conversion.
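
Latin-1 covers only U+0000 to U+00FF, so a UTF-8 to Latin-1 conversion has to decide what to do with everything else. The following sketch shows one lenient-style approach that writes '?' for characters outside that range. It is not the ML implementation: utf8ToLatin1Lenient() is a hypothetical helper that assumes NUL-terminated input, returns a std::string instead of filling a caller-supplied buffer, and reports no result code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not the ML implementation: maps a NUL-terminated UTF-8
// string to Latin-1, writing '?' for anything outside U+0000..U+00FF.
inline std::string utf8ToLatin1Lenient(const unsigned char* s)
{
  std::string out;
  while (*s != 0) {
    const unsigned char b = *s;
    if (b < 0x80u) {
      out += static_cast<char>(b);  // ASCII maps to itself
      ++s;
    } else if ((b == 0xC2u || b == 0xC3u) && (s[1] & 0xC0u) == 0x80u) {
      // U+0080..U+00FF: two bytes of the form 110000xx 10xxxxxx.
      out += static_cast<char>(((b & 0x03u) << 6) | (s[1] & 0x3Fu));
      s += 2;
    } else {
      out += '?';                                // not representable in Latin-1
      ++s;
      while ((*s & 0xC0u) == 0x80u) { ++s; }     // skip its continuation bytes
    }
  }
  return out;
}

// Small usage example.
inline void utf8ToLatin1LenientExample()
{
  // U+0041, U+00E4, U+20AC: the last one has no Latin-1 equivalent.
  const unsigned char text[] = { 0x41, 0xC3, 0xA4, 0xE2, 0x82, 0xAC, 0x00 };
  const std::string latin1 = utf8ToLatin1Lenient(text);
  assert(latin1.size() == 3);
  assert(latin1[2] == '?');
}
```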

◆ ConvertUTF8toUTF16()

ConversionResult ConvertUTF8toUTF16 ( const UTF8 **  sourceStart,
const UTF8 *  sourceEnd,
UTF16 **  targetStart,
UTF16 *  targetEnd,
ConversionFlags  flags 
)

Converts UTF-8 string to UTF-16 string.

For detailed parameter and function description, see header comments in ConvertUTF.h.

◆ ConvertUTF8toUTF32()

ConversionResult ConvertUTF8toUTF32 ( const UTF8 **  sourceStart,
const UTF8 *  sourceEnd,
UTF32 **  targetStart,
UTF32 *  targetEnd 
)

Converts UTF-8 string to UTF-32 string.

For detailed parameter and function description, see header comments in ConvertUTF.h.