Generic codecvt facet for various stateless encodings to UTF-16 and UTF-32 using wchar_t, char32_t and char16_t.
template<typename CharType, typename CodecvtImpl, int CharSize = sizeof(CharType)>
class boost::locale::generic_codecvt< CharType, CodecvtImpl, CharSize >
Implementations should derive from this class, passing themselves as CodecvtImpl, and provide the following members:
- state_type - a type for an object that stores intermediate cached data, for example an iconv_t descriptor
- state_type initial_state(generic_codecvt_base::initial_convertion_state direction) const - member function that creates the initial state
- int max_encoding_length() const - the maximum length of the sequence that represents one Unicode code point; for example, it is 4 for UTF-8 and 1 for ISO-8859-1
- utf::code_point to_unicode(state_type& state, const char*& begin, const char* end) - extract the first code point from the text in the range [begin,end). On success, begin points to the next character sequence to be decoded. If the sequence is incomplete, utf::incomplete shall be returned. If the input sequence is invalid, utf::illegal shall be returned and begin shall remain unmodified
- utf::len_or_error from_unicode(state_type& state, utf::code_point u, char* begin, const char* end) - convert the Unicode code point u into a character sequence at [begin,end). Return the length of the sequence on success, utf::incomplete if there is not enough room to encode the code point, or utf::illegal if the conversion cannot be performed
For example, an implementation of codecvt for the latin1/ISO-8859-1 character set:
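#include <boost/locale/generic_codecvt.hpp>
#include <cstddef>

template<typename CharType>
class latin1_codecvt : public boost::locale::generic_codecvt<CharType, latin1_codecvt<CharType> >
{
public:
    // Standard codecvt constructor
    latin1_codecvt(std::size_t refs = 0) :
        boost::locale::generic_codecvt<CharType, latin1_codecvt<CharType> >(refs)
    {
    }

    // Latin-1 is stateless, so the state carries no data
    struct state_type {};

    state_type initial_state(boost::locale::generic_codecvt_base::initial_convertion_state /*unused*/) const
    {
        return state_type();
    }

    // Every code point that Latin-1 can represent takes exactly one byte
    int max_encoding_length() const
    {
        return 1;
    }

    boost::locale::utf::code_point to_unicode(state_type&, const char*& begin, const char* end) const
    {
        if(begin == end)
            return boost::locale::utf::incomplete;
        // Go through unsigned char so that bytes >= 0x80 are not sign-extended
        return static_cast<unsigned char>(*begin++);
    }

    boost::locale::utf::len_or_error from_unicode(state_type&, boost::locale::utf::code_point u,
                                                  char* begin, const char* end) const
    {
        if(u >= 256)
            return boost::locale::utf::illegal;    // not representable in Latin-1
        if(begin == end)
            return boost::locale::utf::incomplete; // no room in the output range
        *begin = static_cast<char>(u);
        return 1;
    }
};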
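The member contract listed above can be exercised without the facet machinery. The following is a minimal sketch that builds without Boost: code_point, len_or_error, illegal and incomplete are local stand-ins for the boost::locale::utf names (the sentinel values are an assumption), and latin1_sketch, decode_all and encode_all are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Local stand-ins for the boost::locale::utf names, so the sketch builds without Boost.
using code_point = std::uint32_t;
using len_or_error = code_point;
constexpr code_point illegal    = 0xFFFFFFFFu; // assumed sentinel value
constexpr code_point incomplete = 0xFFFFFFFEu; // assumed sentinel value

// Same member contract as the Latin-1 example, minus the facet base class.
struct latin1_sketch {
    struct state_type {};

    code_point to_unicode(state_type&, const char*& begin, const char* end) const
    {
        if(begin == end)
            return incomplete;
        return static_cast<unsigned char>(*begin++);
    }

    len_or_error from_unicode(state_type&, code_point u, char* begin, const char* end) const
    {
        if(u >= 256)
            return illegal;
        if(begin == end)
            return incomplete;
        *begin = static_cast<char>(u);
        return 1;
    }
};

// Hypothetical driver: decode bytes into code points until the input ends
// or the converter reports an error.
std::vector<code_point> decode_all(const std::string& in)
{
    latin1_sketch cvt;
    latin1_sketch::state_type st;
    std::vector<code_point> out;
    const char* p = in.data();
    const char* e = p + in.size();
    while(p != e) {
        code_point c = cvt.to_unicode(st, p, e);
        if(c == illegal || c == incomplete)
            break;
        out.push_back(c);
    }
    return out;
}

// Hypothetical driver: encode code points one at a time.
// Returns false as soon as a code point cannot be encoded.
bool encode_all(const std::vector<code_point>& in, std::string& out)
{
    latin1_sketch cvt;
    latin1_sketch::state_type st;
    out.clear();
    for(code_point u : in) {
        char buf[1]; // max_encoding_length() is 1 for Latin-1
        len_or_error n = cvt.from_unicode(st, u, buf, buf + sizeof(buf));
        if(n == illegal || n == incomplete)
            return false;
        assert(n == 1); // Latin-1 always encodes to exactly one byte
        out.append(buf, n);
    }
    return true;
}
```

Calling decode_all on the four bytes "caf\xE9" yields four code points ending in U+00E9, and encode_all restores the original bytes. A code point such as U+20AC has no Latin-1 representation, so encode_all returns false for it.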
When external tools are used for encoding conversion, state_type is useful for holding the objects that perform the conversion. For example, an ICU UConverter can be stored in the state so that it is reused efficiently:
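template<typename CharType>
class icu_codecvt : public boost::locale::generic_codecvt<CharType, icu_codecvt<CharType> >
{
public:
    // Standard codecvt constructor
    icu_codecvt(std::string const& name, std::size_t refs = 0) :
        boost::locale::generic_codecvt<CharType, icu_codecvt<CharType> >(refs)
    { ... }

    // The state owns a converter and closes it with ucnv_close when destroyed
    using state_type = std::unique_ptr<UConverter, void (*)(UConverter*)>;

    state_type initial_state(boost::locale::generic_codecvt_base::initial_convertion_state /*unused*/) const
    {
        // Clone the converter held by the facet so each conversion has its own copy
        UErrorCode err = U_ZERO_ERROR;
        return state_type(ucnv_safeClone(converter_, nullptr, nullptr, &err), ucnv_close);
    }

    boost::locale::utf::code_point to_unicode(state_type& ptr, const char*& begin, const char* end) const
    {
        UErrorCode err = U_ZERO_ERROR;
        ...
    }
    ...
};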
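The ownership pattern behind such a state_type can be shown in isolation. The sketch below assumes nothing about ICU: fake_converter, fake_clone, fake_close and handle_codecvt_sketch are hypothetical stand-ins for UConverter, ucnv_safeClone, ucnv_close and the facet, and live_handles exists only to make the lifetime visible.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an external converter handle such as UConverter.
struct fake_converter { int id; };

static int live_handles = 0; // number of clones currently open

// Stand-in for a clone function: allocates a copy and counts it.
fake_converter* fake_clone(const fake_converter* src)
{
    ++live_handles;
    return new fake_converter{src->id};
}

// Stand-in for a close function: releases a clone and uncounts it.
void fake_close(fake_converter* c)
{
    --live_handles;
    delete c;
}

class handle_codecvt_sketch {
public:
    // The state owns one clone; the function-pointer deleter releases it.
    using state_type = std::unique_ptr<fake_converter, void (*)(fake_converter*)>;

    explicit handle_codecvt_sketch(int id) : master_{id} {}

    // Each conversion starts from a fresh clone of the master handle.
    state_type initial_state() const
    {
        return state_type(fake_clone(&master_), fake_close);
    }

private:
    fake_converter master_;
};

// Returns how many clones are open while one conversion state is alive.
int handles_during_conversion()
{
    handle_codecvt_sketch cvt(7);
    handle_codecvt_sketch::state_type st = cvt.initial_state();
    assert(st->id == 7);
    return live_handles;
}
```

Because the state is a std::unique_ptr with a custom deleter, the clone is released when the state goes out of scope, including on early returns and exceptions. No explicit cleanup call is needed in the conversion code.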