Notes C API探訪: NLS_translate(関数)(その2)

2021年8月19日 08:20

数回にわたってLMBCS文字列変換とNLS(National Language Services)の関数について紹介してきました。今回は、そのままでは使い勝手の悪いLMBCS変換関数をラップして、手軽に使えるようにします。

NLS_translateをラップ

まず、NLS_translate関数をラップしたnxpp::translateテンプレート関数を、次のようにしてみます。

// nxpp/include/nxpp/lmbcs/nxpp_translate.hpp

#ifndef NXPP_TRANSLATE_HPP
#define NXPP_TRANSLATE_HPP

#include "../nls/nxpp_nlssize.hpp"
#include "../nls/nxpp_nlsinfo.hpp"
#include <string>

#ifdef NT
#pragma pack(push, 1)
#endif

#include <global.h>
#include <nls.h>
#include <algorithm>

#ifdef NT
#pragma pack(pop)
#endif

namespace nxpp {

/**
* @brief キロバイトをバイトにします。
* @tparam T 数値型
* @param n キロバイト
* @return バイト
*/
template <typename T>
T KiBytesToBytes(T n) { return n * 1024; }

/**
* @brief 文字列を別の文字セットに変換する
* @tparam Source 変換元文字列のデータ型
* @tparam Target 変換先文字列のデータ型
* @param source 変換元文字列
* @param srcInfo 変換元文字セット情報
* @param targetInfo 変換先文字セット情報
* @param mode 変換モード(NLS_translate用フラグ)
* @param ratio 用意するサイズの最大比
*/
template <class Source, class Target>
Target translate(
   const Source &source,
   const nls::Info &srcInfo,
   const nls::Info &targetInfo,
   WORD mode,
   int ratio
   ) {
 // 変換元のバイト長
 int maxBytes = source.length();

 // 一度に変換するチャンクサイズを、8KBを境に1KBか8KBかを選択
 int maxChunkBytes = (maxBytes > KiBytesToBytes<int>(8))
     ? KiBytesToBytes<int>(8)
     : KiBytesToBytes<int>(1);

 // 処理中の先頭バイト位置
 int offsetPos = 0;

 // 変換済み文字列の保管先
 Target result;

 while (offsetPos < maxBytes) {

   // チャンクサイズの候補を計算
   int chunkBytes = std::min<int>(maxBytes - offsetPos, maxChunkBytes);

   // 切れ目に応じてチャンクサイズを調整
   chunkBytes = nls::adjustByteSize_w(
         source.data() + offsetPos,
         static_cast<WORD>(chunkBytes),
         srcInfo.get()
         );

   // 処理する文字列がない場合はループを抜ける。
   if (chunkBytes <= 0) break;

   // 変換した文字列を格納するバッファを準備する
   int bufSize = chunkBytes * ratio;
   std::string buffer(bufSize, '\0');

   // 変換する
   WORD len = static_cast<WORD>(bufSize);
   nls::Status status = NLS_translate(
           reinterpret_cast<BYTE*>(
             const_cast<char*>(source.data()) + offsetPos
           ),
           static_cast<WORD>(chunkBytes),
           reinterpret_cast<BYTE*>(buffer.data()),
           &len,
           mode,
           targetInfo.get()
           );
   if (!status) throw status;

   // 変換した文字列を結果に追加
   result += buffer.substr(0, static_cast<int>(len));

   // 処理した分だけオフセットの位置を移動
   offsetPos += chunkBytes;
 }
 return result;
}

テンプレート引数のSourceとTargetは、std::stringまたその派生型を想定しています。これらが持つ文字列長はint型(4バイト)ですが、NLS_translate関数が一度に処理できる文字列長はWORD型(2バイト)なので、その差分を埋めるために「チャンク」を用意して、チャンク単位で変換しながら、結果をつなげていきます。
チャンクにまとめるためには、LMBCSでは文字長がバラバラなのがネックになります。そこで、以前紹介したNLS_string_bytes/NLS_string_charsのラップ関数を利用して、正しい「切れ目」を算出しています。

Unicode(UTF-16)からLMBCSに変換

先ほどのテンプレートを利用して、UTF-16文字セットの文字列が格納されたstd::wstringからLMBCSに変換する場合、次のように実装できます。

// nxpp/include/nxpp/lmbcs/nxpp_translate.hpp(続き)

/**
* @brief UnicodeからLMBCS
* @tparam Target LMBCSデータ型
* @param unicode Unicode文字列
* @returns 変換後のLMBCS文字列
*/
template <class Target>
Target unicodeToLmbcs(const std::wstring &unicode) {
 std::string string(
       reinterpret_cast<const char*>(unicode.data()),
       unicode.length() * sizeof(wchar_t)
       );
 nls::LoadTypeInfo<NLS_CS_UNICODE> unicodeInfo;
 nls::LmbcsInfo lmbcsInfo;
 return translate<std::string, Target>(
       string,
       unicodeInfo,
       lmbcsInfo,
       NLS_NONULLTERMINATE | NLS_SOURCEISUNICODE | NLS_TARGETISLMBCS,
       NLS_MAXRATIO_XLATE_TO_LMBCS
       );
}

LMBCSからUnicode(UTF-16)に変換

また、LMBCSからstd::wstring(UTF-16文字セットの文字列)に変換する場合は、次のように実装できます。

// nxpp/include/nxpp/lmbcs/nxpp_translate.hpp(続き)

/**
* @brief LMBCSからUnicode
* @tparam Source LMBCSデータ型
* @param lmbcs LMBCS文字列
* @returns 変換後のUnicode文字列
*/
template <class Source>
std::wstring lmbcsToUnicode(const Source &lmbcs) {
 nls::LoadTypeInfo<NLS_CS_UNICODE> unicodeInfo;
 nls::LmbcsInfo lmbcsInfo;
 std::string string = translate<Source, std::string>(
       lmbcs,
       lmbcsInfo,
       unicodeInfo,
       NLS_NONULLTERMINATE | NLS_SOURCEISLMBCS | NLS_TARGETISUNICODE,
       NLS_MAXRATIO_XLATE_FROM_LMBCS
       );
 return std::wstring(
       reinterpret_cast<const wchar_t*>(string.data()),
       string.length() / sizeof(wchar_t)
       );
}

} // namespace nxpp

#endif // NXPP_TRANSLATE_HPP

まとめ

Notes C API探訪シリーズでは、今後C++側のデフォルト文字セットをUnicode(UTF-16)、文字列型をstd::wstringとしていきます。Qt SDKも絡みますので、QStringやQByteArrayとの変換についてもご紹介していけたらと思っています。

この記事が気に入ったらサポートをしてみませんか？