C++20 std::format part 3: Handling user-defined types

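This is probably the last post in the series. With basic usage and the other helper functions covered, this one looks at how to make your own types, or types the standard library does not support, work with std::format.

A caveat first: Heresy could not find a thorough explanation or a complete set of examples for this part, so most of what follows was pieced together from many examples found online. There is no guarantee that all of it is correct; the only claim is that it works on MSVC.

If you spot a mistake, please point it out.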

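Basic concepts of std::formatter

To make a user-defined type work with C++20 formatting, you basically have to define a specialization of std::formatter<> (documentation) for that type.

The general shape of std::formatter<> is roughly as follows: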

template<class T, class CharT = char>
struct formatter
{
  template<typename FormatParseContext>
  constexpr auto parse(FormatParseContext& ctx);
 
  template<typename FormatContext>
  auto format(const T& v, FormatContext& ctx) const;
};

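Here the template parameter T is the user-defined type, and CharT lets the formatter serve both std::string (char) and std::wstring (wchar_t).

The class must provide two functions, parse() and format().

parse() analyzes the format specification, that is, what the user wrote inside the {}.
If you want to define your own formatting options, the parsing code goes in this function. Its argument is a basic_format_parse_context<CharT> (documentation), whose main content is the format string being processed.

If you do not need a custom format, you can instead inherit from one of the std::formatter<> specializations the standard library provides and reuse its parse().

format() takes the value to output (v) and writes it, in whatever form you need, to the output iterator returned by ctx.out(). The type of ctx is std::basic_format_context<> (documentation); because the output iterator type is part of that type, making the function a template is the easiest approach.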

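Conceptually, it works like this:

1. Based on the type of the argument passed to std::format(), a matching std::formatter<> object is created.
2. std::format() walks the format string, and for each replacement field it hands the text to std::formatter<>::parse() through a format parse context.
3. std::formatter<>::format() is called to write the result to the output iterator supplied by the format context; it returns the final position so that std::format() can go on to the next argument.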

A simple example

Let's start with the simplest case, a formatter for std::vector<int>:

#include <format>
#include <iostream>
#include <vector>
 
template<>
struct std::formatter<std::vector<int>> : std::formatter<std::string>
{
  template<typename FormatContext>
  auto format(const std::vector<int>& v, FormatContext& ctx) const
  {
    // Keep the iterator returned by each call so the write position stays current.
    auto out = ctx.out();
    out = std::format_to(out, "( ");
    for (size_t uIdx = 0; uIdx < v.size(); ++uIdx)
    {
      if (uIdx != 0)
        out = std::format_to(out, ", ");
      out = std::format_to(out, "{}", v[uIdx]);
    }
    return std::format_to(out, " )");
  }
};
 
int main()
{
  std::vector<int> vData = {1, 2, 3, 4, 5};
  std::cout << std::format("{}", vData) << std::endl;
  // ( 1, 2, 3, 4, 5 )
}

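As you can see, this defines std::formatter<std::vector<int>> by inheriting from std::formatter<std::string>, and supplies a format() function for std::vector<int>.

The only reason for inheriting from std::formatter<std::string> is to reuse the standard library's default parse() for strings; the result of that parse is never actually used. So although callers can write the standard format options, none of them has any effect.

format() itself simply wraps the output in ( ) and writes each number in the std::vector<int> in order.

Note that the data is written with format_to() to the output iterator returned by ctx.out(), and the position reached at the end is returned.

This is about the simplest way to make a type work with std::format, but it leaves no room at all for format options.

If you want to make limited use of what the default parse() collected, you can get part of the way by calling the format() of std::formatter<std::string>.

Here is a lightly modified version: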

#include <format>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
 
template<>
struct std::formatter<std::vector<int>> : std::formatter<std::string>
{
  template<typename FormatContext>
  auto format(const std::vector<int>& v, FormatContext& ctx) const
  {
    // Build the whole text in a temporary string first.
    std::string s;
    auto OutIter = std::back_inserter(s);
    std::format_to(OutIter, "( ");
    for (size_t uIdx = 0; uIdx < v.size(); ++uIdx)
    {
      if (uIdx != 0)
        std::format_to(OutIter, ", ");
      std::format_to(OutIter, "{}", v[uIdx]);
    }
    std::format_to(OutIter, " )");
    // Let the string formatter apply fill, alignment and width.
    return std::formatter<std::string>::format(s, ctx);
  }
};
 
int main()
{
  std::vector<int> vData = {1, 2, 3, 4, 5};
  std::cout << std::format("{:->20}", vData) << std::endl;
  // ---( 1, 2, 3, 4, 5 )
}

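This version first writes the data into the string s and then outputs it through the format() of std::formatter<std::string>.

Written this way, the output supports the fill and alignment options for strings.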

If instead you want the format options to apply to the numbers inside, inherit from formatter<int>:

#include <format>
#include <iostream>
#include <vector>
 
template<>
struct std::formatter<std::vector<int>> : std::formatter<int>
{
  template<typename FormatContext>
  auto format(const std::vector<int>& v, FormatContext& ctx) const
  {
    auto OutIter = ctx.out();
    OutIter = std::format_to(OutIter, "( ");
    for (size_t uIdx = 0; uIdx < v.size(); ++uIdx)
    {
      if (uIdx != 0)
        OutIter = std::format_to(OutIter, ", ");
 
      // Sync the context with our position, then let formatter<int> write the value.
      ctx.advance_to(OutIter);
      OutIter = std::formatter<int>::format(v[uIdx], ctx);
    }
    return std::format_to(OutIter, " )");
  }
};
 
int main()
{
  std::vector<int> vData = {1, 2, 3, 4, 5};
  std::cout << std::format("{:+03d}", vData) << std::endl;
  // ( +01, +02, +03, +04, +05 )
}

This way, the integers in the output can be adjusted with the standard format syntax.

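An example with a custom parse()

What if you want to define your own format options and presentation? Then you basically have to write parse() as well.

The example below is adapted, with some changes, from the article "C++20: Extend std::format for User-Defined Types":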

#include <cstddef>
#include <format>
#include <iostream>
#include <string>
#include <vector>
 
template <typename Value>
struct std::formatter<std::vector<Value>>
{
  constexpr auto parse(std::format_parse_context& ctx)
  {
    // No format spec at all ("{}"): use the default element format.
    if (ctx.begin() == ctx.end())
    {
      value_format = "{}";
      return ctx.end();
    }
    
    // Collect everything except our own 'c' flag into the element format.
    value_format = "{:";
    for (auto it = ctx.begin(); it != ctx.end(); ++it) {
      char c = *it;
      if (c == 'c')
        curly = true;
      else
        value_format += c;

      // '}' closes this replacement field; return its position.
      if (c == '}')
        return it;
    }
    return ctx.end();
  }
 
  template <typename FormatContext>
  auto format(const std::vector<Value>& v, FormatContext& ctx) const
  {
    auto out = ctx.out();
    *out++ = curly ? '{' : '[';
    for (std::size_t i = 0; i < v.size(); ++i)
    {
      if (i != 0)
        out = std::format_to(out, ", ");
      // value_format is only known at run time, so std::vformat_to is needed.
      out = std::vformat_to(out, value_format, std::make_format_args(v[i]));
    }
    *out++ = curly ? '}' : ']';
    return out;
  }
 
  bool        curly{ false };
  std::string value_format;
};

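This example still outputs a std::vector<>, but the element type is now a template parameter so that it can handle any element type.

This formatter also defines a "c" flag for std::vector<>: when it is given, the values are wrapped in "{}"; otherwise they are wrapped in "[]".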

parse

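parse() receives a format_parse_context object, ctx, which basically holds the format text; by iterating from begin() to end() you can read its contents one character at a time.

If needed, you can also wrap it in a string_view to inspect it. For a format string such as "{:+03dc}-{}", it looks like this: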

std::string_view fmt{ ctx.begin(), ctx.end()};
// "+03dc}-{}"

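As you can see, the leading {: has been stripped and the text starts at the format spec. On the other hand, it is not cut off at the closing }; it includes all the text that follows, so you have to detect the end yourself.

But if the format string has no ":" at all, what you get is an empty string that does not even include the closing "}", which is a completely different situation. That is why Heresy added a check at the start of parse() for whether ctx.begin() equals ctx.end().
(The original article's example breaks if the format is changed to "{}".)

It was not clear to Heresy whether this is an MSVC implementation detail or what the standard specifies. For what it is worth, the formatter requirements in the standard allow either begin() == end() or *begin() == '}' when the format spec is absent or empty, so a robust parse() should handle both.
(After trying {fmt} briefly, it seems to behave similarly.)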

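Back to the body of parse().
It scans each character of the format text obtained through ctx. If the character is the custom "c", it sets the member curly to true; otherwise it appends the character to the string value_format. When the character is the closing "}", the function ends and returns that position.

So after parse() has run, value_format contains "{:+03d}" and curly is true.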

format

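format() then uses these two members to shape the output. value_format controls the format of each value, and because each element (an int in this example) is handed to the standard formatting functions, the standard format options apply directly.

Likewise, for a std::vector<float> it supports the corresponding floating-point options, so this approach feels quite convenient.

If there is a drawback, it is that value_format is a string that is only determined at run time, so this approach probably cannot be used as-is with {fmt}, which needs the format string at compile time. Standard library implementations that check format strings at compile time have the same restriction; there, a run-time format string has to go through std::vformat_to() with std::make_format_args().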

Also, if you discover during processing that the user supplied invalid data and want to abort, you basically throw a std::format_error exception. For example:

throw std::format_error("Unsupported data format");

The error text can be whatever fits the situation.

Supporting both string and wstring

All the examples above only work for std::string and cannot be used to produce a std::wstring.

Supporting both std::string and std::wstring takes a few more changes.

template <typename Value, typename CharType>
struct std::formatter<std::vector<Value>, CharType>
{
  template<typename FormatParseContext>
  constexpr auto parse(FormatParseContext& ctx);
 
  template <typename FormatContext>
  auto format(const std::vector<Value>& v, FormatContext& ctx) const;
};

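The simpler way is probably to give std::formatter<> a second template argument for the character type to use (char or wchar_t), and to change the parameter of parse() from format_parse_context to a template parameter.

If you want to be more explicit, it can also be declared as: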

constexpr auto parse(basic_format_parse_context<CharType>& ctx)

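That way, the formatter can be used for both std::string and std::wstring.

With this change, though, you end up producing a wstring while parts of the internals still work with string. Could that cause problems? Would the two versions have to be split completely? That is unclear.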

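chrono support

Besides the basic types covered earlier, C++20 format also has dedicated support for chrono, the date and time library added in C++11. Essentially every chrono type has a matching std::formatter<>, so they can all be formatted directly with std::format().

With the format specifiers it defines, dates and times are easy to convert into the form you need.
There are a lot of them and they are rather messy, so they are not covered in detail here; please see the C++ Reference documentation (web page).

Here is a simple time point example: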

std::chrono::zoned_time now{ "Asia/Taipei", std::chrono::system_clock::now() };

std::cout << now << "\n";                // 2022-04-26 16:35:50.4445328 GMT+8
std::cout << std::format("{:%D (%A)}", now) << "\n";   // 04/26/22 (Tuesday)
std::cout << std::format("{:%Y-%m-%d}", now) << "\n";  // 2022-04-26
std::cout << std::format("{:%T}", now) << "\n";        // 16:35:50.4445328

And here is a duration example:

auto du = std::chrono::minutes(72) + std::chrono::milliseconds(10);
std::cout << du << "\n";                              // 4320010ms
std::cout << std::format("{:%H:%M:%S}", du) << "\n";  // 01:12:00.010

This should be quite handy in some situations.

That is it for this series for now. If anything else comes up later, it will be added separately.

Note:

If you want to use this with {fmt}, parse() in the formatter<> specialization must be declared constexpr so that it can be evaluated at compile time.
