Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<format>: Make formatter cast the argument to the correct type before constructing basic_format_arg #3723

Merged
merged 2 commits into from
May 31, 2023

Conversation

cpplearner
Copy link
Contributor

@cpplearner cpplearner commented May 21, 2023

Found while testing formatter<char, wchar_t>. Overload resolution seems to choose basic_format_arg(const int _Val), causing the char value to be formatted as an integer, rather than a character.

#include <format>
using namespace std;

struct test_char {};

template<> struct std::formatter<test_char, wchar_t> : formatter<char, wchar_t> {
    auto format(test_char, auto& ctx) const { return formatter<char, wchar_t>::format('a', ctx); }
};

auto s = std::format(L"{}", test_char{});
// expected: s == L"a"
// actual:   s == L"97"

PR #2768 fixed a similar issue for long (where overload resolution was ambiguous), by adding constructors to basic_format_arg. We could add another constructor to address this issue, and add yet another constructor to fix formatter<nullptr_t, CharT>, and still another constructor for basic_string with custom Traits. That's not too bad, but kinda duplicates the logic in _Format_arg_traits. So I choose to just use _Format_arg_traits.

@cpplearner cpplearner requested a review from a team as a code owner May 21, 2023 08:48
@CaseyCarter
Copy link
Member

CaseyCarter commented May 24, 2023

Found while testing formatter<char, wchar_t>. Overload resolution seems to choose basic_format_arg(const int _Val), causing the char value to be formatted as an integer, rather than a character.

This seems to be the Standard's intended behavior. [format.string.std]/21 says that "The available integer presentation types for integral types other than bool and charT are specified in Table 68." When charT is wchar_t, char is an integral type "other than bool and charT". Table 68 says the default type specifier for such arguments is d, so I think std::format("{}", 'a') should be L"97" (for sufficiently ASCII-ish character sets).

Am I missing something?

@CaseyCarter CaseyCarter added decision needed We need to choose something before working on this format C++20/23 format info needed We need more info before working on this and removed decision needed We need to choose something before working on this labels May 24, 2023
@StephanTLavavej
Copy link
Member

At the weekly maintainer meeting, @barcharcraz agreed with @CaseyCarter here - this may seem weird but it appears that the Standard mandates this behavior.

@cpplearner
Copy link
Contributor Author

cpplearner commented May 24, 2023

That's my first thought as well, but that makes formatter<char, wchar_t>

  1. behave differently from std::format,
  2. behave differently from other implementations (Both libstdc++ and libc++ produce a),
  3. especially problematic in WG21-P2286R8, where it makes much more sense for std::format(L"{}", std::tuple('a')) to produce ('a') and not (97). (Actually since the formatter for tuple uses formatter<char, wchar_t>::set_debug_format, set_debug_format sets the ? specifier, and ? is undefined for integers, this might technically be undefined behavior.)

@vitaut @mordante WDYT?

@vitaut
Copy link
Contributor

vitaut commented May 24, 2023

Might be some wording missing that says that char is converted to Char (the intended behavior). Formatting char as a number is, of course, wrong. If this is the case please open an LWG issue.

@vitaut
Copy link
Contributor

vitaut commented May 24, 2023

Actually the wording is there: https://eel.is/c++draft/format#arg-6.2. So the only thing that potentially needs clarification is connection to this nice table https://eel.is/c++draft/format.string.std#tab:format.type.char.

@barcharcraz
Copy link
Member

Actually the wording is there: https://eel.is/c++draft/format#arg-6.2. So the only thing that potentially needs clarification is connection to this nice table https://eel.is/c++draft/format.string.std#tab:format.type.char.

That table appears to apply only to charT (wchar_t here), but the wording for basic_format_arg's constructor may be sufficient.

@barcharcraz
Copy link
Member

if the other two implementations produce 'a' then I wouldn't be super opposed to just pre-emptively implementing that pending the resolution of whatever LWG issue addresses the wording, however I'm not sure I love the approach in this PR, I would strongly prefer keeping all the handling of these conversions in the basic_format_arg constructor, since that's where it's depicted in the standard and because that overload set is complicated and I'd like it in one place as much as possible.

@cpplearner
Copy link
Contributor Author

I would love to drop _Format_arg_traits::_Phony_basic_format_arg_constructor, and do the mapping in the real basic_format_arg constructors. But IIUC _Format_arg_store wants to get the mapped types (as well as the storage size) without constructing a basic_format_arg object, so it needs the "phony" constructors.

This PR makes _Phony_basic_format_arg_constructor the only place to do the mapping, and (hopefully) makes basic_format_arg always constructed with the converted value. This also makes the logic centralized, though not in the most desirable place.

@jovibor
Copy link
Contributor

jovibor commented May 25, 2023

Isn't it #2320 related?

@frederick-vs-ja
Copy link
Contributor

frederick-vs-ja commented May 25, 2023

since that's where it's depicted in the standard and because that overload set is complicated and I'd like it in one place as much as possible.

The complicated overload set is unified to a single contructor by LWG-3631.

Since the constructor (formerly contructors) is exposition-only, I'd like to keep the internal constructors of basic_format_arg and type-dispatch the arguments in other functions (or a single function), which is exactly being done in this PR.

Otherwise, we might need to write something like the following to initialize the specified variant member.

// ...
template <class _Ty>
explicit basic_format_arg(_Ty& _Val) noexcept
    requires (_Dispatched<_Ty> == _Basic_format_arg_type::_Int_type)
    : _Active_state(_Basic_format_arg_type::_Int_type), _Int_state(_Val) {}

template <class _Ty>
explicit basic_format_arg(_Ty& _Val) noexcept
    requires (_Dispatched<_Ty> == _Basic_format_arg_type::_UInt_type)
    : _Active_state(_Basic_format_arg_type::_UInt_type), _UInt_state(_Val) {}
// ...

@mordante
Copy link
Contributor

Actually the wording is there: https://eel.is/c++draft/format#arg-6.2. So the only thing that potentially needs clarification is connection to this nice table https://eel.is/c++draft/format.string.std#tab:format.type.char.

I agree, that's the reason why libc++ gives L"a" as result.

@CaseyCarter
Copy link
Member

CaseyCarter commented May 25, 2023

Aha! I've forgotten again that "format arg" in [format.string.std] means something like "the content of the variant member of the basic_format_arg initialized from the corresponding argument to a formatting function" (or "the object designated by the contained handle" when it's a handle) and isn't intended to refer to the user-provided argument itself. Silly me. I'll accept this is the intended behavior of the wording that I haven't yet figured out how to clarify.

@CaseyCarter CaseyCarter removed the info needed We need more info before working on this label May 25, 2023
@StephanTLavavej StephanTLavavej added the bug Something isn't working label May 26, 2023
@StephanTLavavej StephanTLavavej self-assigned this May 26, 2023
@StephanTLavavej
Copy link
Member

I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@jovibor
Copy link
Contributor

jovibor commented May 27, 2023

I'm speculatively mirroring this to the MSVC-internal repo

Sorry for the dumb question.
But what's the difference between: "I'm mirroring this to the MSVC-internal repo" and do the same but speculatively?

@CaseyCarter
Copy link
Member

But what's the difference between: "I'm mirroring this to the MSVC-internal repo" and do the same but speculatively?

Stephan is speculating that someone will perform Final Review on this so we can merge it as part of a single batch merge into the internal repo. Pointing it out in the "notify me of changes" comment serves only to avoid confusion about why we'd be preparing to merge something that isn't in the "Ready to Merge" state.

@cpplearner
Copy link
Contributor Author

FWIW there's a latent merge conflict between this PR and #3726. This might be solved by merging with git merge -Xignore-space-change and then running clang-format on stl/inc/format.

@StephanTLavavej
Copy link
Member

I've pushed a merge with main to resolve the conflict (manually, since I am old).

@StephanTLavavej StephanTLavavej merged commit 6a98456 into microsoft:main May 31, 2023
@StephanTLavavej
Copy link
Member

Thanks for figuring out the best way to fix this bug! 🛠️ 🐞 😻

@cpplearner cpplearner deleted the format-arg branch May 31, 2023 03:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working format C++20/23 format
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants