Array syntax idea #1031

Open
MarcusJohnson91 opened this issue Jun 6, 2024 · 5 comments

Comments

@MarcusJohnson91

MarcusJohnson91 commented Jun 6, 2024

Name = PNGTests
NumCases = 1

[Case.0]
State = TestState_Enabled
Outcome = Outcome_Passed
Size = 138 # Num hexadecimal digits, divide by 2 to get the number of bytes.
Data = 0x[89504E470D0A1A0A0000000D4948445200000001000000010001000000376EF9240000000A49444154000078016360000000020001737501180000000049454E44AE426082]

No whitespace, commas, or anything except hexadecimal digits are allowed after 0x[ and before the closing ].
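
A minimal sketch of how a parser might read this proposed literal under the strict rule above (only hexadecimal digits between `0x[` and `]`). The function name `parse_hex_array` is hypothetical and not part of any TOML library. Rejecting an odd number of digits is an assumption based on the "divide by 2" comment in the example.

```python
import string

HEX_DIGITS = frozenset(string.hexdigits)


def parse_hex_array(literal: str) -> bytes:
    """Parse a proposed `0x[...]` literal into bytes (hypothetical helper)."""
    if not (literal.startswith("0x[") and literal.endswith("]")):
        raise ValueError("expected a literal of the form 0x[...]")
    digits = literal[3:-1]
    # Strict rule: nothing but hexadecimal digits between the brackets.
    if any(ch not in HEX_DIGITS for ch in digits):
        raise ValueError("only hexadecimal digits are allowed inside 0x[...]")
    # Assumption: two hex digits make one byte, so the count must be even.
    if len(digits) % 2 != 0:
        raise ValueError("odd number of hexadecimal digits")
    return bytes.fromhex(digits)


print(parse_hex_array("0x[89504E47]"))  # b'\x89PNG'
```

With this rule, an input such as `0x[89 50]` is rejected because of the space.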

It makes parsing binary data stored in a toml file MUCH easier and faster to process.

I don’t see why binary and octal couldn’t have the same syntax, 0b[ and 0o[ respectively, though I have no need for them myself.

@arp242
Contributor

arp242 commented Jun 6, 2024

I assume what you want is that this:

data = 0x[89 50 4e]

will be treated like:

data = "\x89\x50\x4e"

Or something along these lines.

If I had a lot of binary data, I'd just put it in a string like:

data = '''
89 50 4E 47 0D 0A 1A 0A 00 00 00 0D
49 48 44 52 …
'''

And then write a custom parser in your language to deal with that. Should be easy enough in most languages.

Or use an array of numbers, or escapes like above.

This seems far too rare of a use case to add to TOML.

At the very least we'd need a few examples of real-world TOML files that see actual use where this feature would be useful.

@eksortso
Contributor

eksortso commented Jun 9, 2024

Remember that parsers treat things like "\x89\x50\x4e" not as byte arrays. They're strings. In fact, `"\x89"` on its own is two bytes when encoded as UTF-8.
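
A quick illustration of the two-byte point, using Python strings as a stand-in for how a TOML parser would hold the decoded value (an assumption for illustration; the escape names a code point, not a raw byte):

```python
text = "\x89"                    # a single code point, U+0089
encoded = text.encode("utf-8")

print(len(text), len(encoded))   # 1 2
print(encoded)                   # b'\xc2\x89'
print(bytes([0x89]))             # b'\x89', the single byte that was intended
```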

The custom parser idea makes more sense in context, but to be honest, I need more context. We do see more computer-generated values, like hashes, in use cases where TOML is not touched by humans. But invariably that binary data is expressed as a hex string. It's similar to the proposed 0x[] syntax in that way. I'm not opposed to a byte array value type, but I'm skeptical that we need it everywhere.

Send us more use cases where byte arrays need their own special syntax. Why would a human-centric format need a value type that emits blobs of arbitrary binary data and skips the checks available to an intermediate string format?

@MarcusJohnson91
Author

Where are you guys getting the idea that it’s a string?

Granted, technically the entire TOML file is a string.

But beyond that, it’s just back-to-back hex digits after the 0x[ and before the ].

No spaces, no escapes. Just hexadecimal digits.

@arp242
Contributor

arp242 commented Jun 9, 2024

Remember that parsers treat things like "\x89\x50\x4e" not as byte arrays. They're strings. In fact, `"\x89"` on its own is two bytes when encoded as UTF-8.

Oh yeah, of course 🤦 I've been dealing with a lot of these kinds of (non-UTF-8) strings this week, and in context switching I just forgot TOML doesn't work like that.

@eksortso
Contributor

Where are you guys getting the idea that it’s a string?

I did know that what you are talking about are byte arrays. My point was that in TOML, string values are quoted, so custom parsing would still be required to turn such strings into proper byte arrays. This was in response to @arp242, who already replied.

No spaces, no escapes. Just hexadecimal digits.

We need more use cases for binary arrays before we could proceed with creating new syntax, and I implore you and others to provide those use cases.

But even if we proceeded, I would not sign off on such a limited expression set. I would want to allow ignorable newlines and whitespace between the brackets as well, since human beings might want to use the syntax, and most likely such users would want to avoid long lines and unbroken sequences of hex digits, cmiiw.
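
To make that concrete, here is a rough sketch of what a more lenient reader could look like. Everything in it is an assumption for illustration: the name `parse_hex_array_lenient` is hypothetical, and the set of ignorable characters (space, tab, CR, LF) is just one possible choice.

```python
import string

HEX_DIGITS = frozenset(string.hexdigits)
IGNORABLE = " \t\r\n"  # assumption: TOML whitespace plus newlines


def parse_hex_array_lenient(literal: str) -> bytes:
    """Parse `0x[...]`, ignoring whitespace and newlines inside the brackets."""
    if not (literal.startswith("0x[") and literal.endswith("]")):
        raise ValueError("expected a literal of the form 0x[...]")
    body = literal[3:-1]
    # Remove the ignorable characters before validating the digits.
    digits = body.translate(str.maketrans("", "", IGNORABLE))
    if any(ch not in HEX_DIGITS for ch in digits):
        raise ValueError("unexpected character inside 0x[...]")
    if len(digits) % 2 != 0:
        raise ValueError("odd number of hexadecimal digits")
    return bytes.fromhex(digits)


sample = """0x[
    89 50 4E 47
    0D 0A 1A 0A
]"""
print(parse_hex_array_lenient(sample))  # b'\x89PNG\r\n\x1a\n'
```

Note that this version also accepts whitespace between the two digits of a single byte; a real proposal would have to decide whether that should be allowed.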


3 participants