Specify UTF-8 for all our JSON #146

Closed
wants to merge 1 commit into from

wking
Contributor

@wking wking commented Sep 3, 2015

I wish there was a cleaner reference for what UTF-8 was. But linking
Wikipedia seems too glib, and I can't find a more targeted link
than just dropping folks into a Unicode chapter (which is what
Wikipedia does):

The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

To get a place to say this for the runtime.json docs, I used config.md
as a template for the top-level header and blurb. The “Host-specific
container configuration” phrasing comes from bundle.md, which has:

The runtime.json file contains settings that are host specific...

While I was touching the config.md lead-in, I fixed “container” →
“bundle”.

Signed-off-by: W. Trevor King <wking@tremily.us>
# Host-specific container configuration

The bundle's top-level directory MUST contain a configuration file called `runtime.json` with [UTF-8][] [JSON][].
For now the canonical schema is defined in [`runtime_config.go`](runtime_config.go) and [`runtime_config_linux.go`](runtime_config_linux.go), but this will be moved to a formal JSON schema over time.
Member

I would have the OS specific reference be more generic than pointing out linux individually.

Member

and "moved to a formal JSON schema over time" where and why this comment?

Contributor Author

On Fri, Sep 04, 2015 at 11:44:48AM -0700, Vincent Batts wrote:

and "moved to a formal JSON schema over time" where and why this comment?

Both are just echoing the current host-independent phrasing [1]. I
tried to make that clear in the commit message [2] and original PR
post with the:

“To get a place to say this for the runtime.json docs, I used
config.md as a template for the top-level header and blurb.”

text, but I'm happy to use different commit-message phrasing if that
would help make the source more obvious.

Member

I see that phrasing. This feels like something to have roadmapped and discussed rather than alluded to.

Contributor Author

On Tue, Sep 08, 2015 at 03:59:13PM -0700, Vincent Batts wrote:

This feels like something to have roadmapped and discussed rather
than alluding to.

Sounds reasonable to me. Shall I open the mailing-list thread, or
would you like to?

And is there an alternative phrasing you'd like to see here in the
meantime, so we can land the UTF-8 notes in this PR without waiting on
the desired-schema-language discussion? Copying the existing phrasing
from config.md seemed like the least opinionated approach, but maybe
just dropping the “For now the canonical schema…” line with its
Go-file links would be better?

@jonboulle
Contributor

peanut gallery, why not use the JSON RFC here? https://tools.ietf.org/html/rfc7159

@wking
Contributor Author

wking commented Sep 9, 2015

On Tue, Sep 08, 2015 at 04:39:10PM -0700, Jonathan Boulle wrote:

peanut gallery, why not use the JSON RFC here?
https://tools.ietf.org/html/rfc7159

How does that help? RFC 7159 has [1]:

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default
encoding is UTF-8, and JSON texts that are encoded in UTF-8 are
interoperable in the sense that they will be read successfully by
the maximum number of implementations; there are many
implementations that cannot successfully read texts in other
encodings (such as UTF-16 and UTF-32).

Implementations MUST NOT add a byte order mark to the beginning of a
JSON text. In the interests of interoperability, implementations
that parse JSON texts MAY ignore the presence of a byte order mark
rather than treating it as an error.

To test the Go handling in particular, ‘json.NewDecoder(os.Stdin)’
seems to be able to handle UTF-8 on stdin:

$ cat sample.json | go run test.go

but crashes with UTF-16LE:

$ cat sample.json | iconv -f UTF-8 -t UTF-16LE - | go run test.go
2015/09/08 20:35:00 invalid character '\x00' looking for beginning of object key string
exit status 1

I'd rather stay away from BOMs. I'd be ok with allowing runtimes to
optionally be configured to read the configuration JSON from other
encodings (e.g. UTF-16LE, …), but then we'd need some bundle-side
logic or bundle-introspection (e.g. BOM checks) to decide which
encoding to tell the runtime to use. It seems easier for everyone if
we just require UTF-8 (but I may be missing something obvious ;).
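To make the trade-off concrete, here is a rough sketch of the two approaches: a BOM check that guesses the encoding, and the much simpler "UTF-8 only" check. The names `sniffBOM` and `checkUTF8JSON` are hypothetical and nothing like them exists in the spec or its Go files; the BOM table is a heuristic, since the bytes FF FE 00 00 could also be a UTF-16LE BOM followed by a NUL character.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"unicode/utf8"
)

// boms lists byte order marks, longest first, so that UTF-32LE
// (FF FE 00 00) is tried before UTF-16LE (FF FE).
var boms = []struct {
	name string
	mark []byte
}{
	{"UTF-32LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
	{"UTF-32BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF-8", []byte{0xEF, 0xBB, 0xBF}},
	{"UTF-16LE", []byte{0xFF, 0xFE}},
	{"UTF-16BE", []byte{0xFE, 0xFF}},
}

// sniffBOM returns the encoding suggested by a leading byte order
// mark, or "" if data does not start with one.
func sniffBOM(data []byte) string {
	for _, b := range boms {
		if bytes.HasPrefix(data, b.mark) {
			return b.name
		}
	}
	return ""
}

// checkUTF8JSON accepts only BOM-free, valid UTF-8 that is also valid
// JSON, which is the "just require UTF-8" option.
func checkUTF8JSON(data []byte) error {
	if enc := sniffBOM(data); enc != "" {
		return fmt.Errorf("unexpected %s byte order mark", enc)
	}
	if !utf8.Valid(data) {
		return errors.New("configuration is not valid UTF-8")
	}
	// BOM-less UTF-16 of ASCII text passes utf8.Valid (NUL bytes are
	// valid UTF-8), so the JSON check is what rejects it.
	if !json.Valid(data) {
		return errors.New("configuration is not valid JSON")
	}
	return nil
}

func main() {
	fmt.Println(checkUTF8JSON([]byte(`{"hostname": "é"}`)))
	fmt.Println(checkUTF8JSON([]byte{0xFF, 0xFE, '{', 0x00, '}', 0x00}))
}
```

With a UTF-8 requirement the runtime only needs `checkUTF8JSON`; supporting other encodings would mean wiring something like `sniffBOM` into every consumer of the bundle.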

And I see nothing about UTF-8 on json.org or the ECMA-404 it
references [2], so we'd want to make sure to link the RFC for our JSON
definition if we go this route.

@philips
Contributor

philips commented Sep 9, 2015

we should just put something in a glossary saying that when we say JSON in the spec we mean UTF-8 encoded JSON.

@wking
Contributor Author

wking commented Sep 9, 2015

On Wed, Sep 09, 2015 at 09:46:41AM -0700, Brandon Philips wrote:

we should just put something in a glossary saying that when we say
JSON in the spec we mean UTF-8 encoded JSON.

I'd rather keep the glossary informative, and put normative stuff in
the spec itself.

@crosbymichael
Member

+1 for what @philips said and we are currently working on a protobuf idea and discussing this during the meetings.

@wking
Contributor Author

wking commented Sep 25, 2015

On Fri, Sep 25, 2015 at 01:43:33PM -0700, Michael Crosby wrote:

+1 for what @philips said…

Glossary entry it is. I'll bump #107.

… and we are currently working on a protobuf idea and discussing
this during the meetings.

This is about what authors provide in their bundles. I don't see
where protobuf comes in…

wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Sep 25, 2015
I wish there was a cleaner reference for what UTF-8 was.  But [1]
seems too glib, and I can't find a more targeted link than just
dropping folks into a Unicode chapter (which is what [1] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

I'd rather put this normative requirement in the configuration-spec
files, but maintainer consensus was to put it in the glossary [2,3].

[1]: https://en.wikipedia.org/wiki/UTF-8
[2]: opencontainers#146 (comment)
[3]: opencontainers#146 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Sep 25, 2015
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 5, 2015
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 8, 2015
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 8, 2015
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Dec 23, 2015
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Dec 23, 2015
Mashimiao pushed a commit to Mashimiao/specs that referenced this pull request Aug 19, 2016