Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stores: test for duplicate keys, reserve keyword (yaml only now) #1203

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Apr 21, 2023

I started looking into sops, which I find to be a really nice idea and implementation. Kudos!
However, when testing some edgecase scenarios, I found some rough corners.

Specifically, it was possible to, when editing the file, add a sops key. This was then encrypted, and when trying to open it later on, I got

go run ./cmd/sops/ mytestfile2.yaml 
Error unmarshalling input yaml: yaml: unmarshal errors:
  line 12: mapping key "sops" already defined at line 7

This is a bit so-so UX wise, since the only way to recover is to remove the sops-line, and then reopen the file with the --ignore-mac enabled, and then save it again.

In general, it seems to me that sops should be strict about validation. In that sense, this PR:

  • Applies a check for disallowing duplicate keys in YAML
  • Tests for, but does not disallow, duplicate keys in json.
  • Applies a check for disallowing reserved keywords in YAML (only sops now).
  • Remaining small changes come from go fmt.

If you think this is a good idea, the reserved keyword should be applied for other formats too, and I can go ahead and make that change too.

@holiman
Copy link
Contributor Author

holiman commented Apr 21, 2023

Looking at #596 (comment), it seems that perhaps data should be a reserved key too (?)

@holiman holiman changed the title stores: test for duplicate keys, reseve keyword (yaml only now) stores: test for duplicate keys, reserve keyword (yaml only now) Apr 21, 2023
@felixfontein
Copy link
Contributor

This should fix #851.

@hiddeco hiddeco added this to the v3.9.0 milestone Jul 3, 2023
@hiddeco hiddeco changed the base branch from master to main July 6, 2023 20:57
@felixfontein
Copy link
Contributor

Looking at #596 (comment), it seems that perhaps data should be a reserved key too (?)

The README explicitly shows using data for other things in quite a few examples, so I guess it's not a good idea to disallow it globally.

I think #596 should probably be handled differently, like only treating it as binary if there is no other key than data and its value is a string. (I'm not sure which mechanism is used right now. Maybe detection is only based on the file name.)

@felixfontein
Copy link
Contributor

felixfontein commented Sep 16, 2023

I did look a bit into detection; the detection is done purely by filename, not by anything else.

Something that definitely can be improved is error handling in BinaryStore.EmitPlainFile, if data's value is not a string. That would produce a better readable error message instead of a panic (and that error message could mention --output-type so that users qiuckly get an idea what to do). (Edit: #1289 does that.)

Comment on lines 347 to 349
if err != nil {
t.Errorf("did not expect error, got %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err != nil {
t.Errorf("did not expect error, got %v", err)
}
assert.Nil(t, err)

@@ -211,7 +211,9 @@ func (store Store) jsonFromTreeBranch(branch sops.TreeBranch) ([]byte, error) {

func (store Store) treeBranchFromJSON(in []byte) (sops.TreeBranch, error) {
dec := json.NewDecoder(bytes.NewReader(in))
dec.Token()
if _, err := dec.Token(); err != nil {
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please document what exactly this is doing? I guess this is "Tests for, but does not disallow, duplicate keys in json.", but it is totally unclear to me how this code achieves that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is (as far as I can tell, been a while) not required, it's just a drive-by change that I made: Token() does return an error, and checking+returning it is correct.
Later on, the store.treeBranchFromJSONDecoder(dec) call will internally do the same thing

func (store Store) treeItemFromJSONDecoder(dec *json.Decoder) (sops.TreeItem, error) {
	var item sops.TreeItem
	key, err := dec.Token()
	if err != nil && err != io.EOF {
		return item, err
	}

So I guess I figured that we should do it on the very first instance too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that this one fails when err == io.EOF. Which I guess is OK. I would add a comment explaining that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to add a comment, but I'm not sure what to write: I don't quite see why we need to document why we handle an error. Usually, motivation/explanation is required as for why one omits to handle an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After dwelling in sources of the JSON module, I now understand why this change is correct.

The main problem I have with this change is that it takes a long time to figure out what it does, and since it is totally unrelated to the commit message it appears in it is very confusing to figure out where it comes from.

It would be a lot better to move this to a separate commit (or even a separate PR) with a clear commit message (something like "Make sure errors returned when extracting the first token are handled").

Also there is another potential error lurking here: the first return value also needs to be checked whether it's a {. Otherwise LoadPlainFile produces "strange" error messages that aren't very helpful:

$ echo '[]' | sops --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: Expected JSON object key, got ] of type json.Delim instead
$ echo '[}' | sops --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: invalid character '}' looking for beginning of value
$ echo '["foo","bar"]' | sops --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: invalid character '}' after array element
$ echo '["foo":"bar"}' | sops --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: invalid character ':' after array element

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see what difference this makes. As far as I can tell this PR does not change the behaviour against master

This PR:

$  echo '[]' | go run ./cmd/sops/ --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: Expected JSON object key, got ] of type json.Delim instead
$  echo '[}' | go run ./cmd/sops/ --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: invalid character '}' looking for beginning of value
$  echo '["foo","bar"]' | go run ./cmd/sops/ --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: Expected JSON object key, got ] of type json.Delim instead
$  echo '["foo":"bar"}' | go run ./cmd/sops/ --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: invalid character ':' after array element

Master:

$  echo '[]' | go run ./cmd/sops/ --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: Expected JSON object key, got ] of type json.Delim instead
$  echo '[}' | go run ./cmd/sops/ --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: invalid character '}' looking for beginning of value
$ echo '["foo","bar"]' | go run ./cmd/sops/ --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: Expected JSON object key, got ] of type json.Delim instead
$ echo '["foo":"bar"}' | go run ./cmd/sops/ --encrypt --input-type json /dev/stdin
Error unmarshalling file: Could not unmarshal input data: invalid character ':' after array element

But sure I can remove this change if you want

Copy link
Contributor Author

@holiman holiman Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also there is another potential error lurking here: the first return value also needs to be checked whether it's a {.

It should not literally check that it's a {, since whitespace is also accepted. I think the error messages are pretty good.

Consider ["foo":"bar"}. If we say error: it must start with {, then I think we are mixing different layers, with different concerns.

  1. First of all, the input needs to be valid json. The value ["foo":"bar"} is not valid json, because a json list cannot contain key-value pairs, so the parse fails. The user will have to fix up his json, there are many tools to validate / lint json.
  2. Secondly, there are app-specific constraints on what is required of the json object: such as that it is a dict, not a list. But those come later, IMO.

If this app starts validating specific characters in the json stream, like "checked whether it's a {.", then it overreaches. And before you know it, you are basically taking on the maintenance of an in-house json parser.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not literally check that it's a {, since whitespace is also accepted. I think the error messages are pretty good.

Yes, it should for a literal {, since whitespace has already been processed by the decoder.

First of all, the input needs to be valid json. The value ["foo":"bar"} is not valid json, because a json list cannot contain key-value pairs, so the parse fails. The user will have to fix up his json, there are many tools to validate / lint json.

True, but did you also consider the [] example? Or tried out something like [{}]? The error messages are not really helpful, because the code already mixes the two layers: it parses valid JSON, and at the same time only parses JSON objects.

Secondly, there are app-specific constraints on what is required of the json object: such as that it is a dict, not a list. But those come later, IMO.

The function is called treeBranchFromJSON. It cannot process anything else than a JSON object.

In any case, this discusison IMO shows why this should not part of a totally unrelated commit. This has nothing to do with (not) checking for duplicate keys.

@holiman
Copy link
Contributor Author

holiman commented Sep 19, 2023

Rebased on main, a few tests seem to be failing, not quite sure why

@hiddeco
Copy link
Member

hiddeco commented Sep 19, 2023

You did not sign-off your commits, which is required to accept our DCO. Please refer to https://github.com/getsops/sops/pull/1203/checks?check_run_id=16924447836 for more information.

@holiman
Copy link
Contributor Author

holiman commented Sep 19, 2023

You did not sign-off your commits

Fixed (on my branch, seems to be taking github some time to propagate the changes here)

@@ -211,7 +211,9 @@ func (store Store) jsonFromTreeBranch(branch sops.TreeBranch) ([]byte, error) {

func (store Store) treeBranchFromJSON(in []byte) (sops.TreeBranch, error) {
dec := json.NewDecoder(bytes.NewReader(in))
dec.Token()
if _, err := dec.Token(); err != nil {
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that this one fails when err == io.EOF. Which I guess is OK. I would add a comment explaining that.

stores/yaml/store_test.go Outdated Show resolved Hide resolved
stores/yaml/store_test.go Outdated Show resolved Hide resolved
stores/yaml/store_test.go Outdated Show resolved Hide resolved
@felixfontein
Copy link
Contributor

Do you want to fix the error handling in a separate PR, or should I do that?

Also you need to sign-off the commit (DCO check). Finally, it's probably better to squash b44f024 and 4bad36e are merged into one commit.

stores/json: use assert
stores/yaml: fix failing test (empty data)
stores/yaml: use assert in tests
unfix error handling and ignore error

Signed-off-by: Martin Holst Swende <martin@swende.se>
@holiman
Copy link
Contributor Author

holiman commented Sep 28, 2023

I squashed everything now.

Do you want to fix the error handling in a separate PR, or should I do that?

I'd prefer to leave that to you, thanks

`
s := new(Store)
_, err := s.LoadPlainFile([]byte(data))
assert.Nil(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, we should extend this test to verify that re-serializing yields the same result.

assert.NotNil(t, err)
assert.Equal(t, `yaml: unmarshal errors:
line 3: mapping key "hello" already defined at line 2`, err.Error())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add a test which looks at multi-document YAML streams (with collisions in later documents).

@hiddeco
Copy link
Member

hiddeco commented Sep 28, 2023

This is looking good to me, but I think we should release a patch first before moving ahead with changes which force a new minor.

@felixfontein
Copy link
Contributor

Do you want to fix the error handling in a separate PR, or should I do that?

I'd prefer to leave that to you, thanks

#1307.

@felixfontein
Copy link
Contributor

@hiddeco should we merge this now?

@enmanuelmoreira
Copy link

Any updates about this? the problem stills happening at version 3.8.1

Copy link
Member

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a rebase to resolve the conflict, but other than this (and a tiny nit) it looks good to me.

Sorry for the wait @holiman, and thanks for your contribution 🍍

@@ -309,6 +316,12 @@ func (store *Store) LoadPlainFile(in []byte) (sops.TreeBranches, error) {
if err != nil {
return nil, fmt.Errorf("Error unmarshaling input YAML: %s", err)
}
// Prevent use of reserved keywords
for _, item := range branch {
if item.Key == "sops" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: given the string is also matched against on L281. It may be wise to move it to a const.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a constant for it: SopsMetadataKey (in stores/stores.go). That one should be used.

@felixfontein
Copy link
Contributor

@holiman can you rebase and use the constant?

@felixfontein felixfontein removed this from the v3.9.0 milestone Jun 26, 2024
@felixfontein felixfontein added this to the 3.10.0 milestone Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants