
Bug Fix: Saving big payloads by cache CouchDB #1909

Merged
merged 3 commits into from
Sep 24, 2020
Conversation

przemo22
Contributor

@przemo22 przemo22 commented Sep 22, 2020

Type of change

  • Bug fix

Description

Saving big payloads (payloads bigger than 64 KB are not saved in the cache)

Additional details

When saving big payloads (more than 64 KB), the changes are updated/saved in CouchDB, but they are not visible in the cache (nothing changes there). Then, when we execute stub.getState("someID"), either nothing is returned or an old revision (smaller than 64 KB) is returned.

The fastcache library provides methods for updating/saving/getting big payloads: GetBig and SetBig.
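A minimal, self-contained sketch of this behavior (using a hypothetical miniCache stand-in rather than fastcache itself; fastcache's real Set silently skips entries above its 64 KB chunk limit, while SetBig chunks and stores values of any size):

```go
package main

import "fmt"

// miniCache is a hypothetical stand-in that mimics the fastcache behavior
// described above: Set silently skips values above the size limit, while
// SetBig stores values of any size.
type miniCache struct {
	limit int
	m     map[string][]byte
}

func newMiniCache(limit int) *miniCache {
	return &miniCache{limit: limit, m: map[string][]byte{}}
}

// Set stores the entry only if it fits under the limit; otherwise it is a
// no-op, which is what leaves a stale revision behind in the reported bug.
func (c *miniCache) Set(k, v []byte) {
	if len(v) <= c.limit {
		c.m[string(k)] = v
	}
}

// SetBig stores the value regardless of size.
func (c *miniCache) SetBig(k, v []byte) { c.m[string(k)] = v }

// Get returns the cached value, or nil if absent.
func (c *miniCache) Get(k []byte) []byte { return c.m[string(k)] }

func main() {
	c := newMiniCache(64 * 1024)

	c.Set([]byte("doc"), []byte("rev1-small"))
	c.Set([]byte("doc"), make([]byte, 128*1024)) // silently dropped: too big

	// The stale small revision is still served, mirroring the reported bug.
	fmt.Println(string(c.Get([]byte("doc")))) // rev1-small

	c.SetBig([]byte("doc"), make([]byte, 128*1024))
	fmt.Println(len(c.Get([]byte("doc")))) // 131072
}
```

This is only an illustration of the symptom; the real fastcache semantics (chunked storage, subslice hashing) are more involved.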

Release Note

FAB-18245

@przemo22 przemo22 requested a review from a team as a code owner September 22, 2020 08:42
@mastersingh24
Contributor

mastersingh24 commented Sep 22, 2020

@przemyslaw - Thanks for the PR. It looks like you need to sign off on your commit; you likely just need to do:

git commit --amend -s
git push -f

But the change looks pretty good to me

@przemo22
Contributor Author

Already done

Signed-off-by: przemyslaw <przemo.wasala@gmail.com>
@cendhu
Contributor

cendhu commented Sep 22, 2020

When saving big payloads (more than 64 KB), the changes are updated/saved in CouchDB, but they are not visible in the cache (nothing changes there). Then, when we execute stub.getState("someID"), either nothing is returned or an old revision (smaller than 64 KB) is returned.

Nice catch. We did in fact discuss this 64 KB limit (listed indirectly in the comparison table -- https://docs.google.com/document/d/1Rxczkwni5oG6MBif1s0KN6_PR29tM4WP9IEOvbLMTu8/edit?usp=sharing). However, the change in size and the returning of an old version didn't strike us.

@manish-sethi @denyeart what's your opinion on the maximum allowed size? Should we allow any arbitrary size or keep an absolute maximum? A few larger entries might reduce the total number of entries in the cache. As this is workload dependent, we could make it configurable. If we want simplicity, we can go with any size (as done in this PR) without introducing any config parameter.

@denyeart
Contributor

@cendhu I would suggest keeping it simple as is done in this PR, with no new config options.
If there becomes demand for making it configurable, we can always add that later.

@@ -64,9 +65,12 @@ func (c *cache) getState(chainID, namespace, key string) (*CacheValue, error) {
 	}

 	cacheKey := constructCacheKey(chainID, namespace, key)
-	valBytes := cache.Get(nil, cacheKey)
+	valBytes := cache.GetBig(nil, cacheKey)
Contributor

@manish-sethi manish-sethi Sep 22, 2020


To clear one doubt, I had a quick look into the code of this library and, unfortunately, the doubt turned out to be justified. Blindly looking up a key using the GetBig API can cause unintended consequences. I was able to construct one example that leads to a panic.

// Requires "fmt", "testing", and "github.com/VictoriaMetrics/fastcache" imports.
func TestDoesItWork(t *testing.T) {
	c := fastcache.New(64 * 1024 * 1024)
	// The key is stored with Set but read back with GetBig.
	c.Set([]byte("key"), []byte("sixteen-byte-val"))
	fmt.Printf("[%s]\n", c.GetBig(nil, []byte("key")))
}

It appears that the library assumes that the application remembers which keys were stored using the Set API vs. the SetBig API.

In order to fix the bug, my suggestion would be to check the value size; if it is greater than the size limit, then delete the existing key from the cache if present.

Comment on lines 163 to 169
func setImplicitValue(c *fastcache.Cache, cacheKey []byte, value []byte) {
	if cap(value) > fastCacheValueSizeLimit {
		c.SetBig(cacheKey, value)
	} else {
		c.Set(cacheKey, value)
	}
}
Contributor


What would happen if the old value was big but the new value is not, and vice versa? Will the old big value be deleted before storing the new small value, and vice versa? It isn't clear from the documentation or the code.

	// Requires "fmt" and "math/rand" imports.
	c := fastcache.New(64 * 1024 * 1024)
	var val []byte

	c.Set([]byte("key1"), []byte("value1"))
	val = c.Get(nil, []byte("key1"))
	fmt.Printf("Get(): Key is set with small value: length [%d], value[%s]\n", len(val), string(val))

	big := make([]byte, 2*64*1024)
	rand.Read(big)
	c.SetBig([]byte("key1"), big)
	val = c.GetBig(nil, []byte("key1"))
	fmt.Printf("GetBig(): Key is set with big value: length [%d]\n", len(val))

	val = c.Get(nil, []byte("key1"))
	fmt.Printf("Get(): Key is set with big value: length [%d], value[%s]\n", len(val), string(val))

	c.Set([]byte("key1"), []byte("value1"))
	val = c.Get(nil, []byte("key1"))
	fmt.Printf("Get(): Key is set with small value: length [%d], value[%s]\n", len(val), string(val))

	val = c.GetBig(nil, []byte("key1"))
	fmt.Printf("GetBig(): Key is set with big value: length [%d]\n", len(val))

Here, after setting the big value on an existing key that held a small value, Get() does not return nil but some random bytes. Though GetBig() is always used before Get() in this PR, there are a lot of unspecified assumptions, and hence the code looks fragile.

There are two options:

  1. We can always use GetBig() and SetBig() irrespective of the size limit, but we need to benchmark the performance impact for values < 64 KB.
  2. What @manish-sethi suggested here -- https://github.com/hyperledger/fabric/pull/1909/files#r492786264

I would prefer 2. If needed, we can quantify the impact of always using the big APIs and make a separate PR for 1.

Contributor


Just to take a note: when looking for alternate ways, I would add one more approach, in which we could remember in the cache itself whether a key was saved using the SetBig API (in a separate namespace), and evaluate the overheads compared to always using the big APIs.

As far as this bug fix is concerned, my inclination would also be to keep it simple, as we need to backport this as well. We can maintain the existing behavior (i.e., not storing bigger values) and just fix the bug.

Contributor Author


I have already upgraded fastcache. Take a look at the file changes. Currently we can use the Has(<key>) method from fastcache; the problem is getting the payload from the cache (whether it was stored via the bigcache or the regular fastcache path). So I changed this to first use GetBig(), and if the length of the bytes is less than the fastcache capacity, then use Get(). This prevents getting an incorrect byte array.

Feel free to offer any advice/correction/refactoring.

Contributor

@manish-sethi manish-sethi Sep 23, 2020


This still does not solve the problem that I highlighted in my previous comment. I encourage you to run the sample test that I posted there locally to understand the problem with this fix. I am not sure that upgrading fastcache solves the issue; the problem is the incorrect usage of the APIs. To elaborate on the case: there could be a key that we previously set using the Set() API. Now, in your fix, you first look up the key using the GetBig() API, and as highlighted in my test, this can lead to a panic if the value of the key (as previously set) has certain properties.

For fixing the bug, as we discussed before, in the function putState we can check the size of the value, and if it is greater than the limit, not call the Set API and instead call the Del API in order to remove the previous version of the key from the cache. This maintains the current behavior (i.e., we don't store big values in the cache) and just fixes the bug.

For allowing bigger values later, we discussed a couple of options above; they can be tried and evaluated in a separate PR.
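The suggested putState fix can be sketched as follows (a stand-in with hypothetical types, not the actual statecouchdb code; the names putState and fastCacheValueSizeLimit follow this PR's usage):

```go
package main

import "fmt"

const fastCacheValueSizeLimit = 64 * 1024

// cacheLike captures the three operations the fix needs; in the real code
// this is the fastcache instance. Set mimics fastcache by silently skipping
// values above the limit.
type cacheLike struct{ m map[string][]byte }

func (c *cacheLike) Set(k, v []byte) {
	if len(v) <= fastCacheValueSizeLimit {
		c.m[string(k)] = v
	}
}
func (c *cacheLike) Get(k []byte) []byte { return c.m[string(k)] }
func (c *cacheLike) Del(k []byte)        { delete(c.m, string(k)) }

// putState applies the suggested fix: values above the limit are not cached,
// and any previous (now stale) revision of the key is deleted so that readers
// fall through to CouchDB instead of seeing the old value.
func putState(c *cacheLike, key, value []byte) {
	if len(value) > fastCacheValueSizeLimit {
		c.Del(key)
		return
	}
	c.Set(key, value)
}

func main() {
	c := &cacheLike{m: map[string][]byte{}}
	putState(c, []byte("k"), []byte("rev1"))
	putState(c, []byte("k"), make([]byte, 2*fastCacheValueSizeLimit))
	fmt.Println(c.Get([]byte("k")) == nil) // true: stale rev1 was evicted
}
```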

Contributor


In the updated PR, the usage of Has(), GetBig(), and Get(), plus the comparison of valueSize against sizeLimit, complicates the simple cache usage. Let's go with the following suggestion for the bug fix.

For fixing the bug, as we discussed before, in the function putState we can check the size of the value, and if it is greater than the limit, not call the Set API and instead call the Del API in order to remove the previous version of the key from the cache. This maintains the current behavior (i.e., we don't store big values in the cache) and just fixes the bug.

As storing values > 64 KB is a new feature, let's do that separately with less complication.

Contributor Author

@przemo22 przemo22 Sep 24, 2020


I now understand that you don't want to handle big values in the cache.

  1. You want to save them only in CouchDB.
  2. When GetState is called in statecouchdb, it tries to get the value from the cache; if nil, it gets it directly from CouchDB.

Am I correct?
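The two points above describe a standard read-through cache. A minimal sketch (hypothetical getState helper, with plain maps standing in for the cache and CouchDB):

```go
package main

import "fmt"

// getState first consults the cache; on a miss it reads CouchDB (modeled
// here as a map) and backfills the cache. Big values are never cached, so
// reads of them always fall through to the database.
func getState(cache, db map[string][]byte, key string, limit int) []byte {
	if v, ok := cache[key]; ok {
		return v
	}
	v, ok := db[key]
	if !ok {
		return nil
	}
	if len(v) <= limit { // only small values are backfilled into the cache
		cache[key] = v
	}
	return v
}

func main() {
	cache := map[string][]byte{}
	db := map[string][]byte{"someID": make([]byte, 128*1024)}

	v := getState(cache, db, "someID", 64*1024)
	fmt.Println(len(v)) // 131072: served from CouchDB

	_, cached := cache["someID"]
	fmt.Println(cached) // false: too big to cache
}
```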

In connection with the new version, I thought that using the provided methods like Has and HasGet would be nice.
You can see the changes in the UpdateStates and getState methods.

I did what @manish-sethi suggested to me.

Not ready yet

Signed-off-by: przemyslaw <przemo.wasala@gmail.com>
@przemo22
Contributor Author

Locally tests passed. Any hints how to improve them on CI?

@sykesm
Contributor

sykesm commented Sep 23, 2020

Locally tests passed. Any hints how to improve them on CI?

Discover, report, and fix flakes. The issues you're hitting have probably been around for a long time.

https://jira.hyperledger.org/browse/FAB-17661?jql=statusCategory%20%20!%3D%20Done%20AND%20labels%20%3D%20ci-flake

@wlahti
Contributor

wlahti commented Sep 23, 2020

Locally tests passed. Any hints how to improve them on CI?

Discover, report, and fix flakes. The issues you're hitting have probably been around for a long time.

https://jira.hyperledger.org/browse/FAB-17661?jql=statusCategory%20%20!%3D%20Done%20AND%20labels%20%3D%20ci-flake

Actually, this is a newer flake and I'm the one guilty of introducing it: https://jira.hyperledger.org/browse/FAB-18238 Just created a PR with a fix: #1916

Contributor

@manish-sethi manish-sethi left a comment


Thanks @przemyslaw. The logic of deleting the existing key looks good overall. However, I noticed one issue when I looked closely: you would need to do this unconditionally. A few minor cleanup comments as well.

core/ledger/kvledger/txmgmt/statedb/statecouchdb/cache.go
// If the payload is bigger than fastcache can store, then delete the previous key.
// When GetState() is called for the deleted key, it will query the database
// instead of fastcache.
if cap(valBytes) > fastCacheValueSizeLimit {
Contributor


  • len(valBytes) instead of cap(valBytes)

  • Looking closely, unfortunately, even this is not going to be a sufficient condition, because of the way this library encodes the data internally: uint64(len(kvLenBuf) + len(k) + len(v)) is the length that matters, not the length of the value alone. Here, kvLenBuf is a fixed size of 4. But I prefer not to depend on so much of a library's internal logic. So my suggestion would be to always delete the key, irrespective of the size condition.

Contributor


So, my suggestion would be to always delete the key irrespective of the size condition.

@przemyslaw A clarification on the above suggestion. I think @manish-sethi suggested removing fastCacheValueSizeLimit altogether, along with all the checks associated with it. We would simply delete the existing entry before storing a new one, without checking the size. Hence, before every cache.Set(), we can simply do cache.Del() (cache.Has() is optional). Bigger values would be ignored by cache.Set() anyway. I have verified this with @manish-sethi as I had some confusion; I am sharing it here so that the suggestion is explicit.
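That simplified suggestion (an unconditional Del before every Set, with no size check at all) can be sketched with a map-backed stand-in for fastcache:

```go
package main

import "fmt"

// sizeLimitedSet mimics fastcache.Set for illustration: entries above the
// limit are silently skipped.
func sizeLimitedSet(m map[string][]byte, k, v []byte, limit int) {
	if len(v) <= limit {
		m[string(k)] = v
	}
}

// putState follows the simplified suggestion: always delete the existing
// entry before Set, with no size check. Small values are re-stored; big
// values are simply ignored by Set, leaving no stale revision behind.
func putState(m map[string][]byte, k, v []byte, limit int) {
	delete(m, string(k)) // unconditional Del
	sizeLimitedSet(m, k, v, limit)
}

func main() {
	m := map[string][]byte{}
	putState(m, []byte("k"), []byte("rev1"), 64)
	fmt.Println(string(m["k"])) // rev1

	putState(m, []byte("k"), make([]byte, 128), 64)
	_, ok := m["k"]
	fmt.Println(ok) // false: old revision gone, big value not cached
}
```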

Contributor


+1

Comment on lines 96 to 98
if cache.Has(cacheKey) {
	cache.Del(cacheKey)
}
Contributor


This is a critical piece of this bug fix, but it is not covered in the test.

Comment on lines 63 to 68

// test GetState
v, err := cache.getState("ch1", "ns1", "k1")
require.NoError(t, err)
require.Nil(t, v)

Contributor


Add the data beforehand so that lines 96-98 in cache.go get tested.

core/ledger/kvledger/txmgmt/statedb/statecouchdb/cache.go
@przemo22
Contributor Author

@manish-sethi take a look one more time. I found that my previous commit didn't work. The current PR is ready for review.

@manish-sethi
Contributor

manish-sethi commented Sep 24, 2020

@manish-sethi take a look one more time. I found that my previous commit didn't work. The current PR is ready for review.

@przemyslaw - My review comments apply to your latest commit. If you can address them today, then we will be able to backport this PR to the previous release as well.

@denyeart
Contributor

@przemyslaw To clarify the timing, we are expecting to release a v2.2.1 tomorrow. If this PR can be merged in time, we can also quickly backport to release-2.2 so that it gets into the v2.2.1 release.

@przemo22
Contributor Author

The corrections are ready for review (@manish-sethi, @cendhu).
Can you give me feedback as soon as possible, please?

Contributor

@manish-sethi manish-sethi left a comment


@przemyslaw - Thanks. This looks good to me now. I have a few minor comments on the test code, though.

Comment on lines 65 to 68
v, err := cache.getState("ch1", "ns1", "k1")
require.NoError(t, err)
require.Nil(t, v)

Contributor


This is not relevant to this test.

expectedValue := &CacheValue{Value: []byte("value")}
require.NoError(t, cache.putState("ch1", "ns1", "k1", expectedValue))

Contributor


After this, you could have added the following to verify the initial condition for the test:

v, err = cache.getState("ch1", "ns1", "k1")
require.NoError(t, err)
require.True(t, proto.Equal(expectedValue, v))

Comment on lines +76 to +75
require.NoError(t, cache.putState("ch1", "ns1", "k1", expectedValue1))

Contributor


Similarly, you could have added a test for the function cache.UpdateStates, as that is not covered in this test.
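Such a test would exercise the same delete-or-set rule over a whole batch of updates. A rough sketch with a map-backed stand-in (the updateStates helper here is hypothetical, not the real statecouchdb API):

```go
package main

import "fmt"

const limit = 64

// updateStates applies a batch of writes with the same rule as putState:
// big values evict the existing cache entry instead of being cached.
func updateStates(cache map[string][]byte, updates map[string][]byte) {
	for k, v := range updates {
		delete(cache, k) // unconditional delete of any stale revision
		if len(v) <= limit {
			cache[k] = v
		}
	}
}

func main() {
	cache := map[string][]byte{
		"k1": []byte("old1"),
		"k2": []byte("old2"),
	}

	updateStates(cache, map[string][]byte{
		"k1": []byte("new1"),        // small: replaced in the cache
		"k2": make([]byte, 2*limit), // big: evicted, not cached
	})

	fmt.Println(string(cache["k1"])) // new1
	_, ok := cache["k2"]
	fmt.Println(ok) // false
}
```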

@wlahti
Contributor

wlahti commented Sep 24, 2020

Locally tests passed. Any hints how to improve them on CI?

Discover, report, and fix flakes. The issues you're hitting have probably been around for a long time.
https://jira.hyperledger.org/browse/FAB-17661?jql=statusCategory%20%20!%3D%20Done%20AND%20labels%20%3D%20ci-flake

Actually, this is a newer flake and I'm the one guilty of introducing it: https://jira.hyperledger.org/browse/FAB-18238 Just created a PR with a fix: #1916

FYI the fix for the integration test flake you hit (again) here was just merged into master. Rebase on master to pull in that commit and you should be good to go.

Contributor

@manish-sethi manish-sethi left a comment


Thanks @przemyslaw for your patience and consistent improvement on this PR.

Signed-off-by: przemyslaw <przemo.wasala@gmail.com>
@manish-sethi manish-sethi merged commit 646260e into hyperledger:master Sep 24, 2020
@manish-sethi
Contributor

@Mergifyio backport release-2.2

@mergify

mergify bot commented Sep 24, 2020

Command backport release-2.2: failure

No backport have been created

  • Backport to branch release-2.2 failed

Cherry-pick of 646260e has failed:

On branch mergify/bp/release-2.2/pr-1909
Your branch is up to date with 'origin/release-2.2'.

You are currently cherry-picking commit 646260eb9.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:

	modified:   core/ledger/kvledger/txmgmt/statedb/statecouchdb/cache.go
	modified:   core/ledger/kvledger/txmgmt/statedb/statecouchdb/cache_test.go
	deleted:    vendor/github.com/VictoriaMetrics/fastcache/.travis.yml
	modified:   vendor/github.com/VictoriaMetrics/fastcache/README.md
	modified:   vendor/github.com/VictoriaMetrics/fastcache/bigcache.go
	modified:   vendor/github.com/VictoriaMetrics/fastcache/fastcache.go
	modified:   vendor/github.com/VictoriaMetrics/fastcache/file.go
	modified:   vendor/github.com/VictoriaMetrics/fastcache/go.mod
	modified:   vendor/github.com/VictoriaMetrics/fastcache/go.sum
	deleted:    vendor/github.com/cespare/xxhash/README.md
	deleted:    vendor/github.com/cespare/xxhash/go.mod
	deleted:    vendor/github.com/cespare/xxhash/go.sum
	deleted:    vendor/github.com/cespare/xxhash/rotate.go
	deleted:    vendor/github.com/cespare/xxhash/rotate19.go
	new file:   vendor/github.com/cespare/xxhash/v2/.travis.yml
	renamed:    vendor/github.com/cespare/xxhash/LICENSE.txt -> vendor/github.com/cespare/xxhash/v2/LICENSE.txt
	new file:   vendor/github.com/cespare/xxhash/v2/README.md
	new file:   vendor/github.com/cespare/xxhash/v2/go.mod
	new file:   vendor/github.com/cespare/xxhash/v2/go.sum
	new file:   vendor/github.com/cespare/xxhash/v2/xxhash.go
	renamed:    vendor/github.com/cespare/xxhash/xxhash_amd64.go -> vendor/github.com/cespare/xxhash/v2/xxhash_amd64.go
	renamed:    vendor/github.com/cespare/xxhash/xxhash_amd64.s -> vendor/github.com/cespare/xxhash/v2/xxhash_amd64.s
	renamed:    vendor/github.com/cespare/xxhash/xxhash_other.go -> vendor/github.com/cespare/xxhash/v2/xxhash_other.go
	renamed:    vendor/github.com/cespare/xxhash/xxhash_safe.go -> vendor/github.com/cespare/xxhash/v2/xxhash_safe.go
	new file:   vendor/github.com/cespare/xxhash/v2/xxhash_unsafe.go
	deleted:    vendor/github.com/cespare/xxhash/xxhash.go
	deleted:    vendor/github.com/cespare/xxhash/xxhash_unsafe.go
	modified:   vendor/github.com/golang/snappy/AUTHORS
	modified:   vendor/github.com/golang/snappy/CONTRIBUTORS
	modified:   vendor/github.com/golang/snappy/decode.go
	new file:   vendor/github.com/golang/snappy/decode_arm64.s
	renamed:    vendor/github.com/golang/snappy/decode_amd64.go -> vendor/github.com/golang/snappy/decode_asm.go
	modified:   vendor/github.com/golang/snappy/decode_other.go
	modified:   vendor/github.com/golang/snappy/encode.go
	new file:   vendor/github.com/golang/snappy/encode_arm64.s
	renamed:    vendor/github.com/golang/snappy/encode_amd64.go -> vendor/github.com/golang/snappy/encode_asm.go
	modified:   vendor/github.com/golang/snappy/encode_other.go
	modified:   vendor/modules.txt

Unmerged paths:
  (use "git add <file>..." to mark resolution)

	both modified:   go.mod
	both modified:   go.sum

manish-sethi pushed a commit to manish-sethi/fabric that referenced this pull request Sep 24, 2020
* Bug Fix: Saving big payloads by cache CouchDB

Signed-off-by: przemyslaw <przemo.wasala@gmail.com>

* Update fastcache and use new API from fastcache

Signed-off-by: przemyslaw <przemo.wasala@gmail.com>

* Cache support only small payloads

Signed-off-by: przemyslaw <przemo.wasala@gmail.com>
Signed-off-by: manish <manish.sethi@gmail.com>
denyeart pushed a commit that referenced this pull request Sep 24, 2020
* Bug Fix: Saving big payloads by cache CouchDB

Signed-off-by: przemyslaw <przemo.wasala@gmail.com>

* Update fastcache and use new API from fastcache

Signed-off-by: przemyslaw <przemo.wasala@gmail.com>

* Cache support only small payloads

Signed-off-by: przemyslaw <przemo.wasala@gmail.com>
Signed-off-by: manish <manish.sethi@gmail.com>