Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: concurrent checkTx #49

Merged
merged 18 commits into from
Jan 15, 2021
Merged

feat: concurrent checkTx #49

merged 18 commits into from
Jan 15, 2021

Conversation

jinsan-line
Copy link
Contributor

@jinsan-line jinsan-line commented Jan 11, 2021

Related with: https://github.com/line/link/issues/1151, Finschia/ostracon#160

Description

To optimize performance, we need to increase concurrency. As a first step for it, I implement concurrent checkTx. The key change is to remove app.mtx.Lock() from the abci local client. An application, as an abci server, is better to protect itself from concurrency than an abci client. W/ current implementation, the abci local client protects an abci server but it decreases concurrency.

CONTRACT:

Now, an application should protect itself from concurrent checkTx as an abci server that means it should be thread safe.
We'll also increase concurrency for other abci methods as much as possible in the future.

What should application protect?

In cosmos-sdk application, it should protect accounts from the concurrency.

Motivation and context

How has this been tested?

  • make test
  • make test-cover

Checklist:

  • I followed the contributing guidelines.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@jinsan-line jinsan-line self-assigned this Jan 11, 2021
@jinsan-line jinsan-line marked this pull request as draft January 11, 2021 06:56
@jinsan-line jinsan-line marked this pull request as ready for review January 11, 2021 08:54
@jinsan-line jinsan-line marked this pull request as draft January 11, 2021 10:07
tail3 := addr[len(addr)-3:]
tail := append([]byte{0}, tail3...)

addrKey := binary.BigEndian.Uint32(tail)
Copy link

@egonspace egonspace Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addrMtx has so many redundant instances not being used because a character of addr cannot cover all of 0~255.
I think we should use decoded bytes for mutex hash.

_, data, _ := bech32.Decode(addr) // one character of data has a value 0~31 (5 bits)
addrKey := (int(data[0]) << 10) | (int(data[1]) << 5) | int(data[2]) // it may be 0~32767

And I think 32k mutexs are enough.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for reference, I think it is not good to use the last 6 bytes of Addr as hash because it is checksum.

Copy link
Contributor Author

@jinsan-line jinsan-line Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@egonspace

addrMtx has so many redundant instances not being used because a character of addr cannot cover all of 0~255.
I think we should use decoded bytes for mutex hash.

AFAIK, AccAddress is not encoded address but is decoded address.

And I think 32k mutexs are enough.

Please could you share the reason? I think we need a metric for conflict ratio. I'll investigate it w/ dashboard. But, as a basic approach rule, I think the team is already aligned to utilize memory to optimize performance because it is cheapest among cpu, memory, and storage. A Mutex is 8bytes so addrMtx uses 32MiB. I think 32MiB is not matter but conflict ratio is matter in terms of performance.

And for reference, I think it is not good to use the last 6 bytes of Addr as hash because it is checksum.

AFAIK, AccAddress is decoded and doesn't have checksum. And also I think checksum is better for key than sampling in terms of distribution and conflict possibility. I'll think it more. Thanks for inspiring me.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm sorry I didn't know AccAddress was already decoded form. And then, it seems that code has no problem.
But isn't the memory allocated as mutex 128MB? Doesn't 1 << 24 mean 16777216? So 8*16777216/1024/1024 = 128.
Of course, I agree with the statement that it would be good to accumulate collision statistics and then judge them.

I don't think it's good to use checksum as a hash because I don't know how to calculate the checksum and I don't know if the variance is as even as hash.

Copy link
Contributor Author

@jinsan-line jinsan-line Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@egonspace

Thanks, I mistook simple calculation. addrMtx uses 128MiB.

  • the number of mutex: 1 << 24 = 2 ^ 24 = (2 ^ 4) * (2 ^ 10) * (2 ^ 10) = 16M
  • mutex size: 8bytes
  • 8bytes * 16M = 128MiB.

I still believe it's not too big but, as I said, we should monitor with a practical experiment.

I don't think it's good to use checksum as a hash because I don't know how to calculate the checksum and I don't know if the variance is as even as hash.

As I said, I'm still thinking it but don't have personal conclusion yet.

Copy link
Contributor Author

@jinsan-line jinsan-line Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@egonspace

The number of mutex

  • I set up sampleBytes as 2. Now, the number of mutex is 64k so accMtx uses 512KiB. (ec00f66)
  • With monitoring, if I found the conflict ratio is too high, I'll increase it at that time.

AccKey

  • AccAddress might be random number so I get personal conclusion that just sampling is enough.

If you have any idea, please let me know.
Thanks,

Copy link

@egonspace egonspace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jinsan-line jinsan-line marked this pull request as ready for review January 12, 2021 08:24
Copy link
Contributor

@wetcod wetcod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@kukugi kukugi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@jinsan-line jinsan-line merged commit 2982988 into Finschia:feat/perf Jan 15, 2021
@jinsan-line jinsan-line deleted the concurrent-checktx branch January 15, 2021 04:58
jinsan-line added a commit that referenced this pull request Apr 22, 2021
* feat: implement new abci, `BeginRecheckTx()` and `EndRecheckTx()`

* test: fix tests

* refactor: decompose checkTx & runTx

* chore: protect app.checkState w/ RWMutex for simulate

* chore: remove unused var

* feat: account lock decorator

* chore: skip AccountLockDecorator if not checkTx

* chore: bump up tendermint

* chore: revise accountlock position

* chore: accountlock_test

* chore: revise accountlock covers `cache.Write()`

* chore: revise `sampleBytes` to `2`

* fix: test according to `sampleBytes`

* chore: revise `getUniqSortedAddressKey()` and add `getAddressKey()`

* chore: revise `how to sort` not to use `reflection`

* chore: bump up tendermint

* test: check `sorted` in `TestGetUniqSortedAddressKey()`

* chore: move `accountLock` from `anteTx()` to `checkTx()`
# Conflicts:
#	baseapp/abci.go
#	baseapp/baseapp.go
#	baseapp/baseapp_test.go
#	baseapp/helpers.go
#	go.mod
#	go.sum
#	x/bank/bench_test.go
#	x/mock/test_utils.go
@jinsan-line jinsan-line mentioned this pull request Apr 22, 2021
9 tasks
jinsan-line added a commit that referenced this pull request Apr 22, 2021
* chore: bump up ostracon, iavl and tm-db

* feat: concurrent checkTx (#49)

* feat: implement new abci, `BeginRecheckTx()` and `EndRecheckTx()`

* test: fix tests

* refactor: decompose checkTx & runTx

* chore: protect app.checkState w/ RWMutex for simulate

* chore: remove unused var

* feat: account lock decorator

* chore: skip AccountLockDecorator if not checkTx

* chore: bump up tendermint

* chore: revise accountlock position

* chore: accountlock_test

* chore: revise accountlock covers `cache.Write()`

* chore: revise `sampleBytes` to `2`

* fix: test according to `sampleBytes`

* chore: revise `getUniqSortedAddressKey()` and add `getAddressKey()`

* chore: revise `how to sort` not to use `reflection`

* chore: bump up tendermint

* test: check `sorted` in `TestGetUniqSortedAddressKey()`

* chore: move `accountLock` from `anteTx()` to `checkTx()`
# Conflicts:
#	baseapp/abci.go
#	baseapp/baseapp.go
#	baseapp/baseapp_test.go
#	baseapp/helpers.go
#	go.mod
#	go.sum
#	x/bank/bench_test.go
#	x/mock/test_utils.go

* fix: make it buildable

* fix: tests

* fix: gasWanted & gasUsed are always `0` (#51)

* fix: gasWanted & gasUsed is always `0`

* chore: error log for general panic
# Conflicts:
#	baseapp/baseapp.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants