sql: index backfill fails to complete with default settings #130939

Open
andrewbaptist opened this issue Sep 18, 2024 · 3 comments
Labels
A-schema-changes branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@andrewbaptist
Collaborator

andrewbaptist commented Sep 18, 2024

After creating a cluster and attempting to add an index, the index creation can get stuck, failing repeatedly because of memory settings. The only way to make progress is to manually change the bulkio.index_backfill.batch_size setting. That should not be required, and the need for it is not clear from the UI.

Create a 12 node cluster:

roachprod create -n12 $CLUSTER 
roachprod put $CLUSTER artifacts/cockroach
roachprod start --store-count 2 $CLUSTER
roachprod ssh $CLUSTER:1 "./cockroach workload init kv {pgurl:1}"
roachprod ssh $CLUSTER:1 "./cockroach workload run kv --duration=600s --max-block-bytes=10000 --min-block-bytes=10000 --concurrency=100 {pgurl:1-12}"

Attempt to create an index on the cluster:

CREATE INDEX i ON kv.kv (k, v);

Note that this will never complete and instead get stuck on step 2.

To unstick it, set the backfill batch size (this can be run either before the index creation or while it is stuck).

SET CLUSTER SETTING bulkio.index_backfill.batch_size = 5000;

The index creation will now complete successfully.
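
To confirm the workaround took effect, or to undo it afterwards, the setting can be inspected and reset as well as set. Below is a minimal sketch, not part of the original report: the helper name `setting_sql` is hypothetical, and it only prints the SQL statements so they can be reviewed and pasted into a SQL shell rather than contacting the cluster itself.

```shell
#!/bin/sh
# Hypothetical helper: prints SQL to inspect, override, or reset the
# backfill batch size. It does not talk to the cluster.
SETTING="bulkio.index_backfill.batch_size"

setting_sql() {
  case "$1" in
    show)  echo "SHOW CLUSTER SETTING ${SETTING};" ;;
    set)   echo "SET CLUSTER SETTING ${SETTING} = $2;" ;;
    reset) echo "RESET CLUSTER SETTING ${SETTING};" ;;
    *)     echo "usage: setting_sql show|set <rows>|reset" >&2; return 1 ;;
  esac
}

setting_sql show       # check the current value
setting_sql set 5000   # the workaround from this report
setting_sql reset      # return to the default afterwards
```

Resetting after the index is built keeps the override from silently lingering on a test cluster.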

Jira issue: CRDB-42302

@andrewbaptist andrewbaptist added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-schema-changes T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Sep 18, 2024

blathers-crl bot commented Sep 18, 2024

Hi @andrewbaptist, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Sep 18, 2024
As part of the index backfill perturbation testing, three bugs were
found in the backfill process and replication. Until those are fixed,
parts of the test are disabled.

Informs: cockroachdb#130934
Informs: cockroachdb#130902
Informs: cockroachdb#130939

Epic: none

Release note: None
@rafiss
Collaborator

rafiss commented Sep 18, 2024

> Note that this will never complete and instead get stuck on step 2.

Could you clarify a bit more? What does "step 2" refer to?


Which version were you using when this occurred? We recently merged #128201, which is meant to address a problem similar to what you may have seen.

@andrewbaptist
Collaborator Author

This is on master (from today), so it has that fix. Step 2 refers to the step listed in the jobs table (2/7). I didn't look carefully into why it was stuck, but it is quick to reproduce using the steps above.
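
For anyone reproducing this, the stuck step can be observed by querying the jobs table. The sketch below is an illustration rather than something taken from the report: the helper name `jobs_sql` is hypothetical, and it assumes the "2/7" step is reported in the `running_status` column of `SHOW JOBS`. It only prints the query, which can then be run in a SQL shell on any node.

```shell
#!/bin/sh
# Hypothetical helper: prints a query listing CREATE INDEX jobs and their
# progress. ASSUMPTION: the "2/7" step shows up in running_status.
jobs_sql() {
  cat <<'EOF'
SELECT job_id, status, running_status, fraction_completed, error
FROM [SHOW JOBS]
WHERE description LIKE '%CREATE INDEX%';
EOF
}

jobs_sql
```

Running the query a few times while the backfill is stuck should show whether `fraction_completed` is moving and whether `error` carries a memory-related message.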

@andrewbaptist andrewbaptist added the branch-master Failures and bugs on the master branch. label Sep 18, 2024
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Sep 18, 2024
During index backfill, the nodes can run out of memory with the default
backfill batch size. This commit reduces the size by 10x to prevent the
memory issue.

Informs: cockroachdb#130939

Epic: none

Release note: None
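
A rough back-of-the-envelope calculation suggests why the batch size matters for this workload. This is a sketch under stated assumptions, not a measurement: it assumes the default batch size is 50000 rows (inferred from the 5000 workaround and the 10x reduction mentioned above), that each index entry carries the full 10000-byte value written by the kv workload, and it ignores key bytes and encoding overhead. The function name `estimate_batch_mb` is hypothetical.

```shell
#!/bin/sh
# Rough estimate of value bytes held per backfill batch, in MB.
# ASSUMPTIONS: 50000-row default batch; 10000-byte values from
# --min-block-bytes/--max-block-bytes; key and encoding overhead ignored.
estimate_batch_mb() {
  rows=$1
  bytes_per_row=$2
  echo $(( rows * bytes_per_row / 1000000 ))
}

echo "assumed default (50000 rows): $(estimate_batch_mb 50000 10000) MB"
echo "workaround (5000 rows):       $(estimate_batch_mb 5000 10000) MB"
```

Under these assumptions a single batch holds roughly 500 MB of values at the default versus roughly 50 MB with the workaround, which is consistent with the reported memory failures but does not by itself confirm the root cause.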