
fix: Crash caused by numpy.vectorize #21936

Merged

john-bodley merged 1 commit into master from john-bodley--fix-crash-numpy-vectorize on Oct 26, 2022

Conversation

Member
@john-bodley john-bodley commented Oct 25, 2022

SUMMARY

We (Airbnb) had a user report an error where an async query in SQL Lab would run ad infinitum when the row limit was increased. The issue was that the Celery worker crashed with the following error:

WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).

It turns out the root cause was a call to the numpy.vectorize function, which, per the linked report, can consume copious amounts of memory. The fix was simply to un-vectorize the logic using an iterator, per the NumPy documentation, acknowledging that there may be a performance hit.

Note that this is the only place in the codebase that uses the numpy.vectorize function:

$ git grep "vectorize"
superset/result_set.py:    vstringify = np.vectorize(stringify)
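
For context, here is a minimal, self-contained sketch of the before and after behavior. The stringify helper and the toy result array below are hypothetical stand-ins (the real helper lives in superset/result_set.py), so the details differ from the actual diff.

import numpy as np

def stringify(obj) -> str:
    # Hypothetical stand-in for the stringify helper in result_set.py.
    return str(obj)

result = np.array([["hello world", 123], [None, 4.5]], dtype=object)

# Before: np.vectorize materializes its machinery and output all at
# once, which can consume copious amounts of memory on large results.
vstringify = np.vectorize(stringify)
stringified = vstringify(result)

# After: iterate over the array in place with np.nditer, trading some
# speed for a bounded memory footprint.
with np.nditer(result, flags=["refs_ok"], op_flags=["readwrite"]) as it:
    for obj in it:
        obj[...] = stringify(obj)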

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley marked this pull request as ready for review October 25, 2022 17:43

with np.nditer(result, flags=["refs_ok"], op_flags=["readwrite"]) as it:
    for obj in it:
        obj[...] = stringify(obj)
Member Author
I've never come across the ... in the context of NumPy before, but this logic was lifted from their official documentation.
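
For anyone else unfamiliar with it: each value that np.nditer yields is a zero-dimensional view into the underlying array, so a plain assignment would only rebind the loop variable, whereas obj[...] = ... writes the new value back through the view. A standalone illustration with toy data (not Superset code):

import numpy as np

arr = np.array([1, 2, 3])
with np.nditer(arr, op_flags=["readwrite"]) as it:
    for x in it:
        # x is a 0-d view into arr; x[...] assigns into arr itself.
        x[...] = x * 2

print(arr)  # [2 4 6]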

Update result_set.py

Update result_set.py
@john-bodley john-bodley force-pushed the john-bodley--fix-crash-numpy-vectorize branch from 8c1f001 to 83d8739 Compare October 25, 2022 18:20
@codecov

codecov bot commented Oct 25, 2022

Codecov Report

Merging #21936 (8c1f001) into master (1388f21) will increase coverage by 0.15%.
The diff coverage is 100.00%.

❗ Current head 8c1f001 differs from the pull request's most recent head 83d8739. Consider uploading reports for commit 83d8739 to get more accurate results.

@@            Coverage Diff             @@
##           master   #21936      +/-   ##
==========================================
+ Coverage   66.85%   67.00%   +0.15%     
==========================================
  Files        1807     1807              
  Lines       69190    69315     +125     
  Branches     7402     7402              
==========================================
+ Hits        46258    46447     +189     
+ Misses      21021    20957      -64     
  Partials     1911     1911              
Flag        Coverage Δ
hive        53.04% <100.00%> (?)
javascript  53.38% <ø> (ø)
mysql       78.44% <100.00%> (+0.06%) ⬆️
postgres    78.50% <100.00%> (+0.06%) ⬆️
presto      52.95% <100.00%> (+0.12%) ⬆️
python      81.55% <100.00%> (+0.26%) ⬆️
sqlite      76.98% <100.00%> (+0.06%) ⬆️
unit        51.16% <0.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                       Coverage Δ
superset/result_set.py               97.82% <100.00%> (+0.04%) ⬆️
superset/utils/core.py               90.32% <0.00%> (+0.10%) ⬆️
superset/connectors/sqla/models.py   91.01% <0.00%> (+0.17%) ⬆️
superset/db_engine_specs/base.py     89.69% <0.00%> (+0.31%) ⬆️
superset/models/core.py              91.40% <0.00%> (+1.29%) ⬆️
superset/db_engine_specs/hive.py     87.35% <0.00%> (+16.20%) ⬆️
superset/db_engines/hive.py          85.18% <0.00%> (+85.18%) ⬆️


@villebro villebro left a comment
Member

LGTM - I assume this is stringifying a query result that contains lots of massive objects or something? Anyway, IMO, in the long run this whole data pass-through logic should be refactored to stream data in chunks using some proper binary format, rather than processing the whole thing in memory and then dumping it all as one massive JSON blob. So until that happens (never? 😄) I think we need to prioritize stability over performance. If this does cause an unacceptable perf hit, we can probably try to figure out a reasonable threshold that determines whether to use vectorization vs row iteration.
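
Should that ever be needed, a hypothetical sketch of such a threshold (the function name, constant, and cutoff value below are invented for illustration and do not exist in Superset):

import numpy as np

VECTORIZE_MAX_CELLS = 100_000  # invented cutoff; a real value would need benchmarking

def stringify_result(result: np.ndarray, stringify) -> np.ndarray:
    if result.size <= VECTORIZE_MAX_CELLS:
        # Small results: vectorize is faster and its memory overhead
        # remains acceptable.
        return np.vectorize(stringify)(result)
    # Large results: iterate in place to keep the memory footprint bounded.
    with np.nditer(result, flags=["refs_ok"], op_flags=["readwrite"]) as it:
        for obj in it:
            obj[...] = stringify(obj)
    return result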

@john-bodley
Copy link
Member Author

@villebro, per your comment,

I assume this is stringifying a query result that contains lots of massive objects or something?

Yes, this was the case. The type in question was an array of strings, and it barfed when the row limit was increased from 10,000 to 100,000 records.

@john-bodley john-bodley merged commit 059e53a into master Oct 26, 2022
@john-bodley john-bodley deleted the john-bodley--fix-crash-numpy-vectorize branch October 26, 2022 22:06
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Oct 26, 2022
@john-bodley john-bodley mentioned this pull request Nov 3, 2022
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Nov 4, 2022
@mistercrunch mistercrunch added the 🏷️ bot and 🚢 2.1.0 labels and removed the 🚢 2.1.3 label Mar 13, 2024
Labels
🏷️ bot size/XS 🚢 2.1.0