
fix: Crash caused by numpy.vectorize #21936

Merged

john-bodley merged 1 commit into master from john-bodley--fix-crash-numpy-vectorize on Oct 26, 2022

Conversation

Member
@john-bodley john-bodley commented Oct 25, 2022

SUMMARY

We (Airbnb) had a user report an error where an async query in SQL Lab would run ad infinitum when the row limit was increased. The issue was that the Celery worker crashed with the following error:

WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).

It turns out the root cause was a call to the numpy.vectorize function, which, per the linked report, can consume copious amounts of memory. The fix was simply to un-vectorize the logic using an iterator, per the NumPy documentation, acknowledging that there may be a performance hit.

Note that this is the only place in the codebase that uses the numpy.vectorize function:

$ git grep "vectorize"
superset/result_set.py:    vstringify = np.vectorize(stringify)
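
For context, here is a minimal, self-contained sketch of the before and after behavior. The stringify helper and the toy result array below are hypothetical stand-ins (the real helper lives in superset/result_set.py), so the details differ from the actual diff.

import numpy as np

def stringify(obj) -> str:
    # Hypothetical stand-in for the stringify helper in result_set.py.
    return str(obj)

result = np.array([["hello world", 123], [None, 4.5]], dtype=object)

# Before: np.vectorize materializes its machinery and output all at
# once, which can consume copious amounts of memory on large results.
vstringify = np.vectorize(stringify)
stringified = vstringify(result)

# After: iterate over the array in place with np.nditer, trading some
# speed for a bounded memory footprint.
with np.nditer(result, flags=["refs_ok"], op_flags=["readwrite"]) as it:
    for obj in it:
        obj[...] = stringify(obj)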

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley marked this pull request as ready for review October 25, 2022 17:43

with np.nditer(result, flags=["refs_ok"], op_flags=["readwrite"]) as it:
    for obj in it:
        obj[...] = stringify(obj)
Member Author
I've never come across the ... in the context of NumPy before, but this logic was lifted from their official documentation.
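
For anyone else unfamiliar with it: each value that np.nditer yields is a zero-dimensional view into the underlying array, so a plain assignment would only rebind the loop variable, whereas obj[...] = ... writes the new value back through the view. A standalone illustration with toy data (not Superset code):

import numpy as np

arr = np.array([1, 2, 3])
with np.nditer(arr, op_flags=["readwrite"]) as it:
    for x in it:
        # x is a 0-d view into arr; x[...] assigns into arr itself.
        x[...] = x * 2

print(arr)  # [2 4 6]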

Update result_set.py

Update result_set.py
@john-bodley john-bodley force-pushed the john-bodley--fix-crash-numpy-vectorize branch from 8c1f001 to 83d8739 Compare October 25, 2022 18:20
@codecov

codecov bot commented Oct 25, 2022

Codecov Report

Merging #21936 (8c1f001) into master (1388f21) will increase coverage by 0.15%.
The diff coverage is 100.00%.

❗ Current head 8c1f001 differs from the pull request's most recent head 83d8739. Consider uploading reports for commit 83d8739 to get more accurate results.

@@            Coverage Diff             @@
##           master   #21936      +/-   ##
==========================================
+ Coverage   66.85%   67.00%   +0.15%     
==========================================
  Files        1807     1807              
  Lines       69190    69315     +125     
  Branches     7402     7402              
==========================================
+ Hits        46258    46447     +189     
+ Misses      21021    20957      -64     
  Partials     1911     1911              
Flag        Coverage Δ
hive        53.04% <100.00%> (?)
javascript  53.38% <ø> (ø)
mysql       78.44% <100.00%> (+0.06%) ⬆️
postgres    78.50% <100.00%> (+0.06%) ⬆️
presto      52.95% <100.00%> (+0.12%) ⬆️
python      81.55% <100.00%> (+0.26%) ⬆️
sqlite      76.98% <100.00%> (+0.06%) ⬆️
unit        51.16% <0.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                       Coverage Δ
superset/result_set.py               97.82% <100.00%> (+0.04%) ⬆️
superset/utils/core.py               90.32% <0.00%> (+0.10%) ⬆️
superset/connectors/sqla/models.py   91.01% <0.00%> (+0.17%) ⬆️
superset/db_engine_specs/base.py     89.69% <0.00%> (+0.31%) ⬆️
superset/models/core.py              91.40% <0.00%> (+1.29%) ⬆️
superset/db_engine_specs/hive.py     87.35% <0.00%> (+16.20%) ⬆️
superset/db_engines/hive.py          85.18% <0.00%> (+85.18%) ⬆️


@villebro villebro left a comment
Member

LGTM - I assume this is stringifying a query result that contains lots of massive objects or something? Anyway, IMO, in the long run this whole data pass-through logic should be refactored to stream data in chunks using some proper binary format, rather than processing the whole thing in memory and then dumping it all as one massive JSON blob. So until that happens (never? 😄) I think we need to prioritize stability over performance. If this does cause an unacceptable perf hit, we can probably try to figure out a reasonable threshold that determines whether to use vectorization vs row iteration.
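
Should that ever be needed, a hypothetical sketch of such a threshold (the function name, constant, and cutoff value below are invented for illustration and do not exist in Superset):

import numpy as np

VECTORIZE_MAX_CELLS = 100_000  # invented cutoff; a real value would need benchmarking

def stringify_result(result: np.ndarray, stringify) -> np.ndarray:
    if result.size <= VECTORIZE_MAX_CELLS:
        # Small results: vectorize is faster and its memory overhead
        # remains acceptable.
        return np.vectorize(stringify)(result)
    # Large results: iterate in place to keep the memory footprint bounded.
    with np.nditer(result, flags=["refs_ok"], op_flags=["readwrite"]) as it:
        for obj in it:
            obj[...] = stringify(obj)
    return result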

@john-bodley
Copy link
Member Author

@villebro, per your comment,

I assume this is stringifying a query result that contains lots of massive objects or something?

Yes, this was the case. The type in question was an array of strings, and it barfed when the row limit was increased from 10,000 to 100,000 records.

@john-bodley john-bodley merged commit 059e53a into master Oct 26, 2022
@john-bodley john-bodley deleted the john-bodley--fix-crash-numpy-vectorize branch October 26, 2022 22:06
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Oct 26, 2022
@john-bodley john-bodley mentioned this pull request Nov 3, 2022
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Nov 4, 2022
@mistercrunch mistercrunch added the 🏷️ bot and 🚢 2.1.0 labels and removed the 🚢 2.1.3 label Mar 13, 2024
Labels
🏷️ bot size/XS 🚢 2.1.0