Fix signals.py mistake #438

locke4 · 2023-11-25T23:29:22Z

This fix helps prevent #428 from resulting in the entire container being killed by the host. The timeout still occurs in the web browser, but importantly the delete task now completes, whereas before the gunicorn worker timeout interrupted the task. Reloading the tubesync homepage after the timeout will now show the source as deleted. I've done this by giving the gunicorn workers a much longer timeout before they are killed (10 minutes instead of 30 seconds), and by forcing the workers to be restarted. I think the key change is increasing the timeout, but I think both are helpful in long-running applications (months of uptime).
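For readers unfamiliar with gunicorn's config file, here is a minimal sketch of what settings along these lines could look like. It is not the diff from this PR: `timeout = 600` matches the 10 minutes described above, but using `max_requests` and `max_requests_jitter` to force worker restarts is an assumption about how that part might be done, and both numeric values are purely illustrative.

```python
# Hypothetical gunicorn config fragment (a Python file passed via `gunicorn -c`).
# Illustrative only; not the actual change made in this PR.

# Seconds a worker may stay silent before the master kills and replaces it.
# gunicorn's default is 30; 600 is the 10 minutes described above.
timeout = 600

# Assumption: recycle each worker after it has served this many requests,
# so leaked memory is released over months of uptime. Value is illustrative.
max_requests = 1000

# Assumption: add a random offset (0..50 requests) to the recycle point so
# all workers do not restart at the same moment. Value is illustrative.
max_requests_jitter = 50
```

A longer `timeout` only delays the kill; it does not reduce the memory the delete task uses.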

As can be seen from the logs below, it's the gunicorn worker (pid 358) that ultimately causes the application to crash completely and the delete task to be interrupted. The first log is from meeb/tubesync:latest; the second log has this fix included (I've shortened the logs to show the key entries only). I understand the root cause (excessive memory usage) is still unaddressed, but this should hopefully reduce the severity of the issue.

This also fixes a mistake I made in signals.py which prevented filter_text from working properly.

2023-11-25 23:14:22 [2023-11-25 23:14:22 +0000] [316] [INFO] Starting gunicorn 21.2.0
2023-11-25 23:14:22 [2023-11-25 23:14:22 +0000] [316] [INFO] Listening at: http://127.0.0.1:8080 (316)
2023-11-25 23:14:22 [2023-11-25 23:14:22 +0000] [316] [INFO] Using worker: sync
2023-11-25 23:14:22 [2023-11-25 23:14:22 +0000] [358] [INFO] Booting worker with pid: 358
2023-11-25 23:14:22 [2023-11-25 23:14:22 +0000] [359] [INFO] Booting worker with pid: 359
2023-11-25 23:14:22 [2023-11-25 23:14:22 +0000] [366] [INFO] Booting worker with pid: 366
[...]
2023-11-25 23:14:39 172.17.0.1 - - [25/Nov/2023:23:14:39 +0000] "GET /source-delete/592c2f4b-3873-4c3b-9373-7c16037b8a9e HTTP/1.1" 200 2341 "http://localhost:4848/source/592c2f4b-3873-4c3b-9373-7c16037b8a9e" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"
2023-11-25 23:14:54 2023-11-25 23:14:54,579 [tubesync/INFO] Deleting tasks for media: Arma 3 | Alpha Squad Command [Live Altis Operations I&A]
[...]
2023-11-25 23:15:11 [2023-11-25 23:15:11 +0000] [316] [CRITICAL] WORKER TIMEOUT (pid:358)
2023-11-25 23:15:12 172.17.0.1 - - [25/Nov/2023:23:15:12 +0000] "POST /source-delete/592c2f4b-3873-4c3b-9373-7c16037b8a9e HTTP/1.1" 502 552 "http://localhost:4848/source-delete/592c2f4b-3873-4c3b-9373-7c16037b8a9e" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"
2023-11-25 23:15:12 2023/11/25 23:15:12 [error] 342#342: *1 upstream prematurely closed connection while reading response header from upstream, client: 172.17.0.1, server: _, request: "POST /source-delete/592c2f4b-3873-4c3b-9373-7c16037b8a9e HTTP/1.1", upstream: "http://127.0.0.1:8080/source-delete/592c2f4b-3873-4c3b-9373-7c16037b8a9e", host: "localhost:4848", referrer: "http://localhost:4848/source-delete/592c2f4b-3873-4c3b-9373-7c16037b8a9e"
2023-11-25 23:15:12 [2023-11-25 23:15:12 +0000] [316] [ERROR] Worker (pid:358) was sent SIGKILL! Perhaps out of memory?
2023-11-25 23:15:12 [2023-11-25 23:15:12 +0000] [369] [INFO] Booting worker with pid: 369


2023-11-25 23:23:23 [2023-11-25 23:23:23 +0000] [315] [INFO] Starting gunicorn 21.2.0
2023-11-25 23:23:23 [2023-11-25 23:23:23 +0000] [315] [INFO] Listening at: http://127.0.0.1:8080 (315)
2023-11-25 23:23:23 [2023-11-25 23:23:23 +0000] [315] [INFO] Using worker: sync
2023-11-25 23:23:23 [2023-11-25 23:23:23 +0000] [357] [INFO] Booting worker with pid: 357
2023-11-25 23:23:23 [2023-11-25 23:23:23 +0000] [358] [INFO] Booting worker with pid: 358
2023-11-25 23:23:23 [2023-11-25 23:23:23 +0000] [361] [INFO] Booting worker with pid: 361
[...]
2023-11-25 23:24:58 172.17.0.1 - - [25/Nov/2023:23:24:58 +0000] "GET /source-delete/592c2f4b-3873-4c3b-9373-7c16037b8a9e HTTP/1.1" 200 2340 "http://localhost:4848/source/592c2f4b-3873-4c3b-9373-7c16037b8a9e" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"
[...]
2023-11-25 23:25:12 2023-11-25 23:25:12,871 [tubesync/INFO] Deleting tasks for media: Arma 3 | Alpha Squad Command [Live Altis Operations I&A]
[...]
2023-11-25 23:26:18 Failed to retrieve tasks. Database unreachable.
[...]
2023-11-25 23:26:29 2023-11-25 23:26:29,234 [tubesync/INFO] Deleting tasks for media: Jellyfish - aka - what happens when you didnt start Youtube with the intention of making a channel
2023-11-25 23:26:29 2023-11-25 23:26:29,329 [tubesync/INFO] Deleting tasks for source: Luetin09
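
To pick out the worker-kill entries from logs like the two above, a minimal sketch follows. The function name `find_worker_kills` and the sample list are hypothetical; the two patterns are taken from the gunicorn master messages shown in the first log.

```python
import re

# Patterns copied from the gunicorn master log lines shown in the first log.
TIMEOUT_RE = re.compile(r"\[CRITICAL\] WORKER TIMEOUT \(pid:(\d+)\)")
SIGKILL_RE = re.compile(r"\[ERROR\] Worker \(pid:(\d+)\) was sent SIGKILL!")


def find_worker_kills(lines):
    """Return (timed_out_pids, sigkilled_pids) found in the given log lines."""
    timed_out, killed = [], []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timed_out.append(int(match.group(1)))
        match = SIGKILL_RE.search(line)
        if match:
            killed.append(int(match.group(1)))
    return timed_out, killed


# Three lines taken from the logs above: two from the first run, one from the second.
sample = [
    "2023-11-25 23:15:11 [2023-11-25 23:15:11 +0000] [316] [CRITICAL] WORKER TIMEOUT (pid:358)",
    "2023-11-25 23:15:12 [2023-11-25 23:15:12 +0000] [316] [ERROR] Worker (pid:358) was sent SIGKILL! Perhaps out of memory?",
    "2023-11-25 23:26:29 2023-11-25 23:26:29,329 [tubesync/INFO] Deleting tasks for source: Luetin09",
]

print(find_worker_kills(sample))
```

Run against the full second log, this should return two empty lists, since no worker was timed out or killed in that run.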

meeb · 2023-11-26T05:55:22Z

Thanks for the PR! I'll comment inline.

tubesync/tubesync/gunicorn.py

tubesync/sync/signals.py

locke4 · 2023-11-27T21:35:36Z

I've reverted the change so this PR can go through with just the fix for signals.py. I'll keep tinkering with my branch, but as a short-term fix I've put another 16GB of RAM in my server 😀

meeb · 2023-11-28T04:11:44Z

Any chance you could squash / rebase this down to a single commit?

meeb · 2023-11-28T17:05:10Z

Thanks!


locke4 marked this pull request as ready for review November 25, 2023 23:30

meeb reviewed Nov 26, 2023


tubesync/tubesync/gunicorn.py (outdated)

meeb reviewed Nov 26, 2023


tubesync/sync/signals.py

locke4 requested a review from meeb November 27, 2023 21:35

locke4 marked this pull request as draft November 28, 2023 08:41

locke4 closed this Nov 28, 2023

locke4 force-pushed the main branch from 28647c6 to 33b4711 Compare November 28, 2023 08:46

Update signals.py

2d6f485

locke4 reopened this Nov 28, 2023

locke4 marked this pull request as ready for review November 28, 2023 08:48

locke4 changed the title ~~Fix gunicorn worker timeout crashes and signals.py mistake~~ Fix signals.py mistake Nov 28, 2023

meeb merged commit 45c1256 into meeb:main Nov 28, 2023

meeb added a commit that referenced this pull request Nov 30, 2023

rework skip logic check, prevent race condition between metadata down…

e54a762

…loading and upload date being checked, resolves #440, #183, related to #438
