Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: key components health check #2510

Merged
merged 2 commits into from
Jul 3, 2024
Merged

feat: key components health check #2510

merged 2 commits into from
Jul 3, 2024

Conversation

zzzming
Copy link
Collaborator

@zzzming zzzming commented Jul 3, 2024

Since /health is also supported by uvicorn, so until the langflow's router is initialized. /health returns the status of uvicorn. That does not reflect the readiness of the langflow.

This PR introduces a new endpoint /health_check that also test key services like chat, session, and variable services that make sure cache , DB, or kubernetes backend are working. It offers more reliable service type of check rather a simple route based health check. Only a few selected services to save the health check time.

Once this PR is merged and released, Kubernetes's probe should use the /health_check endpoint.

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Jul 3, 2024
@zzzming zzzming enabled auto-merge (squash) July 3, 2024 18:13
Copy link
Contributor

@ogabrielluiz ogabrielluiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good, Ming. Thank you!

src/backend/base/langflow/main.py Show resolved Hide resolved
logger.exception(e)

if response.has_error():
raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=response.model_dump())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we dump this, should the parameter include the error?

e.g.

   try:
        create_default_folder_if_it_doesnt_exist(session, uuid.UUID(user_id))
        response.folder = "ok"
    except Exception as e:
        logger.exception(e)
        response.folder = e.message()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. I will do another PR to improve that.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jul 3, 2024
@zzzming zzzming merged commit a2ec87b into main Jul 3, 2024
25 checks passed
@zzzming zzzming deleted the fix/buildFlow branch July 3, 2024 19:42
Copy link
Contributor

@nicoloboschi nicoloboschi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This portion of the code needs to be changed. This makes the health check not idempotent and not able to run in parallel, which is problematic in many scenarios. For example, in k8s liveness and readiness probe could run in parallel.

It's ok to write/read to the cache but there's no need to verify the correctness, we should just check the availability.

        await chat.set_cache("health_check", str(test_id))
        if v := await chat.get_cache("health_check"):
            if v.get("result") == str(test_id):
                response.chat = "ok"
            else:
                logger.error("chat service get incorrect value")
  1. I don't see the value of testing the variable service. If variables are on the db, you're testing the DB (again!) and if they are on k8s it's very expensive and not worth it (if you're on k8s, k8s is up..)

Also you're leaking the variable in case the deletion fails.

I would completely remove the variable section.

  1. Again for the folder section, there's no need to test the DB connectivity again.

ogabrielluiz pushed a commit to yaitec/langflow that referenced this pull request Jul 9, 2024
nicoloboschi pushed a commit to datastax/ragstack-ai-langflow that referenced this pull request Jul 10, 2024
key components health check

(cherry picked from commit a2ec87b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants