Remove vllm dependency when using ray to run vllm #1637

Merged
2 commits merged into superduper-io:main on Jan 10, 2024

Conversation

jieguangzhou (Collaborator)

Description

Related Issues

Checklist

  • Is this code covered by new or existing unit tests or integration tests?
  • Did you run make unit-testing and make integration-testing successfully?
  • Do new classes, functions, methods and parameters all have docstrings?
  • Were existing docstrings updated, if necessary?
  • Was external documentation updated, if necessary?

Additional Notes or Comments

codecov-commenter commented Jan 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (34830a7) 80.33% vs. head (672407b) 80.04%.
Report is 1371 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1637      +/-   ##
==========================================
- Coverage   80.33%   80.04%   -0.30%     
==========================================
  Files          95      116      +21     
  Lines        6602     8230    +1628     
==========================================
+ Hits         5304     6588    +1284     
- Misses       1298     1642     +344     


if not ray.is_initialized():
    ray.init(address=self.ray_address, runtime_env=runtime_env)

LLM = ray.remote(LLM).remote
if self.vllm_kwargs.get('tensor_parallel_size') == 1:
    # must set num_gpus to 1 to avoid error
Collaborator

Maybe add an assertion at the config level so that the user knows num_gpus should be set to 1; otherwise the user might be under the false impression that a configuration with num_gpus = 4 and tensor_parallel_size = 1 will work.

Collaborator Author

I’m still a bit unclear about this. Are you suggesting that we should inform users about this behavior? Should this information be communicated through documentation or some specific configuration settings?

Collaborator Author

assert ray_config.get('num_gpus') == 1 when self.vllm_kwargs.get('tensor_parallel_size') == 1, right?

Collaborator

yes! something like this
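
For illustration, a minimal sketch of what such a check might look like, written against the ray_config and vllm_kwargs attributes from the snippet above (hypothetical; as discussed below, the PR ultimately went with a warning instead):

```python
# Hypothetical assertion, not the code merged in this PR: fail fast if the
# user requests more than one GPU while tensor_parallel_size == 1.
if self.vllm_kwargs.get('tensor_parallel_size') == 1:
    num_gpus = self.ray_config.get('num_gpus', 1)
    assert num_gpus == 1, (
        f'num_gpus must be 1 when tensor_parallel_size == 1, got {num_gpus}'
    )
```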

Collaborator

Once done, we can merge this PR :) @jieguangzhou

Collaborator Author

Ok, will do it later

Collaborator Author

Done. I changed it to print a warning with a description and to set num_gpus to 1 for the user.
The reason for not using an assertion directly is that num_gpus still gets reset for the user, which increases fault tolerance.
@kartik4949
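
A hedged sketch of what that warning-and-reset behaviour could look like, in the same method context as the diff below (the warning text is illustrative, not taken from the PR):

```python
import warnings

# Illustrative sketch: warn the user instead of asserting, then force
# num_gpus to 1 so the actor still starts.
if self.vllm_kwargs.get('tensor_parallel_size') == 1:
    if self.ray_config.get('num_gpus', 1) != 1:
        warnings.warn(
            'tensor_parallel_size == 1 requires num_gpus == 1; '
            'overriding num_gpus to 1.'
        )
    self.ray_config['num_gpus'] = 1
```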

    self.ray_config["num_gpus"] = 1
    LLM = ray.remote(**self.ray_config)(_VLLMCore).remote
else:
    # Don't know why using the config blocks the process; need to figure this out
Collaborator

Can you explain this: what happens when tensor_parallel_size is greater than 1?

Collaborator Author

When tensor_parallel_size is greater than one, vLLM's built-in Ray integration is used, so the task is run directly on the Ray cluster and the Ray configuration is managed by vLLM.

Collaborator Author

This can be regarded as the number of GPUs the model is sharded across.

Collaborator Author
jieguangzhou commented Jan 5, 2024

The current manual configuration of Ray might conflict with the configuration of vLLM. When I have time, I will continue to examine vLLM's Ray-related code. The usage described on the vLLM official website involves starting a Ray cluster locally or connecting as a worker; it does not use the ray_address parameter, and using this parameter can lead to a deadlock bug.
Therefore, I made some adaptations on the non-vLLM side to make it compatible with both remote multi-GPU and single-GPU setups.
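
To summarise the two code paths discussed in this thread, here is a hedged, self-contained sketch. It assumes ray and vllm are installed, that _VLLMCore is a thin wrapper around vllm.LLM as suggested by the diff, and that the helper name, kwargs, and generate signature below are illustrative rather than the merged implementation:

```python
import ray
from vllm import LLM


class _VLLMCore:
    # Assumed thin wrapper so the vLLM engine can run inside a Ray actor.
    def __init__(self, **vllm_kwargs):
        self.llm = LLM(**vllm_kwargs)

    def generate(self, prompts, sampling_params=None):
        return self.llm.generate(prompts, sampling_params)


def build_remote_llm(vllm_kwargs, ray_config, ray_address=None, runtime_env=None):
    # Join (or start) a Ray cluster once; the thread above notes that passing
    # a remote ray_address can interact badly with vLLM's own Ray usage.
    if not ray.is_initialized():
        ray.init(address=ray_address, runtime_env=runtime_env)

    if vllm_kwargs.get('tensor_parallel_size', 1) == 1:
        # Single-GPU path: pin the actor to exactly one GPU.
        ray_config['num_gpus'] = 1
        return ray.remote(**ray_config)(_VLLMCore).remote(**vllm_kwargs)

    # Multi-GPU path: vLLM's built-in Ray integration schedules the
    # tensor-parallel workers itself, so no extra Ray actor options are
    # attached here (the thread notes that doing so can block the process).
    return ray.remote(_VLLMCore).remote(**vllm_kwargs)
```

A caller might then do something like handle = build_remote_llm({'model': 'facebook/opt-125m', 'tensor_parallel_size': 2}, {}) and retrieve results with ray.get(handle.generate.remote(['Hello'])); the model name here is only an example.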


jieguangzhou force-pushed the feat/vllm-dependency branch 2 times, most recently from 2f546e6 to 672407b on January 10, 2024 07:04
jieguangzhou merged commit e6f2752 into superduper-io:main on Jan 10, 2024
2 checks passed

Successfully merging this pull request may close these issues.

Remove vllm dependency when using ray to run vllm