Remove vllm dependency when using ray to run vllm #1637

Merged
2 commits merged into superduper-io:main on Jan 10, 2024

Conversation

jieguangzhou (Collaborator)

Description

Related Issues

Checklist

  • Is this code covered by new or existing unit tests or integration tests?
  • Did you run make unit-testing and make integration-testing successfully?
  • Do new classes, functions, methods and parameters all have docstrings?
  • Were existing docstrings updated, if necessary?
  • Was external documentation updated, if necessary?

Additional Notes or Comments

codecov-commenter commented Jan 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (34830a7) 80.33% vs. head (672407b) 80.04%.
Report is 1371 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1637      +/-   ##
==========================================
- Coverage   80.33%   80.04%   -0.30%     
==========================================
  Files          95      116      +21     
  Lines        6602     8230    +1628     
==========================================
+ Hits         5304     6588    +1284     
- Misses       1298     1642     +344     


if not ray.is_initialized():
    ray.init(address=self.ray_address, runtime_env=runtime_env)

LLM = ray.remote(LLM).remote
if self.vllm_kwargs.get('tensor_parallel_size') == 1:
    # must set num_gpus to 1 to avoid error
Collaborator

Maybe add an assertion at the config level so that the user knows num_gpus should be set to 1; otherwise the user might be under the false impression that a configuration with num_gpus = 4 and tensor_parallel_size = 1 will work.

Collaborator Author

I’m still a bit unclear about this. Are you suggesting that we should inform users about this behavior? Should this information be communicated through documentation or some specific configuration settings?

Collaborator Author

assert ray_config.get('num_gpus') == 1 when self.vllm_kwargs.get('tensor_parallel_size') == 1, right?

Collaborator

yes! something like this
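
For illustration, a minimal sketch of what such a check might look like, written against the ray_config and vllm_kwargs attributes from the snippet above (hypothetical; as discussed below, the PR ultimately went with a warning instead):

```python
# Hypothetical assertion, not the code merged in this PR: fail fast if the
# user requests more than one GPU while tensor_parallel_size == 1.
if self.vllm_kwargs.get('tensor_parallel_size') == 1:
    num_gpus = self.ray_config.get('num_gpus', 1)
    assert num_gpus == 1, (
        f'num_gpus must be 1 when tensor_parallel_size == 1, got {num_gpus}'
    )
```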

Collaborator

Once done, we can merge this PR :) @jieguangzhou

Collaborator Author

Ok, will do it later

Collaborator Author

Done. I changed it to print a warning with a description and to set num_gpus to 1 for the user.
The reason for not using an assertion directly is that num_gpus still gets reset for the user, which increases fault tolerance.
@kartik4949
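
A hedged sketch of what that warning-and-reset behaviour could look like, in the same method context as the diff below (the warning text is illustrative, not taken from the PR):

```python
import warnings

# Illustrative sketch: warn the user instead of asserting, then force
# num_gpus to 1 so the actor still starts.
if self.vllm_kwargs.get('tensor_parallel_size') == 1:
    if self.ray_config.get('num_gpus', 1) != 1:
        warnings.warn(
            'tensor_parallel_size == 1 requires num_gpus == 1; '
            'overriding num_gpus to 1.'
        )
    self.ray_config['num_gpus'] = 1
```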

    self.ray_config["num_gpus"] = 1
    LLM = ray.remote(**self.ray_config)(_VLLMCore).remote
else:
    # Don't know why using the config blocks the process; need to figure this out
Collaborator

Can you explain this: what happens when tensor_parallel_size is greater than 1?

Collaborator Author

When tensor_parallel_size is greater than one, vLLM's built-in Ray integration is used, so the task is run directly on the Ray cluster and the Ray configuration is managed by vLLM.

Collaborator Author

This can be regarded as the number of GPUs the model is sharded across.

Collaborator Author
jieguangzhou commented Jan 5, 2024

The current manual configuration of Ray might conflict with the configuration of vLLM. When I have time, I will continue to examine vLLM's Ray-related code. The usage described on the vLLM official website involves starting a Ray cluster locally or connecting as a worker; it does not use the ray_address parameter, and using this parameter can lead to a deadlock bug.
Therefore, I made some adaptations on the non-vLLM side to make it compatible with both remote multi-GPU and single-GPU setups.
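
To summarise the two code paths discussed in this thread, here is a hedged, self-contained sketch. It assumes ray and vllm are installed, that _VLLMCore is a thin wrapper around vllm.LLM as suggested by the diff, and that the helper name, kwargs, and generate signature below are illustrative rather than the merged implementation:

```python
import ray
from vllm import LLM


class _VLLMCore:
    # Assumed thin wrapper so the vLLM engine can run inside a Ray actor.
    def __init__(self, **vllm_kwargs):
        self.llm = LLM(**vllm_kwargs)

    def generate(self, prompts, sampling_params=None):
        return self.llm.generate(prompts, sampling_params)


def build_remote_llm(vllm_kwargs, ray_config, ray_address=None, runtime_env=None):
    # Join (or start) a Ray cluster once; the thread above notes that passing
    # a remote ray_address can interact badly with vLLM's own Ray usage.
    if not ray.is_initialized():
        ray.init(address=ray_address, runtime_env=runtime_env)

    if vllm_kwargs.get('tensor_parallel_size', 1) == 1:
        # Single-GPU path: pin the actor to exactly one GPU.
        ray_config['num_gpus'] = 1
        return ray.remote(**ray_config)(_VLLMCore).remote(**vllm_kwargs)

    # Multi-GPU path: vLLM's built-in Ray integration schedules the
    # tensor-parallel workers itself, so no extra Ray actor options are
    # attached here (the thread notes that doing so can block the process).
    return ray.remote(_VLLMCore).remote(**vllm_kwargs)
```

A caller might then do something like handle = build_remote_llm({'model': 'facebook/opt-125m', 'tensor_parallel_size': 2}, {}) and retrieve results with ray.get(handle.generate.remote(['Hello'])); the model name here is only an example.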


jieguangzhou force-pushed the feat/vllm-dependency branch 2 times, most recently from 2f546e6 to 672407b on January 10, 2024 07:04
jieguangzhou merged commit e6f2752 into superduper-io:main on Jan 10, 2024
2 checks passed

Successfully merging this pull request may close these issues.

Remove vllm dependency when using ray to run vllm