RL Training pipeline on 5-min data #1415

Merged
merged 13 commits into from
Jan 18, 2023

lihuoran
Contributor

Description

Motivation and Context

How Has This Been Tested?

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py from the directory above qlib.
  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
  2. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

github-actions bot added the "waiting for triage" label on Jan 11, 2023
@@ -218,6 +223,7 @@ def fit(self, vessel: TrainingVesselBase, ckpt_path: Path | None = None) -> None
    with _wrap_context(vessel.train_seed_iterator()) as iterator:
        vector_env = self.venv_from_iterator(iterator)
        self.vessel.train(vector_env)
        del vector_env
Collaborator

Why?

Contributor Author

According to my experiments, memory is not released normally after each training round without these explicit del operations, and that eventually causes an OOM. I am not 100% sure about the mechanism here, but the del does work. Any better ideas?
CC @you-n-g

Collaborator

@lihuoran
Then please add a comment in the code explaining this.
(Based on my understanding of how Python manages memory, it is a little counterintuitive.)

Contributor

I did some testing on this. The memory and subprocess leaks are only reproducible when using a GPU.
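
For readers following the del discussion above, here is a minimal sketch of what an explicit del changes in a training loop. It does not reproduce the GPU leak reported in this thread, whose mechanism the participants say is not fully understood. FakeVectorEnv and run_rounds are hypothetical names invented for illustration and are not part of qlib. The sketch assumes CPython, where an object is freed as soon as its last reference disappears.

```python
import gc
import weakref


class FakeVectorEnv:
    """Hypothetical stand-in for a vectorized environment holding heavy resources."""

    def __init__(self, round_id: int) -> None:
        self.round_id = round_id
        self.buffer = bytearray(1024)  # placeholder for worker state


def run_rounds(n_rounds: int, explicit_del: bool) -> list:
    """For each round after the first, record whether the previous round's
    environment was still alive just before the next one was created."""
    still_alive = []
    prev_ref = None
    for i in range(n_rounds):
        if prev_ref is not None:
            still_alive.append(prev_ref() is not None)
        vector_env = FakeVectorEnv(i)
        prev_ref = weakref.ref(vector_env)
        # ... a training round would run here ...
        if explicit_del:
            del vector_env  # drop the only strong reference right away
            gc.collect()    # also break any reference cycles (none in this toy case)
    return still_alive


print(run_rounds(3, explicit_del=False))
print(run_rounds(3, explicit_del=True))
```

Without the del, the name vector_env keeps the previous environment alive until it is rebound, so the old and new environments briefly coexist. With the del, the old environment is gone before the next one is built. Whether this overlap accounts for the observed OOM is an open question in the thread.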

qlib/rl/trainer/trainer.py (outdated, resolved)
qlib/rl/trainer/callbacks.py (outdated, resolved)
@@ -21,10 +21,13 @@ class PAPenaltyReward(Reward[SAOEState]):
    ----------
    penalty
        The penalty for large volume in a short time.
    zoom
Collaborator

Suggest "scale"

Collaborator

@lihuoran Why is zoom/scale necessary if we already have hyperparameters like the learning rate?

Contributor Author

It will be renamed to "scale".
@you-n-g The AAAI 2021 open-source project uses a scaled reward. Personally, I think adding this parameter makes the reward more flexible.
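
To make the "scale" parameter under discussion concrete, here is a minimal sketch of a reward with a penalty term and a final multiplicative scale. This is not the PAPenaltyReward implementation from this PR. The function name, its arguments, and the quadratic form of the penalty are assumptions made for illustration; the only things taken from the thread are a penalty for large volume in a short time and a scale applied to the reward.

```python
from typing import Sequence


def scaled_penalty_reward(
    pa: float,
    volume_fractions: Sequence[float],
    penalty: float = 100.0,
    scale: float = 1.0,
) -> float:
    """Hypothetical reward: price advantage minus a volume penalty, then scaled.

    pa: price advantage earned in this step (assumed to be precomputed).
    volume_fractions: share of the whole order traded in each sub-interval.
    penalty: weight of the penalty for trading large volume in a short time.
    scale: multiplier applied to the final reward.
    """
    # Assumed penalty form: squaring punishes concentrated trading more heavily.
    volume_penalty = penalty * sum(f ** 2 for f in volume_fractions)
    return (pa - volume_penalty) * scale


print(scaled_penalty_reward(2.0, [0.5], penalty=4.0, scale=1.0))
print(scaled_penalty_reward(2.0, [0.5], penalty=4.0, scale=0.5))
```

In this sketch the scale multiplies the whole reward, penalty term included, so it changes the magnitude of the signal the learner sees without changing the trade-off between price advantage and penalty.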

qlib/rl/contrib/train_onpolicy.py (outdated, resolved)
qlib/rl/contrib/train_onpolicy.py (outdated, resolved)
qlib/rl/contrib/train_onpolicy.py (outdated, resolved)
qlib/rl/data/pickle_styled.py (outdated, resolved)
qlib/rl/trainer/vessel.py (outdated, resolved)
Collaborator

you-n-g commented Jan 18, 2023

@lihuoran Could you please check the CI errors?

lihuoran merged commit d8fc9ae into main on Jan 18, 2023
you-n-g deleted the huoran/baostock_rl branch on January 18, 2023 08:22
you-n-g added the "documentation" and "enhancement" labels and removed the "waiting for triage" and "documentation" labels on Jan 29, 2023
qianyun210603 pushed a commit to qianyun210603/qlib that referenced this pull request Mar 23, 2023
* Workflow runnable

* CI

* Slight changes to make the workflow runnable. The changes of handler/provider should be reverted before merging.

* Train experiment successful

* Refine handler & provider

* CI issues

* Resolve PR comments

* Resolve PR comments

* CI issues

* Fix test issue

* Black
Labels
enhancement New feature or request
4 participants