DDG-DA paper code (#743)

* Merge data selection to main
* Update trainer for reweighter
* Typos fixed.
* update data selection interface
* successfully run exp after refactor some interface
* data selection share handler & trainer
* fix meta model time series bug
* fix online workflow set_uri bug
* fix set_uri bug
* update ds docs and delay trainer bug
* docs
* resume reweighter
* add reweighting result
* fix qlib model import
* make recorder more friendly
* fix experiment workflow bug
* commit for merging master in case of conflicts
* Successful run DDG-DA with a single command
* remove unused code
* add more docs
* Update README.md
* Update & fix some bugs.
* Update configuration & remove debug functions
* Update README.md
* Modify horizon from code rather than yaml
* Update performance in README.md
* fix part comments
* Remove unfinished TCTS.
* Fix some details.
* Update meta docs
* Update README.md of the benchmarks_dynamic
* Update README.md files
* Add README.md to the rolling_benchmark baseline.
* Refine the docs and link
* Rename README.md in benchmarks_dynamic.
* Remove comments.
* auto download data

Co-authored-by: wendili-cs <wendili.academic@qq.com>
Co-authored-by: demon143 <785696300@qq.com>
microsoft · Jan 10, 2022 · cf35562
1 parent 184ce34
Showing 52 changed files with 2,440 additions and 455 deletions.
diff --git a/README.md b/README.md
@@ -11,6 +11,7 @@
 Recent released features
 | Feature | Status |
 | --                      | ------    |
+| Meta-Learning-based framework & DDG-DA  | [Released](https://github.com/microsoft/qlib/pull/743) on Jan 10, 2022 | 
 | Planning-based portfolio optimization | [Released](https://github.com/microsoft/qlib/pull/754) on Dec 28, 2021 | 
 | Release Qlib v0.8.0 | [Released](https://github.com/microsoft/qlib/releases/tag/v0.8.0) on Dec 8, 2021 |
 | ADD model | [Released](https://github.com/microsoft/qlib/pull/704) on Nov 22, 2021 |
@@ -50,9 +51,12 @@ For more details, please refer to our paper ["Qlib: An AI-oriented Quantitative
   - [Data Preparation](#data-preparation)
   - [Auto Quant Research Workflow](#auto-quant-research-workflow)
   - [Building Customized Quant Research Workflow by Code](#building-customized-quant-research-workflow-by-code)
-- [**Quant Model(Paper) Zoo**](#quant-model-paper-zoo)
-  - [Run a single model](#run-a-single-model)
-  - [Run multiple models](#run-multiple-models)
+- [Main Challenges & Solutions in Quant Research](#main-challenges--solutions-in-quant-research)
+  - [Forecasting: Finding Valuable Signals/Patterns](#forecasting-finding-valuable-signalspatterns)
+    - [**Quant Model (Paper) Zoo**](#quant-model-paper-zoo)
+      - [Run a Single Model](#run-a-single-model)
+      - [Run Multiple Models](#run-multiple-models)
+  - [Adapting to Market Dynamics](#adapting-to-market-dynamics)
 - [**Quant Dataset Zoo**](#quant-dataset-zoo)
 - [More About Qlib](#more-about-qlib)
 - [Offline Mode and Online Mode](#offline-mode-and-online-mode)
@@ -69,7 +73,6 @@ Your feedbacks about the features are very important.
 | --                      | ------    |
 | Point-in-Time database | Under review: https://github.com/microsoft/qlib/pull/343 |
 | Orderbook database | Under review: https://github.com/microsoft/qlib/pull/744 |
-| Meta-Learning-based data selection | Under review: https://github.com/microsoft/qlib/pull/743 |
 
 # Framework of Qlib
 
@@ -280,8 +283,18 @@ Qlib provides a tool named `qrun` to run the whole workflow automatically (inclu
 ## Building Customized Quant Research Workflow by Code
 The automatic workflow may not suit the research workflow of all Quant researchers. To support a flexible Quant research workflow, Qlib also provides a modularized interface to allow researchers to build their own workflow by code. [Here](examples/workflow_by_code.ipynb) is a demo for customized Quant research workflow by code.
 
+# Main Challenges & Solutions in Quant Research
+Quant investment is a unique scenario with many key challenges yet to be solved.
+Currently, Qlib provides solutions for several of them.
 
-# [Quant Model (Paper) Zoo](examples/benchmarks)
+## Forecasting: Finding Valuable Signals/Patterns
+Accurate forecasting of stock price trends is an important part of constructing profitable portfolios.
+However, the huge amount of data in various formats in the financial market makes it challenging to build forecasting models.
+
+An increasing number of SOTA Quant research works/papers, which focus on building forecasting models to mine valuable signals/patterns in complex financial data, are released in `Qlib`.
+
+
+### [Quant Model (Paper) Zoo](examples/benchmarks)
 
 Here is a list of models built on `Qlib`.
 - [GBDT based on XGBoost (Tianqi Chen, et al. KDD 2016)](examples/benchmarks/XGBoost/)
@@ -308,7 +321,7 @@ Your PR of new Quant models is highly welcomed.
 
 The performance of each model on the `Alpha158` and `Alpha360` dataset can be found [here](examples/benchmarks/README.md).
 
-## Run a single model
+### Run a single model
 All the models listed above are runnable with ``Qlib``. Users can find the config files we provide and some details about the model through the [benchmarks](examples/benchmarks) folder. More information can be retrieved at the model files listed above.
 
 `Qlib` provides three different ways to run a single model, users can pick the one that fits their cases best:
@@ -318,7 +331,7 @@ All the models listed above are runnable with ``Qlib``. Users can find the confi
 - Users can use the script [`run_all_model.py`](examples/run_all_model.py) listed in the `examples` folder to run a model. Here is an example of the specific shell command to be used: `python run_all_model.py run --models=lightgbm`, where the `--models` arguments can take any number of models listed above(the available models can be found  in [benchmarks](examples/benchmarks/)). For more use cases, please refer to the file's [docstrings](examples/run_all_model.py).
     - **NOTE**: Each baseline has different environment dependencies, please make sure that your python version aligns with the requirements(e.g. TFT only supports Python 3.6~3.7 due to the limitation of `tensorflow==1.15.0`)
 
-## Run multiple models
+### Run multiple models
 `Qlib` also provides a script [`run_all_model.py`](examples/run_all_model.py) which can run multiple models for several iterations. (**Note**: the script only support *Linux* for now. Other OS will be supported in the future. Besides, it doesn't support parallel running the same model for multiple times as well, and this will be fixed in the future development too.)
 
 The script will create a unique virtual environment for each model, and delete the environments after training. Thus, only experiment results such as `IC` and `backtest` results will be generated and stored.
@@ -330,6 +343,14 @@ python run_all_model.py run 10
 
 It also provides the API to run specific models at once. For more use cases, please refer to the file's [docstrings](examples/run_all_model.py). 
 
+## [Adapting to Market Dynamics](examples/benchmarks_dynamic)
+
+Due to the non-stationary nature of the financial market, the data distribution may change across periods, which makes the performance of models built on training data decay on future test data.
+So adapting the forecasting models/strategies to market dynamics is very important to their performance.
+
+Here is a list of solutions built on `Qlib`.
+- [Rolling Retraining](examples/benchmarks_dynamic/baseline/)
+- [DDG-DA on pytorch (Wendi, et al. AAAI 2022)](examples/benchmarks_dynamic/DDG-DA/)
 
 # Quant Dataset Zoo
 Dataset plays a very important role in Quant. Here is a list of the datasets built on `Qlib`:
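
The "Adapting to Market Dynamics" section added in this README diff names rolling retraining as a baseline against distribution shift. The idea can be sketched with a toy example, assuming a noise-free linear relationship whose slope drifts each period. All names and values below are hypothetical and are not part of Qlib's API or of the `benchmarks_dynamic` baseline.

```python
from statistics import mean


def fit_slope(xs, ys):
    """Least-squares slope through the origin: argmin_b sum((y - b * x) ** 2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def make_period(slope, n=20):
    """Toy data for one period: y = slope * x, without noise."""
    xs = [float(i + 1) for i in range(n)]
    ys = [slope * x for x in xs]
    return xs, ys


def mse(slope, xs, ys):
    """Mean squared error of the model y = slope * x on one period."""
    return mean((y - slope * x) ** 2 for x, y in zip(xs, ys))


# The true relationship drifts from period to period (illustrative values).
true_slopes = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
periods = [make_period(s) for s in true_slopes]

# Static model: fit once on the first period, then reuse it unchanged.
static_slope = fit_slope(*periods[0])
static_errors = [mse(static_slope, *p) for p in periods[1:]]

# Rolling retraining: before each test period, refit on the most recent period.
rolling_errors = []
for t in range(1, len(periods)):
    rolling_slope = fit_slope(*periods[t - 1])
    rolling_errors.append(mse(rolling_slope, *periods[t]))

print("static :", [round(e, 2) for e in static_errors])
print("rolling:", [round(e, 2) for e in rolling_errors])
```

In this toy setting the static model's error grows as the slope drifts away from the first period, while the rolling model's error stays bounded by one period of drift.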

diff --git a/docs/component/meta.rst b/docs/component/meta.rst
@@ -0,0 +1,68 @@
+.. _meta:
+
+======================================================
+Meta Controller: Meta-Task & Meta-Dataset & Meta-Model
+======================================================
+.. currentmodule:: qlib
+
+
+Introduction
+=============
+``Meta Controller`` provides guidance to the ``Forecast Model``. It aims to learn regular patterns among a series of forecasting tasks and use the learned patterns to guide forthcoming forecasting tasks. Users can implement their own meta-model instance based on the ``Meta Controller`` module.
+
+Meta Task
+=============
+
+A `Meta Task` instance is the basic element in the meta-learning framework. It saves the data that can be used for the `Meta Model`. Multiple `Meta Task` instances may share the same `Data Handler`, controlled by `Meta Dataset`. Users should use `prepare_task_data()` to obtain the data that can be directly fed into the `Meta Model`.
+
+.. autoclass:: qlib.model.meta.task.MetaTask
+    :members:
+
+Meta Dataset
+=============
+
+`Meta Dataset` controls the meta-information generating process. It is responsible for providing data for training the `Meta Model`. Users should use `prepare_tasks` to retrieve a list of `Meta Task` instances.
+
+.. autoclass:: qlib.model.meta.dataset.MetaTaskDataset
+    :members:
+
+Meta Model
+=============
+
+General Meta Model
+------------------
+A `Meta Model` instance is the part that controls the workflow. The usage of the `Meta Model` includes:
+
+1. Users train their `Meta Model` with the `fit` function.
+2. The `Meta Model` instance guides the workflow by giving useful information via the `inference` function.
+
+.. autoclass:: qlib.model.meta.model.MetaModel
+    :members:
+
+Meta Task Model
+------------------
+This type of meta-model may interact with task definitions directly. Such meta-models should inherit from the `Meta Task Model` class. They guide the base tasks by modifying the base task definitions. The function `prepare_tasks` can be used to obtain the modified base task definitions.
+
+.. autoclass:: qlib.model.meta.model.MetaTaskModel
+    :members:
+
+Meta Guide Model
+------------------
+This type of meta-model participates in the training process of the base forecasting model. The meta-model may guide the base forecasting models during their training to improve their performances.
+
+.. autoclass:: qlib.model.meta.model.MetaGuideModel
+    :members:
+
+
+Example
+=============
+``Qlib`` provides an implementation of the ``Meta Model`` module, ``DDG-DA``,
+which adapts to market dynamics.
+
+``DDG-DA`` includes four steps:
+
+1. Calculate meta-information and encapsulate it into ``Meta Task`` instances. All the meta-tasks form a ``Meta Dataset`` instance.
+2. Train ``DDG-DA`` based on the training data of the meta-dataset.
+3. Run inference with ``DDG-DA`` to get guide information.
+4. Apply guide information to the forecasting models to improve their performances.
+
+The `above example <https://github.com/microsoft/qlib/tree/main/examples/benchmarks_dynamic/DDG-DA>`_ can be found in ``examples/benchmarks_dynamic/DDG-DA/workflow.py``.
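
The four steps listed in the Example section of `meta.rst` can be sketched with toy classes. The method names `prepare_task_data`, `prepare_tasks`, `fit` and `inference` come from the documentation above. The class names, constructor arguments, signatures and the averaging rule are assumptions made for illustration, not the actual `qlib.model.meta` interface or the DDG-DA implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyMetaTask:
    """Hypothetical meta-task: a base task name plus its meta-information."""
    name: str
    segment: str                # "train" or "test" split of the meta-dataset
    period_scores: List[float]  # toy meta-information: one score per historical period

    def prepare_task_data(self) -> List[float]:
        """Return the data that is fed into the meta-model."""
        return self.period_scores


class ToyMetaDataset:
    """Hypothetical meta-dataset: holds all meta-tasks and serves them by segment."""

    def __init__(self, tasks: List[ToyMetaTask]) -> None:
        self._tasks = list(tasks)

    def prepare_tasks(self, segment: str) -> List[ToyMetaTask]:
        return [t for t in self._tasks if t.segment == segment]


class ToyMetaModel:
    """Hypothetical meta-model: learns one weight per historical period."""

    def __init__(self) -> None:
        self.period_weights: List[float] = []

    def fit(self, meta_dataset: ToyMetaDataset) -> None:
        # Average each period's score over the training tasks, then normalize to sum to 1.
        rows = [t.prepare_task_data() for t in meta_dataset.prepare_tasks("train")]
        averages = [sum(col) / len(col) for col in zip(*rows)]
        total = sum(averages)
        self.period_weights = [a / total for a in averages]

    def inference(self, meta_dataset: ToyMetaDataset) -> Dict[str, List[float]]:
        # Guide information: the learned period weights for every test task.
        return {t.name: list(self.period_weights) for t in meta_dataset.prepare_tasks("test")}


# Step 1: encapsulate meta-information into meta-tasks; together they form the meta-dataset.
meta_dataset = ToyMetaDataset([
    ToyMetaTask("task_1", "train", [0.1, 0.2, 0.7]),
    ToyMetaTask("task_2", "train", [0.3, 0.4, 0.3]),
    ToyMetaTask("task_3", "test", [0.0, 0.0, 0.0]),
])

# Step 2: train the meta-model on the training part of the meta-dataset.
meta_model = ToyMetaModel()
meta_model.fit(meta_dataset)

# Step 3: run inference to obtain guide information for the test tasks.
guidance = meta_model.inference(meta_dataset)

# Step 4: apply the guide information; here it weights per-period forecasts (toy values).
period_forecasts = [1.0, 2.0, 3.0]
guided_forecast = sum(w * f for w, f in zip(guidance["task_3"], period_forecasts))
print(guidance, guided_forecast)
```

The sketch keeps the split of responsibilities described above: the dataset hands out tasks, the meta-model is trained with `fit`, and `inference` returns information that the base forecasting step consumes.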
diff --git a/docs/index.rst b/docs/index.rst
@@ -36,10 +36,11 @@ Document Structure
    :caption: COMPONENTS:
 
    Workflow: Workflow Management <component/workflow.rst>
-   Data Layer: Data Framework&Usage <component/data.rst>
+   Data Layer: Data Framework & Usage <component/data.rst>
    Forecast Model: Model Training & Prediction <component/model.rst>
    Portfolio Management and Backtest <component/strategy.rst>
    Nested Decision Execution: High-Frequency Trading <component/highfreq.rst>
+   Meta Controller: Meta-Task & Meta-Dataset & Meta-Model <component/meta.rst>
    Qlib Recorder: Experiment Management <component/recorder.rst>
    Analysis: Evaluation & Results Analysis <component/report.rst>
    Online Serving: Online Management & Strategy & Tool <component/online.rst>

diff --git a/examples/benchmarks/Linear/workflow_config_linear_Alpha158.yaml b/examples/benchmarks/Linear/workflow_config_linear_Alpha158.yaml
@@ -22,7 +22,6 @@ data_handler_config: &data_handler_config
         - class: CSRankNorm
           kwargs:
               fields_group: label
-    label: ["Ref($close, -2) / Ref($close, -1) - 1"]
 port_analysis_config: &port_analysis_config
     strategy:
         class: TopkDropoutStrategy

diff --git a/examples/benchmarks/TFT/tft.py b/examples/benchmarks/TFT/tft.py
@@ -209,7 +209,6 @@ def fit(self, dataset: DatasetH, MODEL_FOLDER="qlib_tft_model", USE_GPU_ID=0, **
         fixed_params = self.data_formatter.get_experiment_params()
         params = self.data_formatter.get_default_model_params()
 
-        # Wendi: merge the tuned and non-tuned parameters
         params = {**params, **fixed_params}
 
         if not os.path.exists(self.model_folder):

diff --git a/examples/benchmarks_dynamic/DDG-DA/README.md b/examples/benchmarks_dynamic/DDG-DA/README.md
@@ -0,0 +1,27 @@
+# Introduction
+This is the implementation of `DDG-DA` based on the `Meta Controller` component provided by `Qlib`.
+
+## Background
+In many real-world scenarios, we often deal with streaming data that is sequentially collected over time. Due to the non-stationary nature of the environment, the streaming data distribution may change in unpredictable ways, which is known as concept drift. To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data. However, there are still many cases in which some underlying factors of environment evolution are predictable, making it possible to model the future concept drift trend of the streaming data. Such cases are not fully explored in previous work.
+
+Therefore, we propose a novel method, `DDG-DA`, that can effectively forecast the evolution of data distribution and improve the performance of models. Specifically, we first train a predictor to estimate the future data distribution, then leverage it to generate training samples, and finally train models on the generated data.
+
+## Dataset
+The data in the paper are private, so we conduct experiments on Qlib's public dataset.
+Though the dataset is different, the conclusion remains the same. By applying `DDG-DA`, users can see rising trends in the test phase in both the proxy models' ICs and the performance of the forecasting models.
+
+## Run the Code
+Users can try `DDG-DA` by running the following command:
+```bash
+    python workflow.py run_all
+```
+
+The default forecasting model is `Linear`. Users can choose another forecasting model by changing the `forecast_model` parameter when `DDG-DA` is initialized. For example, users can try the `LightGBM` forecasting model by running the following command:
+```bash
+    python workflow.py --forecast_model="gbdt" run_all
+```
+
+
+## Results
+
+The results of other methods on Qlib's public dataset can be found [here](../).
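
The Background section of this README describes three stages: predict the future data distribution, use the prediction to generate training samples, then train on them. A minimal sketch of that idea follows. It is not the DDG-DA algorithm from the paper or this commit: the scalar "market state", the naive trend extrapolation and the Gaussian period weighting are all assumptions chosen to keep the example small.

```python
import math
from typing import List, Sequence, Tuple


def extrapolate(values: Sequence[float]) -> float:
    """Naive trend predictor: continue the last observed step."""
    return values[-1] + (values[-1] - values[-2])


def period_weights(states: Sequence[float], predicted_state: float,
                   bandwidth: float = 1.0) -> List[float]:
    """Give more weight to periods whose state is close to the predicted one."""
    raw = [math.exp(-((s - predicted_state) ** 2) / (2 * bandwidth ** 2)) for s in states]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_slope(samples: Sequence[Sequence[Tuple[float, float]]],
                   weights: Sequence[float]) -> float:
    """Weighted least squares through the origin over all periods."""
    num = den = 0.0
    for period, w in zip(samples, weights):
        for x, y in period:
            num += w * x * y
            den += w * x * x
    return num / den


# Toy history: a drifting scalar state and a slope that drifts along with it.
states = [0.0, 1.0, 2.0, 3.0]
slopes = [1.0, 1.5, 2.0, 2.5]
xs = [1.0, 2.0, 3.0]
samples = [[(x, s * x) for x in xs] for s in slopes]

# Stage 1: predict the state of the next period.
predicted_state = extrapolate(states)

# Stage 2: reweight historical periods toward the predicted state.
weights = period_weights(states, predicted_state)
uniform = [1.0 / len(states)] * len(states)

# Stage 3: train on the reweighted data and compare with uniform weighting.
guided_slope = weighted_slope(samples, weights)
uniform_slope = weighted_slope(samples, uniform)
future_slope = 3.0  # assumed slope of the next period if the drift continues

print(guided_slope, uniform_slope)
```

With these toy values the reweighted fit lands closer to the assumed future slope than the uniformly weighted fit, which is the effect the Background section attributes to training on data that resembles the predicted distribution.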
diff --git a/examples/benchmarks_dynamic/DDG-DA/requirements.txt b/examples/benchmarks_dynamic/DDG-DA/requirements.txt
@@ -0,0 +1 @@
+torch==1.10.0