refine docs #71

Merged
merged 3 commits into from
Dec 1, 2020
36 changes: 18 additions & 18 deletions docs/component/backtest.rst
@@ -13,7 +13,7 @@ Introduction

.. note::

``Intraday Trading`` uses ``Order Executor`` to trade and execute orders output by ``Interday Strategy``. ``Order Executor`` is a component in `Qlib Framework <../introduction/introduction.html#framework>`_, which can execute orders. ``Vwap Executor`` and ``Close Executor`` is supported by ``Qlib`` now. In the future, ``Qlib`` will support ``HighFreq Executor`` also.
``Intraday Trading`` uses ``Order Executor`` to execute the orders output by ``Portfolio Strategy``. ``Order Executor`` is a component in `Qlib Framework <../introduction/introduction.html#framework>`_ that executes orders. ``VWAP Executor`` and ``Close Executor`` are currently supported by ``Qlib``, and ``Qlib`` will also support ``HighFreq Executor`` in the future.
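For intuition about the two deal prices these executors are named after, here is a minimal sketch. It assumes intraday `(price, volume)` bars for one stock, and it assumes that the ``VWAP Executor`` fills at the day's volume-weighted average price while the ``Close Executor`` fills at the closing price. The helpers `vwap` and `close_price` are hypothetical and are not part of the ``Qlib`` API.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration only; they are not Qlib APIs.
Bar = Tuple[float, float]  # (price, volume) for one intraday interval


def vwap(bars: List[Bar]) -> float:
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    total_volume = sum(volume for _, volume in bars)
    if total_volume == 0:
        raise ValueError("VWAP is undefined when total volume is zero")
    return sum(price * volume for price, volume in bars) / total_volume


def close_price(bars: List[Bar]) -> float:
    """Price of the last bar of the day, i.e. the closing price."""
    return bars[-1][0]


day = [(10.0, 100.0), (10.5, 300.0), (11.0, 100.0)]
print(vwap(day))         # 10.5  ((1000 + 3150 + 1100) / 500)
print(close_price(day))  # 11.0
```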



@@ -32,34 +32,34 @@ The simple example of the default strategy is as follows.
# pred_score is the prediction score
report, positions = backtest(pred_score, topk=50, n_drop=0.5, verbose=False, limit_threshold=0.0095)

To know more about backtesting with a specific ``Strategy``, please refer to `Strategy <strategy.html>`_.
For more details on backtesting with a specific ``Strategy``, please refer to `Portfolio Strategy <strategy.html>`_.
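The `topk` and `n_drop` arguments of the ``backtest`` call above suggest a top-k strategy with periodic dropout. The sketch below is a simplified, hypothetical rebalance step written under that assumption: hold the `topk` best-scored stocks, and on each rebalance sell the `n_drop` worst-scored holdings (treated here as a whole number of stocks) and refill from the best-scored stocks not currently held. `topk_dropout` is not a ``Qlib`` function, and ``Qlib``'s actual strategy logic may differ; the Portfolio Strategy document describes the real behavior.

```python
from typing import Dict, List, Set


def topk_dropout(scores: Dict[str, float], held: Set[str], topk: int, n_drop: int) -> Set[str]:
    """Hypothetical, simplified top-k dropout rebalance (not Qlib's implementation)."""
    # Rank current holdings from worst to best by today's prediction score;
    # holdings without a score are treated as the worst.
    held_ranked = sorted(held, key=lambda s: scores.get(s, float("-inf")))
    to_sell = set(held_ranked[:n_drop])
    keep = held - to_sell
    # Buy the best-scored stocks that were not held (so nothing sold is bought back).
    candidates: List[str] = sorted(
        (s for s in scores if s not in held), key=lambda s: scores[s], reverse=True
    )
    to_buy = candidates[: max(topk - len(keep), 0)]
    return keep | set(to_buy)


scores = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.8, "E": -0.2}
print(sorted(topk_dropout(scores, held={"A", "B", "C"}, topk=3, n_drop=1)))  # ['A', 'C', 'D']
```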

To know more about the prediction score `pred_score` output by ``Interday Model``, please refer to `Interday Model: Model Training & Prediction <model.html>`_.
For more details on the prediction score `pred_score` output by ``Forecast Model``, please refer to `Forecast Model: Model Training & Prediction <model.html>`_.

Prediction Score
-----------------

The `prediction score` is a pandas DataFrame. Its index is <instrument(str), datetime(pd.Timestamp)> and it must
The `prediction score` is a pandas DataFrame. Its index is a two-level MultiIndex <datetime(pd.Timestamp), instrument(str)> and it must
contain a `score` column.

A prediction sample is shown as follows.

.. code-block:: python

instrument datetime score
SH600000 2019-01-04 -0.505488
SZ002531 2019-01-04 -0.320391
SZ000999 2019-01-04 0.583808
SZ300569 2019-01-04 0.819628
SZ001696 2019-01-04 -0.137140
... ...
SZ000996 2019-04-30 -1.027618
SH603127 2019-04-30 0.225677
SH603126 2019-04-30 0.462443
SH603133 2019-04-30 -0.302460
SZ300760 2019-04-30 -0.126383

``Interday Model`` module can make predictions, please refer to `Interday Model: Model Training & Prediction <model.html>`_.
datetime instrument score
2019-01-04 SH600000 -0.505488
2019-01-04 SZ002531 -0.320391
2019-01-04 SZ000999 0.583808
2019-01-04 SZ300569 0.819628
2019-01-04 SZ001696 -0.137140
... ...
2019-04-30 SZ000996 -1.027618
2019-04-30 SH603127 0.225677
2019-04-30 SH603126 0.462443
2019-04-30 SH603133 -0.302460
2019-04-30 SZ300760 -0.126383

The ``Forecast Model`` module can make predictions; please refer to `Forecast Model: Model Training & Prediction <model.html>`_.
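A minimal way to build a prediction score in this layout with pandas is sketched below, using a few rows copied from the sample above. Because the documented index order changed from <instrument, datetime> to <datetime, instrument>, the sketch also shows one possible way (an assumption, not a ``Qlib`` utility) to convert an index in the old order with `swaplevel`.

```python
import pandas as pd

# Two-level index <datetime (pd.Timestamp), instrument (str)> plus a `score` column.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2019-01-04"), "SH600000"),
        (pd.Timestamp("2019-01-04"), "SZ002531"),
        (pd.Timestamp("2019-04-30"), "SZ000996"),
    ],
    names=["datetime", "instrument"],
)
pred_score = pd.DataFrame({"score": [-0.505488, -0.320391, -1.027618]}, index=index)

# An index in the old <instrument, datetime> order ...
old_order = pred_score.swaplevel().sort_index()
# ... can be converted to the current order by swapping the two levels back.
converted = old_order.swaplevel().sort_index()
print(list(converted.index.names))  # ['datetime', 'instrument']
```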

Backtest Result
------------------
51 changes: 41 additions & 10 deletions docs/component/data.rst
@@ -29,7 +29,18 @@ Qlib Format Data
------------------

We've specially designed a data structure to manage financial data; please refer to the `File storage design section in Qlib paper <https://arxiv.org/abs/2009.11189>`_ for detailed information.
Such data will be stored with filename suffix `.bin` (We'll call them `.bin` file, `.bin` format, or qlib format). `.bin` file is designed for scientific computing on finance data
Such data is stored in files with the suffix `.bin` (we call them `.bin` files, `.bin` format, or qlib format). The `.bin` format is designed for scientific computing on financial data.
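The precise layout is documented in the paper linked above. As a minimal sketch, assuming each `.bin` file stores one feature of one instrument as little-endian float32 values, with the first value holding the position of the series' first date in the trading calendar, writing and reading such a file could look like this. The helpers `write_feature_bin` and `read_feature_bin` and the file name `close.day.bin` are illustrative assumptions, not ``Qlib`` APIs.

```python
import os
import struct
import tempfile
from typing import List, Tuple


def write_feature_bin(path: str, start_index: int, values: List[float]) -> None:
    """Write [start_index, v0, v1, ...] as little-endian float32 (assumed layout)."""
    payload = [float(start_index)] + list(values)
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(payload)}f", *payload))


def read_feature_bin(path: str) -> Tuple[int, List[float]]:
    """Read the file back; returns (calendar start index, feature values)."""
    with open(path, "rb") as f:
        raw = f.read()
    count = len(raw) // 4  # 4 bytes per float32
    data = struct.unpack(f"<{count}f", raw)
    return int(data[0]), list(data[1:])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "close.day.bin")  # hypothetical file name
    # NaN marks a missing value, e.g. a suspended trading day.
    write_feature_bin(path, start_index=3, values=[10.5, 10.75, float("nan")])
    start, values = read_feature_bin(path)
    print(start, values[:2])  # 3 [10.5, 10.75]
```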

``Qlib`` provides two different off-the-shelf datasets, which can be accessed through this `link <https://github.com/microsoft/qlib/blob/main/qlib/contrib/data/handler.py>`_:

======================== ================= ================
Dataset US Market China Market
======================== ================= ================
Alpha360 √ √

Alpha158 √ √
======================== ================= ================


Qlib Format Dataset
--------------------
@@ -45,7 +56,7 @@ In addition to China-Stock data, ``Qlib`` also includes a US-Stock dataset, whic

python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/us_data --region us

After running the above command, users can find china-stock and us-stock data in Qlib format in the ``~/.qlib/csv_data/cn_data`` directory and ``~/.qlib/csv_data/us_data`` directory respectively.
After running the above command, users can find china-stock and us-stock data in ``Qlib`` format in the ``~/.qlib/qlib_data/cn_data`` and ``~/.qlib/qlib_data/us_data`` directories, respectively.

``Qlib`` also provides the scripts in ``scripts/data_collector`` to help users crawl the latest data on the Internet and convert it to qlib format.

@@ -54,8 +65,7 @@ When ``Qlib`` is initialized with this dataset, users could build and evaluate t
Converting CSV Format into Qlib Format
-------------------------------------------

``Qlib`` has provided the script ``scripts/dump_bin.py`` to convert data in CSV format into `.bin` files (Qlib format).

``Qlib`` provides the script ``scripts/dump_bin.py`` to convert **any** data in CSV format into `.bin` files (``Qlib`` format), as long as the data follows the expected CSV format.

Users can download the demo china-stock data in CSV format as follows, as a reference for the expected CSV format.

@@ -130,9 +140,21 @@ After conversion, users can find their Qlib format data in the directory `~/.qli

By convention in ``Qlib`` data processing, `open, close, high, low, volume, money and factor` are set to NaN if the stock is suspended.

China-Stock Mode & US-Stock Mode
Multiple Stock Modes
--------------------------------

``Qlib`` now provides two different stock modes for users: China-Stock Mode & US-Stock Mode. The settings that differ between these two modes are:

============== ================= ================
Region Trade Unit Limit Threshold
============== ================= ================
China 100 0.099

US 1 None
============== ================= ================

The `trade unit` defines the number of shares that make up one unit of trading (e.g., 100 shares in China), and the `limit threshold` defines the bound on the percentage by which a stock's price can rise or fall (e.g., 0.099 means 9.9%; None means no limit).
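The effect of the two settings can be sketched as follows. This is a hypothetical illustration, not ``Qlib``'s exchange logic: it assumes that order amounts are rounded down to a whole number of trade units and that a stock whose absolute price change reaches the limit threshold is treated as having hit its limit.

```python
from typing import Optional

# Per-mode settings, copied from the table above.
MODES = {
    "china": {"trade_unit": 100, "limit_threshold": 0.099},
    "us": {"trade_unit": 1, "limit_threshold": None},
}


def round_to_trade_unit(amount: float, trade_unit: int) -> int:
    """Round a desired number of shares down to a whole multiple of the trade unit."""
    return int(amount // trade_unit) * trade_unit


def hits_limit(price_change: float, limit_threshold: Optional[float]) -> bool:
    """True if the fractional price change reaches the limit; None means no limit."""
    if limit_threshold is None:
        return False
    return abs(price_change) >= limit_threshold


china, us = MODES["china"], MODES["us"]
print(round_to_trade_unit(1234, china["trade_unit"]))  # 1200
print(round_to_trade_unit(1234, us["trade_unit"]))     # 1234
print(hits_limit(0.10, china["limit_threshold"]))      # True
print(hits_limit(0.10, us["limit_threshold"]))         # False
```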

- If users use ``Qlib`` in china-stock mode, china-stock data is required. Users can use ``Qlib`` in china-stock mode according to the following steps:
- Download china-stock data in qlib format; please refer to section `Qlib Format Dataset <#qlib-format-dataset>`_.
- Initialize ``Qlib`` in china-stock mode
@@ -206,15 +228,21 @@ Data Loader
QlibDataLoader
---------------

The ``QlibDataLoader`` class in ``Qlib`` is such an interface that allows users to load raw data from the data source.
The ``QlibDataLoader`` class in ``Qlib`` is such an interface that allows users to load raw data from the ``Qlib`` data source.

StaticDataLoader
----------------

The ``StaticDataLoader`` class in ``Qlib`` is such an interface that allows users to load raw data from a file or from data provided directly.


Interface
------------

Here are some interfaces of the base ``DataLoader`` class:

.. autoclass:: qlib.data.dataset.loader.QlibDataLoader
:members: load, load_group_df
.. autoclass:: qlib.data.dataset.loader.DataLoader
:members:

API
-----------
@@ -234,7 +262,7 @@ DataHandlerLP

In addition to being used in an automatic workflow with ``qrun``, ``Data Handler`` can be used as an independent module, with which users can easily preprocess data (standardization, NaN removal, etc.) and build datasets.

In order to achieve so, ``Qlib`` provides a base class `qlib.data.dataset.DataHandlerLP <../reference/api.html#qlib.data.dataset.handler.DataHandlerLP>`_. The core idea of this class is that: we will have some leanable ``Processors`` which can learn the parameters of data processing. When new data comes in, these `trained` ``Processors`` can then infer on the new data and thus processing real-time data in an efficient way. More information about ``Processors`` will be listed in the next subsection.
To achieve this, ``Qlib`` provides a base class `qlib.data.dataset.DataHandlerLP <../reference/api.html#qlib.data.dataset.handler.DataHandlerLP>`_. The core idea of this class is to have learnable ``Processors`` that learn the parameters of data processing (e.g., the parameters for z-score normalization). When new data comes in, these `trained` ``Processors`` can process it directly, which makes efficient processing of real-time data possible. More information about ``Processors`` is given in the next subsection.
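The fit-then-process idea can be sketched with z-score normalization: the parameters (mean and standard deviation) are learned once from training data and then reused, unchanged, on new data. `ZScoreProcessor` below is a hypothetical plain-Python illustration working on lists; ``Qlib``'s own ``Processors`` operate on its data structures and have their own interface.

```python
import statistics
from typing import List


class ZScoreProcessor:
    """Hypothetical learnable processor: `fit` learns parameters, calling it applies them."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.std = 1.0

    def fit(self, train_values: List[float]) -> "ZScoreProcessor":
        # Learn the normalization parameters once, from training data only.
        self.mean = statistics.mean(train_values)
        # Fall back to 1.0 if all training values are equal (zero spread).
        self.std = statistics.pstdev(train_values) or 1.0
        return self

    def __call__(self, new_values: List[float]) -> List[float]:
        # Reuse the learned parameters on incoming data; nothing is refitted.
        return [(v - self.mean) / self.std for v in new_values]


proc = ZScoreProcessor().fit([1.0, 2.0, 3.0, 4.0, 5.0])
print(proc([3.0, 5.0]))
```

Keeping `fit` separate from the call is what lets the same trained processor handle each new batch of real-time data cheaply.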


Interface
@@ -321,7 +349,10 @@ Dataset

The ``Dataset`` module in ``Qlib`` aims to prepare data for model training and inferencing.

The motivation of this module is that we want to maximize the flexibility of of different models to handle data that are suitable for themselves. This module gives the model the rights to process their data in an unique way. For instance, models such as ``GBDT`` may work well on data that contains `nan` or `None` value, while neural networks such as ``MLP`` will break down on such data.
The motivation of this module is to maximize the flexibility of different models to handle data in the way that suits them. This module gives each model the flexibility to process its data in a unique way. For instance, models such as ``GBDT`` may work well on data that contains `nan` or `None` values, while neural networks such as ``MLP`` will break down on such data.

If a user's model needs to process its data in a different way, the user can implement their own ``Dataset`` class. If the model's
data processing is not special, ``DatasetH`` can be used directly.
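To illustrate why model-specific data preparation helps, the sketch below contrasts a dataset that passes missing values through (as a ``GBDT`` model may tolerate) with one that fills them for models such as ``MLP``. `RawDataset` and `FilledDataset` are hypothetical plain-Python classes and do not follow ``Qlib``'s ``Dataset`` interface.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


class RawDataset:
    """Hypothetical dataset that passes features through unchanged, NaN/None included."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def prepare(self) -> List[Row]:
        return [list(row) for row in self.rows]


class FilledDataset(RawDataset):
    """Hypothetical variant that replaces None/NaN for models that cannot handle them."""

    def __init__(self, rows: List[Row], fill_value: float = 0.0) -> None:
        super().__init__(rows)
        self.fill_value = fill_value

    def prepare(self) -> List[Row]:
        return [
            [self.fill_value if v is None or math.isnan(v) else v for v in row]
            for row in self.rows
        ]


rows = [[1.0, None], [float("nan"), 2.0]]
print(FilledDataset(rows).prepare())  # [[1.0, 0.0], [0.0, 2.0]]
```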

The ``DatasetH`` class is a `dataset` with a `Data Handler`. Here is the most important interface of the class:

49 changes: 7 additions & 42 deletions docs/component/model.rst
@@ -1,15 +1,15 @@
.. _model:

============================================
Interday Model: Model Training & Prediction
Forecast Model: Model Training & Prediction
============================================

Introduction
===================

``Interday Model`` is designed to make the `prediction score` about stocks. Users can use the ``Interday Model`` in an automatic workflow by ``qrun``, please refer to `Workflow: Workflow Management <workflow.html>`_.
``Forecast Model`` is designed to produce the `prediction score` of stocks. Users can use the ``Forecast Model`` in an automatic workflow via ``qrun``; please refer to `Workflow: Workflow Management <workflow.html>`_.

Because the components in ``Qlib`` are designed in a loosely-coupled way, ``Interday Model`` can be used as an independent module also.
Because the components in ``Qlib`` are designed in a loosely-coupled way, ``Forecast Model`` can also be used as an independent module.

Base Class & Interface
======================
@@ -18,52 +18,17 @@ Base Class & Interface

The base class provides the following interfaces:

- `__init__(**kwargs)`
- Initialization.

- `fit(self, dataset, **kwargs)`
- Train model.
- Parameter:
- `dataset`, ``Qlib``'s ``DatasetH`` type. For more information about ``DatasetH``, users can refer to the related document: `Qlib Dataset <../component/data.html#dataset>`_.
The `dataset` is passed into the `model`'s method because there are some unique data preprocessing procedures for each, we want to give each model maximum flexibility to handle the data that is suitable for their own.
The following code example shows how to retrieve `x_train`, `y_train` and `w_train` from the `dataset`:

.. code-block:: Python

# get features and labels
df_train, df_valid = dataset.prepare(
["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
)
x_train, y_train = df_train["feature"], df_train["label"]
x_valid, y_valid = df_valid["feature"], df_valid["label"]

# get weights
try:
wdf_train, wdf_valid = dataset.prepare(["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L)
w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
except KeyError as e:
w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)

- `predict(self, dataset, **kwargs)`
- Predict test data.
- Parameter:
- `dataset`, ``Qlib``'s ``DatasetH`` type. The usage is similar to the example above.
- Returns:
- Predic results with type: `pandas.Series`.

- `finetune(self, dataset, **kwargs)`
- Finetune the model.
- Parameter:
- `dataset`, ``Qlib``'s ``DatasetH`` type. The usage is similar to the example above.
.. autoclass:: qlib.model.base.Model
:members:

``Qlib`` also provides a base class `qlib.model.base.ModelFT <../reference/api.html#qlib.model.base.ModelFT>`_, which includes the method for finetuning the model.

For other interfaces such as `finetune`, please refer to `Model API <../reference/api.html#module-qlib.model.base>`_.
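Because `fit` and `predict` receive the dataset itself rather than pre-split arrays, each model decides how to prepare its own data. The plain-Python sketch below mirrors only that `fit(dataset)` / `predict(dataset)` shape. `ToyDataset`, `ToyModel`, and `MeanLabelModel` are hypothetical, do not subclass ``qlib.model.base.Model``, and return a dict where ``Qlib`` models typically return pandas objects.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Sample = Tuple[str, List[float], float]  # (instrument, features, label)


class ToyDataset:
    """Hypothetical stand-in for a dataset: returns the samples of a named segment."""

    def __init__(self, segments: Dict[str, List[Sample]]) -> None:
        self.segments = segments

    def prepare(self, segment: str) -> List[Sample]:
        return self.segments[segment]


class ToyModel(ABC):
    """Hypothetical base class with the same fit(dataset) / predict(dataset) shape."""

    @abstractmethod
    def fit(self, dataset: ToyDataset) -> None: ...

    @abstractmethod
    def predict(self, dataset: ToyDataset) -> Dict[str, float]: ...


class MeanLabelModel(ToyModel):
    """Trivial baseline: predicts the mean training label for every test instrument."""

    def fit(self, dataset: ToyDataset) -> None:
        labels = [label for _, _, label in dataset.prepare("train")]
        self.mean_label = sum(labels) / len(labels)

    def predict(self, dataset: ToyDataset) -> Dict[str, float]:
        return {inst: self.mean_label for inst, _, _ in dataset.prepare("test")}


ds = ToyDataset({
    "train": [("SH600000", [0.1], 1.0), ("SZ002531", [0.2], 3.0)],
    "test": [("SZ000999", [0.3], 0.0)],
})
model = MeanLabelModel()
model.fit(ds)
print(model.predict(ds))  # {'SZ000999': 2.0}
```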

Example
==================

``Qlib``'s `Model Zoo` includes models such as ``LightGBM``, ``MLP``, ``LSTM``, etc.. These models are treated as the baselines of ``Interday Model``. The following steps show how to run`` LightGBM`` as an independent module.
``Qlib``'s `Model Zoo` includes models such as ``LightGBM``, ``MLP``, ``LSTM``, etc. These models serve as the baselines of ``Forecast Model``. The following steps show how to run ``LightGBM`` as an independent module.

- Initialize ``Qlib`` with `qlib.init` first; please refer to `Initialization <../start/initialization.html>`_.
- Run the following code to get the `prediction score` `pred_score`
9 changes: 5 additions & 4 deletions docs/component/recorder.rst
@@ -7,7 +7,7 @@ Qlib Recorder: Experiment Management

Introduction
===================
``Qlib`` contains an experiment management system named ``QlibRecorder``, which is designed to help users handle experiment and analysis results in an efficient way.
``Qlib`` contains an experiment management system named ``QlibRecorder``, which is designed to help users handle experiments and analyse results efficiently.

There are three components of the system:

@@ -34,8 +34,7 @@ Here is a general view of the structure of the system:
- Recorder 2
- ...
- ...

Currently, the components of this experiment management system are implemented using the machine learning platform: ``MLFlow`` (`link <https://mlflow.org/>`_).
This experiment management system defines a set of interfaces and provides a concrete implementation based on the machine learning platform ``MLFlow`` (`link <https://mlflow.org/>`_).


Qlib Recorder
@@ -73,6 +72,8 @@ The ``Experiment`` class is solely responsible for a single experiment, and it w

For other interfaces such as `search_records`, `delete_recorder`, please refer to `Experiment API <../reference/api.html#experiment>`_.

``Qlib`` also provides a default ``Experiment``, which is created and used in certain situations, such as when users call APIs like `log_metrics` or `get_exp`. If the default ``Experiment`` is used, related information will be logged when running ``Qlib``. The default ``Experiment`` is named '`Experiment`'; users can change this name in ``Qlib``'s config file or during ``Qlib``'s `initialization <../start/initialization.html#parameters>`_.

Recorder
===================

@@ -94,4 +95,4 @@ The ``RecordTemp`` class is a class that enables generate experiment results suc
- ``SigAnaRecord``: This class generates the `IC`, `ICIR`, `Rank IC` and `Rank ICIR` of the model.
- ``PortAnaRecord``: This class generates the results of `backtest`. For detailed information about `backtest` and the available `strategy`, users can refer to `Strategy <../component/strategy.html>`_ and `Backtest <../component/backtest.html>`_.
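For reference, these signal metrics are commonly defined per trading day as the cross-sectional correlation between prediction scores and labels: `IC` uses Pearson correlation, `Rank IC` uses Spearman (rank) correlation, and the `IR` variants divide the mean daily value by its standard deviation. The sketch below implements those textbook definitions in plain Python, assuming they match what ``SigAnaRecord`` reports; `pearson`, `ranks`, and `signal_metrics` are hypothetical helpers, not ``Qlib`` functions.

```python
import statistics
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def signal_metrics(daily: Dict[str, Tuple[List[float], List[float]]]) -> Dict[str, float]:
    """`daily` maps each date to (prediction scores, labels) over the same instruments."""
    ic = [pearson(score, lab) for score, lab in daily.values()]
    rank_ic = [pearson(ranks(score), ranks(lab)) for score, lab in daily.values()]
    return {
        "IC": statistics.mean(ic),
        "ICIR": statistics.mean(ic) / statistics.stdev(ic),
        "Rank IC": statistics.mean(rank_ic),
        "Rank ICIR": statistics.mean(rank_ic) / statistics.stdev(rank_ic),
    }


daily = {
    "2019-01-04": ([0.1, 0.2, 0.3], [1.0, 2.0, 3.0]),
    "2019-01-07": ([0.3, 0.1, 0.2], [1.0, 2.0, 4.0]),
}
print(signal_metrics(daily))
```

At least two dates are needed, since the sample standard deviation in the `IR` ratios is undefined for a single value.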

For more information about the APIs, please refer to `Record Template API <../reference/api.html#module-qlib.workflow.record_temp>`_.