Fix ZScoreNorm processor bug #1398

SunsetWolf · 2022-12-23T10:26:05Z

Description

Because numba is pinned to 0.52.0, some old code in numba (e.g. np.long) is incompatible with newer versions of numpy, which causes CI failures. This PR updates numba to the latest version to fix the CI problem.

qlib/data/dataset/processor.py

setup.py

.github/workflows/test_qlib_from_source.yml

qlib/data/dataset/processor.py

you-n-g · 2022-12-29T00:21:38Z

qlib/data/dataset/processor.py

@@ -361,7 +359,8 @@ def __init__(self, fields_group=None):

    def __call__(self, df):
        cols = get_group_columns(df, self.fields_group)
-        df[cols] = df[cols].groupby("datetime").apply(lambda x: x.fillna(x.mean()))
+        df.index.astype(np.datetime64)


What is this for?
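
For context on the question above: `Index.astype` returns a new index rather than converting in place, so a bare `df.index.astype(...)` statement whose result is discarded leaves `df` unchanged. A minimal sketch on made-up data (the frame and column name are illustrative, not taken from the PR):

```python
import pandas as pd

# Illustrative frame; the string index stands in for unparsed dates.
df = pd.DataFrame({"feature": [1.0, 2.0]}, index=["2000-01-01", "2000-01-02"])

# astype returns a new Index; discarding the result leaves df untouched.
df.index.astype("datetime64[ns]")
index_unchanged = not isinstance(df.index, pd.DatetimeIndex)

# The conversion only takes effect when the result is assigned back.
df.index = pd.to_datetime(df.index)
index_converted = isinstance(df.index, pd.DatetimeIndex)
```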

you-n-g · 2022-12-29T00:26:17Z

tests/test_processor.py

+        assert (df == ((origin_df - origin_df.mean()).div(origin_df.std()))).all().all()
+
+
+def suite():


Why is this function required?

I think a simpler implementation would achieve the same effect.
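
One simpler arrangement, assuming the tests are plain `unittest.TestCase` methods, is to drop the hand-built `suite()` and rely on test discovery, which picks up every `test_*` method. A sketch with a placeholder test class, not the PR's actual tests:

```python
import unittest


class TestProcessorSketch(unittest.TestCase):
    """Placeholder test case; the name and body are illustrative only."""

    def test_centering_has_zero_mean(self):
        values = [1.0, 2.0, 3.0]
        mean = sum(values) / len(values)
        centered = [v - mean for v in values]
        # Subtracting the mean must leave a zero sum.
        self.assertAlmostEqual(sum(centered), 0.0)


# No suite() function is needed: discovery finds the test_* methods, e.g.
#   python -m unittest tests/test_processor.py
#   pytest tests/test_processor.py
```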

you-n-g · 2022-12-29T00:45:52Z

tests/test_processor.py

+
+    def test_CSZScoreNorm(self):
+        st = """
+        2000-01-01,1,2


The data does not have an instruments column.
It does not match the desired format.
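
A sketch of test data in the shape this comment asks for, assuming the usual qlib layout of a `(datetime, instrument)` MultiIndex. The instrument codes and feature names below are invented for illustration:

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV: each row carries a date, an instrument code, and two features.
csv_text = """datetime,instrument,feat_a,feat_b
2000-01-01,SH600000,1,2
2000-01-01,SH600001,3,4
2000-01-02,SH600000,5,6
2000-01-02,SH600001,7,8
"""

# Parse the dates first, then move both keys into a two-level index.
df = pd.read_csv(StringIO(csv_text), parse_dates=["datetime"])
df = df.set_index(["datetime", "instrument"])
```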

you-n-g · 2022-12-30T02:49:26Z

qlib/data/dataset/processor.py

@@ -361,7 +367,7 @@ def __init__(self, fields_group=None):

    def __call__(self, df):
        cols = get_group_columns(df, self.fields_group)
-        df[cols] = df[cols].groupby("datetime").apply(lambda x: x.fillna(x.mean()))
+        df[cols] = df[cols].groupby("datetime", group_keys=False).apply(lambda x: x.fillna(df[cols].mean()))


I can't understand what this is for.
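
The two variants quoted in this hunk behave differently: `x.mean()` is computed inside each `datetime` group, while `df[cols].mean()` is a single mean over all dates. A small sketch on made-up data (index values and column name are illustrative) shows the difference:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2000-01-01", "2000-01-02"]), ["A", "B", "C"]],
    names=["datetime", "instrument"],
)
df = pd.DataFrame({"feat": [1.0, np.nan, 3.0, 10.0, 20.0, 30.0]}, index=index)

# Cross-sectional fill: the NaN takes the mean of its own date (1 and 3 -> 2).
per_date = df.groupby("datetime", group_keys=False).apply(lambda x: x.fillna(x.mean()))

# Global fill: the NaN takes the mean over every date (64 / 5 = 12.8).
overall = df.groupby("datetime", group_keys=False).apply(lambda x: x.fillna(df.mean()))

missing_row = (pd.Timestamp("2000-01-01"), "B")
per_date_value = per_date.loc[missing_row, "feat"]
overall_value = overall.loc[missing_row, "feat"]
```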

you-n-g · 2022-12-30T08:17:33Z

qlib/data/dataset/processor.py

@@ -361,7 +368,7 @@ def __init__(self, fields_group=None):

    def __call__(self, df):
        cols = get_group_columns(df, self.fields_group)
-        df[cols] = df[cols].groupby("datetime").apply(lambda x: x.fillna(x.mean()))
+        df[cols] = df[cols].groupby("datetime", group_keys=False).apply(lambda x: x.fillna(x.mean()))


, group_keys=False
Why is this necessary?
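
As background for this question: in recent pandas releases, `groupby(...).apply` can prepend the group key to the result's index when `group_keys` is left at `True`, and such a result no longer lines up with `df[cols]` on assignment. Passing `group_keys=False` keeps the original index. The exact behavior depends on the installed pandas version, so treat this as a sketch on made-up data:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2000-01-01", "2000-01-02"]), ["A", "B"]],
    names=["datetime", "instrument"],
)
df = pd.DataFrame({"feat": [1.0, np.nan, 4.0, 6.0]}, index=index)

filled = df.groupby("datetime", group_keys=False).apply(lambda x: x.fillna(x.mean()))

# With group_keys=False the result keeps the two original index levels,
# so it can be assigned straight back onto the frame.
same_index = filled.index.equals(df.index)
df["feat"] = filled["feat"]
```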

you-n-g · 2022-12-30T08:18:37Z

tests/test_processor.py

+        min_val = np.nanmin(origin_df.values, axis=0)
+        max_val = np.nanmax(origin_df.values, axis=0)
+        origin_df.loc(axis=1)[origin_df.columns] = (origin_df.values - min_val) / (max_val - min_val)
+        assert (df.iloc[:, :-1] == origin_df.iloc[:, :-1]).all().all()


Verifying that origin_df["test"] remains 0 is also an important check.
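
A sketch of the extra check suggested here, assuming a min-max normalization that leaves constant columns unscaled. `min_max_norm` is a hypothetical helper written for this illustration, not qlib's `MinMaxNorm`:

```python
import numpy as np
import pandas as pd


def min_max_norm(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: scale each column to [0, 1], skipping constant ones."""
    min_val = np.nanmin(df.values, axis=0)
    max_val = np.nanmax(df.values, axis=0)
    constant = min_val == max_val
    # For constant columns use offset 0 and scale 1 to avoid dividing by zero.
    offset = np.where(constant, 0.0, min_val)
    scale = np.where(constant, 1.0, max_val - min_val)
    return (df - offset) / scale


origin_df = pd.DataFrame({"feat": [1.0, 2.0, 3.0], "test": [0.0, 0.0, 0.0]})
normed = min_max_norm(origin_df)

# The varying column is rescaled; the all-zero "test" column must stay at 0.
assert normed["feat"].tolist() == [0.0, 0.5, 1.0]
assert (normed["test"] == 0).all()
```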

tests/test_processor.py

* fix_ZScoreNorm_bug
* fix_CI_error
* fix_CI_error
* add_test_processor
* fix_pylint_error
* fix_some_error_and_optimize_code
* modify_terrible_code
* optimize_code
* optimize_code

github-actions bot added the "waiting for triage" label Dec 23, 2022

SunsetWolf added the "bug" label and removed the "waiting for triage" label Dec 23, 2022

fix_ZScoreNorm_bug

7037b30

github-actions bot added the "waiting for triage" label Dec 23, 2022

you-n-g reviewed Dec 25, 2022

qlib/data/dataset/processor.py (outdated, resolved)

you-n-g reviewed Dec 26, 2022

setup.py (outdated, resolved)

.github/workflows/test_qlib_from_source.yml (resolved)

SunsetWolf added 4 commits December 26, 2022 18:08

fix_CI_error

5f9801f

fix_CI_error

08f5319

add_test_processor

057481f

fix_pylint_error

4e42910

you-n-g reviewed Dec 29, 2022

fix_some_error_and_optimize_code

8543940

you-n-g reviewed Dec 30, 2022

SunsetWolf added 2 commits December 30, 2022 11:52

modify_terrible_code

140b737

optimize_code

7fcda4c

you-n-g reviewed Dec 30, 2022

optimize_code

5adc5b0

you-n-g removed the "waiting for triage" label Dec 30, 2022

you-n-g merged commit 756bd0f into microsoft:main Dec 30, 2022

This was referenced Dec 30, 2022

fix typo, staticmethod etc. #1402

Merged

[DDG-DA] Update crowd-sourced data results #1405

Merged

SunsetWolf mentioned this pull request Dec 30, 2022

Alpha360 data preprocess does not work. Problem may come from ZScoreNorm #1306

Closed

you-n-g mentioned this pull request Dec 31, 2022

Fix ZScoreNorm&MinMaxNorm #1355

Closed
