Fix ZScoreNorm processor bug #1398

Merged
merged 9 commits into from
Dec 30, 2022

Conversation

@SunsetWolf SunsetWolf commented Dec 23, 2022

Description

numba is currently pinned to 0.52.0. Some old code in that numba release (e.g. np.long) is incompatible with newer versions of numpy, which causes CI to fail. This PR updates numba to the latest version to fix the CI failure.
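
To illustrate the incompatibility described above: the deprecated `np.long` alias was dropped from numpy (in 1.24, as far as I recall, and the name returned with a different meaning in 2.0), so code that references it unconditionally can fail on a newer numpy. The following is a minimal sketch of a defensive lookup. `resolve_long_type` is a hypothetical helper used only for illustration; it is not part of numba, numpy, or this PR.

```python
import numpy as np


def resolve_long_type():
    """Return np.long when this numpy exposes it, else fall back to Python int.

    Hypothetical helper: getattr with a default avoids the AttributeError
    that a bare `np.long` raises on numpy releases that lack the alias.
    """
    return getattr(np, "long", int)


long_type = resolve_long_type()
# Whatever numpy is installed, the resolved type is a signed integer type.
print(np.dtype(long_type))
```

Upgrading the pinned dependency, as this PR does, is the more direct fix; the fallback only shows why the old pin broke.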

@github-actions github-actions bot added the waiting for triage Cannot auto-triage, wait for triage. label Dec 23, 2022
@SunsetWolf SunsetWolf added bug Something isn't working and removed waiting for triage Cannot auto-triage, wait for triage. labels Dec 23, 2022
@github-actions github-actions bot added the waiting for triage Cannot auto-triage, wait for triage. label Dec 23, 2022
setup.py (outdated review thread, resolved)
.github/workflows/test_qlib_from_source.yml (review thread, resolved)
qlib/data/dataset/processor.py (review thread, resolved)
@@ -361,7 +359,8 @@ def __init__(self, fields_group=None):

     def __call__(self, df):
         cols = get_group_columns(df, self.fields_group)
         df[cols] = df[cols].groupby("datetime").apply(lambda x: x.fillna(x.mean()))
+        df.index.astype(np.datetime64)

What is this for?
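
One reason the commented line looks suspicious is that `Index.astype` returns a new index and does not modify the DataFrame, so calling it as a bare statement discards the result. A minimal sketch, assuming a simple single-level string index (not the actual qlib data) and using `"datetime64[ns]"` because recent pandas requires an explicit unit:

```python
import pandas as pd

# Toy frame with date strings stored as plain Python objects in the index.
df = pd.DataFrame(
    {"feat": [1.0, 2.0]},
    index=pd.Index(["2000-01-01", "2000-01-02"], dtype=object),
)

# The result is thrown away, so df.index keeps its original dtype.
df.index.astype("datetime64[ns]")

# To actually convert, the new index has to be assigned back.
converted = df.copy()
converted.index = pd.to_datetime(converted.index)
```

After the bare call, `df.index` is still an object index; only `converted` carries a datetime index.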

assert (df == ((origin_df - origin_df.mean()).div(origin_df.std()))).all().all()


def suite():

Why is this function required?


I think a simpler implementation should achieve similar effects
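
As a sketch of what a simpler setup could look like: the standard `unittest` loader collects every `test_*` method of a `TestCase` on its own, so a hand-written `suite()` function is usually unnecessary. The class and test names below are placeholders for illustration, not the tests from this PR.

```python
import unittest


class TestProcessorSketch(unittest.TestCase):
    """Placeholder test case; the real tests live in tests/test_processor.py."""

    def test_placeholder(self):
        self.assertEqual(1 + 1, 2)


# The default loader discovers all test_* methods without an explicit suite().
tests = unittest.defaultTestLoader.loadTestsFromTestCase(TestProcessorSketch)
result = unittest.TextTestRunner(verbosity=0).run(tests)

# In a test module, the usual entry point is simply:
#     if __name__ == "__main__":
#         unittest.main()
```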


    def test_CSZScoreNorm(self):
        st = """
2000-01-01,1,2

The data does not have an instrument column, so it does not align with the desired format.
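
A minimal sketch of test data in the shape this comment asks for, assuming the desired format is a DataFrame indexed by (datetime, instrument). The instrument codes, the column names `feat_a` and `feat_b`, and the values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV fixture that includes an instrument column.
csv_text = """datetime,instrument,feat_a,feat_b
2000-01-01,SH600000,1,2
2000-01-01,SZ000001,3,4
2000-01-02,SH600000,5,6
2000-01-02,SZ000001,7,8
"""

df = (
    pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])
    .set_index(["datetime", "instrument"])  # two-level index: date, then instrument
)
```

With both levels present, a cross-sectional processor can group by `datetime` and see more than one row per date.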

@@ -361,7 +367,7 @@ def __init__(self, fields_group=None):

     def __call__(self, df):
         cols = get_group_columns(df, self.fields_group)
-        df[cols] = df[cols].groupby("datetime").apply(lambda x: x.fillna(x.mean()))
+        df[cols] = df[cols].groupby("datetime", group_keys=False).apply(lambda x: x.fillna(df[cols].mean()))

I can't understand what this is for.
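
The two variants in this hunk are not equivalent: `x.mean()` is the mean within one date group, while `df[cols].mean()` is the mean over the whole frame. A small sketch on made-up data shows the difference. It uses `transform("mean")` instead of `apply` to keep the example short, which is a substitution on my part and not the code under review.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2000-01-01", "2000-01-02"]), ["A", "B", "C"]],
    names=["datetime", "instrument"],
)
df = pd.DataFrame({"feat": [1.0, np.nan, 3.0, 10.0, 20.0, np.nan]}, index=index)

# Cross-sectional fill: each NaN gets the mean of its own date.
per_date_mean = df.groupby(level="datetime").transform("mean")
filled_per_date = df.fillna(per_date_mean)

# Global fill: every NaN gets the mean of the entire column.
filled_global = df.fillna(df.mean())

print(filled_per_date["feat"].tolist())  # [1.0, 2.0, 3.0, 10.0, 20.0, 15.0]
print(filled_global["feat"].tolist())    # [1.0, 8.5, 3.0, 10.0, 20.0, 8.5]
```

For a cross-sectional fill, the per-date result is the one that matches the processor's name.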

@@ -361,7 +368,7 @@ def __init__(self, fields_group=None):

     def __call__(self, df):
         cols = get_group_columns(df, self.fields_group)
-        df[cols] = df[cols].groupby("datetime").apply(lambda x: x.fillna(x.mean()))
+        df[cols] = df[cols].groupby("datetime", group_keys=False).apply(lambda x: x.fillna(x.mean()))

Why is `, group_keys=False` necessary?
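
A likely answer, offered as an assumption and not as the author's stated reason: on recent pandas releases (2.0 and later, to my knowledge) `groupby(...).apply(...)` prepends the group key to the result index even when the applied function returns a like-indexed frame, so the result no longer aligns with `df[cols]` on assignment. Passing `group_keys=False` keeps the original index. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2000-01-01", "2000-01-02"]), ["A", "B"]],
    names=["datetime", "instrument"],
)
df = pd.DataFrame({"feat": [1.0, np.nan, np.nan, 4.0]}, index=index)

# group_keys=False: the result keeps the (datetime, instrument) index of df,
# so it can be assigned straight back to the original columns.
filled = df.groupby("datetime", group_keys=False).apply(lambda x: x.fillna(x.mean()))
df[["feat"]] = filled
```

Each date has one observed value here, so every NaN is replaced by the other value from the same date.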

min_val = np.nanmin(origin_df.values, axis=0)
max_val = np.nanmax(origin_df.values, axis=0)
origin_df.loc(axis=1)[origin_df.columns] = (origin_df.values - min_val) / (max_val - min_val)
assert (df.iloc[:, :-1] == origin_df.iloc[:, :-1]).all().all()

Verifying that origin_df["test"] remains 0 is also an important check.
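
The extra check matters because plain min-max scaling divides by `max - min`, which is zero for a constant column. A minimal sketch of one way to guard against that, assuming constant columns should pass through unchanged. `min_max_normalize` is a hypothetical function written for this example and is not the processor under review.

```python
import numpy as np
import pandas as pd


def min_max_normalize(df):
    """Scale each column to [0, 1]; leave constant columns untouched (assumption)."""
    values = df.to_numpy(dtype=float)
    min_val = np.nanmin(values, axis=0)
    max_val = np.nanmax(values, axis=0)
    constant = min_val == max_val
    # For constant columns use min=0, max=1 so the division is a no-op.
    min_val = np.where(constant, 0.0, min_val)
    max_val = np.where(constant, 1.0, max_val)
    scaled = (values - min_val) / (max_val - min_val)
    return pd.DataFrame(scaled, index=df.index, columns=df.columns)


frame = pd.DataFrame({"feat": [1.0, 2.0, 4.0], "test": [0.0, 0.0, 0.0]})
normalized = min_max_normalize(frame)

# The all-zero column stays at 0 and does not turn into NaN.
assert (normalized["test"] == 0).all()
```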

tests/test_processor.py (outdated review thread, resolved)
@you-n-g you-n-g removed the waiting for triage Cannot auto-triage, wait for triage. label Dec 30, 2022
@you-n-g you-n-g merged commit 756bd0f into microsoft:main Dec 30, 2022
qianyun210603 pushed a commit to qianyun210603/qlib that referenced this pull request Mar 23, 2023
* fix_ZScoreNorm_bug

* fix_CI_error

* fix_CI_error

* add_test_processor

* fix_pylint_error

* fix_some_error_and_optimize_code

* modify_terrible_code

* optimize_code

* optimize_code