Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement scores for FDatairregular objects as described in #609 #610

Merged
merged 9 commits into from
Jul 5, 2024

Conversation

pcuestas
Copy link
Contributor

@pcuestas pcuestas commented Apr 1, 2024

This pull request depends on #608 (integrating FDataIrregular objects is needed to implement the scores).

As explained in #609 , mean_absolute_error, mean_absolute_percentage_error, mean_squared_error and mean_squared_log_error have been implemented in the case when both y_true and y_pred are FDataIrregular objects.

Test cases have been included to ensure the same score is obtained if the FDataIrregular objects are obtained from FDataGrid's.

@pcuestas pcuestas requested a review from vnmabus April 1, 2024 17:59
score: FDataIrregular,
squared: bool = True,
weights: NDArrayFloat | None = None,
) -> float:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
DAR201 Missing "Returns" in Docstring: - return

@@ -554,6 +605,23 @@
return _multioutput_score_grid(error, multioutput)


@mean_absolute_percentage_error.register # type: ignore[attr-defined, misc]
def _mean_absolute_percentage_error_fdatairregular(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS118 Found too long name: _mean_absolute_percentage_error_fdatairregular > 45

epsilon = np.finfo(np.float64).eps

if np.any(np.abs(y_true.values) < epsilon):
warnings.warn('Zero denominator', RuntimeWarning)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
B028 No explicit stacklevel argument found. The warn method from the warnings module uses a stacklevel of 1 by default. This will only show a stack trace for the line on which the warn method is called. It is therefore recommended to use a stacklevel of 2 or greater to provide more information to the user.

@@ -461,3 +469,101 @@ def test_negative_msle(self) -> None:
y_true_grid,
y_pred_grid,
)


############### Test irregular data scoring ####################
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
E266 too many leading '#' for block comment

Copy link

codecov bot commented Apr 1, 2024

Codecov Report

Attention: Patch coverage is 90.00000% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 86.67%. Comparing base (73161f7) to head (a5b7617).

❗ Current head a5b7617 differs from pull request most recent head 4bd8713. Consider uploading reports for the commit 4bd8713 to get more accurate results

Files Patch % Lines
skfda/misc/scoring.py 85.71% 4 Missing ⚠️
skfda/tests/test_scoring.py 93.75% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #610      +/-   ##
===========================================
+ Coverage    86.65%   86.67%   +0.02%     
===========================================
  Files          156      156              
  Lines        13322    13380      +58     
===========================================
+ Hits         11544    11597      +53     
- Misses        1778     1783       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

The integral of the score is normalized because each integral is divided by
the length of the curve's domain.

If the score is vector-valued, then the mean of each codimension integral
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what we want? Is what we do for the other types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the question is regarding whether to divide by the length of the curve's domain or by the length of the FDataIrregular object's domain. This is the only difference that there is between the results of FDataGrid scores and the FDataIrregular that I implemented. As I said in #609, I think that dividing by each curve's domain length is more accurate, as the integral of that curve is being made only taking into account its particular domain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I meant the treatment of vector-valued functions, but you also raised an interesting point that I didn't notice, and maybe we should discuss in the meeting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering the initial question, then: yes, for other types we also take the mean of each codimension integral in the case of vector-valued functions. I think it is a reasonable design decision.

skfda/misc/scoring.py Show resolved Hide resolved
skfda/misc/scoring.py Outdated Show resolved Hide resolved
skfda/misc/scoring.py Show resolved Hide resolved
skfda/tests/test_scoring.py Outdated Show resolved Hide resolved
skfda/tests/test_scoring.py Outdated Show resolved Hide resolved
skfda/tests/test_scoring.py Outdated Show resolved Hide resolved
@pcuestas pcuestas requested a review from vnmabus April 12, 2024 10:31
Copy link
Member

@vnmabus vnmabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can merge it for now, as the only remaining issue is the length we use in the division, and there are other PRs waiting for this to be merged.

I propose to keep the associated issue #609 open and mention in it explicitly the problem with the quotient lengths, to be solved in the future.

@pcuestas
Copy link
Contributor Author

pcuestas commented Jun 30, 2024

I'll explain the problem with the quotient lengths in #609.

Related to said problem is the design (or definition) of the integral of discretized functional observations (regular or irregular) when the grid endpoints do not coincide with the domain's. I will create another issue to discuss this (edit: this is the issue: #619).

@vnmabus vnmabus merged commit d19e1bd into develop Jul 5, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants