Add KNearestNeighborsImputer #743

Closed
lars-reimann opened this issue May 8, 2024 · 7 comments · Fixed by #864
Labels: enhancement 💡 (New feature or request), lab (Suitable for the lab), released (Included in a release), team1

Comments

@lars-reimann
Member

lars-reimann commented May 8, 2024

Is your feature request related to a problem?

We currently only have a basic imputer, but there are more sophisticated imputation strategies.

Desired solution

Add a KNearestNeighborsImputer that uses the KNNImputer of scikit-learn internally.

  • Superclass: TableTransformer
  • Constructor parameters:
    • neighbor_count: int
    • column_names: str | list[str] | None = None (keyword-only). Columns to transform. If None, all columns of the table passed to fit are transformed.
    • value_to_replace: float | str | None = None (keyword-only)
  • Attributes:
    • self._wrapped_transformer: sk_KNNImputer | None = None
  • fit:
    • Call _check_columns_exist to ensure columns to transform exist
    • Raise a ValueError if row_count is 0
    • Create a new instance of the KNearestNeighborsImputer; don't mutate the original in place
    • Create and fit an sk_KNNImputer and store it in _wrapped_transformer of the copied transformer
  • transform:
    • Raise TransformerNotFittedError if the transformer is not fitted
    • Call _check_columns_exist to ensure columns to transform exist
    • Transform with the _wrapped_transformer

Possible alternatives (optional)

No response

Screenshots (optional)

No response

Additional Context (optional)

No response

@lars-reimann lars-reimann added the enhancement 💡 New feature or request label May 8, 2024
@lars-reimann lars-reimann added the lab Suitable for the lab label May 28, 2024
@LIEeOoNn
Contributor

Is this a feat, or how should we name the branch?

@lars-reimann
Member Author

The branch name is irrelevant. The PR type should be feat. See here for a list of PR types.

@LIEeOoNn
Contributor

I was wondering whether we should also check that value_to_replace occurs in the column and raise an error if it does not, or just let the user figure it out.

@lars-reimann
Member Author

I'd probably not raise an error, since the transformer can be applied to multiple tables. You may train the imputer on a table that does not contain the value_to_replace and then transform a table that does.
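
A minimal sketch of that scenario, using scikit-learn's KNNImputer directly rather than the planned wrapper. The sentinel value -1.0 and the sample data are made up for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical sentinel standing in for value_to_replace.
VALUE_TO_REPLACE = -1.0

# The training table does not contain the sentinel at all.
training = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
imputer = KNNImputer(n_neighbors=1, missing_values=VALUE_TO_REPLACE)
imputer.fit(training)

# A different table that does contain the sentinel in the second column.
other = np.array([[2.0, VALUE_TO_REPLACE]])
result = imputer.transform(other)

# The nearest training row by the first column is [2.0, 20.0],
# so the sentinel is replaced with 20.0.
print(result)
```

Fitting succeeds even though the training data has nothing to impute, so raising an error in fit would reject a legitimate use.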

@LIEeOoNn
Contributor

Do we have to add KNearestNeighborsImputer to test_table_transformer.py? That would make sense, which makes me wonder whether I need to write extra hash tests here when I only need to add one line of code to test_table_transformer.py.
@lars-reimann

@lars-reimann
Member Author

> Do we have to add KNearestNeighborsImputer to test_table_transformer.py? That would make sense, which makes me wonder whether I need to write extra hash tests here when I only need to add one line of code to test_table_transformer.py. @lars-reimann

Yep, it suffices to add the transformer to that file.

lars-reimann pushed a commit that referenced this issue Jul 19, 2024
## [0.27.0](v0.26.0...v0.27.0) (2024-07-19)

### Features

* join ([#870](#870)) ([5764441](5764441)), closes [#745](#745)
* activation function for forward layer ([#891](#891)) ([5b5bb3f](5b5bb3f)), closes [#889](#889)
* add `ImageDataset.split` ([#846](#846)) ([3878751](3878751)), closes [#831](#831)
* add FunctionalTableTransformer ([#901](#901)) ([37905be](37905be)), closes [#858](#858)
* add InvalidFitDataError ([#824](#824)) ([487854c](487854c)), closes [#655](#655)
* add KNearestNeighborsImputer ([#864](#864)) ([fcdfecf](fcdfecf)), closes [#743](#743)
* add moving average plot ([#836](#836)) ([abcf68a](abcf68a))
* add RobustScaler ([#874](#874)) ([62320a3](62320a3)), closes [#650](#650) [#873](#873)
* add SequentialTableTransformer ([#893](#893)) ([e93299f](e93299f)), closes [#802](#802)
* add temporal operations ([#832](#832)) ([06eab77](06eab77))
* added 'histogram_2d' in TablePlotter  ([#903](#903)) ([4e65ba9](4e65ba9)), closes [#869](#869) [#798](#798)
* added from_str_to_temporal and continues prediction ([#767](#767)) ([35f468a](35f468a)), closes [#806](#806) [#765](#765) [#740](#740) [#773](#773)
* added GRU layer ([#845](#845)) ([d33cb5d](d33cb5d))
* Adds Dropout Layer ([#868](#868)) ([a76f0a1](a76f0a1)), closes [#848](#848)
* dark mode for plots ([#911](#911)) ([5447551](5447551)), closes [#798](#798)
* easily create a baseline model ([#811](#811)) ([8e1b995](8e1b995)), closes [#710](#710)
* get first cell with value other than `None` ([#904](#904)) ([5a0cdb3](5a0cdb3)), closes [#799](#799)
* hyperparameter optimization for fnn models ([#897](#897)) ([c1f66e5](c1f66e5)), closes [#861](#861)
* implement violin plots ([#900](#900)) ([9f5992a](9f5992a)), closes [#867](#867)
* plot decision tree ([#876](#876)) ([d3f81dc](d3f81dc)), closes [#856](#856)
* prediction no longer takes a time series dataset only table ([#838](#838)) ([762e5c2](762e5c2)), closes [#837](#837)
* raise if `remove_colums` is called with unknown column by default ([#852](#852)) ([8f78163](8f78163)), closes [#807](#807)
* regularization strength for logistic classifier ([#866](#866)) ([9f74e92](9f74e92)), closes [#750](#750)
* reorders parameters of RangeScaler and makes them keyword-only ([#847](#847)) ([2b82db7](2b82db7)), closes [#809](#809)
* replace seaborn with matplotlib for box_plot ([#863](#863)) ([4ef078e](4ef078e)), closes [#805](#805) [#849](#849)
* replaced seaborn with matplotlib for correlation_heatmap ([#850](#850)) ([d4680d4](d4680d4)), closes [#800](#800) [#849](#849)

### Bug Fixes

* **deps:** bump urllib3 from 2.2.1 to 2.2.2 ([#842](#842)) ([b81bcd6](b81bcd6)), closes [#3122](https://github.com/Safe-DS/Library/issues/3122) [#3363](https://github.com/Safe-DS/Library/issues/3363) [#3406](https://github.com/Safe-DS/Library/issues/3406) [#3398](https://github.com/Safe-DS/Library/issues/3398) [#3399](https://github.com/Safe-DS/Library/issues/3399) [#3396](https://github.com/Safe-DS/Library/issues/3396) [#3394](https://github.com/Safe-DS/Library/issues/3394) [#3391](https://github.com/Safe-DS/Library/issues/3391) [#3316](https://github.com/Safe-DS/Library/issues/3316) [#3387](https://github.com/Safe-DS/Library/issues/3387) [#3386](https://github.com/Safe-DS/Library/issues/3386)
* labels of correlation heatmap ([#894](#894)) ([a88a609](a88a609)), closes [#871](#871)
* make multi-processing in baseline models more consistent ([#909](#909)) ([fa24560](fa24560)), closes [#907](#907)

### Performance Improvements

* improved performance in various methods in `Image` and `ImageList` ([#879](#879)) ([134e7d8](134e7d8))
@lars-reimann
Member Author

🎉 This issue has been resolved in version 0.27.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Jul 19, 2024
Projects
Status: ✔️ Done
4 participants