-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup DataConfig implementation #1187
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
} | ||
}, | ||
"pre_process_data_config": { | ||
"params": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need the params
key name? it would be better if like
"load_dataset_config": {
"data_name": "glue",
"subset": "mrpc",
"split": "validation"
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the type
info is needless since the load_dataset_config
already contains the type info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need the
params
key name? it would be better if like"load_dataset_config": { "data_name": "glue", "subset": "mrpc", "split": "validation" },
I agree that params seems redundant. I wasn't sure, what would the config class look like since type
is required parameter and other are all optional. There is no strict list of what the params can be (think of custom data set/ data container implementation which can take any user arguments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the
type
info is needless since theload_dataset_config
already contains the type info
Hmm, I am unsure what you mean config already contains the type info. Please elaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also agree params
nesting is not needed. But like Hitesh said, type
field is still needed if the user wants to override the default component type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the changes in this PR look good, I would like to merge this change before making the decorative params
change. It's modifying too many files, and I don't want to sit on them for too long. Merging will be very hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me! as you said, it's a cosmetic change that only looks different on the user side. In the backend, we need a way to hold the params anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the type
required? I don't see it in the example.
what I mean is not change the DataComponentConfig
and just convert the plain json dict to DataComponentConfig
in validator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even without that params is still required to collect only the parameters that are passed down to the creator function. I noticed that DataSet/DataContainer constructors fail if they are passed unknown arguments (including in kwargs). So, it might not actually be possible to remove the params entirely. If we did, we will still be doing custom include/exclude list to collect only the arguments that are passed down to the constructor (but then what are the other parameters for? discard?).
eac2ab0
to
e0ff9e8
Compare
* Removed DataConfig::params_config * Removed DataConfig::components/component_args All components specific parameters are now grouped in four separate objects: + load_dataset_config + pre_process_data_config + post_process_data_config + dataloader_config
Cleanup DataConfig implementation
All components specific parameters are now grouped in four separate objects:
Checklist before requesting a review
lintrunner -a
(Optional) Issue link