FERC 714: Integrate the XBRL data Respondent ID table #3857
base: transform-714-xbrl
@@ -175,27 +175,45 @@
"""Mapping between standardized time offset codes and canonical timezones."""

EIA_CODE_FIXES = {
Instead of converting all of these code fixes into the PUDL-derived respondent ID-based fixes, I kept the sourcey-ness. This is mostly to enable checking things at each stage, primarily ensure_eia_code_uniqueness.
makes sense!
@@ -293,6 +317,26 @@ def _assign_respondent_id_ferc714(
    return df


def _fillna_respondent_id_ferc714_source(
This is actually necessary in the hourly table as well!
Probably worth checking whether it's necessary for the annual table too
Mostly non-blocking comments or questions!
"xbrl": { | ||
"entity_id": "respondent_id_ferc714_xbrl", | ||
"respondent_legal_name": "respondent_name_ferc714", | ||
"respondent_identification_code": "eia_code", |
Is this for sure the EIA code? Seems fishy.
I suppose the CSV version had eia_code, so it makes sense that this one does too.
# use the source utility ID column to get a unique map and for merging
resp_id_col = f"respondent_id_ferc714_{source}"
resp_map_series = respondent_map_ferc714.dropna(subset=[resp_id_col]).set_index(
    "respondent_id_ferc714"
)[resp_id_col]

df[resp_id_col] = df[resp_id_col].fillna(
    df["respondent_id_ferc714"].map(resp_map_series)
)
return df
I'm not sure I understand how this is working. The _assign_respondent_id_ferc714 function maps the respondent_id_ferc714 column, and then this function appears to work backwards from that to fill in missing respondent_id_ferc714_source values. How is that possible? If there is no respondent_id_ferc714_source to begin with, how can we map a respondent_id_ferc714 value onto it?
Also not sure I understand why this is important when we drop the respondent ID source columns anyway?
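For anyone following the question above, here is a minimal sketch of the mechanics of the fillna step, run on made-up data. The toy IDs and the two small frames are assumptions for illustration; only the column names and the set_index/map/fillna lines come from the diff. It does not explain how rows in the real tables end up with a respondent_id_ferc714 but no source ID.

```python
import pandas as pd

# Hypothetical respondent map: all ID values below are made up.
respondent_map_ferc714 = pd.DataFrame(
    {
        "respondent_id_ferc714": [1, 2, 3],
        "respondent_id_ferc714_xbrl": ["C000001", "C000002", None],
    }
)

# Hypothetical table that already has respondent_id_ferc714 assigned,
# but is missing the source ID on some rows.
df = pd.DataFrame(
    {
        "respondent_id_ferc714": [1, 2, 2, 3],
        "respondent_id_ferc714_xbrl": ["C000001", None, "C000002", None],
    }
)

source = "xbrl"
resp_id_col = f"respondent_id_ferc714_{source}"

# Series indexed by the PUDL-derived ID, valued by the source ID.
# Respondents with no source ID in the map are dropped first.
resp_map_series = respondent_map_ferc714.dropna(subset=[resp_id_col]).set_index(
    "respondent_id_ferc714"
)[resp_id_col]

# Only null source IDs are filled; existing values are left alone.
df[resp_id_col] = df[resp_id_col].fillna(
    df["respondent_id_ferc714"].map(resp_map_series)
)
filled = df[resp_id_col].tolist()
```

In this toy case the second row picks up C000002 from the map, while the last row stays null because respondent 3 has no XBRL ID in the map either.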
TODO: rip this out. enforce_schema happens via the io_managers now.
What's stopping us from removing it now?
eia_code and all eia_codes that are actually the respondent_id_ferc714_xbrl
are nulled.
Is it possible that the respondent_id_ferc714_xbrl and eia_code are the same?
mmm looks like no because they start with C!
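A small sketch of the nulling step discussed in this thread, on made-up rows. The definition of code_is_respondent_id_mask is not visible in the diff excerpt, so the equality comparison below is an assumption, as are all the ID values; only the .loc assignment and the column names come from the PR.

```python
import pandas as pd

# Hypothetical XBRL-style rows; every value here is made up.
xbrl = pd.DataFrame(
    {
        "respondent_id_ferc714_xbrl": ["C000001", "C000002", "C000003"],
        "eia_code": ["C000001", "12345", "C000003"],
    }
)

# Assumed mask: the reported code is identical to the respondent's own
# XBRL ID, so it cannot be a real EIA code.
code_is_respondent_id_mask = xbrl["eia_code"] == xbrl["respondent_id_ferc714_xbrl"]

# Sanity check in the spirit of the comment above: in this toy data the
# masked values all start with "C".
masked_start_with_c = bool(
    xbrl.loc[code_is_respondent_id_mask, "eia_code"].str.startswith("C").all()
)

# Same assignment as in the diff.
xbrl.loc[code_is_respondent_id_mask, "eia_code"] = pd.NA
nulled = xbrl["eia_code"].isna().tolist()
```

Checking the prefix before nulling is a cheap way to confirm that the mask only catches respondent IDs and not genuine codes.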
)
xbrl.loc[code_is_respondent_id_mask, "eia_code"] = pd.NA

# lets null out some of the eia_code's from XBRL that we've manually culled
By manually culled do you mean that these IDs are not the actual eia_code?

@staticmethod
def convert_into_static_table_xbrl(xbrl: pd.DataFrame) -> pd.DataFrame:
    """Convert this annually reported table into a skinner, static table.
Do you mean skinnier?
xbrl.groupby(["respondent_id_ferc714_xbrl"])[  # noqa: PD101
    ["respondent_name_ferc714"]
Is it possible there are some different spellings of the respondent name that would cause this to look like a 1:1 ratio when it's not?
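One way to probe the spelling question is to count distinct names per respondent before and after a light normalization. This is only a sketch on made-up names: the helper names_per_respondent, the name_normalized column and the normalization rules are all hypothetical and are not part of the PR.

```python
import pandas as pd

# Hypothetical rows; respondent IDs and names are made up.
xbrl = pd.DataFrame(
    {
        "respondent_id_ferc714_xbrl": ["C000001", "C000001", "C000002", "C000002"],
        "respondent_name_ferc714": [
            "Example Power Co",
            "Example Power Co",
            "Sample Electric, Inc.",
            "SAMPLE ELECTRIC INC",
        ],
    }
)


def names_per_respondent(df: pd.DataFrame, name_col: str) -> pd.Series:
    """Count the distinct values of name_col for each XBRL respondent ID."""
    return df.groupby("respondent_id_ferc714_xbrl")[name_col].nunique()


# Raw names: spelling variants count as separate names.
raw_counts = names_per_respondent(xbrl, "respondent_name_ferc714")

# Assumed normalization: lowercase, drop punctuation, collapse whitespace.
xbrl["name_normalized"] = (
    xbrl["respondent_name_ferc714"]
    .str.lower()
    .str.replace(r"[^a-z0-9 ]", "", regex=True)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)
normalized_counts = names_per_respondent(xbrl, "name_normalized")
```

Respondents whose raw count is above one but whose normalized count is one differ only in spelling; any respondent still above one after normalization would deserve a manual look.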
def condense_into_one_source_table(df):
    """Condense the CSV and XBRL records together into one record.
Not blocking at all, but in the other two tables we just put this code directly into the run function instead of creating its own function wrapper. NBD, but it would be good to be consistent.
OR we could very easily make this a function vs. a class method because I think it's the same for all tables. We could just have a column parameter or something.
Overview
Closes #3839 and #3858
What problem does this address?
What did you change?
There were two main threads that needed pulling to get this table updated:
eia_code's we needed to clean up, but besides that it was static. I then converted the XBRL data into a static table, then I concat-ed the tables and checked the static-ness again.
spot_fix_eia_codes & EIA_CODE_FIXES
Still to-do that I'd rather finish in transform-714-xbrl 3842:
Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list