Map FERC 714 XBRL and CSV IDs #3849

Merged
e-belfer merged 3 commits into transform-714-xbrl from xbrl-csv-714-ids
Sep 17, 2024

Member

@e-belfer e-belfer commented Sep 17, 2024

Overview

Closes #3846.

What problem does this address?
Creates a glue CSV mapping the IDs of FERC 714 respondents in XBRL and CSV data. This builds on the migration mapping noted in #3846.

The migrated data page from FERC notes the following:

Companies that did not have a CID prior to the migration have been assigned a CID that begins with R, i.e., a temporary RID. These RIDs will be replaced in future with the accurate CIDs and new datasets will be published.

However, some of the IDs beginning with C in the migrated data weren't found in the actual XBRL data, while respondents matching the names and locations were found with different respondent IDs. Thus, I manually reviewed the IDs for each respondent, matching based on name. Some quirks to note:

  • Unmatched respondents mostly occur due to mergers, splits, acquisitions, and companies that no longer exist.
  • Respondents are only matched 1:1 from CSV to XBRL data, so many-to-one relationships are left unlinked. For instance, Western Area Power Admin - Upper Missouri-East and Western Area Power Administration - Upper Missouri West in the CSV data merge into Western Area Power Administration - Upper Great Plains Region in the XBRL data, but these records are not linked.
  • Where the migration data assigned an ID beginning with C that doesn't appear in the data, and no updated ID was found, I have left the ID in place in case it reappears in the XBRL data. All temporary IDs (beginning with R) were removed.
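
A reviewer could sanity-check the quirks above with something like the following. This is only a minimal sketch, not code from this PR: the sample rows and their IDs are made up for illustration, `check_glue` is a hypothetical helper, and the column names are taken from the CSV header in the diff. It checks that no non-empty XBRL or CSV ID appears on more than one row (the 1:1 rule) and that no temporary ID beginning with R remains.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; these IDs are invented and are not real respondents.
SAMPLE_GLUE = """\
respondent_id_ferc714,respondent_id_ferc714_xbrl,respondent_id_714_csv,Source,Notes
1,C000001,101,migrated,
2,C000002,102,manual,matched on name
3,,103,,no XBRL match found
4,C000004,,,XBRL only
"""


def check_glue(rows):
    """Return a list of problems found in the ID mapping rows."""
    problems = []
    for col in ("respondent_id_ferc714_xbrl", "respondent_id_714_csv"):
        # Unmatched respondents have an empty ID, so skip blanks.
        ids = [row[col] for row in rows if row[col]]
        # 1:1 matching means no non-empty ID appears on more than one row.
        for dup, count in Counter(ids).items():
            if count > 1:
                problems.append(f"{col}: {dup} appears {count} times")
    for row in rows:
        # Temporary IDs (prefix "R") should have been removed.
        xbrl_id = row["respondent_id_ferc714_xbrl"]
        if xbrl_id.startswith("R"):
            problems.append(f"temporary ID kept: {xbrl_id}")
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_GLUE)))
problems = check_glue(rows)
```

An empty `problems` list means the sample passes both checks. IDs beginning with C that do not appear in the XBRL data are deliberately not flagged, since those were left in place.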

What did you change?

  • Added a new CSV to the pudl.package_data.glue module.

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

@e-belfer e-belfer added the ferc714 label (Anything having to do with FERC Form 714) Sep 17, 2024
@e-belfer e-belfer self-assigned this Sep 17, 2024
@e-belfer e-belfer changed the base branch from main to transform-714-xbrl September 17, 2024 19:27
@e-belfer
Member Author

@aesharpe Think I should probably update docs somewhere along the way about the matching process, but let me know if you have thoughts about where in the docs a cleaned up version of these notes should live?

Member

@cmgosnell cmgosnell left a comment

🎉

@aesharpe
Member

> @aesharpe Think I should probably update docs somewhere along the way about the matching process, but let me know if you have thoughts about where in the docs a cleaned up version of these notes should live?

I think this should live in the "Notable Irregularities" section of the Data Source page for now. We don't have one for 714 yet, but either @cmgosnell or I was planning on adding one. I might make a separate PR branch for this that either of you can add to.

@e-belfer
Member Author

> > @aesharpe Think I should probably update docs somewhere along the way about the matching process, but let me know if you have thoughts about where in the docs a cleaned up version of these notes should live?
>
> I think this should live in the "Notable Irregularities" section of the Data Source page for now. We don't have one for 714 yet, but either @cmgosnell or I was planning on adding one. I might make a separate PR branch for this that either of you can add to.

Great, that makes sense to me! As this is documented in the PR description, I'll go ahead and merge this, and we can pull from it to add to the docs PR.

@e-belfer e-belfer merged commit c1e66af into transform-714-xbrl Sep 17, 2024
16 of 17 checks passed
@e-belfer e-belfer deleted the xbrl-csv-714-ids branch September 17, 2024 19:43
@@ -0,0 +1,218 @@
respondent_id_ferc714,respondent_id_ferc714_xbrl,respondent_id_714_csv,Source,Notes
Member

Is there a reason that respondent_id_714_csv doesn't follow the same naming convention as the other two ID columns?

Member Author

Sheer oversight - sounds like @cmgosnell already fixed this in her branch though.
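
For reference, the rename being discussed could look roughly like this. It is a sketch only: the target name `respondent_id_ferc714_csv` is an assumption based on the pattern of the other two ID columns, not confirmed from @cmgosnell's branch, and `rename_column` plus the sample data row are hypothetical.

```python
import csv
import io


def rename_column(csv_text, old, new):
    """Return csv_text with the header column `old` renamed to `new`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [new if name == old else name for name in next(reader)]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    # Data rows are copied through unchanged.
    writer.writerows(reader)
    return out.getvalue()


# Header copied from the diff; the data row is made up for illustration.
original = (
    "respondent_id_ferc714,respondent_id_ferc714_xbrl,respondent_id_714_csv,Source,Notes\n"
    "1,C000001,101,migrated,\n"
)
renamed = rename_column(original, "respondent_id_714_csv", "respondent_id_ferc714_csv")
```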

@aesharpe
Member

@e-belfer you can add any future 714 docs changes to this PR #3850

Labels
ferc714 Anything having to do with FERC Form 714
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Synchronize FERC 714 XBRL and CSV IDs and combine into one table
4 participants