Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Make fewer copies when building a string array #5503

Merged
merged 2 commits into from
Aug 13, 2024

Conversation

mhilton
Copy link
Contributor

@mhilton mhilton commented Aug 13, 2024

Change the StringBuilder to only create an arrow builder when it becomes necessary. Whist the StringBuilder has only a single distinct value it will just record how long the array is. Creating an arrow.StringBuilder when a second distinct value is added.

Checklist

Dear Author 👋, the following checks should be completed (or explicitly dismissed) before merging.

  • ✏️ Write a PR description, regardless of triviality, to include the value of this PR
  • 🔗 Reference related issues
  • 🏃 Test cases are included to exercise the new code
  • 🧪 If new packages are being introduced to stdlib, link to Working Group discussion notes and ensure it lands under experimental/
  • 📖 If language features are changing, ensure docs/Spec.md has been updated

Dear Reviewer(s) 👋, you are responsible (among others) for ensuring the completeness and quality of the above before approval.

"In the last few months a performance degradation has crept in when
processing flux. One possible cause was the changes that were made
to the way string arrays were built, which always built a full array
even if it would ultimately not be needed. This change ensures that
a full arrow array is only created when necessary.
@mhilton mhilton requested a review from a team as a code owner August 13, 2024 06:41
@mhilton mhilton requested a review from btasker August 13, 2024 06:42
array/builder.go Outdated
return
}
b.makeBuilder(buf)

}

// Append appends a string to the array being built. The input string
// will always be copied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point: the second half of this comment was true prior to this change, but now isn't

@mhilton mhilton merged commit c2433e6 into master Aug 13, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants