Add benchmark report #463

Merged
merged 2 commits into from
Jul 2, 2023


artmoskvin
Contributor

@artmoskvin artmoskvin commented Jul 1, 2023

This PR addresses #240 by adding a benchmark report after the evaluation step runs. The report can also be inserted into the benchmark/RESULTS.md file if needed. The insertion is a bit hacky for now because it relies on assumptions about the structure of the results file. I believe there's a better way of doing it, so any feedback is appreciated.
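
To make the fragility concrete, here is a minimal sketch of heading-based insertion into a Markdown results file. It is not the code from this PR: the function name `insert_report`, the `## Results` heading, and the file layout are all assumptions made for illustration.

```python
def insert_report(results_md: str, report: str, heading: str = "## Results") -> str:
    """Insert `report` directly below the first line equal to `heading`.

    Hypothetical helper: the heading text and the layout of the results
    file are assumptions, not taken from the PR.
    """
    lines = results_md.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == heading:
            # Put a blank line and the report right after the heading,
            # keeping everything that followed the heading unchanged.
            new_lines = lines[: i + 1] + ["", report] + lines[i + 1 :]
            return "\n".join(new_lines) + "\n"
    # Heading not found: append a new section rather than guess a position.
    return results_md.rstrip("\n") + f"\n\n{heading}\n\n{report}\n"
```

The approach depends on the heading being present exactly once and spelled exactly as expected, which is the kind of structural assumption described above. Appending a new section when the heading is missing is one possible way to fail softly.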

On a more general note, I was a bit confused by the difference between the Works and Perfect criteria. The current evaluation logic leaves Works empty if a user replies 'yes' to Perfect, which is a bit misleading. I think that if the code works, it works, and ideally we would measure that with tests, which is probably part of the roadmap. Anyway, please take a look and let me know what y'all think 🙏

@AntonOsika
Collaborator

Thanks for the PR!

Suggested a small clarification to the print statement.

I agree on the Works definition: it should be set to true if Perfect is true.

I was unsure whether to populate it when the user didn't explicitly set it, but now I think it should be.
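
A minimal sketch of that rule, assuming each criterion is stored as `True`, `False`, or `None` (unanswered). The function name `normalize_criteria` and this representation are hypothetical and not taken from scripts/benchmark.py.

```python
from typing import Optional, Tuple


def normalize_criteria(
    perfect: Optional[bool], works: Optional[bool]
) -> Tuple[Optional[bool], Optional[bool]]:
    """Return (perfect, works) with Works implied by Perfect.

    Hypothetical helper: None stands for a question the user did not answer.
    """
    if perfect:
        # Perfect code necessarily works, so fill in (or override) Works
        # even if the user never answered that question.
        works = True
    return perfect, works
```

With this rule, `normalize_criteria(True, None)` yields `(True, True)`, while a non-perfect answer leaves Works exactly as the user set it.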

@AntonOsika AntonOsika merged commit d4e9ba9 into gpt-engineer-org:main Jul 2, 2023
3 of 4 checks passed
@artmoskvin artmoskvin deleted the aggregate-feedback branch July 2, 2023 21:21
70ziko pushed a commit to 70ziko/gpt-engineer that referenced this pull request Oct 25, 2023
* Add benchmark report

* Update scripts/benchmark.py

---------

Co-authored-by: Artem Moskvin <artemm@spotify.com>
Co-authored-by: Anton Osika <anton.osika@gmail.com>
@captivus
Collaborator

This series of comments smells spammy. @MrDaVinC please respond to this comment to substantiate your comments.

User blocked.
