Add benchmark report #463

artmoskvin · 2023-07-01T16:12:06Z

This PR addresses #240 by adding a benchmark report that is generated after the evaluation step. The report can also be inserted into the benchmark/RESULTS.md file if needed. The insertion is a bit hacky right now because it relies on assumptions about the structure of the results file. I believe there's a better way to do it, so any feedback is appreciated.

On a more general note, I was a bit confused by the difference between the Works and Perfect criteria. The current evaluation logic leaves Works empty if a user replies 'yes' to Perfect, which is a bit misleading. I think if the code works, it works, and ideally we would measure that using tests, which is probably part of the roadmap. Anyway, please take a look and let me know what y'all think 🙏

scripts/benchmark.py

AntonOsika · 2023-07-02T13:14:29Z

Thanks for the PR!

Suggested a small clarification to the print statement.

I agree on the Works definition: it should be set to true if Perfect is true.

I was unsure whether to populate it when the user didn't explicitly set it, but now I think we should.
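A minimal sketch of that rule, assuming the reviewer's answers are stored as optional booleans. The function `normalize_feedback` and the `perfect`/`works` field names are hypothetical and not taken from scripts/benchmark.py:

```python
from typing import Optional


def normalize_feedback(perfect: Optional[bool], works: Optional[bool]) -> dict:
    """Return the review fields, treating a 'yes' to Perfect as implying Works.

    Hypothetical helper for illustration only; None means the question
    was not answered.
    """
    if perfect:
        # Perfect code works by definition, so fill in Works
        # even if the user was never asked about it.
        works = True
    return {"perfect": perfect, "works": works}
```

With this, a 'yes' to Perfect no longer leaves Works empty in the report, while an unanswered Works stays empty when Perfect is 'no'.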

* Add benchmark report
* Update scripts/benchmark.py

---------

Co-authored-by: Artem Moskvin <artemm@spotify.com>
Co-authored-by: Anton Osika <anton.osika@gmail.com>

captivus · 2024-01-25T18:37:42Z

This series of comments smells spammy. @MrDaVinC please respond to this comment to substantiate your comments.

captivus · 2024-01-25T18:53:13Z

This series of comments smells spammy. @MrDaVinC please respond to this comment to substantiate your comments.

User blocked.

Add benchmark report

18e7e28

AntonOsika reviewed Jul 2, 2023


Update scripts/benchmark.py

499b8ab

AntonOsika merged commit d4e9ba9 into gpt-engineer-org:main Jul 2, 2023
3 of 4 checks passed

artmoskvin deleted the aggregate-feedback branch July 2, 2023 21:21

AntonOsika added the benchmark label Apr 14, 2024
