Git diff-style approach for improve function #1005

Merged
ATheorell merged 69 commits into main from diff-syntax-for-improve-command
Feb 18, 2024

similato87
Collaborator

Git diff-style approach for improve function

Overview

This pull request reworks the improve function in the gpt-engineer project so that chats from LLMs are parsed with a Git diff-style approach. It adds a new preprompt file tailored to the improve function, introduces a new "Diff.py" file that emulates Git diffs for validation and correction, updates chat_to_files.py and steps.py in the core directory, and adds an extensive set of test files and scenarios to the test suite.

Example of git diff

Example of git diff on an existing file:

```diff
--- example.txt
+++ example.txt
@@ -6,3 +6,4 @@
     line content A
     line content B
+    new line added
-    original line X
+    modified line X with changes
@@ -26,4 +27,5 @@
         condition check:
-            action for condition A
+            if certain condition is met:
+                alternative action for condition A
```

Motivation

The existing approach, largely inspired by aider, involves using natural language prompts with the --improve flag. This method, which requires the LLM to provide improvements in edit blocks, has been somewhat error-prone, as evidenced in issues #721, #814, and #841. The primary challenge has been ensuring the HEAD part of the edit block matches the code, or effectively employing intelligent heuristics when no exact match is found.

In response, we're proposing an alternative: adopting the classic diff syntax for edits. This method uniquely identifies the edit location using the file name and line number, facilitating precise modifications. To implement this, we've added line numbers in the Code class and updated the to_chat method to mirror the diff syntax format.
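
As a rough illustration of the line-numbering idea, the sketch below prefixes each line of a file with its 1-based number before the file is shown to the LLM. The function name `with_line_numbers` and the "number, space, content" layout are assumptions made for this example; they are not taken from the Code class or the to_chat method in this PR.

```python
def with_line_numbers(text: str) -> str:
    """Prefix every line of `text` with its 1-based line number.

    Hypothetical helper: the name and output layout are assumptions.
    """
    return "\n".join(
        f"{number} {line}"
        for number, line in enumerate(text.splitlines(), start=1)
    )


if __name__ == "__main__":
    source = "def greet():\n    print('hello')"
    print(with_line_numbers(source))
    # 1 def greet():
    # 2     print('hello')
```

With numbers visible in the prompt, the model can refer to a position such as line 2 of a named file instead of quoting the surrounding code verbatim.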

Expectation

We expect the Git diff-style parser to significantly reduce parsing issues. Validation and correction in the parsing step fix common LLM errors, so that only valid modifications are applied to the user's files. A shorter preprompt combined with robust validation should also reduce token consumption and improve performance, particularly when refining large, complex code files.
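
One way to picture the validation and correction step is the minimal sketch below. It assumes a hunk is held as a list of `(kind, text)` pairs and that a wrong start line is corrected by searching the file for the nearest position where the hunk's context and removed lines match, ignoring surrounding whitespace. The names `old_side`, `matches_at`, and `corrected_start` and this particular heuristic are assumptions for illustration, not the implementation in this PR.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one hunk line: (kind, text),
# where kind is " " (context), "-" (removed) or "+" (added).
HunkLine = Tuple[str, str]


def old_side(hunk: List[HunkLine]) -> List[str]:
    """Lines the hunk expects to find in the existing file."""
    return [text for kind, text in hunk if kind in (" ", "-")]


def matches_at(file_lines: List[str], expected: List[str], index: int) -> bool:
    """True if `expected` matches `file_lines` from 0-based `index`,
    ignoring leading and trailing whitespace on each line."""
    if index < 0 or index + len(expected) > len(file_lines):
        return False
    return all(
        file_lines[index + offset].strip() == line.strip()
        for offset, line in enumerate(expected)
    )


def corrected_start(
    file_lines: List[str], hunk: List[HunkLine], claimed_start: int
) -> Optional[int]:
    """Return the 1-based line where the hunk really applies.

    Picks the matching position nearest to `claimed_start`;
    returns None when the hunk matches nowhere and should be rejected.
    """
    expected = old_side(hunk)
    candidates = [
        i
        for i in range(len(file_lines) - len(expected) + 1)
        if matches_at(file_lines, expected, i)
    ]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda i: abs(i - (claimed_start - 1)))
    return nearest + 1


if __name__ == "__main__":
    file_lines = ["line content A", "line content B", "original line X"]
    hunk = [
        (" ", "line content A"),
        (" ", "line content B"),
        ("+", "new line added"),
        ("-", "original line X"),
        ("+", "modified line X with changes"),
    ]
    # The LLM claimed line 6, but the content actually sits at line 1.
    print(corrected_start(file_lines, hunk, claimed_start=6))  # prints 1
```

Returning `None` for an unmatched hunk keeps the "only valid modifications are applied" property: the caller can skip or report the hunk instead of guessing.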

similato87 and others added 30 commits January 14, 2024 00:36
@similato87 similato87 self-assigned this Feb 11, 2024
@viborc viborc mentioned this pull request Feb 15, 2024
@similato87
Collaborator Author

@AntonOsika, @ATheorell, @captivus, @TheoMcCabe, @viborc

I am reaching out to request your thorough review and feedback on PR #1005 for the gpt-engineer project. This pull request represents a significant advancement in our code improvement process, implementing a Git diff-style approach to enhance the improve function. Many thanks to @ATheorell for invaluable guidance and assistance throughout the substantial changes made in this PR.

Key Changes:

  1. Update of preprompt File:

    • Introduction of a new preprompt file to provide clear guidance and context for code improvements.
    • The preprompt file outlines objectives, requirements, and expected outcomes to ensure alignment with project goals.
    • The new preprompt file is expected to substantially improve how reliably code changes are transferred from the LLM to the user's codebase.
  2. Implementation of Diff.py:

    • The new "Diff.py" file emulates Git diffs, giving code changes a structured and standardized presentation.
    • It makes changes more transparent and easier to review by using a familiar format.
    • It includes a validation and correction function that checks change instructions from LLMs and fixes common errors, such as non-matching lines or incorrect change types.
  3. Updates to Core Files:

    • Updates to core files reflect a thorough integration of the Git diff-style approach across various aspects of the codebase.
  4. Inclusion of Robust Test Suite:

    • A comprehensive test suite verifies the functionality and reliability of the introduced changes.
    • Our exhaustive tests encompass a wide range of tasks, including improving long files, enhancing non-Python files, correcting error instructions from LLMs, and beyond.
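
To make the diff-parsing side of these changes more concrete, here is a small sketch of how a unified-diff hunk header such as `@@ -6,3 +6,4 @@` could be parsed and cross-checked against the hunk body. The helper names `parse_hunk_header` and `recount` are hypothetical and do not come from "Diff.py"; the only fact relied on is the standard unified diff convention that an omitted count means 1.

```python
import re
from typing import List, Tuple

# Matches headers like "@@ -6,3 +6,4 @@"; the counts are optional.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> Tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count)."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    # In unified diffs an omitted count defaults to 1.
    return (
        int(old_start),
        int(old_count or 1),
        int(new_start),
        int(new_count or 1),
    )


def recount(body: List[str]) -> Tuple[int, int]:
    """Recompute (old_count, new_count) from the hunk body.

    Context lines (" ") count on both sides, removed lines ("-")
    only on the old side, added lines ("+") only on the new side.
    """
    old_count = sum(1 for line in body if line[:1] in (" ", "-"))
    new_count = sum(1 for line in body if line[:1] in (" ", "+"))
    return old_count, new_count


if __name__ == "__main__":
    body = [
        "     line content A",
        "     line content B",
        "+    new line added",
        "-    original line X",
        "+    modified line X with changes",
    ]
    print(parse_hunk_header("@@ -6,3 +6,4 @@"))  # (6, 3, 6, 4)
    print(recount(body))  # (3, 4)
```

Comparing the recomputed counts with the header is a cheap check: when they disagree, the header written by the LLM can be replaced with the counts derived from the body.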

Example

Enhanced Efficiency for Bulk Changes

This PR makes it possible to request improvements from LLMs far more reliably. Previously, running the improve function often raised various exceptions; the feature is now considerably more robust. One of the most significant improvements is the ability to make many changes to complex non-Python files in a single session.

Visual Demonstrations:

To provide clarity on these enhancements, here are some visual representations:

  1. Original Files in PS1 (Over 600 lines):

    Original File

  2. LLMs Chat (More than 30 parts):

    Chat from LLMs

  3. Log for Successful Parsing Changes:

    Log for Successful Parsing Changes

Request for Review:

Your insights and expertise are invaluable in ensuring the effectiveness and quality of these enhancements. Please review this PR and share your feedback to help identify any areas for improvement or optimization.

Thank you for your time and attention to this matter.

Best regards,
Talion

@similato87 similato87 marked this pull request as ready for review February 17, 2024 20:59
@ATheorell ATheorell merged commit a40165e into main Feb 18, 2024
3 of 4 checks passed
@similato87 similato87 deleted the diff-syntax-for-improve-command branch February 25, 2024 00:50