feat: repo map #496

Draft
wants to merge 4 commits into
base: main

Conversation

yetone
Owner

@yetone yetone commented Sep 3, 2024

lua/avante/utils/init.lua Outdated Show resolved Hide resolved
lua/avante/utils/init.lua Outdated Show resolved Hide resolved
lua/avante/api.lua Outdated Show resolved Hide resolved
@yetone yetone force-pushed the feat/repo-map branch 5 times, most recently from 181bddb to 866f6f3 Compare September 4, 2024 07:37
lua/avante/init.lua Outdated Show resolved Hide resolved
@yetone
Owner Author

yetone commented Sep 4, 2024

This PR is still in rapid iteration and is not yet ready for review.

@yetone yetone force-pushed the feat/repo-map branch 11 times, most recently from 7644a37 to 910c397 Compare September 4, 2024 17:22
@jmmarotta

You could probably use the aider queries for treesitter.

@aarnphm
Collaborator

aarnphm commented Sep 5, 2024

we are iterating a lot atm, and will consider refactoring the queries later.

@yetone
Owner Author

yetone commented Sep 8, 2024

You could probably use the aider queries for treesitter.

aider's queries return all class/function definitions in the file, but I only need the externally exposed classes/functions (similar to a C header file), so their code doesn't suit my requirements.
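To make the "C header file" idea concrete, here is a minimal sketch of extracting only the externally visible definitions from a file. It is an illustration only: it uses Python's standard `ast` module as a stand-in for the tree-sitter queries in this PR, and the function name `public_api` and the "leading underscore means private" rule are assumptions of the sketch, not anything taken from the PR's code.

```python
import ast


def public_api(source: str) -> list[str]:
    """Return header-like signatures for top-level public definitions.

    Hypothetical helper: names starting with "_" are treated as private,
    and nested functions are ignored because only tree.body is visited.
    """
    tree = ast.parse(source)
    lines = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name.startswith("_"):
                continue
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            if node.name.startswith("_"):
                continue
            lines.append(f"class {node.name}")
            # Keep only the public methods of a public class.
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and not item.name.startswith("_"):
                    args = ", ".join(a.arg for a in item.args.args)
                    lines.append(f"    def {item.name}({args})")
    return lines
```

A query that returns every definition would also report nested helpers and private functions; this filter drops them, which is the difference described above.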

@yetone yetone force-pushed the feat/repo-map branch 3 times, most recently from f347f5a to a431056 Compare September 8, 2024 11:05
@yetone
Owner Author

yetone commented Sep 14, 2024

Here's an update for everyone: Actually, this PR was completed a long time ago. The functionality tests are all normal and the results are very good, but it consumes a large number of tokens (to test this PR, my Anthropic quota has already been exhausted 😢). We are currently considering how to optimize this aspect.

@SchnozzleCat

Here's an update for everyone: Actually, this PR was completed a long time ago. The functionality tests are all normal and the results are very good, but it consumes a large number of tokens (to test this PR, my Anthropic quota has already been exhausted 😢). We are currently considering how to optimize this aspect.

Is the consumed token amount similar to what is consumed when using something like Aider (and adding all files in a project)?

@jmmarotta

jmmarotta commented Sep 17, 2024

Is the consumed token amount similar to what is consumed when using something like Aider (and adding all files in a project)?

The core issue seems to be that the current implementation is using the entire repomap instead of focusing on the most relevant tags. A potential solution would involve finding a way to rank the tags produced by tree-sitter.

In Python ecosystems, libraries like NetworkX (used in aider) can represent these tags as nodes in a graph and then apply algorithms like PageRank to rank them effectively.
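As a rough sketch of that ranking idea, the snippet below runs a plain power-iteration PageRank over a reference graph without any third-party library. The edge format (`(referencing_file, defining_file)`), the function name `pagerank`, and the default damping and iteration count are assumptions made for illustration; this is not aider's implementation nor the code in this PR.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given as (source, target) pairs.

    Hypothetical input: an edge (a, b) means "file a references a tag
    defined in file b", so heavily referenced files score higher.
    """
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node with no outgoing edges spreads its rank evenly.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank
```

Sorting the result by score and keeping only the top entries would be one way to send a subset of the repo map instead of all of it.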

@aarnphm
Collaborator

aarnphm commented Sep 17, 2024

afaik nvim has bindings to python, but they are kinda slow.

Maybe we can do reranking through colbert from rust/c++ or sth.

@jmmarotta

Pagerank might work better than colbert as it's well-suited for treesitter's tag structure, but using rust/c++ is a great idea.

petgraph might work well

@aarnphm
Collaborator

aarnphm commented Sep 17, 2024

petgraph might work well

Thanks for sharing, will take a look at this.

@yetone
Owner Author

yetone commented Sep 18, 2024

Is the consumed token amount similar to what is consumed when using something like Aider (and adding all files in a project)?

The core issue seems to be that the current implementation is using the entire repomap instead of focusing on the most relevant tags. A potential solution would involve finding a way to rank the tags produced by tree-sitter.

In Python ecosystems, libraries like NetworkX (used in aider) can represent these tags as nodes in a graph and then apply algorithms like PageRank to rank them effectively.

In my local tests, aider's repo map tends to be inaccurate, bringing in unrelated files while ignoring related ones.

@jmmarotta

jmmarotta commented Sep 18, 2024

In my local tests, aider's repo map tends to be inaccurate, bringing in unrelated files while ignoring related ones.

PageRank likely overweights common utilities. For a more localized context BFS might be best.
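A minimal sketch of the BFS alternative, assuming the same kind of reference graph: start from the file being edited and collect only files within a few hops. The function name `nearby_files`, the choice to treat references as undirected, and the `max_depth` default are all assumptions for illustration.

```python
from collections import deque


def nearby_files(edges, start, max_depth=2):
    """Return {file: hop distance} for files within max_depth of start.

    Hypothetical input: edges are (file_a, file_b) reference pairs,
    followed in both directions so callers and callees both count.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # do not expand past the depth limit
        for nxt in sorted(neighbors.get(current, ())):
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                queue.append(nxt)
    return depth
```

Unlike a global score, the result depends on the starting file, so a widely used utility module only shows up when it is actually close to the code being edited.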

4 participants