Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: add **kwargs to index and aindex functions for custom vector_field support #26998

Merged
merged 14 commits into from
Oct 7, 2024

Conversation

Jalmeida1994
Copy link
Contributor

@Jalmeida1994 Jalmeida1994 commented Sep 30, 2024

Added **kwargs parameters to the index and aindex functions in libs/core/langchain_core/indexing/api.py. This allows users to pass additional arguments to the add_documents and aadd_documents methods, enabling the specification of a custom vector_field. For example, users can now use vector_field="embedding" when indexing documents in OpenSearchVectorStore

Copy link

vercel bot commented Sep 30, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Oct 7, 2024 6:37pm

@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. Ɑ: core Related to langchain-core labels Sep 30, 2024
@Jalmeida1994 Jalmeida1994 changed the title community: add **kwargs to index and aindex functions for custom vector_field support core: add **kwargs to index and aindex functions for custom vector_field support Sep 30, 2024
@eyurtsev eyurtsev added the needs test PR needs to be updated with tests label Sep 30, 2024
@eyurtsev eyurtsev self-assigned this Sep 30, 2024
Copy link
Collaborator

@eyurtsev eyurtsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to explain how to change the variable name.

Will need a unit test as well before this can be merged

@@ -198,6 +198,7 @@ def index(
source_id_key: Union[str, Callable[[Document], str], None] = None,
cleanup_batch_size: int = 1_000,
force_update: bool = False,
**kwargs: Any,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use ** here to avoid confusion of which namespace kwargs belong (e.g., what if we need kwargs for delete at some point or else add other parameters to index in the future).

Instead can call is upsert_kwargs. Please document which API whey will be passed to in the doc-string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback. I've implemented the changes as suggested:

  1. Replaced **kwargs with a specific upsert_kwargs: Optional[dict[str, Any]] = None parameter.
  2. Updated the docstring to clarify that these kwargs will be passed to the add_documents and aadd_documents method of the VectorStore.
  3. Added an example in the docstring to demonstrate how upsert_kwargs can be used, specifically mentioning the vector_field use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added some unit tests for index and aindex with upsert_kwargs.

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Oct 1, 2024
docs_to_index,
ids=uids,
batch_size=batch_size,
**(upsert_kwargs or {}),
)
elif isinstance(destination, DocumentIndex):
destination.upsert(docs_to_index)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They also need to be added here -- I'll push a change in a bit

"""Test async indexing with upsert_kwargs parameter."""
mock_aadd_documents = AsyncMock()

with patch.object(upserting_vector_store, "aadd_documents", mock_aadd_documents):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find usage of spy here to be a bit easier on readability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback!

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Oct 7, 2024
@eyurtsev eyurtsev merged commit 780ce00 into langchain-ai:master Oct 7, 2024
85 checks passed
@Jalmeida1994 Jalmeida1994 deleted the add-kwargs-to-index-api branch October 8, 2024 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ɑ: core Related to langchain-core lgtm PR looks good. Use to confirm that a PR is ready for merging. needs test PR needs to be updated with tests size:L This PR changes 100-499 lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants