core: add **kwargs to index and aindex functions for custom vector_field support #26998
Added a comment to explain how to change the variable name.
This will need a unit test as well before it can be merged.
@@ -198,6 +198,7 @@ def index(
    source_id_key: Union[str, Callable[[Document], str], None] = None,
    cleanup_batch_size: int = 1_000,
    force_update: bool = False,
    **kwargs: Any,
Don't use **kwargs here, to avoid confusion about which namespace the kwargs belong to (e.g., what if we need kwargs for delete at some point, or add other parameters to index in the future).
Instead, you can call it upsert_kwargs. Please document in the docstring which API they will be passed to.
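To illustrate the concern above, here is a minimal sketch in which toy functions stand in for a vector store's add_documents and delete; delete_kwargs is purely hypothetical and not part of this PR. With one flat **kwargs, every extra key is forwarded to every callee, whereas one dict per callee keeps the routing explicit:

```python
from typing import Any, Optional

# Records (callee name, kwargs received) so the routing can be inspected.
calls: list[tuple[str, dict[str, Any]]] = []


def add_documents(docs: list[str], **kwargs: Any) -> None:
    calls.append(("add_documents", kwargs))


def delete(ids: list[str], **kwargs: Any) -> None:
    calls.append(("delete", kwargs))


def index_flat(docs: list[str], **kwargs: Any) -> None:
    # One shared namespace: the same extras reach both callees.
    add_documents(docs, **kwargs)
    delete(["stale-id"], **kwargs)


def index_namespaced(
    docs: list[str],
    upsert_kwargs: Optional[dict[str, Any]] = None,
    delete_kwargs: Optional[dict[str, Any]] = None,  # hypothetical, for illustration
) -> None:
    # Each dict is forwarded only to the callee it is named after.
    add_documents(docs, **(upsert_kwargs or {}))
    delete(["stale-id"], **(delete_kwargs or {}))


index_flat(["a"], vector_field="embedding")
print(calls)
# [('add_documents', {'vector_field': 'embedding'}), ('delete', {'vector_field': 'embedding'})]
calls.clear()
index_namespaced(["a"], upsert_kwargs={"vector_field": "embedding"})
print(calls)
# [('add_documents', {'vector_field': 'embedding'}), ('delete', {})]
```

In the flat version, an option meant only for the upsert also reaches delete, which may reject it or silently misinterpret it.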
Thank you for the feedback. I've implemented the changes as suggested:
- Replaced **kwargs with a specific upsert_kwargs: Optional[dict[str, Any]] = None parameter.
- Updated the docstring to clarify that these kwargs will be passed to the add_documents and aadd_documents methods of the VectorStore.
- Added an example in the docstring to demonstrate how upsert_kwargs can be used, specifically mentioning the vector_field use case.
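The signature change described above can be sketched as follows. This is a simplified illustration rather than the PR's actual code: RecordingVectorStore and index_sketch are hypothetical names, the generated ids are placeholders, and the record-manager bookkeeping of the real index is omitted. It shows an Optional dict defaulting to None that is unpacked into add_documents alongside the ids and batch_size arguments index already passes:

```python
from typing import Any, Optional


class RecordingVectorStore:
    """Hypothetical stand-in for a VectorStore; it records what it receives."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def add_documents(self, documents: list[str], **kwargs: Any) -> list[str]:
        # Remember the documents together with every keyword argument.
        self.calls.append({"documents": list(documents), **kwargs})
        return list(kwargs.get("ids") or [])


def index_sketch(
    docs: list[str],
    destination: RecordingVectorStore,
    *,
    batch_size: int = 100,
    upsert_kwargs: Optional[dict[str, Any]] = None,
) -> None:
    """Index docs into destination in batches.

    Args:
        upsert_kwargs: Extra keyword arguments forwarded unchanged to
            destination.add_documents, e.g. {"vector_field": "embedding"}.
    """
    for start in range(0, len(docs), batch_size):
        batch = docs[start : start + batch_size]
        ids = [f"id-{start + i}" for i in range(len(batch))]  # placeholder ids
        destination.add_documents(
            batch,
            ids=ids,
            batch_size=batch_size,
            # None means "no extras"; unpacking an empty dict is a no-op.
            **(upsert_kwargs or {}),
        )


store = RecordingVectorStore()
index_sketch(["a", "b"], store, upsert_kwargs={"vector_field": "embedding"})
print(store.calls[0]["vector_field"])  # embedding
```

One caveat of this design: putting ids or batch_size into upsert_kwargs would raise TypeError (multiple values for the same keyword argument), because the function already passes them explicitly. The sketch does not guard against that.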
Also added some unit tests for index and aindex with upsert_kwargs.
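Here is a sketch of what such tests could check, using standard-library mocks. It exercises simplified, hypothetical index_sketch/aindex_sketch helpers rather than the real index and aindex, which additionally take a record manager. The tests assert that upsert_kwargs reaches add_documents or aadd_documents, and that omitting it adds no extra arguments:

```python
import asyncio
from typing import Any, Optional
from unittest.mock import AsyncMock, MagicMock


def index_sketch(docs: list[str], store: Any, upsert_kwargs: Optional[dict[str, Any]] = None) -> None:
    # Hypothetical, simplified index(): one add_documents call per run.
    ids = [str(i) for i in range(len(docs))]
    store.add_documents(docs, ids=ids, **(upsert_kwargs or {}))


async def aindex_sketch(docs: list[str], store: Any, upsert_kwargs: Optional[dict[str, Any]] = None) -> None:
    # Async counterpart that awaits aadd_documents instead.
    ids = [str(i) for i in range(len(docs))]
    await store.aadd_documents(docs, ids=ids, **(upsert_kwargs or {}))


def test_index_forwards_upsert_kwargs() -> None:
    store = MagicMock()
    index_sketch(["doc"], store, upsert_kwargs={"vector_field": "embedding"})
    store.add_documents.assert_called_once_with(["doc"], ids=["0"], vector_field="embedding")


def test_index_without_upsert_kwargs() -> None:
    store = MagicMock()
    index_sketch(["doc"], store)
    # The None default must not leak through as an extra keyword argument.
    store.add_documents.assert_called_once_with(["doc"], ids=["0"])


def test_aindex_forwards_upsert_kwargs() -> None:
    store = MagicMock()
    # A plain MagicMock attribute is not awaitable, so use AsyncMock explicitly.
    store.aadd_documents = AsyncMock()
    asyncio.run(aindex_sketch(["doc"], store, upsert_kwargs={"vector_field": "embedding"}))
    store.aadd_documents.assert_awaited_once_with(["doc"], ids=["0"], vector_field="embedding")
```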
        docs_to_index,
        ids=uids,
        batch_size=batch_size,
        **(upsert_kwargs or {}),
    )
elif isinstance(destination, DocumentIndex):
    destination.upsert(docs_to_index)
They also need to be passed on this branch (to destination.upsert); I'll push a change in a bit.
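Here is one way the DocumentIndex branch could receive the same extras, sketched with toy classes. ToyVectorStore, ToyDocumentIndex and write_batch are hypothetical, and the reviewer's actual follow-up commit is not shown here. The sketch assumes that upsert accepts arbitrary keyword arguments after a positional-only items parameter. It keeps the existing call shapes, with ids and batch_size going only to add_documents, and forwards upsert_kwargs on both branches:

```python
from typing import Any, Optional, Sequence, Union


class ToyVectorStore:
    """Hypothetical VectorStore stand-in that records its kwargs."""

    def __init__(self) -> None:
        self.received: Optional[dict[str, Any]] = None

    def add_documents(self, documents: Sequence[str], **kwargs: Any) -> list[str]:
        self.received = kwargs
        return list(kwargs.get("ids") or [])


class ToyDocumentIndex:
    """Hypothetical DocumentIndex stand-in; upsert signature is assumed."""

    def __init__(self) -> None:
        self.received: Optional[dict[str, Any]] = None

    def upsert(self, items: Sequence[str], /, **kwargs: Any) -> dict[str, list[str]]:
        self.received = kwargs
        return {"succeeded": list(items), "failed": []}


def write_batch(
    destination: Union[ToyVectorStore, ToyDocumentIndex],
    docs_to_index: Sequence[str],
    uids: Sequence[str],
    batch_size: int,
    upsert_kwargs: Optional[dict[str, Any]] = None,
) -> None:
    extra = upsert_kwargs or {}
    if isinstance(destination, ToyVectorStore):
        destination.add_documents(docs_to_index, ids=uids, batch_size=batch_size, **extra)
    elif isinstance(destination, ToyDocumentIndex):
        # Forward the same extras so behaviour does not depend on destination type.
        destination.upsert(docs_to_index, **extra)
    else:
        raise TypeError(f"Unsupported destination: {type(destination).__name__}")
```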
    """Test async indexing with upsert_kwargs parameter."""
    mock_aadd_documents = AsyncMock()

    with patch.object(upserting_vector_store, "aadd_documents", mock_aadd_documents):
I find using a spy here a bit easier to read.
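The comment may refer to pytest-mock's mocker.spy, though that is an assumption. Under that reading, a standard-library approximation is patch.object(..., wraps=...): the real method still runs, and the mock records the call so it can be asserted on. InMemoryStore is a hypothetical stand-in for upserting_vector_store, and the sketch is synchronous for brevity:

```python
from typing import Any
from unittest.mock import patch


class InMemoryStore:
    """Hypothetical store whose add_documents really stores documents."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def add_documents(self, documents: list[str], *, ids: list[str], **kwargs: Any) -> list[str]:
        self.docs.update(zip(ids, documents))
        return ids


def run_spy_check() -> InMemoryStore:
    store = InMemoryStore()
    # wraps= delegates to the real method while recording calls, unlike a bare mock.
    with patch.object(store, "add_documents", wraps=store.add_documents) as spy:
        result = store.add_documents(["hello"], ids=["1"], vector_field="embedding")
        spy.assert_called_once_with(["hello"], ids=["1"], vector_field="embedding")
        assert result == ["1"]  # the real return value passes through
    return store


store = run_spy_check()
print(store.docs)  # {'1': 'hello'}
```

Compared with swapping in an AsyncMock, a spy keeps the store's real side effects. The test can therefore assert on both the forwarded kwargs and the resulting state.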
Thanks for the feedback!
Added a **kwargs parameter to the index and aindex functions in libs/core/langchain_core/indexing/api.py. This allows users to pass additional arguments to the add_documents and aadd_documents methods, enabling the specification of a custom vector_field. For example, users can now use vector_field="embedding" when indexing documents in OpenSearchVectorStore.