
Update documentation for Hunflair2 release #3410

Merged: 24 commits, Apr 5, 2024
Conversation

mariosaenger (Collaborator):

This PR updates the documentation for HunFlair, lifting it to the new release. More specifically, the PR contains:

  • Documentation pages highlighting the new features and usage of HunFlair2
  • Minor adaptations to smooth the user experience
  • Integration of warning messages when HunFlair (version 1) models are used
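The last bullet describes a deprecation-style warning for version 1 models. A minimal sketch of how such a check might look; the function name, message text, and the exact set of v1 model names are illustrative assumptions, not the PR's actual code:

```python
import logging

logger = logging.getLogger("flair")

def warn_if_hunflair_v1(model_name: str) -> bool:
    """Log a hint when a HunFlair (version 1) model is requested.

    Illustrative sketch only: the real PR wires a similar warning into
    flair's model-loading code.
    """
    # Match v1 names exactly so that HunFlair2 models are not flagged
    # (see the review discussion below about prefix matching).
    if model_name == "hunflair" or model_name == "bioner":
        logger.warning(
            "You are loading a HunFlair (version 1) model. "
            "Consider switching to the improved HunFlair2 models."
        )
        return True
    return False
```

Returning a boolean alongside the log message makes the behavior easy to test without capturing log output.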

@@ -260,6 +260,14 @@ def _fetch_model(model_name) -> str:

cache_dir = Path("models")
if model_name in model_map:
if model_name.startswith("hunflair") or model_name == "bioner":
Collaborator:

"hunflair2" also starts with "hunflair", so I think this warning would always be printed.

Collaborator Author:

True. However, HunFlair2 will not be loaded as a MultitaskModel. I'll fix it anyway.
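The pitfall the reviewer points out is that `str.startswith` matches any name that merely begins with the prefix, so `"hunflair2"` models would trigger the v1 warning too. A small illustrative predicate (model names here are assumptions) that excludes the v2 prefix explicitly:

```python
def is_hunflair_v1(model_name: str) -> bool:
    # "hunflair2-disease".startswith("hunflair") is True, so a bare
    # prefix test would flag v2 models as well. Exclude the "hunflair2"
    # prefix before applying the v1 check.
    return (
        model_name.startswith("hunflair")
        and not model_name.startswith("hunflair2")
    ) or model_name == "bioner"

assert is_hunflair_v1("hunflair-disease")
assert not is_hunflair_v1("hunflair2-disease")
```

An exact-match check against a known list of v1 names would work equally well and is even harder to get wrong.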

@@ -781,6 +781,14 @@ def _fetch_model(model_name) -> str:
elif model_name in hu_model_map:
model_path = cached_path(hu_model_map[model_name], cache_dir=cache_dir)

if model_name.startswith("hunflair"):
Collaborator:

Same here

Collaborator Author:

True. However, HunFlair2 will not be loaded as a SequenceTaggerModel. I'll fix it anyway.

@@ -648,6 +649,8 @@ def p(text: str) -> str:
emb = emb / torch.norm(emb)
dense_embeddings.append(emb.cpu().numpy())
sent.clear_embeddings()

# empty cuda cache if device is a cuda device
if flair.device.type == "cuda":
Collaborator:

@sg-wbi Is this really required?

# Sanity conversion: if flair.device was set as a string, convert to torch.device
if isinstance(flair.device, str):
flair.device = torch.device(flair.device)

if flair.device.type == "cuda":
Collaborator:

@sg-wbi Is this really required?

alanakbik (Collaborator) left a comment:

All good, thanks for adding this and thanks for your patience! There are some smaller points that we will address with follow-up PRs.

One question: is the manual deleting of the cuda cache really necessary?

sg-wbi commented Apr 5, 2024

Great, thanks! I am not sure either. @helpmefindaname / @mariosaenger, can you remember why this was added?

@alanakbik alanakbik merged commit 223f346 into master Apr 5, 2024
1 check passed
@alanakbik alanakbik deleted the hunflair2-release branch April 5, 2024 10:50
helpmefindaname (Collaborator):

Hi @alanakbik @sg-wbi
The GPU cache clear is done after each batch, as otherwise we would get memory errors if the embeddings of the whole dataset are larger than the memory available on the GPU (beyond what the model itself needs).
Since the search is always performed on CPU, we don't need to keep the embeddings cached on the GPU, so there is no real advantage to keeping them there.
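The pattern described above can be sketched as follows: embed one batch at a time, immediately move each embedding to the CPU, and release the CUDA cache so GPU memory stays bounded by a single batch rather than the whole dataset. `embed_fn` and the function signature are placeholders for illustration; this is not the PR's actual code:

```python
import torch

def embed_batches(sentences, embed_fn, batch_size=32):
    """Embed sentences batch-wise, keeping only CPU copies.

    `embed_fn` is a stand-in for the actual embedding call and is
    expected to return one tensor per sentence in the batch.
    """
    dense_embeddings = []
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i : i + batch_size]
        for emb in embed_fn(batch):
            emb = emb / torch.norm(emb)                 # normalize, as in the diff
            dense_embeddings.append(emb.cpu().numpy())  # move off the GPU
        # The search runs on CPU, so the GPU copies are never needed
        # again; releasing the cache avoids OOM on large datasets.
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    return dense_embeddings
```

On a CPU-only machine the `empty_cache()` branch is simply skipped, so the same code runs everywhere.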

alanakbik (Collaborator):

Alright, thanks!
