
Another strong reduction of concatenations for a small optimization (-4% inference time) #1130

Merged

merged 4 commits into flairNLP:master from concat_reduce on Sep 19, 2019

Conversation

@pommedeterresautee (Contributor) commented Sep 19, 2019

Reorganize the embeddings and add some padding in such a way that we only need one call to concat (and no stack operation). The padding is preallocated to limit memory allocations. This gives a constant 1-second inference-time reduction on CoNLL 2003 on a 2080 Ti (25 s -> 24 s). It had no measurable effect on my French dataset.

Good news: the code is also easier to read :-) (a sketch of the idea follows below)
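A minimal sketch of the idea, not the actual flair code (`EmbeddingConcat`, `pad_width`, and `combine` are hypothetical names for illustration): the zero padding is allocated once, and all pieces are joined with a single `torch.cat`.

```python
import torch

class EmbeddingConcat:
    """Combine per-embedding tensors with a single cat call."""

    def __init__(self, pad_width: int, device: str = "cpu"):
        # Preallocate the zero padding once so it is not reallocated
        # on every batch.
        self.padding = torch.zeros(1, pad_width, device=device)

    def combine(self, pieces):
        # Broadcast the preallocated padding to the batch size and
        # issue one torch.cat instead of stack + repeated cats.
        pad = self.padding.expand(pieces[0].size(0), -1)
        return torch.cat(pieces + [pad], dim=1)

# Usage: two embedding slices of widths 3 and 5, padded by 4 -> width 12.
emb = EmbeddingConcat(pad_width=4)
out = emb.combine([torch.randn(2, 3), torch.randn(2, 5)])
assert out.shape == (2, 12)
```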

NB: not related, but I made a mistake in my measurements... unfortunately, memory transfer is not the main remaining bottleneck; I was measuring a synchronization operation that happened before the memory transfer. Now I set CUDA_LAUNCH_BLOCKING=1 before running cProfile instead of trying to be smart and calling synchronize manually here and there... Another thing: small functions called millions of times appear slower than they really are under the profiler.
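For reference, a minimal sketch of this profiling setup (the `inference` function is a hypothetical stand-in for an actual model forward pass):

```python
import os

# Must be set before the first CUDA call (or in the shell:
# CUDA_LAUNCH_BLOCKING=1 python profile_me.py). It makes kernel
# launches synchronous, so cProfile charges GPU time to the Python
# call that launched the kernel rather than to whatever op happens
# to synchronize later (e.g. a device-to-host memory transfer).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import cProfile
import torch

def inference():
    # Hypothetical stand-in for a model forward pass.
    x = torch.randn(1024, 1024, device="cuda")
    for _ in range(100):
        x = x @ x

cProfile.run("inference()", sort="cumtime")
```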

@alanakbik (Collaborator)

Thanks - looks good!

@alanakbik (Collaborator)

👍
@yosipk (Collaborator) commented Sep 19, 2019

👍

@yosipk merged commit 8d65cc6 into flairNLP:master on Sep 19, 2019.
@pommedeterresautee deleted the concat_reduce branch on September 19, 2019 at 14:47.