CUDA optimisation flags #484

Open
fwyzard opened this issue Jun 13, 2020 · 0 comments
fwyzard commented Jun 13, 2020

CUDA 11 adds new optimisation flags we could try:

  • --dlink-time-opt / -dlto
    Perform link-time optimization of device code.
    Link-time optimization must be specified at both compile and link time; at compile time it stores high-level intermediate code, then at link time it links together and optimizes the intermediate code.If that intermediate is not found at link time then nothing happens.
    Intermediate code is also stored at compile time with the --gpu-code=lto_NN target. The options -dlto -arch=sm_NN will add a lto_NN target; if you want to only add a lto_NN target and not the compute_NN that -arch=sm_NN usually generates, use -arch=lto_NN.

  • --extra-device-vectorization
    This option enables more aggressive device code vectorization.
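
A minimal sketch of how these two flags could be wired into a separate-compilation build, assuming the usual nvcc -dc / device-link flow; the file names, output name and sm_75 architecture below are placeholders, not anything from our build:

    # compile time: -dlto stores the high-level LTO intermediate code
    nvcc -dc -dlto -arch=sm_75 --extra-device-vectorization a.cu -o a.o
    nvcc -dc -dlto -arch=sm_75 --extra-device-vectorization b.cu -o b.o

    # link time: -dlto must be repeated so the stored intermediate code
    # is linked together and optimised during the device link
    nvcc -dlto -arch=sm_75 a.o b.o -o app

    # alternative: request only the lto_75 target, without the compute_75
    # PTX that -arch=sm_75 would normally also embed
    nvcc -dc -arch=lto_75 a.cu -o a.o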

CUDA has also had this one for a while, and I'm not sure we ever tried it (nor what it is supposed to do):

  • --extensible-whole-program / -ewp
    Do extensible whole program compilation of device code.
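
For reference, a hedged sketch of how the flag would be spelled on the command line; whether it is applicable to our kernels, and what effect it has, is exactly what would need to be tested. The file name and sm_75 architecture are placeholders:

    # extensible whole-program compilation of device code
    # (per nvcc --help; effect on our code untested)
    nvcc -ewp -arch=sm_75 -O3 -c kernel.cu -o kernel.o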