
Why use float_64 for gradient #2

Open
Stonesjtu opened this issue Jul 18, 2018 · 6 comments

@Stonesjtu

Hi Wang,
I'm just wondering why you convert the gradient tensor to float64; I thought it could just be float32, which should already be more accurate than SGD requires.

grad = p.grad.to(torch.device("cpu")).detach().numpy().astype(np.float64)
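For reference, something like this is what I have in mind (a minimal sketch; the toy parameter is only there to make the snippet self-contained):

```python
import numpy as np
import torch

# Toy parameter just so the snippet runs on its own.
p = torch.nn.Parameter(torch.randn(3, 3))
p.sum().backward()

# Suggested change: keep the gradient in single precision instead of float64.
grad = p.grad.to(torch.device("cpu")).detach().numpy().astype(np.float32)
```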

@hwang595
Owner

Hey Kaiyu (@Stonesjtu), thanks for pointing this out. While developing this prototype I ran into an issue where np.float32 data wasn't converted correctly to MPI.FLOAT and vice versa. It could be a bug in mpi4py, but I could be wrong. I haven't tested the current combination of PyTorch and mpi4py versions under the gradient compression setting, but I will do that ASAP and report the result in this thread.
If you want, you can raise a PR. Any contribution is highly appreciated.

Thanks!

@Stonesjtu
Author

I have tested np.float32 without any problem. Also, I don't quite understand what "fully converted" means.

@hwang595
Owner

hwang595 commented Jul 19, 2018

Sorry for the confusion, @Stonesjtu. The issue I mentioned was related to this line, which I wrote for an old version that had no gradient compression strategy; each worker just sent the raw gradient matrices as numpy arrays.

To send numpy arrays directly, mpi4py provides a set of buffer-based APIs with capitalized names, e.g. Isend, Irecv, etc. (http://mpi4py.scipy.org/docs/usrman/tutorial.html#point-to-point-communication), where users need to specify the MPI datatype, e.g. MPI.FLOAT or MPI.DOUBLE (as I did in this line). The issue is that if .astype(np.float32) and MPI.FLOAT are specified, the wrong data is received on the parameter server side. In my tests, only np.float64 with MPI.DOUBLE worked. Please feel free to try it if you're curious.
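Roughly, the old code path looked like this (a simplified sketch with a toy gradient shape and tags; not the repo's actual values):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
shape = (4, 4)  # toy gradient shape, purely for illustration

if rank != 0:
    # Worker: send the raw gradient as a contiguous numpy array,
    # pairing np.float64 with MPI.DOUBLE as in the old code path.
    grad = np.random.randn(*shape).astype(np.float64)
    req = comm.Isend([grad, MPI.DOUBLE], dest=0, tag=rank)
    req.Wait()
else:
    # Parameter server: the receive buffer's dtype must match the MPI datatype,
    # otherwise garbage is read -- the mismatch described above.
    buf = np.empty(shape, dtype=np.float64)
    comm.Recv([buf, MPI.DOUBLE], source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG)
```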

However, all of the foregoing applies to an old version. You're right: the new version with gradient compression works with np.float32 without any problem. I've already made the changes on the master branch.

According to my test (on a cluster of 17 AWS EC2 m4.2xlarge instances: 1 parameter server + 16 workers), switching from np.float64 to np.float32 gives approximately a 35% speedup in communication and an 11% speedup in per-iteration runtime.

Thanks a lot for your contribution!

@Stonesjtu
Author

So, will you try float16 to see the speedup gain? I think half precision is enough in most cases.

@hwang595
Owner

hwang595 commented Jul 20, 2018

Actually, I think what would be interesting is to add a --half-precision argument. To be more specific, when half precision is enabled, all computation on the PyTorch side would use HalfTensor (https://pytorch.org/docs/stable/tensors.html#torch.Tensor.half) and all gradient matrices in numpy would be converted to np.float16. In that case, both computation and communication would scale better.
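A rough sketch of what I have in mind (the argument name and the toy model are illustrative only, not something in the repo yet):

```python
import argparse
import numpy as np
import torch
import torch.nn as nn

parser = argparse.ArgumentParser()
parser.add_argument("--half-precision", action="store_true",
                    help="cast the model to fp16 and ship fp16 gradients")
args = parser.parse_args()

model = nn.Linear(10, 10)  # stand-in for the real model
if args.half_precision:
    model = model.half()  # parameters (and hence gradients) become HalfTensor

# After backward(), gradients would be pulled out like this; the numpy dtype
# follows the flag, so the bytes sent over MPI are also halved.
np_dtype = np.float16 if args.half_precision else np.float32

def extract_grads(model):
    return [p.grad.to(torch.device("cpu")).detach().numpy().astype(np_dtype)
            for p in model.parameters() if p.grad is not None]
```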

If that's what you're suggesting, then yes, I'm planning on it. Please feel free to take it on if you want; any PR is appreciated.

@Stonesjtu
Author

I do think that simply transferring float16 would help a lot in reducing the communication overhead.
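Something along these lines, just as a sketch (the helper names are made up; the point is to downcast only for transport and keep aggregation in full precision):

```python
import numpy as np

def compress_for_send(grad_f32: np.ndarray) -> np.ndarray:
    # Halve the bytes on the wire at the cost of fp16 precision.
    return grad_f32.astype(np.float16)

def decompress_on_recv(grad_f16: np.ndarray) -> np.ndarray:
    # Upcast before aggregation so the server's running sums stay in float32.
    return grad_f16.astype(np.float32)
```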
