
Why use float_64 for gradient #2

Open
Stonesjtu opened this issue Jul 18, 2018 · 6 comments

@Stonesjtu

Hi Wang,
I'm just wondering why you convert the gradient tensor to float64; I thought it could just be float32, which should already be more accurate than SGD requires.

grad = p.grad.to(torch.device("cpu")).detach().numpy().astype(np.float64)
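For reference, something like this is what I have in mind (a minimal sketch; the toy parameter is only there to make the snippet self-contained):

```python
import numpy as np
import torch

# Toy parameter just so the snippet runs on its own.
p = torch.nn.Parameter(torch.randn(3, 3))
p.sum().backward()

# Suggested change: keep the gradient in single precision instead of float64.
grad = p.grad.to(torch.device("cpu")).detach().numpy().astype(np.float32)
```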

@hwang595
Owner

Hey Kaiyu (@Stonesjtu), thanks for pointing this out. While developing this prototype I ran into an issue where np.float32 data wasn't converted correctly to MPI.FLOAT and vice versa. It could be a bug in mpi4py, but I could be wrong. I haven't tested the current combination of PyTorch and mpi4py versions under the gradient compression setting, but I will do that ASAP and report the result in this thread.
If you want, you can raise a PR. Any contribution is highly appreciated.

Thanks!

@Stonesjtu
Author

I have tested np.float32 without any problem. Also, I don't quite understand what "fully converted" means.

@hwang595
Owner

hwang595 commented Jul 19, 2018

Sorry for the confusion, @Stonesjtu. The issue I mentioned was related to this line, which I wrote for an old version that had no gradient compression strategy; each worker just sent the raw gradient matrices as numpy arrays.

To send numpy arrays directly, mpi4py provides a set of buffer-based APIs with capitalized names, e.g. Isend, Irecv, etc. (http://mpi4py.scipy.org/docs/usrman/tutorial.html#point-to-point-communication), where users need to specify the MPI datatype, e.g. MPI.FLOAT or MPI.DOUBLE (as I did in this line). The issue is that if .astype(np.float32) and MPI.FLOAT are specified, the wrong data is received on the parameter server side. In my tests, only np.float64 with MPI.DOUBLE worked. Please feel free to try it if you're curious.
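Roughly, the old code path looked like this (a simplified sketch with a toy gradient shape and tags; not the repo's actual values):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
shape = (4, 4)  # toy gradient shape, purely for illustration

if rank != 0:
    # Worker: send the raw gradient as a contiguous numpy array,
    # pairing np.float64 with MPI.DOUBLE as in the old code path.
    grad = np.random.randn(*shape).astype(np.float64)
    req = comm.Isend([grad, MPI.DOUBLE], dest=0, tag=rank)
    req.Wait()
else:
    # Parameter server: the receive buffer's dtype must match the MPI datatype,
    # otherwise garbage is read -- the mismatch described above.
    buf = np.empty(shape, dtype=np.float64)
    comm.Recv([buf, MPI.DOUBLE], source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG)
```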

However, all of the foregoing applies to an old version. You're right: the new version with gradient compression works with np.float32 without any problem. I've already made the changes on the master branch.

According to my test (on a cluster of 17 AWS EC2 m4.2xlarge instances: 1 parameter server + 16 workers), switching from np.float64 to np.float32 gives approximately a 35% speedup in communication and an 11% speedup in per-iteration runtime.

Thanks a lot for your contribution!

@Stonesjtu
Author

So, will you try float16 to see the speedup gain? I think half precision is enough in most cases.

@hwang595
Owner

hwang595 commented Jul 20, 2018

Actually, I think what would be interesting is to add a --half-precision argument. To be more specific, when half precision is enabled, all computation on the PyTorch side would use HalfTensor (https://pytorch.org/docs/stable/tensors.html#torch.Tensor.half) and all gradient matrices in numpy would be converted to np.float16. In that case, both computation and communication would scale better.
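A rough sketch of what I have in mind (the argument name and the toy model are illustrative only, not something in the repo yet):

```python
import argparse
import numpy as np
import torch
import torch.nn as nn

parser = argparse.ArgumentParser()
parser.add_argument("--half-precision", action="store_true",
                    help="cast the model to fp16 and ship fp16 gradients")
args = parser.parse_args()

model = nn.Linear(10, 10)  # stand-in for the real model
if args.half_precision:
    model = model.half()  # parameters (and hence gradients) become HalfTensor

# After backward(), gradients would be pulled out like this; the numpy dtype
# follows the flag, so the bytes sent over MPI are also halved.
np_dtype = np.float16 if args.half_precision else np.float32

def extract_grads(model):
    return [p.grad.to(torch.device("cpu")).detach().numpy().astype(np_dtype)
            for p in model.parameters() if p.grad is not None]
```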

If that's what you're suggesting, then yes, I'm planning on it. Please feel free to take it on if you want; any PR is appreciated.

@Stonesjtu
Author

I do think that simply transferring float16 would help a lot in reducing the communication overhead.
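Something along these lines, just as a sketch (the helper names are made up; the point is to downcast only for transport and keep aggregation in full precision):

```python
import numpy as np

def compress_for_send(grad_f32: np.ndarray) -> np.ndarray:
    # Halve the bytes on the wire at the cost of fp16 precision.
    return grad_f32.astype(np.float16)

def decompress_on_recv(grad_f16: np.ndarray) -> np.ndarray:
    # Upcast before aggregation so the server's running sums stay in float32.
    return grad_f16.astype(np.float32)
```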
