This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

fix race when temp space is used in copy & fix instance overwrite in g2c #8867

Merged
merged 5 commits into from
Dec 1, 2017

Conversation

ZiyueHuang
Member

@ZiyueHuang ZiyueHuang commented Nov 29, 2017

Description

The var of the temp space should be included in mutable_vars in the engine.

[{}] * ctx_len creates ctx_len references to a single dict, not ctx_len independent dicts.

cc @eric-haibin-lin
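The instance-overwrite bug comes from Python's list-repetition semantics; a minimal sketch (the name ctx_len is illustrative):

```python
ctx_len = 3

# [{}] * ctx_len repeats a reference to one dict, so every slot aliases it.
aliased = [{}] * ctx_len
aliased[0]['dev1'] = 'cpu'
# now all three entries show the key: writes through one slot overwrite the rest

# A comprehension builds ctx_len independent dicts instead.
independent = [{} for _ in range(ctx_len)]
independent[0]['dev1'] = 'cpu'
# only the first entry has the key
```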

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • [NA] For user-facing API changes, API doc string has been updated. For new C++ functions in header files, their functionalities and arguments are well-documented.
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • unittest

Comments

  • If this change is backward incompatible, explain why it must be made.
  • Interesting edge cases to note here

@ZiyueHuang
Member Author

ZiyueHuang commented Nov 29, 2017

https://github.com/apache/incubator-mxnet/blob/master/src/kvstore/comm.h#L187: the rsc used in ElementwiseSum here should also be added to the engine's mutable vars. I'll fix this in #8732.
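The race can be pictured with a toy dependency check (a sketch, not MXNet's actual engine API): two pushed operations may run concurrently only if neither mutates a var the other reads or mutates. If the temp space's var is left out of mutable_vars, the scheduler wrongly allows the overlap.

```python
def conflicts(read_a, mutate_a, read_b, mutate_b):
    # Two ops conflict if either mutates a var the other reads or mutates.
    ab = set(mutate_a) & (set(read_b) | set(mutate_b))
    ba = set(mutate_b) & (set(read_a) | set(mutate_a))
    return bool(ab or ba)

# Two copies sharing one temp-space var ("tmp"):
# declared correctly, the engine must serialize them ...
print(conflicts(['x'], ['y', 'tmp'], ['u'], ['v', 'tmp']))  # True
# ... but with "tmp" omitted from the mutable sets, they appear independent.
print(conflicts(['x'], ['y'], ['u'], ['v']))  # False
```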

@@ -76,7 +76,7 @@

# construct the module
# map the ctx_group attribute to the context assignment
group2ctxs={'dev1':mx.cpu(), 'dev2':[mx.gpu(i) for i in range(num_gpus)]}
group2ctxs={'dev1':[mx.cpu()]*num_gpus, 'dev2':[mx.gpu(i) for i in range(num_gpus)]}
Member Author

This change is just for better readability.
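The list form makes the per-device assignment explicit: each device slot gets one context per group. A hypothetical normalization (normalize_group2ctxs is illustrative, not MXNet API) of what the doc change expresses:

```python
def normalize_group2ctxs(group2ctxs, num_devices):
    """Expand a single context to one entry per device slot (illustrative)."""
    out = {}
    for group, ctxs in group2ctxs.items():
        if not isinstance(ctxs, list):
            # unlike [{}] * n, repeating one immutable context object is harmless
            ctxs = [ctxs] * num_devices
        assert len(ctxs) == num_devices, 'need one context per device'
        out[group] = ctxs
    return out

num_gpus = 2
# stand-in strings for mx.cpu() / mx.gpu(i)
print(normalize_group2ctxs({'dev1': 'cpu', 'dev2': ['gpu0', 'gpu1']}, num_gpus))
```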

@@ -454,7 +454,8 @@ inline void CopyFromToDnsImpl(const NDArray& from, const NDArray& to, RunContext

// Make a copy of an NDArray based on storage type
template<typename from_xpu, typename to_xpu>
void CopyFromToImpl(const NDArray& from, const NDArray& to, RunContext rctx) {
void CopyFromToImpl(const NDArray& from, const NDArray& to,
RunContext rctx, std::vector<Resource> requested) {
Member

const reference for the vector?

@@ -518,43 +515,57 @@ void CopyFromTo(const NDArray& from, const NDArray& to, int priority) {
CHECK(from.shape().ndim() != 0)
<< "source operands have zero dimension shape";
// important: callback must always capture by value
int a = from.ctx().dev_mask();
const auto from_ctx = from.ctx();
int a = from_ctx.dev_mask();
Member

Nit: const a and const b

Member

Please avoid using auto for simple types

std::vector<Engine::VarHandle> mutable_vars(1, to.var());

std::vector<Resource> requested;
if (a == gpu::kDevMask && from_stype != to_stype) {
Member

What if b is on GPU?

Member Author

According to the original code,

-  std::vector<Resource> requested;
-  if (is_same<from_xpu, mshadow::gpu>::value && from_stype != to_stype) {
-    requested.push_back(ResourceManager::Get()->Request(from_ctx,
-        ResourceRequest(ResourceRequest::kTempSpace)));
-  }

It seems that whether temp space is used is independent of the context of b?

Member

Oh right. No need to request temp space if cast_storage happens on CPU.
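The outcome of this thread can be summarized as a small predicate (a sketch; the function name and stype strings are illustrative): temp space is requested only when cast_storage runs on a GPU source.

```python
def needs_temp_space(from_dev_is_gpu, from_stype, to_stype):
    # Per the discussion above: scratch space is only needed when the
    # storage cast happens on GPU, i.e. the source is on GPU and the
    # storage types actually differ.
    return from_dev_is_gpu and from_stype != to_stype

print(needs_temp_space(True, 'csr', 'default'))      # True
print(needs_temp_space(False, 'csr', 'default'))     # False: CPU cast needs none
print(needs_temp_space(True, 'default', 'default'))  # False: no cast at all
```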


check_module_ctx_group([mx.cpu(0)], {'dev1': mx.cpu(1), 'dev2': mx.cpu(2)})
check_module_ctx_group([mx.cpu(0)], {'dev1': mx.cpu(1), 'dev2': mx.cpu(2)}, [mx.cpu(1), mx.cpu(2)])
Member

Nit: I think explicitly mentioning optional arg names (grad_ctxs) when passing optional args is a good practice since API may change in the future
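A generic illustration of the reviewer's point (check here is a stand-in, not the actual test helper): keyword call sites stay unambiguous and survive later signature growth.

```python
def check(ctxs, group2ctx, grad_ctxs=None):
    # stand-in for check_module_ctx_group; only the calling style matters
    return grad_ctxs

# Positional passing of an optional arg breaks silently if a new optional
# parameter is ever inserted before it; the keyword form does not.
result = check(['cpu0'], {'dev1': 'cpu1'}, grad_ctxs=['cpu1', 'cpu2'])
print(result)  # ['cpu1', 'cpu2']
```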

@eric-haibin-lin eric-haibin-lin self-assigned this Nov 30, 2017
@ZiyueHuang
Member Author

@piiswrong

@piiswrong piiswrong merged commit d7da05b into apache:master Dec 1, 2017
KellenSunderland pushed a commit to KellenSunderland/incubator-mxnet that referenced this pull request Dec 13, 2017
…g2c (apache#8867)

* fix race when temp space is used in copy

* fix instance overwrite in g2c

* example of g2c

* address comments
zhreshold pushed a commit to zhreshold/mxnet that referenced this pull request Dec 14, 2017
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018