This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

fix race when temp space is used in copy & fix instance overwrite in g2c #8867

Merged
merged 5 commits into from
Dec 1, 2017

Conversation

ZiyueHuang
Member

@ZiyueHuang ZiyueHuang commented Nov 29, 2017

Description

The var of the temp space should be included in mutable_vars in the engine.

[{}] * ctx_len creates ctx_len references to a single dict, not ctx_len independent dicts.

cc @eric-haibin-lin
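The instance-overwrite bug comes from Python's list-repetition semantics; a minimal sketch (the name ctx_len is illustrative):

```python
ctx_len = 3

# [{}] * ctx_len repeats a reference to one dict, so every slot aliases it.
aliased = [{}] * ctx_len
aliased[0]['dev1'] = 'cpu'
# now all three entries show the key: writes through one slot overwrite the rest

# A comprehension builds ctx_len independent dicts instead.
independent = [{} for _ in range(ctx_len)]
independent[0]['dev1'] = 'cpu'
# only the first entry has the key
```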

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • [NA] For user-facing API changes, API doc string has been updated. For new C++ functions in header files, their functionalities and arguments are well-documented.
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • unittest

Comments

  • If this change is backward incompatible, explain why it must be made.
  • Interesting edge cases to note here

@ZiyueHuang
Member Author

ZiyueHuang commented Nov 29, 2017

https://github.com/apache/incubator-mxnet/blob/master/src/kvstore/comm.h#L187: the rsc used in ElementwiseSum here should also be added to the engine's mutable vars. I'll fix this in #8732.
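The race can be pictured with a toy dependency check (a sketch, not MXNet's actual engine API): two pushed operations may run concurrently only if neither mutates a var the other reads or mutates. If the temp space's var is left out of mutable_vars, the scheduler wrongly allows the overlap.

```python
def conflicts(read_a, mutate_a, read_b, mutate_b):
    # Two ops conflict if either mutates a var the other reads or mutates.
    ab = set(mutate_a) & (set(read_b) | set(mutate_b))
    ba = set(mutate_b) & (set(read_a) | set(mutate_a))
    return bool(ab or ba)

# Two copies sharing one temp-space var ("tmp"):
# declared correctly, the engine must serialize them ...
print(conflicts(['x'], ['y', 'tmp'], ['u'], ['v', 'tmp']))  # True
# ... but with "tmp" omitted from the mutable sets, they appear independent.
print(conflicts(['x'], ['y'], ['u'], ['v']))  # False
```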

@@ -76,7 +76,7 @@

# construct the module
# map the ctx_group attribute to the context assignment
group2ctxs={'dev1':mx.cpu(), 'dev2':[mx.gpu(i) for i in range(num_gpus)]}
group2ctxs={'dev1':[mx.cpu()]*num_gpus, 'dev2':[mx.gpu(i) for i in range(num_gpus)]}
Member Author

This change is just for better readability.
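The list form makes the per-device assignment explicit: each device slot gets one context per group. A hypothetical normalization (normalize_group2ctxs is illustrative, not MXNet API) of what the doc change expresses:

```python
def normalize_group2ctxs(group2ctxs, num_devices):
    """Expand a single context to one entry per device slot (illustrative)."""
    out = {}
    for group, ctxs in group2ctxs.items():
        if not isinstance(ctxs, list):
            # unlike [{}] * n, repeating one immutable context object is harmless
            ctxs = [ctxs] * num_devices
        assert len(ctxs) == num_devices, 'need one context per device'
        out[group] = ctxs
    return out

num_gpus = 2
# stand-in strings for mx.cpu() / mx.gpu(i)
print(normalize_group2ctxs({'dev1': 'cpu', 'dev2': ['gpu0', 'gpu1']}, num_gpus))
```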

@@ -454,7 +454,8 @@ inline void CopyFromToDnsImpl(const NDArray& from, const NDArray& to, RunContext

// Make a copy of an NDArray based on storage type
template<typename from_xpu, typename to_xpu>
void CopyFromToImpl(const NDArray& from, const NDArray& to, RunContext rctx) {
void CopyFromToImpl(const NDArray& from, const NDArray& to,
RunContext rctx, std::vector<Resource> requested) {
Member

const reference for the vector?

@@ -518,43 +515,57 @@ void CopyFromTo(const NDArray& from, const NDArray& to, int priority) {
CHECK(from.shape().ndim() != 0)
<< "source operands have zero dimension shape";
// important: callback must always capture by value
int a = from.ctx().dev_mask();
const auto from_ctx = from.ctx();
int a = from_ctx.dev_mask();
Member

Nit: const a and const b

Member

Please avoid using auto for simple types

std::vector<Engine::VarHandle> mutable_vars(1, to.var());

std::vector<Resource> requested;
if (a == gpu::kDevMask && from_stype != to_stype) {
Member

What if b is on GPU?

Member Author

According to the original code,

-  std::vector<Resource> requested;
-  if (is_same<from_xpu, mshadow::gpu>::value && from_stype != to_stype) {
-    requested.push_back(ResourceManager::Get()->Request(from_ctx,
-        ResourceRequest(ResourceRequest::kTempSpace)));
-  }

It seems that whether temp space is used is independent of the context of b?

Member

Oh right. No need to request temp space if cast_storage happens on CPU.
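The outcome of this thread can be summarized as a small predicate (a sketch; the function name and stype strings are illustrative): temp space is requested only when cast_storage runs on a GPU source.

```python
def needs_temp_space(from_dev_is_gpu, from_stype, to_stype):
    # Per the discussion above: scratch space is only needed when the
    # storage cast happens on GPU, i.e. the source is on GPU and the
    # storage types actually differ.
    return from_dev_is_gpu and from_stype != to_stype

print(needs_temp_space(True, 'csr', 'default'))      # True
print(needs_temp_space(False, 'csr', 'default'))     # False: CPU cast needs none
print(needs_temp_space(True, 'default', 'default'))  # False: no cast at all
```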


check_module_ctx_group([mx.cpu(0)], {'dev1': mx.cpu(1), 'dev2': mx.cpu(2)})
check_module_ctx_group([mx.cpu(0)], {'dev1': mx.cpu(1), 'dev2': mx.cpu(2)}, [mx.cpu(1), mx.cpu(2)])
Member

Nit: I think explicitly mentioning optional arg names (grad_ctxs) when passing optional args is a good practice since API may change in the future
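A generic illustration of the reviewer's point (check here is a stand-in, not the actual test helper): keyword call sites stay unambiguous and survive later signature growth.

```python
def check(ctxs, group2ctx, grad_ctxs=None):
    # stand-in for check_module_ctx_group; only the calling style matters
    return grad_ctxs

# Positional passing of an optional arg breaks silently if a new optional
# parameter is ever inserted before it; the keyword form does not.
result = check(['cpu0'], {'dev1': 'cpu1'}, grad_ctxs=['cpu1', 'cpu2'])
print(result)  # ['cpu1', 'cpu2']
```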

@eric-haibin-lin eric-haibin-lin self-assigned this Nov 30, 2017
@ZiyueHuang
Member Author

@piiswrong

@piiswrong piiswrong merged commit d7da05b into apache:master Dec 1, 2017
KellenSunderland pushed a commit to KellenSunderland/incubator-mxnet that referenced this pull request Dec 13, 2017
…g2c (apache#8867)

* fix race when temp space is used in copy

* fix instance overwrite in g2c

* example of g2c

* address comments
zhreshold pushed a commit to zhreshold/mxnet that referenced this pull request Dec 14, 2017
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018