
[GSoC 2019] Distributed Non-Blocking Algorithms and Data Structures #13708

Merged · 47 commits · Aug 27, 2019

Conversation

@dgarvit (Contributor) commented Aug 12, 2019:

This PR adds several related package modules:

  • modules/packages/AtomicObjects.chpl
  • modules/packages/EpochManager.chpl
  • modules/packages/LockFreeQueue.chpl
  • modules/packages/LockFreeStack.chpl

The EpochManager is the main part. AtomicObjects is used in implementing
it but might be generally useful. LockFreeQueue and LockFreeStack are
user-facing data structures using the EpochManager.

Please note that these modules only function on x86 and with compilers built
to support extern blocks (i.e., CHPL_LLVM!=none).

Please see issue #13690 for design discussion.
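For orientation, here is a hedged usage sketch of how these modules fit together; the API names (`getToken`, `pin`, `unpin`, `enqueue`, `dequeue`) are assumptions inferred from the discussion below, not the documented interface:

```chapel
// Hedged sketch: a lock-free queue whose nodes are reclaimed via the
// EpochManager. Method names are assumed, not necessarily the final API.
use LockFreeQueue;

var queue = new LockFreeQueue(int);
var tok = queue.getToken();   // per-task handle into the epoch manager
tok.pin();                    // enter critical section: no reclamation
queue.enqueue(1);
var (found, val) = queue.dequeue();
tok.unpin();                  // exit: retired nodes may now be freed
```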

@LouisJenkinsCS (Member) left a comment:

I believe we need to move only what we use from the Utilities package (originally from https://github.com/pnnl/chgl/blob/master/src/Utilities.chpl), together with the helper data structures such as Vector, LockFreeQueue, LimboList, etc., into EpochManager as undocumented submodules; we also need to rename them. Also, the LockFreeQueue here is the ABA-recycled queue, so you need to move ReclaimedLockFreeQueue to where it is right now.

@LouisJenkinsCS (Member):

Also, this PR should likely be given the same title as the original issue, as it is about providing an infrastructure rather than just an epoch-based memory reclamation system.

@dgarvit dgarvit changed the title [GSoC 2019] Epoch based Memory Reclamation System [GSoC 2019] Distributed Non-Blocking Algorithms and Data Structures Aug 12, 2019
head.write(new unmanaged Node(int)); // Need 'write(objType)'
*/

extern {
Member:

I think that the CMPXCHG16B stuff should be added directly to the runtime and extern'd from there; what do you think, @mppf and @gbtitus? Right now it requires extern blocks and CHPL_LLVM=llvm, but that's not really necessary if these functions are already available from the runtime.

Member:

The runtime could provide it, certainly, perhaps in files that were siblings of include/atomics/*/chpl-atomics.h. I don't think that's strictly necessary for the purposes of this PR, however. That said, if you were going to support ARM you'd obviously need to update what's here so as to use a different _asm block for that architecture.

Member:

Okay, I just disliked the fact that you need to compile with LLVM to use this module when it's not strictly necessary; LLVM takes a long time to build, and I know that some platforms have difficulty building it (I've run into issues when trying to build it myself). I'd like to commit to a quick runtime change, if possible.

Member:

I think it'd make sense to add some minimal 128-bit CAS support to the runtime.
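For context, the extern-block route being discussed looks roughly like the following; this is a sketch under the stated constraints, not the module's actual code, and `cas128` and `wide_t` are illustrative names:

```chapel
/* Sketch: exposing a 128-bit CAS through an extern block. This is what
   forces CHPL_LLVM!=none today; moving the C portion into the runtime
   (e.g. alongside include/atomics/) would lift that requirement. */
extern {
  #include <stdint.h>
  // CMPXCHG16B requires its memory operand to be 16-byte aligned,
  // or it raises a general protection fault.
  typedef struct __attribute__((aligned(16))) { uint64_t lo, hi; } wide_t;
  // Assumes x86_64 and gcc/clang compiled with -mcx16.
  static inline int cas128(volatile wide_t *addr,
                           wide_t expected, wide_t desired) {
    return __sync_bool_compare_and_swap((volatile __int128 *)addr,
                                        *(__int128 *)&expected,
                                        *(__int128 *)&desired);
  }
}
```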

@LouisJenkinsCS (Member) commented Aug 17, 2019:

Is CHPL_LLVM=llvm set by default now? I've noticed that I do not have the CHPL_LLVM environment variable set; if so, then it doesn't matter whether the LocalAtomics.chpl module has an extern block.

Edit: I found that the reason CHPL_LLVM=llvm was set by default was that I had built LLVM once before.

```python
@memoize
def get():
    llvm_val = overrides.get('CHPL_LLVM')
    if not llvm_val:
        # compute a default based on if the included llvm is built
        chpl_third_party = get_chpl_third_party()
        llvm_target_dir = get_uniq_cfg_path_for('llvm')
        llvm_subdir = os.path.join(chpl_third_party, 'llvm', 'install',
                                   llvm_target_dir)
        llvm_header = os.path.join(llvm_subdir, 'include', 'llvm',
                                   'PassSupport.h')
        if os.path.exists(llvm_header):
            llvm_val = 'llvm'
        else:
            llvm_val = 'none'
    return llvm_val
```

}


inline proc localityCheck(objs...) {
Member:

I forget why I perform this check again, since I used pointer compression.

Resolved review threads:

  • modules/packages/LocalAtomics.chpl
  • modules/packages/LockFreeLinkedList.chpl
  • modules/packages/LockFreeQueue.chpl
  • modules/packages/Utilities.chpl
  • modules/packages/Vector.chpl (×5)
  • modules/packages/EpochManager.chpl (×3)
@LouisJenkinsCS (Member):

I'd recommend that we add a LockFreeStack module as well; it's easy to implement.
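For reference, the structure being proposed is the classic Treiber stack. A hedged sketch follows; the identifiers are illustrative, and the atomic wrapper is assumed to come from this PR's AtomicObjects rather than being its verbatim API:

```chapel
// Hedged sketch of a Treiber stack: push swings the top pointer with CAS.
class Node {
  type eltType;
  var val : eltType;
  var next : unmanaged Node(eltType)?;
}

record TreiberStack {
  type eltType;
  var _top : AtomicObject(unmanaged Node(eltType)?);  // assumed wrapper

  proc push(val : eltType) {
    var n = new unmanaged Node(eltType, val);
    do {
      var oldTop = _top.read();
      n.next = oldTop;          // link on top of the current head
    } while !_top.compareAndSwap(oldTop, n);  // retry until uncontended
  }
}
```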

Resolved review threads: modules/packages/LocalAtomics.chpl (×2)
@LouisJenkinsCS (Member):

I would recommend that ReclaimedLockFree* get changed to LockFree*.

@LouisJenkinsCS (Member):

I think this PR is ready to be merged. We will make additions to the runtime at a later date.

@mppf (Member) left a comment:
Once this feedback is addressed I will need to look at the function bodies in LocalAtomics and EpochManager in more detail. So far I have focused on the API issues.

Resolved review threads:

  • modules/packages/LocalAtomics.chpl
  • test/library/packages/LockFreeStack/lockFreeStack.chpl
  • modules/packages/LockFreeStack.chpl (×3)
  • modules/packages/EpochManager.chpl (×5)
@LouisJenkinsCS (Member):

I just performed the major overhaul of AtomicObject! It is still a single module, but now with param switches hasGlobalSupport and hasABASupport. I also renamed node to Node.

@LouisJenkinsCS (Member):

Finished documentation for AtomicObject.

@LouisJenkinsCS (Member):

Note: @dgarvit, we need to rebase this branch on top of upstream master so that we can ensure our PR is compatible with the numerous potentially breaking changes that are rolling out soon. We should do so this weekend if you are available.

@LouisJenkinsCS (Member):

It would be helpful if #13873 were resolved, which would lighten the workload.

Resolved review threads: modules/packages/LockFreeQueue.chpl (×2)
tok.pin();
do {
var oldTop = _top.read();
n.next = oldTop;
Member:

Why does the deque have a chpl_task_yield here but not the stack? Generally it's good to only yield every n iterations.

Member:

To handle cases of oversubscription and cases where there is contention. Ideally we'd have some backoff when the CAS fails, because failure means there is a lot of contention on the head or tail. Coupled with the fact that Chapel uses cooperative tasking, yielding is a way to let other tasks have a go. Although maybe it should try a few times before yielding.

Member:

Right, but there are several other loops like this (in the LockFreeStack, in the EpochManager). I would expect all of them to include chpl_task_yield.
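The pattern being suggested can be sketched as a retry loop that yields only after every few failed attempts, keeping the uncontended fast path cheap; the identifiers here are illustrative, and `yieldThreshold` is a made-up knob:

```chapel
// Sketch: yield to the cooperative scheduler only every `yieldThreshold`
// failed CAS attempts, rather than on every iteration.
config const yieldThreshold = 8;

proc push(ref top, n) {
  var attempts = 0;
  do {
    var oldTop = top.read();
    n.next = oldTop;
    attempts += 1;
    if attempts % yieldThreshold == 0 then
      chpl_task_yield();  // heavy contention or oversubscription: back off
  } while !top.compareAndSwap(oldTop, n);
}
```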

var retval : objType;
return (false, retval);
}
var newTop = oldTop.next;
Member:

Should this be calling chpl_task_yield periodically?

this.complete();
if hasABASupport {
var ptr : c_void_ptr;
posix_memalign(c_ptrTo(ptr), 16, c_sizeof(ABA(objType?)));
Member:

If you are just allocating 16 bytes, why not store it in the record itself? You should be able to do that with a c_array storing 2 uints, e.g. c_array(uint, 2).

Otherwise, it seems that the atomic is stored in the record if !hasABASupport but it is stored off of a pointer if hasABASupport. I can't think of a reason that this would be what is desired (wouldn't you want hasABASupport to just affect the size of the atomic, not whether it is separately allocated)?

If there is a performance benefit to the aligned allocation, shouldn't we also allocate something if !hasABASupport?

Member:

If it's a multilocale compilation and we have hasGlobalSupport set, I'd expect the pointer to consist of 128-bits even when !hasABASupport, rather than relying on the pointer compression. It seems to me that the pointer compression is only necessary for hasABASupport && hasGlobalSupport.

Member:

  1. CMPXCHG16B instruction requires that the source operand is 16-byte aligned, or else it will result in a general protection fault (GPF). It is necessary that this is a pointer.
  2. The reasoning behind using pointer compression is RDMA atomics. If hasGlobalSupport && !hasABASupport, then you have RDMA atomics for everything. If hasGlobalSupport && hasABASupport, this must be done as remote-execution atomics. If !hasGlobalSupport && hasABASupport, then the compression isn't used and we only use the lower 64-bits. Enabling RDMA atomics is a huge performance optimization here.
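The pointer compression described in (2) can be sketched as follows; the constants and helper names are illustrative, not the module's actual encoding. The idea is that x86_64 virtual addresses occupy only the low 48 bits, leaving the top 16 bits free for a locale id, so a (locale, address) pair fits in one RDMA-friendly 64-bit word:

```chapel
// Sketch: pack a locale id into the upper 16 bits of a 48-bit address.
param addrBits = 48;
param addrMask = (1:uint(64) << addrBits) - 1;

inline proc compress(localeId : uint(64), addr : uint(64)) : uint(64) {
  return (localeId << addrBits) | (addr & addrMask);
}

inline proc decompress(word : uint(64)) : (uint(64), uint(64)) {
  return (word >> addrBits, word & addrMask);  // (localeId, addr)
}
```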

Member:

I see now that this should be documented somewhere though.

forall i in 1..EBR_EPOCHS do
limbo_list[i] = new unmanaged LimboList();
forall i in LocaleSpace do
objsToDelete[i] = new unmanaged Vector(unmanaged object);
Member:

Can this duplicate code with the other init be factored out into a helper method?

Member:

Yeah, I suppose so.


:returns: A handle to the manager
*/
proc register() : owned DistTokenWrapper { // owned DistTokenWrapper { // Should be called only once
Member:

Commented-out code should be removed.

for tok in allocated_list {
var local_epoch = tok.local_epoch.read();
if (local_epoch > 0) then
minEpoch = min(minEpoch, local_epoch);
Member:

Is this right? Since the epoch numbers are cyclic?

Member:

You're right, this is wrong since we do not use fetchAdd, like I originally thought we did. We should be doing fetchAdd and then modulus division on that.

Member:

I'm not sure what you're thinking needs to be fetchAdd'd?

Member:

I changed my mind about this, but it was to refer to when the global_epoch gets advanced.

}
const current_global_epoch = global_epoch.read();

if minEpoch == current_global_epoch || minEpoch == max(uint) {
Member:

If the goal is to check that each task is on current_global_epoch, I'm not sure checking the minimum epoch does it. Because the global epoch could be 1, and we might have some tasks on epoch 3 and some on epoch 1. Here 3 is "before" 1.

One way to address this would be to get both the minimum and maximum of the task epochs. Another way would be to simply compute the epoch shared by all the tasks if it exists, and some sentinel value (e.g. max(uint) or 0) if not.

Member:

You're right that this will not work. The new plan would be to fetchAdd on the epoch, in which the monotonicity would ensure getMinimumEpoch is valid, and use modulus division to determine which limbo list to use.

Member:

Actually, instead we'll just check whether, given the current epoch e, the lowest task is on e - 1; since there are only 3 possible values, it is impossible for another task to be on e - 2.
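That plan can be sketched as follows; this is a hedged sketch, and `tryAdvance` with its arguments is illustrative rather than the PR's code:

```chapel
// Sketch: with EBR_EPOCHS == 3 cyclic epochs, the global epoch e can be
// advanced only if no pinned task is still on the previous epoch e-1
// (a pinned task can only be on e or e-1; e-2 is impossible).
proc tryAdvance(global_epoch, allocated_list) : bool {
  const e = global_epoch.read();
  const prev = if e == 1 then EBR_EPOCHS:uint else e - 1;
  for tok in allocated_list {
    const le = tok.local_epoch.read();
    if le == prev then return false;   // a task still lingers on e-1
  }
  const next = if e == EBR_EPOCHS:uint then 1 else e + 1;
  return global_epoch.compareAndSwap(e, next);  // CAS-advance; may race
}
```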

------------------------
To avoid reclamation while a task is accessing a resource, i.e., to enter a
critical section, a task must `pin`. Correspondingly, to exit the critical
section, the task must `unpin`.
Member:

Pin/unpin doesn't do anything to protect the user data structure, right? It's just a critical section for the memory reclamation.
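For illustration, the pin/unpin protocol under discussion looks roughly like this; `manager`, `register`, and the queue fields are assumed names:

```chapel
// Sketch: pin/unpin only guards memory reclamation; the data structure's
// own CAS loop is still responsible for its consistency.
var tok = manager.register();   // once per task
tok.pin();                      // enter reclamation-safe critical section
var head = queue.head.read();   // node cannot be freed while we are pinned
// ... traverse and CAS as usual ...
tok.unpin();                    // retired nodes may now be reclaimed
```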

@mppf mppf merged commit a6121f4 into chapel-lang:master Aug 27, 2019
mppf added a commit that referenced this pull request Aug 27, 2019
Add skipifs for tests using AtomicObjects

Follow-on to PR #13708.

AtomicObjects.chpl uses extern blocks so can only work when CHPL_LLVM!=none
Additionally it uses inline assembly and so is only expected to work with gcc/clang on x86_64.

Trivial test change only; not reviewed.
5 participants