
[GSoC 2019] Distributed Non-Blocking Algorithms and Data Structures #13708

Merged · 47 commits · Aug 27, 2019

Conversation

@dgarvit (Contributor) commented Aug 12, 2019:

This PR adds several related package modules:

  • modules/packages/AtomicObjects.chpl
  • modules/packages/EpochManager.chpl
  • modules/packages/LockFreeQueue.chpl
  • modules/packages/LockFreeStack.chpl

The EpochManager is the main part. AtomicObjects is used in implementing
it but might be generally useful. LockFreeQueue and LockFreeStack are
user-facing data structures using the EpochManager.

Please note that these modules only function on x86 and with compilers built
to support extern blocks (i.e., CHPL_LLVM!=none).

Please see issue #13690 for design discussion.
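For orientation, here is a hedged usage sketch of how these modules fit together; the API names (`getToken`, `pin`, `unpin`, `enqueue`, `dequeue`) are assumptions inferred from the discussion below, not the documented interface:

```chapel
// Hedged sketch: a lock-free queue whose nodes are reclaimed via the
// EpochManager. Method names are assumed, not necessarily the final API.
use LockFreeQueue;

var queue = new LockFreeQueue(int);
var tok = queue.getToken();   // per-task handle into the epoch manager
tok.pin();                    // enter critical section: no reclamation
queue.enqueue(1);
var (found, val) = queue.dequeue();
tok.unpin();                  // exit: retired nodes may now be freed
```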

@LouisJenkinsCS (Member) left a comment:

I believe we need to move only what we use from the Utilities package (originally from https://github.com/pnnl/chgl/blob/master/src/Utilities.chpl), together with the helper data structures such as Vector, LockFreeQueue, LimboList, etc., into EpochManager as undocumented submodules; we also need to rename them. Also, the LockFreeQueue here is the ABA-recycled queue, so you need to move ReclaimedLockFreeQueue to where it is right now.

@LouisJenkinsCS (Member):

Also, this PR should likely be given the same title as the original issue, as it is about providing an infrastructure rather than just an epoch-based memory reclamation system.

@dgarvit dgarvit changed the title [GSoC 2019] Epoch based Memory Reclamation System [GSoC 2019] Distributed Non-Blocking Algorithms and Data Structures Aug 12, 2019
head.write(new unmanaged Node(int)); // Need 'write(objType)'
*/

extern {
Member:

I think that the CMPXCHG16B stuff should be added directly to the runtime and extern'd from there; what do you think, @mppf and @gbtitus? Right now it requires extern blocks and CHPL_LLVM=llvm, but that's not really necessary if these functions are already available from the runtime.

Member:

The runtime could provide it, certainly, perhaps in files that were siblings of include/atomics/*/chpl-atomics.h. I don't think that's strictly necessary for the purposes of this PR, however. That said, if you were going to support ARM you'd obviously need to update what's here so as to use a different _asm block for that architecture.

Member:

Okay, I just disliked the fact that you need to compile with LLVM to use this module when it's not strictly necessary; LLVM takes a long time to build, and I know that some platforms have difficulty building it (I've run into issues when trying to build it myself). I'd like to commit to a quick runtime change, if possible.

Member:

I think it'd make sense to add some minimal 128-bit CAS support to the runtime.
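For context, the extern-block route being discussed looks roughly like the following; this is a sketch under the stated constraints, not the module's actual code, and `cas128` and `wide_t` are illustrative names:

```chapel
/* Sketch: exposing a 128-bit CAS through an extern block. This is what
   forces CHPL_LLVM!=none today; moving the C portion into the runtime
   (e.g. alongside include/atomics/) would lift that requirement. */
extern {
  #include <stdint.h>
  // CMPXCHG16B requires its memory operand to be 16-byte aligned,
  // or it raises a general protection fault.
  typedef struct __attribute__((aligned(16))) { uint64_t lo, hi; } wide_t;
  // Assumes x86_64 and gcc/clang compiled with -mcx16.
  static inline int cas128(volatile wide_t *addr,
                           wide_t expected, wide_t desired) {
    return __sync_bool_compare_and_swap((volatile __int128 *)addr,
                                        *(__int128 *)&expected,
                                        *(__int128 *)&desired);
  }
}
```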

@LouisJenkinsCS (Member) commented Aug 17, 2019:

Is CHPL_LLVM=llvm set by default now? I've noticed that I do not have the CHPL_LLVM environment variable set; if so, then it doesn't matter whether the LocalAtomics.chpl module has an extern block.

Edit: I found that the reason CHPL_LLVM=llvm was set by default was that I had built LLVM once before.

```python
@memoize
def get():
    llvm_val = overrides.get('CHPL_LLVM')
    if not llvm_val:
        # compute a default based on if the included llvm is built
        chpl_third_party = get_chpl_third_party()
        llvm_target_dir = get_uniq_cfg_path_for('llvm')
        llvm_subdir = os.path.join(chpl_third_party, 'llvm', 'install',
                                   llvm_target_dir)
        llvm_header = os.path.join(llvm_subdir, 'include', 'llvm',
                                   'PassSupport.h')
        if os.path.exists(llvm_header):
            llvm_val = 'llvm'
        else:
            llvm_val = 'none'
    return llvm_val
```

}


inline proc localityCheck(objs...) {
Member:

I forget why I perform this check again, since I used pointer compression.

Resolved review threads:

  • modules/packages/LocalAtomics.chpl
  • modules/packages/LockFreeLinkedList.chpl
  • modules/packages/LockFreeQueue.chpl
  • modules/packages/Utilities.chpl
  • modules/packages/Vector.chpl (×5)
  • modules/packages/EpochManager.chpl (×3)
@LouisJenkinsCS (Member):

I'd recommend that we add a LockFreeStack module as well; it's easy to implement.
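For reference, the structure being proposed is the classic Treiber stack. A hedged sketch follows; the identifiers are illustrative, and the atomic wrapper is assumed to come from this PR's AtomicObjects rather than being its verbatim API:

```chapel
// Hedged sketch of a Treiber stack: push swings the top pointer with CAS.
class Node {
  type eltType;
  var val : eltType;
  var next : unmanaged Node(eltType)?;
}

record TreiberStack {
  type eltType;
  var _top : AtomicObject(unmanaged Node(eltType)?);  // assumed wrapper

  proc push(val : eltType) {
    var n = new unmanaged Node(eltType, val);
    do {
      var oldTop = _top.read();
      n.next = oldTop;          // link on top of the current head
    } while !_top.compareAndSwap(oldTop, n);  // retry until uncontended
  }
}
```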

Resolved review threads: modules/packages/LocalAtomics.chpl (×2)
@LouisJenkinsCS (Member):

I would recommend that ReclaimedLockFree* get changed to LockFree*.

@LouisJenkinsCS (Member):

I think this PR is ready to be merged. We will make additions to the runtime at a later date.

@mppf (Member) left a comment:
Once this feedback is addressed I will need to look at the function bodies in LocalAtomics and EpochManager in more detail. So far I have focused on the API issues.

Resolved review threads:

  • modules/packages/LocalAtomics.chpl
  • test/library/packages/LockFreeStack/lockFreeStack.chpl
  • modules/packages/LockFreeStack.chpl (×3)
  • modules/packages/EpochManager.chpl (×5)
@LouisJenkinsCS (Member):

I just performed the major overhaul of AtomicObject! It is still a single module, but now with param switches hasGlobalSupport and hasABASupport. I also renamed node to Node.

@LouisJenkinsCS (Member):

Finished documentation for AtomicObject.

@LouisJenkinsCS (Member):

Note: @dgarvit, we need to rebase this branch on top of upstream master so that we can ensure our PR is compatible with the numerous potentially breaking changes that are rolling out soon. We should do so this weekend if you are available.

@LouisJenkinsCS (Member):

It would be helpful if #13873 were resolved, which would lighten the workload.

Resolved review threads: modules/packages/LockFreeQueue.chpl (×2)
tok.pin();
do {
var oldTop = _top.read();
n.next = oldTop;
Member:

Why does the deque have a chpl_task_yield here but not the stack? Generally it's good to only yield every n iterations.

Member:

To handle cases of oversubscription and cases where there is contention. Ideally we'd have some backoff when the CAS fails, because failure means there is a lot of contention on the head or tail. Coupled with the fact that Chapel uses cooperative tasking, yielding is a way to let other tasks have a go. Although maybe it should try a few times before yielding.

Member:

Right, but there are several other loops like this (in the LockFreeStack, in the EpochManager). I would expect all of them to include chpl_task_yield.
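The pattern being suggested can be sketched as a retry loop that yields only after every few failed attempts, keeping the uncontended fast path cheap; the identifiers here are illustrative, and `yieldThreshold` is a made-up knob:

```chapel
// Sketch: yield to the cooperative scheduler only every `yieldThreshold`
// failed CAS attempts, rather than on every iteration.
config const yieldThreshold = 8;

proc push(ref top, n) {
  var attempts = 0;
  do {
    var oldTop = top.read();
    n.next = oldTop;
    attempts += 1;
    if attempts % yieldThreshold == 0 then
      chpl_task_yield();  // heavy contention or oversubscription: back off
  } while !top.compareAndSwap(oldTop, n);
}
```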

var retval : objType;
return (false, retval);
}
var newTop = oldTop.next;
Member:

Should this be calling chpl_task_yield periodically?

this.complete();
if hasABASupport {
var ptr : c_void_ptr;
posix_memalign(c_ptrTo(ptr), 16, c_sizeof(ABA(objType?)));
Member:

If you are just allocating 16 bytes, why not store it in the record itself? You should be able to do that with a c_array storing 2 uints, e.g. c_array(uint, 2).

Otherwise, it seems that the atomic is stored in the record if !hasABASupport but it is stored off of a pointer if hasABASupport. I can't think of a reason that this would be what is desired (wouldn't you want hasABASupport to just affect the size of the atomic, not whether it is separately allocated)?

If there is a performance benefit to the aligned allocation, shouldn't we also allocate something if !hasABASupport?

Member:

If it's a multilocale compilation and we have hasGlobalSupport set, I'd expect the pointer to consist of 128-bits even when !hasABASupport, rather than relying on the pointer compression. It seems to me that the pointer compression is only necessary for hasABASupport && hasGlobalSupport.

Member:

  1. CMPXCHG16B instruction requires that the source operand is 16-byte aligned, or else it will result in a general protection fault (GPF). It is necessary that this is a pointer.
  2. The reasoning behind using pointer compression is RDMA atomics. If hasGlobalSupport && !hasABASupport, then you have RDMA atomics for everything. If hasGlobalSupport && hasABASupport, this must be done as remote-execution atomics. If !hasGlobalSupport && hasABASupport, then the compression isn't used and we only use the lower 64-bits. Enabling RDMA atomics is a huge performance optimization here.
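The pointer compression described in (2) can be sketched as follows; the constants and helper names are illustrative, not the module's actual encoding. The idea is that x86_64 virtual addresses occupy only the low 48 bits, leaving the top 16 bits free for a locale id, so a (locale, address) pair fits in one RDMA-friendly 64-bit word:

```chapel
// Sketch: pack a locale id into the upper 16 bits of a 48-bit address.
param addrBits = 48;
param addrMask = (1:uint(64) << addrBits) - 1;

inline proc compress(localeId : uint(64), addr : uint(64)) : uint(64) {
  return (localeId << addrBits) | (addr & addrMask);
}

inline proc decompress(word : uint(64)) : (uint(64), uint(64)) {
  return (word >> addrBits, word & addrMask);  // (localeId, addr)
}
```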

Member:

I see now that this should be documented somewhere though.

forall i in 1..EBR_EPOCHS do
limbo_list[i] = new unmanaged LimboList();
forall i in LocaleSpace do
objsToDelete[i] = new unmanaged Vector(unmanaged object);
Member:

Can this duplicate code with the other init be factored out into a helper method?

Member:

Yeah, I suppose so.


:returns: A handle to the manager
*/
proc register() : owned DistTokenWrapper { // owned DistTokenWrapper { // Should be called only once
Member:

Commented-out code should be removed.

for tok in allocated_list {
var local_epoch = tok.local_epoch.read();
if (local_epoch > 0) then
minEpoch = min(minEpoch, local_epoch);
Member:

Is this right? Since the epoch numbers are cyclic?

Member:

You're right, this is wrong since we do not use fetchAdd, like I originally thought we did. We should be doing fetchAdd and then modulus division on that.

Member:

I'm not sure what you're thinking needs to be fetchAdd'd?

Member:

I changed my mind about this, but it was to refer to when the global_epoch gets advanced.

}
const current_global_epoch = global_epoch.read();

if minEpoch == current_global_epoch || minEpoch == max(uint) {
Member:

If the goal is to check that each task is on current_global_epoch, I'm not sure checking the minimum epoch does it. Because the global epoch could be 1, and we might have some tasks on epoch 3 and some on epoch 1. Here 3 is "before" 1.

One way to address this would be to get both the minimum and maximum of the task epochs. Another way would be to simply compute the epoch shared by all the tasks if it exists, and some sentinel value (e.g. max(uint) or 0) if not.

Member:

You're right that this will not work. The new plan would be to fetchAdd on the epoch, in which the monotonicity would ensure getMinimumEpoch is valid, and use modulus division to determine which limbo list to use.

Member:

Actually, instead we'll just check whether, given the current epoch e, the lowest task is on e - 1; since there are only 3 possible values, it is impossible for another task to be on e - 2.
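That plan can be sketched as follows; this is a hedged sketch, and `tryAdvance` with its arguments is illustrative rather than the PR's code:

```chapel
// Sketch: with EBR_EPOCHS == 3 cyclic epochs, the global epoch e can be
// advanced only if no pinned task is still on the previous epoch e-1
// (a pinned task can only be on e or e-1; e-2 is impossible).
proc tryAdvance(global_epoch, allocated_list) : bool {
  const e = global_epoch.read();
  const prev = if e == 1 then EBR_EPOCHS:uint else e - 1;
  for tok in allocated_list {
    const le = tok.local_epoch.read();
    if le == prev then return false;   // a task still lingers on e-1
  }
  const next = if e == EBR_EPOCHS:uint then 1 else e + 1;
  return global_epoch.compareAndSwap(e, next);  // CAS-advance; may race
}
```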

------------------------
To avoid reclamation while a task is accessing a resource, i.e., to enter a
critical section, a task must `pin`. Correspondingly, to exit the critical
section, the task must `unpin`.
Member:

Pin/unpin doesn't do anything to protect the user data structure, right? It's just a critical section for the memory reclamation.
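For illustration, the pin/unpin protocol under discussion looks roughly like this; `manager`, `register`, and the queue fields are assumed names:

```chapel
// Sketch: pin/unpin only guards memory reclamation; the data structure's
// own CAS loop is still responsible for its consistency.
var tok = manager.register();   // once per task
tok.pin();                      // enter reclamation-safe critical section
var head = queue.head.read();   // node cannot be freed while we are pinned
// ... traverse and CAS as usual ...
tok.unpin();                    // retired nodes may now be reclaimed
```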

@mppf mppf merged commit a6121f4 into chapel-lang:master Aug 27, 2019
mppf added a commit that referenced this pull request Aug 27, 2019
Add skipifs for tests using AtomicObjects

Follow-on to PR #13708.

AtomicObjects.chpl uses extern blocks so can only work when CHPL_LLVM!=none
Additionally it uses inline assembly and so is only expected to work with gcc/clang on x86_64.

Trivial test change only; not reviewed.
5 participants