Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement emutls inside compiler_rt.zig #7577

Merged
merged 2 commits into from
Jan 13, 2021
Merged

Conversation

semarie
Copy link
Contributor

@semarie semarie commented Dec 28, 2020

This implementation follows LLVM compiler_rt emutls implementation (release 8.0 - MIT).

It is required for at least OpenBSD and Android (see #5921), but as I can't test for Android, so I don't have setup it for it.

There are still some XXX points which would need discussion:

  • emutls needs to allocate data on heap: its owns data (a per thread array of pointers), and threadvar data (a block)
  • threadvar data needs allocation at runtime know alignment. current Allocator interface doesn't permit that (only compile-know alignement), so the current code directly use posix_memalign() (like LLVM emutls.c)
  • emutls_control struct is using data with special size: gcc_word which is defined in C as typedef unsigned int gcc_word __attribute__((mode(word))). for now I just checked the value on OpenBSD x68_64 and use it

Regarding the code itself, the entrypoint of emutls is __emutls_get_address(). The structure emutls_control is part of the ABI (what GCC or LLVM will pass to __emutls_get_address() for asking a memory pointer).

Memory is automatically reclaimed when a thread is terminated (emutls memory and threadvar memory).

Copy link
Contributor

@LemonBoy LemonBoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, I started implementing this but then life got in the way.

We need some way to force the emulated TLS on for some other platforms and/or get a OpenBSD CI machine to make sure this works (and keeps working) as intended.

lib/std/special/compiler_rt/emutls.zig Outdated Show resolved Hide resolved
lib/std/special/compiler_rt/emutls.zig Outdated Show resolved Hide resolved
lib/std/special/compiler_rt/emutls.zig Outdated Show resolved Hide resolved
lib/std/special/compiler_rt/emutls.zig Outdated Show resolved Hide resolved

/// Retrieve the pointer at request index, using control to initialize it if needed.
pub inline fn get_pointer(self: *ObjectArray, index: usize, control: *emutls_control) !ObjectPointer {
if (self.slots[index] == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like using and abusing orelse like this

return self.slots[index] orelse block: {
    // init code here...
    break :block xxx;
};

lib/std/special/compiler_rt/emutls.zig Outdated Show resolved Hide resolved
var mutex: std.c.pthread_mutex_t = std.c.PTHREAD_MUTEX_INITIALIZER;

/// Return a next free index, incrementing global counter.
pub fn new() usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you just need to pull a fresh index I'd turn this into a single LOCK+GET+INCREMENT+UNLOCK sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to understand the exact context of the remark.

if it is just about the new() method, it is already done under locking (i will rename it to new_locked() to be more explicit).

if it is about the global index managment (getIndex()), grabing a full mutex would be too inefficient: the function is called at every threadvar access.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is just about the new() method, it is already done under locking (i will rename it to new_locked() to be more explicit).

Instead of requiring the caller to lock, call new then unlock the mutex my suggestion was to replace it with a single method like this:

fn next_index() usize {
    mutex.lock();
    defer mutex.unlock();
    const old = index;
    index += 1;
    return index;
}

After all you only need the mutex to make sure each index is unique.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the general usage of the index is: "get the index but if 0 initialize it with the next global value". with the fact that several threads could race to read/set the index.

the simple method could be:

  • lock globally all threads for the global counter (it will also serialize access on control index)
  • check the control index
  • if 0, get the next index (and incr the global value)
  • return the index (and unlock)

it is would involve taking a lock for every access of the index. lock couldn't be taken per variable, as emutls_control doesn't have structure to hold it.

the approch taken by LLVM emutls.c (and so by me too), is to use atomic reading lockless (so with possible races). in the general case (index already initialized) there is no problem and access is quick. for first access, it needs locking (and race checking as two threads could have read 0 index value and want to initialize it). so it can't be a all-in-one function for getting the next index.

eventually I could merge Index.new() inside control.get_index() function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind then, if multiple threads can race to initialize the same emutls_control this makes sense.
Packing mutex with the index is misleading, one is tempted to assume its only task is to protect the global counter while here it serves as a synchronization point for every thread/TLS object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reorganized the code. the global variable for the index has been put toplevel. and the mutex is a global variable inside emutls_control and lcok() and unlock() helpers.

and now I wonder if I should put the global variable global_index inside emutls_control too.


/// Invoked by pthread specific destructor. the passed argument is the ObjectArray pointer.
fn deinit(arrayPtr: *c_void) callconv(.C) void {
var array = @ptrCast(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's about time this @ptrCast(*T, @alignCast(@alignOf(T), ptr) pattern is turned into a nice helper in std.mem.

And... are you sure @alignOf(*ObjectArray) is not @alignOf(ObjectArray) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And... are you sure @Alignof(*ObjectArray) is not @Alignof(ObjectArray) ?

the passed argument is the same object set with pthread_setspecific(). I passed a *ObjectArray, so I should get a *ObjectArray. Regarding the alignment, I am a bit unexperimented with them. I am casting to *ObjectArray so I keep the same for the alignment. But I could be wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This small snippet should help understand the subtle difference between the two:

    const A = struct {
        x: u32,
    };
    var x: A = undefined;
    @compileLog(@typeInfo(@TypeOf(&x)).Pointer.alignment); // 4
    @compileLog(@typeInfo(@TypeOf(@alignCast(@alignOf(A), &x))).Pointer.alignment); // 4
    @compileLog(@typeInfo(@TypeOf(@alignCast(@alignOf(*A), &x))).Pointer.alignment); // 8

lib/std/special/compiler_rt/emutls.zig Outdated Show resolved Hide resolved
lib/std/special/compiler_rt/emutls.zig Outdated Show resolved Hide resolved
@semarie
Copy link
Contributor Author

semarie commented Dec 29, 2020

We need some way to force the emulated TLS on for some other platforms and/or get a OpenBSD CI machine to make sure this works (and keeps working) as intended.

to force emulated TLS, it would need a new -femulated-tls and -fno-emulated-tls on zig command-line, and set option inside ZigLLVMCreateTargetMachine() too.

@LemonBoy
Copy link
Contributor

I can confirm this implementation works just fine on Linux w/ forced emulated TLS, thanks to this I discovered #7609.

@semarie semarie marked this pull request as ready for review December 31, 2020 07:39
@semarie
Copy link
Contributor Author

semarie commented Jan 7, 2021

@andrewrk could you considere the PR to be merged ? (the freebsd CI failure isn't related)

@andrewrk
Copy link
Member

andrewrk commented Jan 7, 2021

Yes I will have a look at this tomorrow!

@semarie
Copy link
Contributor Author

semarie commented Jan 12, 2021

I folded few commits and rebased it on top of master. and maybe having a green CI and some activity might help to make progress on this.

@andrewrk andrewrk merged commit 70c608a into ziglang:master Jan 13, 2021
@andrewrk
Copy link
Member

Nice work @semarie, thanks for your patience on the merge.

return control.getPointer();
}

/// Simple allocator interface, to avoid pulling in the while
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: s/while/whole

@semarie semarie deleted the emutls branch January 13, 2021 05:30
@mikdusan mikdusan added the release notes This PR should be mentioned in the release notes. label Jan 24, 2021
@andrewrk andrewrk added this to the 0.9.0 milestone Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes This PR should be mentioned in the release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants