
Since last nightly, winit doesn't create windows any more #43102

Closed

est31 opened this issue Jul 7, 2017 · 13 comments

est31 commented Jul 7, 2017

Doing the following:


$ git clone https://github.com/tomaka/winit
$ cd winit
$ cargo run --example window

gives me a window on stable, beta, and 1.20.0-nightly (3610a70ce 2017-07-05).
On 1.20.0-nightly (696412de7 2017-07-06) I get no window.

List of the commits.

Platform is Linux/X11.

cc @tomaka


est31 commented Jul 7, 2017

Obviously this also affects glium examples.


est31 commented Jul 7, 2017

Using rust-bisect, I've bisected the regression to #42816 (cd72f2e). cc @alexcrichton @nikomatsakis.


est31 commented Jul 7, 2017

Minimized example using only the x11_dl crate (x11-dl = "2.8" in Cargo.toml):

extern crate x11_dl;

pub use x11_dl::xlib::Xlib;
pub use x11_dl::xcursor::Xcursor;
use x11_dl::error::OpenError;

fn foo() -> Result<Xlib, OpenError> {
    // opening the libraries
    println!("new 1");
    let xlib = try!(Xlib::open());
    println!("new 2");
    let _xcursor = try!(Xcursor::open());

    Ok(xlib)
}

fn main() {
    foo().unwrap();
    println!("Good!");
}

On the old nightly it prints "new 1", then "new 2", then "Good!". On the new nightly it prints "new 1" and then segfaults.


rukai commented Jul 7, 2017

The issue doesn't occur when running in a separate thread:

extern crate x11_dl;

pub use x11_dl::xlib::Xlib;
pub use x11_dl::xcursor::Xcursor;
use x11_dl::error::OpenError;
use std::thread;
use std::time::Duration;

fn foo() -> Result<Xlib, OpenError> {
    // opening the libraries
    println!("new 1");
    let xlib = try!(Xlib::open());
    println!("new 2");
    let _xcursor = try!(Xcursor::open());

    Ok(xlib)
}

fn main() {
    thread::spawn(|| {
        foo().unwrap();
    });
    thread::sleep(Duration::new(999,0));
}


est31 commented Jul 7, 2017

Hmm, so locally applying #43072 on top of commit cd72f2e didn't help: I still get the segfault with a locally compiled rustc.


est31 commented Jul 7, 2017

The location at which the stack overflow occurs seems to be here.

Full backtrace:

#0  0x00005555555c52b0 in compiler_builtins::probestack::__rust_probestack ()
    at src/rustc/compiler_builtins_shim/../../libcompiler_builtins/src/probestack.rs:55
#1  0x000055555555dbbe in lazy_static::lazy::{{impl}}::get::{{closure}}<[(&str, usize); 767],fn() -> [(&str, usize); 767]>
    () at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/lazy_static-0.2.8/src/lazy.rs:22
#2  0x000055555555d82a in std::sync::once::{{impl}}::call_once::{{closure}}<closure> ()
    at ~/src/rust/src/libstd/sync/once.rs:227
#3  0x0000555555580d6d in std::sync::once::Once::call_inner (
    self=0x5555557f51d0 <<x11_dl::xlib::Xlib::init::SYMS as core::ops::deref::Deref>::deref::__stability::LAZY+8>, 
    ignore_poisoning=<error reading variable: access outside bounds of object referenced via synthetic pointer>, 
    init=...) at src/libstd/sync/once.rs:307
#4  0x000055555555d6cc in std::sync::once::Once::call_once<closure> (
    self=0x5555557f51d0 <<x11_dl::xlib::Xlib::init::SYMS as core::ops::deref::Deref>::deref::__stability::LAZY+8>, f=...)
    at ~/src/rust/src/libstd/sync/once.rs:227
#5  0x000055555557dc00 in lazy_static::lazy::Lazy<[(&str, usize); 767]>::get<[(&str, usize); 767],fn() -> [(&str, usize); 767]> (self=<optimized out>)
    at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/lazy_static-0.2.8/src/lazy.rs:22
#6  x11_dl::xlib::{{impl}}::init::{{impl}}::deref::__stability () at <__lazy_static_internal macros>:20
#7  x11_dl::xlib::{{impl}}::init::{{impl}}::deref (self=0x5555555c9743 <str.eC>) at <__lazy_static_internal macros>:21
#8  0x000055555557d532 in x11_dl::xlib::Xlib::init (self=0x7ffffffef940)
    at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/x11-dl-2.14.0/src/link.rs:48
#9  0x000055555557d99d in x11_dl::xlib::Xlib::open ()
    at ~/.cargo/registry/src/github.com-1ecc6299db9ec823/x11-dl-2.14.0/src/link.rs:61
#10 0x000055555555b548 in c::foo () at src/main.rs:10
#11 0x000055555555b9ed in c::main () at src/main.rs:18


tomaka commented Jul 7, 2017

The problem is probably that a [(&str, usize); 767] is created on the stack here: https://github.com/rust-lang-nursery/lazy-static.rs/blob/63f333bf7795fdae2627041d3d2a782b50f6ddd5/src/lazy.rs#L23
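
A minimal, hypothetical reduction of that pattern (not the actual lazy_static internals): the initializer builds the whole 767-entry array by value, so some frame in the call chain needs roughly 767 * 24 ≈ 18 KB at once. That's several pages, which is exactly the kind of frame that now gets a probe call in its prologue.

// Sketch only: a function whose frame must hold all 767 entries.
fn build_syms() -> [(&'static str, usize); 767] {
    // ~18 KB on x86_64: large enough that the compiler emits a
    // stack probe before the array is written.
    [("symbol", 0usize); 767]
}

fn main() {
    let syms = build_syms();
    println!("{} entries", syms.len());
}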

alexcrichton commented

OK, I believe this is going to be fixed by rust-lang/compiler-builtins#176. Once that lands I'll open a PR to pull it into rust-lang/rust.

After some debugging with @est31, @cuviper also recommended that I cc @bwhacks here, as I'm told you're working on the kernel side of some related changes. @bwhacks, you may be quite interested in this patch!

First, some background may help. Everyone seems super interested in big stacks nowadays, in lots of different ways! A few days ago we landed a change which switched to using stack probes on all x86/x86_64 platforms. This uses upstream LLVM's support for emitting a call to a function as part of a function's prologue, and the intention is to touch each page of a stack frame to ensure they're all mapped. Basically we just want to guarantee that, for all threads on all platforms, if you have a guard page you'll hit it.
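
Conceptually the probe does something like the following (a sketch only: the real __rust_probestack is hand-written assembly in compiler-builtins, and the page size and pointer details here are assumptions):

const PAGE_SIZE: usize = 4096;

// Touch one byte in every page of a frame growing down from
// stack_top, so each page is either faulted in by the kernel or
// hits the guard page before the function body runs.
unsafe fn probe_stack(stack_top: *mut u8, frame_size: usize) {
    let mut off = PAGE_SIZE;
    while off < frame_size {
        // Each write lands at most one page below the previous one,
        // so the guard page can never be skipped over.
        std::ptr::write_volatile(stack_top.offset(-(off as isize)), 0);
        off += PAGE_SIZE;
    }
}

fn main() {
    // Demonstrate on a heap buffer standing in for a stack region.
    let mut buf = vec![0u8; 5 * PAGE_SIZE];
    let top = unsafe { buf.as_mut_ptr().offset(buf.len() as isize) };
    unsafe { probe_stack(top, buf.len()) };
    println!("probed {} pages", buf.len() / PAGE_SIZE);
}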

The actual stack probing function is in this file (note that this link is current as of the time of this writing); we define it locally, and LLVM controls the ABI of how it's called (register-wise and whatnot).

Unfortunately, though, it looks like the addition of stack probes has caused this issue as a regression, as well as #43110. After debugging this with @est31 we noticed that the segfault in this issue (with the program in the comment above) was suspicious in that it was faulting just below the end of the stack. In other words, the segfault was happening just below the lowest address of the mapped stack, but within a page of it.

I'm personally quite unfamiliar with the implementation of stack growth in the kernel, but the investigation on #43052 clued me in to at least some changes happening on the kernel side recently. Based on this I stabbed in the dark with rust-lang/compiler-builtins@f9f6bd0, and it ended up fixing the regression for @est31 (fixing this issue).

So with all that said, I think this may be another case where the recent changes in the kernel related to stack growth have accidentally broken programs. My guess is that the kernel previously grew the stack for any fault within a certain range, but nowadays it also requires that the faulting page be above the stack pointer. Our __rust_probestack function originally faulted addresses below the stack pointer; the fix was to update the stack pointer as we go along, which appeared to appease the kernel and trigger the normal stack-growth logic.
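
A hypothetical model of that condition (addresses are plain integers, no real memory is touched, and the strict "fault must be at or above the stack pointer" rule is the guess from above, not the kernel's actual code):

const PAGE: usize = 4096;

// Guessed rule: a fault only grows the stack if it lands at or
// above the current stack pointer.
fn grows_stack(fault_addr: usize, sp: usize) -> bool {
    fault_addr >= sp
}

// Old probing order: sp stays put while pages below it are touched.
fn old_probe(sp: usize, frame: usize) -> bool {
    (1..=frame / PAGE).all(|i| grows_stack(sp - i * PAGE, sp))
}

// Fixed probing order: move sp down one page, then touch at sp.
fn new_probe(mut sp: usize, frame: usize) -> bool {
    for _ in 0..frame / PAGE {
        sp -= PAGE;
        if !grows_stack(sp, sp) {
            return false;
        }
    }
    true
}

fn main() {
    let (sp, frame) = (1 << 30, 5 * PAGE); // a ~20 KB frame
    println!("old order grows stack: {}", old_probe(sp, frame)); // false
    println!("new order grows stack: {}", new_probe(sp, frame)); // true
}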

Note that this isn't really a problem for us. The __rust_probestack function was only very recently added and never made its way into a release of Rust itself. We'll have this fixed shortly, and when stack probes are stabilized we'll have the new behavior regardless. In other words, on our end there's no great urgency to "fix" anything in the kernel, but it seemed like you would want to be aware of this!


cuviper commented Jul 7, 2017

One more detail: we were seeing this even after removing Rust's custom guard page, so it seems not to be an issue with the new padding. It looks more like the kernel is putting new constraints on stack growth below the actual stack pointer. Maybe it's enforcing no access farther below %sp than the x86_64 red zone (128 bytes)?


cuviper commented Jul 7, 2017

Hmm, the x86 page fault handler checks %sp before expanding the stack:

	/*
	 * Accessing the stack below %sp is always a bug.
	 * The large cushion allows instructions like enter
	 * and pusha to work. ("enter $65535, $31" pushes
	 * 32 pointers and then decrements %sp by 65535.)
	 */
	if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
		bad_area(regs, error_code, address);
		return;
	}

So "always a bug" but allows some cushion anyway -- yet it seems like we aren't getting that allowance. This code has not been changed in many years, nor does it appear to be changed in the Fedora kernel I'm running.


est31 commented Jul 7, 2017

Many thanks @alexcrichton for looking into this issue!

fintelia commented

This issue is made worse by the fact that cargo run doesn't display segmentation fault information. Failing silently is problematic in situations where no output is interpreted as success. Though even printing "segmentation fault" isn't ideal, given that a more precise error would be "stack overflow".

briansmith commented

There is no red zone on Windows, and (IIRC) you must always move the stack pointer below an address before you access anything at that address.
