Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add libbacktrace support for Apple platforms (resubmitted) #44251

Merged
merged 6 commits into from
Sep 9, 2017

Conversation

kennytm
Copy link
Member

@kennytm kennytm commented Sep 1, 2017

Resubmitting #43422 rebased on the current master (cc @JohnColanduoni).

I have added an additional commit to fallback to dladdr-based resolve_symbol if libbacktrace returns None, otherwise the stack trace will be full of <unknown> when you forget to pass the -g flag (actually it seems — at least on macOS — the dladdr symbol is more accurate than the libbacktrace one).

@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@shepmaster shepmaster added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 1, 2017
@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Sep 3, 2017

📌 Commit 7169fe5 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Sep 3, 2017

⌛ Testing commit 7169fe5 with merge caa5f7fc5b3e84b57827a21219f196a570afeb4d...

@bors
Copy link
Contributor

bors commented Sep 3, 2017

💔 Test failed - status-travis

@kennytm
Copy link
Member Author

kennytm commented Sep 3, 2017

@bors retry #38618

dist-i586-gnu-i686-musl failed spuriously.

[01:18:11] �[m�[m�[32m�[1m     Running�[m build/x86_64-unknown-linux-gnu/stage2-std/i686-unknown-linux-musl/release/deps/coretests-41a06d4921b2cc6a
[01:18:11] 
[01:18:11] running 674 tests
[01:18:11] test any::any_downcast_mut ... ok
[01:18:11] test any::any_fixed_vec ... ok
...
[01:18:11] test num::test_try_u32usize ... ok
[01:18:11] test num::test_try_u64i128 ... ok
[01:18:11] �[m�[m�[31m�[1merror:�[m An unknown error occurred
[01:18:11] 
[01:18:11] To learn more, run the command again with --verbose.
[01:18:11] 
[01:18:11] 
[01:18:11] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "-j" "4" "--target" "i686-unknown-linux-musl" "--release" "--locked" "--color" "always" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "-p" "std:0.0.0" "-p" "alloc_system:0.0.0" "-p" "unwind:0.0.0" "-p" "collections:0.0.0" "-p" "panic_abort:0.0.0" "-p" "libc:0.0.0" "-p" "std_unicode:0.0.0" "-p" "compiler_builtins:0.0.0" "-p" "alloc:0.0.0" "-p" "core:0.0.0" "-p" "rand:0.0.0" "--"
[01:18:11] expected success, got: exit code: 101
[01:18:11] 
[01:18:11] 
[01:18:11] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --target i686-unknown-linux-musl --target i586-unknown-linux-gnu
[01:18:11] Build completed unsuccessfully in 1:16:14

@bors
Copy link
Contributor

bors commented Sep 3, 2017

⌛ Testing commit 7169fe5 with merge e3761e1b85e0019044513195cda81e93f3e51cf4...

@bors
Copy link
Contributor

bors commented Sep 3, 2017

💔 Test failed - status-travis

@kennytm
Copy link
Member Author

kennytm commented Sep 3, 2017

check i686-apple-darwin failed, legit. (the same test for x86_64-apple-darwin passed)

[01:25:56] failures:
[01:25:56] 
[01:25:56] ---- [run-pass] run-pass/backtrace-debuginfo.rs stdout ----
[01:25:56] 	
[01:25:56] error: test run failed!
[01:25:56] status: exit code: 101
[01:25:56] command: "/Users/travis/build/rust-lang/rust/build/i686-apple-darwin/test/run-pass/backtrace-debuginfo.stage2-i686-apple-darwin"
[01:25:56] stdout:
[01:25:56] ------------------------------------------
[01:25:56] 
[01:25:56] ------------------------------------------
[01:25:56] stderr:
[01:25:56] ------------------------------------------
[01:25:56] thread 'main' panicked at 'trace does not match position list: test case 0
[01:25:56] thread 'main' panicked at 'explicit panic', /Users/travis/build/rust-lang/rust/src/test/run-pass/backtrace-debuginfo.rs:78:4
[01:25:56] stack backtrace:
[01:25:56]    0:   0x1e88c8 - std::sys::imp::backtrace::tracing::imp::unwind_backtrace::h58fac47b4a42cd77
[01:25:56]    1:   0x1e28f8 - std::sys_common::backtrace::_print::h285e13909861b307
[01:25:56]    2:   0x1f58ae - std::panicking::default_hook::{{closure}}::h41f407050fbb9ee1
[01:25:56]    3:   0x1f5490 - std::panicking::default_hook::h1733bcbadb978adb
[01:25:56]    4:   0x1f5ef1 - std::panicking::rust_panic_with_hook::hfa1cfe4ac54c174c
[01:25:56]    5:    0xa7fdd - std::panicking::begin_panic::h02a6a28c53c09dd1
[01:25:56]    6:    0xa9822 - backtrace_debuginfo::inner::hd9078f7dd9efd109
[01:25:56]    7:    0xa9942 - backtrace_debuginfo::outer::hb361f2c852aa11c2
[01:25:56]    8:    0xaa0d2 - backtrace_debuginfo::main::hda900a635795a7e5
[01:25:56]    9:   0x207a71 - __rust_maybe_catch_panic
[01:25:56]   10:   0x1f6bd9 - std::rt::lang_start::ha0d04e85043cf9de
[01:25:56]   11:    0xaafc0 - main
[01:25:56] 
[01:25:56] ---
[01:25:56] backtrace-debuginfo.rs:119
[01:25:56] backtrace-debuginfo.rs:167
[01:25:56] ', /Users/travis/build/rust-lang/rust/src/test/run-pass/backtrace-debuginfo.rs:133:4
[01:25:56] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[01:25:56] 
[01:25:56] ------------------------------------------
[01:25:56] 
[01:25:56] thread '[run-pass] run-pass/backtrace-debuginfo.rs' panicked at 'explicit panic', src/tools/compiletest/src/runtest.rs:2435:8
[01:25:56] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[01:25:56] 
[01:25:56] 
[01:25:56] failures:
[01:25:56]     [run-pass] run-pass/backtrace-debuginfo.rs
[01:25:56] 
[01:25:56] test result: FAILED. 2745 passed; 1 failed; 6 ignored; 0 measured; 0 filtered out
[01:25:56] 
[01:25:56] thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:323:21
[01:25:56] 
[01:25:56] 
[01:25:56] command did not execute successfully: "/Users/travis/build/rust-lang/rust/build/i686-apple-darwin/stage0-tools/i686-apple-darwin/release/compiletest" "--compile-lib-path" "/Users/travis/build/rust-lang/rust/build/i686-apple-darwin/stage2/lib" "--run-lib-path" "/Users/travis/build/rust-lang/rust/build/i686-apple-darwin/stage2/lib/rustlib/i686-apple-darwin/lib" "--rustc-path" "/Users/travis/build/rust-lang/rust/build/i686-apple-darwin/stage2/bin/rustc" "--src-base" "/Users/travis/build/rust-lang/rust/src/test/run-pass" "--build-base" "/Users/travis/build/rust-lang/rust/build/i686-apple-darwin/test/run-pass" "--stage-id" "stage2-i686-apple-darwin" "--mode" "run-pass" "--target" "i686-apple-darwin" "--host" "i686-apple-darwin" "--llvm-filecheck" "/Users/travis/build/rust-lang/rust/build/i686-apple-darwin/llvm/build/bin/FileCheck" "--nodejs" "/Users/travis/.nvm/versions/node/v6.9.1/bin/node" "--host-rustcflags" "-Crpath -O" "--target-rustcflags" "-Crpath -O -Lnative=/Users/travis/build/rust-lang/rust/build/i686-apple-darwin/native/rust-test-helpers" "--docck-python" "/usr/local/opt/python/bin/python2.7" "--lldb-python" "/usr/bin/python" "--lldb-version" "lldb-360.1.70" "--lldb-python-dir" "/Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Resources/Python" "--llvm-version" "4.0.1\n" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:25:56] expected success, got: exit code: 101
[01:25:56] 
[01:25:56] 
[01:25:56] failed to run: /Users/travis/build/rust-lang/rust/build/bootstrap/debug/bootstrap test
[01:25:56] Build completed unsuccessfully in 0:21:42
[01:25:56] make: *** [check] Error 1

(Now trying to reproduce locally.)

Update: Reproduced. Looks like the line numbers are just missing in i686-apple-darwin.

@kennytm
Copy link
Member Author

kennytm commented Sep 3, 2017

@alexcrichton The particular test case is fixed in 6466475b98c9019b7bcdd0a1eec346cccae23cea (tested locally on both i686-apple-darwin and x86_64-apple-darwin at stage1). Seems to be a logic error in the previous PR when comparing with the ELF implementation.

@alexcrichton
Copy link
Member

Hm I'm getting pretty nervous about this, we're adding a pretty big wad of umaintained C to the standard library for all Rust programs on OSX and we've already found a bug in it?

I'd personally feel more comfortable if we could get this upstream first, or if we know that this had gone through something like fuzzers and whatnot (or had it off by default maybe?)

@kennytm
Copy link
Member Author

kennytm commented Sep 3, 2017

I'd personally feel more comfortable if we could get this upstream first,

That's hard to do, as the upstream has no code updates for 1 year already, with both ianlancetaylor/libbacktrace#2 and ianlancetaylor/libbacktrace#6 idling 😅. Maybe rust-lang needs to fork the repo, or switch to a more maintained library (gimli?)

or if we know that this had gone through something like fuzzers and whatnot (or had it off by default maybe?)

There is a test file in https://github.com/ianlancetaylor/libbacktrace/blob/master/btest.c but I haven't tried to run it. Not sure if it was tested in #43422. I think it's not fuzzed 😨.

@JohnColanduoni
Copy link
Contributor

I've run the btest program but it's not even as comprehensive as Rust's existing test.

I'm not sure the current fix is fixing it in the right place: macho_add should not return 0 if it fails to find a dsym file or symbol table. It should only do so on a hard error condition, which is why I originally passed the error up the chain. Since the test is passing when this is bypassed it appears macho_add is failing on one of the loaded dylibs, and that this dylib is either only loaded by 32-bit executables or only the 32-bit slice throws the code for a loop. I'll try to pin down which one.

@kennytm
Copy link
Member Author

kennytm commented Sep 4, 2017

@JohnColanduoni The problem is that when loading the linked system libraries (/usr/lib/libSystem.B.dylib, /usr/lib/libresolv.9.dylib etc), it failed at macho_get_commands. It looks like that real problem is in the fat-binary check.

Still, if we compare with the ELF code, I think how 6466475 handles it is closer in spirit.

(Snippet of the ELF code)
int
backtrace_initialize (struct backtrace_state *state, int descriptor,
		      backtrace_error_callback error_callback,
		      void *data, fileline *fileline_fn)
{
  // ...
  dl_iterate_phdr (phdr_callback, (void *) &pd);
  // ...
}

static int
phdr_callback (struct dl_phdr_info *info, size_t size ATTRIBUTE_UNUSED,
	       void *pdata)
{
  // ...

  if (elf_add (pd->state, descriptor, info->dlpi_addr, pd->error_callback,
	       pd->data, &elf_fileline_fn, pd->found_sym, &found_dwarf, 0))
    {
      if (found_dwarf)
	{
	  *pd->found_dwarf = 1;
	  *pd->fileline_fn = elf_fileline_fn;
	}
    }

  return 0;
}

Update: On i686 the NATIVE_CPU_TYPE is 7 (good) but the computed cpu_type of the native libraries are 2 and 12 respectively (bad).

Update 2: ಠ_ಠ

@@ -288,7 +288,7 @@ macho_get_commands (struct backtrace_state *state, int descriptor,
 
       archs_total_size = arch_count * sizeof (struct fat_arch);
 
-      if (!backtrace_get_view (state, descriptor, sizeof (fat_header),
+      if (!backtrace_get_view (state, descriptor, sizeof (struct fat_header),
                                archs_total_size, error_callback,
                                data, &fat_archs_view))

Fixing this now gives the image_actual_base_address == 0 error so still not entirely fixed.

@JohnColanduoni
Copy link
Contributor

@kennytm It looks like the problem is lazy linking on i386 causing the virtual address slide for those system libraries to be zero. Adding a continue when current_vmslide == 0 should do the trick.

@kennytm
Copy link
Member Author

kennytm commented Sep 5, 2017

@JohnColanduoni The vmslide is also zero when ASLR is disabled, e.g. when running under LLDB. I don't think that's a valid solution.

@JohnColanduoni
Copy link
Contributor

@kennytm Ah you're right. That should only be for the main executable though, any dylibs will need to be loaded somewhere else even without ASLR (and the main executable will never be lazy loaded anyway). Perhaps i != 0 && current_vmslide == 0?

@kennytm
Copy link
Member Author

kennytm commented Sep 5, 2017

@JohnColanduoni Added to aa6bd11

The remaining problem is to demonstrate that the code is correct and safe (#44251 (comment)).

@alexcrichton
Copy link
Member

Thanks for the investigation and the fix here @kennytm and @JohnColanduoni!

To be clear I don't want to necessarily block this from landing until it's 100% perfect. To some degree the problem here is basically "we need to decide to take on the maintenance burden". Using libbacktrace at all has been a maintenance burden many times before, but if we went back and did it all over again I feel like we'd still decide to include it because line numbers are just so damn useful.

In that sense I think we're basically commited to "we really need this in one form or another." There's a lot of OSX users and they'll be quite happy to start seeing line numbers in backtraces I imagine! We basically, I think, just need to think about possible mitigations.

Some mitigations for making this as low-impact as possible are:

  1. Ensure this is thoroughly tested by our own test suite. This code is on by default, we're testing OSX, and the relevant tests here are updated to ensure that this passes on OSX. I think we're covered here.
  2. Have this be a runtime option. One possibility we could do is to make this off-by-default and have a testing period where you have to opt-in to see these backtraces. I would fear, however, that this'd work for everyone who turned it on and it wouldn't uncover many "corner cases".
  3. Wait for this to be fuzzed/better tested.

Y'all have already finished (1), I don't think we'd get much benefit from (2), and I don't think that (3) is well-defined enough to ever actually be finished/completed. In that sense while I still do personally worry about this I think it's something we should land and take under our maintenance wing. We've effectively done that for MinGW and libbacktrace, and this isn't really adding all that much extra from that point of view.

Perhaps one day we can switch to gimli to have a Rust-based solution, but I suspect that transition is still quite aways away at this point.

What do y'all think of that?

@kennytm
Copy link
Member Author

kennytm commented Sep 5, 2017

@alexcrichton

Have this be a runtime option. One possibility we could do is to make this off-by-default and have a testing period where you have to opt-in to see these backtraces.

You mean an option in additional to RUST_BACKTRACE=1?

@alexcrichton
Copy link
Member

In theory, yes, but again I don't think it's worth it.

@JohnColanduoni
Copy link
Contributor

@alexcrichton How hard would it be to use the gimli and object crates for this if they are made compatible with no_std? It looks like doing the latter wouldn't be terribly difficult (see gimli-rs/gimli#138), and they cover most of the hard parts. Both do not contain any unsafe code so the worst case would be that they malfunction and return no/wrong information.

@alexcrichton
Copy link
Member

@JohnColanduoni in theory not too hard, but we've never done it yet so I suspect it would take quite some time to integrate and get it fully implemented. I also think those aren't the parts that we'd need here? I thought that what this PR does is adding support for basically locating the dSYM file (or equivalent) and then using mach-o parsing to load the dwarf data, only to hand that off to the rest of libbacktrace. Isn't the "rest of libbacktrace" basically covered by gimli?

@kennytm
Copy link
Member Author

kennytm commented Sep 5, 2017

@JohnColanduoni Note that libbacktrace also handles symbolication on Windows (PE/COFF), which is out-of-scope for gimli.

I think a RIIR backtrace-sys replacement based on gimli + object + pdb + pelite could first be developed on crates.io. After that is shown to be working well, we could then think about how to do the integration.

@JohnColanduoni
Copy link
Contributor

@alexcrichton Yes, and ideally object would cover most of the code in this PR (it doesn't understand symbol tables or UUIDs at the moment, but that could be easily rectified). I was just curious as to whether that would be a good long-term solution.

@kennytm Yeah that was my thought as well. Is there a reason not to use DbgHelp when dealing with PDBs?

@alexcrichton
Copy link
Member

@kennytm and @JohnColanduoni do y'all still think we should land this at this moment? (I think so, myself)

@kennytm
Copy link
Member Author

kennytm commented Sep 6, 2017

@JohnColanduoni Thanks. I just searched for pdb on docs.rs 😄

@alexcrichton Yes I think so.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Sep 6, 2017

📌 Commit aa6bd11 has been approved by alexcrichton

@alexcrichton alexcrichton added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 6, 2017
@gilescope
Copy link
Contributor

Any way to get line numbers in OSX is better than no way. There's nothing to stop switching to a better implementation later. Line numbers are pretty backward compatible. (If we had to have this behind runtime flags / compile time flags that would do - we just need those line numbers).

Really looking forward to this in nightly!

@bors
Copy link
Contributor

bors commented Sep 9, 2017

⌛ Testing commit aa6bd11 with merge ddd123e...

bors added a commit that referenced this pull request Sep 9, 2017
Add libbacktrace support for Apple platforms (resubmitted)

Resubmitting #43422 rebased on the current master (cc @JohnColanduoni).

I have added an additional commit to fallback to `dladdr`-based `resolve_symbol` if `libbacktrace` returns `None`, otherwise the stack trace will be full of `<unknown>` when you forget to pass the `-g` flag (actually it seems — at least on macOS — the `dladdr` symbol is more accurate than the `libbacktrace` one).
@bors
Copy link
Contributor

bors commented Sep 9, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing ddd123e to master...

@bors bors merged commit aa6bd11 into rust-lang:master Sep 9, 2017
@kennytm kennytm deleted the osx-backtrace branch September 10, 2017 04:33
@alexcrichton alexcrichton added the relnotes Marks issues that should be documented in the release notes of the next release. label Sep 15, 2017
@mbrubeck
Copy link
Contributor

CC #24346

@behnam
Copy link
Contributor

behnam commented Nov 3, 2017

Could this be the cause of #45731, where on MacOS "cargo test with RUST_BACKTRACE causes SIGBUS"?

@whitequark
Copy link
Member

whitequark commented Jan 8, 2018

FYI I've made gimli #![no_std] compatible a while ago. object is slightly more annoying, but watch gimli-rs/object#45.

@JohnColanduoni
Copy link
Contributor

@whitequark Awesome! I'll try to take a crack at making this work this weekend. I'll sleep easier knowing if I introduced any more bugs like #45731 nobody is likely to find them 😉

@m4b m4b mentioned this pull request Feb 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants