Assertion failure in checkResultsOfRunWorker in RelVals 1001.3, 1040.1, 1041.0, 1042.0 #44887

Closed
iarspider opened this issue May 2, 2024 · 12 comments · Fixed by #44891

Comments

@iarspider
Contributor

Probably an expected result of merging #44767:

#0  0x000015084d7ac301 in poll () from /lib64/libc.so.6
#1  0x0000150848ed811f in full_read.constprop () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-01-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x0000150848e8c5dc in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-01-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  0x0000150848e8cf40 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-01-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x000015084d6d6acf in raise () from /lib64/libc.so.6
#6  0x000015084d6a9ea5 in abort () from /lib64/libc.so.6
#7  0x000015084d6a9d79 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#8  0x000015084d6cf426 in __assert_fail () from /lib64/libc.so.6
#9  0x0000150850162740 in edm::Path::workerFinished(std::__exception_ptr::exception_ptr const*, unsigned int, edm::EventTransitionInfo const&, edm::ServiceToken const&, edm::StreamID const&, edm::StreamContext const*, tbb::detail::d1::task_group&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-01-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#10 0x0000150850162a52 in edm::FunctorWaitingTask<edm::Path::runNextWorkerAsync(unsigned int, edm::EventTransitionInfo const&, edm::ServiceToken const&, edm::StreamID const&, edm::StreamContext const*, tbb::detail::d1::task_group&)::{lambda(std::__exception_ptr::exception_ptr const*)#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-01-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#11 0x0000150850315f28 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-05-01-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#12 0x000015084e91695b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x15084bef2d00, waiter=..., this=0x15084bfc9500) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre3-el8_amd64_gcc12/build/CMSSW_14_1_0_pre3-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#13 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x15084bfc9500) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre3-el8_amd64_gcc12/build/CMSSW_14_1_0_pre3-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#14 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre3-el8_amd64_gcc12/build/CMSSW_14_1_0_pre3-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/tbb-v2021.9.0/src/tbb/arena.cpp:137
#15 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre3-el8_amd64_gcc12/build/CMSSW_14_1_0_pre3-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/tbb-v2021.9.0/src/tbb/market.cpp:599
#16 0x000015084e918b0e in tbb::detail::r1::rml::private_worker::run (this=0x150849874000) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre3-el8_amd64_gcc12/build/CMSSW_14_1_0_pre3-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#17 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x150849874000) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre3-el8_amd64_gcc12/build/CMSSW_14_1_0_pre3-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#18 0x000015084da551ca in start_thread () from /lib64/libpthread.so.0
#19 0x000015084d6c1e73 in clone () from /lib64/libc.so.6

(full logs: 1001.3, 1040.1, 1041.0, 1042.0)

@cmsbuild
Contributor

cmsbuild commented May 2, 2024

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented May 2, 2024

A new Issue was created by @iarspider.

@Dr15Jones, @makortel, @rappoccio, @antoniovilela, @smuzaffar, @sextonkennedy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@iarspider
Contributor Author

assign core

@cmsbuild
Contributor

cmsbuild commented May 2, 2024

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

makortel commented May 2, 2024

Likely caused by #44767, #44840, or their interplay. #44767 seems to play the more important role, so I opened #44889 to revert it until the cause is understood.

@makortel
Contributor

makortel commented May 2, 2024

I closed the aforementioned revert PR because we came up with a proper fix for #44888 in #44891. Further testing of #44891 shows that it does not fix this problem.

@makortel
Contributor

makortel commented May 2, 2024

@wddgit Could you take a look? This problem seems to be caused by #44840.

@wddgit
Contributor

wddgit commented May 2, 2024

I'll look at this immediately after I finish lunch.

@makortel
Contributor

makortel commented May 2, 2024

I inspected workflow 1001.3 step 3 with gdb (with #44891 included). What happens is that the exception added in #44767 gets thrown:

(gdb) where
#0  0x00007ffff44412f1 in __cxxabiv1::__cxa_throw (obj=0x6160035f8600, tinfo=0x7ffff63364b0 <typeinfo for edm::Exception>, dest=0x7ffff6212490 <edm::Exception::~Exception()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1  0x00007ffff6d994ba in edm::EventSelector::initPathNames(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) [clone .cold] ()
   from /build/mkortela/debug/issue44888/CMSSW_14_1_ASAN_X_2024-05-01-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#2  0x00007ffff705e3cc in edm::EventSelector::acceptEvent (this=this@entry=0x6120003c2ac0, tr=...) at src/FWCore/Framework/src/EventSelector.cc:246
#3  0x00007ffff74b37a1 in edm::detail::NamedEventSelector::match (product=..., this=0x6120003c2a40) at src/FWCore/Framework/interface/TriggerResultsBasedEventSelector.h:35
#4  edm::detail::TriggerResultsBasedEventSelector::wantEvent (this=this@entry=0x6030026319d0, ev=...) at src/FWCore/Framework/src/TriggerResultsBasedEventSelector.cc:118
#5  0x00007ffff71ae18b in edm::core::OutputModuleCore::prePrefetchSelection (this=<optimized out>, id=..., ep=..., mcc=<optimized out>)
    at src/FWCore/Framework/src/OutputModuleCore.cc:294
#6  0x00007ffff74c431e in operator() (__closure=<synthetic pointer>) at src/FWCore/Framework/src/Worker.cc:168
#7  edm::convertException::wrap<edm::Worker::prePrefetchSelectionAsync(tbb::detail::d1::task_group&, edm::WaitingTask*, const edm::ServiceToken&, edm::StreamID, const edm::EventPrincipal*)::<lambda(const std::__exception_ptr::exception_ptr*)>::<lambda()> > (iFunc=...)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-01-2300/src/FWCore/Utilities/interface/ConvertException.h:21
#8  operator() (__closure=0x60800357ce40) at src/FWCore/Framework/src/Worker.cc:167
#9  0x00007ffff635c3bf in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) ()
   from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-01-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#10 0x00007ffff4ca4281 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff07cbe00)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre3-el8_amd64_gcc12/build/CMSSW_14_1_0_pre3-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#11 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff07cbe00)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre3-el8_amd64_gcc12/build/CMSSW_14_1_0_pre3-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#12 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre3-el8_amd64_gcc12/build/CMSSW_14_1_0_pre3-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#13 0x00007ffff6f51acc in tbb::detail::d1::wait (ctx=..., wait_ctx=...)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/include/oneapi/tbb/detail/_task.h:197
#14 tbb::detail::d1::task_group_base::wait()::{lambda()#1}::operator()() const (__closure=0x7ffffffefc70)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/include/oneapi/tbb/task_group.h:582
#15 tbb::detail::d0::try_call_proxy<tbb::detail::d1::task_group_base::wait()::{lambda()#1}>::on_completion<tbb::detail::d1::task_group_base::wait()::{lambda()#2}>(tbb::detail::d1::task_group_base::wait()::{lambda()#2}) (on_completion_body=..., this=0x7ffffffefc70)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/include/oneapi/tbb/detail/_template_helpers.h:230
#16 tbb::detail::d1::task_group_base::wait (this=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/include/oneapi/tbb/task_group.h:583
#17 edm::FinalWaitingTask::wait (this=this@entry=0x7ffffffeff70)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-01-2300/src/FWCore/Concurrency/interface/FinalWaitingTask.h:42
#18 0x00007ffff6ef22e3 in edm::EventProcessor::processRuns (this=0x61a0001a5880) at src/FWCore/Framework/src/EventProcessor.cc:1207
#19 0x00007ffff6f239fe in edm::(anonymous namespace)::RunsInFileProcessor::processRuns (iEP=..., this=0x7fffffff0308) at src/FWCore/Framework/src/TransitionProcessors.icc:84
#20 edm::(anonymous namespace)::FilesProcessor::processFiles (iEP=..., this=0x7fffffff0300) at src/FWCore/Framework/src/TransitionProcessors.icc:115
#21 operator() (__closure=<synthetic pointer>) at src/FWCore/Framework/src/EventProcessor.cc:938
#22 edm::convertException::wrap<edm::EventProcessor::runToCompletion()::<lambda()> > (iFunc=...)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-01-2300/src/FWCore/Utilities/interface/ConvertException.h:21
#23 edm::EventProcessor::runToCompletion (this=0x61a0001a5880) at src/FWCore/Framework/src/EventProcessor.cc:927
#24 0x000000000040bb65 in operator() (__closure=0x7fffffff18c0) at src/FWCore/Framework/bin/cmsRun.cpp:283
#25 tbb::detail::d1::task_arena_function<main(int, char const**)::<lambda()>::<lambda()>, void>::operator()(void) const (this=<optimized out>)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/include/oneapi/tbb/task_arena.h:68
#26 0x00007ffff4c909ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...)
    at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_1_0_pre3-el8_amd64_gcc12/build/CMSSW_14_1_0_pre3-build/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/tbb-v2021.9.0/src/tbb/arena.cpp:688
#27 0x000000000040f71b in tbb::detail::d1::task_arena::execute_impl<void, main(int, char const**)::<lambda()>::<lambda()> > (f=..., this=0x7fffffff1870)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/include/oneapi/tbb/task_arena.h:250
#28 tbb::detail::d1::task_arena::execute<main(int, char const**)::<lambda()>::<lambda()> > (f=..., this=0x7fffffff1870)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02835/el8_amd64_gcc12/external/tbb/v2021.9.0-5849be8e21b090e14f1b189539cee138/include/oneapi/tbb/task_arena.h:403
#29 operator() (__closure=0x7fffffff20a0) at src/FWCore/Framework/bin/cmsRun.cpp:262
#30 0x00000000004083b5 in edm::convertException::wrap<main(int, char const**)::<lambda()> > (iFunc=...)
    at /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ASAN_X_2024-05-01-2300/src/FWCore/Utilities/interface/ConvertException.h:19
#31 main (argc=<optimized out>, argv=<optimized out>) at src/FWCore/Framework/bin/cmsRun.cpp:104

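and after that the assertion in checkResultsOfRunWorker (the function named in the issue title, which appears under edm::Path::workerFinished in the first backtrace) fails.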

So on closer inspection it does not seem to be caused by #44840 (unless there is some deeper connection).
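
The sequence above (an exception thrown during the pre-prefetch selection step, followed by a failed assertion when the path checks the worker's result) can be illustrated with a small toy model. This is only a sketch of the general failure pattern under stated assumptions: `ToyWorker`, `State`, `runWorker`, and `stateIsHandled` are hypothetical names invented for this example, not CMSSW classes or functions, and the sketch does not claim to reproduce the actual framework logic or the change made in #44891.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical, simplified stand-ins for illustration only; not the CMSSW types.
enum class State { Ready, Pass, Fail, Exception };

struct ToyWorker {
  State state = State::Ready;
  bool selectionThrows = false;

  // Stand-in for a selection step that runs before the worker itself
  // and may throw.
  void prePrefetchSelection() const {
    if (selectionThrows) {
      throw std::runtime_error("selection failed");
    }
  }

  // Stand-in for the worker's actual work; on success the state becomes Pass.
  void run() { state = State::Pass; }
};

// Runs the selection step and then the worker. Returns the captured exception,
// or a null exception_ptr on success. If recordFailure is false, a failure in
// the selection step leaves the worker's state untouched (still Ready).
std::exception_ptr runWorker(ToyWorker& w, bool recordFailure) {
  try {
    w.prePrefetchSelection();
    w.run();
    return std::exception_ptr{};
  } catch (...) {
    if (recordFailure) {
      w.state = State::Exception;
    }
    return std::current_exception();
  }
}

// Mimics a result check that only expects "finished" states. A real
// implementation might assert in the default branch instead of returning false.
bool stateIsHandled(const ToyWorker& w) {
  switch (w.state) {
    case State::Pass:
    case State::Fail:
    case State::Exception:
      return true;
    default:
      return false;
  }
}
```

In this toy model, calling `runWorker(w, false)` on a worker whose selection step throws leaves `w.state` at `State::Ready`, so `stateIsHandled(w)` returns false. That is the analogue of a check that hits its assertion. Recording the failure in the worker's state (`recordFailure == true`) is one way the check could be kept consistent; whether the real fix works this way is not stated in this thread.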

@makortel
Contributor

makortel commented May 2, 2024

This problem is now fixed in #44891.

@makortel
Contributor

makortel commented May 3, 2024

+core

@cmsbuild
Contributor

cmsbuild commented May 3, 2024

This issue is fully signed and ready to be closed.
