Require hwloc-ohpc for all OSes #1851

Closed pull request, 11 commits

Commits on Jul 27, 2023

  1. Require hwloc-ohpc for all OSes

    Require chkconfig for RHEL and openEuler
    Point to the PMIx installation path
    
    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Jul 27, 2023
    c714e4c
  2. Add BuildRequires for libevent-devel

    Error from GitHub builds:
    
    2023-07-27T11:43:46.9132267Z /bin/sh ../../libtool  --tag=CC   --mode=link gcc  -fPIC   -o pbs_mom pbs_mom-job_attr_def.o pbs_mom-node_attr_def.o pbs_mom-resc_def_all.o pbs_mom-shared_python_utils.o pbs_mom-mom_info.o pbs_mom-attr_recov.o pbs_mom-dis_read.o pbs_mom-jattr_get_set.o pbs_mom-nattr_get_set.o pbs_mom-job_func.o pbs_mom-process_request.o pbs_mom-reply_send.o pbs_mom-req_quejob.o pbs_mom-resc_attr.o pbs_mom-vnparse.o pbs_mom-setup_resc.o pbs_mom-mom_mach.o pbs_mom-mom_start.o pbs_mom-pe_input.o pbs_mom-catch_child.o pbs_mom-job_recov_fs.o pbs_mom-mock_run.o pbs_mom-mom_comm.o pbs_mom-mom_hook_func.o pbs_mom-mom_inter.o pbs_mom-mom_func.o pbs_mom-mom_main.o pbs_mom-mom_updates_bundle.o pbs_mom-mom_pmix.o pbs_mom-mom_server.o pbs_mom-mom_vnode.o pbs_mom-mom_walltime.o pbs_mom-popen.o pbs_mom-prolog.o pbs_mom-requests.o pbs_mom-stage_func.o pbs_mom-start_exec.o pbs_mom-vnode_storage.o pbs_mom-renew_creds.o  ../../src/lib/Libattr/libattr.a ../../src/lib/Liblog/liblog.a ../../src/lib/Libnet/libnet.a ../../src/lib/Libpbs/.libs/libpbs.a ../../src/lib/Libsec/libsec.a ../../src/lib/Libsite/libsite.a ../../src/lib/Libtpp/libtpp.a ../../src/lib/Libutil/libutil.a -L/opt/ohpc/pub/libs/hwloc/lib -Wl,-rpath,/opt/ohpc/pub/libs/hwloc/lib -lhwloc -L/opt/ohpc/admin/pmix/lib -lpmix  -L/usr/lib64 -lpython3.9 -lcrypt -ldl  -lm -lm  -lpython3.9 -lcrypt -ldl  -lm -lm  -lz -lssl -lcrypto   -ldl -lcrypt -lc
    2023-07-27T11:43:46.9138600Z libtool: link: gcc -fPIC -o pbs_mom pbs_mom-job_attr_def.o pbs_mom-node_attr_def.o pbs_mom-resc_def_all.o pbs_mom-shared_python_utils.o pbs_mom-mom_info.o pbs_mom-attr_recov.o pbs_mom-dis_read.o pbs_mom-jattr_get_set.o pbs_mom-nattr_get_set.o pbs_mom-job_func.o pbs_mom-process_request.o pbs_mom-reply_send.o pbs_mom-req_quejob.o pbs_mom-resc_attr.o pbs_mom-vnparse.o pbs_mom-setup_resc.o pbs_mom-mom_mach.o pbs_mom-mom_start.o pbs_mom-pe_input.o pbs_mom-catch_child.o pbs_mom-job_recov_fs.o pbs_mom-mock_run.o pbs_mom-mom_comm.o pbs_mom-mom_hook_func.o pbs_mom-mom_inter.o pbs_mom-mom_func.o pbs_mom-mom_main.o pbs_mom-mom_updates_bundle.o pbs_mom-mom_pmix.o pbs_mom-mom_server.o pbs_mom-mom_vnode.o pbs_mom-mom_walltime.o pbs_mom-popen.o pbs_mom-prolog.o pbs_mom-requests.o pbs_mom-stage_func.o pbs_mom-start_exec.o pbs_mom-vnode_storage.o pbs_mom-renew_creds.o -Wl,-rpath -Wl,/opt/ohpc/pub/libs/hwloc/lib  ../../src/lib/Libattr/libattr.a ../../src/lib/Liblog/liblog.a ../../src/lib/Libnet/libnet.a ../../src/lib/Libpbs/.libs/libpbs.a ../../src/lib/Libsec/libsec.a ../../src/lib/Libsite/libsite.a ../../src/lib/Libtpp/libtpp.a ../../src/lib/Libutil/libutil.a -L/opt/ohpc/pub/libs/hwloc/lib -L/opt/ohpc/admin/pmix/lib /opt/ohpc/admin/pmix/lib/libpmix.so -levent_core -levent_pthreads -lhwloc -L/usr/lib64 -lpython3.9 -lm -lz -lssl -lcrypto -ldl -lcrypt -lc -Wl,-rpath -Wl,/opt/ohpc/admin/pmix/lib -Wl,-rpath -Wl,/opt/ohpc/admin/pmix/lib
    2023-07-27T11:43:46.9140966Z /usr/bin/ld: cannot find -levent_core
    2023-07-27T11:43:46.9141302Z /usr/bin/ld: cannot find -levent_pthreads
    2023-07-27T11:43:46.9141589Z collect2: error: ld returned 1 exit status
    
    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Jul 27, 2023
    9a3610b
  3. Temporarily use mpirun to start jobs in tests

    This way we can pass --mca parameters to get more verbose debug output
    
    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Jul 27, 2023
    8a31b74
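
The linker failure quoted in commit 2 above names the missing libraries only as `-l` flags. As a minimal sketch (the helper name `missing_libs` is hypothetical and not part of this PR), the following filter turns such `ld` errors into the shared-object names to look for when deciding which development package to add to BuildRequires:

```shell
#!/bin/sh
# Hypothetical helper, not part of the PR: read a build log on stdin and
# print the shared objects that ld reported as missing, one per line.
missing_libs() {
    # "ld: cannot find -lfoo" becomes "libfoo.so"; all other lines are dropped.
    sed -n 's/.*ld: cannot find -l\([A-Za-z0-9_.+-]*\).*/lib\1.so/p' | sort -u
}

# The three error lines quoted in the commit message above.
missing_libs <<'EOF'
2023-07-27T11:43:46.9140966Z /usr/bin/ld: cannot find -levent_core
2023-07-27T11:43:46.9141302Z /usr/bin/ld: cannot find -levent_pthreads
2023-07-27T11:43:46.9141589Z collect2: error: ld returned 1 exit status
EOF
```

Both names belong to libevent. On RPM-based distributions the unversioned `.so` symlinks that the linker needs are normally shipped in the `-devel` package, which is consistent with the fix chosen in the commit.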

Commits on Jul 28, 2023

  1. Temporarily disable the changes for this PR

    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Jul 28, 2023
    d8d6d33
  2. Pass "--mpi=pmix" to srun in the tests

    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Jul 28, 2023
    ff98626
  3. Pass '--mpi=pmix' to srun via sbatch

    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Jul 28, 2023
    66f0724
  4. Use 'prun' - it has logic to detect PMIx

    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Jul 28, 2023
    b07a201
  5. Try with PMIx in SIMPLE_CI to see what the error is

    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Jul 28, 2023
    1c7ea49
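
Commits 2 to 4 above all revolve around whether `srun` launches the MPI ranks through its PMIx plugin. A minimal sketch of that kind of detection follows. It is not taken from `prun`; the helper name `srun_mpi_flag` and the binary `./hello_mpi` are hypothetical, and only `srun --mpi=list` and `srun --mpi=pmix` are real Slurm options:

```shell
#!/bin/sh
# Hypothetical helper, not the logic used by prun: read the output of
# "srun --mpi=list" on stdin and print the flag to pass to srun.
# Prints nothing when no PMIx plugin is listed.
srun_mpi_flag() {
    if grep -q 'pmix'; then
        printf '%s\n' '--mpi=pmix'
    fi
}

# Only query Slurm when srun is actually installed.
if command -v srun >/dev/null 2>&1; then
    # stderr is merged because some Slurm versions print the list there.
    flag=$(srun --mpi=list 2>&1 | srun_mpi_flag)
    # ./hello_mpi is a placeholder for the test binary.
    echo "would run: srun $flag ./hello_mpi"
fi
```

Printing the command instead of running it keeps the sketch safe to try on a login node. In a test script the `echo` would be replaced by the real `srun` call.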

Commits on Aug 1, 2023

  1. Disable PMIx for Slurm

    To see how this will affect the SIMPLE_CI tests
    
    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Aug 1, 2023
    cfcf049
  2. Disable PMIx for OpenMPI

    Re-enable PMIx for Slurm
    
    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Aug 1, 2023
    994054c
  3. Disable PMIx for Slurm

    It is already disabled for OpenMPI
    
    Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
    martin-g committed Aug 1, 2023
    23c92ed