
How to get FW syndrome when using DEVX

Artemy-Mellanox edited this page Jan 5, 2022 · 3 revisions

When you see a UCX error like UCX ERROR mlx5dv_xxx, in many cases it means the error was reported by the firmware (FW). Every such error carries a syndrome code that identifies the cause precisely. Currently, the only way to retrieve this syndrome code is to enable dynamic debug in the mlx5 driver. Here is my recipe:

  • echo 'func mlx5_cmd_check +p' | sudo tee /sys/kernel/debug/dynamic_debug/control - enable dynamic debug
  • sudo dmesg -c > /dev/null - clear dmesg
  • run the command that produces the error
  • sudo dmesg | tee dmesg.log - capture dmesg (sudo is needed when kernel.dmesg_restrict is set); the log should contain the syndrome code of the failed DEVX command
  • echo 'func mlx5_cmd_check -p' | sudo tee /sys/kernel/debug/dynamic_debug/control - disable dynamic debug
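The steps above can be wrapped in a small sourceable helper so dynamic debug is always switched off again after the failing command. This is a minimal sketch, assuming a POSIX shell, debugfs mounted at /sys/kernel/debug, a running mlx5_core driver that still has a function named mlx5_cmd_check, and a kernel log line that contains the text "syndrome (0x...)". The file name mlx5_syndrome.sh, the function names, and the dmesg.log output file are illustrative only; they are not part of UCX or the driver.

```shell
#!/bin/sh
# mlx5_syndrome.sh: hypothetical helper wrapping the recipe above.
# Source it (. ./mlx5_syndrome.sh), then run: capture_syndrome <command> [args...]
# Assumptions: debugfs at /sys/kernel/debug, mlx5_core has mlx5_cmd_check,
# and the failure line contains "syndrome (0x...)". Not a UCX API.

DYNDBG_CONTROL=/sys/kernel/debug/dynamic_debug/control

# Toggle the debug print in mlx5_cmd_check: "+p" enables it, "-p" disables it.
mlx5_cmd_debug() {
    echo "func mlx5_cmd_check $1" | sudo tee "$DYNDBG_CONTROL" > /dev/null
}

# Read kernel log text on stdin and print each syndrome code found, one per line.
extract_syndromes() {
    sed -n 's/.*syndrome (\(0x[0-9a-fA-F]*\)).*/\1/p'
}

# Run the failing command with dynamic debug on, save the log to dmesg.log,
# turn dynamic debug off again, then print the syndrome codes.
capture_syndrome() {
    if [ "$#" -eq 0 ]; then
        echo "usage: capture_syndrome <command> [args...]" >&2
        return 2
    fi
    mlx5_cmd_debug +p || return 1
    sudo dmesg -c > /dev/null          # clear the kernel ring buffer
    "$@"                               # the command is expected to fail
    capture_rc=$?
    sudo dmesg > dmesg.log             # keep the full log for later inspection
    mlx5_cmd_debug -p                  # disable dynamic debug even if the command failed
    extract_syndromes < dmesg.log
    return "$capture_rc"
}
```

For example, source the file and run capture_syndrome followed by your failing UCX program and its arguments. Keeping the full dmesg.log next to the extracted codes lets you check the opcode and status on the same line, since the exact message format may differ between kernel versions.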

