ARM CPU feature cleanups #355

Merged
merged 10 commits into from
Mar 17, 2024

ebiggers
Owner

  • checksum_benchmarks.sh: handle adler32_arm_neon_dotprod()
  • lib/arm: move selection of pmull_wide into arm_cpu_features
  • lib/arm: drop the arm32 support for pmull and crc32 instructions
  • lib/arm: simplify by not trying to skip target attributes
  • lib/arm: fix arm64 builds with -march=armv8-a+nosimd
  • lib/arm: centralize the intrinsic header inclusions
  • lib/arm: simplify conditions for detecting intrinsics
  • lib/arm: use asm fallback when clang intrinsics unusable
  • lib/arm: remove unnecessary NATIVE macros

lib/arm: move selection of pmull_wide into arm_cpu_features

Handle the selection of crc32_arm_pmullx12_crc using a CPU feature flag,
similar to X86_CPU_FEATURE_ZMM.  This allows the code to be tested on
platforms other than macOS.
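
A minimal sketch of flag-based selection, to show why it helps testing. The flag names, bit values, enum, and function below are hypothetical and are not the actual arm_cpu_features definitions; the point is only that the choice depends on a feature mask rather than on the platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; names and values are illustrative only. */
#define EXAMPLE_ARM_CPU_FEATURE_CRC32      (1u << 0)
#define EXAMPLE_ARM_CPU_FEATURE_PMULL      (1u << 1)
/* Assumed to be set by detection code where the wide variant is preferred. */
#define EXAMPLE_ARM_CPU_FEATURE_PMULL_WIDE (1u << 2)
#define EXAMPLE_ARM_CPU_FEATURES_ALL       0x7u

enum example_crc32_impl {
	EXAMPLE_CRC32_GENERIC,      /* portable code */
	EXAMPLE_CRC32_CRC_ONLY,     /* crc32 instructions only */
	EXAMPLE_CRC32_PMULLX12_CRC, /* wide pmull combined with crc32 */
};

/* Pick an implementation purely from the feature mask. */
enum example_crc32_impl example_select_crc32(uint32_t features)
{
	const uint32_t wide = EXAMPLE_ARM_CPU_FEATURE_CRC32 |
			      EXAMPLE_ARM_CPU_FEATURE_PMULL |
			      EXAMPLE_ARM_CPU_FEATURE_PMULL_WIDE;

	/* Reject bits this sketch does not know about. */
	assert((features & ~EXAMPLE_ARM_CPU_FEATURES_ALL) == 0);

	if ((features & wide) == wide)
		return EXAMPLE_CRC32_PMULLX12_CRC;
	if (features & EXAMPLE_ARM_CPU_FEATURE_CRC32)
		return EXAMPLE_CRC32_CRC_ONLY;
	return EXAMPLE_CRC32_GENERIC;
}
```

Because the decision is a function of the mask, a test can pass any combination of bits and exercise each branch regardless of the host.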

lib/arm: drop the arm32 support for pmull and crc32 instructions

Drop support for the pmull and crc32 optimized CRC-32 functions when
building for 32-bit ARM.  Not many people care about 32-bit ARM these
days, and these optimizations were always a struggle to keep working on
32-bit due to compiler issues.  They also only ever applied to
processors that support 64-bit too.

lib/arm: simplify by not trying to skip target attributes

As was done in lib/x86/, use the target function attribute even if the
features are available natively, as this has no known downside.

Exception: this cannot be done for plain simd (NEON), since old versions
of clang don't accept the target attribute for it.
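
As an illustration, the following sketch applies a target attribute unconditionally to a CRC-32 function on arm64. All EXAMPLE_* macros and example_* functions are hypothetical. It assumes that gcc and clang 16 or later expose the ACLE CRC intrinsics to any function carrying the attribute, and it calls the accelerated function only when the feature is enabled for the whole build, so no run-time detection is needed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && \
	((defined(__GNUC__) && !defined(__clang__)) || \
	 (defined(__clang__) && __clang_major__ >= 16))
#  include <arm_acle.h>
#  ifdef __clang__
#    define EXAMPLE_CRC_ATTRIBUTES __attribute__((target("crc")))
#  else
#    define EXAMPLE_CRC_ATTRIBUTES __attribute__((target("+crc")))
#  endif
#  define EXAMPLE_HAVE_CRC32_INTRIN 1
#else
#  define EXAMPLE_HAVE_CRC32_INTRIN 0
#endif

/* Portable bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t example_crc32_generic(uint32_t crc, const uint8_t *p, size_t len)
{
	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1)));
	}
	return ~crc;
}

#if EXAMPLE_HAVE_CRC32_INTRIN
/*
 * The attribute is applied whether or not __ARM_FEATURE_CRC32 is defined.
 * When the feature is already enabled globally, it is simply redundant.
 */
EXAMPLE_CRC_ATTRIBUTES
uint32_t example_crc32_arm(uint32_t crc, const uint8_t *p, size_t len)
{
	crc = ~crc;
	while (len--)
		crc = __crc32b(crc, *p++);
	return ~crc;
}
#endif

uint32_t example_crc32(const void *buf, size_t len)
{
	assert(buf != NULL || len == 0);
#if EXAMPLE_HAVE_CRC32_INTRIN && defined(__ARM_FEATURE_CRC32)
	return example_crc32_arm(0, buf, len);
#else
	return example_crc32_generic(0, buf, len);
#endif
}
```

The only place that still looks at the native feature macro is the call site; the function definition itself no longer needs a separate native and non-native variant.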

lib/arm: fix arm64 builds with -march=armv8-a+nosimd

With MSVC it's necessary to assume that arm64 means NEON is available,
but this logic should not be applied generally because gcc and recent
versions of clang support arm64 without NEON.
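
A sketch of the resulting gate, assuming the compiler-provided __ARM_NEON macro is the general signal and MSVC's _M_ARM64 is the special case. EXAMPLE_HAVE_NEON_NATIVE and example_xor16() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
   /* gcc and clang define this only when NEON is actually enabled. */
#  define EXAMPLE_HAVE_NEON_NATIVE 1
#elif defined(_MSC_VER) && defined(_M_ARM64)
   /* MSVC only: assume arm64 implies NEON. */
#  define EXAMPLE_HAVE_NEON_NATIVE 1
#else
   /* Includes arm64 built with +nosimd, where __ARM_NEON is not defined. */
#  define EXAMPLE_HAVE_NEON_NATIVE 0
#endif

#if EXAMPLE_HAVE_NEON_NATIVE
#  include <arm_neon.h>
#endif

/* XOR 16 bytes of src into dst, using NEON only when the gate allows it. */
void example_xor16(uint8_t *dst, const uint8_t *src)
{
	assert(dst != NULL && src != NULL);
#if EXAMPLE_HAVE_NEON_NATIVE
	vst1q_u8(dst, veorq_u8(vld1q_u8(dst), vld1q_u8(src)));
#else
	for (int i = 0; i < 16; i++)
		dst[i] ^= src[i];
#endif
}
```

With this shape, a +nosimd build takes the scalar branch and never includes the NEON header.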

lib/arm: centralize the intrinsic header inclusions

Include all needed intrinsic headers from lib/arm/cpu_features.h so that
includes don't need to be scattered in other places.

lib/arm: simplify conditions for detecting intrinsics

- Don't check *_NATIVE or HAVE_DYNAMIC_ARM_CPU_FEATURES, since
  technically these are orthogonal to intrinsic support.  It's true that
  when building for an operating system that doesn't have runtime CPU
  feature detection enabled, there is no use in using intrinsics except
  when the features are supported natively.  But we can still build the
  code; it just won't be called and will be optimized out as unused.

- Don't place conditions like defined(ARCH_ARM64) and !defined(_MSC_VER)
  on HAVE_SHA3_NATIVE and HAVE_DOTPROD_NATIVE.  These conditions are
  only relevant to intrinsics, not the CPU feature per se.

lib/arm: use asm fallback when clang intrinsics unusable

Instead of manually defining macros like __ARM_FEATURE_CRC32 to get the
intrinsic headers of clang 15 and earlier to work, just use inline
assembly.  This should be a better solution as it does not rely on clang
implementation details as much.

We already used an inline assembly fallback for veor3q_u8 with gcc 8,
and with clang 7 through 12.  This commit extends the same pattern to
the crc32 and dotprod intrinsics, and extends the version range to clang
15.  It also drops gcc 8 from the veor3q_u8 fallback, as that is just a
single major version and not worth enabling the fallback for.
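
The inline assembly pattern can be sketched as below for the byte-wise crc32 instruction. The example_* names and EXAMPLE_USE_CRC32_ASM are hypothetical. To keep the sketch safe to build and run anywhere, it assumes the asm path is taken only when the CRC extension is enabled for the whole build; a real fallback would instead be chosen by compiler version and paired with a target attribute and CPU feature detection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32) && \
	(defined(__GNUC__) || defined(__clang__))
#  define EXAMPLE_USE_CRC32_ASM 1
#else
#  define EXAMPLE_USE_CRC32_ASM 0
#endif

/* One-byte CRC-32 update, same contract as the ACLE __crc32b() intrinsic. */
static inline uint32_t example_crc32b(uint32_t crc, uint8_t b)
{
#if EXAMPLE_USE_CRC32_ASM
	uint32_t res;

	/* Emit the instruction directly instead of relying on <arm_acle.h>. */
	__asm__("crc32b %w0, %w1, %w2" : "=r" (res) : "r" (crc), "r" (b));
	return res;
#else
	/* Portable equivalent so the sketch also runs on other machines. */
	crc ^= b;
	for (int i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1)));
	return crc;
#endif
}

/* Full CRC-32 of a buffer, built on the one-byte primitive. */
uint32_t example_crc32_bytes(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = ~0u;

	assert(buf != NULL || len == 0);
	while (len--)
		crc = example_crc32b(crc, *p++);
	return ~crc;
}
```

Callers see the same signature either way, so the rest of the code does not need to know whether the compiler's header provided the intrinsic.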

lib/arm: remove unnecessary NATIVE macros

Since most of the uses of the HAVE_*_NATIVE macros have been removed,
and most of them provide no additional value over the original
compiler-provided macro like __ARM_FEATURE_CRC32 anyway, there's not
much point in having them anymore.  Remove them, except for
HAVE_NEON_NATIVE which is still worthwhile to have.
@ebiggers ebiggers merged commit 45a5de7 into master Mar 17, 2024
52 checks passed
@ebiggers ebiggers deleted the dev branch March 17, 2024 19:19