Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Enable more warning flags. #249

Merged
merged 11 commits into from
Feb 16, 2021

Conversation

alliepiper
Copy link
Collaborator

Fixes #228.

@alliepiper alliepiper marked this pull request as draft December 11, 2020 02:44
@alliepiper alliepiper added this to the 1.12.0 milestone Dec 11, 2020
@alliepiper alliepiper linked an issue Jan 9, 2021 that may be closed by this pull request
@alliepiper alliepiper force-pushed the enh/pedantic_flags/gh.cub228 branch 2 times, most recently from 06366b2 to 8983abe Compare January 23, 2021 02:55
@alliepiper alliepiper marked this pull request as ready for review January 27, 2021 19:26
append_option_if_available("-Wgnu" cxx_compile_options)
# Calling a variadic macro with zero args is a GNU extension until C++20,
# but the THRUST_PP_ARITY macro is used with zero args. Need to see if this
# is a real problem worth fixing.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be a problem as every compiler under the sun supports the extension, including NVRTC I believe.

@brycelelbach
Copy link
Collaborator

brycelelbach commented Feb 8, 2021

You should re-run this with CUB in -DDEBUG debugging mode - you'll hit a LOT more of the assignment in conditional warning.

@brycelelbach
Copy link
Collaborator

The whole if (CubDebug(error = ...)) break pattern should probably be encapsulated in a macro.

this->block_end = num_items_; // Initialize past-the-end
this->num_items = num_items_;
this->total_tiles = (num_items_ + tile_items - 1) / tile_items;
this->grid_size = CUB_MIN(static_cast<int>(total_tiles), max_grid_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could overflow, because total_tiles is OffsetT which could be 64bit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be related to #221.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer us to either make sure all the static casts don't overflow, or move them to a separate PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this and similar issues in the thrust/cub kernel dispatch code with a new cub::DivideAndRoundUp method that avoids these (n + d - 1) / d overflows.

OffsetT avg_tiles_per_block = total_tiles / grid_size;
this->big_shares = total_tiles - (avg_tiles_per_block * grid_size); // leftover grains go to big blocks
// leftover grains go to big blocks
this->big_shares = static_cast<int>(total_tiles - (avg_tiles_per_block * grid_size));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason big_share can't be an OffsetT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a closer look and if anything, total_tiles should be changed to a 32-bit int since it's used as a block dimension and those can't exceed INT32_MAX. The only usages of the variable assume it is a 32-bit int.

I'll change this so that total_tiles is appropriately sized, which will eliminate the need for this (and a few other) static_casts.

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would deprecate or remove this entirely.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -114,7 +114,7 @@ struct half_t
{ /* normal */
ir |= ia >> (24 - 11);
ia = ia << (32 - (24 - 11));
ir = ir + ((14 + shift) << 10);
ir = static_cast<uint16_t>(ir + ((14 + shift) << 10));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably okay because it's just in the test code, but I'm pretty sure this is actually a bug due to potential overflow.

TestDispatch<T, OffsetT, LengthT>(num_items);
unsigned int num;
RandomBits(num);
num = (unsigned int) ((double(num) * double(10000000)) / double(max_int));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're changing this already, maybe static_cast here instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

TestOp<InputT>(num_items, identity, initial_value);
unsigned int num;
RandomBits(num);
num = (unsigned int) ((double(num) * double(10000000)) / double(max_int));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe static_cast here as a drive-by fix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@brycelelbach brycelelbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine, I'd prefer to split the static_cast into a separate PR, or make sure they don't overflow, maybe add a comment to each explaining why they're fine.

@alliepiper
Copy link
Collaborator Author

You should re-run this with CUB in -DDEBUG debugging mode - you'll hit a LOT more of the assignment in conditional warning.

Updated #1175 with this recommendation.

@alliepiper
Copy link
Collaborator Author

This looks fine, I'd prefer to split the static_cast into a separate PR, or make sure they don't overflow, maybe add a comment to each explaining why they're fine.

After fixing #221, I reviewed the remaining usages and all are safe. Added comments to any non-obviously-safe ones.

Anonymous structs are C features. In C++, they're non-portable
compiler extensions.

These only seemed to pop up in CUB-style `TempStorage` objects, I just
picked some reasonable sounding names for them.
We need to deprecate this class since the underlying CUDA APIs
are deprecated. This suppression is a temporary workaround.

Tracked by NVIDIA#191.
Changing to a `static_cast` fixes this warning.
The `cuda_std_17` compile feature is broken for MSVC when
CMake < 3.18.3.
Users have been reporting that device algorithms return invalid
`temp_storage_bytes` values when `num_items` is close to -- but
not over -- INT32_MAX.

This is caused by an overflow in the numerator of the pattern
`num_tiles = (num_items + items_per_tile - 1) / items_per_tile`.

The new function implements the same calculation but protects against
overflow.

Fixes NVIDIA#221.
Bug 3075796
This value will always be representable with an int, and all usages
of it treat it as a 32-bit int. Changing the type avoids some casts
at usage sites.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants