Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Histogram refactor #1003

Merged
merged 6 commits into from
Aug 2, 2023
Merged

Conversation

aschaffer
Copy link
Contributor

Refactoring done:

  1. unify the 3 code paths GPU/CPU/OMP into one implementation.
  2. use SFINAE specialization on decltype(ex_pol_obj), which is less likely to change in future CTK releases of Thrust.

@manopapad manopapad added the category:improvement PR introduces an improvement and will be classified as such in release notes label Jul 25, 2023
Copy link
Contributor

@manopapad manopapad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's beautiful. One comment to try and avoid a problematic case from mixing macros and templates.


auto&& [global_result_size, global_result_ptr] = accessors::get_accessor_ptr(result, result_rect);

#ifdef CUDA_STREAM_CHECKER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead guard this with something like

if constexpr (!std::is_same_v<exe_policy_t, std::remove_cv_t<decltype(thrust::host)>> &&
              !std::is_same_v<exe_policy_t, std::remove_cv_t<decltype(thrust::omp::par)>)

and possibly abstract-out this condition into a helper template struct like is_host_policy_v? This may be generally useful if we start reusing this pattern in other operations as well.

Using a macro like this to control the instantiation of a template makes me a bit uneasy. It's conceptually possible that another compilation unit could instantiate histogram_wrapper<thrust::cuda::par_nosync, ...> without defining CUDA_STREAM_CHECKER, then we end up with two instantiations of the same template with different contents.

This is kind of a far-fetched scenario for this particular case, but if it's easy to avoid it and keep the code robust, might as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if constexpr(...) cannot be a substitute for a pre-processor macro, like #ifdef. That's because the compile-time false path of an if constexpr(...) still gets compiled .

For example (see https://en.cppreference.com/w/cpp/language/if):

void f()
{
    if constexpr(false)
    {
        int i = 0;
        int *p = i; // Error even though in discarded statement
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only option to #ifdef CUDA_STREAM_CHECKER is specializing a function template

template <typename exe_policy_t> synchronize(exe_policy_t exe_pol){...}

The problem is function template cannot be partially specialized. They need to be fully specialized. And neither the host nor the gpu policies are unique. That means having to explicitly specialize on every possible thrust execution policy.

But, I can provide a template class to that effect, which can be partially specialized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the compile-time false path of an if constexpr(...) still gets compiled .

Would histogram_wrapper fall into the exception mentioned later in https://en.cppreference.com/w/cpp/language/if?

If a constexpr if statement appears inside a templated entity, and if condition is not value-dependent after instantiation, the discarded statement is not instantiated when the enclosing template is instantiated.

I think it does, by virtue of being a function template.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work in one compiler at least https://godbolt.org/z/v6b4veGrz

Copy link
Contributor Author

@aschaffer aschaffer Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

histogram.cc, and histogram_omp.cc fail to compile for me with the if constexpr(...) approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In file included from /home/aschaffer/Development/Projects/cuNumeric/forked/fea_ext_histogram/histogram_refactor/cunumeric/src/cunumeric/stat/histogram.cc:21:
    /home/aschaffer/Development/Projects/cuNumeric/forked/fea_ext_histogram/histogram_refactor/cunumeric/src/cunumeric/stat/histogram_impl.h: In function 'void cunumeric::detail::histogram_wrapper(exe_policy_t, legate::AccessorRO<elem_t, 1>&, legate::Rect<1>&, legate::AccessorRO<bin_t, 1>&, legate::Rect<1>&, legate::AccessorRO<weight_t, 1>&, legate::Rect<1>&, legate::AccessorRD<Legion::SumReduction<weight_t>, true, 1>&, legate::Rect<1>&, cudaStream_t)':
    /home/aschaffer/Development/Projects/cuNumeric/forked/fea_ext_histogram/histogram_refactor/cunumeric/src/cunumeric/stat/histogram_impl.h:145:52: error: there are no arguments to 'CHECK_CUDA_STREAM' that depend on a template parameter, so a declaration of 'CHECK_CUDA_STREAM' must be available [-fpermissive]
      145 |   if constexpr (!is_host_policy_v<exe_policy_t>) { CHECK_CUDA_STREAM(stream); }
          |                                                    ^~~~~~~~~~~~~~~~~
    /home/aschaffer/Development/Projects/cuNumeric/forked/fea_ext_histogram/histogram_refactor/cunumeric/src/cunumeric/stat/histogram_impl.h:145:52: note: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
    /home/aschaffer/Development/Projects/cuNumeric/forked/fea_ext_histogram/histogram_refactor/cunumeric/src/cunumeric/stat/histogram_impl.h:156:52: error: there are no arguments to 'CHECK_CUDA_STREAM' that depend on a template parameter, so a declaration of 'CHECK_CUDA_STREAM' must be available [-fpermissive]
      156 |   if constexpr (!is_host_policy_v<exe_policy_t>) { CHECK_CUDA_STREAM(stream); }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like I was giving if constexpr too much credit.

But, I can provide a template class to that effect, which can be partially specialized.

Would this still work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can use a sync_policy_t which then can be specialized and used similarly to segmented_sum_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done.

Copy link
Contributor

@manopapad manopapad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, clean code we can be proud of

@aschaffer
Copy link
Contributor Author

Excellent, clean code we can be proud of

Thanks!

@aschaffer aschaffer merged commit 04ff6d1 into nv-legate:branch-23.09 Aug 2, 2023
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:improvement PR introduces an improvement and will be classified as such in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants