parallel_scheduler for P2079 Section 4.1 User Facing API (ref #6601)#6655
parallel_scheduler for P2079 Section 4.1 User Facing API (ref #6601)#6655charan-003 wants to merge 19 commits into
Conversation
|
Can one of the admins verify this patch? |
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
cb9d1a7 to
1a77a29
Compare
|
Update:
@hkaiser — ready for re-review when convenient. Thanks for the guidance! |
|
@hkaiser Can you please verify it now? I'm not sure why those 2 tests are failing. |
hkaiser
left a comment
There was a problem hiding this comment.
Wow, I'm impressed! I have a couple of minor comments, though.
|
Also, you may want to move all types that are don't have to visible by users into |
Sure, I'll do that soon |
|
i'm not sure why it's failing for clang_format. i used |
|
@hkaiser The custom bulk chunking might duplicate functionality in Offers a parallel execution option that leverages HPX’s thread pool, maintaining high performance for CPU-bound tasks. While this version lacks P2079R7’s replaceability API (section 4.2) and many more, it could be extended to support it ( will try to work on this :)) |
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
The CI uses an older version of clang-format that does not always agree with newer versions on how to format things. Simply surround the offending lines with to disable it checking those. |
|
@hkaiser @isidorostsa Implement parallel_scheduler wrapping thread_pool_policy_schedulerhpx::launch, aligning with P2079R7 user-facing API:
replaceability API , bulk operations is the next thing i will keep working on. |
|
@hkaiser @isidorostsa this version is using stdexec I'm not sure of the failing test case, any hint on the failing cases ? |
9803589 to
3b4121f
Compare
|
@hkaiser @isidorostsa |
Performance test reportHPX PerformanceComparison
Info
Comparison
Info
Comparison
Info
Explanation of Symbols
|
Performance test reportHPX PerformanceComparison
Info
Comparison
Info
Comparison
Info
Explanation of Symbols
|
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 3 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
|
Just FYI: https://wg21.link/p4031 |
|
@hkaiser @isidorostsa |
|
@charan-003 Could you please resolve the conflicts. I will review the PR once you've done that. |
ab0fc9b to
8623109
Compare
|
@hkaiser a signature mismatch for bulk_t: HPX uses |
3044549 to
dd64808
Compare
| hpx::execution::experimental::get_completion_signatures_t, | ||
| transform_mpi_sender const&, Env const&) | ||
| -> hpx::execution::experimental::transform_completion_signatures_of< | ||
| -> stdexec::__transform_completion_signatures_of_t< |
There was a problem hiding this comment.
This is not the right way of fixing this as it depends on internals of stdexec. Let's apply a correct fix to our forwarding header instead (see also #7252 for a potential different way of addressing this).
|
Inspect is not happy again: |
| [&]() { measure_parallel_foreach(data_representation, exec); }); | ||
| } | ||
|
|
||
| #if defined(HPX_HAVE_STDEXEC) |
There was a problem hiding this comment.
HPX_HAVE_STDEXEC is always defined.
|
|
||
| #if defined(HPX_HAVE_STDEXEC) | ||
| // Forward declaration for parallel_scheduler_domain | ||
| class parallel_scheduler; |
There was a problem hiding this comment.
Please mark all symbols that should be visible outside of HPX with HPX_CXX_CORE_EXPORT to ensure those are accessible through C++ module BMIs.
| { | ||
| return *sched.get_pu_mask(); | ||
| } | ||
| }; |
There was a problem hiding this comment.
Why do you need a helper to extract the parameters if those are already exposed through the corresponding customization points?
| using params_type = | ||
| detail::thread_pool_params<std::decay_t<BaseScheduler>>; | ||
| auto* pool = params_type::pool(exec.sched_); | ||
| auto first_core = params_type::first_core(exec.sched_); | ||
| auto num_cores = params_type::num_cores(exec.sched_); | ||
| auto const& policy = params_type::policy(exec.sched_); | ||
| auto mask = params_type::pu_mask(exec.sched_); | ||
|
|
||
| return hpx::parallel::execution::detail:: | ||
| index_queue_bulk_async_execute(pool, first_core, | ||
| num_cores, policy, HPX_FORWARD(F, f), shape, mask, | ||
| HPX_FORWARD(Ts, ts)...); |
There was a problem hiding this comment.
This snippet could be extracted into a new function, which should simplify a lot of the code below.
| @@ -106,18 +140,38 @@ namespace hpx::execution::experimental { | |||
| hpx::execution::experimental::stdexec_internal::__sender_for< | |||
There was a problem hiding this comment.
Please move all stdexec specific to the forwarding header to simplify future use of standard library implementations.
Fixes #6601
Implement P2079R7 parallel_scheduler in HPX Core
This issue tracks the implementation of
hpx::execution::experimental::parallel_schedulerin HPX core, aligning with P2079R7 (Parallel scheduler, Section 4.1) by wrappingthread_pool_policy_schedulerwithhpx::launchpolicies. The goal is to provide a standards-compliant parallel scheduler with global access, cancellation support, task chaining, and bulk operations.Implementation Checklist
Phase 1: Core Implementation
Implement
hpx::execution::experimental::parallel_schedulerclass per P2079R7 Section 4.1.Add
schedule()returningparallel_senderwithsender_conceptandcompletion_signatures:Supports
set_value_t(),set_stopped_t(),set_error_t(std::exception_ptr).Wrap
thread_pool_policy_schedulerwithhpx::launch(async/sync policies).Implement
noexceptmove/copy constructors and assignment operators.Add
operator==returningtruefor scheduler comparison.Return
forward_progress_guarantee::parallelviaget_forward_progress_guarantee_t.Implement
thenoperation for task chaining withparallel_sender.Code to rely on
STDEXEC.Phase 2: Bulk Support
Extend
bulk_tcustomization to delegate tothread_pool_scheduler_bulk.Implement
bulk_senderfor parallel bulk execution per P2079R7.Support
bulk_chunkedandbulk_unchunkedoperations.Phase 3: Replaceability API
Implement
std::execution::system_context_replaceabilitynamespace per P2079R7 Section 4.2.Add
query_parallel_scheduler_backend()returningshared_ptr<parallel_scheduler>.Support link-time replaceability using weak symbols.
Implement
receiverandbulk_item_receiverinterfaces for frontend-backend interaction.Ensure
storagehandling for scheduling operations.P3927R2 extends C++26's task_scheduler (a type-erased scheduler wrapper) to support efficient parallel bulk operations. Previously, bulk work executed through task_scheduler would run sequentially even when wrapping a parallel_scheduler, defeating the purpose of parallel execution.
P3804R2 refines the parallel_scheduler specification (from P2079) to clarify execution policy semantics and eliminate implementation ambiguities