Initial implementation of np.unique(return_index=True)#1138
Initial implementation of np.unique(return_index=True)#1138JosephGuman wants to merge 3 commits intonv-legate:branch-24.03from
Conversation
… Joseph Guman <joeytg@stanford.edu>
rohany
left a comment
There was a problem hiding this comment.
In general, please add more comments to the code, especially in places where you have changed some logic in a non-trivial manner. Another good place for comments is when you introduce (reasonable) code duplication -- a small comment saying why this is the case is good for future readers.
cunumeric/array.py
Outdated
| """ | ||
| thunk = self._thunk.unique() | ||
| return ndarray(shape=thunk.shape, thunk=thunk) | ||
| thunk = self._thunk.unique(return_index) |
There was a problem hiding this comment.
[nit] -- i wouldn't call this thunk anymore, since when return_index=True it's not a thunk anymore, it's a tuple of thunks.
| task.add_scalar_arg(return_index, ty.bool_) | ||
|
|
||
| result = None | ||
| # Assuming legate core will always choose GPU variant |
There was a problem hiding this comment.
[nit]: Add a comment as to why the branches here are creating the thunk in different ways (i.e. different types)
There was a problem hiding this comment.
Extra nit -- comments should be complete sentences, so punctuation and periods please.
| if return_index: | ||
| returned_indices = self.runtime.create_unbound_thunk(ty.int64) | ||
| if self.runtime.num_gpus > 0: | ||
| task.add_output(returned_indices.base) |
There was a problem hiding this comment.
again, comment this asymmetry between the CPU and GPU implementations
| ) | ||
| if return_index: | ||
| task = self.context.create_auto_task(CuNumericOpCode.UNZIP) | ||
| task.add_input(result.base) |
There was a problem hiding this comment.
this is an unclear way of writing the code, adding result.base as input, overwriting result, and then adding result.base as output.
| int64_t index; | ||
| }; | ||
|
|
||
| // Surprisingly it seems as though thrust can't figure out this comparison |
There was a problem hiding this comment.
You might not need this if you either:
- implement comparison overloads on
ZippedIndex - use a builtin type like
std::pairinstead, which has equality implemented already
|
|
||
| if (return_index) { | ||
| indices.first.destroy(); | ||
| other_index.first.destroy(); |
There was a problem hiding this comment.
Should you be early releasing this above like other_piece?
There was a problem hiding this comment.
I originally wanted to destroy the buffers here largely for the sake of clean code to minimize the number of special blocks for the return_index case. Are you saying we should free these buffers closer to where we release my_piece and other_piece to potentially free up space before
my_piece.first = output.create_output_buffer<VAL, 1>(buf_size);
in line 182?
| sizeof(int64_t) * my_piece.second, | ||
| cudaMemcpyDeviceToDevice, | ||
| stream)); | ||
| merged_index.destroy(); |
There was a problem hiding this comment.
I do not think it is safe for you to do a cudaMemcpyAsync and then destroy merged_index before the cudaMemcpyAsync is guaranteed to be done.
There was a problem hiding this comment.
Updating here it is presently safe to do this in Legate as all operations on a given GPU use the same stream. Since this is a tree reduction I think it's probably best to destroy without manually synchronizing to avoid excess latency and duplicating data across multiple iterations of the reduction.
| case CUNUMERIC_MATVECMUL: | ||
| case CUNUMERIC_UNIQUE_REDUCE: { | ||
| case CUNUMERIC_UNIQUE_REDUCE: | ||
| case CUNUMERIC_UNZIP_INDICES: { |
There was a problem hiding this comment.
based on looking at the implementation of unzip_indices below, you don't need any special mapping for it and should fall to the default case at the bottom here.
…age when unzipping indices
for more information, see https://pre-commit.ci
| task.add_scalar_arg(return_index, ty.bool_) | ||
|
|
||
| result = None | ||
| # Assuming legate core will always choose GPU variant |
There was a problem hiding this comment.
Extra nit -- comments should be complete sentences, so punctuation and periods please.
PR motivating nv-legate/legate#942
I also decided to create a special task for unzipping indices and values as opposed to modifying legate.core's Reduce implementation after speaking with @magnatelee .
@rohany