Add multipart parallel upload support for s3 (#26)
Conversation
```python
kwargs['Metadata'] = metadata
mpu = self.s3_client.create_multipart_upload(Bucket=bucket, Key=key, **kwargs)

size = src_file_handle.getbuffer().nbytes
```

getbuffer() appears to only exist for io.BytesIO.

Thanks, I'll make size a parameter.
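A minimal sketch of what making size a parameter could involve; the helper below is illustrative (not the PR's code) and shows why getbuffer() alone doesn't generalize beyond io.BytesIO:

```python
# Illustrative helper, assuming the caller may pass any binary file-like
# object: getbuffer() is a BytesIO-only fast path, so other handles fall
# back to seek/tell.
import io
import os
import typing


def handle_size(handle: typing.BinaryIO) -> int:
    if isinstance(handle, io.BytesIO):
        return handle.getbuffer().nbytes  # only BytesIO exposes getbuffer()
    pos = handle.tell()                   # remember the caller's position
    size = handle.seek(0, os.SEEK_END)    # seek returns the new offset
    handle.seek(pos)                      # restore the position
    return size
```

With size passed in explicitly, the upload method no longer needs to care what kind of handle it received.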
```python
part_size: int,
content_type: str=None,
metadata: dict=None,
parallelization_factor=8) -> typing.Sequence[dict]:
```

Suggested change:

```diff
-parallelization_factor=8) -> typing.Sequence[dict]:
+parallelization_factor: int=8,
+) -> typing.Sequence[dict]:
```
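Applying the suggestion to the rest of the signature visible in the diff gives roughly the following sketch; the leading parameters and their order are assumptions from the surrounding hunks, not confirmed by the PR:

```python
import typing


def multipart_parallel_upload(
        self,
        src_file_handle,
        bucket: str,
        key: str,
        part_size: int,
        content_type: str=None,
        metadata: dict=None,
        parallelization_factor: int=8,
) -> typing.Sequence[dict]:
    ...
```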
```python
metadata: dict=None,
parallelization_factor=8) -> typing.Sequence[dict]:
"""
Upload a file object in parallel.
:param bucket:
"""
kwargs: dict = dict()
```

Is this type annotation actually useful? I would assume that kwargs = dict() conveys the same information. Possibly this may be more useful:

Suggested change:

```diff
-kwargs: dict = dict()
+kwargs: Dict[str, Any] = dict()
```
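For completeness, the suggested annotation also needs the typing imports; a minimal sketch, with an illustrative Metadata value:

```python
from typing import Any, Dict

kwargs: Dict[str, Any] = dict()
kwargs['Metadata'] = {"example": "value"}  # illustrative; mirrors the diff above
```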
```python
raise BlobNotFoundError(f"Could not find s3://{bucket}/{key}") from ex
raise BlobStoreUnknownError(ex)

def multipart_parallel_upload(
```

The biggest design concern I have with this is that if you eventually support a GCP multipart upload, this API is not particularly generalized. It might be worth studying what parameters need to be passed in for a GCP multipart upload, and moving the common parameters to the front and the S3-specific ones to the end.
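One possible shape for that generalization, purely as a sketch of this suggestion (the class and parameter names are illustrative, not the PR's API):

```python
# Cloud-agnostic parameters come first; anything provider-specific
# (e.g. S3's content_type and metadata) is collected at the end.
import typing


class BlobStore:
    def multipart_parallel_upload(
            self,
            src_file_handle: typing.BinaryIO,
            bucket: str,
            key: str,
            size: int,
            part_size: int,
            parallelization_factor: int = 8,
            **provider_specific_options,
    ) -> typing.Sequence[dict]:
        raise NotImplementedError()
```

A GCP implementation could then accept the same leading parameters and interpret provider_specific_options on its own terms.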
```python
part_size,
)

part_size = 5 * 1024 * 1024
with self.subTest("should work parallelization factor of 1"):
```

Not sure how useful a test this is...
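For reference, a sketch of how a subTest loop might cover several factors, including the degenerate factor of 1; the names and the commented-out upload call are illustrative, not the PR's test code:

```python
import io
import unittest


class TestMultipartParallelUpload(unittest.TestCase):
    def test_parallelization_factors(self):
        part_size = 5 * 1024 * 1024           # S3's minimum part size
        payload = b"x" * (2 * part_size + 1)  # spans three parts
        for factor in (1, 8):
            with self.subTest(f"parallelization factor of {factor}"):
                fh = io.BytesIO(payload)
                # The real test would call something like:
                #   handle.multipart_parallel_upload(fh, bucket, key,
                #       part_size=part_size, parallelization_factor=factor)
                # and assert the uploaded object round-trips intact.
                expected_parts = -(-len(payload) // part_size)  # ceil division
                self.assertEqual(expected_parts, 3)
```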
This optimization is useful for the DSS, and probably has a wider audience.
Force-pushed from 459e507 to f8b4564.
This is useful for optimizing DSS endpoints, and may have a wider audience.