
WIP Add upload to azure functionality #356

Closed — Qi77Qi wants to merge 2 commits into DataBiosphere:master from Qi77Qi:add-azure

Conversation

@Qi77Qi (Collaborator) commented Sep 22, 2021

TODO: figure out how to use managed identity instead of an access key for auth.

Tested on a Terra VM:

>>> drs.copy("drs://jade.datarepo-dev.broadinstitute.org/v1_0c86170e-312d-4b39-a0a4-2a2bfaa24c7a_c0e40912-8b14-43f6-9a2f-b278144d0060", "https://qijlbdgpc4zqdee.blob.core.windows.net/qi-test-container/subdir/jade1")
2021-09-22 05:50:19::INFO  Enabling requester pays for your workspace. This will only take a few seconds...
2021-09-22 05:50:19::WARNING  Failed to init requester pays for workspace qi-monitoring-1218/qi-ws-1: Expected '204', got '401' for 'https://rawls.dsde-prod.broadinstitute.org/api/workspaces/qi-monitoring-1218/qi-ws-1/enableRequesterPaysForLinkedServiceAccounts'. You will not be able to access DRS URIs that interact with requester pays buckets.
2021-09-22 05:50:21::INFO  Request URL: 'https://qijlbdgpc4zqdee.blob.core.windows.net/qi-test-container/subdir/jade1'
2021-09-22 05:50:21::INFO  Request method: 'PUT'
2021-09-22 05:50:21::INFO  Request headers:
2021-09-22 05:50:21::INFO      'x-ms-blob-type': 'REDACTED'
2021-09-22 05:50:21::INFO      'Content-Length': '62043448'
2021-09-22 05:50:21::INFO      'If-None-Match': '*'
2021-09-22 05:50:21::INFO      'x-ms-version': 'REDACTED'
2021-09-22 05:50:21::INFO      'Content-Type': 'application/octet-stream'
2021-09-22 05:50:21::INFO      'Accept': 'application/xml'
2021-09-22 05:50:21::INFO      'User-Agent': 'azsdk-python-storage-blob/12.9.0 Python/3.7.10 (Linux-5.4.104+-x86_64-with-debian-buster-sid)'
2021-09-22 05:50:21::INFO      'x-ms-date': 'REDACTED'
2021-09-22 05:50:21::INFO      'x-ms-client-request-id': '8e96e9a6-1bcd-11ec-b6fa-0242ac120005'
2021-09-22 05:50:21::INFO      'Authorization': 'REDACTED'
2021-09-22 05:50:21::INFO  A body is sent with the request
2021-09-22 05:50:23::INFO  Response status: 201
2021-09-22 05:50:23::INFO  Response headers:
2021-09-22 05:50:23::INFO      'Content-Length': '0'
2021-09-22 05:50:23::INFO      'Content-MD5': 'REDACTED'
2021-09-22 05:50:23::INFO      'Last-Modified': 'Wed, 22 Sep 2021 17:50:23 GMT'
2021-09-22 05:50:23::INFO      'ETag': '"0x8D97DF173D92DB6"'
2021-09-22 05:50:23::INFO      'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
2021-09-22 05:50:23::INFO      'x-ms-request-id': '565e8615-201e-00d6-41da-af7108000000'
2021-09-22 05:50:23::INFO      'x-ms-client-request-id': '8e96e9a6-1bcd-11ec-b6fa-0242ac120005'
2021-09-22 05:50:23::INFO      'x-ms-version': 'REDACTED'
2021-09-22 05:50:23::INFO      'x-ms-content-crc64': 'REDACTED'
2021-09-22 05:50:23::INFO      'x-ms-request-server-encrypted': 'REDACTED'
2021-09-22 05:50:23::INFO      'Date': 'Wed, 22 Sep 2021 17:50:23 GMT'
 DefaultEndpointsProtocol=https;AccountNa   100%   [========================================]   59.2MiB   17.9MiB/s   3.31s

Comment thread: tests/test_http.py

  from tests import config
  from tests.infra.server import ThreadedLocalServer, BaseHTTPRequestHandler
- from terra_notebook_utils.http import HTTPAdapter, Retry, http_session
+ from terra_notebook_utils.http_session import HTTPAdapter, Retry, http_session
Collaborator (Author):

I had to rename the http file because of a naming conflict error.

Member:

What was the conflict?

@Qi77Qi force-pushed the add-azure branch 3 times, most recently from 96fa58b to 4b98abe on September 22, 2021, 20:28
@xbrianh (Member) left a comment:

This looks like good progress, and it's exciting to see TNU support Azure storage :)

I have left a few comments and questions. Also, the test suites for blobstore and copy_client will need to be extended to cover Azure operations.

Comment thread: README.md (outdated)

  4. Attach your terminal to the image via `docker exec -it test-image bash`, then navigate to the directory the code is mounted to via `cd /work`. Note that the above command ensures any changes you make to files in the repo will be updated in the image as well.
  5. log in with your Google credentials using `gcloud auth application-default login`,
- 6. install requirements with `pip install -r requirements.txt`
+ 6. install requirements with `pip3 install -r requirements.txt`
Member:

Python developers typically work in Python virtual environments. Inside a Python 3 virtual environment you use pip and python, not pip3 and python3.

This line can be reverted, and we can rely on developers to understand which pip to use.
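As a sketch of the workflow being described (paths are illustrative), inside an activated virtual environment the bare `pip` already resolves to the venv's interpreter, so the `pip`/`pip3` distinction disappears:

```shell
# Create and activate a virtual environment; once activated, plain `pip`
# and `python` point at the venv's own copies rather than the system ones.
python3 -m venv .venv
. .venv/bin/activate
command -v pip                       # resolves to .venv/bin/pip
# pip install -r requirements.txt    # installs into .venv, not system site-packages
```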

    from azure.identity import DefaultAzureCredential

    class AzureBlobStore(blobstore.BlobStore):
        schema = "https://"
Member:

This schema is unfortunately awkward for TNU. Currently, the CLI command to copy a DRS file into a Google bucket is

    tnu drs copy drs://foo gs://my-bucket/my-key

Note the gs:// schema. The analogous command for an Azure container looks awkward due to the https:// schema:

    tnu drs copy drs://foo https://some-weird-azure-url

A further complication is URL detection. TNU uses the schema to understand the storage provider for destination URLs. Logic for detecting Azure destinations will need to be added here.
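One possible approach (a sketch only; the helper name and the endpoint-suffix heuristic are assumptions, not TNU's actual detection logic) is to key Azure detection off the well-known blob endpoint suffix rather than the scheme alone, so gs:// and s3:// routing is unaffected:

```python
from urllib.parse import urlparse

def is_azure_blob_url(url: str) -> bool:
    """Heuristic: treat https URLs on the Azure blob endpoint as Azure destinations."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.netloc.endswith(".blob.core.windows.net")

# gs:// destinations keep their existing schema-based routing:
# is_azure_blob_url("gs://my-bucket/my-key") -> False
```

This would not cover custom Azure storage domains, which is part of why the https:// schema is awkward here.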

    self._azure_blob_client.delete_blob("include")

    def exists(self):
        return self._azure_blob_client.exists()
Member:

Multipart uploads are supported for both s3 and gs. It is typically more performant to upload large objects as parts, sometimes concurrently. Also, it may not be possible to upload a large object as a single part; for instance, in S3 you cannot upload an object larger than 5GB with a single PUT.

How are large object uploads handled in Azure?

Collaborator (Author):

It seems like multipart upload is supported under the hood by the Azure SDK...see this
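For intuition, azure-storage-blob's `upload_blob` stages large payloads as blocks and commits a block list for block blobs above its single-put threshold. The chunking itself can be sketched in plain Python (the block size and helper name are illustrative, not the SDK's internals):

```python
def block_ranges(total_size: int, block_size: int = 4 * 1024 * 1024):
    """Yield (offset, length) pairs that split an object into upload blocks."""
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        yield offset, length
        offset += length

# The 62043448-byte blob uploaded above would be staged as 4 MiB blocks:
blocks = list(block_ranges(62043448))
assert sum(length for _, length in blocks) == 62043448
```

The SDK can also stage such blocks concurrently (e.g. via `upload_blob`'s `max_concurrency` parameter), which is the Azure analogue of concurrent multipart uploads on s3/gs.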


@Qi77Qi (Collaborator) commented Sep 27, 2021

@xbrianh the error was something like "no module found: http". I googled the error, and someone suggested it could be a naming conflict, so I changed the name of that file.

@Qi77Qi force-pushed the add-azure branch 3 times, most recently from 441e48a to 0700125 on September 28, 2021, 13:21

Pushed commits: cleanup · try · enable logging · err · test · test · test
@Qi77Qi (Collaborator) commented Sep 28, 2021

Closing this PR in favor of #362, since this one is from a fork.

@Qi77Qi Qi77Qi closed this Sep 28, 2021