VFolder abstraction for object storage (S3) #665

@achimnol

Currently we support object storage (Amazon S3 or compatible services) by letting users install S3-compatible client libraries in their containers and images so that their workloads can connect to external object storage services.

There is increasing demand for a "managed" object storage abstraction from both customers and storage vendors. In Backend.AI, a managed storage space is abstracted as a vfolder.

Potential Mapping Designs

  • Bucket-to-VFolder
  • Subdirectory-to-VFolder

For simplicity, we’d choose the first one: bucket-to-vfolder.
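To make the difference between the two designs concrete, here is a minimal sketch of how a vfolder could resolve to an S3 location under each one. The `S3Location` type, the `resolve_*` functions, and the example bucket names are hypothetical and are not existing Backend.AI APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Location:
    """Where a vfolder's contents live in object storage (hypothetical type)."""
    bucket: str
    prefix: str  # key prefix inside the bucket; "" means the whole bucket


def resolve_bucket_to_vfolder(bucket_name: str) -> S3Location:
    # Bucket-to-VFolder: the vfolder *is* the bucket, so no key prefix is
    # needed and the bucket root can be mounted directly.
    return S3Location(bucket=bucket_name, prefix="")


def resolve_subdir_to_vfolder(shared_bucket: str, vfolder_id: str) -> S3Location:
    # Subdirectory-to-VFolder: many vfolders share one bucket, each scoped to
    # its own key prefix, so mounts and access control must be prefix-aware.
    return S3Location(bucket=shared_bucket, prefix=f"{vfolder_id}/")


print(resolve_bucket_to_vfolder("team-datasets"))
print(resolve_subdir_to_vfolder("backendai-vfolders", "3f2a9c"))
```

With the bucket-to-vfolder design, the vfolder root maps one-to-one onto the bucket root, so mounting, registration, and access control all operate on whole buckets.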

Component Extensions

Storage Proxy

  • Let’s treat the connection configuration to a specific object storage service as a storage-proxy backend.
    • Backend type: s3-compatible ones (including MinIO), …
    • Each volume configuration includes the endpoint and the service credentials (like API keys).
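  • Note that each vfolder is already mapped to a specific storage-proxy volume via its host field, so an object-storage vfolder would reach its endpoint and credentials through that volume.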
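A minimal sketch of how such a volume might be described and looked up from a vfolder's host field. It assumes the host value has the form `<proxy-name>:<volume-name>`; the `ObjectStorageVolume` class, its fields, the `VOLUMES` table, and `lookup_volume` are hypothetical and do not reflect the actual storage-proxy configuration schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ObjectStorageVolume:
    """Hypothetical storage-proxy volume entry for an S3-compatible service."""
    backend: str       # e.g. "s3" or "minio"; both speak the S3 API
    endpoint: str      # service URL
    access_key: str
    secret_key: str = field(repr=False)  # keep the secret out of reprs/logs


# Hypothetical per-proxy volume table: proxy name -> volume name -> config.
VOLUMES: Dict[str, Dict[str, ObjectStorageVolume]] = {
    "proxy1": {
        "s3vol": ObjectStorageVolume(
            backend="minio",
            endpoint="http://minio.internal:9000",
            access_key="AKIAEXAMPLE",
            secret_key="example-secret",
        ),
    },
}


def lookup_volume(host: str) -> ObjectStorageVolume:
    """Resolve a vfolder host value of the assumed form '<proxy>:<volume>'."""
    proxy_name, sep, volume_name = host.partition(":")
    if not sep or not proxy_name or not volume_name:
        raise ValueError(f"malformed vfolder host: {host!r}")
    try:
        return VOLUMES[proxy_name][volume_name]
    except KeyError:
        raise LookupError(f"unknown volume: {host!r}") from None


vol = lookup_volume("proxy1:s3vol")
print(vol.backend, vol.endpoint)
```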

Manager

  • Depending on the storage backend type of a vfolder, we need to pass the object storage endpoint & credentials to the agent when creating sessions.
  • As with unmanaged vfolders (https://lablup.atlassian.net/browse/BA-114), we need specialized lifecycle implementations, e.g.:
    • Decouple actual filesystem-level creation/deletion of vfolders.
    • Allow users to register/deregister vfolder entries assuming that the original bucket’s lifecycle is managed by the object-storage solution.
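The manager-side lifecycle split can be sketched as a metadata-only registry: registering or deregistering an object-storage vfolder records or drops a pointer to an existing bucket without creating or deleting the bucket, and session creation attaches the endpoint and credentials the agent needs. `VFolderRegistry`, `S3Credentials`, `VFolderEntry`, and the payload keys are hypothetical; a real manager would persist entries in its database rather than in a dict.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class S3Credentials:
    endpoint: str
    access_key: str
    secret_key: str


@dataclass
class VFolderEntry:
    """Hypothetical stand-in for a vfolder DB row."""
    name: str
    host: str
    backend: str                  # "local", "s3", ...
    bucket: Optional[str] = None  # only set for object-storage vfolders


class VFolderRegistry:
    """Metadata-only registry: register/deregister never touch the bucket itself."""

    def __init__(self) -> None:
        self._entries: Dict[str, VFolderEntry] = {}

    def register_bucket(self, name: str, host: str, bucket: str) -> VFolderEntry:
        # The bucket is assumed to exist already; its lifecycle belongs to the
        # object-storage service, so we only record a pointer to it.
        if name in self._entries:
            raise ValueError(f"vfolder {name!r} already registered")
        entry = VFolderEntry(name=name, host=host, backend="s3", bucket=bucket)
        self._entries[name] = entry
        return entry

    def deregister(self, name: str) -> None:
        # Drop the vfolder record only (KeyError if unknown);
        # the bucket and its objects are left intact.
        self._entries.pop(name)

    def mount_spec_for_session(self, name: str, creds: S3Credentials) -> dict:
        """Build the (hypothetical) payload the manager would send to the agent."""
        entry = self._entries[name]
        if entry.backend != "s3":
            return {"vfolder": name, "backend": entry.backend}
        return {
            "vfolder": name,
            "backend": "s3",
            "bucket": entry.bucket,
            "endpoint": creds.endpoint,
            "access_key": creds.access_key,
            "secret_key": creds.secret_key,
        }


registry = VFolderRegistry()
registry.register_bucket("datasets", "proxy1:s3vol", "team-datasets")
creds = S3Credentials("http://minio.internal:9000", "AKIAEXAMPLE", "example-secret")
spec = registry.mount_spec_for_session("datasets", creds)
registry.deregister("datasets")
```

Keeping deregistration metadata-only is what makes the bucket's lifecycle independent of Backend.AI: a deregistered bucket can later be registered again under the same or a different vfolder name.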

Agent

  • Before creating a container, the agent uses s3fs to mount the bucket into the local filesystem and then bind-mounts it into the container.
  • After destroying a container, the agent unmounts the bucket.
  • We need to keep track of the references to a specific bucket to prevent duplicate mounts and premature unmounts when there are multiple containers using the same bucket.
  • We may need to handle potential system instability caused by frequent mount/unmount operations.
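The reference-tracking requirement can be sketched as a per-agent counter keyed by bucket: the first container that uses a bucket triggers the s3fs mount, and only the last one to go away triggers the unmount. This is a minimal synchronous sketch; `BucketMountManager`, `make_s3fs_mount`, `fusermount_unmount`, and the host-side mount directory are hypothetical, and since Backend.AI's agent runs on asyncio a real version would more likely use `asyncio.Lock` and asynchronous subprocesses. The s3fs options shown (`url`, `passwd_file`, `use_path_request_style`) are standard s3fs options, and `fusermount -u` is the usual way to unmount a FUSE filesystem.

```python
import subprocess
import threading
from pathlib import Path
from typing import Callable, Dict, List

MountFn = Callable[[str, Path], None]
UnmountFn = Callable[[Path], None]


def make_s3fs_mount(url: str, passwd_file: str) -> MountFn:
    """Return a mount function that invokes s3fs (must be installed on the host)."""
    def _mount(bucket: str, mountpoint: Path) -> None:
        mountpoint.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["s3fs", bucket, str(mountpoint),
             "-o", f"url={url}",
             "-o", f"passwd_file={passwd_file}",
             "-o", "use_path_request_style"],  # path-style URLs, e.g. for MinIO
            check=True,
        )
    return _mount


def fusermount_unmount(mountpoint: Path) -> None:
    subprocess.run(["fusermount", "-u", str(mountpoint)], check=True)


class BucketMountManager:
    """Reference-counted bucket mounts shared by the containers on one agent."""

    def __init__(self, base_dir: Path, mount: MountFn, unmount: UnmountFn) -> None:
        self._base_dir = base_dir
        self._mount = mount
        self._unmount = unmount
        self._refcounts: Dict[str, int] = {}
        self._lock = threading.Lock()

    def mountpoint(self, bucket: str) -> Path:
        return self._base_dir / bucket

    def acquire(self, bucket: str) -> Path:
        """Call before creating a container; mounts only on the first reference."""
        with self._lock:
            count = self._refcounts.get(bucket, 0)
            path = self.mountpoint(bucket)
            if count == 0:
                self._mount(bucket, path)  # if this raises, the count is unchanged
            self._refcounts[bucket] = count + 1
            return path

    def release(self, bucket: str) -> None:
        """Call after destroying a container; unmounts only on the last reference."""
        with self._lock:
            count = self._refcounts.get(bucket, 0)
            if count == 0:
                raise RuntimeError(f"release() without acquire() for {bucket!r}")
            if count == 1:
                self._unmount(self.mountpoint(bucket))  # if this raises, count stays 1
                del self._refcounts[bucket]
            else:
                self._refcounts[bucket] = count - 1


# Demo with fake mount/unmount functions so it runs without s3fs installed.
events: List[str] = []
mgr = BucketMountManager(
    Path("/tmp/bai-buckets"),
    mount=lambda b, p: events.append(f"mount {b}"),
    unmount=lambda p: events.append(f"unmount {p.name}"),
)
mgr.acquire("datasets")  # container A: mounts the bucket
mgr.acquire("datasets")  # container B: reuses the existing mount
mgr.release("datasets")  # container A destroyed: B still uses it
mgr.release("datasets")  # container B destroyed: unmounts
print(events)  # ['mount datasets', 'unmount datasets']
```

Holding the lock while mounting serializes mount operations on one agent; that is slower, but it also limits bursts of concurrent mount/unmount calls.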

Technical Considerations

  • How to implement read-only mounts?
  • In the storage-proxy, we need to keep ALL registered buckets mounted to implement the managed vfolder interaction APIs. Could this place too much burden on the storage-proxy nodes?
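For the read-only question, there are two layers where it could be enforced: s3fs accepts the generic FUSE `ro` mount option, and Docker bind mounts can be made read-only with the `src:dst:ro` form used in `HostConfig.Binds`. The sketch below only builds the arguments; the helper names and the paths are hypothetical examples.

```python
from typing import List


def s3fs_argv(bucket: str, mountpoint: str, url: str, passwd_file: str,
              read_only: bool) -> List[str]:
    """Build an s3fs command line; 'ro' is the generic FUSE read-only option."""
    argv = ["s3fs", bucket, mountpoint,
            "-o", f"url={url}", "-o", f"passwd_file={passwd_file}"]
    if read_only:
        argv += ["-o", "ro"]
    return argv


def docker_bind(host_path: str, container_path: str, read_only: bool) -> str:
    """Build a Docker bind-mount spec in the 'src:dst[:ro]' form."""
    spec = f"{host_path}:{container_path}"
    return spec + ":ro" if read_only else spec


# One shared read-write host mount, exposed read-only to a particular container.
print(s3fs_argv("datasets", "/mnt/bai/datasets", "http://minio.internal:9000",
                "/etc/passwd-s3fs", read_only=False))
print(docker_bind("/mnt/bai/datasets", "/home/work/datasets", read_only=True))
```

If the agent shares one host-level mount among several containers, `-o ro` on s3fs would make the bucket read-only for all of them, so per-container read-only access fits more naturally at the bind-mount layer.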

JIRA Issue: BA-255
