Add S3Executor strategy pattern for async S3 operations #684

Merged: laughingman7743 merged 3 commits into master from refactor/s3-executor-strategy on Feb 22, 2026

@laughingman7743
Member

Summary

  • Introduce S3Executor ABC with pluggable implementations (S3ThreadPoolExecutor for sync, S3AioExecutor for async), replacing hardcoded ThreadPoolExecutor in S3File and S3FileSystem
  • Add AioS3FileSystem / AioS3File using composition over inheritance: AioS3FileSystem wraps the sync S3FileSystem and dispatches via asyncio.to_thread + asyncio.gather, eliminating thread-in-thread nesting
  • Wire AioS3FSCursor to use AioS3FileSystem with pluggable filesystem_class in AthenaS3FSResultSet
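
To make the executor abstraction in the first bullet concrete, here is a minimal sketch of a Strategy-style executor, not the code from this PR. The class names come from the summary above; the `submit`/`shutdown` method names, the constructor arguments, and the `parallel_map` helper are assumptions chosen to mirror `concurrent.futures`, and the way `S3AioExecutor` hands work to the loop (`asyncio.run_coroutine_threadsafe` around `asyncio.to_thread`) is one plausible design rather than a confirmed one.

```python
import asyncio
from abc import ABC, abstractmethod
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


class S3Executor(ABC):
    """Strategy interface: decides how a blocking call gets scheduled."""

    @abstractmethod
    def submit(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        ...

    @abstractmethod
    def shutdown(self, wait: bool = True) -> None:
        ...

    # Context manager protocol so short-lived executors are cleaned up by `with`.
    def __enter__(self) -> "S3Executor":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self.shutdown(wait=True)


class S3ThreadPoolExecutor(S3Executor):
    """Sync strategy: owns a private thread pool."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        return self._pool.submit(fn, *args, **kwargs)

    def shutdown(self, wait: bool = True) -> None:
        self._pool.shutdown(wait=wait)


class S3AioExecutor(S3Executor):
    """Async strategy (assumed design): schedule work on a running event loop."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop

    def submit(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        # run_coroutine_threadsafe returns a concurrent.futures.Future, so callers
        # can treat both strategies the same way.
        return asyncio.run_coroutine_threadsafe(
            asyncio.to_thread(fn, *args, **kwargs), self._loop
        )

    def shutdown(self, wait: bool = True) -> None:
        # The loop belongs to the caller, so there is nothing to release here.
        pass


def parallel_map(executor: S3Executor, fn: Callable[[Any], Any], items: Iterable[Any]) -> List[Any]:
    """Hypothetical caller: it only sees the S3Executor interface."""
    futures = [executor.submit(fn, item) for item in items]
    return [f.result() for f in futures]
```

In this sketch `parallel_map` blocks on `Future.result()`, so with `S3AioExecutor` it has to be called from a worker thread, never from the event loop thread itself.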

Motivation

When aio cursors use S3FileSystem, operations get double-wrapped: the cursor wraps in asyncio.to_thread(), inside which S3FileSystem spawns more threads via ThreadPoolExecutor. The Strategy pattern abstracts the executor concern so that async paths use the event loop directly.
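
The nesting described above can be illustrated with a small, self-contained sketch. `fetch_part` and `sync_read` are hypothetical stand-ins for a blocking S3 range request and for a sync filesystem method that owns its own pool; they are not PyAthena functions.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fetch_part(part: int) -> str:
    """Stand-in for one blocking S3 range request (hypothetical)."""
    return f"part-{part}"


def sync_read(parts: List[int]) -> List[str]:
    # Sync path: the filesystem creates its own thread pool for parallelism.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fetch_part, parts))


async def nested_read(parts: List[int]) -> List[str]:
    # Double-wrapped: one to_thread worker sits blocked while a second
    # pool, created inside it, does the actual work.
    return await asyncio.to_thread(sync_read, parts)


async def flat_read(parts: List[int]) -> List[str]:
    # Flat: each part is its own to_thread task and the event loop
    # coordinates them; gather returns results in input order.
    return list(
        await asyncio.gather(*(asyncio.to_thread(fetch_part, p) for p in parts))
    )
```

Both coroutines return the same data; the difference is that `flat_read` uses a single layer of worker threads coordinated by the event loop.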

Changes

| File | Description |
| --- | --- |
| pyathena/filesystem/s3_executor.py | New module: S3Executor ABC, S3ThreadPoolExecutor, S3AioExecutor |
| pyathena/filesystem/s3.py | Replace ThreadPoolExecutor with S3Executor interface, add _create_executor() factory |
| pyathena/filesystem/s3_async.py | New module: AioS3FileSystem (composition-based) and AioS3File (minimal subclass) |
| pyathena/aio/s3fs/cursor.py | Wire AioS3FSCursor to use AioS3FileSystem |
| pyathena/s3fs/result_set.py | Add pluggable filesystem_class parameter |
| tests/pyathena/filesystem/test_s3_async.py | Comprehensive async test suite mirroring test_s3.py |
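
As a rough illustration of the composition-based approach in s3_async.py, the sketch below wraps a sync object instead of inheriting from it. `SyncStore` and `AsyncStore` are hypothetical stand-ins for S3FileSystem and AioS3FileSystem, backed by an in-memory dict so the example runs without AWS access. The underscore-prefixed coroutine names follow the convention fsspec uses for async filesystems; the actual method set in the PR may differ.

```python
import asyncio
from typing import Dict, List, Optional


class SyncStore:
    """Hypothetical stand-in for the sync S3FileSystem (in-memory)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def pipe_file(self, path: str, data: bytes) -> None:
        self._objects[path] = data

    def cat_file(self, path: str) -> bytes:
        return self._objects[path]


class AsyncStore:
    """Hypothetical async wrapper: holds a SyncStore rather than subclassing it."""

    def __init__(self, sync_store: Optional[SyncStore] = None) -> None:
        self._sync = sync_store if sync_store is not None else SyncStore()

    async def _pipe_file(self, path: str, data: bytes) -> None:
        # A single blocking call is moved off the event loop thread.
        await asyncio.to_thread(self._sync.pipe_file, path, data)

    async def _cat_file(self, path: str) -> bytes:
        return await asyncio.to_thread(self._sync.cat_file, path)

    async def _cat(self, paths: List[str]) -> Dict[str, bytes]:
        # Fan-out happens in the event loop via gather, not in a nested pool.
        blobs = await asyncio.gather(*(self._cat_file(p) for p in paths))
        return dict(zip(paths, blobs))
```

Because the wrapper only delegates, behavior such as caching or path handling stays in the sync class, and the async layer is limited to scheduling.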

Test plan

  • make chk passes (ruff lint, ruff format, mypy)
  • uv run pytest tests/pyathena/filesystem/ -v — 123 tests passed
  • Full CI pipeline

🤖 Generated with Claude Code

laughingman7743 and others added 3 commits February 22, 2026 16:31

Introduce S3Executor ABC with two implementations:
- S3ThreadPoolExecutor: wraps ThreadPoolExecutor (default for sync)
- S3AioExecutor: dispatches work via asyncio event loop

S3Executor supports the context manager protocol for safe resource
cleanup. S3FileSystem methods now use `with` statements instead of
manual try/finally for short-lived executors.

Replace hardcoded ThreadPoolExecutor usage in S3File and S3FileSystem
with the S3Executor interface. This enables async-aware parallel
operations without thread-in-thread nesting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implement AioS3FileSystem as an fsspec AsyncFileSystem that composes
a sync S3FileSystem internally and dispatches operations via
asyncio.to_thread and asyncio.gather for parallelism.

AioS3File is a minimal S3File subclass — all async behavior is
provided by the injected S3AioExecutor, eliminating the need
for method overrides.

Wire AioS3FSCursor to use AioS3FileSystem and add pluggable
filesystem_class parameter to AthenaS3FSResultSet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Comprehensive async test suite mirroring the sync S3FileSystem tests.
Covers filesystem operations (ls, info, find, du, glob, exists, rm,
touch, pipe/cat, put/get, cp, move), file-level read/write/append,
pandas CSV I/O, sync wrapper validation, and cache invalidation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@laughingman7743 force-pushed the refactor/s3-executor-strategy branch from ee63541 to c23d060 February 22, 2026 07:32
@laughingman7743 marked this pull request as ready for review February 22, 2026 08:07
@laughingman7743 merged commit 93ae890 into master Feb 22, 2026
15 checks passed
@laughingman7743 deleted the refactor/s3-executor-strategy branch February 22, 2026 08:07

Development

Successfully merging this pull request may close these issues.

Add AioS3FileSystem using asyncio for parallel S3 operations
