refactor: make office docs optional and reduce core dependency footprint#255

Open
mlikasam-askui wants to merge 3 commits into main from
refactor/remove-unnecessary-dependencies

Conversation

@mlikasam-askui
Contributor

Summary

This PR reduces the default installation footprint and clarifies optional dependency usage.

Dependency changes

  • Removed markitdown from core dependencies and introduced an optional extra: office-document
  • Updated all extra to include office-document (and no longer depend on the removed android extra)
  • Moved pure-python-adb into default dependencies
  • Removed bson from core dependencies

Runtime/code changes

  • generate_time_ordered_id() no longer uses bson.ObjectId; now builds IDs from time.time_ns() + UUID suffix
  • convert_to_markdown() now imports markitdown lazily and raises a clear install hint:
    • pip install "askui[office-document]"
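
For reviewers unfamiliar with bson: `bson.ObjectId` produces a 24-character hex string that starts with a 4-byte seconds timestamp, followed by random and counter bytes, which is what made the old IDs roughly time-ordered. A minimal sketch of a replacement built from `time.time_ns()` plus a UUID fragment is shown below. The field widths (16 hex characters of timestamp, 8 of UUID) and the exact layout are assumptions for illustration, not necessarily what this PR implements.

```python
import time
import uuid


def generate_time_ordered_id(prefix: str) -> str:
    """Return an ID of the form '<prefix>_<timestamp hex><random hex>'.

    Sketch only: the widths used here are assumptions, not the PR's format.
    """
    # Zero-padded, fixed-width hex so that string order follows clock order.
    timestamp_hex = f"{time.time_ns():016x}"
    # Random suffix to tell apart IDs created within the same clock tick.
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix}_{timestamp_hex}{suffix}"
```

Because the timestamp comes first and has a fixed width, IDs generated at different clock readings sort lexicographically in creation order. IDs created within the same tick are ordered arbitrarily by the random suffix, and the wall clock itself can be adjusted, so ordering is best-effort rather than guaranteed.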

Documentation/config updates

  • README now promotes minimal install (pip install askui) and explains optional extras
  • docs/10_extracting_data.md explicitly notes Excel/Word (OfficeDocumentSource) requires office-document
  • docs/01_setup.md updated Python requirement text
  • pyproject.toml/pdm.lock synchronized with new extras + deps
  • Removed stale mypy ignore section for bson

Why

This keeps the base package lighter, avoids forcing Office-conversion dependencies on all users, and makes Office document support explicit and discoverable.
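
The lazy import of markitdown described under "Runtime/code changes" might look roughly like the sketch below. The function signature and the wording of the error message are assumptions; only the `MarkItDown().convert(...)` call and its `text_content` attribute come from markitdown's documented API.

```python
def convert_to_markdown(source: str) -> str:
    """Convert an Office document to Markdown, importing markitdown on demand.

    Sketch only: the real signature in askui may differ.
    """
    try:
        # Imported inside the function so that a plain `pip install askui`
        # does not need markitdown unless this code path is actually used.
        from markitdown import MarkItDown
    except ImportError as error:
        msg = (
            "Office document support requires the optional 'office-document' "
            'extra. Install it with: pip install "askui[office-document]"'
        )
        raise ImportError(msg) from error
    result = MarkItDown().convert(source)
    return result.text_content
```

With this shape, users who never touch Office documents pay no import or install cost, and users who do get an actionable message instead of a bare `ModuleNotFoundError`.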

- Move MarkItDown to new `office_document` extra and lazy-load in markdown conversion
- Remove bson usage; generate time-ordered IDs via `time_ns` + UUID fragment
- Promote `pure-python-adb` to default deps; replace `android` extra in `all`
- Relax Python constraint to `>=3.10` and align setup/readme docs
- Remove obsolete mypy ignore for `bson`
-all = ["askui[android,bedrock,otel,vertex,web]"]
-android = [
-    "pure-python-adb>=0.3.0.dev0"
+all = ["askui[office-document,bedrock,otel,vertex,web]"]
Contributor

why is android no longer part of the all group?

Contributor

ah I see, because it is now a default, right?

Contributor Author

Yes, it is now included by default. We should also start using pip install askui without the all extra, especially on Windows, to improve installation speed.

str: Time-ordered ID string
"""

return f"{prefix}_{str(bson.ObjectId())}"
Contributor

I do not know what bson was doing and what effects removing it has. Out of curiosity: can you maybe explain?

Contributor Author

It was the reason why the SDK was not compatible with Python 3.14 and later, but now the imagehash library is the new issue causing incompatibility.
