
feat: add metadata filters to gallery and search APIs #306

Open
upasana-2006 wants to merge 6 commits into Abhash-Chakraborty:main from upasana-2006:feat-exif-gallery-filters

Conversation

@upasana-2006

@upasana-2006 upasana-2006 commented Jun 9, 2026


Summary

This PR adds EXIF-based metadata filtering support to the gallery endpoint, allowing users to refine gallery results using camera information and image attributes. This enhancement improves image discovery and organization, especially for users managing large local photo collections.

Fixes #301

Type of change

  • Bug fix
  • Feature
  • Documentation update
  • Refactor
  • CI / tooling

What changed

  • Added optional gallery filters for:

    • camera_make
    • camera_model
    • min_width
    • min_height
    • file_type
  • Applied filtering logic at the database query layer while preserving existing gallery behavior when filters are not provided.

  • Added tests covering all newly introduced metadata filtering functionality, including camera make/model, dimension-based filtering, and file type filtering.

  • Ensured all existing gallery tests continue to pass.

Screenshots / recordings (for UI changes)

N/A – Backend/API enhancement with no UI changes.

How to test

  1. Navigate to the backend directory:

    cd backend
  2. Install project dependencies:

    uv sync --group dev
  3. Run the gallery test suite:

    uv run pytest tests/test_gallery.py -v
  4. Verify linting and formatting:

    uv run ruff check .
    uv run ruff format --check .

Expected result:

  • All tests should pass successfully.
  • Ruff checks and formatting checks should complete without errors.

Checklist

  • I linked the related issue
  • I ran required checks from CONTRIBUTING.md
  • I updated docs/env notes if needed
  • My PR is scoped to a single issue
  • I followed commit message conventions
  • I am not committing secrets or local artifacts

GSSoC'26 checklist

  • I requested issue assignment before starting
  • I have meaningful commits (no spam commits)
  • I am ready to explain my implementation in review comments

Summary by CodeRabbit

  • New Features
    • Gallery now supports metadata filtering by camera make/model, minimum width/height, file type, upload date range, and orientation.
    • Search now supports the same metadata/EXIF filtering options.
  • Performance / Reliability
    • Search caching now varies results by the active metadata filters to prevent incorrect cached matches.
  • Bug Fixes
    • Improved validation for metadata filters, including HTTP 422 on invalid date ranges and unsupported orientation values.
  • Tests
    • Expanded gallery and search test coverage for valid/invalid filters, date-range behavior, EXIF-missing exclusion, and combined filtering.
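
The EXIF-missing exclusion and combined filtering mentioned above can be modeled in plain Python. This is only a sketch of the expected semantics, not the PR's SQLAlchemy code: the `matches` helper, the `"Make"` EXIF key, and the sample records are assumptions made for illustration.

```python
from typing import Optional


def matches(
    record: dict,
    camera_make: Optional[str] = None,
    min_width: Optional[int] = None,
) -> bool:
    """Hypothetical predicate: every supplied filter must hold (AND logic)."""
    if camera_make is not None:
        # A record with no EXIF data can never satisfy a camera filter.
        exif = record.get("exif_json") or {}
        make = exif.get("Make")  # assumed key name
        if make is None or camera_make.lower() not in str(make).lower():
            return False
    if min_width is not None:
        width = record.get("width")
        if width is None or width < min_width:
            return False
    return True


records = [
    {"name": "canon_large", "exif_json": {"Make": "Canon"}, "width": 2000},
    {"name": "canon_small", "exif_json": {"Make": "Canon"}, "width": 800},
    {"name": "nikon_large", "exif_json": {"Make": "Nikon"}, "width": 2000},
    {"name": "no_exif", "exif_json": None, "width": 2000},
]

# Both filters together: only the large Canon image survives.
combined = [r["name"] for r in records if matches(r, camera_make="Canon", min_width=1500)]

# Camera filter alone: the record without EXIF data is excluded.
with_make = [r["name"] for r in records if matches(r, camera_make="canon")]

# No filters: existing behavior is preserved and everything is returned.
unfiltered = [r["name"] for r in records if matches(r)]
```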

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Gallery and search endpoints now accept optional metadata filter query parameters for camera make/model, date range, minimum image dimensions, orientation, and file type. New helpers validate and apply these filters at the database layer. Query cache service incorporates filter keys to segregate cached responses by active metadata filters. Comprehensive tests verify filtering behavior, input validation, and combined filter interactions.

Changes

Metadata filtering for gallery and search

| Layer / File(s) | Summary |
|---|---|
| Gallery metadata filters: backend/src/find_api/routers/gallery.py, backend/tests/test_gallery.py | Adds parse_metadata_date() for timezone-aware ISO datetime parsing and apply_metadata_filters() to conditionally apply SQLAlchemy filters on EXIF make/model, created date range, minimum dimensions, orientation-based aspect ratio, and normalized file type. The get_gallery() endpoint extends with eight new optional query parameters. Tests verify filtering by camera metadata, date ranges, orientation logic, invalid input rejection (HTTP 422), exclusion of records with missing EXIF data, and combinations of multiple filters. |
| Query cache filter support: backend/src/find_api/services/query_cache.py | normalize_query(), get_cached_query(), and set_cached_query() each add a keyword-only filter_key parameter and incorporate it into the normalized cache key, ensuring cached responses vary by both OCR inclusion and metadata filter set. |
| Search metadata filters with cache integration: backend/src/find_api/routers/search.py, backend/tests/test_search.py | New _metadata_filter_sql() builds parameterized metadata filter SQL fragments, validates date ranges, and returns deterministic cache key data. The search_images() endpoint signature extends with eight filter parameters. Total-count and paginated ranking queries both include the generated filter fragment with bound parameters, and cache operations include the derived filter_key. Updated test helper enables SQL assertion; new tests verify correct SQL generation and HTTP 422 rejection of invalid filters. |
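
To illustrate why the cache key needs the filter set, here is a minimal sketch of a key builder with a keyword-only `filter_key` argument. The helper name `make_cache_key`, the key layout, and the use of SHA-256 are assumptions for illustration and may differ from what `query_cache.py` actually does.

```python
import hashlib


def make_cache_key(query: str, *, include_ocr: bool = False, filter_key: str = "") -> str:
    """Hypothetical key builder: query text, OCR flag, and filter key all feed the hash."""
    # Collapse case and whitespace so equivalent queries share a key.
    normalized = " ".join(query.lower().split())
    raw = f"q={normalized}|ocr={int(include_ocr)}|filters={filter_key}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


unfiltered = make_cache_key("Sunset  Beach")
filtered = make_cache_key("sunset beach", filter_key="camera_make=canon")
filtered_again = make_cache_key("sunset beach", filter_key="camera_make=canon")
```

With this shape, a filtered search cannot be served from the unfiltered entry, while repeating the same filtered search still hits the cache.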

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 With dates and dimensions in tow,
Through gallery and search they flow,
EXIF data filtered just right,
And caches remember each sight! ✨📸

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 24.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: adding metadata filters to gallery and search APIs, which is the primary objective of the PR. |
| Description check | ✅ Passed | The PR description covers the required template sections, includes a linked issue, documents the changes, provides test instructions, and completes the checklists appropriately. |
| Linked Issues check | ✅ Passed | The PR implements camera_make, camera_model, min_width, min_height, file_type, date_from, date_to, and orientation filters for gallery and search with database-layer filtering, input validation, and comprehensive tests covering all major scenarios from issue #301. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to implementing metadata filtering for gallery and search APIs plus query cache enhancement to support filter-based cache keys, with no unrelated modifications. |


@github-actions

github-actions Bot commented Jun 9, 2026


PR Context Summary

Suggested issue links

  • No strong issue match found yet.

Use Fixes #123 or Closes #123 in the PR body when one of the suggestions is the intended issue.
Manual rerun: Actions > PR Context Triage > Run workflow > set pr_number and force_review=true.

@macroscopeapp

macroscopeapp Bot commented Jun 9, 2026

Contributor

Approvability

Verdict: Needs human review

Unable to check for correctness in d192e22. CodeQL has flagged potential SQL injection vulnerabilities in the search API's metadata filter implementation. Combined with an unresolved bug report about date filtering behavior, this new feature addition warrants security and functional review by a human.


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

🧹 Nitpick comments (1)
backend/src/find_api/routers/gallery.py (1)

110-120: ⚡ Quick win

Update docstring to document the new filter parameters.

The function docstring's Args section does not include the five new query parameters (camera_make, camera_model, min_width, min_height, file_type), making the API documentation incomplete.

📝 Proposed docstring update
     """
     Get paginated list of images

     Args:
         skip: Number of records to skip
         limit: Max number of records to return
         status: Filter by status (pending, processing, indexed, failed)
+        liked: Filter by liked status
+        camera_make: Filter by EXIF camera make (case-insensitive partial match)
+        camera_model: Filter by EXIF camera model (case-insensitive partial match)
+        min_width: Filter images with width >= this value
+        min_height: Filter images with height >= this value
+        file_type: Filter by file type (e.g., "jpeg", "png")

     Returns:
         Paginated list of media records
     """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/find_api/routers/gallery.py` around lines 110 - 120, The
docstring for the gallery endpoint is missing entries for the five new query
filters; update the function docstring in
backend/src/find_api/routers/gallery.py (the gallery GET handler function where
skip/limit/status are documented) to add Args descriptions for camera_make,
camera_model, min_width, min_height, and file_type, specifying their types (e.g.
str or int), purpose (filter by camera make/model, minimum width/height, and
file MIME/extension), and whether they are optional; keep the existing format
and ordering so generated API docs include these new query parameters.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/src/find_api/routers/gallery.py`:
- Around line 128-142: Extract the filtering logic out of the router by creating
a service function (e.g., apply_media_filters or build_media_query) in a new or
existing backend service module (suggested name: media_query service), move the
camera_make/camera_model/min_width/min_height/file_type filter logic that
currently manipulates the local variable query and references the Media model
into that function, have it accept the SQLAlchemy Query and the filter
parameters and return the modified Query, then replace the inline filter block
in the gallery router with a single call to that service function and import the
service; ensure the service uses the same Media model names (Media, exif_json,
width, height, content_type) so references remain correct and add unit tests for
the service to keep the router thin and testable.
- Around line 101-107: The string query params camera_make, camera_model, and
file_type in the gallery router lack max length validation; update their Query
declarations (the parameters named camera_make, camera_model, file_type) to
include a safe max_length (e.g., 255) so extremely long inputs are rejected
before hitting the DB or consuming excessive memory; keep the descriptions and
Optional typing the same and ensure validation is enforced by FastAPI/Pydantic
via the Query(..., max_length=255) argument.

In `@backend/tests/test_gallery.py`:
- Around line 436-498: Add two tests to TestGalleryMetadataFilters to cover
missing EXIF and combined-filter behavior: implement
test_gallery_filters_exclude_missing_exif which seeds one record with exif_json
set to a dict and one with exif_json = None, calls GET /api/gallery with
camera_make and asserts only the record with EXIF appears; and implement
test_gallery_filters_combine_correctly which seeds canon_large, canon_small,
nikon_large, sets exif_json and widths appropriately, calls GET /api/gallery
with params camera_make="Canon" and min_width=1500 and asserts only canon_large
is returned; place these new tests alongside existing methods in the
TestGalleryMetadataFilters class to ensure API filtering excludes null exif_json
and combines filters with AND logic.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4af23dd6-925e-4420-8d65-3b062957b760

📥 Commits

Reviewing files that changed from the base of the PR and between 6b9f935 and f26a251.

📒 Files selected for processing (2)
  • backend/src/find_api/routers/gallery.py
  • backend/tests/test_gallery.py

Comment thread backend/src/find_api/routers/gallery.py Outdated
Comment thread backend/src/find_api/routers/gallery.py Outdated
Comment thread backend/tests/test_gallery.py
@Abhash-Chakraborty Abhash-Chakraborty changed the title from "feat: add EXIF-based metadata filtering support to gallery" to "feat: add metadata filters to gallery and search APIs" Jun 19, 2026
@Abhash-Chakraborty Abhash-Chakraborty added labels Jun 19, 2026:

  • gssoc26: Related to GirlScript Summer of Code 2026.
  • backend: FastAPI, database, storage, and API work
  • api: API contract, endpoint behavior, and response shape
  • type:feature: Feature PR. GSSoC type bonus.
  • level:intermediate: GSSoC difficulty level: intermediate. Base contributor points: 35.
  • quality:clean: Clean and maintainable PR. GSSoC contributor multiplier: 1.2x.
  • ready-to-merge: Fully approved, tested, and cleared for immediate merging.
@github-actions


@macroscope-app review

Please review this PR against its linked issue, local-first privacy rules, and the current Find repo instructions.
Linked issue(s): #301.
Trigger source: label-gated review (ready-to-merge).

@Abhash-Chakraborty Abhash-Chakraborty left a comment

Owner

Looks good to merge now.

I reviewed this against #301. The original PR only added a subset of gallery filters, so I pushed a follow-up commit to complete the backend/API scope:

  • Added date_from / date_to filtering with safe 422 validation.
  • Added orientation filtering (landscape, portrait, square).
  • Added the metadata filters to /api/search as well as /api/gallery.
  • Updated the search cache key so filtered searches do not reuse unfiltered cached responses.
  • Added gallery and search regression tests for the new filters and invalid input handling.
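
The orientation filter in the list above can be sketched as a comparison of width and height. This is an illustrative model only: the helper names are hypothetical, treating "square" as exactly equal dimensions is an assumption, and the `ValueError` stands in for the HTTP 422 the endpoint returns on unsupported values.

```python
from typing import Optional

VALID_ORIENTATIONS = {"landscape", "portrait", "square"}


def classify_orientation(width: Optional[int], height: Optional[int]) -> Optional[str]:
    """Hypothetical helper: derive an orientation from pixel dimensions."""
    if not width or not height:
        return None  # unknown dimensions match no orientation filter
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"  # assumption: square means exactly equal sides


def matches_orientation(width: Optional[int], height: Optional[int], wanted: str) -> bool:
    """Reject unsupported values up front, then compare."""
    if wanted not in VALID_ORIENTATIONS:
        raise ValueError(f"unsupported orientation: {wanted!r}")
    return classify_orientation(width, height) == wanted
```

Rejecting unknown values before querying keeps a typo such as `orientation=wide` from silently returning an empty result set.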

Checks run:

  • uv run ruff format src/find_api/routers/gallery.py src/find_api/routers/search.py src/find_api/services/query_cache.py tests/test_gallery.py tests/test_search.py
  • uv run ruff check src/find_api/routers/gallery.py src/find_api/routers/search.py src/find_api/services/query_cache.py tests/test_gallery.py tests/test_search.py
  • uv run pytest tests/test_gallery.py tests/test_search.py -q

Result: 49 tests passed. Only the existing SQLAlchemy deprecation warning appeared.

Comment on lines +237 to +243
+{metadata_filter_sql}
 AND 1 - (vector <=> CAST(:embedding AS vector)) > :threshold
-"""
+""".format(metadata_filter_sql=metadata_filter_sql)
 )
 count_result = db.execute(
-count_query, {"embedding": embedding_str, "threshold": threshold}
+count_query,
+{
Comment thread backend/src/find_api/routers/search.py Outdated
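
For reviewers weighing the `.format(...)` call above: string-formatting a SQL fragment is only safe when the fragment is assembled from fixed clause text and every user value travels as a bound parameter. The sketch below shows that pattern with the standard-library `sqlite3` module. The `build_filter_sql` helper, the flat `camera_make` column, and the table layout are assumptions for illustration; they are not the PR's `_metadata_filter_sql` or its schema.

```python
import sqlite3
from typing import Any, Dict, Optional, Tuple


def build_filter_sql(
    camera_make: Optional[str] = None,
    min_width: Optional[int] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical builder: returns fixed clause text plus bound parameters."""
    clauses = []
    params: Dict[str, Any] = {}
    if camera_make is not None:
        # The clause is a constant; the user value only appears in params.
        clauses.append("AND lower(camera_make) LIKE :camera_make")
        params["camera_make"] = f"%{camera_make.lower()}%"
    if min_width is not None:
        clauses.append("AND width >= :min_width")
        params["min_width"] = min_width
    return "\n".join(clauses), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, camera_make TEXT, width INTEGER)")
conn.executemany(
    "INSERT INTO media (camera_make, width) VALUES (?, ?)",
    [("Canon", 4000), ("Canon", 800), ("Nikon", 4000), (None, 4000)],
)

BASE_SQL = "SELECT COUNT(*) FROM media WHERE 1 = 1\n{fragment}"

# A hostile value is treated as data, so it matches nothing.
hostile = "canon' OR '1'='1"
fragment, params = build_filter_sql(camera_make=hostile, min_width=1000)
hostile_sql = BASE_SQL.format(fragment=fragment)
hostile_count = conn.execute(hostile_sql, params).fetchone()[0]

# A normal value matches only the wide Canon row; the NULL make is excluded.
fragment, params = build_filter_sql(camera_make="canon", min_width=1000)
canon_count = conn.execute(BASE_SQL.format(fragment=fragment), params).fetchone()[0]
```

Note that this sketch does not escape the `LIKE` wildcards `%` and `_` in user input; whether that matters depends on the matching behavior the endpoint is meant to offer.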

@Abhash-Chakraborty Abhash-Chakraborty left a comment

Owner

Fix these warnings

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/src/find_api/routers/gallery.py`:
- Around line 77-93: The parse_metadata_date function sets date-only values
(like YYYY-MM-DD) to midnight (00:00:00), which causes the date_to comparison to
exclude records created later that same day. To fix this, detect when the input
is date-only (check if the raw_value contains no time component by checking for
absence of 'T' or colon separators), and for the date_to field specifically,
adjust the parsed datetime to the end of that day (23:59:59.999999 UTC) instead
of the start. You can determine if this is date_to by checking the field_name
parameter and adjusting the tzinfo replacement logic accordingly to set the time
to end-of-day for date_to fields only.

In `@backend/src/find_api/routers/search.py`:
- Around line 58-66: The filter_parts list is being constructed with raw
unescaped k=v pairs that are vulnerable to cache key collisions when values
contain the delimiter character "&". Special characters in filter values (like
camera_make, min_width, etc.) must be properly escaped or URL-encoded when
appending to filter_parts to prevent different filter combinations from
generating identical cache keys. Apply URL encoding using urllib.parse.quote or
similar to the value portion before constructing the filter_parts strings in all
locations where filter_parts.append is called with f-string patterns like
camera_make={value.lower()}, min_width={value}, max_width={value}, and similar
filter parameter constructions throughout the file.
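
The collision described in the search.py comment can be reproduced in a few lines. This sketch uses `urllib.parse.quote` with `safe=""` so that `&` and `=` inside a value are percent-encoded. The `build_filter_key` helper and its sorted `name=value` layout are assumptions for illustration, not the PR's code.

```python
from urllib.parse import quote


def build_filter_key(filters: dict) -> str:
    """Hypothetical helper: deterministic, delimiter-safe key from active filters."""
    parts = []
    for name in sorted(filters):  # sorted for a stable key regardless of argument order
        value = filters[name]
        if value is None:
            continue  # inactive filters do not contribute to the key
        parts.append(f"{name}={quote(str(value).lower(), safe='')}")
    return "&".join(parts)


# Unescaped joining: two different filter sets yield the same string.
naive_a = "&".join(["camera_make=canon&min_width=100"])
naive_b = "&".join(["camera_make=canon", "min_width=100"])

# Escaped joining keeps them apart.
safe_a = build_filter_key({"camera_make": "canon&min_width=100"})
safe_b = build_filter_key({"camera_make": "Canon", "min_width": 100})
```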

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 24bbb12a-9358-473b-8ca1-80ea70b20289

📥 Commits

Reviewing files that changed from the base of the PR and between f26a251 and e480d01.

📒 Files selected for processing (5)
  • backend/src/find_api/routers/gallery.py
  • backend/src/find_api/routers/search.py
  • backend/src/find_api/services/query_cache.py
  • backend/tests/test_gallery.py
  • backend/tests/test_search.py

Comment thread backend/src/find_api/routers/gallery.py
Comment thread backend/src/find_api/routers/search.py
@Abhash-Chakraborty Abhash-Chakraborty added the gssoc:approved label (Valid GSSoC contribution approved for scoring.) Jun 20, 2026

@Abhash-Chakraborty Abhash-Chakraborty left a comment

Owner

Reviewed and fixed the remaining metadata-filter edge cases. Date-only end filters now include the full selected day, metadata cache keys are escaped safely, camera fields have length bounds, and focused backend checks pass: ruff check, ruff format --check, and pytest for gallery/search.
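
For reference, the date-only end-filter behavior described above can be sketched with the standard library. The function name `parse_filter_date` is a hypothetical stand-in for `parse_metadata_date`, and defaulting naive timestamps to UTC is an assumption. In the real endpoint a parsing failure would surface as HTTP 422 rather than a bare `ValueError`.

```python
from datetime import date, datetime, time, timezone


def parse_filter_date(raw_value: str, field_name: str) -> datetime:
    """Hypothetical parser: date-only `date_to` values expand to the end of that day."""
    text = raw_value.strip()
    date_only = "T" not in text and ":" not in text
    if date_only:
        day = date.fromisoformat(text)  # raises ValueError on malformed input
        # date_from starts at midnight; date_to runs to 23:59:59.999999.
        clock = time.max if field_name == "date_to" else time.min
        return datetime.combine(day, clock, tzinfo=timezone.utc)
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed


end_of_day = parse_filter_date("2026-06-09", "date_to")
start_of_day = parse_filter_date("2026-06-09", "date_from")
explicit = parse_filter_date("2026-06-09T10:30:00", "date_to")
```

With this rule, `date_from=2026-06-09&date_to=2026-06-09` covers the whole day instead of a single instant at midnight, and an explicit time is left untouched.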

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/tests/test_search.py`:
- Line 10: The function `_metadata_filter_sql` contains substantial database
logic and should not reside in the router file per coding guidelines. Move
`_metadata_filter_sql` from `find_api/routers/search.py` to a backend module
such as `find_api/services/metadata_filters.py` (or similar), then update the
import statement in this test file to import from the new backend module
location instead of from the router. Also update the import in
`routers/search.py` to reflect the new location so the router can use the
function from the backend module.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 78f32adb-4a90-4f79-a931-688a18b12c27

📥 Commits

Reviewing files that changed from the base of the PR and between e480d01 and d192e22.

📒 Files selected for processing (4)
  • backend/src/find_api/routers/gallery.py
  • backend/src/find_api/routers/search.py
  • backend/tests/test_gallery.py
  • backend/tests/test_search.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/src/find_api/routers/gallery.py
  • backend/tests/test_gallery.py
  • backend/src/find_api/routers/search.py


from find_api.core.database import get_db
from find_api.main import app
from find_api.routers.search import _metadata_filter_sql

Contributor

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Move _metadata_filter_sql to a backend module per coding guidelines.

The import reveals that _metadata_filter_sql contains substantial database logic (~80 lines of SQL building, parameter management, and validation) but lives in the router file. As per coding guidelines, FastAPI routers should be thin, with database logic placed in existing backend modules (e.g., backend/src/find_api/services/ or a new metadata_filters.py module).

Refactor by moving _metadata_filter_sql to a backend module and updating imports in both routers/search.py and this test file.


Source: Coding guidelines


Labels

  • api: API contract, endpoint behavior, and response shape
  • backend: FastAPI, database, storage, and API work
  • gssoc:approved: Valid GSSoC contribution approved for scoring.
  • gssoc26: Related to GirlScript Summer of Code 2026.
  • level:intermediate: GSSoC difficulty level: intermediate. Base contributor points: 35.
  • quality:clean: Clean and maintainable PR. GSSoC contributor multiplier: 1.2x.
  • ready-to-merge: Fully approved, tested, and cleared for immediate merging.
  • type:feature: Feature PR. GSSoC type bonus.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: add EXIF metadata filters for image gallery and search

3 participants