
DAGE-14: Integration tests - working on Docker + Solr testing #172

Open
Intrinsical-AI wants to merge 3 commits into dataset-generator from DAGE-14/docker-integration-test

Conversation

@Intrinsical-AI
Collaborator

  • Relocated some files and cleaned up (Docker Compose moved to the project root folder, along with solr-init).

  • Test stack added:

  • Added tests/integration/docker-compose.yml (Solr 9.8.1 + solr-precreate testcore) to keep CI fast and isolated. It is simpler than the original.

  • Pytest fixtures (tests/conftest.py):

  • docker_compose_file: tells pytest-docker where the compose file lives.
  • solr_url: waits for testcore to be responsive and returns the base URL.
  • seed_dataset: idempotently seeds dataset.json when the core is empty.
  • Integration tests (tests/integration/test_docker_solr.py)
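
A minimal sketch of the two behaviours the fixtures above describe: polling until the core answers, and seeding only when the core is empty. This is not the code in tests/conftest.py. The names `wait_until_responsive`, `seed_if_empty`, `count_docs` and `add_docs` are hypothetical, and an in-memory list stands in for Solr so the logic can run without Docker. In the real fixtures, pytest-docker's `docker_services.wait_until_responsive(...)` would presumably do the polling.

```python
import time
from typing import Callable, List


def wait_until_responsive(check: Callable[[], bool],
                          timeout: float = 30.0,
                          pause: float = 0.5) -> None:
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"service not responsive after {timeout}s")
        time.sleep(pause)


def seed_if_empty(count_docs: Callable[[], int],
                  add_docs: Callable[[List[dict]], None],
                  docs: List[dict]) -> bool:
    """Index docs only when the core is empty; return True if seeding happened."""
    if count_docs() > 0:
        return False  # already seeded, so a second call is a no-op
    add_docs(docs)
    return True


# Usage with an in-memory stand-in for the Solr core (illustrative only).
fake_core: List[dict] = []
sample_docs = [{"id": "1", "title": "first"}, {"id": "2", "title": "second"}]

wait_until_responsive(lambda: True, timeout=1.0, pause=0.0)
first_run = seed_if_empty(lambda: len(fake_core), fake_core.extend, sample_docs)
second_run = seed_if_empty(lambda: len(fake_core), fake_core.extend, sample_docs)
print(first_run, second_run, len(fake_core))
```

Passing the count and add operations in as callables keeps the idempotency rule testable on its own, separate from whichever Solr client the fixture ends up using.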

See: https://sease.atlassian.net/browse/DAGE-14?focusedCommentId=16343

from src.search_engine.solr_search_engine import SolrSearchEngine

def test_core_exists(solr_url):
    r = requests.get(solr_url + "../admin/cores?action=STATUS&wt=json")
Collaborator

what is this ../?

Collaborator Author

solr_url is "http://127.0.0.1:8983/solr/testcore/"

But we want to access "http://127.0.0.1:8983/solr/admin/cores?action=STATUS"

The .. is a relative path segment, the same as cd .. in Bash: it goes up one level in the path, from /solr/testcore/ to /solr/.
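
To make the path resolution explicit, here is a small sketch using the standard library's `urljoin`. It assumes the `solr_url` value quoted above. It also shows that the result depends on the trailing slash, which is one reason a reviewer might prefer building the admin URL explicitly instead of concatenating "../".

```python
from urllib.parse import urljoin

# Value quoted in the comment above (assumed to be what the fixture returns).
solr_url = "http://127.0.0.1:8983/solr/testcore/"
relative = "../admin/cores?action=STATUS&wt=json"

# With a trailing slash, ".." climbs from /solr/testcore/ to /solr/.
resolved = urljoin(solr_url, relative)
print(resolved)
# http://127.0.0.1:8983/solr/admin/cores?action=STATUS&wt=json

# Without the trailing slash, "testcore" counts as the last path segment,
# so ".." climbs one level further and the /solr prefix is lost.
resolved_no_slash = urljoin(solr_url.rstrip("/"), relative)
print(resolved_no_slash)
# http://127.0.0.1:8983/admin/cores?action=STATUS&wt=json
```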

Collaborator

@nicolo-rinaldi nicolo-rinaldi Jul 28, 2025

Why do we want to access that URL? I assumed we were testing whether the specific testcore exists in the collection.
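
One way to connect the two views: the CoreAdmin STATUS response lists cores under a "status" key, so the test can assert on the specific core name. The sketch below assumes that response shape; the helper name `core_is_listed` and the sample payloads are illustrative, not taken from this PR.

```python
def core_is_listed(status_payload: dict, core_name: str) -> bool:
    """Return True if the STATUS payload holds a non-empty entry for core_name."""
    cores = status_payload.get("status", {})
    # A missing key or an empty entry both count as "not listed" here.
    return bool(cores.get(core_name))


# Illustrative payloads shaped like a CoreAdmin STATUS response (assumed).
payload_with_core = {
    "responseHeader": {"status": 0},
    "status": {"testcore": {"name": "testcore", "instanceDir": "/var/solr/data/testcore"}},
}
payload_without_core = {
    "responseHeader": {"status": 0},
    "status": {"testcore": {}},
}

print(core_is_listed(payload_with_core, "testcore"))
print(core_is_listed(payload_without_core, "testcore"))
```

In the test itself, the payload would come from `r.json()` on the STATUS request, followed by `assert core_is_listed(payload, "testcore")`.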

Collaborator

Why do we need to change this? What are the benefits?

Collaborator Author

@Intrinsical-AI Intrinsical-AI Jul 21, 2025

The previous version was a bit heavy for pytest (as we decided to migrate to a specialized Python library). I assume it is meant to be the real docker-compose.yml for the application; in that case, that version should live in the root folder, and we need a simpler version for testing in the integration folder. That's what I've done. I'm sure we can discuss a better solution. Check:
https://sease.atlassian.net/jira/software/projects/DAGE/boards/84?selectedIssue=DAGE-14
and feel free to ping me on Slack!

Comment thread rre-dataset-generator/tests/conftest.py
Comment thread rre-dataset-generator/tests/conftest.py
Comment thread rre-dataset-generator/tests/integration/resources/llm_config.yaml Outdated
Comment thread rre-dataset-generator/tests/integration/docker-compose.yml Outdated

def test_search_engine_fetch(solr_url):
    engine = SolrSearchEngine(solr_url)
    docs = engine.fetch_for_query_generation(None, 3, ["title","body"])
Collaborator

I don't think we have a "body" field in our Solr documents

Collaborator Author

Right

Collaborator Author

@Intrinsical-AI Intrinsical-AI Jul 21, 2025

I didn't notice because engine.fetch_for_query_generation fails silently: it accepts any string as a field, even if it is not registered in the index, and it neither logs a warning or error nor raises an exception.
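
A possible guard against this silent behaviour, sketched under assumptions: `validate_fields` is a hypothetical helper, not part of SolrSearchEngine, and the list of known fields is hard-coded here for illustration. In practice it could be read from the index schema before the fetch.

```python
from typing import Iterable, List


def validate_fields(requested: Iterable[str], known: Iterable[str]) -> List[str]:
    """Return the requested fields, or raise if any is not in the known set."""
    known_set = set(known)
    requested_list = list(requested)
    unknown = [name for name in requested_list if name not in known_set]
    if unknown:
        raise ValueError(
            f"Unknown field(s) {unknown}; known fields: {sorted(known_set)}"
        )
    return requested_list


# Hypothetical schema fields, for illustration only.
schema_fields = ["id", "title", "description"]

print(validate_fields(["title"], schema_fields))

try:
    validate_fields(["title", "body"], schema_fields)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Failing loudly at this point would have made the "body" typo in the test visible immediately, instead of returning documents that simply lack the field.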

Collaborator

reminder to change it

context: solr-init
depends_on:
  - solr
command: ["sh", "-c", "/rre-dataset-generator/solr-init/solr-init.sh"]
Contributor

@nseidan nseidan Jul 23, 2025

@Intrinsical-AI we shouldn't move the docker-compose file to the root folder. It's a test environment, not a service we produce; it comes from the user side. Also, our project is not a Docker-based service, it is just a CLI, so there's no need to put it in the root folder. cc @dantuzi

Collaborator Author

@Intrinsical-AI Intrinsical-AI Jul 24, 2025

Hey Naz, thanks for the feedback! This is a research task. It's a proposal for using python-docker.

I had already tagged you in the Jira explaining it.
https://sease.atlassian.net/browse/DAGE-14?focusedCommentId=16413

Please check the board before reviewing! 👍

  - solr
command: ["sh", "-c", "/opt/rre-dataset-generator/solr-init.sh"]
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8983/solr/testcore/admin/ping"]
Contributor

@nseidan nseidan Jul 23, 2025

@Intrinsical-AI we don't need a healthcheck here; it's already handled in solr-init.sh

Collaborator Author

@Intrinsical-AI Intrinsical-AI Jul 24, 2025

  1. This research proposal suggests using python-docker, the one you mentioned during a call. If we adopt it, the checks in solr-init.sh will move to a proper pytest file that uses fixtures. The details are here: https://sease.atlassian.net/browse/DAGE-14?focusedCommentId=16413
  2. This is not yet accepted; it's a proposal on "how" to do integration tests for Solr with Docker.
  3. In general, it would be better for docker compose to live in the root.
  4. In general, it's good practice to implement a healthcheck in the docker compose file, even if you also "handle" it outside.

I guess we will have to discuss this in detail, but if you can propose a solution for easily orchestrating all the tests, that would be nice!
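
For discussion, here is a sketch of how the same ping check used by the compose healthcheck could be expressed on the Python side, for example as the readiness predicate of a fixture. The function names are hypothetical, and the code assumes the ping handler answers with a JSON body containing "status": "OK" when the core is healthy.

```python
import json
import urllib.error
import urllib.request


def ping_reports_ok(body: bytes) -> bool:
    """Return True if a ping response body is JSON with status == "OK"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "OK"


def solr_core_is_healthy(core_url: str, timeout: float = 2.0) -> bool:
    """Ping <core_url>/admin/ping and report whether the core looks healthy."""
    ping_url = core_url.rstrip("/") + "/admin/ping?wt=json"
    try:
        with urllib.request.urlopen(ping_url, timeout=timeout) as resp:
            return resp.status == 200 and ping_reports_ok(resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not ready yet.
        return False


# The parsing step can be exercised without a running Solr instance.
print(ping_reports_ok(b'{"responseHeader": {"status": 0}, "status": "OK"}'))
print(ping_reports_ok(b"<html>starting up</html>"))
```

Whichever side ends up owning the check (compose healthcheck, solr-init.sh, or a fixture), keeping the predicate in one place would make it easier to apply the same approach across search engines.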

Collaborator

I would pick one way to do things and stick to that across all integration tests of all search engines. TBD

Collaborator Author

I agree. This is exploratory work for the research.

Collaborator

@nicolo-rinaldi nicolo-rinaldi left a comment

I definitely like this library for setting up integration tests, but we need to remember to find a common way to set up folders. I think this is key to improving the process of building integration tests for the supported search engines.

