Skip to content

feat: ability to set configurations in python model#663

Open
akmalsoliev wants to merge 5 commits intoaws-samples:mainfrom
akmalsoliev:fix_install_packages
Open

feat: ability to set configurations in python model#663
akmalsoliev wants to merge 5 commits intoaws-samples:mainfrom
akmalsoliev:fix_install_packages

Conversation

@akmalsoliev
Copy link
Copy Markdown
Contributor

@akmalsoliev akmalsoliev commented Feb 13, 2026

resolves #

Description

  • One can set a full configuration of a dbt session through dbt.config inside of a python model
  • Fixed profiles.yml improperly being set with Python model when creating an interactive session

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-glue next" section.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@akmalsoliev akmalsoliev marked this pull request as draft February 13, 2026 17:29
@akmalsoliev
Copy link
Copy Markdown
Contributor Author

akmalsoliev commented Feb 13, 2026

BUG: Doesn't properly close the connection once the process is complete.

P.S.: If anyone feels like helping out with this would be appreciated.

@akmalsoliev
Copy link
Copy Markdown
Contributor Author

Just realized that it is not a PR specific issue, rather than a face that session_id is being dropped during the process, will investigate further.

@akmalsoliev akmalsoliev marked this pull request as ready for review February 16, 2026 12:38
@akmalsoliev
Copy link
Copy Markdown
Contributor Author

After doing some digging found out that you're testing installation of ['pandas', 'scikit-learn', 'numpy'] and the reason why this doesn't throw an error is that all these packages are already part of >= Glue 4.0 https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#glue-modules-provided
One way to test could be to install some package which isn't part of a list.

@yotahk
Copy link
Copy Markdown
Collaborator

yotahk commented Mar 5, 2026

Thank you for the contribution. Would you help clarifying/fix following items:

  • --additional-python-modules looks fair since it's the native option for installing third party dependencies in Glue. What's the reason we also need pip install? pip install seems to be redundant but is it intended for any specialized use case?
  • subprocess.check_call will directly use {pkg_list} from user input, which has some risk of command line injection. Any thoughts on this?
  • There're unrelated changes overlapped with fix: improper error status raise #661, would you please revert the overlaps from this PR?
  • Please update CHANGELOG.md.
  • Please create a separate refactoring PR next time - large number of refactoring could make it difficult to review/revert/blame (when we need to revisit in the future).

@akmalsoliev
Copy link
Copy Markdown
Contributor Author

Hey @yotahk,
100% valid concerns and agree.

  • --additional-python-modules looks fair since it's the native option for installing third party dependencies in Glue. What's the reason we also need pip install? pip install seems to be redundant but is it intended for any specialized use case?

I'm unsure why, but when attempting to install via --additional-python-modules it didn't work, I'm not sure what is the reason for this issue, could you test it with non conventional library? for example pip install Validoopsie and call it via

import validoopsie as vd
vd
  • subprocess.check_call will directly use {pkg_list} from user input, which has some risk of command line injection. Any thoughts on this?

Agree, this was a work around as --additional-python-modules did not work for me.

Would like to get your feedback prior to proceeding with changes, opened this PR and modifications as a form of "working alternative draft".

@yotahk
Copy link
Copy Markdown
Collaborator

yotahk commented Mar 7, 2026

@akmalsoliev I can take a look on my side too, would you be able to share some code to reproduce the issue?

@akmalsoliev akmalsoliev force-pushed the fix_install_packages branch from 0213f16 to 6bdf61c Compare March 7, 2026 19:47
@akmalsoliev
Copy link
Copy Markdown
Contributor Author

@akmalsoliev I can take a look on my side too, would you be able to share some code to reproduce the issue?

Hey Yota,
Dug a bit deeper and found the issue. connection.py was never adding packages to create session args. I fixed that issue and seems to work, though not sure about the global ---additional-python-modules in profiles.yaml. At this point I'm okay with having to set it up pymodel by pymodel and in future gathering more information to open another PR for that (if it isn't passed).

Also noticed that you were indeed checking that additional package is present in GlueCredentials, however, you never tested if they are present in session creation, added a test for that too.

@akmalsoliev akmalsoliev force-pushed the fix_install_packages branch from fda8442 to c010f1b Compare March 7, 2026 20:08
@akmalsoliev
Copy link
Copy Markdown
Contributor Author

Hey @yotahk,
was able to find the issue for profiles.yml, it is basically how _string_to_dict would convert profiles.yml's ---additional-python-modules into a dictionary, the current structure string splits default_argunments by , separator, this was fixed by using -- separator and later reattaching it back, if the string ends with , then it would .rstrip(",")ed. Not sure about why regex had = as a split when by default in yaml you have : split.
Even though I believe this shouldn't be present, there is an issue that when session is created by a python model the tags aren't properly separated, as per value_in_dictionary[i.split("=")[0].strip('\'').replace("\"", "")] = i.split("=")[1].strip('"\''), I did not face this issue because the session was already created by prior steps and I was using reusable session.

def _string_to_dict(self, value_to_convert):

     def _string_to_dict(self, value_to_convert):
         value_in_dictionary = {}
-        for i in value_to_convert.split(","):
-            value_in_dictionary[i.split("=")[0].strip('\'').replace("\"", "")] = i.split("=")[1].strip('"\'')
+        for i in value_to_convert.split("--"):
+            i = i.strip()
+            if i == "":
+                continue
+            i = f"--{i}"
+            if i[-1] == ',':
+                i=i.rstrip(",")
+            value_in_dictionary[i.split(":")[0].strip('\'').replace("\"", "")] = i.split(":")[1].strip('"\'')
         return value_in_dictionary

tests/unit/gluedbapi/test_connection.py

         call_kwargs = mock_client.create_session.call_args[1]
         default_args = call_kwargs["DefaultArguments"]
         assert "--additional-python-modules" not in default_args
+
+    @mock.patch("dbt.adapters.glue.gluedbapi.connection.get_session_waiter")
+    @mock.patch("dbt.adapters.glue.gluedbapi.connection.boto3")
+    def test_create_session_with_default_arguments_dict_and_list_values(
+        self, mock_boto3, mock_waiter
+    ) -> None:
+        """Test that default_arguments as a dict with list values joins them into comma-separated strings."""
+        mock_waiter.return_value = mock.Mock()
+        mock_session = mock_boto3.session.Session.return_value
+        mock_client = mock_session.client.return_value
+
+        credentials = GlueCredentials(
+            role_arn="arn:aws:iam::123456789012:role/GlueRole",
+            region="us-east-1",
+            workers=2,
+            worker_type="G.1X",
+            schema="test_schema",
+            default_arguments="--enable-metrics: true, --additional-python-modules:numpy,pandas,scikit-learn",
+        )
+
+        connection = GlueConnection(credentials)
+        connection._create_session("test-session-id")
+
+        mock_client.create_session.assert_called_once()
+        call_kwargs = mock_client.create_session.call_args[1]
+        default_args = call_kwargs["DefaultArguments"]
+
+        assert default_args["--enable-metrics"] == "true"
+        assert "--additional-python-modules" in default_args
+        assert (
+            default_args["--additional-python-modules"] == "numpy,pandas,scikit-learn"
+        )

@akmalsoliev akmalsoliev changed the title fix: added ability to install third party packages feat: ability to set configurations in python model Apr 14, 2026
@akmalsoliev
Copy link
Copy Markdown
Contributor Author

akmalsoliev commented Apr 14, 2026

Hey @yotahk, modified couple of things, now an operator can set a full list of configurations in python model through dbt.config

I think this is much nicer approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants