Add support for multi-line yaml values by amamd07 · Pull Request #34 · rayyansys/pyworker

amamd07 · 2025-10-03T05:42:33Z

pyworker fails when the handler looks like the following. We fixed this by delegating attribute extraction to PyYAML after adding a constructor for unknown Ruby classes.

object: !ruby/object:AiScreenerJob
  raw_attributes:
    id: 50
    status: 
    user_id: 976
    review_id: 5611
    filter: '{"file_ids":[7326],"decision":"undecided","decision_by":[679]}'
    inclusion: |-
      not 'Diabetes,asjdh,askdja

      dsa
    exclusion: 'Diabetes

      '

amamd07 · 2025-10-03T06:01:19Z

This case will still break it

lines = [
    "key: 'texts'",
    "text continues'",
    "another_key: val"
]

But there must be a limit: even if we fix that case, what if there is a : in the continuation line, as in 'text: continues'?
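For comparison, here is a minimal sketch of how PyYAML itself treats a quoted value that continues onto a line containing a colon. The document string is an illustrative assumption (it is not the failing fixture), and the continuation line is indented so that it is valid YAML.

```python
import yaml

# Illustrative document (an assumption, not the failing fixture): a
# single-quoted value continues onto a second line that contains a colon.
doc = (
    "key: 'text\n"
    "  text: continues'\n"
    "another_key: val\n"
)

# PyYAML scans the quoted scalar as a whole, so the colon on the
# continuation line is part of the value and does not start a new key.
data = yaml.safe_load(doc)
print(data)
```

A line-by-line splitter has to guess where a value ends, whereas the YAML scanner tracks the open quote, which is why the colon is harmless here.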

hammady · 2025-10-03T16:22:05Z

I tried a simpler approach:

import yaml


class IgnoreUnknownTagsLoader(yaml.SafeLoader):
    def ignore_unknown(self, node):
        return None

IgnoreUnknownTagsLoader.add_constructor(None, IgnoreUnknownTagsLoader.ignore_unknown)

with open("multi-line.yml", 'r') as file:
    attributes = file.read().splitlines()[2:]
    yaml_content = '\n'.join(['object:', '  attributes:'] + attributes)
    data = yaml.load(yaml_content, Loader=IgnoreUnknownTagsLoader)
data['object']['raw_attributes']

Output:

{'id': 50,
 'status': None,
 'user_id': 976,
 'review_id': 5611,
 'filter': '{"file_ids":[7326],"decision":"undecided","decision_by":[679]}',
 'inclusion': "not 'Diabetes,asjdh,askdja\n\ndsa",
 'exclusion': 'Diabetes\n',
 'error': None,
 'articles_count': None,
 'started_at': None,
 'finished_at': None,
 'failed_at': None,
 'post_response': None,
 'status_response': None,
 'skipped_articles': None,
 'completed_articles': None,
 'failed_articles': None,
 'include_count': None,
 'exclude_count': None,
 'unknown_count': None,
 'provider': None,
 'ai_model_name': None,
 'created_at': datetime.datetime(2025, 10, 2, 17, 51, 55, 59669, tzinfo=datetime.timezone.utc),
 'updated_at': datetime.datetime(2025, 10, 2, 17, 51, 55, 59669, tzinfo=datetime.timezone.utc)}

It avoids all our custom parsing logic except for extracting the job class name. Try it on the failing tests. It delegates all parsing to the standard PyYAML library, so it should be more robust.

amamd07 · 2025-10-03T18:23:54Z

That alone, without the squashing, is failing the tests. But it is a good addition, so I will include it in the PR.

hammady

The whole point is to avoid manually extracting attributes (by calling extract_attributes). You don't need this function and you don't need any squashing (which is semantically incorrect by the way).
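A minimal sketch of what that could look like, assuming the whole handler is handed to PyYAML with no line slicing. The loader name `RubyObjectLoader`, the function `construct_ruby_object`, and the `__ruby_class__` key are hypothetical names chosen for illustration; they are not pyworker's actual code. The handler text is an abbreviated copy of the example from the PR description.

```python
import yaml


class RubyObjectLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that tolerates !ruby/object: tags."""


def construct_ruby_object(loader, tag_suffix, node):
    # tag_suffix is whatever follows "!ruby/object:", e.g. "AiScreenerJob".
    mapping = loader.construct_mapping(node, deep=True)
    # Hypothetical key: keep the Ruby class name next to the parsed fields.
    mapping["__ruby_class__"] = tag_suffix
    return mapping


RubyObjectLoader.add_multi_constructor("!ruby/object:", construct_ruby_object)

# Abbreviated copy of the handler from the PR description.
HANDLER = """\
object: !ruby/object:AiScreenerJob
  raw_attributes:
    id: 50
    status:
    inclusion: |-
      not 'Diabetes,asjdh,askdja

      dsa
    exclusion: 'Diabetes

      '
"""

data = yaml.load(HANDLER, Loader=RubyObjectLoader)
job = data["object"]
attributes = job["raw_attributes"]
print(job["__ruby_class__"], attributes)
```

Both the block scalar (`|-`) and the quoted scalar that spans lines are resolved by PyYAML's scanner, so no manual extraction or squashing step appears in the sketch.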

Copilot

Pull Request Overview

Adds support for parsing multi-line YAML values in job handlers by refactoring the YAML parsing logic to handle quoted strings that span multiple lines and updating test fixtures to validate this functionality.

Removed manual attribute extraction function and simplified to use handler lines directly
Added custom YAML loader classes to handle Ruby objects and unknown tags
Updated attribute extraction to work with raw_attributes instead of attributes

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
pyworker/job.py	Refactored YAML parsing with custom loaders and simplified attribute extraction
tests/fixtures/handler_registered.yaml	Added multi-line string test case with quoted value
tests/test_job.py	Updated test expectations to include new multi-line attribute

hammady

Check copilot comments.

amamd07 · 2025-10-06T20:35:21Z

    # Construct mapping normally, ignoring Ruby-specific tags
    return loader.construct_mapping(node)

+yaml.SafeLoader.add_multi_constructor("!ruby/object:", no_ruby_objects)


This will be applied universally.

Yes, I understand. In the previous commit, it was added to our custom loader (even though the name of that loader, IgnoreUnknownTagsLoader, was misleading). I see no problem in ignoring Ruby classes in the Python runtime of the application using pyworker. Remember that we are not ignoring all unknown tags; we are simply mapping Ruby objects to plain Python dicts regardless of their Ruby class names.
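To make the scoping question concrete, here is a small sketch of the difference between registering the constructor on a loader subclass and registering it on `yaml.SafeLoader` itself. The names `RubyTolerantLoader` and `ruby_object_as_dict` are hypothetical and only stand in for whatever the PR uses.

```python
import yaml


class RubyTolerantLoader(yaml.SafeLoader):
    """Hypothetical subclass; constructors added here stay local to it."""


def ruby_object_as_dict(loader, tag_suffix, node):
    # Build a plain dict and ignore the Ruby class name in tag_suffix.
    return loader.construct_mapping(node, deep=True)


# Registered on the subclass only, not on yaml.SafeLoader.
RubyTolerantLoader.add_multi_constructor("!ruby/object:", ruby_object_as_dict)

doc = "object: !ruby/object:AiScreenerJob\n  raw_attributes:\n    id: 50\n"

# The subclass maps the tagged node to a dict.
scoped = yaml.load(doc, Loader=RubyTolerantLoader)

# The stock safe loader is untouched, so it still rejects the Ruby tag.
try:
    yaml.safe_load(doc)
    safe_load_accepts_ruby_tags = True
except yaml.YAMLError:
    safe_load_accepts_ruby_tags = False

print(scoped, safe_load_accepts_ruby_tags)
```

Calling `yaml.SafeLoader.add_multi_constructor(...)` instead would change the behavior of every `yaml.safe_load` call in the host process, which is the trade-off being discussed in this thread.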

add support for multiple lines yaml values and tests

fb4afe6

amamd07 requested a review from hammady October 3, 2025 05:43

hammady changed the title ~~Add support for multiple lines yaml values and tests~~ Add support for multi-line yaml values and tests Oct 3, 2025

ignore unknown tags

7f256e5

hammady requested changes Oct 3, 2025

apply suggestion from hossam

fba5c9d

hammady requested a review from Copilot October 3, 2025 23:07

Copilot AI reviewed Oct 3, 2025

hammady requested changes Oct 3, 2025

ahmed and others added 3 commits October 3, 2025 19:59

register to ignoreUnknownTagsLoader instead of global

02b027a

cleanup

10b2fd6

Remove redundant yaml loader & add more tests

07fcac9

hammady approved these changes Oct 6, 2025

hammady changed the title ~~Add support for multi-line yaml values and tests~~ Add support for multi-line yaml values Oct 6, 2025

hammady merged commit 662a3ac into master Oct 6, 2025
5 checks passed

hammady deleted the tasks/8536 branch October 6, 2025 20:34

maboelnour mentioned this pull request Nov 3, 2025

Implement YAML constructor and modify attribute handling #35

Merged

hammady merged 6 commits into master from tasks/8536