update detectFormLanguages API endpoint to detect laguages and form media#2967
update detectFormLanguages API endpoint to detect laguages and form media#2967
Conversation
| detected_languages.append(match.group(2)) | ||
| for col in df.columns: | ||
| col_norm = normalize_col(col) | ||
| match = re.search(r"::(\w+)$", col_norm) |
There was a problem hiding this comment.
Since this function is specifically for finding language column, it would be better to specify language columns in the regex like label, hint and required_message.
| if lang in INCLUDED_LANGUAGES and lang not in detected: | ||
| detected.append(lang) | ||
|
|
||
| if default_lang and default_lang not in detected: |
There was a problem hiding this comment.
Why are we searching for the same default_lang in detected twice.
| # Priority 1: default language | ||
| # Priority 2: first detected language | ||
| # Priority 3: plain "image" column | ||
| if default: |
There was a problem hiding this comment.
This language selection logic is valid only if the user doesn't use advanced configuration.
| else: | ||
| lang_to_use = None | ||
|
|
||
| if lang_to_use: |
There was a problem hiding this comment.
I think it will be better to write a regex that simply looks for image:: and media::image::, instead of also checking for language column.
9054dee to
5e111a0
Compare
| if langs["default_language"] | ||
| else [], | ||
| "supported_languages": list(INCLUDED_LANGUAGES.keys()), | ||
| "media_files": media, |
There was a problem hiding this comment.
We do already have an endpoint for returning which media is required, after the form is uploaded (ODK Central handles it this way).
Its not a bad idea to try and determined the media uploads directly from the form, before its uploaded. I just wonder if there is a good reason ODK doesn't do this
There was a problem hiding this comment.
You're right,
However, in step 3(project creation), the XLSForm won't be reaching ODK yet, so we won't be able to fetch that data from ODK at that stage (since we have the code set up for it after the form is submitted).
That's why we need to detect the media requirements directly from the custom backend.
There was a problem hiding this comment.
If extraction of the media fields is reliable this way, then personally I think its a nicer user experience.
But equally, nothing is stopping us posting the form to ODK as part of the project creation workflow, instead of at the end - resulting in the same experience and less work for us.
Let's park this until we refactor the project creation to be rely more on ODK instead of FieldTM specific stuff
|
The pre-commit fail seems to be genuine. The api container crashes if it's rebuilt. |
55a748a to
370475f
Compare
What type of PR is this? (check all applicable)
Related Issue
Describe this PR
This PR cleans up and improves how we detect languages in XLSForms while also adding support for extracting media files.
detect_languages() now just detects languages.
get_media_files() handles finding the correct media column and collecting file references.
The endpoint simply calls these helpers and returns the combined result.
What’s changed compared to the old language detection:
It’s no longer tied to FastAPI or I/O, it works purely on the parsed sheets.
Old version only looked at label::, hint::, and required_message:: columns. The new version checks any column ending with ::lang, making detection more flexible.
Screenshots
Alternative Approaches Considered
Could have added a separate endpoint for media extraction and left the old language-detection code as-is, but that would introduce duplication and extra complexity. Refactoring kept things cleaner.
Review Guide
Notes for the reviewer. How to test this change?
Checklist before requesting a review
[optional] What gif best describes this PR or how it makes you feel?