
Fix empty abstract prevents file input #161

Merged
saimouu merged 6 commits into main from fix/empty-abstract-prevents-file-input on Mar 4, 2026

@saimouu (Collaborator) commented Feb 19, 2026

  • Change validation logic to allow empty abstracts
  • Empty abstracts default to NO_ABSTRACT
  • Empty abstract count is returned on file upload
  • Warning toast is displayed in the frontend if empty abstracts were detected
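A minimal pandas sketch of the behaviour listed above, assuming the CSV has a column named `abstract` and that NO_ABSTRACT is a plain string placeholder (the PR does not show its actual value or where it is defined). The helper name `fill_empty_abstracts` is hypothetical.

```python
import io

import pandas as pd

# Assumption: the placeholder is the literal string "NO_ABSTRACT"; the real constant may differ.
NO_ABSTRACT = "NO_ABSTRACT"


def fill_empty_abstracts(df: pd.DataFrame, column: str = "abstract") -> tuple[pd.DataFrame, int]:
    """Replace missing or blank abstracts with NO_ABSTRACT; return the new frame and the count."""
    col = df[column]
    # read_csv already turns empty fields into NaN; also treat whitespace-only strings as empty.
    mask = col.isna() | col.astype(str).str.strip().eq("")
    out = df.copy()
    # Cast to object first so the string placeholder fits even an all-NaN (float) column.
    out[column] = col.astype(object).mask(mask, NO_ABSTRACT)
    return out, int(mask.sum())


# Usage: row B has an empty field, row C a whitespace-only one.
sample = "title,abstract\nA,Some text\nB,\nC,   \n"
df, empty_abstract_count = fill_empty_abstracts(pd.read_csv(io.StringIO(sample)))
```

Counting with the same boolean mask used for the replacement keeps the reported count and the rows actually defaulted in sync.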

@saimouu saimouu linked an issue Feb 19, 2026 that may be closed by this pull request
@alehuo (Collaborator) left a comment

Looks good to me. One minor comment:

Comment on lines +1 to +66
```diff
@@ -51,4 +63,4 @@ def validate_csv(file_obj: BinaryIO, filename: str) -> List[FileError]:
         file_obj.seek(0)
     except Exception:
         pass
-    return errors
+    return errors, empty_abstract_count
```
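Because `validate_csv` now returns a tuple, the `-> List[FileError]` annotation visible in the hunk header would need to change too, and every caller has to unpack two values. A minimal sketch of that contract, assuming a hypothetical `FileError` with `filename`/`message` fields and an `abstract` column; the real `FileError` and parsing logic are not shown in the PR.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO

import pandas as pd


@dataclass
class FileError:
    # Hypothetical stand-in: the PR does not show FileError's real fields.
    filename: str
    message: str


def validate_csv(file_obj: BinaryIO, filename: str) -> tuple[list[FileError], int]:
    """Return validation errors plus the number of rows with an empty abstract."""
    errors: list[FileError] = []
    empty_abstract_count = 0
    try:
        df = pd.read_csv(file_obj)
        if "abstract" not in df.columns:
            errors.append(FileError(filename, "missing 'abstract' column"))
        else:
            # Empty abstracts are no longer an error; they are only counted.
            empty_abstract_count = int(df["abstract"].isna().sum())
    except Exception as exc:
        errors.append(FileError(filename, f"could not parse CSV: {exc}"))
    try:
        file_obj.seek(0)  # rewind so the upload can be read again later
    except Exception:
        pass  # mirrors the hunk above: ignore streams that cannot seek
    return errors, empty_abstract_count


# Callers now unpack two values instead of one.
errors, empty_abstract_count = validate_csv(io.BytesIO(b"title,abstract\nA,text\nB,\n"), "pubs.csv")
```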
@alehuo (Collaborator), Feb 24, 2026

Could this be done without the for loop? With large CSV files it gets slow quickly: with about 10,000 rows a for loop can take ~3-4 seconds, while vectorized operations take milliseconds.

I would do it like this:

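```python
# Boolean Series: True wherever "something" is missing (NaN or None)
mask = df["something"].isna()
# True counts as 1, so this is the number of missing values
count_of_something_na = mask.sum()
# Overwrite only the masked rows, in one vectorized assignment
df.loc[mask, "something"] = None
```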

Also, NaN vs. None: I'm not sure which one is better in this case.
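To illustrate the point about loops, a small self-contained comparison of a row-by-row `iterrows()` count and the vectorized `isna().sum()` count on 10,000 synthetic rows (the `abstract` column and its data are made up). Both return the same number; timings vary by machine, so none are claimed here. `isna()` treats `None` and `NaN` alike, so the count does not depend on the NaN vs. None choice; that choice matters more for downstream consumers, e.g. `NaN` is not valid JSON, while `None` serializes to `null`.

```python
import time

import pandas as pd


def count_missing_loop(df: pd.DataFrame, column: str) -> int:
    # iterrows() builds a new Series object for every row, which is what makes it slow.
    count = 0
    for _, row in df.iterrows():
        if pd.isna(row[column]):
            count += 1
    return count


def count_missing_vectorized(df: pd.DataFrame, column: str) -> int:
    # One boolean mask over the whole column, summed by NumPy.
    return int(df[column].isna().sum())


# 10,000 synthetic rows; both None and NaN count as missing.
df = pd.DataFrame({"abstract": ["text", None, float("nan"), "more text"] * 2_500})

for fn in (count_missing_loop, count_missing_vectorized):
    start = time.perf_counter()
    result = fn(df, "abstract")
    print(f"{fn.__name__}: {result} missing in {time.perf_counter() - start:.4f} s")
```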

@saimouu (Collaborator, Author)

Yeah, makes sense. I'll look into it.

The same df.iterrows() loop is also used in process_files().

I could also try to remove the duplicate file read; maybe just modifying validate_csv() to also return the parsed data would make sense.
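One way to drop the second read, sketched under assumptions: a hypothetical `validate_and_parse_csv` parses the upload once and returns the parsed frame alongside the errors and the empty-abstract count, and a hypothetical `process_records` works on that frame instead of reading the file again. The real signatures of `validate_csv()` and `process_files()` in the PR may differ.

```python
import io
from typing import BinaryIO, Optional

import pandas as pd


def validate_and_parse_csv(
    file_obj: BinaryIO, filename: str
) -> tuple[list[str], int, Optional[pd.DataFrame]]:
    """Parse the upload once; return (errors, empty_abstract_count, parsed frame or None)."""
    try:
        df = pd.read_csv(file_obj)
    except Exception as exc:
        return [f"{filename}: could not parse CSV: {exc}"], 0, None
    if "abstract" not in df.columns:
        return [f"{filename}: missing 'abstract' column"], 0, None
    return [], int(df["abstract"].isna().sum()), df


def process_records(df: pd.DataFrame) -> list[dict]:
    # Works on the already-parsed frame, so the file is never read a second time.
    # Assumption: "NO_ABSTRACT" is the placeholder string for empty abstracts.
    return df.fillna({"abstract": "NO_ABSTRACT"}).to_dict(orient="records")


errors, empty_count, parsed = validate_and_parse_csv(
    io.BytesIO(b"title,abstract\nA,text\nB,\n"), "pubs.csv"
)
records = process_records(parsed) if parsed is not None else []
```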

@saimouu (Collaborator, Author)

Changed validation to use TypeAdapter(list[PublicationRowData]), which should be a bit faster. Also removed the iterrows() loops, since apparently even to_dict() is faster.

Also removed the duplicate file read and parsing from process_files().
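A minimal sketch of batch validation with Pydantic v2's `TypeAdapter`, assuming a hypothetical `PublicationRowData` model with only `title` and `abstract` fields (the real model is not shown in the PR). Building the adapter once and validating the whole `to_dict(orient="records")` list in one call moves the per-row loop into pydantic-core instead of Python. Missing abstracts are filled before validation, because `NaN` is a float and would fail a `str` field.

```python
import pandas as pd
from pydantic import BaseModel, TypeAdapter, ValidationError


class PublicationRowData(BaseModel):
    # Hypothetical fields; the real model in the PR may differ.
    title: str
    abstract: str


# Build the adapter once at import time rather than per upload.
ROWS_ADAPTER = TypeAdapter(list[PublicationRowData])


def validate_rows(df: pd.DataFrame) -> tuple[list[PublicationRowData], list[dict]]:
    """Return (rows, []) on success or ([], errors) if any row fails validation."""
    # NaN is a float, so fill missing abstracts before validating against a str field.
    records = df.fillna({"abstract": "NO_ABSTRACT"}).to_dict(orient="records")
    try:
        return ROWS_ADAPTER.validate_python(records), []
    except ValidationError as exc:
        # Each error's "loc" is (row_index, field_name), e.g. (1, "title").
        return [], exc.errors()


rows, row_errors = validate_rows(pd.DataFrame({"title": ["A", "B"], "abstract": ["text", None]}))
```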

@saimouu saimouu requested a review from alehuo February 26, 2026 14:57
@saimouu saimouu merged commit 6297f11 into main Mar 4, 2026
11 of 12 checks passed
@saimouu saimouu deleted the fix/empty-abstract-prevents-file-input branch March 4, 2026 13:06

Development

Successfully merging this pull request may close these issues.

Empty Abstract Prevent File input

2 participants