Conversation
saimouu
commented
Feb 19, 2026
- Change validation logic to allow empty abstracts
- Empty abstracts default to NO_ABSTRACT
- Empty abstract count is returned on file upload
- Warning toast is displayed in front if empty abstracts were detected
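A rough sketch of what the validation change could look like, assuming pandas is used under the hood. The column name `"abstract"`, the `NO_ABSTRACT` sentinel string, and the helper name `fill_empty_abstracts` are all illustrative assumptions, not the project's actual implementation:

```python
import pandas as pd

NO_ABSTRACT = "NO_ABSTRACT"  # assumed sentinel value


def fill_empty_abstracts(df: pd.DataFrame) -> int:
    """Replace missing or blank abstracts with NO_ABSTRACT; return how many were replaced."""
    # Treat NaN/None and whitespace-only strings as empty
    mask = df["abstract"].isna() | df["abstract"].astype(str).str.strip().eq("")
    df.loc[mask, "abstract"] = NO_ABSTRACT
    return int(mask.sum())
```

The returned count is what would be surfaced on file upload to drive the warning toast.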
alehuo
left a comment
Looks good to me. One minor comment.
```diff
@@ -51,4 +63,4 @@ def validate_csv(file_obj: BinaryIO, filename: str) -> List[FileError]:
         file_obj.seek(0)
     except Exception:
         pass
-    return errors
+    return errors, empty_abstract_count
```
Could this be done without the for loop? With large CSV files it can get slow quickly: with about 10,000 rows a for loop can take ~3-4 seconds, while vectorized operations take milliseconds.
I would do it like this:

```python
mask = df["something"].isna()
count_of_something_na = mask.sum()
df.loc[mask, "something"] = None
```

Also: NaN vs None. I'm not sure which one is better in this case.
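To make the loop-vs-vectorized point concrete, here is a small self-contained comparison (the column name `"abstract"` is just an illustration). Both approaches count the same missing values, but `isna()` plus `sum()` does it in a single vectorized pass instead of a Python-level loop:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"abstract": ["a", None, "b", np.nan, "c"]})

# Row-by-row iteration: slow on large frames
loop_count = sum(1 for _, row in df.iterrows() if pd.isna(row["abstract"]))

# Vectorized: isna() catches both None and NaN in one pass
mask = df["abstract"].isna()
vec_count = int(mask.sum())
```

On a frame this small the difference is invisible, but the vectorized version scales to tens of thousands of rows with no meaningful slowdown.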
Yeah makes sense. I'll look into it.
The same df.iterrows() loop is done in process_files().
Could also try to remove the duplicate file reading; maybe modifying validate_csv() to also return the parsed data would make sense.
Changed validation to use TypeAdapter(list[PublicationRowData]), which should be a bit faster. Also removed the iterrows() loops, since apparently even to_dict() is faster.
Also removed the duplicate file read and parsing from process_files().
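For reference, the TypeAdapter approach mentioned above looks roughly like this in Pydantic v2. The fields on `PublicationRowData` here are invented for illustration; the real model is defined in the project:

```python
from pydantic import BaseModel, TypeAdapter


class PublicationRowData(BaseModel):
    # Hypothetical fields; the project's actual model will differ
    title: str
    abstract: str


adapter = TypeAdapter(list[PublicationRowData])

# df.to_dict(orient="records") yields dicts shaped like these
records = [
    {"title": "A study", "abstract": "Some text"},
    {"title": "Another", "abstract": "NO_ABSTRACT"},
]
rows = adapter.validate_python(records)
```

Validating the whole list in one `validate_python` call lets Pydantic's compiled core do the per-row work, instead of constructing models one by one inside a Python loop.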