feat!: orjson instead of simplejson to load and save JSON objects#134
feat!: orjson instead of simplejson to load and save JSON objects#134Martin1887 wants to merge 3 commits intoaibasel:mainfrom
Conversation
|
Great, thanks! I'll test when I find the time. Maybe you can try fixing the tests in the meantime. |
jendrikseipp
left a comment
There was a problem hiding this comment.
Thanks for the PR and sorry for taking so long to review it!
Could you run a small test that measures the runtime of loading a large properties file with the old vs the new code (making sure that the filesystem cache is hot, by ignoring the first measured time)?
|
Ah, I see now that you already measured a 4x speedup. |
|
Thanks! Code looks good now. I tested it locally: for a 500 MiB properties file, simplejson takes 2s to read it, while orjson takes 1.9s. Do you have the logs for a properties file where the switch to orjson pays off more? How much more memory is used in that case? |
|
I don't have any logs, but the https://zenodo.org/records/13378665/files/experiments_scripts_and_results.zip?download=1 I experienced 4x time speedup with 2x RAM consumption. These properties files have many lists of numbers, maybe orjson is much better in this type of data. Also, disabling sorting and formatting the speedup is higher, but properties files are not human-readable. Maybe, a parameter to disable sorting and formatting in fetchers would be a good idea for large experiments where the properties files will not be manually revised anyway. |
|
Thanks! I'll look into this after the break. |
Hello.
This pull request replaces
simplejsonbyorjson.The previous JSON files generated by
simplejsonare compatible withorjsonand, indeed, they seem equal to me, so no breaking changes in this aspect.However, the API or
orjsonis not compatible with the Python nativejsonlibrary, soorjsonis required to use Lab after this change meanwhilesimplejsonwas optional.In a performance perspective,
orjsonis around 4x faster (without SWAP usage) but it uses more RAM (around the double of RAM, not a problem in the most use cases). More benchmarks would be needed in your side before merging this pull request however.Thanks.