As data scientists, we need to make sure our model inference scales once it is deployed to production. This issue addresses two problems that can hinder inference scalability: computational complexity and memory management. We propose tackling both by migrating the data preparation process from pandas to Spark, with the aim of saving time and computational resources.
Computational Complexity:
- Spark is built for distributed computing. Migrating data preparation to it lets us process the data in parallel and handle larger workloads more efficiently.
Memory Management:
- Spark provides memory management features, such as in-memory caching and support for efficient data storage formats, which can help mitigate out-of-memory errors during data preparation.