Replies: 8 comments 2 replies
-
I have used Polars exclusively since switching away from Pandas. Finance is still dominated by Pandas though, so it will likely be a slow transition. In the meantime, you can easily convert Pandas dataframes to Polars dataframes. Sometimes you have to fiddle with the Pandas index/indices, but it is still very workable.
-
Is it a matter of code convenience, since the rest of your workflow is in Polars, or do you actually anticipate performance gains when working with edgartools? I haven't made the change to Polars yet because Pandas is still so widely used. It should not be that difficult to add Polars support; I just didn't know whether it was worth it.
-
I think it's both: convenience, and the performance gains from being able to integrate Polars into my large workflows, especially when I'm pulling a lot of filings and aggregating them. To be honest, at my job we are slowly transitioning from Pandas to Polars. The syntax is a lot better and more concise too.
-
Yeah, I agree. I'm aware you can, but I think it would be a bit easier to offer a Polars option in Edgartools; it would put Edgartools one step ahead of the curve. And yes, Pandas indexes are tricky sometimes.
-
I am open to doing this for some parts of edgartools. The internals use pyarrow, so it requires a fair bit of work, though this is easy for LLMs to do now. But I'd want to understand the workflows first. The internals of edgartools won't be much faster with Polars, but downstream workflows might benefit from not having to do the conversion. So I need examples of workflows that might benefit.
-
Just a thought before doing all this work. I too prefer Polars, both for speed and because the API is much cleaner. However, I also use Pandas when the data involved is small and it matches what my other imports expect. I imagine there are only very specific use cases where someone would use edgartools and get a huge dataset back. In my case of company-by-company and filing-by-filing processing, Polars would improve nothing. But while we're talking about speed, it may be time to do some profiling on edgartools. Even though I'm using locally cached data, I've noticed that edgartools is quite slow. Simply reading about 2800 filings (with no further processing on my end) takes about 12 minutes on a Mac M4 Max with 64GB RAM. That seems very slow to me. I bet a few very specific bottlenecks in the code account for most of this.
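For what it's worth, finding those bottlenecks could start with something as simple as the standard-library profiler. This is only a sketch: `read_filing` and `read_all` are hypothetical placeholders for whatever actually loads a cached filing, not edgartools functions.

```python
import cProfile
import io
import pstats
import time


def read_filing(i):
    """Hypothetical stand-in for loading one cached filing."""
    time.sleep(0.001)  # placeholder for the real work
    return i


def read_all(n):
    """Load n filings one after another."""
    return [read_filing(i) for i in range(n)]


profiler = cProfile.Profile()
profiler.enable()
results = read_all(20)
profiler.disable()

# Sort by cumulative time and print the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)
report = buffer.getvalue()
print(report)
```

Swapping the placeholder for the real loading call should show whether the time goes to parsing, disk reads, or object construction.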
-
I agree that we need to spend some effort optimizing for speed and efficiency now that more people are using edgartools in pipelines. That said, the core was built on top of pyarrow from the start, so it is quite fast for, say, 10% of use cases. Since we already use pyarrow, the case for Polars is weaker here. The value lies in the slower, pure-Python parts that cover the remaining 90%: the custom data objects, data extraction, and rendering. Polars would not help much there either. Regardless, if more than 10% of users want their data in Polars for their workflows, then it makes sense to support it. So we need to get a sense of what these use cases and workflows are.
-
I use Polars and am fine with the current arrangement. Switching from Pandas to Polars is relatively easy given the tools available in both libraries, plus the improvements to the API and developer experience that EdgarTools already provides. Personally, I feel there are higher-priority items, such as cleaning up the code and the ergonomics of the API (e.g. the tooling around Q4 is improving, and I am glad to see that recently).
-
This is another long-term request from me. I understand that Edgartools runs on Pandas, and honestly that is great. However, with the larger financial datasets in my overall workflow, I have begun to see that Pandas on its own is not fast enough. As a minor suggestion, could there be a way to get a Polars DataFrame back directly? A lot of my workflow is moving to Polars. For example, something like this could work:
income_df = income_statement.to_dataframe(polars=True)  # proposed boolean option, not an existing parameter
with a backend function that converts from Pandas to Polars:
polars_df = pl.from_pandas(pandas_df)
I'm not sure how to implement this.
P.S. How is the text-block-to-dataframe feature coming along? That was issue #336.