-
Notifications
You must be signed in to change notification settings - Fork 2k
Description
Is your feature request related to a problem or challenge?
I happened to run some profiling tests during the sql parser upgrade
My profiles show that almost 13% of the overall sqllogictest runtime is consumed by cloning/dropping SessionState.
Basically I ran
INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictestsAnd then ran instruments to capture some traces. Here is some obvious outliers:
Note that most of these queries are quite small (so the actual work to run the query is quite small)
However, the fact that so much time is spent copying the SessionState is embarassing
Describe the solution you'd like
I would like to improve performance by not copying the session state as much
What is going on is that the SessionContext has a mutable SessionState here:
Once a query is planned, the DataFrame gets its own copy of the state so it is no longer affected by changes to the SessionState.
Specifically, here:
https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L776-L775
The call to state() results in a copy:
https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L1860-L1859
pub fn state(&self) -> SessionState {
let mut state = self.state.read().clone();
state.mark_start_execution();
state
}Describe alternatives you've considered
One idea would be to move from a Arc<RwLock<SessionState>> to just an Arc<SessionState> and use Arc::make_mut when it needed to be modified
So instead of
#[derive(Clone)]
pub struct SessionContext {
/// UUID for the session
session_id: String,
/// Session start time
session_start_time: DateTime<Utc>,
/// Shared session state for the session
state: Arc<RwLock<SessionState>>,
}Something like
#[derive(Clone)]
pub struct SessionContext {
/// UUID for the session
session_id: String,
/// Session start time
session_start_time: DateTime<Utc>,
/// Shared session state for the session
state: Arc<SessionState>, // <---- Note this is now just Arc
}And then instead of self.state.write() to modify the state, code could use Arc::make_mut(&mut self.state)
datafusion/datafusion/core/src/execution/context/mod.rs
Lines 1108 to 1109 in b6f7521
| let mut state = self.state.write(); | |
| state.config_mut().options_mut().set(&variable, &value)?; |
And the idea is that then this Arc could be copied around and shared unless it was actually modified
Additional context
No response