Skip to content

Improve performance of query execution by not clone'ing SessionState #20495

@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

I happened to run some profiling tests during the sql parser upgrade

My profiles show that almost 13% of the overall sqllogictest runtime is consumed by cloning/dropping SessionState.

Basically I ran

INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests

And then ran instruments to capture some traces. Here is some obvious outliers:

Image Image

Note that most of these queries are quite small (so the actual work to run the query is quite small)

However, the fact that so much time is spent copying the SessionState is embarassing

Describe the solution you'd like

I would like to improve performance by not copying the session state as much

What is going on is that the SessionContext has a mutable SessionState here:

https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L297-L296

Once a query is planned, the DataFrame gets its own copy of the state so it is no longer affected by changes to the SessionState.

Specifically, here:
https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L776-L775

The call to state() results in a copy:
https://github.com/apache/datafusion/blob/b6f7521752aec92e4a5b014be016fbe185b5bbc1/datafusion/core/src/execution/context/mod.rs#L1860-L1859

    pub fn state(&self) -> SessionState {
        let mut state = self.state.read().clone();
        state.mark_start_execution();
        state
    }

Describe alternatives you've considered

One idea would be to move from a Arc<RwLock<SessionState>> to just an Arc<SessionState> and use Arc::make_mut when it needed to be modified

So instead of

#[derive(Clone)]
pub struct SessionContext {
    /// UUID for the session
    session_id: String,
    /// Session start time
    session_start_time: DateTime<Utc>,
    /// Shared session state for the session
    state: Arc<RwLock<SessionState>>,
}

Something like

#[derive(Clone)]
pub struct SessionContext {
    /// UUID for the session
    session_id: String,
    /// Session start time
    session_start_time: DateTime<Utc>,
    /// Shared session state for the session
    state: Arc<SessionState>,  // <---- Note this is now just Arc
}

And then instead of self.state.write() to modify the state, code could use Arc::make_mut(&mut self.state)

let mut state = self.state.write();
state.config_mut().options_mut().set(&variable, &value)?;

And the idea is that then this Arc could be copied around and shared unless it was actually modified

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or requestperformanceMake DataFusion faster

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions