Skip to content

fix: Async Android scope sync#2107

Merged
bitsandfoxes merged 21 commits intomainfrom
feat/jni-timeout
Apr 17, 2025
Merged

fix: Async Android scope sync#2107
bitsandfoxes merged 21 commits intomainfrom
feat/jni-timeout

Conversation

@bitsandfoxes
Copy link
Copy Markdown
Contributor

@bitsandfoxes bitsandfoxes commented Apr 10, 2025

Problem Statement

The issue manifested in two places:

  1. Customer conversation at the booth with complaints that heavy use of logging is killing their games performance on Android.
  2. CI being flaky a lot with logs showing JNI execution timed out.

Analysis

After doing some naive and simple benchmarking, by logging 100 messages within one frame we get to 1.1707766s spent on that frame.

When looking at the ScopeObserver for Android we - i.e. AddBreadcrumb

public override void AddBreadcrumbImpl(Breadcrumb breadcrumb)
{
_jniExecutor.Run(() =>

see that we end up on the JniExecutor

public void Run(Action jniOperation, TimeSpan? timeout = null)

that makes blocking calls into the JNI with a timeout

_taskCompletionSource.Task.Wait(timeoutCts.Token);

that easily gets hit

_logger?.LogError("JNI execution timed out.");

Considerations

  1. Scope sync should not block or slow down the game.
  2. A regular log message eating up almost an entire frame to sync is unacceptable.
  3. There is a maxBreadcrumb limit in the first place. So potentially blocking the game just to have them synched down a layer is incurring all the cost for limited benefit.
  4. In case there is a crash happening while the scope is still in process of being synched: They are likely to time out right now anyway.
  5. Limiting the async scope sync to AddBreadcrumb for now. This happens under the assumption that any changes to the scope at runtime is triggered by the user and should end up on the native layer.

Resolution

After running some benchmarks I realized that the whole worker-thread implementation is causing way more overhead than it was doing any good. Instead, the JniExecutor is now merged into SentryJava since there was nothing left to du but handle attaching and detaching to the AndroidJNI. This has to happen depending on whether we're running on the main-thread or not. Detaching from a still code executing thread will cause a crash.

  1. Move everything related to calling into Java into SentryJava. - Instead of having the scope observer call the JniExecutor on it's own.
  2. Properly handle attaching/detaching to the JNI

Benchmarks

Runs Metric Baseline Old New New vs Old (Improvement) Baseline vs New
1 Median 0.01ms 8.32ms 0.10ms -8.22ms (99%) +0.09ms
10 Median 0.01ms 104.92ms 0.59ms -104.33ms (99%) +0.58ms
50 Median 0.06ms 544.89ms 2.16ms -542.73ms (99%) +2.10ms
100 Median 0.08ms 1095.85ms 3.13ms -1092.72ms (99%) +3.05ms
1000 Median 0.62ms 11079.85ms 16.83ms -11063.02ms (99%) +16.21ms

Baseline = No sync to the Java SDK
Old = JNI worker thread synchronization
New = This PR

Follow Ups

@bitsandfoxes bitsandfoxes marked this pull request as ready for review April 11, 2025 11:26
@bitsandfoxes bitsandfoxes changed the title fix: JNI timing out fix: Async Android scope sync Apr 11, 2025
Comment thread src/Sentry.Unity.Android/JniExecutor.cs Outdated
Comment thread .github/workflows/ci.yml Outdated
# Build using older Linux version to preserve sdk compatibility with old GLIBC
# See discussion in https://github.com/getsentry/sentry-unity/issues/1730 for more details
host: ubuntu-20.04
host: ubuntu-22.04
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a unrelated and temporary change to unblock CI in this PR.

Copy link
Copy Markdown
Member

@bruno-garcia bruno-garcia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the device test cover any of this?
are sync'ing scope, and do we verify things got passed down?

Comment thread src/Sentry.Unity.Android/AndroidOptionConfiguration.cs Outdated
Comment thread src/Sentry.Unity.Android/JniExecutor.cs Outdated
Comment thread src/Sentry.Unity.Android/JniExecutor.cs Outdated
Comment thread src/Sentry.Unity.Android/JniExecutor.cs Outdated
Comment thread src/Sentry.Unity.Android/JniExecutor.cs Outdated
}
catch (Exception e)
{
// Adding the Sentry logger tag ensures we don't send this error to Sentry.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we send our own diagnostic logger stuff to Sentry? We shouldn't

Comment thread src/Sentry.Unity.Android/SentryJava.cs Outdated
@bitsandfoxes
Copy link
Copy Markdown
Contributor Author

bitsandfoxes commented Apr 16, 2025

Does the device test cover any of this? are sync'ing scope, and do we verify things got passed down?

Device tests make sure that the app does not crash and verifies the C# part of the SDK. To verify the sync itself we need #2099

@bitsandfoxes bitsandfoxes merged commit 1d4fe53 into main Apr 17, 2025
79 checks passed
@bitsandfoxes bitsandfoxes deleted the feat/jni-timeout branch April 17, 2025 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants