@@ -40,6 +40,11 @@ and prevent pages from running.

t.skipIfDeviceDoesNotHaveFeature('timestamp-query');

if (numQuerySets === 65536) {
// Allow extra time for massive timestamp query allocation/cleanup churn.
t.setEndTestScopeTimeout(10000);
Collaborator:
I'm not sure I want to make this part of the "official" interface of the test fixture. It seems a bit fragile and easy to misuse, because the reason it's needed is very specific to this test.

Would it work to wait inside the test by just doing await t.queue.onSubmittedWorkDone() at the end before returning from the test function?

Contributor Author:

I tried that; it doesn't work for this timeout case. The failure wasn't because the test ended before the commands completed, but because there wasn't enough time to finish the test (including waiting for submitted commands to complete, popping error scopes, and cleanup). I think what we need here is a mechanism to increase the execution time of cases that are known to run long.

To avoid exposing it as a public interface and thus risking misuse, how about making the framework auto-detect slow tests and extend timeouts automatically: if the default timeout in endTestScope is hit, run the case again with twice the default timeout. However, this approach increases the overall runtime of the test suite whenever a network-related timeout occurs.

Another idea: could we move this case into the stress test src/stress/queries/timestamps.spec.ts? I see that some stress tests for render/compute passes are unimplemented; it would be reasonable to set a separate timeout for those.

Collaborator:

Ah... it occurs to me that the test fixture already does onSubmittedWorkDone at the end of tests, so I should have realized that wouldn't work.

I was thinking that if the popErrorScope timeout worked, onSubmittedWorkDone would also work. But maybe not anymore because of work in Dawn to make error scopes asynchronous?

Anyway, what about just wrapping the test body in t.device.pushErrorScope('validation')/await t.device.popErrorScope()? If error scopes are able to capture the slowness at the test-fixture level, they should work at the test level too.
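A minimal sketch of the wrapping being suggested here, using a stubbed device type so the pattern is self-contained (the `runWithValidationScope` helper and the stub interface are hypothetical; only `pushErrorScope`/`popErrorScope` come from the WebGPU API discussed in this thread):

```typescript
// Hypothetical illustration of the reviewer's suggestion: wrap the slow test
// body in an explicit validation error scope, so the test itself waits for
// (and surfaces) any validation errors before the fixture's endTestScope
// timeout can fire. The structural device type below is a stand-in for
// GPUDevice, covering just the two methods this pattern needs.
async function runWithValidationScope(
  device: {
    pushErrorScope(filter: 'validation'): void;
    popErrorScope(): Promise<{ message: string } | null>;
  },
  body: () => Promise<void>
): Promise<void> {
  device.pushErrorScope('validation');
  await body(); // e.g. the query-set allocation/cleanup churn
  const error = await device.popErrorScope();
  if (error !== null) {
    throw new Error(`unexpected validation error: ${error.message}`);
  }
}
```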

> To avoid exposing it as a public interface and thus causing misuse, how about make the framework to auto-detect slow tests and extend timeouts automatically, if the default timeout in endTestScope is hit, run the case again with twice the default timeout.

IMO that would add too much complexity to the CTS and would probably interact poorly with the browsers' test harnesses which have their own timeout systems. My objection to putting it in the public interface is not that strong :)

> Another idea is whether we can move this case into the stress test src/stress/queries/timestamps.spec.ts. I see there are some stress tests for render/compute pass are unimplemented, It would be reasonable to set a separate timeout for them.

I thought about suggesting moving this to stress, but right now I don't think anyone is running those tests so I don't want to lose the coverage. If we can make it run reliably then I'd like to keep it in the conformance section.

Contributor Author:

I'm OOO this week and will update it next week.

}

const view = t
.createTextureTracked({
size: [1, 1, 1],
src/webgpu/gpu_test.ts (+6 −0)
@@ -1460,6 +1460,12 @@ export class GPUTest extends GPUTestBase {
assert(this.provider !== undefined, 'internal error: GPUDevice missing?');
this.provider.expectDeviceLost(reason);
}

/** Adjust timeout used when releasing the device after a test finishes. */
setEndTestScopeTimeout(timeoutMs: number): void {
assert(this.provider !== undefined, 'internal error: GPUDevice missing?');
this.provider.setEndTestScopeTimeout(timeoutMs);
}
}

/**
src/webgpu/util/device_pool.ts (+16 −3)
@@ -14,11 +14,14 @@ import { getDefaultLimits, kPossibleLimits } from '../capability_info.js';
// MUST_NOT_BE_IMPORTED_BY_DATA_CACHE
// This file should not be transitively imported by .cache.ts files

const kDefaultEndTestScopeTimeoutMs = 5000;

export interface DeviceProvider {
/** Adapter the device was created from. Cannot be reused; just for adapter info. */
readonly adapter: GPUAdapter;
readonly device: GPUDevice;
expectDeviceLost(reason: GPUDeviceLostReason): void;
setEndTestScopeTimeout(timeoutMs: number): void;
}

class TestFailedButDeviceReusable extends Error {}
@@ -351,6 +354,7 @@ class DeviceHolder implements DeviceProvider {
expectedLostReason?: GPUDeviceLostReason;
/** Number of test cases the device has been used for. */
testCaseUseCounter = 0;
private endTestScopeTimeoutMs = kDefaultEndTestScopeTimeoutMs;

// Gets a device and creates a DeviceHolder.
// If the device is lost, DeviceHolder.lost gets set.
@@ -395,6 +399,7 @@
this.device.pushErrorScope('validation');
this.device.pushErrorScope('internal');
this.device.pushErrorScope('out-of-memory');
this.endTestScopeTimeoutMs = kDefaultEndTestScopeTimeoutMs;
}

/** Mark the DeviceHolder as expecting a device loss when the test scope ends. */
@@ -403,21 +408,29 @@
this.expectedLostReason = reason;
}

setEndTestScopeTimeout(timeoutMs: number): void {
assert(this.state === 'acquired');
assert(timeoutMs >= 0, 'timeout must be non-negative');
this.endTestScopeTimeoutMs = timeoutMs;
}

/**
* Attempt to end test scopes: Check that there are no extra error scopes, and that no
* otherwise-uncaptured errors occurred during the test. Time out if it takes too long.
*/
endTestScope(): Promise<void> {
assert(this.state === 'acquired');
const kTimeout = 5000;

// Time out if attemptEndTestScope (popErrorScope or onSubmittedWorkDone) never completes. If
// this rejects, the device won't be reused, so it's OK that popErrorScope calls may not have
// finished.
//
// This could happen due to a browser bug - e.g.,
// as of this writing, on Chrome GPU process crash, popErrorScope just hangs.
return raceWithRejectOnTimeout(this.attemptEndTestScope(), kTimeout, 'endTestScope timed out');
return raceWithRejectOnTimeout(
this.attemptEndTestScope(),
this.endTestScopeTimeoutMs,
'endTestScope timed out'
);
}

private async attemptEndTestScope(): Promise<void> {
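The diff above routes the configurable timeout into raceWithRejectOnTimeout. For reference, a helper of that shape can be sketched as below; this is a standalone illustrative version, not the CTS's actual implementation:

```typescript
// Illustrative sketch of a raceWithRejectOnTimeout-style helper: race a
// promise against a timer that rejects with the given message. The timer is
// cleared once either side settles, so a fast promise doesn't leave a
// dangling timeout behind.
function raceWithRejectOnTimeoutSketch<T>(
  p: Promise<T>,
  timeoutMs: number,
  msg: string
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(msg)), timeoutMs);
  });
  return Promise.race([p, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

With this shape, substituting `this.endTestScopeTimeoutMs` for a hard-coded constant (as the diff does) is the only change needed to make the deadline per-test configurable.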