Following the multi-GPU code given in the README, I used the script below to do interpolation:
streams = []
cycle = 2
engine_path = "/workspace/tensorrt/ckpts_no_avx512/rife419_beta_v2_ensembleTrue_op20_fp16_clamp_sim_2160*3840.engine"
for i in range(cycle):
    # one rife_trt instance per GPU; device i keeps every cycle-th frame starting at offset i
    interpolated = rife_trt(clip, multi=3, engine_path=engine_path, num_streams=4, device_id=i)
    streams.append(core.std.SelectEvery(interpolated, cycle=cycle, offsets=i))
clip = core.std.Interleave(streams)
It seems that each stream takes the whole clip as input, which might be redundant? I also found that increasing cycle does not make it any faster.
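To make the concern concrete, here is a minimal pure-Python sketch of the index bookkeeping done by SelectEvery and Interleave. The helpers `select_every` and `interleave` and the frame count of 12 are illustrative assumptions, not VapourSynth API. Since VapourSynth produces frames on request, each rife_trt instance should only be asked for the output frames its SelectEvery keeps, even though it is handed the whole clip.

```python
def select_every(frames, cycle, offsets):
    """Stand-in for std.SelectEvery on a list of frame indices."""
    return [frames[start + off]
            for start in range(0, len(frames), cycle)
            for off in offsets
            if start + off < len(frames)]


def interleave(streams):
    """Stand-in for std.Interleave; assumes equal-length streams."""
    out = []
    for group in zip(*streams):
        out.extend(group)
    return out


num_interpolated = 12  # hypothetical: 4 source frames with multi=3
cycle = 2              # number of GPUs, as in the script above

all_frames = list(range(num_interpolated))
# Frames each device would be asked to produce
per_device = [select_every(all_frames, cycle, [i]) for i in range(cycle)]
# Interleaving the per-device streams restores the original frame order
result = interleave(per_device)

for device_id, frames in enumerate(per_device):
    print(f"device {device_id}: {frames}")
```

Under these assumptions each device handles a disjoint half of the interpolated frames, so the redundancy, if any, would be in shared upstream work (source decoding, resizing) rather than in the model itself.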
Using cycle=2:
Output 750 frames in 18.43 seconds (40.70 fps)
time=00:00:03.99 bitrate=49365.4kbits/s speed=0.228x
Filtername       Filter mode   Time (%)   Time (s)
Model            parallel      227.65     41.95
Model            parallel      227.23     41.87
Bicubic          parallel      166.12     30.61
Bicubic          parallel      53.94      9.94
VideoSource      unordered     18.00      3.32
BlankClip        parallel      0.09       0.02
BlankClip        parallel      0.07       0.01
BlankClip        parallel      0.07       0.01
BlankClip        parallel      0.05       0.01
Interleave       parallel      0.02       0.00
Interleave       parallel      0.02       0.00
SelectEvery      parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
DuplicateFrames  parallel      0.00       0.00
DuplicateFrames  parallel      0.00       0.00
Trim             parallel      0.00       0.00
Loop             parallel      0.00       0.00
Trim             parallel      0.00       0.00
Loop             parallel      0.00       0.00
Interleave       parallel      0.00       0.00
Interleave       parallel      0.00       0.00
<Destroyed>      <unknown>     0.00       0.00
Using cycle=4:
Output 750 frames in 20.00 seconds (37.49 fps)
time=00:00:03.96 bitrate=49697.3kbits/s speed=0.209x
Filtername       Filter mode   Time (%)   Time (s)
Bicubic          parallel      170.90     34.19
Model            parallel      83.73      16.75
Model            parallel      83.39      16.68
Model            parallel      79.77      15.96
Model            parallel      78.67      15.74
Bicubic          parallel      54.29      10.86
VideoSource      unordered     15.86      3.17
BlankClip        parallel      0.06       0.01
BlankClip        parallel      0.06       0.01
BlankClip        parallel      0.06       0.01
BlankClip        parallel      0.05       0.01
BlankClip        parallel      0.05       0.01
BlankClip        parallel      0.05       0.01
BlankClip        parallel      0.05       0.01
BlankClip        parallel      0.05       0.01
SelectEvery      parallel      0.04       0.01
Interleave       parallel      0.02       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
Interleave       parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
Interleave       parallel      0.01       0.00
SelectEvery      parallel      0.01       0.00
SelectEvery      parallel      0.00       0.00
SelectEvery      parallel      0.00       0.00
SelectEvery      parallel      0.00       0.00
SelectEvery      parallel      0.00       0.00
DuplicateFrames  parallel      0.00       0.00
DuplicateFrames  parallel      0.00       0.00
DuplicateFrames  parallel      0.00       0.00
DuplicateFrames  parallel      0.00       0.00
Trim             parallel      0.00       0.00
Trim             parallel      0.00       0.00
Trim             parallel      0.00       0.00
Loop             parallel      0.00       0.00
Trim             parallel      0.00       0.00
Loop             parallel      0.00       0.00
Loop             parallel      0.00       0.00
Loop             parallel      0.00       0.00
Interleave       parallel      0.00       0.00
Interleave       parallel      0.00       0.00
Interleave       parallel      0.00       0.00
Interleave       parallel      0.00       0.00
<Destroyed>      <unknown>     0.00       0.00
The time used per 'Model' instance drops by more than half, but the total time actually gets longer. Do you have any suggestions?
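For reference, a small sketch that sums the per-filter figures from the two profiles above. All numbers are copied from the logs. The per-filter times appear to be accumulated across parallel threads (Time (%) exceeds 100), so they are not directly comparable to wall-clock time; the dictionary layout and the name `summary` are only illustrative.

```python
# Figures copied from the two profiler outputs above (seconds)
runs = {
    2: {"wall": 18.43,
        "model": [41.95, 41.87],
        "bicubic": [30.61, 9.94]},
    4: {"wall": 20.00,
        "model": [16.75, 16.68, 15.96, 15.74],
        "bicubic": [34.19, 10.86]},
}

summary = {}
for cycle, r in runs.items():
    summary[cycle] = {
        "wall": r["wall"],
        "model_total": round(sum(r["model"]), 2),
        "bicubic_total": round(sum(r["bicubic"]), 2),
    }

for cycle, s in summary.items():
    print(f"cycle={cycle}: wall={s['wall']:.2f}s "
          f"Model total={s['model_total']:.2f}s "
          f"Bicubic total={s['bicubic_total']:.2f}s")
```

Summed this way, the accumulated Model time goes down with cycle=4 while the accumulated Bicubic time goes up slightly and the wall-clock time does not improve, which might mean the bottleneck is somewhere other than the Model filter, though I cannot tell that from these numbers alone.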