I looked at `pred_grad` and `trans_grad`...
So does warp-transducer's backprop work like BPTT?
Or do I have to implement BPTT myself? That seems very hard, because `rnnt_loss` outputs the batch-mean loss, not a per-time-step sequence.
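For reference, here is a minimal framework-free sketch of what BPTT does. It is an illustration only, not warp-transducer code. It assumes a toy scalar linear RNN `h_t = w*h_{t-1} + x_t` and a batch-mean loss of `h_T**2`. The point is that the loss being a single scalar (a batch mean) is all BPTT needs: the gradient starts at that scalar and is carried backwards through every time step. The names `forward` and `bptt_grad` are hypothetical.

```python
def forward(w, xs_batch):
    """Run h_t = w*h_{t-1} + x_t over each sequence (h_0 = 0).

    Returns the batch-mean loss of h_T**2 and the stored states per sequence.
    """
    states = []
    total = 0.0
    for xs in xs_batch:
        h = 0.0
        hs = [h]
        for x in xs:
            h = w * h + x
            hs.append(h)
        states.append(hs)
        total += h * h
    return total / len(xs_batch), states


def bptt_grad(w, xs_batch):
    """Backprop through time: gradient of the scalar batch-mean loss w.r.t. w."""
    _, states = forward(w, xs_batch)
    n = len(xs_batch)
    grad_w = 0.0
    for hs in states:
        # dL/dh_T for this sequence; the 1/n comes from the batch mean
        dh = 2.0 * hs[-1] / n
        # walk backwards over time steps t = T..1
        for t in range(len(hs) - 1, 0, -1):
            grad_w += dh * hs[t - 1]  # dh_t/dw (direct term) = h_{t-1}
            dh = dh * w               # carry the gradient back to h_{t-1}
    return grad_w


xs_batch = [[1.0, 2.0, 0.5], [0.3, -1.0]]
print(forward(0.7, xs_batch)[0], bptt_grad(0.7, xs_batch))
```

In an autograd framework this backward loop is generated automatically when `backward()` is called on the scalar loss, so nothing per-time-step is needed from the loss function itself.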