Hi,
I wonder whether it is appropriate to add the cross-entropy (CE) regularizer derivative (grad_xent) directly to grad during chain model training,
as implemented here: grad.add_mat(chain_opts.xent_regularize, grad_xent).
In Kaldi's chain model recipes, e.g. aishell s5, the network architecture has two branches after layer tdnn6: one for the chain model (the "output" layer), the other for CE (the "output-xent" layer).
The derivative matrix grad is applied to "output", while grad_xent is applied to "output-xent".
If grad_xent is merged directly into grad, there is no prefinal-xent --> output-xent branch at all.
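To make the concern concrete, here is a small NumPy sketch (all names and dimensions are hypothetical, not from Kaldi). It compares the gradient reaching a shared hidden layer (think tdnn6) in two setups: (a) a single output layer where the two loss derivatives are merged first, as in grad.add_mat(chain_opts.xent_regularize, grad_xent), and (b) two separate output branches, as in the aishell s5 recipe. The two are only equivalent when the branches share the same output weights:

```python
import numpy as np

rng = np.random.default_rng(0)

H, D = 5, 4                       # hidden dim, output dim (toy sizes)
h = rng.standard_normal(H)        # shared hidden activation (like tdnn6)
W = rng.standard_normal((D, H))   # "output" affine weights

# Stand-ins for the loss derivatives w.r.t. the output activations.
g_chain = rng.standard_normal(D)  # chain (LF-MMI) derivative at "output"
g_xent = rng.standard_normal(D)   # CE derivative at "output-xent"
lam = 0.1                         # plays the role of xent_regularize

# (a) Single shared output layer: merge the derivatives first
# (grad += lam * grad_xent), then backprop once through W.
g_merged = g_chain + lam * g_xent
grad_h_merged = W.T @ g_merged

# (b) Two branches that happen to share the same weights W:
# backprop each derivative through its branch and sum at the hidden layer.
grad_h_branches = W.T @ g_chain + lam * (W.T @ g_xent)

# With identical branch weights, (a) and (b) agree ...
assert np.allclose(grad_h_merged, grad_h_branches)

# ... but with a separate output-xent affine (the Kaldi recipe case),
# merging the derivatives at "output" is no longer equivalent.
W_xent = rng.standard_normal((D, H))
grad_h_separate = W.T @ g_chain + lam * (W_xent.T @ g_xent)
print(np.allclose(grad_h_merged, grad_h_separate))
```

So merging grad_xent into grad is only a faithful implementation of the CE regularizer if both losses are taken on the same output layer; with the recipe's separate prefinal-xent --> output-xent branch, the CE derivative should flow back through its own affine layer instead.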
