Questions regarding feature preprocessing, number of CTC outputs #1

@lifelongeek

Hello

First of all, thanks for providing BLSTM + CTC as open source. I will cite your work in my future paper.

I successfully installed the dependencies (cudamat, gnumpy, npmat) and built the executables in the ctc folder.

Since TIMIT is a small enough dataset to start with, I tried to run runTimit.sh but got the error message below:

[kenkim@node3 ctc]$ ./runTimit.sh

  • optimizer=nesterov
  • momentum=.95
  • epochs=100
  • layers=2048,2048,2048,2048
  • step=1e-4
  • anneal=1.1
  • outfile=models/nesterov_layers_2048,2048,2048,2048_step_1e-4_mom_.95_anneal.bin
  • echo models/nesterov_layers_2048,2048,2048,2048_step_1e-4_mom_.95_anneal.bin
    models/nesterov_layers_2048,2048,2048,2048_step_1e-4_mom_.95_anneal.bin
  • python runNNet.py --layers 2048,2048,2048,2048 --optimizer nesterov --step 1e-4 --epochs 100 --momentum .95 --outFile models/nesterov_layers_2048,2048,2048,2048_step_1e-4_mom_.95_anneal.bin --anneal 1.1 --outputDim 62 --inputDim 943 --rawDim 943 --numFiles 19 --dataDir /home/kenkim/kaldi-trunk/egs/timit/s5/exp/nomral_nn_train/
    gnumpy: failed to use gpu_lock. Using board #0 without knowing whether it is in use or not.
    Using nesterov..
    Traceback (most recent call last):
      File "runNNet.py", line 171, in <module>
        run()
      File "runNNet.py", line 93, in run
        data_dict,alis,keys,sizes = loader.loadDataFileDict(i)
      File "/home/kenkim/stanford-ctc-master/ctc/dataLoader.py", line 48, in loadDataFileDict
        data_mat, alis, keys, sizes = self.loadDataFile(filenum)
      File "/home/kenkim/stanford-ctc-master/ctc/dataLoader.py", line 30, in loadDataFile
        data = np.fromfile(datafile,np.float32).reshape(-1,self.rawsize)
    IOError: [Errno 2] No such file or directory: '/home/kenkim/kaldi-trunk/egs/timit/s5/exp/nomral_nn_train/feats12.bin'
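
For reference, the np.fromfile call in the traceback suggests that each feats file is a flat dump of float32 values that the loader reshapes into rows of rawDim (943 in the command above) columns. Below is a minimal sketch of writing and reading a file with that layout. The helper names write_feats and read_feats and the file name feats_demo.bin are hypothetical, and the sketch covers only the feature matrix. The alignments, keys, and sizes that loadDataFile also returns presumably come from other files whose format I do not know.

```python
import os
import tempfile

import numpy as np

RAW_DIM = 943  # value of --rawDim in the command above


def write_feats(path, feats):
    """Write a (num_frames, RAW_DIM) matrix as raw float32 in row-major order.

    Hypothetical helper: it only mirrors the layout implied by the loader's
    np.fromfile(...).reshape(-1, rawsize) call, not the repository's own tooling.
    """
    feats = np.ascontiguousarray(feats, dtype=np.float32)
    if feats.ndim != 2 or feats.shape[1] != RAW_DIM:
        raise ValueError("expected shape (num_frames, %d)" % RAW_DIM)
    feats.tofile(path)


def read_feats(path, raw_dim=RAW_DIM):
    """Read the file back the same way the loader line in the traceback does."""
    return np.fromfile(path, np.float32).reshape(-1, raw_dim)


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, "feats_demo.bin")  # hypothetical name
    write_feats(demo_path, np.zeros((5, RAW_DIM)))
    print(read_feats(demo_path).shape)
```

If this reading of the layout is right, the file size in bytes should be a multiple of 4 * rawDim, which is a quick way to check a generated file.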

Here are my questions.

Q1) Which script generates feats12.bin (and the other feature files that runNNet.py expects in the data directory)?

Q2) In your paper, you use 33 characters as the CTC output symbols.
Can you clarify exactly what they are (e.g. 26 letters + 1 blank + 6 others)?

Q3) How did you preprocess non-alphabetic tokens in the Switchboard transcriptions?
For example: 747, &, 20/20, _1_1, etc.
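
To make Q3 concrete, here is a small sketch of how I currently find such tokens in a transcript. The allowed symbol set (lowercase letters plus apostrophe) is only my assumption, not the set used in the paper, and the function name non_character_tokens is hypothetical.

```python
import re

# Assumed symbol set: lowercase letters and apostrophe. The real 33-symbol
# set from the paper is exactly what Q2 asks about, so this is a guess.
ALLOWED_TOKEN = re.compile(r"^[a-z']+$")


def non_character_tokens(transcript):
    """Return whitespace-separated tokens that contain symbols outside the assumed set."""
    return [tok for tok in transcript.lower().split()
            if not ALLOWED_TOKEN.match(tok)]


if __name__ == "__main__":
    print(non_character_tokens("the 747 landed & 20/20 _1_1 okay"))
```

Tokens flagged this way are the ones I do not know how to map onto the CTC output symbols.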
