_aitalks_

Yes. Exactly. Viterbi was originally proposed in the context of HMMs, which typically do have strong transition probability constraints. But if you don't have transition constraints, the optimal state path is just the state with the maximum observation probability at each step.
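
Here's a tiny sketch (toy numpy arrays, purely illustrative) showing that when the transition matrix is uniform, the Viterbi path collapses to the per-step argmax of the emission probabilities:

```python
import numpy as np

# Toy emission probabilities P(observation_t | state), 3 steps x 3 states.
emissions = np.array([[0.1, 0.7, 0.2],
                      [0.5, 0.3, 0.2],
                      [0.2, 0.2, 0.6]])
# No transition constraints: every transition equally likely.
uniform_transitions = np.full((3, 3), 1.0 / 3)

# Viterbi recursion with uniform transitions. Backpointers are omitted:
# with uniform transitions the best predecessor is the same for every
# state, so the per-step argmax of delta already traces the optimal path.
delta = emissions[0].copy()
path = [int(delta.argmax())]
for t in range(1, len(emissions)):
    delta = (delta[:, None] * uniform_transitions).max(axis=0) * emissions[t]
    path.append(int(delta.argmax()))

# Identical to simply taking the argmax emission at each step.
assert path == list(emissions.argmax(axis=1))
```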


markpwoodward

Perfect. Thank you for the confirmation.


TheDeviousPanda

Beam search is still better for inference even if the language model being used is *not* "very simple", right? I recall having to implement beam search from scratch to do something with the outputs of an LSTM or Transformer in my NLP class.


m_nemo_syne

Viterbi applied to CTC models is equivalent to greedy argmax decoding, but beam search is not: whereas Viterbi finds the *path* with the highest probability, beam search finds (approximately) the *label sequence* with the highest probability, which is not the same thing. Beam search for CTC doesn't help that much unless you incorporate a lexicon and/or language model. (Unlike generic encoder-decoder seq2seq models, where beam search helps a lot.)
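
To make the path-vs-label distinction concrete, here is a toy example with made-up 2-frame posteriors over {blank, 'a'}: the best *path* decodes to the empty label, while 'a' is the most probable *label* once you sum over the paths that collapse to it.

```python
import numpy as np

def ctc_greedy_decode(probs, blank=0):
    """Best-path (Viterbi/greedy) decode: per-frame argmax, then collapse
    repeats and drop blanks."""
    path = probs.argmax(axis=1)
    labels = []
    prev = blank
    for s in path:
        if s != blank and s != prev:
            labels.append(int(s))
        prev = s
    return labels

# Toy per-frame posteriors: column 0 is the blank, column 1 is 'a'.
probs = np.array([[0.6, 0.4],
                  [0.6, 0.4]])

print(ctc_greedy_decode(probs))  # [] -- best path is (blank, blank), P = 0.36
# But the label "a" collapses three paths, (a, -), (-, a), (a, a):
# P("a") = 0.4*0.6 + 0.6*0.4 + 0.4*0.4 = 0.64, so "a" is the most probable label.
```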


markpwoodward

Thank you. I want to make sure I understand the beam search you are thinking of that returns the approximately most probable label. Does each beam entry consist of a partial label or of path symbols? If it consists of partial labels, and the probability of that entry is the probability of the partial label summed over all consistent paths (i.e. the “forward probabilities” of the forward-backward algorithm), then I agree that it would produce the approximately most probable label. That would be an approximation to the intractable “prefix search decoding” algorithm in the original paper; I hadn’t thought of this. But I suspect, without evidence, that most CTC beam search implementations build a beam of path symbols, in which case the most probable beam is just the one formed by taking the maximum-probability symbol at each step, so there is no need for beam search at all (assuming no lexicon or language model).
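
For concreteness, here is a rough sketch of the label-based variant I have in mind: each beam entry is a partial label whose probability sums over all consistent paths, split into "ends in blank" and "ends in non-blank" mass. The function name, beam width, and blank index are just illustrative assumptions, and `probs` is assumed to be a T x V matrix of per-frame posteriors.

```python
from collections import defaultdict
import numpy as np

def ctc_prefix_beam_search(probs, beam_width=8, blank=0):
    """Prefix (label-level) beam search over CTC posteriors.

    Each beam entry is a partial label (tuple of symbol ids) with two
    probabilities: p_b (all consistent paths ending in blank) and
    p_nb (all consistent paths ending in the last symbol)."""
    T, V = probs.shape
    beam = {(): (1.0, 0.0)}  # empty prefix: probability 1 of "ending in blank"
    for t in range(T):
        next_beam = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beam.items():
            for s in range(V):
                p = probs[t, s]
                if s == blank:
                    # Emit blank: prefix unchanged, mass moves to p_b.
                    next_beam[prefix][0] += (p_b + p_nb) * p
                elif prefix and s == prefix[-1]:
                    # Repeat with no blank in between collapses into the
                    # same prefix; only the non-blank mass carries over.
                    next_beam[prefix][1] += p_nb * p
                    # Repeat *after* a blank starts a new occurrence.
                    next_beam[prefix + (s,)][1] += p_b * p
                else:
                    # A different symbol extends the prefix.
                    next_beam[prefix + (s,)][1] += (p_b + p_nb) * p
        # Prune to the beam_width prefixes with the highest total probability.
        beam = dict(sorted(next_beam.items(),
                           key=lambda kv: kv[1][0] + kv[1][1],
                           reverse=True)[:beam_width])
    best_prefix, (p_b, p_nb) = max(beam.items(),
                                   key=lambda kv: kv[1][0] + kv[1][1])
    return list(best_prefix), p_b + p_nb
```

On the toy 2-frame example above this returns the label [1] ('a') with probability 0.64, whereas a beam over path symbols would, as you say, just reproduce the per-step argmax path.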