In most cases, the Baum-Welch algorithm is used to train a Hidden Markov model.
In many papers however, it is argued that the BW algorithm will optimize until it got stuck in a local optimum.
Does there exist an exact algorithm that actually succeeds in finding the global optimum (except from enumerating nearly all possible models and evaluating them)?
Of course for most applications, BW will work fine. We are however interested in finding lower bounds of the amount of information loss when reducing the number of states. Therefore we always need to generate the best model possible.
We are thus looking for an efficient NP-hard algorithm (that only enumerates over a (potentially) exponential number of extreme points) and not over a discretized number of floating points for each probability in the model.