# Perplexity as a branching factor

Perplexity is a function of the probability a language model assigns to a sentence. The perplexity of a language model on a test set is the inverse probability of the test set, normalized by the number of words:

$PP(W) = P(w_1w_2\ldots w_N)^{-\frac{1}{N}}$

Because of that inversion, minimizing perplexity is equivalent to maximizing the probability of the test set. For a single test sentence $x$, the same quantity can be written in log form as $2^{-\frac{1}{|x|}\log_2 p(x)}$.

Intuitively, perplexity measures the amount of "randomness" in the model. If the perplexity is 3 (per word), the model had, on average, a 1-in-3 chance of guessing the next word in the text.
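As a sanity check on the formula, here is a minimal sketch (the helper name `perplexity` is illustrative, not from any particular library) computing $PP(W)$ both directly and via the equivalent log form:

```python
import math

def perplexity(word_probs):
    """PP(W) = P(w_1 ... w_N)^(-1/N), given the model's
    per-word probabilities for a test sequence."""
    n = len(word_probs)
    # Direct form: inverse probability, normalized by length.
    pp_product = math.prod(word_probs) ** (-1.0 / n)
    # Equivalent log form: 2^(-(1/N) * sum log2 p(w_i)),
    # numerically safer for long sequences.
    pp_log = 2 ** (-sum(math.log2(p) for p in word_probs) / n)
    assert abs(pp_product - pp_log) < 1e-9
    return pp_log

# A model with a 1-in-3 chance per word has perplexity 3.
print(perplexity([1/3, 1/3, 1/3, 1/3]))  # ≈ 3.0
```

The log form is the one used in practice, since the raw product of many small probabilities underflows quickly.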
There is another way to think about perplexity: as the weighted average branching factor of a language. The branching factor of a language is the number of possible next words that can follow any word; perplexity asks, "in general," how many choices the model must make among the possible next words from the vocabulary $V$. If a model reports a perplexity of 247 ($2^{7.95}$) per word, it is as confused on the test data as if it had to choose uniformly and independently among 247 possibilities for each word. The two notions can differ: if some continuations are much more likely than others, the branching factor may still be 10 while the perplexity, the *weighted* branching factor, is smaller.

Why does this matter? The higher the perplexity, the more words there are to choose from at each instant, and hence the more difficult the task. Counterexamples show that vocabulary size and static and dynamic branching factors are all inadequate as measures of the speech-recognition complexity of finite-state grammars; information-theoretic arguments show that perplexity (the logarithm of which is the familiar entropy) is a more appropriate measure of equivalent choice (Ney et al., 1997). A 1992 read-speech experiment illustrates the effect: mammography transcription (perplexity 60, e.g. "There are scattered calcifications with the right breast", "These too have increased very slightly") was a lower-perplexity, easier task than general radiology (perplexity 140).
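The "branching factor still 10, weighted branching factor smaller" point can be made concrete with a toy per-word distribution. This sketch (the distributions and the helper name are my own illustration) computes $2^H$, the perplexity of a distribution, for a uniform and a skewed choice over the same 10 next words:

```python
import math

def perplexity_of_dist(probs):
    """Per-word perplexity of a next-word distribution: 2^H,
    where H is the entropy in bits."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** h

# Uniform choice among 10 next words:
# branching factor 10, perplexity also 10.
uniform = [0.1] * 10
# Skewed choice: still 10 possible next words (branching
# factor 10), but one word dominates, so the weighted
# branching factor drops well below 10.
skewed = [0.91] + [0.01] * 9

print(perplexity_of_dist(uniform))  # ≈ 10.0
print(perplexity_of_dist(skewed))   # much smaller (≈ 1.65)
```

Both distributions allow the same 10 continuations, but the skewed one is far less "perplexing" on average, which is exactly why perplexity, not raw branching factor, tracks task difficulty.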
