To compute the weight for each field, you need to understand that each field or variable has two probabilities associated with it. They are called the "m" and "u" probabilities.
The "m" probability is the probability that a field agrees given that the record pair being examined is a matched pair. This is
effectively one minus the error rate of the field: the more reliable a field is, the greater its "m" probability will be.
For example, in a sample of matched records, if street name disagrees 10% of the time due to a spelling error, then the "m" probability for this variable is 0.9 (1 - 0.1).
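The street name example can be sketched in code. This is a minimal illustration, not a prescribed method: the function name `estimate_m` and the sample counts (900 agreeing pairs out of 1,000 matched pairs) are hypothetical, chosen only to reproduce the 10% disagreement rate described above.

```python
def estimate_m(agreements: int, matched_pairs: int) -> float:
    """Estimate the "m" probability as the share of matched pairs
    in which the field agrees (one minus the field's error rate)."""
    if matched_pairs <= 0:
        raise ValueError("matched_pairs must be positive")
    if not 0 <= agreements <= matched_pairs:
        raise ValueError("agreements must be between 0 and matched_pairs")
    return agreements / matched_pairs


# Hypothetical sample: street name agrees in 900 of 1,000 matched pairs,
# i.e. it disagrees 10% of the time.
m_street_name = estimate_m(agreements=900, matched_pairs=1000)
print(m_street_name)  # 0.9
```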
The "u" probability is the probability that a field agrees given that the record pair being
examined is an unmatched pair. Since there are so many more unmatched pairs
possible than matched pairs, this is effectively the probability that the field agrees at random.
For example, the probability that the street name variable agrees at random is about 0.01, so its "u" probability is 0.01.
Given these two probabilities, the weight for a field is computed as the base-2 logarithm of the ratio of "m" to "u."
For example, assuming that the "m" and "u" probabilities of street name are 0.9 and 0.01, respectively, the weight for street name is log2(0.9/0.01) = 6.49.
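The weight calculation can be checked with a short sketch. The function name `field_weight` and the range check on its inputs are assumptions made for illustration; only the formula log2(m/u) and the street name values come from the text.

```python
import math


def field_weight(m: float, u: float) -> float:
    """Return the field weight, log2(m / u)."""
    # Both probabilities must be positive for the ratio and the
    # logarithm to be defined, and cannot exceed 1.
    if not (0.0 < m <= 1.0 and 0.0 < u <= 1.0):
        raise ValueError("m and u must lie in the interval (0, 1]")
    return math.log2(m / u)


# Street name: m = 0.9, u = 0.01.
street_name_weight = field_weight(0.9, 0.01)
print(round(street_name_weight, 2))  # 6.49
```

Because the weight depends only on the ratio of "m" to "u," a field that agrees at random less often (a smaller "u") receives a larger weight for the same "m."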