PyCM's distance method provides users with a wide range of string distance/similarity metrics to evaluate a confusion matrix by measuring its distance to a perfect confusion matrix. Distance/similarity metrics measure the distance between two vectors of numbers; a small distance between two objects indicates similarity. In PyCM's distance method, a distance measure can be chosen from DistanceType. The measures' names follow the naming style suggested in [1].
from pycm import ConfusionMatrix, DistanceType
cm = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}})
AMPLE similarity [2, 3].
cm.distance(metric=DistanceType.AMPLE)
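The measures in this section are computed from binary counts, so it can help to see how a multi-class matrix reduces to per-class counts. The following is a minimal sketch in plain Python, not PyCM's own code. It assumes that the outer keys of the matrix are actual classes and the inner keys are predicted classes, and that each class is evaluated one-vs-rest. The helper name one_vs_rest_counts is hypothetical.

```python
# Example matrix from the text.
# ASSUMPTION: outer keys are actual classes, inner keys are predicted classes.
matrix = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}


def one_vs_rest_counts(matrix):
    """Return {class: (TP, FP, FN, TN)} for each class (hypothetical helper)."""
    classes = list(matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    counts = {}
    for c in classes:
        tp = matrix[c][c]                               # actual c, predicted c
        fn = sum(matrix[c].values()) - tp               # actual c, predicted other
        fp = sum(matrix[a][c] for a in classes) - tp    # actual other, predicted c
        tn = total - tp - fn - fp                       # everything else
        counts[c] = (tp, fp, fn, tn)
    return counts


counts = one_vs_rest_counts(matrix)
for c, (tp, fp, fn, tn) in counts.items():
    print(f"class {c}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Under these assumptions, a perfect confusion matrix is one in which FP and FN are zero for every class.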
Anderberg's D [4].
cm.distance(metric=DistanceType.Anderberg)
Andres & Marzo's Delta correlation [5].
cm.distance(metric=DistanceType.AndresMarzoDelta)
Baroni-Urbani & Buser I similarity [6].
cm.distance(metric=DistanceType.BaroniUrbaniBuserI)
Baroni-Urbani & Buser II correlation [6].
cm.distance(metric=DistanceType.BaroniUrbaniBuserII)
Batagelj & Bren distance [7].
cm.distance(metric=DistanceType.BatageljBren)
Baulieu I distance [8].
cm.distance(metric=DistanceType.BaulieuI)
Baulieu II similarity [8].
cm.distance(metric=DistanceType.BaulieuII)
Baulieu III distance [8].
cm.distance(metric=DistanceType.BaulieuIII)
Baulieu IV distance [9].
cm.distance(metric=DistanceType.BaulieuIV)
Baulieu V distance [9].
cm.distance(metric=DistanceType.BaulieuV)
Baulieu VI distance [9].
cm.distance(metric=DistanceType.BaulieuVI)
Baulieu VII distance [9].
cm.distance(metric=DistanceType.BaulieuVII)
Baulieu VIII distance [9].
cm.distance(metric=DistanceType.BaulieuVIII)
Baulieu IX distance [9].
cm.distance(metric=DistanceType.BaulieuIX)
Baulieu X distance [9].
cm.distance(metric=DistanceType.BaulieuX)
Baulieu XI distance [9].
cm.distance(metric=DistanceType.BaulieuXI)
Baulieu XII distance [9].
cm.distance(metric=DistanceType.BaulieuXII)
Baulieu XIII distance [9].
cm.distance(metric=DistanceType.BaulieuXIII)
Baulieu XIV distance [9].
cm.distance(metric=DistanceType.BaulieuXIV)
Baulieu XV distance [9].
cm.distance(metric=DistanceType.BaulieuXV)
Benini I correlation [10].
cm.distance(metric=DistanceType.BeniniI)
Benini II correlation [10].
cm.distance(metric=DistanceType.BeniniII)
Canberra distance [11, 12].
cm.distance(metric=DistanceType.Canberra)
Clement similarity [13].
cm.distance(metric=DistanceType.Clement)
Consonni & Todeschini I similarity [14].
cm.distance(metric=DistanceType.ConsonniTodeschiniI)
Consonni & Todeschini II similarity [14].
cm.distance(metric=DistanceType.ConsonniTodeschiniII)
Consonni & Todeschini III similarity [14].
cm.distance(metric=DistanceType.ConsonniTodeschiniIII)
Consonni & Todeschini IV similarity [14].
cm.distance(metric=DistanceType.ConsonniTodeschiniIV)
Consonni & Todeschini V correlation [14].
cm.distance(metric=DistanceType.ConsonniTodeschiniV)
Dennis similarity [15].
cm.distance(metric=DistanceType.Dennis)
Digby correlation [16].
cm.distance(metric=DistanceType.Digby)
Dispersion correlation [17].
cm.distance(metric=DistanceType.Dispersion)
Doolittle similarity [18].
cm.distance(metric=DistanceType.Doolittle)
Eyraud similarity [19].
cm.distance(metric=DistanceType.Eyraud)
Fager & McGowan similarity [20, 21].
cm.distance(metric=DistanceType.FagerMcGowan)
Faith similarity [22].
cm.distance(metric=DistanceType.Faith)
Fleiss-Levin-Paik similarity [23].
cm.distance(metric=DistanceType.FleissLevinPaik)
Forbes I similarity [24, 25].
cm.distance(metric=DistanceType.ForbesI)
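Not every measure is bounded by 1. As an illustration, the sketch below evaluates the commonly cited Forbes I formula, n·a / ((a+b)(a+c)), by hand. It assumes the mapping a=TP, b=FP, c=FN, d=TN and uses class 0 counts read off the example matrix with rows taken as actual classes. The function forbes_i is a hypothetical helper, and PyCM's implementation may differ in details such as zero-division handling.

```python
def forbes_i(tp, fp, fn, tn):
    """Forbes I: observed co-occurrence relative to its expected value
    under independence (hypothetical helper, not PyCM's code)."""
    n = tp + fp + fn + tn
    # Raises ZeroDivisionError if the class is never predicted or never occurs.
    return n * tp / ((tp + fp) * (tp + fn))


# Class 0 of the example matrix (ASSUMPTION: rows are actual classes).
class0 = forbes_i(tp=3, fp=2, fn=0, tn=7)

# A table whose cells match independence exactly yields 1.0.
chance = forbes_i(tp=1, fp=1, fn=1, tn=1)

print(class0, chance)
```

Values above 1 indicate that agreement occurs more often than independence would predict, so results from different measures are not directly comparable on a common scale.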
Forbes II correlation [26].
cm.distance(metric=DistanceType.ForbesII)
Fossum similarity [27].
cm.distance(metric=DistanceType.Fossum)
Gilbert & Wells similarity [28].
cm.distance(metric=DistanceType.GilbertWells)
Goodall similarity [29, 30].
cm.distance(metric=DistanceType.Goodall)
Goodman & Kruskal's Lambda similarity [31].
cm.distance(metric=DistanceType.GoodmanKruskalLambda)
Goodman & Kruskal Lambda-r correlation [31].
cm.distance(metric=DistanceType.GoodmanKruskalLambdaR)
Guttman's Lambda A similarity [32].
cm.distance(metric=DistanceType.GuttmanLambdaA)
Guttman's Lambda B similarity [32].
cm.distance(metric=DistanceType.GuttmanLambdaB)
Hamann correlation [33].
cm.distance(metric=DistanceType.Hamann)
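To make the idea of comparing against a perfect confusion matrix concrete, the sketch below computes the standard Hamann formula, (a + d - b - c) / n, by hand. It assumes the mapping a=TP, b=FP, c=FN, d=TN and uses class 0 counts read off the example matrix with rows taken as actual classes. The function hamann is a hypothetical helper and is not taken from PyCM.

```python
def hamann(tp, fp, fn, tn):
    """Hamann correlation: (agreements - disagreements) / total
    (hypothetical helper, not PyCM's code)."""
    n = tp + fp + fn + tn
    return (tp + tn - fp - fn) / n


# Class 0 of the example matrix (ASSUMPTION: rows are actual classes).
class0 = hamann(tp=3, fp=2, fn=0, tn=7)

# The same class under a perfect classifier: no false positives or negatives.
perfect = hamann(tp=3, fp=0, fn=0, tn=9)

print(class0, perfect)
```

The perfect case reaches the upper bound of 1, and every misclassification pulls the value down towards -1.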
Harris & Lahey similarity [34].
cm.distance(metric=DistanceType.HarrisLahey)
Hawkins & Dotson similarity [35].
cm.distance(metric=DistanceType.HawkinsDotson)
Kendall's Tau correlation [36].
cm.distance(metric=DistanceType.KendallTau)
Kent & Foster I similarity [37].
cm.distance(metric=DistanceType.KentFosterI)
Kent & Foster II similarity [37].
cm.distance(metric=DistanceType.KentFosterII)
1- C. C. Little, "Abydos Documentation," 2018.
2- V. Dallmeier, C. Lindig, and A. Zeller, "Lightweight defect localization for Java," in European conference on object-oriented programming, 2005: Springer, pp. 528-550.
3- R. Abreu, P. Zoeteweij, and A. J. Van Gemund, "An evaluation of similarity coefficients for software fault localization," in 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06), 2006: IEEE, pp. 39-46.
4- M. R. Anderberg, Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks. Academic Press, 2014.
5- A. M. Andrés and P. F. Marzo, "Delta: A new measure of agreement between two raters," British journal of mathematical and statistical psychology, vol. 57, no. 1, pp. 1-19, 2004.
6- C. Baroni-Urbani and M. W. Buser, "Similarity of binary data," Systematic Zoology, vol. 25, no. 3, pp. 251-259, 1976.
7- V. Batagelj and M. Bren, "Comparing resemblance measures," Journal of classification, vol. 12, no. 1, pp. 73-90, 1995.
8- F. B. Baulieu, "A classification of presence/absence based dissimilarity coefficients," Journal of Classification, vol. 6, no. 1, pp. 233-246, 1989.
9- F. B. Baulieu, "Two variant axiom systems for presence/absence based dissimilarity coefficients," Journal of Classification, vol. 14, no. 1, pp. 159-170, 1997.
10- R. Benini, Principii di demografia. Barbera, 1901.
11- G. N. Lance and W. T. Williams, "Computer programs for hierarchical polythetic classification (“similarity analyses”)," The Computer Journal, vol. 9, no. 1, pp. 60-64, 1966.
12- G. N. Lance and W. T. Williams, "Mixed-Data Classificatory Programs I - Agglomerative Systems," Australian Computer Journal, vol. 1, no. 1, pp. 15-20, 1967.
13- P. W. Clement, "A formula for computing inter-observer agreement," Psychological Reports, vol. 39, no. 1, pp. 257-258, 1976.
14- V. Consonni and R. Todeschini, "New similarity coefficients for binary data," Match-Communications in Mathematical and Computer Chemistry, vol. 68, no. 2, p. 581, 2012.
15- S. F. Dennis, "The Construction of a Thesaurus Automatically From," in Statistical Association Methods for Mechanized Documentation: Symposium Proceedings, 1965, vol. 269: US Government Printing Office, p. 61.
16- P. G. Digby, "Approximating the tetrachoric correlation coefficient," Biometrics, pp. 753-757, 1983.
17- IBM Corp, "IBM SPSS Statistics Algorithms," ed: IBM Corp Armonk, NY, USA, 2017.
18- M. H. Doolittle, "The verification of predictions," Bulletin of the Philosophical Society of Washington, vol. 7, pp. 122-127, 1885.
19- H. Eyraud, "Les principes de la mesure des correlations," Ann. Univ. Lyon, III. Ser., Sect. A, vol. 1, no. 30-47, p. 111, 1936.
20- E. W. Fager, "Determination and analysis of recurrent groups," Ecology, vol. 38, no. 4, pp. 586-595, 1957.
21- E. W. Fager and J. A. McGowan, "Zooplankton Species Groups in the North Pacific: Co-occurrences of species can be used to derive groups whose members react similarly to water-mass types," Science, vol. 140, no. 3566, pp. 453-460, 1963.
22- D. P. Faith, "Asymmetric binary similarity measures," Oecologia, vol. 57, pp. 287-290, 1983.
23- J. L. Fleiss, B. Levin, and M. C. Paik, Statistical methods for rates and proportions. John Wiley & Sons, 2013.
24- S. A. Forbes, On the local distribution of certain Illinois fishes: an essay in statistical ecology. Illinois State Laboratory of Natural History, 1907.
25- A. Mozley, "The statistical analysis of the distribution of pond molluscs in western Canada," The American Naturalist, vol. 70, no. 728, pp. 237-244, 1936.
26- S. A. Forbes, "Method of determining and measuring the associative relations of species," Science, vol. 61, no. 1585, pp. 518-524, 1925.
27- E. G. Fossum and G. Kaskey, "Optimization and standardization of information retrieval language and systems," SPERRY RAND CORP PHILADELPHIA PA UNIVAC DIV, 1966.
28- N. Gilbert and T. C. Wells, "Analysis of quadrat data," The Journal of Ecology, pp. 675-685, 1966.
29- D. W. Goodall, "The distribution of the matching coefficient," Biometrics, pp. 647-656, 1967.
30- B. Austin and R. R. Colwell, "Evaluation of some coefficients for use in numerical taxonomy of microorganisms," International Journal of Systematic and Evolutionary Microbiology, vol. 27, no. 3, pp. 204-210, 1977.
31- L. A. Goodman and W. H. Kruskal, Measures of association for cross classifications. Springer, 1979.
32- L. Guttman, "An outline of the statistical theory of prediction," The prediction of personal adjustment, vol. 48, pp. 253-318, 1941.
33- U. Hamann, "Merkmalsbestand und verwandtschaftsbeziehungen der farinosae: ein beitrag zum system der monokotyledonen," Willdenowia, pp. 639-768, 1961.
34- F. C. Harris and B. B. Lahey, "A method for combining occurrence and nonoccurrence interobserver agreement scores," Journal of Applied Behavior Analysis, vol. 11, no. 4, pp. 523-527, 1978.
35- R. P. Hawkins and V. A. Dotson, "Reliability Scores That Delude: An Alice in Wonderland Trip Through the Misleading Characteristics of Inter-Observer Agreement Scores in Interval Recording," 1973.
36- M. G. Kendall, "A new measure of rank correlation," Biometrika, vol. 30, no. 1/2, pp. 81-93, 1938.
37- R. N. Kent and S. L. Foster, "Direct observational procedures: Methodological issues in naturalistic settings," Handbook of behavioral assessment, pp. 279-328, 1977.