Please cite us if you use the software

Distance/Similarity

PyCM's distance method provides a wide range of string distance/similarity metrics for evaluating a confusion matrix by measuring its distance to a perfect confusion matrix. Distance/similarity metrics measure how far apart two vectors of numbers are; a small distance between two objects indicates that they are similar. The measure to use is selected from DistanceType, and the method returns one value per class. The measures' names follow the naming style suggested in [1].

from pycm import ConfusionMatrix, DistanceType
cm = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}})
$$TP \rightarrow \text{True Positive}$$
$$TN \rightarrow \text{True Negative}$$
$$FP \rightarrow \text{False Positive}$$
$$FN \rightarrow \text{False Negative}$$
$$POP \rightarrow \text{Population}$$
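Every formula on this page is evaluated once per class, from that class's one-vs-rest counts. The following is a minimal plain-Python sketch of how those counts can be read off the example matrix. It assumes the outer keys are actual classes and the inner keys are predicted classes, which is the reading consistent with the outputs shown on this page. The helper `one_vs_rest_counts` is hypothetical and not part of PyCM.

```python
# Example matrix from this page.
# Assumed layout: outer key = actual class, inner key = predicted class.
matrix = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}


def one_vs_rest_counts(matrix):
    """Return per-class TP/TN/FP/FN/POP (hypothetical helper, not a PyCM function)."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    counts = {}
    for c in classes:
        tp = matrix[c][c]                                # actual c, predicted c
        fn = sum(matrix[c].values()) - tp                # actual c, predicted other
        fp = sum(matrix[r][c] for r in classes) - tp     # actual other, predicted c
        tn = pop - tp - fn - fp                          # everything else
        counts[c] = {"TP": tp, "TN": tn, "FP": fp, "FN": fn, "POP": pop}
    return counts


counts = one_vs_rest_counts(matrix)
for cls, values in counts.items():
    print(cls, values)
# 0 {'TP': 3, 'TN': 7, 'FP': 2, 'FN': 0, 'POP': 12}
# 1 {'TP': 1, 'TN': 8, 'FP': 1, 'FN': 2, 'POP': 12}
# 2 {'TP': 3, 'TN': 4, 'FP': 2, 'FN': 3, 'POP': 12}
```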

AMPLE

AMPLE similarity [2] [3].

$$sim_{AMPLE}=|\frac{TP}{TP+FP}-\frac{FN}{FN+TN}|$$
cm.distance(metric=DistanceType.AMPLE)
{0: 0.6, 1: 0.3, 2: 0.17142857142857143}
  • Notice : new in version 3.8
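
As a sanity check, the AMPLE formula can be evaluated by hand. This is a sketch in plain Python, not PyCM's implementation. The per-class counts are read off the example matrix, assuming rows are actual classes and columns are predicted classes, and the function name `ample` is illustrative.

```python
# Per-class counts for the example matrix (assumed: rows = actual, columns = predicted).
counts = {
    0: {"TP": 3, "TN": 7, "FP": 2, "FN": 0},
    1: {"TP": 1, "TN": 8, "FP": 1, "FN": 2},
    2: {"TP": 3, "TN": 4, "FP": 2, "FN": 3},
}


def ample(TP, TN, FP, FN):
    """|TP/(TP+FP) - FN/(FN+TN)|, as in the formula above (illustrative helper)."""
    return abs(TP / (TP + FP) - FN / (FN + TN))


ample_by_class = {cls: ample(**c) for cls, c in counts.items()}
print(ample_by_class)  # approximately {0: 0.6, 1: 0.3, 2: 0.1714}
```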

Anderberg's D

Anderberg's D [4].

$$sim_{Anderberg} = \frac{(max(TP,FP)+max(FN,TN)+max(TP,FN)+max(FP,TN))- (max(TP+FN,FP+TN)+max(TP+FP,FN+TN))}{2\times POP}$$
cm.distance(metric=DistanceType.Anderberg)
{0: 0.16666666666666666, 1: 0.0, 2: 0.041666666666666664}
  • Notice : new in version 3.8

Andres & Marzo's Delta

Andres & Marzo's Delta correlation [5].

$$corr_{AndresMarzo_\Delta} = \Delta = \frac{TP+TN-2 \times \sqrt{FP \times FN}}{POP}$$
cm.distance(metric=DistanceType.AndresMarzoDelta)
{0: 0.8333333333333334, 1: 0.5142977396044842, 2: 0.17508504286947035}
  • Notice : new in version 3.8

Baroni-Urbani & Buser I

Baroni-Urbani & Buser I similarity [6].

$$sim_{BaroniUrbaniBuserI} = \frac{\sqrt{TP\times TN}+TP}{\sqrt{TP\times TN}+TP+FP+FN}$$
cm.distance(metric=DistanceType.BaroniUrbaniBuserI)
{0: 0.79128784747792, 1: 0.5606601717798213, 2: 0.5638559245324765}
  • Notice : new in version 3.8

Baroni-Urbani & Buser II

Baroni-Urbani & Buser II correlation [6].

$$corr_{BaroniUrbaniBuserII} = \frac{\sqrt{TP \times TN}+TP-FP-FN}{\sqrt{TP \times TN}+TP+FP+FN}$$
cm.distance(metric=DistanceType.BaroniUrbaniBuserII)
{0: 0.58257569495584, 1: 0.12132034355964261, 2: 0.1277118490649528}
  • Notice : new in version 3.8
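
The two Baroni-Urbani & Buser measures share a denominator, so the correlation is a rescaling of the similarity from $[0, 1]$ to $[-1, 1]$:

$$corr_{BaroniUrbaniBuserII} = 2 \times sim_{BaroniUrbaniBuserI} - 1$$

The sketch below checks this numerically on the example matrix. It is plain Python rather than PyCM code. The counts assume rows are actual classes and columns are predicted classes, and both function names are illustrative.

```python
import math

# Per-class counts for the example matrix (assumed: rows = actual, columns = predicted).
counts = {
    0: {"TP": 3, "TN": 7, "FP": 2, "FN": 0},
    1: {"TP": 1, "TN": 8, "FP": 1, "FN": 2},
    2: {"TP": 3, "TN": 4, "FP": 2, "FN": 3},
}


def baroni_urbani_buser_i(TP, TN, FP, FN):
    root = math.sqrt(TP * TN)
    return (root + TP) / (root + TP + FP + FN)


def baroni_urbani_buser_ii(TP, TN, FP, FN):
    root = math.sqrt(TP * TN)
    return (root + TP - FP - FN) / (root + TP + FP + FN)


bub = {}
for cls, c in counts.items():
    sim = baroni_urbani_buser_i(**c)
    corr = baroni_urbani_buser_ii(**c)
    # The identity corr = 2 * sim - 1 should hold for every class.
    assert math.isclose(corr, 2 * sim - 1)
    bub[cls] = (sim, corr)
    print(cls, round(sim, 4), round(corr, 4))
```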

Batagelj & Bren

Batagelj & Bren distance [7].

$$dist_{BatageljBren} = \frac{FP \times FN}{TP \times TN}$$
cm.distance(metric=DistanceType.BatageljBren)
{0: 0.0, 1: 0.25, 2: 0.5}
  • Notice : new in version 3.8

Baulieu I

Baulieu I distance [8].

$$dist_{BaulieuI} = \frac{(TP+FP) \times (TP+FN)-TP^2}{(TP+FP) \times (TP+FN)}$$
cm.distance(metric=DistanceType.BaulieuI)
{0: 0.4, 1: 0.8333333333333334, 2: 0.7}
  • Notice : new in version 3.8

Baulieu II

Baulieu II similarity [8].

$$sim_{BaulieuII} = \frac{TP^2 \times TN^2}{(TP+FP) \times (TP+FN) \times (FP+TN) \times (FN+TN)}$$
cm.distance(metric=DistanceType.BaulieuII)
{0: 0.4666666666666667, 1: 0.11851851851851852, 2: 0.11428571428571428}
  • Notice : new in version 3.8

Baulieu III

Baulieu III distance [8].

$$dist_{BaulieuIII} = \frac{POP^2 - 4 \times (TP \times TN-FP \times FN)}{2 \times POP^2}$$
cm.distance(metric=DistanceType.BaulieuIII)
{0: 0.20833333333333334, 1: 0.4166666666666667, 2: 0.4166666666666667}
  • Notice : new in version 3.8

Baulieu IV

Baulieu IV distance [9].

$$dist_{BaulieuIV} = \frac{FP+FN-(TP+\frac{1}{2})\times(TN+\frac{1}{2})\times TN \times k}{POP}$$
cm.distance(metric=DistanceType.BaulieuIV)
{0: -41.45702383161246, 1: -22.855395541901885, 2: -13.85431293274332}
  • The default value of k is Euler's number $e$
  • Notice : new in version 3.8
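
The large negative values above follow directly from the formula: with $k = e$, the product $(TP+\frac{1}{2})\times(TN+\frac{1}{2})\times TN \times k$ dominates $FP+FN$. This can be reproduced with a hand-rolled sketch, which is plain Python and not PyCM's implementation. The counts assume rows are actual classes and columns are predicted classes. The `k` keyword belongs to this illustrative helper only; whether PyCM's `distance` method lets the caller override it is not stated here.

```python
import math

# Per-class counts for the example matrix (assumed: rows = actual, columns = predicted).
counts = {
    0: {"TP": 3, "TN": 7, "FP": 2, "FN": 0},
    1: {"TP": 1, "TN": 8, "FP": 1, "FN": 2},
    2: {"TP": 3, "TN": 4, "FP": 2, "FN": 3},
}


def baulieu_iv(TP, TN, FP, FN, k=math.e):
    """Baulieu IV as written above; k defaults to Euler's number (illustrative helper)."""
    pop = TP + TN + FP + FN
    return (FP + FN - (TP + 0.5) * (TN + 0.5) * TN * k) / pop


baulieu_iv_by_class = {cls: baulieu_iv(**c) for cls, c in counts.items()}
for cls, value in baulieu_iv_by_class.items():
    print(cls, round(value, 3))  # roughly -41.457, -22.855, -13.854
```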

Baulieu V

Baulieu V distance [9].

$$dist_{BaulieuV} = \frac{FP+FN+1}{TP+FP+FN+1}$$
cm.distance(metric=DistanceType.BaulieuV)
{0: 0.5, 1: 0.8, 2: 0.6666666666666666}
  • Notice : new in version 3.8

Baulieu VI

Baulieu VI distance [9].

$$dist_{BaulieuVI} = \frac{FP+FN}{TP+FP+FN+1}$$
cm.distance(metric=DistanceType.BaulieuVI)
{0: 0.3333333333333333, 1: 0.6, 2: 0.5555555555555556}
  • Notice : new in version 3.8

Baulieu VII

Baulieu VII distance [9].

$$dist_{BaulieuVII} = \frac{FP+FN}{POP + TP \times (TP-4)^2}$$
cm.distance(metric=DistanceType.BaulieuVII)
{0: 0.13333333333333333, 1: 0.14285714285714285, 2: 0.3333333333333333}
  • Notice : new in version 3.8

Baulieu VIII

Baulieu VIII distance [9].

$$dist_{BaulieuVIII} = \frac{(FP-FN)^2}{POP^2}$$
cm.distance(metric=DistanceType.BaulieuVIII)
{0: 0.027777777777777776, 1: 0.006944444444444444, 2: 0.006944444444444444}
  • Notice : new in version 3.8

Baulieu IX

Baulieu IX distance [9].

$$dist_{BaulieuIX} = \frac{FP+2 \times FN}{TP+FP+2 \times FN+TN}$$
cm.distance(metric=DistanceType.BaulieuIX)
{0: 0.16666666666666666, 1: 0.35714285714285715, 2: 0.5333333333333333}
  • Notice : new in version 3.8

Baulieu X

Baulieu X distance [9].

$$dist_{BaulieuX} = \frac{FP+FN+max(FP,FN)}{POP+max(FP,FN)}$$
cm.distance(metric=DistanceType.BaulieuX)
{0: 0.2857142857142857, 1: 0.35714285714285715, 2: 0.5333333333333333}
  • Notice : new in version 3.8

Baulieu XI

Baulieu XI distance [9].

$$dist_{BaulieuXI} = \frac{FP+FN}{FP+FN+TN}$$
cm.distance(metric=DistanceType.BaulieuXI)
{0: 0.2222222222222222, 1: 0.2727272727272727, 2: 0.5555555555555556}
  • Notice : new in version 3.8

Baulieu XII

Baulieu XII distance [9].

$$dist_{BaulieuXII} = \frac{FP+FN}{TP+FP+FN-1}$$
cm.distance(metric=DistanceType.BaulieuXII)
{0: 0.5, 1: 1.0, 2: 0.7142857142857143}
  • Notice : new in version 3.8

Baulieu XIII

Baulieu XIII distance [9].

$$dist_{BaulieuXIII} = \frac{FP+FN}{TP+FP+FN+TP \times (TP-4)^2}$$
cm.distance(metric=DistanceType.BaulieuXIII)
{0: 0.25, 1: 0.23076923076923078, 2: 0.45454545454545453}
  • Notice : new in version 3.8

Baulieu XIV

Baulieu XIV distance [9].

$$dist_{BaulieuXIV} = \frac{FP+2 \times FN}{TP+FP+2 \times FN}$$
cm.distance(metric=DistanceType.BaulieuXIV)
{0: 0.4, 1: 0.8333333333333334, 2: 0.7272727272727273}
  • Notice : new in version 3.8

Baulieu XV

Baulieu XV distance [9].

$$dist_{BaulieuXV} = \frac{FP+FN+max(FP, FN)}{TP+FP+FN+max(FP, FN)}$$
cm.distance(metric=DistanceType.BaulieuXV)
{0: 0.5714285714285714, 1: 0.8333333333333334, 2: 0.7272727272727273}
  • Notice : new in version 3.8

Benini I

Benini I correlation [10].

$$corr_{BeniniI} = \frac{TP \times TN-FP \times FN}{(TP+FN)\times(FN+TN)}$$
cm.distance(metric=DistanceType.BeniniI)
{0: 1.0, 1: 0.2, 2: 0.14285714285714285}
  • Notice : new in version 3.8

Benini II

Benini II correlation [10].

$$corr_{BeniniII} = \frac{TP \times TN-FP \times FN}{min((TP+FN)\times(FN+TN), (TP+FP)\times(FP+TN))}$$
cm.distance(metric=DistanceType.BeniniII)
{0: 1.0, 1: 0.3333333333333333, 2: 0.2}
  • Notice : new in version 3.8
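
Benini I and Benini II share a numerator and differ only in the denominator: Benini II takes the smaller of two marginal products, so its magnitude is never below that of Benini I. A plain-Python sketch on the example matrix follows. It is not PyCM's code, the counts assume rows are actual classes and columns are predicted classes, and the function names are illustrative.

```python
# Per-class counts for the example matrix (assumed: rows = actual, columns = predicted).
counts = {
    0: {"TP": 3, "TN": 7, "FP": 2, "FN": 0},
    1: {"TP": 1, "TN": 8, "FP": 1, "FN": 2},
    2: {"TP": 3, "TN": 4, "FP": 2, "FN": 3},
}


def benini_i(TP, TN, FP, FN):
    return (TP * TN - FP * FN) / ((TP + FN) * (FN + TN))


def benini_ii(TP, TN, FP, FN):
    denominator = min((TP + FN) * (FN + TN), (TP + FP) * (FP + TN))
    return (TP * TN - FP * FN) / denominator


benini = {cls: (benini_i(**c), benini_ii(**c)) for cls, c in counts.items()}
for cls, (first, second) in benini.items():
    print(cls, round(first, 4), round(second, 4))
# 0 1.0 1.0
# 1 0.2 0.3333
# 2 0.1429 0.2
```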

Canberra

Canberra distance [11] [12].

$$dist_{Canberra} = \frac{FP+FN}{(TP+FP)+(TP+FN)}$$
cm.distance(metric=DistanceType.Canberra)
{0: 0.25, 1: 0.6, 2: 0.45454545454545453}
  • Notice : new in version 3.8

Clement

Clement similarity [13].

$$sim_{Clement} = \frac{TP}{TP+FP}\times\Big(1 - \frac{TP+FP}{POP}\Big) + \frac{TN}{FN+TN}\times\Big(1 - \frac{FN+TN}{POP}\Big)$$
cm.distance(metric=DistanceType.Clement)
{0: 0.7666666666666666, 1: 0.55, 2: 0.588095238095238}
  • Notice : new in version 3.8
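
The Clement formula adds two terms: the precision $\frac{TP}{TP+FP}$ weighted by the share of the population *not* predicted positive, and the analogous ratio $\frac{TN}{FN+TN}$ weighted by the share *not* predicted negative. The sketch below evaluates both terms separately for the example matrix. It is plain Python, not PyCM's implementation, the counts assume rows are actual classes and columns are predicted classes, and the function name is illustrative.

```python
# Per-class counts for the example matrix (assumed: rows = actual, columns = predicted).
counts = {
    0: {"TP": 3, "TN": 7, "FP": 2, "FN": 0},
    1: {"TP": 1, "TN": 8, "FP": 1, "FN": 2},
    2: {"TP": 3, "TN": 4, "FP": 2, "FN": 3},
}


def clement_terms(TP, TN, FP, FN):
    """Return the two summands of the Clement similarity (illustrative helper)."""
    pop = TP + TN + FP + FN
    positive_term = TP / (TP + FP) * (1 - (TP + FP) / pop)
    negative_term = TN / (FN + TN) * (1 - (FN + TN) / pop)
    return positive_term, negative_term


clement = {cls: sum(clement_terms(**c)) for cls, c in counts.items()}
for cls, value in clement.items():
    print(cls, round(value, 4))
# 0 0.7667
# 1 0.55
# 2 0.5881
```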

Consonni & Todeschini I

Consonni & Todeschini I similarity [14].

$$sim_{ConsonniTodeschiniI} = \frac{log(1+TP+TN)}{log(1+POP)}$$
cm.distance(metric=DistanceType.ConsonniTodeschiniI)
{0: 0.9348704159880586, 1: 0.8977117175026231, 2: 0.8107144632819592}
  • Notice : new in version 3.8

Consonni & Todeschini II

Consonni & Todeschini II similarity [14].

$$sim_{ConsonniTodeschiniII} = \frac{log(1+POP)-log(1+FP+FN)}{log(1+POP)}$$
cm.distance(metric=DistanceType.ConsonniTodeschiniII)
{0: 0.5716826589686053, 1: 0.4595236911453605, 2: 0.3014445045412856}
  • Notice : new in version 3.8

Consonni & Todeschini III

Consonni & Todeschini III similarity [14].

$$sim_{ConsonniTodeschiniIII} = \frac{log(1+TP)}{log(1+POP)}$$
cm.distance(metric=DistanceType.ConsonniTodeschiniIII)
{0: 0.5404763088546395, 1: 0.27023815442731974, 2: 0.5404763088546395}
  • Notice : new in version 3.8

Consonni & Todeschini IV

Consonni & Todeschini IV similarity [14].

$$sim_{ConsonniTodeschiniIV} = \frac{log(1+TP)}{log(1+TP+FP+FN)}$$
cm.distance(metric=DistanceType.ConsonniTodeschiniIV)
{0: 0.7737056144690831, 1: 0.43067655807339306, 2: 0.6309297535714574}
  • Notice : new in version 3.8

Consonni & Todeschini V

Consonni & Todeschini V correlation [14].

$$corr_{ConsonniTodeschiniV} = \frac{log(1+TP \times TN)-log(1+FP \times FN)}{log(1+\frac{POP^2}{4})}$$
cm.distance(metric=DistanceType.ConsonniTodeschiniV)
{0: 0.8560267854703983, 1: 0.30424737289682985, 2: 0.17143541431350617}
  • Notice : new in version 3.8
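
Because every Consonni & Todeschini measure is a ratio of logarithms, the base of the logarithm cancels out and does not affect the result. The sketch below evaluates measure V with the natural logarithm and again with base 10 to illustrate this. It is plain Python, not PyCM's implementation, the counts assume rows are actual classes and columns are predicted classes, and the function name and its `log` parameter are illustrative.

```python
import math

# Per-class counts for the example matrix (assumed: rows = actual, columns = predicted).
counts = {
    0: {"TP": 3, "TN": 7, "FP": 2, "FN": 0},
    1: {"TP": 1, "TN": 8, "FP": 1, "FN": 2},
    2: {"TP": 3, "TN": 4, "FP": 2, "FN": 3},
}


def consonni_todeschini_v(TP, TN, FP, FN, log=math.log):
    """Consonni & Todeschini V as written above (illustrative helper)."""
    pop = TP + TN + FP + FN
    return (log(1 + TP * TN) - log(1 + FP * FN)) / log(1 + pop ** 2 / 4)


ct5 = {cls: consonni_todeschini_v(**c) for cls, c in counts.items()}
ct5_base10 = {cls: consonni_todeschini_v(**c, log=math.log10) for cls, c in counts.items()}

for cls in counts:
    # Same value regardless of the logarithm base.
    assert math.isclose(ct5[cls], ct5_base10[cls])
    print(cls, round(ct5[cls], 4))
# 0 0.856
# 1 0.3042
# 2 0.1714
```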

References

1- C. C. Little, "Abydos Documentation," 2018.
2- V. Dallmeier, C. Lindig, and A. Zeller, "Lightweight defect localization for Java," in European conference on object-oriented programming, 2005: Springer, pp. 528-550.
3- R. Abreu, P. Zoeteweij, and A. J. Van Gemund, "An evaluation of similarity coefficients for software fault localization," in 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06), 2006: IEEE, pp. 39-46.
4- M. R. Anderberg, Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks. Academic press, 2014.
5- A. M. Andrés and P. F. Marzo, "Delta: A new measure of agreement between two raters," British journal of mathematical and statistical psychology, vol. 57, no. 1, pp. 1-19, 2004.
6- C. Baroni-Urbani and M. W. Buser, "Similarity of binary data," Systematic Zoology, vol. 25, no. 3, pp. 251-259, 1976.
7- V. Batagelj and M. Bren, "Comparing resemblance measures," Journal of classification, vol. 12, no. 1, pp. 73-90, 1995.
8- F. B. Baulieu, "A classification of presence/absence based dissimilarity coefficients," Journal of Classification, vol. 6, no. 1, pp. 233-246, 1989.
9- F. B. Baulieu, "Two variant axiom systems for presence/absence based dissimilarity coefficients," Journal of Classification, vol. 14, no. 1, pp. 159-170, 1997.
10- R. Benini, Principii di demografia. Barbera, 1901.
11- G. N. Lance and W. T. Williams, "Computer programs for hierarchical polythetic classification (“similarity analyses”)," The Computer Journal, vol. 9, no. 1, pp. 60-64, 1966.
12- G. N. Lance and W. T. Williams, "Mixed-Data Classificatory Programs I - Agglomerative Systems," Australian Computer Journal, vol. 1, no. 1, pp. 15-20, 1967.
13- P. W. Clement, "A formula for computing inter-observer agreement," Psychological Reports, vol. 39, no. 1, pp. 257-258, 1976.
14- V. Consonni and R. Todeschini, "New similarity coefficients for binary data," Match-Communications in Mathematical and Computer Chemistry, vol. 68, no. 2, p. 581, 2012.