Similarity | Method | Mean sens.* (95% CI) | Recall @1% (%) | Recall @10% (%) | Precision @1% (%) | Precision @10% (%) | Mean AUROC (95% CI) | Mean AUPRC (95% CI) |
---|---|---|---|---|---|---|---|---|
Baseline | RidgeReg | 82% [76–88] | 59 | 81 | 25 | 3.4 | 93% [91–95] | 46% [36–56] |
Baseline | SVM | 81% [75–87] | 54 | 81 | 23 | 3.4 | 91% [89–93] | 45% [36–54] |
Baseline | RF | 78% [73–83] | 55 | 78 | 23 | 3.3 | 91% [89–93] | 47% [39–55] |
Baseline | XGBoost | 82% [76–88] | 61 | 82 | 25 | 3.4 | 93% [91–95] | 42% [33–51] |
Lin similarity | RidgeReg | 65% [57–73] | 23 | 65 | 10 | 2.7 | 85% [82–88] | 6.0% [4.0–8.0] |
Lin similarity | SVM | 68% [60–76] | 32 | 62 | 13 | 2.6 | 84% [81–87] | 15% [9–21] |
Lin similarity | RF | 76% [68–84] | 44 | 76 | 18 | 3.2 | 90% [87–93] | 21% [13–29] |
Lin similarity | XGBoost | 63% [51–75] | 18 | 62 | 8 | 2.6 | 85% [81–89] | 6.0% [3.0–9.0] |
Restricted hier. sim | RidgeReg | 76% [71–81] | 49 | 76 | 20 | 3.2 | 92% [90–94] | 30% [21–39] |
Restricted hier. sim | SVM | 71% [54–88] | 43 | 72 | 18 | 3.0 | 84% [72–96] | 31% [23–39] |
Restricted hier. sim | RF | 85% [79–91] | 59 | 85 | 25 | 3.5 | 93% [90–96] | 43% [35–51] |
Restricted hier. sim | XGBoost | 86% [80–92] | 41 | 86 | 17 | 3.6 | 96% [94–98] | 35% [23–47] |
fastText Embd | RidgeReg | 81% [76–86] | 48 | 79 | 20 | 3.3 | 90% [86–94] | 35% [25–45] |
fastText Embd | SVM | 81% [76–86] | 52 | 79 | 22 | 3.3 | 91% [88–94] | 33% [23–43] |
fastText Embd | RF | 76% [70–82] | 50 | 77 | 21 | 3.2 | 89% [86–92] | 30% [20–40] |
fastText Embd | XGBoost | 77% [70–84] | 38 | 77 | 16 | 3.2 | 88% [83–93] | 19% [13–25] |
CODER Embd | RidgeReg | 84% [78–90] | 57 | 86 | 24 | 3.6 | 91% [88–94] | 35% [25–45] |
CODER Embd | SVM | 78% [72–84] | 53 | 77 | 22 | 3.2 | 89% [86–92] | 40% [30–50] |
CODER Embd | RF | 74% [68–80] | 48 | 74 | 20 | 3.1 | 88% [85–91] | 35% [23–47] |
CODER Embd | XGBoost | 72% [66–78] | 39 | 72 | 16 | 3.0 | 87% [83–91] | 19% [12–26] |