Table 3 Performance of each classifier combined with each semantic similarity method in the test set

From: Performance and clinical utility of a new supervised machine-learning pipeline in detecting rare ciliopathy patients based on deep phenotyping from electronic health records and semantic similarity

| Similarity method | Classifier | Mean sens.* (95% CI) | Recall @1% (%) | Recall @10% (%) | Precision @1% (%) | Precision @10% (%) | Mean AUROC (95% CI) | Mean AUPRC (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Baseline | RidgeReg | 82% [76–88] | 59 | 81 | 25 | 3.4 | 93% [91–95] | 46% [36–56] |
| | SVM | 81% [75–87] | 54 | 81 | 23 | 3.4 | 91% [89–93] | 45% [36–54] |
| | RF | 78% [73–83] | 55 | 78 | 23 | 3.3 | 91% [89–93] | 47% [39–55] |
| | XGBoost | 82% [76–88] | 61 | 82 | 25 | 3.4 | 93% [91–95] | 42% [33–51] |
| Lin similarity | RidgeReg | 65% [57–73] | 23 | 65 | 10 | 2.7 | 85% [82–88] | 6.0% [4.0–8.0] |
| | SVM | 68% [60–76] | 32 | 62 | 13 | 2.6 | 84% [81–87] | 15% [9–21] |
| | RF | 76% [68–84] | 44 | 76 | 18 | 3.2 | 90% [87–93] | 21% [13–29] |
| | XGBoost | 63% [51–75] | 18 | 62 | 8 | 2.6 | 85% [81–89] | 6.0% [3.0–9.0] |
| Restricted hier. sim. | RidgeReg | 76% [71–81] | 49 | 76 | 20 | 3.2 | 92% [90–94] | 30% [21–39] |
| | SVM | 71% [54–88] | 43 | 72 | 18 | 3.0 | 84% [72–96] | 31% [23–39] |
| | RF | 85% [79–91] | 59 | 85 | 25 | 3.5 | 93% [90–96] | 43% [35–51] |
| | XGBoost | 86% [80–92] | 41 | 86 | 17 | 3.6 | 96% [94–98] | 35% [23–47] |
| fastText Embd | RidgeReg | 81% [76–86] | 48 | 79 | 20 | 3.3 | 90% [86–94] | 35% [25–45] |
| | SVM | 81% [76–86] | 52 | 79 | 22 | 3.3 | 91% [88–94] | 33% [23–43] |
| | RF | 76% [70–82] | 50 | 77 | 21 | 3.2 | 89% [86–92] | 30% [20–40] |
| | XGBoost | 77% [70–84] | 38 | 77 | 16 | 3.2 | 88% [83–93] | 19% [13–25] |
| CODER Embd | RidgeReg | 84% [78–90] | 57 | 86 | 24 | 3.6 | 91% [88–94] | 35% [25–45] |
| | SVM | 78% [72–84] | 53 | 77 | 22 | 3.2 | 89% [86–92] | 40% [30–50] |
| | RF | 74% [68–80] | 48 | 74 | 20 | 3.1 | 88% [85–91] | 35% [23–47] |
| | XGBoost | 72% [66–78] | 39 | 72 | 16 | 3.0 | 87% [83–91] | 19% [12–26] |

  1. The bold values indicate high performance
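  2. RF random forest, SVM support vector machine, RidgeReg ridge regression, XGBoost extreme gradient boosting, Embd embedding, Sens. sensitivity, AUROC area under the receiver operating characteristic curve, AUPRC area under the precision–recall curve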
  3. *Sensitivity for a specificity of 90%
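The headline metric is sensitivity at a fixed 90% specificity. The table does not spell out the Recall @k% and Precision @k% columns, so the sketch below reads them as follows: flag the top k% of test-set patients ranked by classifier score, then measure the share of cases retrieved (recall) and the share of flagged patients who are cases (precision). It also assumes that a higher score means "more likely a ciliopathy patient". The function names, the rounding rule for k and the toy data are illustrative assumptions, not the authors' pipeline.

```python
import bisect


def sensitivity_at_specificity(scores, labels, target_specificity=0.90):
    """Highest sensitivity reachable while keeping specificity >= target.

    Assumption: a patient is flagged when score > threshold. Specificity only
    changes at negative-class scores, so those (plus -inf) are the only
    thresholds worth trying.
    """
    if not 0.0 <= target_specificity <= 1.0:
        raise ValueError("target_specificity must be in [0, 1]")
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("both classes must be present")
    for threshold in [float("-inf")] + negatives:
        # Fraction of negatives at or below the threshold (i.e. not flagged).
        specificity = bisect.bisect_right(negatives, threshold) / len(negatives)
        if specificity >= target_specificity:
            # Thresholds are tried in increasing order, so the first one that
            # meets the target flags the most patients and maximises sensitivity.
            return sum(s > threshold for s in positives) / len(positives)
    raise RuntimeError("no threshold met the target specificity")


def recall_precision_at_fraction(scores, labels, fraction):
    """Recall and precision when the top `fraction` of patients are flagged.

    Assumption: the cut-off is k = round(fraction * n) patients, at least 1.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(fraction * len(scores)))
    # Rank patients from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    total_cases = sum(labels)
    recall = hits / total_cases if total_cases else 0.0
    precision = hits / k
    return recall, precision


def auroc(scores, labels):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 10 controls scored 0..9 and 2 cases scored 7.5 and 20.
    scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 7.5, 20]
    labels = [0] * 10 + [1, 1]
    print(sensitivity_at_specificity(scores, labels))          # 0.5
    print(recall_precision_at_fraction(scores, labels, 0.25))  # (0.5, 0.333...)
    print(auroc(scores, labels))                               # 0.9
```

Under this reading, case prevalence in the test set equals precision × k / recall. For the baseline RidgeReg row, both cut-offs give about 0.42% (0.25 × 0.01 / 0.59 ≈ 0.0042 and 0.034 × 0.10 / 0.81 ≈ 0.0042), which is at least consistent with the assumed definitions. It also explains why precision at 10% stays near 3% for every model.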