Scoring Algorithms
CPSign currently leverages two machine learning libraries, LIBLINEAR and LIBSVM, and all of their implemented algorithms. These can be used by the nonconformity measures of Conformal Predictors or VennABERS to produce the final predictors. This page gives more detail about each of the currently implemented scoring algorithms.
Overview
All machine learning methods implement the interface MLAlgorithm and one or more of the interfaces located in the package com.arosbio.modeling.ml.algorithms. All currently available classes are registered as services and can be retrieved using the Java ServiceLoader functionality, e.g., ServiceLoader.load(MLAlgorithm.class). The parameters of each individual algorithm can be set using setters on the class itself, but (for the convenience of grid search and the CLI) they can also be read and set through the Configurable interface, using the getConfigParameters(..) and setConfigParameters(..) methods. The ConfigParameter objects can tell you, e.g., which parameters are included in a parameter tuning by default when GridSearch#search(..) is called without an explicit parameter grid. Furthermore, the parameters of each algorithm are set to defaults appropriate for cheminformatics in conjunction with the Signatures Molecular Descriptor (where earlier studies have produced such defaults), or otherwise to the defaults used in the respective underlying implementation. For further details about each algorithm, we refer to the respective websites of LIBLINEAR and LIBSVM. The following two tables outline the algorithms and their implemented interfaces.
Regression algorithms
| Name | SVR |
| --- | --- |
| LinearSVR | x |
| EpsilonSVR / NuSVR | x |
Classification algorithms
Interfaces: MultiLabelClassifier, BinaryClassifier, ScoringClassifier, PseudoProbabilisticClassifier
Algorithms: LinearSVC, C_SVC / NuSVC, PlattScaledC_SVC / PlattScaledNuSVC, LogisticRegression
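The service lookup described in the overview can be sketched as follows. This is a minimal illustration of the standard java.util.ServiceLoader pattern, not CPSign code: MLAlgorithmStandIn and AlgorithmDiscoverySketch are hypothetical names used so that the example compiles without CPSign on the classpath. With CPSign available, one would presumably pass MLAlgorithm.class instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for CPSign's MLAlgorithm interface, declared here only
// so that the sketch is self-contained.
interface MLAlgorithmStandIn { }

final class AlgorithmDiscoverySketch {

    // Returns the class names of all providers registered for the given service type.
    static <T> List<String> listProviders(Class<T> service) {
        List<String> names = new ArrayList<>();
        for (T provider : ServiceLoader.load(service)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // No provider is registered for the stand-in interface, so the list is empty here.
        // Assumption: with CPSign on the classpath, MLAlgorithm.class would be passed instead.
        System.out.println(listProviders(MLAlgorithmStandIn.class));
    }
}
```

ServiceLoader instantiates each provider lazily while iterating, so collecting only the class names keeps the lookup cheap.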
Regression algorithms
LinearSVR
Support Vector Regression (SVR) implemented in LIBLINEAR. It is restricted to a linear kernel and optimized for fast training and prediction. For very large data sets this algorithm is preferred, as its runtime is much shorter than that of the other regression algorithms.
EpsilonSVR / NuSVR
Support Vector Regression (SVR) implemented in LIBSVM. The two algorithms differ in their parameterization: in LIBSVM's formulation both use the standard cost/C parameter, while the epsilon parameter (used in EpsilonSVR) is replaced by nu (used in NuSVR). The nu parameter should be in the range [0..1]. Supports the following kernels: (0) LINEAR, (1) POLY, (2) RBF, (3) SIGMOID.
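To make the four kernel ids concrete, the sketch below evaluates the corresponding kernel functions as they are commonly defined for LIBSVM: linear u·v, polynomial (gamma·u·v + coef0)^degree, RBF exp(-gamma·|u-v|²) and sigmoid tanh(gamma·u·v + coef0). The class and method names (KernelSketch, Kernel, evaluate) are hypothetical and not part of CPSign or LIBSVM. The parameter names gamma, coef0 and degree follow LIBSVM's naming.

```java
final class KernelSketch {

    // Kernel ids in the same order as listed in the text.
    enum Kernel {
        LINEAR(0), POLY(1), RBF(2), SIGMOID(3);

        final int id;

        Kernel(int id) {
            this.id = id;
        }

        // Looks up a kernel by its numeric id.
        static Kernel fromId(int id) {
            for (Kernel k : values()) {
                if (k.id == id) {
                    return k;
                }
            }
            throw new IllegalArgumentException("Unknown kernel id: " + id);
        }
    }

    private static void requireSameLength(double[] u, double[] v) {
        if (u.length != v.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
    }

    static double dot(double[] u, double[] v) {
        requireSameLength(u, v);
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            sum += u[i] * v[i];
        }
        return sum;
    }

    static double squaredDistance(double[] u, double[] v) {
        requireSameLength(u, v);
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            double d = u[i] - v[i];
            sum += d * d;
        }
        return sum;
    }

    // Evaluates the chosen kernel; parameters a kernel does not use are ignored.
    static double evaluate(Kernel kernel, double[] u, double[] v,
                           double gamma, double coef0, int degree) {
        switch (kernel) {
            case LINEAR:
                return dot(u, v);
            case POLY:
                return Math.pow(gamma * dot(u, v) + coef0, degree);
            case RBF:
                return Math.exp(-gamma * squaredDistance(u, v));
            case SIGMOID:
                return Math.tanh(gamma * dot(u, v) + coef0);
            default:
                throw new IllegalStateException("Unhandled kernel: " + kernel);
        }
    }
}
```

For example, KernelSketch.evaluate(KernelSketch.Kernel.fromId(2), u, v, 0.5, 0.0, 3) evaluates the RBF kernel and ignores coef0 and degree.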
Classification algorithms
LinearSVC
Support Vector Classification (SVC) implemented in LIBLINEAR. It is restricted to a linear kernel and optimized for fast training and prediction. It is typically the fastest algorithm to train, and thus the most appropriate one when dealing with large data sets.
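One reason a linear model predicts quickly is that the decision value is a single weighted sum over the non-zero features, independent of the number of training records. The sketch below illustrates this for a sparse feature vector. It is an assumption-laden illustration and not the LIBLINEAR or CPSign implementation: LinearDecisionSketch, decisionValue and predictLabel are hypothetical names, and the labels +1/-1 are chosen only for the example.

```java
import java.util.Map;

final class LinearDecisionSketch {

    // Computes w·x + bias, visiting only the non-zero entries of the sparse vector x.
    // Feature indices outside the weight vector are ignored (treated as unseen features).
    static double decisionValue(double[] weights, double bias, Map<Integer, Double> sparseFeatures) {
        double sum = bias;
        for (Map.Entry<Integer, Double> entry : sparseFeatures.entrySet()) {
            int index = entry.getKey();
            if (index >= 0 && index < weights.length) {
                sum += weights[index] * entry.getValue();
            }
        }
        return sum;
    }

    // Maps the decision value to a binary label: +1 for non-negative values, -1 otherwise.
    static int predictLabel(double[] weights, double bias, Map<Integer, Double> sparseFeatures) {
        return decisionValue(weights, bias, sparseFeatures) >= 0.0 ? 1 : -1;
    }
}
```

A kernel SVM, in contrast, has to evaluate the kernel against every support vector for each prediction, which is why the linear variant scales better to large data sets.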
C_SVC / NuSVC
Support Vector Classification (SVC) implemented in LIBSVM. The difference between these two algorithms is that the standard cost/C parameter (used in C_SVC) is reparameterized into nu (used in NuSVC). The nu parameter should be in the range [0..1]. Supports the following kernels: (0) LINEAR, (1) POLY, (2) RBF, (3) SIGMOID.
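A value of nu inside the allowed range can still be infeasible for NuSVC when the classes are strongly imbalanced. The sketch below shows checks of this kind, assuming they mirror the parameter validation done in LIBSVM, which to our knowledge rejects nu <= 0 and nu > 1, and rejects a nu for which nu·(n1 + n2)/2 exceeds min(n1, n2) for any pair of classes with n1 and n2 records. NuCheckSketch and its methods are hypothetical names, not part of CPSign or LIBSVM.

```java
final class NuCheckSketch {

    // Assumption: LIBSVM accepts nu only in the half-open interval (0, 1].
    static boolean isNuInRange(double nu) {
        return nu > 0.0 && nu <= 1.0;
    }

    // Checks the pairwise feasibility condition for nu given the number of records per class.
    static boolean isNuFeasible(double nu, int[] classCounts) {
        if (!isNuInRange(nu)) {
            return false;
        }
        for (int i = 0; i < classCounts.length; i++) {
            for (int j = i + 1; j < classCounts.length; j++) {
                int n1 = classCounts[i];
                int n2 = classCounts[j];
                if (nu * (n1 + n2) / 2 > Math.min(n1, n2)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With 100 records in one class and 10 in the other, nu = 0.5 is rejected by this check while nu = 0.1 passes, so smaller values of nu are the safer choice for imbalanced data.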
PlattScaledC_SVC / PlattScaledNuSVC
These two algorithms are based on the C_SVC and NuSVC algorithms, but perform an internal 5-fold split of the data set and use Platt scaling to output probability estimates for each class. They are generally slower to train, but are needed when using a nonconformity measure that requires probabilities as input.
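Platt scaling maps an SVM decision value f to a probability through a fitted sigmoid, P(y = 1 | f) = 1 / (1 + exp(A·f + B)). The sketch below only applies an already fitted sigmoid. Fitting A and B on the held-out folds is not shown, and the values used in the usage note are made up for illustration. PlattScalingSketch and probability are hypothetical names, not CPSign or LIBSVM API.

```java
final class PlattScalingSketch {

    // Applies the Platt sigmoid 1 / (1 + exp(a * f + b)) to a decision value f.
    // The two branches compute the same quantity but avoid overflow in exp for large |a * f + b|.
    static double probability(double decisionValue, double a, double b) {
        double fApB = decisionValue * a + b;
        if (fApB >= 0.0) {
            double e = Math.exp(-fApB);
            return e / (1.0 + e);
        }
        return 1.0 / (1.0 + Math.exp(fApB));
    }
}
```

With a negative slope such as A = -1.5 and B = 0, larger decision values give probabilities closer to 1, and a decision value of 0 maps to exactly 0.5.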
LogisticRegression
Logistic regression implemented in LIBLINEAR, which produces probability estimates. Note that this algorithm uses the default hyperparameters of LIBLINEAR and has not been tuned to work optimally with any particular kind of descriptor.