Chem Predictors
A ChemPredictor is a wrapper class that is put on top of a Predictor, adding useful functionality for directly dealing with chemical data for both training and prediction tasks.
Instantiation
There is one wrapper class for each Predictor type:

| Predictor class | Wrapper class | Description |
|---|---|---|
| CPClassifier | ChemCPClassifier | ACP/CCP/TCP classification models |
| CPRegressor | ChemCPRegressor | ACP/CCP regressor models |
| AVAPClassifier | ChemVAPClassifier | Venn-ABERS type models |
Loading data & Predict
Molecular data can be loaded directly using the ChemDataset class, which performs descriptor calculation on the fly, or using the ChemPredictor instance, which then stores some metadata. The ChemPredictor wrapper classes outlined in the previous section each hold a ChemDataset instance so that the data and the predictor are kept together. Loading data is most easily done with the convenience classes SDFile, CSVFile and JSONFile, which let you get an iterator of molecules. Note that you can load data from multiple files, potentially in different file formats, simply by calling the add(Iterator) method several times.
    ChemCPClassifier clf = new ChemCPClassifier(acp);
    // Read from SDF v2000/v3000
    Iterator<IAtomContainer> data = new SDFile(file).getIterator();
    // Define the property in the SDF to read labels from
    String property = ...
    // Map textual labels to numerical values
    NamedLabels labels = new NamedLabels("active","inactive");
    // Load using wrapper method in ChemCPClassifier class
    clf.addRecords(data, property, labels);
    // Load using the ChemDataset directly
    clf.getDataset().add(data, property, labels);
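The pattern of calling add several times with iterators from different sources can be sketched in plain Java. This is a minimal illustration only: MultiSourceLoadingSketch is a hypothetical stand-in for a dataset, it does not use CPSign, and the in-memory string lists merely stand in for the molecule iterators that SDFile, CSVFile or JSONFile would provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a dataset; it only mimics the "call add repeatedly" pattern.
final class MultiSourceLoadingSketch {

    private final List<String> records = new ArrayList<>();

    // Drains the iterator and appends every record to those already loaded.
    void add(Iterator<String> source) {
        while (source.hasNext()) {
            records.add(source.next());
        }
    }

    List<String> getRecords() {
        return records;
    }

    public static void main(String[] args) {
        MultiSourceLoadingSketch dataset = new MultiSourceLoadingSketch();
        // Two in-memory lists stand in for iterators over, say, an SDF and a CSV file.
        dataset.add(Arrays.asList("CCO", "c1ccccc1").iterator());
        dataset.add(Arrays.asList("CC(=O)O").iterator());
        System.out.println(dataset.getRecords().size() + " records loaded");
    }
}
```

Each call appends to what is already loaded, so the order of the calls determines the order of the records in this sketch. Whether the real ChemDataset preserves insertion order in the same way is not stated here.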
Data loaded this way is added to the “normal” dataset. Passing RecordType.CALIBRATION_EXCLUSIVE or RecordType.MODELING_EXCLUSIVE as a fourth argument to the add() or addRecords() methods instead marks the data to be used exclusively for calibrating the predictions or for training the underlying scoring model, respectively.
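How records end up in three separate groups depending on the record type can be sketched as follows. This is a conceptual illustration in plain Java, not CPSign code: RecordRoutingSketch and its nested RecordType enum are hypothetical stand-ins, and the constant name NORMAL for the default group is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; mirrors the three record groups described above, not CPSign's classes.
final class RecordRoutingSketch {

    // NORMAL is an assumed name for the default group; the other two follow the text above.
    enum RecordType { NORMAL, CALIBRATION_EXCLUSIVE, MODELING_EXCLUSIVE }

    private final Map<RecordType, List<String>> groups = new EnumMap<>(RecordType.class);

    RecordRoutingSketch() {
        for (RecordType type : RecordType.values()) {
            groups.put(type, new ArrayList<>());
        }
    }

    // Without an explicit type, records go to the normal group.
    void add(List<String> records) {
        add(records, RecordType.NORMAL);
    }

    // With an explicit type, records are kept apart for calibration or modeling only.
    void add(List<String> records, RecordType type) {
        groups.get(type).addAll(records);
    }

    List<String> get(RecordType type) {
        return groups.get(type);
    }

    public static void main(String[] args) {
        RecordRoutingSketch dataset = new RecordRoutingSketch();
        dataset.add(Arrays.asList("mol1", "mol2"));
        dataset.add(Arrays.asList("mol3"), RecordType.CALIBRATION_EXCLUSIVE);
        dataset.add(Arrays.asList("mol4"), RecordType.MODELING_EXCLUSIVE);
        for (RecordType type : RecordType.values()) {
            System.out.println(type + ": " + dataset.get(type));
        }
    }
}
```

Keeping the exclusive groups separate from the normal records is what allows a predictor to guarantee that, for example, calibration-exclusive records are never used to train the underlying scoring model.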
Saving and loading predictor models
Both the precomputed data and the trained predictor model can be worth saving. Saving the precomputed data is useful when you want to train several predictors, possibly with different scoring implementations or parameters. The trained predictor model can be used for later predictions, deployed as a microservice, etc. Serializing (saving) either data or predictor models is done with the ModelSerializer class, by calling one of the saveModel or saveDataset methods. Loading them back into memory is done with the same class, using one of the loadDataset or loadChemPredictor methods.
Image generation
To get visual results from the predictions (i.e. of the significant signature), please refer to the Image rendering page.