Chem Predictors

A ChemPredictor is a wrapper class placed on top of a Predictor. It adds functionality for working directly with chemical data in both training and prediction tasks.

Instantiation

There is one wrapper class for each Predictor type:

Predictor class   Wrapper class       Description
CPClassifier      ChemCPClassifier    ACP/CCP/TCP classification models
CPRegressor       ChemCPRegressor     ACP/CCP regressor models
AVAPClassifier    ChemVAPClassifier   Venn-ABERS type models

Loading data and predicting

Molecular data can be loaded directly using the ChemDataset class, which performs descriptor calculation on the fly, or through the ChemPredictor instance, which also stores some metadata. The ChemPredictor wrapper classes outlined in the previous section each hold a ChemDataset instance, so that the data and the predictor are kept together. Loading data is most easily achieved using the convenience classes SDFile, CSVFile and JSONFile, which provide an iterator of molecules. Note that you can load data from multiple files, potentially in different file formats, simply by calling add(Iterator) several times.

// acp is assumed to be an already configured predictor instance
ChemCPClassifier clf = new ChemCPClassifier(acp);

// Read from SDF v2000/v3000 (file is assumed to point to the SDF to load)
Iterator<IAtomContainer> data = new SDFile(file).getIterator();
// Define the property in the SDF to read labels from
String property = ...
// Map textual labels to numerical values
NamedLabels labels = new NamedLabels("active","inactive");

// Load using wrapper method in ChemCPClassifier class
clf.addRecords(data, property, labels);
// Alternatively, load using the ChemDataset directly
clf.getDataset().add(data, property, labels);

Loading data in this way automatically adds it to the “normal” dataset. Passing RecordType.CALIBRATION_EXCLUSIVE or RecordType.MODELING_EXCLUSIVE as a fourth argument to the add() or addRecords() methods instead marks the data to be used exclusively for calibrating the predictions or for training the underlying scoring model, respectively.
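The routing of records into separate partitions can be pictured with a small, self-contained sketch. This is not the library's implementation: ToyDataset, its add() overloads and the NORMAL constant are hypothetical stand-ins invented for illustration, and plain strings replace molecules. It only shows the idea that repeated add() calls accumulate data, and that an extra record-type argument selects which partition receives it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT the library's ChemDataset: it only illustrates
// how records could be routed into three separate partitions.
class ToyDataset {

    // Two names taken from the text, plus an assumed NORMAL default
    enum RecordType { NORMAL, CALIBRATION_EXCLUSIVE, MODELING_EXCLUSIVE }

    private final Map<RecordType, List<String>> partitions =
            new EnumMap<>(RecordType.class);

    ToyDataset() {
        for (RecordType t : RecordType.values()) {
            partitions.put(t, new ArrayList<>());
        }
    }

    // No record type given: records go to the normal partition
    void add(Iterator<String> records) {
        add(records, RecordType.NORMAL);
    }

    // The extra argument selects the partition; may be called repeatedly
    void add(Iterator<String> records, RecordType type) {
        List<String> target = partitions.get(type);
        while (records.hasNext()) {
            target.add(records.next());
        }
    }

    int size(RecordType type) {
        return partitions.get(type).size();
    }

    public static void main(String[] args) {
        ToyDataset ds = new ToyDataset();
        // Two sources added to the same (normal) partition
        ds.add(Arrays.asList("mol1", "mol2", "mol3").iterator());
        ds.add(Arrays.asList("mol4").iterator());
        // Records reserved for calibration or for model training only
        ds.add(Arrays.asList("mol5", "mol6").iterator(), RecordType.CALIBRATION_EXCLUSIVE);
        ds.add(Arrays.asList("mol7").iterator(), RecordType.MODELING_EXCLUSIVE);

        for (RecordType t : RecordType.values()) {
            System.out.println(t + ": " + ds.size(t));
        }
    }
}
```

In this toy version an iterator is consumed by the call that receives it, so each add() needs a fresh iterator. The same holds for any java.util.Iterator.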

Saving and loading predictor models

Both the precomputed data and the trained predictor model can be worth saving. The precomputed data can be saved in order to train different predictors later, possibly using different scoring implementations or parameters. The trained predictor model can be used for later predictions, deployed as a microservice, etc. Serializing (saving) either data or predictor models is done with the ModelSerializer class, by calling one of the saveModel or saveDataset methods. Loading them back into memory is done with the same class, using one of the loadDataset or loadChemPredictor methods.

Image generation

To visualize the results of a prediction (i.e. the significant signature), please refer to the Image rendering page.