How to cite this service

Gütlein, Martin; Kramer, Stefan
Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability
Journal of Cheminformatics; 8:60; 2016; DOI: 10.1186/s13321-016-0173-z
Please cite this paper to support CoFFer.


The CoFFer web service predicts the activity of chemical compounds and provides information that helps to interpret the predictions. The available QSAR models are built with circular fingerprints that are mined on the respective training dataset. This service shows that filtered (instead of folded) fingerprint fragments can yield highly predictive models while still providing useful information for understanding model predictions. Please refer to our publication for details.

Filtering of Circular Fingerprint Fragments

We selected the CDK implementation of Extended-Connectivity Fingerprints as circular fingerprint fragments. Instead of reducing the large number of features by fingerprint folding, we applied a supervised filtering approach: the method removes redundant (non-closed) fragments as well as fragments that are uncorrelated with the target endpoint. Please refer to our publication for details.


The machine learning library WEKA was used to build three types of classifiers (Support Vector Machines, Random Forests, and naive Bayes). When making a prediction for a two-class problem (e.g., with class values 'active' and 'in-active'), the models provide a probability estimate that expresses the confidence of the classifier. A value close to 100% indicates that the classifier is very confident, whereas a value close to 50% means that the classifier is very unsure about the predicted compound activity.

Ranking of Fragments

Our service ranks fragments according to their importance for predicting the query compound. This is computed by swapping the feature value of the fragment and re-classifying the compound. Moreover, features are highlighted as "activating" or "de-activating":
* A feature is marked as "activating" if it is originally present and re-classification with the swapped feature value leads to a lower probability of being active. A feature is also marked as "activating" if it is originally absent from the query compound and re-classification with the swapped feature value leads to a higher probability of being active.
* Otherwise, we consider the feature to be "de-activating".
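
The labeling rule above can be illustrated with a minimal Python sketch. It is not the service's implementation: `toy_model` is a hypothetical stand-in for a trained classifier, the fragment names are invented, and using the absolute change in probability as the importance score is an assumption made for illustration. Only the fragment itself is swapped here; related fragments are ignored.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, bool]


def rate_fragment(predict_active: Callable[[Features], float],
                  features: Features,
                  fragment: str) -> Tuple[str, float]:
    """Swap one fragment's value, re-classify, and label the fragment."""
    original = predict_active(features)
    swapped = dict(features)
    swapped[fragment] = not features[fragment]
    changed = predict_active(swapped)

    if features[fragment]:
        # Present fragment: activating if removing it lowers P(active).
        activating = changed < original
    else:
        # Absent fragment: activating if adding it raises P(active).
        activating = changed > original

    label = "activating" if activating else "de-activating"
    # Assumption: importance is the absolute change in probability.
    return label, abs(original - changed)


def toy_model(features: Features) -> float:
    """Hypothetical stand-in for a trained classifier's P(active)."""
    return 0.5 + 0.3 * features["nitro"] - 0.2 * features["hydroxyl"]


query = {"nitro": True, "hydroxyl": False}
# Rank fragments by decreasing importance for this query compound.
ranking = sorted(query,
                 key=lambda f: rate_fragment(toy_model, query, f)[1],
                 reverse=True)
```

In this toy setting, the present fragment "nitro" is labeled activating and the absent fragment "hydroxyl" is labeled de-activating.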

When swapping feature values for a fragment, our method takes the compound structure into account:
* If the evaluated fragment is originally present in a compound, then super-fragments (that extend this fragment) will be switched off as well when evaluating the importance of the fragment. Additionally, sub-fragments that are included in this fragment and do not match the compound at a different location are disabled.
* Accordingly, if the evaluated fragment is originally absent in the compound and is switched on for evaluation, then all sub-fragments (that are contained within this fragment) are switched on simultaneously. Please refer to our publication for details.
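
The two structure-aware rules above might be sketched as follows. This is an illustrative sketch, not the published algorithm: the `sub_fragments` map (fragment to the fragments contained in it) and the `matches_elsewhere` set are hypothetical inputs that a substructure matcher would have to supply, and all fragments are assumed to appear as keys in `features`.

```python
from typing import Dict, FrozenSet, Set

Features = Dict[str, bool]


def swap_with_structure(features: Features,
                        fragment: str,
                        sub_fragments: Dict[str, Set[str]],
                        matches_elsewhere: FrozenSet[str] = frozenset()) -> Features:
    """Return a copy of `features` with `fragment` swapped structure-aware.

    sub_fragments[f] holds the fragments contained within f (hypothetical input).
    matches_elsewhere holds sub-fragments that also match the compound at a
    different location and therefore stay switched on.
    """
    result = dict(features)
    if features[fragment]:
        result[fragment] = False
        # Switch off super-fragments, i.e. fragments that extend this one.
        for other, subs in sub_fragments.items():
            if fragment in subs:
                result[other] = False
        # Switch off sub-fragments unless they match somewhere else as well.
        for sub in sub_fragments.get(fragment, set()):
            if sub not in matches_elsewhere:
                result[sub] = False
    else:
        result[fragment] = True
        # Switching on a fragment implies that all of its sub-fragments occur.
        for sub in sub_fragments.get(fragment, set()):
            result[sub] = True
    return result


features = {"C": True, "CC": True, "CCO": True, "N": False, "CN": False}
sub_fragments = {"CC": {"C"}, "CCO": {"CC", "C"}, "CN": {"C", "N"}}

off = swap_with_structure(features, "CC", sub_fragments,
                          matches_elsewhere=frozenset({"C"}))
on = swap_with_structure(features, "CN", sub_fragments)
```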

Coloring of Predicted Compounds

When predicting a query compound (with a single model), the service highlights activating and de-activating parts within the query compound. To this end, the weight of each present fragment is summed up for all atoms and bonds that match the fragment. Subsequently, the summed weights are used as input for a color gradient that ranges from blue (de-activating) to white (neutral) to red (activating). Please refer to our publication for details.
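
A rough sketch of this coloring step is given below. It handles atoms only (bonds would be treated the same way), and several details are assumptions for illustration: positive weights stand for activating fragments, weights are scaled by a caller-supplied maximum, and the gradient is a linear interpolation in RGB space.

```python
from typing import Iterable, List, Sequence, Tuple


def atom_weights(num_atoms: int,
                 fragment_matches: Iterable[Tuple[float, Sequence[int]]]) -> List[float]:
    """Sum the weight of every present fragment over the atoms it matches."""
    weights = [0.0] * num_atoms
    for weight, atoms in fragment_matches:
        for atom in atoms:
            weights[atom] += weight
    return weights


def weight_to_rgb(weight: float, max_abs: float) -> Tuple[int, int, int]:
    """Map a weight to blue (negative), white (zero), or red (positive)."""
    if max_abs == 0:
        return (255, 255, 255)
    t = max(-1.0, min(1.0, weight / max_abs))  # clamp to [-1, 1]
    fade = round(255 * (1 - abs(t)))
    if t >= 0:
        return (255, fade, fade)  # white to red
    return (fade, fade, 255)      # white to blue


# Two hypothetical fragments: (weight, indices of matched atoms).
matches = [(0.5, [0, 1]), (-0.2, [1, 2])]
weights = atom_weights(4, matches)
scale = max(abs(w) for w in weights)
colors = [weight_to_rgb(w, scale) for w in weights]
```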


The models have been validated with a 3-times repeated, nested 10-fold cross-validation. The inner level of cross-validation was used for model selection (to decide on the selected algorithm, parameters and number of features). The outer level of cross-validation was used to estimate the predictivity of the model. The published models are built on the entire dataset.
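
The structure of such a repeated, nested cross-validation can be sketched in plain Python as shown below. The sketch only generates index splits. How the folds were actually drawn (for example, whether they were stratified) is not stated here, so simple random shuffling is an assumption, as are the function names.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def k_folds(indices: Sequence[int], k: int, rng: random.Random) -> Iterator[Split]:
    """Yield (train, test) index lists for a shuffled k-fold split."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def nested_cv_splits(n: int, repeats: int = 3, k: int = 10,
                     seed: int = 0) -> Iterator[Tuple[int, List[int], List[int], List[Split]]]:
    """Yield (repeat, outer_train, outer_test, inner_splits)."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        for outer_train, outer_test in k_folds(range(n), k, rng):
            # Inner folds only see the outer training data (model selection).
            inner_splits = list(k_folds(outer_train, k, rng))
            # The outer test fold is reserved for estimating predictivity.
            yield repeat, outer_train, outer_test, inner_splits


splits = list(nested_cv_splits(100))
```

For a dataset of 100 compounds this yields 30 outer splits, each with 10 inner splits drawn from the outer training data.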

Applicability Domain

A QSAR model should only be applied to compounds that lie within its applicability domain (AD), i.e., to compounds that are similar to the structures within the training dataset.
Each model of this service includes a distance-based method to compute its AD. Query compounds are excluded from the AD if the distance to the training dataset compounds is too large.
The distance between a query compound and the training dataset is computed as the mean Tanimoto distance to its 3 nearest neighbors in the training dataset. The Tanimoto distance is calculated with the same structural features that are used by the respective QSAR model.
To compare the distance of the query compound with the distances within the training dataset, we compute the probability that the query distance is higher than the training dataset distances. To this end, a normal distribution is fitted to the training dataset distances. If the cumulative probability P(X ≤ x) > 0.99, the compound is outside the AD. If 0.95 < P ≤ 0.99, the compound is possibly outside the AD. If P ≤ 0.95, the compound is inside the AD.
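
The AD computation described above could look roughly like this in Python. It is a sketch under assumptions rather than the service's code: fingerprints are modeled as sets of feature names, the training distance of a compound is taken as its mean distance to the 3 nearest other training compounds, and the normal distribution is fitted with the sample mean and sample standard deviation.

```python
from statistics import NormalDist
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[str]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 minus the Tanimoto similarity of two feature sets."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def mean_knn_distance(query: Fingerprint, others: Sequence[Fingerprint],
                      k: int = 3) -> float:
    """Mean Tanimoto distance to the k nearest neighbors in `others`."""
    nearest = sorted(tanimoto_distance(query, o) for o in others)[:k]
    return sum(nearest) / len(nearest)


def training_distances(training: List[Fingerprint], k: int = 3) -> List[float]:
    """Assumption: each training compound is compared to all other ones."""
    return [mean_knn_distance(fp, training[:i] + training[i + 1:], k)
            for i, fp in enumerate(training)]


def ad_status(query_distance: float, train_distances: Sequence[float]) -> str:
    """Classify a query distance with the 0.95 and 0.99 thresholds."""
    # from_samples needs at least two values; cdf fails if all are equal.
    fitted = NormalDist.from_samples(train_distances)
    p = fitted.cdf(query_distance)  # P(X <= x)
    if p > 0.99:
        return "outside"
    if p > 0.95:
        return "possibly outside"
    return "inside"


# Toy example with invented feature names f1..f4.
training = [frozenset({"f1", "f2", "f3"}), frozenset({"f1", "f2"}),
            frozenset({"f1"}), frozenset({"f4"})]
query_distance = mean_knn_distance(frozenset({"f1", "f2", "f3"}), training)
```

With real data, `ad_status(mean_knn_distance(query, training), training_distances(training))` would combine the steps.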

Source Code

This open-source project is implemented in Java and mainly based on the two libraries CDK and WEKA.
Our source code is provided under the AGPL license on GitHub. The main libraries are:
* cdk-lib : Mining fingerprints and depicting matches
* cfp-miner : Building QSAR models with circular fingerprint features
* coffer : This web service.


This service and the source code are released under AGPL. It is provided in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.


The service can be accessed programmatically via REST.
The API definition of this service can be found here; it is compliant with the OpenTox API definition for model predictions.

REST Example

The example below shows how to use the OpenTox API to predict a compound. It uses the command line tool curl as REST client.

1. A list of available model URIs can be retrieved with the media type "text/uri-list"

REST call
curl -H "Accept:text/uri-list" http://coffer.informatik.uni-mainz.de

2. A GET request to a model URI returns basic model properties in JSON format

REST call
curl http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse
"dc.identifier" : http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse,
"ot.dependentVariables" : http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse/feature/measured,
"ot.predictedVariables" : http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse/feature/predicted http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse/feature/probability,
"rdf.type" : "ot.Model"

3. Compounds can be encoded as URI via "http://coffer.informatik.uni-mainz.de/compound/<url-encoded-smiles>"

REST call
curl -H "Accept:chemical/x-daylight-smiles" http://coffer.informatik.uni-mainz.de/compound/O%3DC%28NC3%3DCC2%3DC%28C%3DC3%29C1%3DCC%3DC%28NC%28C%29%3DO%29C%3DC1C2%29C
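
For clients that build the compound URI themselves, the percent-encoding can be reproduced with Python's standard library, as in the sketch below. The helper name `compound_uri` is hypothetical; the base URL and SMILES string are taken from the example above.

```python
from urllib.parse import quote

BASE_URL = "http://coffer.informatik.uni-mainz.de"


def compound_uri(smiles: str) -> str:
    """Build a compound URI by percent-encoding the SMILES string."""
    # safe="" also encodes "/" which may occur in SMILES strings.
    return f"{BASE_URL}/compound/{quote(smiles, safe='')}"


uri = compound_uri("O=C(NC3=CC2=C(C=C3)C1=CC=C(NC(C)=O)C=C1C2)C")
```

The resulting `uri` matches the URI used in the curl call above.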

4. A prediction via POST call to a model URI returns a redirect to the prediction result

REST call
curl -v -X POST -d "compound_uri=http://coffer.informatik.uni-mainz.de/compound/O%3DC%28NC3%3DCC2%3DC%28C%3DC3%29C1%3DCC%3DC%28NC%28C%29%3DO%29C%3DC1C2%29C" http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse
< Location: http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse/prediction/58bd5898ef2bbd4dfce18de0ecf62c07
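
The same POST request might be prepared in Python as sketched below. The helper `prediction_request` is hypothetical and only builds the request object without sending it. The body is passed through unchanged to mirror what `curl -d` sends; whether the server expects additional encoding is not stated here. Passing the request to `urllib.request.urlopen` would send it, and redirects are typically followed automatically.

```python
from urllib.request import Request


def prediction_request(model_uri: str, compound_uri: str) -> Request:
    """Prepare (but do not send) the POST request for a prediction."""
    # Same raw form body as in the curl example: compound_uri=<uri>
    body = ("compound_uri=" + compound_uri).encode("ascii")
    return Request(model_uri, data=body, method="POST")


request = prediction_request(
    "http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse",
    "http://coffer.informatik.uni-mainz.de/compound/"
    "O%3DC%28NC3%3DCC2%3DC%28C%3DC3%29C1%3DCC%3DC%28NC%28C%29%3DO%29C%3DC1C2%29C",
)
```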

5. Accessing the prediction returns predicted class and probability

REST call
curl http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse/prediction/58bd5898ef2bbd4dfce18de0ecf62c07
"dc.identifier" : "http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse/prediction/58bd5898ef2bbd4dfce18de0ecf62c07",
"rdf.type" : "ot.Dataset",
"ot.dataEntry" : {
"ot.compound" : "http://coffer.informatik.uni-mainz.de/compound/O%3DC%28NC3%3DCC2%3DC%28C%3DC3%29C1%3DCC%3DC%28NC%28C%29%3DO%29C%3DC1C2%29C",
"ot.values" : [ {
"ot.feature" : "http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse/feature/predicted",
"ot.value" : "active"
}, {
"ot.feature" : "http://coffer.informatik.uni-mainz.de/CPDBAS_Mouse/feature/probability",
"ot.value" : 0.9985994877678762
} ]