Abstract

Predicting phage-bacteria interactions has become a pressing concern in the effort to overcome bacterial resistance, but the highly imbalanced datasets found in public genome databases have hindered this task. In this paper we investigate, implement and evaluate One-Class Learning algorithms in order to predict phage-bacteria interactions using only positive samples. We use Python, together with Scikit-Learn, TensorFlow and Keras, to develop the machine learning models and test them on real phage-bacteria interaction datasets. We trained the models with cross-validation, running a grid search over all the datasets to explore the available combinations of hyperparameters. We then optimized those hyperparameters using Pareto fronts based on seven different performance metrics, improving the efficiency of each algorithm for a given dataset. To improve on the performance of each algorithm taken separately, we applied ensemble learning, combining an odd number of algorithms by simple voting. Finally, we achieved an overall performance of 80% in predicting phage-bacteria interactions with models trained only on the positive class. In practice, this percentage means that when a patient has an infection resistant to antibiotics, we have an 80% chance of finding the correct phage for the pathogenic host, and thus of saving the patient's life, rather than perhaps 0%.
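The simple-voting ensemble mentioned above can be illustrated with a minimal sketch. It assumes three Scikit-Learn one-class estimators (OneClassSVM, IsolationForest and EllipticEnvelope) and synthetic feature vectors. The abstract does not name the algorithms, hyperparameters or features actually used, so these choices, and the helper `majority_vote`, are hypothetical and serve only to show the voting step.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM


def majority_vote(models, X):
    """Return +1 (predicted interaction) or -1 per row of X by simple majority."""
    if len(models) % 2 == 0:
        raise ValueError("use an odd number of models so the vote cannot tie")
    # Each one-class estimator predicts +1 (inlier) or -1 (outlier).
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return np.where(votes.sum(axis=0) > 0, 1, -1)


rng = np.random.default_rng(0)
# Assumption: synthetic stand-in for features of known interacting pairs.
X_pos = rng.normal(loc=0.0, scale=1.0, size=(200, 5))

# Assumption: illustrative estimators and settings, not the paper's choices.
models = [
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.1),
    IsolationForest(n_estimators=100, random_state=0),
    EllipticEnvelope(contamination=0.1, random_state=0),
]
for model in models:
    model.fit(X_pos)  # training sees positive samples only

# One point at the centre of the training data, one far away from it.
X_new = np.vstack([np.zeros((1, 5)), np.full((1, 5), 8.0)])
pred = majority_vote(models, X_new)
print(pred)
```

With an odd number of voters, the sum of the +1/-1 votes is never zero, which is why no tie-breaking rule is needed.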
