ML-India interview series - Prof. Minta Thomas, Assistant Professor and Head of Analytics, Aegis School of Business, Data Science and Telecommunication

Minta is an Assistant Professor and Head of Analytics at Aegis School of Business, Data Science and Telecommunication. She completed her PhD in Engineering (Informatics) at the University of Leuven – KU Leuven, Belgium, and her M.Phil. in Bioinformatics at Kerala University. Her areas of expertise are Perl, BioPerl, MATLAB, R, machine learning, cancer informatics, data mining, bioinformatics and computational biology. She is an expert in developing algorithms for processing large-scale data sets such as genomics, microarray and mass spectrometry imaging data. She is also proficient in signal processing and dimensionality reduction methods, and works in the information retrieval and pattern recognition domains.

Minta on...

Applications of her work
Importance of prior knowledge for new students

ML India: We’d like to start off with a brief on your background and how you got into machine learning.

Minta: I did my graduation in computer applications from Marian College, Kuttikanam in 2003. During the final year of my undergraduate studies, my professor, Dr. Gladston Raj, motivated me to pursue advanced studies in computer science and mathematics. After I completed my master’s degree in computer science from Mahatma Gandhi University in 2005, I was selected for an M.Phil. in Bioinformatics at the University of Kerala, Trivandrum. I completed it under the guidance of Dr. Achuthsankar S Nair. At that time, bioinformatics was as popular as data science is today. A lot of people were interested in it, but not everyone knew exactly what was happening in that area. When I started, I didn’t have much idea about research in this domain either, but my M.Phil. coursework helped me develop a deep understanding of it. My project mentor, Prof. Ramakrishnan Ramaswamy from the School of Physical Sciences, JNU, encouraged me to work on ‘Wavelet Analysis’ in computational biology for miscRNA structure prediction. This is how I got introduced to machine learning. I learned MATLAB, apart from other programming languages and tools in bioinformatics, and I believe it is one of the better tools that people are using in machine learning. After I completed my M.Phil., I got an offer for a Ph.D. from KU Leuven, Belgium, and I continued to extend my work in the field of computational biology and bioinformatics. I worked in the ESAT-STADIUS lab (Stadius Centre for Dynamical Systems, Signal Processing and Data Analytics). My professor, Dr. Bart De Moor, who is from an electrical engineering background, suggested that I work in the area of linear algebra. It was quite new territory for me because I was from a computer science background, and it took me some time to get used to the new language. I gradually started working on problems related to dimensionality reduction and data integration.
He guided me on areas like singular value decomposition, generalized singular value decomposition, and their applications in bioinformatics. All the problems that I was working on were based on linearly separable data. But since most practical problems involve real-world data that is not linearly separable, I started working on non-linear data integration and classification problems, which invariably involved machine learning. I took some courses in statistics, ML and probability, and started working on the theoretical aspects of research. Prof. Johan Suykens, an expert in machine learning, also guided me on how I should proceed in this area. We proposed different algorithms that improved upon the existing algorithms for dimensionality reduction, data integration and classification, and published our work in several international journals. Most of the work that I did up to this point was theoretical and related only to proposing algorithms. After working for nearly 3 years on the theoretical aspects, I switched to the applications side of ML and started working on its bioinformatics and chemoinformatics applications. After completing my PhD, I returned to India and was looking for an academic job in Mumbai. Aegis had started a data science school at that point and was offering really exciting courses, so I joined Aegis as an Assistant Professor and Head of Analytics.

ML India: Could you elaborate on some areas in bioinformatics where you have applied your algorithms?

Minta: In one of the applications, we proposed a data integration strategy called the weighted LS-SVM classifier. This algorithm offers a single mathematical framework for data integration and classification problems, and so provides solutions for many real bioinformatics applications. Compared with existing approaches, ours is a simple mathematical framework for kernel-based data integration. We applied this algorithm to breast cancer prediction, using clinical and microarray data from breast cancer patients in model development.
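To make the idea of kernel-based data integration concrete, here is a minimal sketch of a standard LS-SVM classifier trained on a weighted sum of two kernels, one per data source. It is not the published weighted LS-SVM algorithm: the combination rule (`weight * K_clinical + (1 - weight) * K_expression`), the function names, the parameter values and the four toy samples are all assumptions made for illustration.

```python
import math


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def combined_kernel(p, q, weight):
    """Weighted sum of one kernel per data source (assumed integration rule).

    Each sample is a pair: (clinical_features, expression_features).
    """
    return (weight * rbf_kernel(p[0], q[0])
            + (1.0 - weight) * rbf_kernel(p[1], q[1]))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def train_ls_svm(samples, labels, weight=0.5, gamma=10.0):
    """Solve the LS-SVM linear system [[0, y^T], [y, Omega + I/gamma]]."""
    n = len(samples)
    system = [[0.0] + [float(y) for y in labels]]
    for i in range(n):
        row = [float(labels[i])]
        for j in range(n):
            k = combined_kernel(samples[i], samples[j], weight)
            ridge = 1.0 / gamma if i == j else 0.0
            row.append(labels[i] * labels[j] * k + ridge)
        system.append(row)
    solution = solve_linear_system(system, [0.0] + [1.0] * n)
    return solution[0], solution[1:]  # bias, alphas


def predict(sample, samples, labels, bias, alphas, weight=0.5):
    """Sign of the kernel expansion over the training samples."""
    score = bias + sum(
        a * y * combined_kernel(sample, s, weight)
        for a, y, s in zip(alphas, labels, samples)
    )
    return 1 if score >= 0 else -1


# Hypothetical toy data: (clinical_features, expression_features) per patient.
samples = [
    ([0.0, 0.1], [0.2, 0.0]),
    ([0.2, 0.0], [0.0, 0.1]),
    ([3.0, 3.1], [2.9, 3.0]),
    ([3.1, 2.9], [3.0, 3.2]),
]
labels = [1, 1, -1, -1]

bias, alphas = train_ls_svm(samples, labels, weight=0.6, gamma=10.0)
print([predict(s, samples, labels, bias, alphas, weight=0.6) for s in samples])
```

Unlike a standard SVM, which needs a quadratic program, LS-SVM training reduces to one linear system, so both data sources enter the model only through the combined kernel matrix. In practice `weight`, `gamma` and the kernel width would be tuned, for example by cross-validation.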

Another application that we worked on is in the field of chemoinformatics. Machine learning techniques have been widely used in drug discovery and development. In chemoinformatics, machine learning has been used in QSAR (quantitative structure-activity relationship) studies. Viewed as a general machine learning problem, modern QSAR is characterized by the use of chemical descriptors based on the structure of chemical compounds. To build the models, we first converted the structural description of the compounds into a numerical representation. We then proposed a new chemical descriptor derived from the connection table of compounds in terms of two vectors: one corresponding to the atoms and the other to the bonds of each compound. This gave us a new machine learning approach for identifying biologically active compounds, the weighted chemical descriptors of molecular structure, which identified inhibitors of Salmonella and Pseudomonas biofilm formation more accurately than other existing chemical descriptors.
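As a rough illustration of turning a connection table into two vectors, the sketch below counts atoms by element and bonds by bond order, then concatenates the two vectors with a weight. The interview does not say how the actual descriptor is computed, so the counting scheme, the element and bond vocabularies, the `atom_weight` parameter and the example molecules (heavy atoms only) are all assumptions.

```python
from collections import Counter

# Assumed vocabularies; a real descriptor would cover more element and bond types.
ATOM_TYPES = ["C", "N", "O", "S"]
BOND_ORDERS = [1, 2, 3]


def atom_vector(atoms):
    """Count how many atoms of each element appear in the connection table."""
    counts = Counter(atoms)
    return [counts[element] for element in ATOM_TYPES]


def bond_vector(bonds):
    """Count bonds by order; each bond is (atom_index_a, atom_index_b, order)."""
    counts = Counter(order for _, _, order in bonds)
    return [counts[order] for order in BOND_ORDERS]


def weighted_descriptor(atoms, bonds, atom_weight=0.5):
    """Concatenate the atom and bond vectors, scaled by complementary weights."""
    scaled_atoms = [atom_weight * v for v in atom_vector(atoms)]
    scaled_bonds = [(1.0 - atom_weight) * v for v in bond_vector(bonds)]
    return scaled_atoms + scaled_bonds


# Hypothetical connection tables (hydrogens omitted).
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1, 1), (1, 2, 1)]

acetic_acid_atoms = ["C", "C", "O", "O"]
acetic_acid_bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

print(atom_vector(ethanol_atoms), bond_vector(ethanol_bonds))
print(weighted_descriptor(acetic_acid_atoms, acetic_acid_bonds))
```

The resulting fixed-length vectors can be fed to any classifier. Keeping atoms and bonds in separate vectors lets the relative weight of the two be adjusted independently.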

ML India: What differences did you notice between the lab environments in India and abroad?

Minta: In India, I feel there aren’t a lot of options to collaborate with people in different research areas. While I was in Belgium, I noticed that these collaborations came really easily. Collaborations were happening even between students of different departments. For instance, I was in the engineering department and I collaborated with people from the computer science, chemistry and bioinformatics departments. Lots of weekly and monthly meetings were announced along with their topics of discussion, and we had the option to attend them and discuss various interdisciplinary ideas. The second advantage, especially in my case since I was working in bioinformatics, was that I got a chance to interact with doctors and get my hands on actual data, which proved really helpful in my research and in developing algorithms. We also had lots of industry collaborations, and students got a chance to work in the real-world environment of large research projects run by big companies. I don’t feel we have that many similar opportunities in India. Of course, a lot of startups are coming to the market, and they are collaborating with institutes, but we still need more such collaborations.

Also, I feel that many students in India are still following the traditional educational system, where they study so that they can earn better. This is why many people switch to data science from their own field of study, since ‘data science’ has become the most popular field to work in. People need to realize that they need a strong foundation in mathematics, statistics and programming to pursue data science successfully, and that there is no end to the learning in this field. They will need to go deep into the concepts to come up with innovative work. Therefore, I encourage people who are actually interested in data science to pick it as a career. There is a lack of awareness about data science among people in India. They have an interest in data science but they don’t know if it will be a suitable career choice for them. I think meetups, like the ones that ML-India holds, will definitely help increase awareness and bring clarity of thought to such people.

ML India: How does prior knowledge play a role when students actually take data science courses?

Minta: I think previous knowledge really helps students pick up data science concepts quickly. For example, a student from a statistics background would have a strong foundation in the theoretical aspects, but would need to spend more time understanding how to use that at the applications level in data science. On the other hand, a person with a strong programming background would need to spend more time understanding statistics and mathematics. In either case, their previous knowledge of a subject would play an important role in helping them get a grasp of new concepts. I believe that data science is an interdisciplinary area where we really need expertise in different domains to produce good results, and all of the different skills have their own role. For instance, a data scientist focusing on the machine learning aspect in particular should have the desire to play with data using a number of different techniques and languages, together with the analytical skills to question and solve problems iteratively. Therefore, for people coming into data science from different backgrounds, the only thing that matters is how quickly they pick up concepts and relate them to their applications with data.

“Anyone who wants to pursue ML should love data” -Minta

ML India: If undergraduates want to get into machine learning, what basic steps should they follow to enter the field successfully? What can they do in terms of internships, courses, competitions, etc.?

Minta: Any student who wants to pursue data science should have a strong background in mathematics, statistics and probability. Before becoming a data scientist, one needs to be a good data analyst. One should love the data, be able to understand the important patterns, and define the characteristics of the data. At the undergraduate level, students should first build this foundation through traditional coursework, and then they could take up advanced courses in this field to learn how to apply these basic concepts and build useful applications. For example, regression analysis is a common technique that we use in statistics, which is also an important part of the machine learning process. The concept is the same, but there is a difference between the statistical point of view and the machine learning point of view. So if one has a strong grasp of basic statistical concepts along with a good programming background, he/she can easily apply these techniques to solve an ML problem effectively. Also, various ML competitions would help students develop a practical outlook and expose them to diverse insights on solving the same problem.
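One way to read the regression remark is sketched below: the same least-squares fit can be examined for its estimated coefficients (a statistical emphasis) or judged by its prediction error on data it has not seen (a machine learning emphasis). This is an interpretation rather than something spelled out in the interview, and the function names and numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared gap between predictions and observed values."""
    errors = [(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Hypothetical data: fit on the training points, evaluate on held-out points.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]
test_x = [5.0, 6.0]
test_y = [11.0, 13.1]

# Statistical emphasis: what are the estimated coefficients?
intercept, slope = fit_line(train_x, train_y)
print("intercept:", intercept, "slope:", slope)

# Machine learning emphasis: how well does the fit predict unseen data?
test_mse = mean_squared_error(test_x, test_y, intercept, slope)
print("held-out MSE:", test_mse)
```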

ML India: Great! Thanks a lot, Minta, for taking time out to talk to ML-India. We wish you all the best in your research.

Machine Learning India
