Minta is an Assistant Professor and Head of Analytics at the Aegis School of Business, Data Science and Telecommunication. She completed her PhD in Engineering (Informatics) at the University of Leuven (KU Leuven), Belgium, and her M.Phil. in Bioinformatics at Kerala University. Her areas of expertise are Perl, BioPerl, MATLAB, R, machine learning, cancer informatics, data mining, bioinformatics and computational biology. She is an expert in developing algorithms for processing large-scale data sets such as genomics, microarray and mass spectrometry images. She is also proficient in signal processing and dimensionality reduction methods, and works in the information retrieval and pattern recognition domains.


**ML India:** We’d like to start off with a brief on your background and how you got into machine learning.

**Minta:** I did my graduation in computer applications from Marian College, Kuttikanam in 2003. During the
final year of my undergraduate studies, my professor, Dr. Gladston
Raj, motivated me to pursue advanced studies in computer science and mathematics. After I completed my master’s degree in computer science
from Mahatma Gandhi University in
2005, I got selected for an M. Phil in Bioinformatics at the University of Kerala, Trivandrum. I completed it under the guidance of Dr. Achuthsankar S Nair. At that time,
bioinformatics was as popular as data science is today. A lot of
people were interested in it, but not everyone knew exactly what was happening in that area. When I started, I didn’t have much idea about
research in this domain either, but my coursework in M. Phil helped me
develop a deep understanding of it. My project mentor,
Prof. Ramakrishnan Ramaswamy
from the School of Physical Sciences, JNU, encouraged me to
work on ‘Wavelet Analysis’
in Computational Biology for miscRNA
structure prediction. This is how I got introduced to machine
learning. I learned MATLAB, apart from other programming
languages and tools in Bioinformatics, and I believe it is one of the
better tools that people are using in machine learning. After I
completed my M.Phil, I got an offer for a Ph.D. from KU Leuven, Belgium, and I continued to extend my
work in the field of computational biology and bioinformatics. I worked in the ESAT-STADIUS lab (STADIUS Centre for Dynamical Systems, Signal Processing and Data Analytics). My
professor, Dr. Bart De Moor, who is
from an electrical engineering background, suggested that I should work
in the area of linear algebra. It was quite a new territory for me
because I was from a computer science background and it took me some
time to get used to the new language. I gradually started working on
problems related to dimensionality reduction and
data integration. He
guided me on areas like singular value decomposition, generalized
singular value decomposition, and their applications in bioinformatics.
All these problems that I was working on were based on linearly
separable data. But since most practical problems involve real-world
data that is not linearly separable, I started working on non-linear
data integration and classification problems, which invariably involved
machine learning. I took some courses in statistics, ML and
probability, and started working on the theoretical aspects of research.
Prof. Johan Suykens, an
expert in machine learning, also guided me on how I should proceed in
this area. We proposed different algorithms that improved upon the existing
algorithms for dimensionality reduction, data integration and
classification, and published our work in several international
journals. Most of the work that I did up till this point was theoretical
and related only to proposing algorithms. After working for nearly three years on the
theoretical aspects, I switched to the applications side of ML and started
working on its bioinformatics and chemoinformatics applications. After completing
my PhD, I returned to India and was looking for an academic job in
Mumbai. Aegis had started a data science school at that
point in time and was offering really exciting courses, so I joined
Aegis as an Assistant Professor and Head of Analytics.

**ML India:** Could you elaborate on some areas in bioinformatics
where you have applied your algorithms?

**Minta:** In one of the applications, we proposed a data integration strategy called the weighted
LS-SVM
classifier. This algorithm offered a single mathematical framework for
data integration and classification problems, hence providing solutions
for many real bioinformatics applications. Compared with existing
approaches, ours was a simple mathematical framework for kernel-based data integration. We applied this algorithm to breast cancer
prediction, using clinical and microarray data from breast cancer
patients to develop the models.
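To make the idea of kernel-based data integration concrete, here is a minimal sketch. It is not the published weighted LS-SVM algorithm: it assumes that the two data sources are merged through a convex combination of two Gaussian kernels (the weight `mu`, the kernel widths, the regularization constant `gamma`, and the toy data are all hypothetical), and then trains a standard LS-SVM classifier on the combined kernel by solving its linear system.

```python
import math

def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def combined_kernel(c1, m1, c2, m2, mu, sigma_c=1.0, sigma_m=1.0):
    """Assumed integration rule: mu * clinical kernel + (1 - mu) * microarray kernel."""
    return mu * rbf_kernel(c1, c2, sigma_c) + (1.0 - mu) * rbf_kernel(m1, m2, sigma_m)

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def train_lssvm(clinical, microarray, labels, mu=0.5, gamma=10.0):
    """Solve the LS-SVM system [[0, y^T], [y, Omega + I/gamma]] [b, alpha] = [0, 1]."""
    n = len(labels)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = labels[i]
        A[i + 1][0] = labels[i]
        for j in range(n):
            k = combined_kernel(clinical[i], microarray[i],
                                clinical[j], microarray[j], mu)
            A[i + 1][j + 1] = labels[i] * labels[j] * k + (1.0 / gamma if i == j else 0.0)
    solution = solve(A, [0.0] + [1.0] * n)
    return solution[0], solution[1:]  # bias b, dual weights alpha

def predict(c, m, clinical, microarray, labels, b, alpha, mu=0.5):
    """Sign of the LS-SVM decision function for one new sample."""
    score = b + sum(a * y * combined_kernel(c, m, ci, mi, mu)
                    for a, y, ci, mi in zip(alpha, labels, clinical, microarray))
    return 1 if score >= 0 else -1

# Made-up toy data: one clinical feature and two expression features per patient.
clinical = [[0.0], [0.2], [1.0], [1.2]]
microarray = [[0.1, 0.0], [0.0, 0.2], [1.0, 0.9], [0.9, 1.1]]
labels = [-1, -1, 1, 1]

b, alpha = train_lssvm(clinical, microarray, labels)
train_predictions = [predict(c, m, clinical, microarray, labels, b, alpha)
                     for c, m in zip(clinical, microarray)]
new_prediction = predict([0.1], [0.05, 0.1], clinical, microarray, labels, b, alpha)
```

In this sketch, `mu` controls how much each data source contributes: `mu = 1` uses only the clinical kernel and `mu = 0` only the microarray kernel. In practice such a weight would be tuned, for example by cross-validation.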

Another application that we worked on is in the field of chemoinformatics. Machine learning techniques are widely used in drug discovery and development, and in chemoinformatics they are used in quantitative structure-activity relationship (QSAR) studies. Modern QSAR, viewed as a general machine learning problem, is characterized by the use of chemical descriptors based on the structure of chemical compounds. To build the models, we first converted the structural description of the compounds into a numerical representation. We then proposed a new chemical descriptor, derived from the connection table of each compound, in terms of two vectors: one corresponding to its atoms and the other to its bonds. This led to a new machine learning approach for identifying biologically active compounds, the weighted chemical descriptors of molecular structure, which identified inhibitors of Salmonella and Pseudomonas biofilm formation more accurately than existing chemical descriptors.
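As a rough illustration of turning a connection table into two numeric vectors, the sketch below counts atoms by element and bonds by bond order. This is only an assumed, simplified stand-in: the weighting scheme of the actual descriptor is not described here, and the element alphabet, bond orders, and example molecule encoding are hypothetical choices.

```python
from collections import Counter

# Hypothetical fixed alphabets that define the positions in each vector.
ELEMENTS = ("C", "N", "O", "S")
BOND_ORDERS = (1, 2, 3)

def descriptor(atoms, bonds):
    """Return (atom_vector, bond_vector) for one compound.

    atoms: list of element symbols, one per heavy atom.
    bonds: list of (atom_index_a, atom_index_b, bond_order) tuples.
    """
    atom_counts = Counter(atoms)
    bond_counts = Counter(order for _, _, order in bonds)
    atom_vector = [atom_counts[element] for element in ELEMENTS]
    bond_vector = [bond_counts[order] for order in BOND_ORDERS]
    return atom_vector, bond_vector

# Acetic acid (heavy atoms only): CH3-C(=O)-OH
acetic_acid_atoms = ["C", "C", "O", "O"]
acetic_acid_bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

atom_vector, bond_vector = descriptor(acetic_acid_atoms, acetic_acid_bonds)
```

The two vectors can then be concatenated, or fed to separate kernels, as input to a classifier that separates active from inactive compounds.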

**ML India:** What differences did you notice between the lab environments
in India and abroad?

**Minta:** In India, I feel there aren’t a lot of options to collaborate
with people in different research areas. While I was in Belgium, I noticed that these
collaborations came really easily. Collaborations were happening even
between students of different departments. For instance, I was in the
engineering department and I collaborated with people from the computer
science, chemistry and bioinformatics departments. There were lots of
weekly and monthly meetings announced along with the topics of
discussion, and we had the option to go and attend these meetings and
discuss various interdisciplinary ideas. The second advantage, in my
case especially since I was working in bioinformatics, was that I got a
chance to interact with doctors and get my hands on actual data, which
proved to be really helpful in my research and development of
algorithms. We also had lots of industry collaborations, and students
got a chance to work in the real-world environment of large research projects
run by big companies. I don’t feel we have that many
similar opportunities in India. Of course, a lot of startups are entering the market and collaborating with institutes, but we still need more such collaborations.

Also, I feel that many students in India still follow the traditional educational mindset, where they study mainly so that they can earn better. This is one reason many people switch to data science from their own field of study, since ‘data science’ has become the most popular field to work in. People need to realize that they need a strong foundation in mathematics, statistics and programming to pursue data science successfully, and that there is no end to the learning in this field. They will need to go deep into the concepts to come up with innovative work. Therefore, I encourage people who are genuinely interested in data science to pick it as a career. There is also a lack of awareness about data science among people in India: they have an interest in it, but they don’t know whether it will be a suitable career choice for them. I guess meetups, like the ones that ML-India holds, will definitely help increase awareness and bring clarity of thought among such people.

**ML India:** How does
students’ previous knowledge play a role when they actually study data science
courses?

**Minta:** I think previous knowledge really helps students pick up data science
concepts quickly. For example, a student from a statistics
background would have a strong foundation in the theoretical aspects, but
would need to spend more time understanding how to use that at the
applications level in data science. On the other hand, a person with
a strong programming background would need to spend more time
understanding statistics and mathematics. In either case, their previous knowledge of a subject would play an important role in helping them get a grasp of new concepts. I believe that data science is an interdisciplinary area where we
really need expertise in different domains to churn out efficient
results and all of the different skills have their own role. For
instance, a data scientist focusing on the machine learning aspect,
particularly, should have the desire to play with data using a number of
different techniques and languages, together with the analytical skills
to question and solve problems iteratively. Therefore, for people coming
from different backgrounds into data science, the only thing that
matters is how quickly they pick up concepts and relate them to
their applications with data.

“Anyone who wants to pursue ML should love data”-Minta

**ML India:** If undergraduates want to get into machine learning, what are
the basic steps they should follow to successfully get into this field?
What can they do in terms of internships, courses, competitions,
etc.?

**Minta:** Any student who wants to pursue data science should have a strong
background in mathematics, statistics and probability. Before becoming a
data scientist, one needs to be a good data analyst. One should love the
data, be able to understand the important patterns, and define the
characteristics of the data. At the undergraduate level, students should
first build this foundation through traditional coursework, and then
they could take up advanced courses in this field to gain more knowledge
on applying these basic concepts and building useful applications. For
example, regression analysis is a common technique that we use in
statistics, which is also an important part of the machine learning
process. The concept is the same but there is a difference between the
statistical point of view and the machine learning point of view. So if
one has a strong grasp of basic statistical concepts along with a good
programming background, he/she can easily apply these techniques to
solve an ML problem effectively. Also, various ML competitions would
help the students develop a practical outlook and expose them to diverse
insights on solving the same problem.
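The two points of view on regression can be sketched with the same least-squares line. The statistical view below fits all the data and asks how well the line explains it (R²); the machine learning view holds some data out and asks how well the fit predicts unseen points (test error). This is one possible reading of the distinction, and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared prediction error on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data: y = 2x + 1 plus small fixed perturbations.
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1, -0.1, 0.0]
xs = list(range(10))
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

# Statistical view: fit on all observations and assess the explanatory fit.
slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)

# Machine learning view: fit on a training split, evaluate on held-out points.
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]
ml_slope, ml_intercept = fit_line(train_x, train_y)
test_mse = mean_squared_error(test_x, test_y, ml_slope, ml_intercept)
```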

**ML India:** Great! Thanks a lot, Minta, for taking the time to talk to ML-India. We wish you all the best in your research.