Machine Learning India

Fostering data science and machine learning in India

Dr. Vasudeva Varma is a professor and the Dean (Research & Development) at IIIT Hyderabad, India. He obtained his Ph.D from the Department of Computer and Information Sciences, University of Hyderabad in 1996.

His research interests are in the broad areas of Information Retrieval, Extraction and Access, and more specifically in social media analysis, cross-language information access, summarization and semantic search. He also works in the areas of Cloud Computing and Reuse in software engineering.



ML India: We’d like to start off by understanding your background, how you got into the machine learning space and your association with IIIT-H?

Vasudeva: I began my work in the area of Natural Language Processing at IIT Kanpur in 1991, when I was working with Prof. Rajeev Sangal (then the HoD of the CS department at IIT-K and now the Director of IIT-BHU). I developed a computational model for understanding metaphors, which later became my PhD thesis. In 1993, Prof. Sangal and other members of the NLP group moved to the University of Hyderabad for a joint initiative between IIT-K and the University of Hyderabad. I completed my PhD at the University of Hyderabad and then moved to the US, where I worked in the technology vertical of a large financial firm in New York. Later, I moved to Silicon Valley and worked for a start-up for 5 years. Life changed totally, with 16-18 hour work days, but the excitement never went away, because we knew our work would impact millions of people. But at heart I was always an academic and wanted to come back to a university setting and work in India. The opportunity came when my colleague and I decided to start a development centre in Hyderabad and Delhi. I also started teaching a course at IIIT-H as guest faculty (Prof. Sangal was the director of IIIT-H at that time), which I found to be a really satisfying transformation. In 2001, we folded our start-up and I decided to join IIIT-H full time.

ML India: What is the kind of work you were doing in the start-up in the valley?

Vasudeva: I was working at InfoDream, where my work focused on information extraction in the HR space: extracting information from unstructured or semi-structured CVs, performance and appraisal forms, emails etc., and creating a structured database on which queries could be performed. This was closely related to my PhD work. The work there actually motivated me to take up such problems in an academic setting and work on them in a more involved way. NLP was necessary to execute these kinds of tasks but was not sufficient. That’s when I started researching ontologies and other solutions, came across data analytics and started experimenting with statistical methods like categorization, processing etc. Later, I co-founded a start-up focused on similar lines of semantic search where, apart from text, we could perform searches on other rich media like pictures, videos and speech.

ML India: Once you’d taken up the faculty position at IIIT-H, how did you start pursuing your research in ML?

Vasudeva: My work here was a natural extension of the experience I had in the valley. Working in start-ups is something of a roller coaster ride, since you have to meet deadlines, satisfy investors and what not. But I realized that this actually shaped all of my career choices later on. While I was a guest lecturer at IIIT-H, I saw the need to focus on deeper aspects of learning. When I joined full-time, I started a Search and Information Extraction lab where many difficult problems were tackled with strong academic rigor. This motivated me to take up these large, relevant and important problems in an academic setting in a more involved and immersive way.

ML India: Now that you have the advantage of having seen specific areas of ML, like unstructured data analysis and data science in general, maturing over the last 15 years, what are your general thoughts on the quality of ML research happening in India?

Vasudeva: It’s like two sides of a coin. On one side, there is a huge jump in the quality of research happening in academia. There are lots of motivated students in top institutions driven towards research. Companies like Facebook, Google and Aspiring Minds are doing very exciting and impactful work in the industry. On the other hand, there are some companies which are abusing ML’s popularity without completely understanding what it means. But I think that’s okay; it’s good that everyone is talking about it.

ML India: Could you talk about the inception and evolution of SETU Software Systems and what are the various problem statements being tackled under it?

Vasudeva: SETU (now Veooz) started in a very interesting manner. My first PhD student, Dr. Prasad Pingali, was working on the Indian language retrieval problem. We really wanted to execute this on a large scale, across documents over the internet and across multiple languages. Around 2003-2004, a period when platforms like Hadoop and Nutch were not very popular, we developed a crawler that crawled all Indian language content. We were essentially crawling all possible Indian language content across the world, be it in Indian scripts like Devanagari or written in Romanised script. There are lots of variations in the way that Indian languages exist and are used, and solving these problems was very hard. We used some of the popular techniques that processed speech into text to solve these problems and published our work in conferences like WWW. When we were demonstrating this particular technology in our annual R&D showcase, Rediff’s CTO caught sight of it and said that this was going to be big, since it was giving beautiful results when tested on random queries. He immediately offered the licence and resources for its development, and we happened to start SETU at that point in time. From there on, we quickly went on to create commercialized enterprise solutions for all Indian language content and also for most of the world’s language content that shares similar characteristics with Indian languages. We have built search engines for roughly 200 languages completely based on statistical methods and machine learning.

ML India: Any other problems you’ve been working on apart from those as part of SETU which have a similar flavour of IR, ML or NLP?

Vasudeva: I’ve been working on the multi-document, query-dependent summarization problem. NIST had floated a challenge on multi-query summarization. Many institutions that specialized in summarization, including Stanford, Columbia, MSR, IBM Research etc., participated in this competition regularly. We participated for the first time in 2005 and ranked 4th, 6th and 8th in various categories, when we were expecting to be at the bottom of the table. For a year we focused on improving our work, and in 2006 we ranked first in all the categories, beating all the big players. The flavour of the competition changes every year. For example, in a subsequent year, progressive summarization was the agenda, where a new set of documents was related to an older set. Thus, the summary had to be written on the assumption that the reader had already been through the previous set.

There was a continuation of this competition called TAC, where there were various KBP (Knowledge Base Population) tasks. The idea was that, given billions of documents and a Wikipedia-like page, you have to read the page and create an info box for that page automatically. You need to understand the named entities in the page and link these entities to the documents in which they occurred. We beat some top competitors here too. These are the problems being focused on in our lab.

ML India: Could you talk about the genesis of IIIT-H Foundation and how it grew into what we see today? And also about one of its initiatives, Aavishkar, which incubates ML and core technology based start-ups?

Vasudeva: When IIIT-H started in 1998, one of its guiding visions was that it should function like a Stanford to Silicon Valley or an MIT to the Boston area. Essentially, it was supposed to work very closely with the entrepreneurial and industrial ecosystem.

Initially, we started IIIT-H Foundation as an incubator alone. We started the Centre for Innovation and Entrepreneurship (CIE) in 2008 and SETU was the first company to be incubated there. IIIT-H has been very strong in the research environment. We nurtured many companies: one on speech, one on computer vision and some on text analysis. After 2-3 years we had around 15 companies, and we made a major call to provide this environment to people who really wanted to set up their own companies. That is where we really grew, and by 2014 we had around 72 companies, which made IIIT-H Foundation the largest academic incubator in the country. It worked really well and created a dynamic and vibrant environment on the campus. It was at this time that the government started T-Hub, where IIIT-H, ISB and NALSAR came together to start an incubator; many of our companies were transferred to T-Hub.

We wanted to provide everything that is required to improve a start-up’s chances of success. This is where we launched Aavishkar. It’s an accelerator program which helps deep technology companies coming from anywhere with no restrictions in terms of the background of the founders, or geography of origin. Deep technology companies are the ones to which we can add value through the research happening in IIIT-H.

IIIT-H Foundation’s goal is to provide a technology marketplace, so that the research developed in IIIT-H is put into practice in a manner that can be easily consumed by various companies.

ML India: Could you talk about any other short term or long term visions for IIIT-H Foundation?

Vasudeva: As soon as you create a marketplace for technology, you also need to give the advantage to the companies that are licensing this technology. They should be able to stand easily on the shoulders of the work done by the researchers and be able to quickly add value to it, which is what we focus on. Also, not many ecosystem players are aware of the impact of deep technology companies. So we need investors who can think in this direction and larger partner companies that could consume the products from the deep technology companies. This is another value that we want to add. So, an investor ecosystem, partner ecosystem and consumer ecosystem need to be developed, which is a challenge. We are working with players in the industry and in the investing market to promote and nurture these companies.

ML India: How can one apply to these incubation programs?

Vasudeva: We believe we are a very agile organisation, and most things require only an email to be sent to the addresses mentioned on our website. Applicants can also look at our research partnerships. If these sources are unresponsive, I’d share my own email for relevant queries.

ML India: Any other specific projects that you probably have in mind in this domain which you would want to talk about?

Vasudeva: IIIT-H is very strong in AI related stuff like multi-agent systems, robotics, NLP, computer vision and cognitive sciences. One project deals with automatically generating commentary text from videos. This program is supposed to improve itself over time so that it can analyse changes in pattern. For instance, it will acknowledge a change in the technique and style of a shot played by a particular cricketer. Multi-agent systems is another area, in which every individual is trying to gather information and coordinate decisions on the tasks that they are doing.

In fact, a recent development has been TCS setting up the Kohli Center on Intelligent Systems (KCIS) for us, which is expected to exponentially increase the activities in further areas of AI.

We have really influential people on the advisory committee and they’re dreaming really big things. Through our research we want to be known as one of the top 25 institutions in the world making impact in AI and CS. This is our mission for 2025. We hope these kinds of activities will help us reach that goal.

ML India: Do you think there are any low hanging interventions that could be used to improve the ecosystem of ML in India?

Vasudeva: I was fortunate to be one of the founding members of iKDD, the India Chapter of KDD, which runs the Conference on Data Sciences (CoDS), India’s conference in machine learning, which is trying to push data science and instil the culture of ML research among the Indian audience. There are various communities which keep in sync and help each other. For example, NLP has ICON and the database community has COMAD.

However, the set of people active in pushing forward these initiatives is very small. Given the size of our country and the need to push this idea forward, we need increased participation in such initiatives. But compared to the scenario 5 years back, we have made good progress and I believe this will grow in the times to come.

ML India: Any concluding comments?

Vasudeva: We’re in the middle of very exciting times. Data as a problem is definitely increasing, and everyone is talking about it. I think we should be targeting the smart data within the big data, which can be more meaningful and efficient in solving essential problems. A great environment for data science exists nonetheless, and students today have ample resources and platforms to pursue and demonstrate their work. They should leverage such opportunities and create the future, rather than following the future.

ML India: It was great talking to you, Vasudeva. Thank you for your time and words. All the very best to you from ML-India!