Machine Learning India

Fostering data science and machine learning in India

India lags: Indian academia produces fewer machine learning papers than a single university in China, and all of India's new companies together produce fewer than a single company in the USA.

The power of machine learning

Machine learning is the science of learning to do tasks by observing examples. It is transforming the world by enabling machines to do all sorts of ‘intelligent’ tasks, such as understanding images and human speech, and predicting preferences, diseases and much else. With tremendous amounts of data, interconnectedness, sophisticated algorithms and huge processing power in small devices, machines now do things which were beyond their reach until recently. On the other hand, machines are still unable to do many tasks which humans do effortlessly, say understanding a story. This constitutes the next big challenge for machines or, well, for the humans that build these machines! There is a renewed, although periodic, fear that machines will put people out of jobs, leaving only the very highly skilled in the job market [1]. For the first time, some experts in AI/ML whisper fears of machines challenging human civilization itself [2]!

Whatever the future may hold, machine learning has created efficiencies in the marketplace, leading to new, hugely successful business models. Recommender systems [3] are just one powerful example; others include tagging images and videos [4] and education technology such as automatic grading and feedback [5] on tests. There has been a sudden spurt of better-performing machine learning techniques, notably deep learning. Businesses rely more and more on data and on the power of predictive models built from it. All successful digital product and services companies have a machine learning or data science team and are a regular presence at machine learning conferences; this includes the surprise visit of Mr. Zuckerberg to NIPS a couple of years back. Together with this, new machine learning techniques coupled with advances in the neurosciences are helping us better understand the probable basis and mechanism of intelligence. These techniques have made a substantial business impact, and there is hope today of understanding intelligence better.

Machine learning, a wholesome opportunity for India: it satisfies six important criteria

In some ways, it has never been so exciting! Where should India be as machines become more intelligent? It is simple: it should be making the most of the opportunity. We need to participate in and contribute to high-quality research and innovation, and also convert new results into effective business models. The opportunity is global. The location of a digital business doesn’t constrain its market [6]: a company in a Bangalore or a Gurgaon could serve the whole world. At the same time, ML can help solve a number of local problems. For instance, it can help map and monitor the poverty landscape of a country cheaply [7]. Machine learning is not just a scientific or academic pursuit; the economy and society can get great returns from research and innovation in ML.

India needs to work with the world as an equal partner to lead the next generation of disruptive innovation in machine learning. The good news is that it doesn’t require expensive infrastructure: a person with just a laptop and, say, MATLAB can write 1000 (or fewer) lines of code which can revolutionize how things are done [8]. The fundamental skills required are those in math and coding, in which Indians have an enviable track record. These need to be combined with tons of creativity, passion and a systems approach to things.

To sum up, machine learning is one of the few fields which fulfill the following six criteria:

  1. It has the potential to lead several disruptive innovations in the coming decade;
  2. It has a huge impact on businesses and the economy;
  3. It can help provide the necessary elements to solve local problems, including those of poverty and social development;
  4. It has the potential to lead to new scientific results not known to humanity before;
  5. It does not require expensive infrastructure and has a moderate to low gestation period;
  6. The primary skills it needs are already in the DNA and culture of India and, may I say, have been for the last 2000 years.

What is more, it is cool, young and fashionable!

We need to grab the ML opportunity! But where are we today?

Before I delve into it objectively, let me mull over a rather strange historic connection India has had with machine learning. The way of science in India was inductive and empirical: say, calculating which stars and planets will show up when, or how to find the roots of certain equations. For many of these, we did not have a proof or any formalism around them [9]. In contrast, the West brought in formal, knowledge-based methods and theories to explain, prove and generalize the empirical [10]. These were much more effective and reduced interest in the magical inductive results. Today, we find that for many more complex problems, empirical methods do much better than formal ones. That is the lesson from the last couple of decades of AI. We have come full circle.

Figure 1: Universities by number of publications. Please refer to ml-india.org/insights for more details

Let us start by looking at academic research. We do find pockets of excellence. There is some critical mass of machine learning researchers at IISc. They can collaborate to tackle hard and interesting problems and publish world-class results. Also, in some of the IITs, such as IIT Delhi, we find some people in machine learning, including those who have returned with a foreign education. These are complemented by the research labs of mostly foreign companies, such as Xerox Research, IBM Research, TCS Innovation Labs and Microsoft Research. Some ecosystem-building initiatives have already started: IKDD CODS [11], a conference in data science with world-class standards, started in India in 2013.

We objectively looked at the total number of papers in the top 13 machine learning conferences over the last 15 years (see figure 1). All put together, India produces fewer papers than a single world-class university in China, Tsinghua University. We produce one-third as many as Carnegie Mellon University in the USA. India's total is 745 research papers, China's is 3956 and the USA's is 19,000+. We rank 15th by the number of documents we produce, with Singapore, Spain, Israel and Canada all ahead of us (see figure 2). We have a lot of catching up to do, and a disruptive, mission-based approach can probably make this happen.

Figure 2: Country by number of publications. Please refer to ml-india.org/insights for more details
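
The gaps between the country totals quoted above can be checked with simple arithmetic. Here is a minimal sketch in Python that uses only the three totals given in the text. The USA figure is stated only as 19,000+, so 19,000 is used as a lower bound (an assumption), and the names `PAPERS` and `ratio_to_india` are purely illustrative.

```python
# Paper totals for the top 13 ML conferences over 15 years, as quoted in the text.
# The USA total is given only as "19,000+", so 19_000 is a lower bound (assumption).
PAPERS = {"India": 745, "China": 3956, "USA": 19_000}


def ratio_to_india(country: str) -> float:
    """Return how many papers `country` produced for every Indian paper."""
    return PAPERS[country] / PAPERS["India"]


if __name__ == "__main__":
    for country in ("China", "USA"):
        print(f"{country}: {ratio_to_india(country):.1f}x India's output")
```

On these figures, China published roughly 5 times as many papers as India, and the USA at least 25 times as many.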

Let us look at industrial research, where the picture is even starker. China produces 10 times as many papers as India, Singapore twice as many and the US 50 times as many. New companies [12] are the engine of growth and have the potential to create disruption and maximum business value. In India, we could find only 12 papers from 5 new companies in the last 15 years! A couple of them are from my group at Aspiring Minds: our work on programming assessment and spoken English assessment at KDD. Other notable examples are Strand Life Sciences, with some high-quality work in bioinformatics, Infibeam in e-commerce and S&I Engineering, which is pushing the frontiers in computer vision. Both Strand and S&I are companies that came out of IISc and are great examples to emulate. Compare this to the USA or China: LinkedIn has 17, Facebook 15, Baidu 22, Alibaba 4, Tencent 3 and Renren 2, followed by a long tail of companies. No wonder that, of the top 10 in MIT Technology Review’s 50 smartest companies, 7 are from the US, 3 from China and none from India! One may note that much of the open source software and libraries for storing, handling, searching and analyzing big data have come from new companies and not universities. Most recently, Google made its machine learning library public. Recommender systems were also advanced in industry [13].

Startups need to invest in research, specifically in the area of machine learning. They need to follow the examples of (and even outperform) Facebook, Baidu, Tencent and Aspiring Minds. These companies have separate data science research groups. This will give them a sustainable business edge and let them be disruptive and globally competitive. Otherwise, we will be swept aside, once again [14], by innovations happening in other parts of the world. The speed of innovation and disruption is much faster now, and we need to keep running just to keep pace!

So what do we need to do? Probably a lot…

To make a small beginning, my colleagues and I at Aspiring Minds have put together ‘ML India’, a place for the machine learning community in India to interact, share, develop ideas, continuously assess itself and achieve new heights. No surprises, we take a data visualization [15] approach to it! If you do machine learning in India, do let us know and we will highlight you on ml-india.org. If you have something interesting, share it on the ml-india@googlegroups.com mailing list. This is just a beginning: ml-india.org will do a lot more for the community, through both online and offline media. Keep watching!

Wait for another post on what we can do to super-accelerate our machine learning efforts. Till then, check out how, for the first time in the world, we are teaching supervised learning to 12-to-15-year-old kids in India [16]!

- Varun Aggarwal with inputs from Shashank Srikant and Harsh Nisar

To participate in a discussion on this, view the article on LinkedIn


[1] The Second Machine Age, by Erik Brynjolfsson and Andrew McAfee

[2] Read Gates/Hawking

[3] https://en.wikipedia.org/wiki/Recommender_system

[4] http://www.programmableweb.com/news/rapid-rise-deep-learning-computer-vision-technology/analysis/2015/06/19

[5] http://research.aspiringminds.com/publications/

[6] https://www.linkedin.com/pulse/winner-takes-all-many-winners-varun-aggarwal

[7] http://krvarshney.github.io/pubs/VarshneyCANSXS_big2015.pdf

[8] Well, it could require computer clusters and farms. However, a ton of innovation doesn’t require them. Also, such resources are easy to ‘summon’ these days, say using Amazon servers.

[9] Strangely we did have the best of formal methods in metaphysics, but not as much in physical sciences!

[10] I am reminded of Minsky – Occam’s razor is a bad idea to understand intelligence – unfortunately intelligence isn’t simple to understand.

[11] http://ikdd.acm.org

[12] Loosely defined here as companies that started within the last 15 years.

[13] ideeinc.com, imagga.com, wit.ai, alchemyapi.com, diffbot.com

[14] Remember the industrial revolution?

[15] http://ml-india.org/insights

[16] http://www.datasciencekids.org