Can India provide an army of data scientists to the world, like the IT engineers that it provided in the last decade?
In the last two decades, we have seen a huge impact of information technology (IT) on businesses. Today, almost all business transactions and processes, internal and external, happen on a computer or mobile phone over a network, primarily the Internet. The penetration of smartphones has extended this automation to the last mile, i.e. to consumers. Large network-based information systems facilitate business processes such as sales orders, financial transactions, human resource management and customer service management. The Aadhaar project is a splendid example of IT reaching the last mile.
The IT revolution has made businesses much more efficient by allowing transactions to be fast, error-free and trackable. It has not necessarily made them ‘intelligent’ but it has paved the way for mining intelligence by creating a wealth of digitized data. For instance, product manufacturers know what product is sold on a given day, at what time and at which outlet. A lot of transactions happen naturally on the web, through e-commerce.
In the coming decade, i.e. the decade of data science, we will analyse these data on a large scale to identify trends, find anomalies and, most importantly, predict future trends. But let us pause here to understand what data science is. Suppose you have transcripts of the pitches made by a company's salespeople, including the ones that have led to a sale. You wish to predict which sales pitches are good and what makes them good. Traditionally, one would probably use a sales coach to understand this.
However, the data science way is remarkably different. It takes unstructured data, i.e. the transcripts, and derives features such as the length of the call; counts of courteous expressions like “may I” and “please”; and counts of words about product value, such as “automatic” and “fast”. Using these features and statistics, an algorithm builds a model to predict the success of a pitch. For instance, the algorithm may discover that pitches with a good number of positive words, a good number of product value words and a moderate call length lead to success. A totally new sales pitch can then be checked against such conditions to predict whether it is good or not.
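To make the idea concrete, here is a minimal Python sketch of the two steps described above: turning a transcript into a few counts, and letting a simple algorithm learn from past pitches. Everything in it is an assumption made for illustration. The word lists contain only the examples mentioned above, the four toy transcripts are invented, word count stands in for call length, and a nearest-centroid rule stands in for whatever algorithm a real project would choose.

```python
import re

# Hypothetical word lists, limited to the examples mentioned in the text.
COURTEOUS = ("may i", "please")
PRODUCT_VALUE = ("automatic", "fast")

# Invented toy data: (transcript, did the pitch lead to a sale?).
TOY_PITCHES = [
    ("May I show you our fast, automatic backup? Please try it today.", True),
    ("Please take a look, it is fast and fully automatic.", True),
    ("Buy this now.", False),
    ("You need this product, it is good, buy it.", False),
]


def extract_features(transcript):
    """Turn a raw transcript into [length, courteous count, value-word count]."""
    text = transcript.lower()
    words = re.findall(r"[a-z']+", text)
    return [
        len(words),  # word count, used here as a stand-in for call length
        sum(text.count(phrase) for phrase in COURTEOUS),  # crude substring count
        sum(words.count(word) for word in PRODUCT_VALUE),
    ]


def centroid(vectors):
    """Average each feature across a list of feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train(labelled_transcripts):
    """Build the 'model': one average feature vector per outcome."""
    won = [extract_features(t) for t, sold in labelled_transcripts if sold]
    lost = [extract_features(t) for t, sold in labelled_transcripts if not sold]
    return {"won": centroid(won), "lost": centroid(lost)}


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(model, transcript):
    """Return True if the pitch looks more like past successes than failures."""
    features = extract_features(transcript)
    return (squared_distance(features, model["won"])
            < squared_distance(features, model["lost"]))


if __name__ == "__main__":
    model = train(TOY_PITCHES)
    new_pitch = "May I suggest our automatic plan? It is fast."
    print(extract_features(new_pitch), predict(model, new_pitch))
```

With only four transcripts this proves nothing about real sales calls. In practice the features would need scaling, and the model would be trained and validated on thousands of labelled pitches.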
How is this revolutionary? We can find out whether a new salesperson is ready to be put on the job or needs further training, based on the quality of their pitch. This model can be linked to the sales process, where it provides personalized feedback for improvement to each salesperson immediately after a sales call. Eventually, the algorithm will replace the salesperson!
Better algorithms, more processing power and the availability of large data sets have enabled highly accurate models. However, the process of data science itself is hardly automated. A lot of work is required to convert unstructured data into useful models and to integrate these models into information systems. The person responsible for all of this is the formidable data scientist.
Can India provide an army of data scientists to the world, like the IT engineers that it provided in the last decade? Can this become the engine of our economic growth in the coming decade? It is not only a huge opportunity, but a challenge, too.
First, we need trained manpower. Data scientists need to be adept at programming, databases and basic statistics. Today, we have hardly any people who understand both computer science and statistics. Our undergraduate education needs to take up data science courses in a big way. A year ago, we took the bold step of introducing data science for classes V to VIII. These kids successfully built their own friends’ predictor (www.datasciencekids.org). India should aspire to be a leader in providing data science education early on.
Second, we need to be among the leaders in research and innovation in machine learning. Though India became an IT services powerhouse without any great technology innovation, this will not hold for data science services, for innovation and disruption have gathered rapid pace in recent years.
We need to use the latest technology in our services and also be the innovators. Data science service offerings will be not only people-driven but also product-driven. Unfortunately, there is a huge gap here. Our analysis (ml-india.org) showed that Tsinghua University in China alone produces more machine learning papers in top conferences than all universities in India put together. Similarly, companies in the US have taken a lead in machine learning innovation.
Lastly and most importantly, we need entrepreneurs who can go across the globe and solicit data science business for their companies in India—the new Murthys and Premjis. Let us get ourselves ready to become the world’s data science services powerhouse.
Varun Aggarwal is co-founder and chief technology officer of Aspiring Minds, a company that uses data science to enable meritocracy in the labour market.