
Machine Learning India

Fostering data science and machine learning in India


Are you a student, professor, CEO or Maschinenmensch? Subscribe to ml-india's Google group to join the discussion and receive updates, news and resources about India's ML ecosystem.

Data Sets


We want data sets from India to be readily available to practitioners across the world for research and development. We host some data sets below. Want us to host your open data set? Write to us!

Name Link/Email Institution/Organisation Description
Multiword Bengali Expression IIT Kharagpur
Soil and Water Assessment Tool Soil and Water Assessment Tool
Data Meet Data Meet
Collaborative Research in Computational Neuroscience CRCNS Multiple datasets
Reserve Bank Of India Govt. Of India This section provides data on various aspects of the Indian economy, banking and finance. While the current data (defined as data for the past one year) is available at the links provided below, researchers may also access data series available in the Database on Indian Economy link available on this page.
Central Board of Excise and Customs Govt. Of India
iAWE Nipun Batra (IIIT-D) Our dataset characterises the unique aspects of energy, water and network in India. It was published as a part of our BuildSys 2013 paper.
National Portal Of India Govt. Of India The objective behind the Portal is to provide a single window access to the information and services being provided by the Indian Government for citizens and other stakeholders.
Bangalore Open Data It is a repository of data, code and related artifacts that I have collected in my personal capacity (my personal introductory blog post). It is targeted at data enthusiasts, data scientists, researchers and developers who are interested in public data related to Bangalore.
DataMeet Repository Multiple datasets
Indian Geo-platform (ISRO) Govt. Of India
India Trading Economics Data
Census Of India Govt. Of India The Indian Census is the largest single source of a variety of statistical information on different characteristics of the people of India. To scholars and researchers in demography, economics, anthropology, sociology, statistics and many other disciplines, the Indian Census has been a fascinating source of data.
Survey of India Govt. Of India
National Data Bank Govt. Of India The National Data Bank of Socio-Religious categories is developed with a view to providing users access, from a single window, to all data pertaining to various aspects of the socio-economic life of the population falling in different social/religious categories.
IIT Delhi Iris Database IIT Delhi The IIT Delhi Iris Database mainly consists of the iris images collected from the students and staff at IIT Delhi, New Delhi, India.
Extreme Classification Repository Microsoft Research The objective in extreme multi-label learning is to learn a classifier that can automatically tag a datapoint with the most relevant subset of labels from an extremely large label set. This page provides benchmark datasets and code that can be used for evaluating the performance of extreme multi-label algorithms.
IIIT 5K-word dataset IIITH Query words like billboards, signboards, house numbers, house name plates and movie posters were used to collect images. The dataset contains 5000 cropped word images from scene text and born-digital images.
Online Handwriting Recognition IIITH We have prepared a dataset containing 1,560 characters and 400 words with the intention of providing a common benchmark for air handwriting character recognition and allied research.
Classification of Boundaries of an RGBD Image IIITH We use both image and depth cues to infer the labels of edge pixels. We start with a set of edge pixels obtained from an edge detection algorithm and the goal is to assign one of the four labels to each of these edge pixels. Each edge pixel is uniquely mapped to one of the contour segments. Contour segments are sets of linked edge pixels.
Sports-10K and TV Series-1M Video Datasets IIITH We introduce two large video datasets, namely Sports-10K and TV Series-1M, to demonstrate scene text retrieval in the context of video sequences. The first is from sports video clips containing many advertisement signboards, and the second is a collection of TV series frames containing more than 1 million frames.
India Statistical Data Quandl Multiple datasets
India Statistical Data Knoema Multiple datasets
AMEO 2015 Aspiring Minds It is a unique dataset which contains engineering graduates’ employment outcomes (salaries, job titles and job locations) along with standardized assessment scores in three fundamental areas - cognitive skills, technical skills and personality.
Data Science For Kids Aspiring Minds A dataset containing kids' ratings of random face cards on a scale of 1-5 according to their inclination to befriend the person on the card. These cards had distinguishing feature sets like old names & new names, gender and hobby type.
Aadhaar data catalog Govt. Of India The Aadhaar data catalog is a place to view numerous datasets generated in the UIDAI ecosystem. It can help you build your own research and applications on data collected at the national level. Datasets are available as CSV files.
Open Government Data (OGD) Platform India Govt. Of India It is a platform for supporting the Open Data initiative of the Government of India. The portal is intended to be used by Government of India Ministries/Departments and their organizations to publish datasets, documents, services, tools and applications collected by them for public use.
Code Data Set + Programming Features API Mail to: Aspiring Minds We have a data set of more than 100,000 programs in C, C++ and Java. We also have data sets of human-graded programs in C and Java for various problems. In our KDD 2014 paper, we describe a new grammar to extract meaningful features from programs that are highly predictive of the algorithm used to solve the problem.