
Machine Learning India

Fostering data science and machine learning in India


Are you a student, professor, CEO or Maschinenmensch? Subscribe to ml-india's Google group to join the discussion and receive updates, news and resources about India's ML ecosystem.

Data Sets

Introduction

We want data sets from India to be readily available to practitioners across the world for research and development. We host some data sets below. Want us to host your open data set? Write to us!


| Name | Link/Email | Institution/Organisation | Description |
|---|---|---|---|
| Multiword Bengali Expression | http://cse.iitkgp.ac.in/~tanmoyc/Tagged_MWE | IIT Kharagpur | |
| Soil and Water Assessment Tool | http://swat.tamu.edu/software/links/india-dataset/ | Soil and Water Assessment Tool | |
| Data Meet | nisha@datameet.org | Data Meet | |
| Collaborative Research in Computational Neuroscience | https://crcns.org/data-sets | CRCNS | Multiple datasets |
| Reserve Bank Of India | https://www.rbi.org.in/Scripts/Statistics.aspx | Govt. Of India | This section provides data on various aspects of the Indian economy, banking and finance. While the current data, defined as data for the past one year, is available at the links provided below, researchers may also access data series available in the Database on Indian Economy link available on this page. |
| Central Board of Excise and Customs | https://www.icegate.gov.in/DailyList/DL | Govt. Of India | |
| iAWE | http://iawe.github.io/ | Nipun Batra (IIIT-D) | Our dataset characterises the unique aspects of energy, water and network in India. It was published as a part of our BuildSys 2013 paper. |
| National Portal Of India | https://india.gov.in/ | Govt. Of India | The objective behind the Portal is to provide single-window access to the information and services being provided by the Indian Government for citizens and other stakeholders. |
| Bangalore Open Data | http://openbangalore.org/ | | It is a repository of data, code and related artifacts that I have collected in my personal capacity (My personal introductory blog post). It is targeted at data enthusiasts, data scientists, researchers and developers who are interested in public data related to Bangalore. |
| DataMeet Repository | https://github.com/datameet | | Multiple datasets |
| Indian Geo-platform (ISRO) | http://bhuvan.nrsc.gov.in/data/download/index.php | Govt. Of India | |
| India Trading Economics Data | http://www.tradingeconomics.com/india/indicators | | |
| Census Of India | http://www.censusindia.gov.in/2011census/population_enumeration.html | Govt. Of India | The Indian Census is the largest single source of a variety of statistical information on different characteristics of the people of India. To scholars and researchers in demography, economics, anthropology, sociology, statistics and many other disciplines, the Indian Census has been a fascinating source of data. |
| Survey of India | http://www.surveyofindia.gov.in/pages/show/86-mapsdata | Govt. Of India | |
| National Data Bank | http://mospi.gov.in/national_data_bank/index.htm | Govt. Of India | The National Data Bank of Socio-Religious categories is developed with a view to providing users with access to all data pertaining to various aspects of the socio-economic life of the population falling in different social/religious categories, from a single window. |
| IIT Delhi Iris Database | http://www4.comp.polyu.edu.hk/~csajaykr/IITD/Database_Iris.htm | IIT Delhi | The IIT Delhi Iris Database mainly consists of the iris images collected from the students and staff at IIT Delhi, New Delhi, India. |
| Extreme Classification Repository | http://research.microsoft.com/en-us/um/people/manik/downloads/XC/XMLRepository.html | Microsoft Research | The objective in extreme multi-label learning is to learn a classifier that can automatically tag a datapoint with the most relevant subset of labels from an extremely large label set. This page provides benchmark datasets and code that can be used for evaluating the performance of extreme multi-label algorithms. |
| IIIT 5K-word dataset | http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html | IIITH | Query words like billboards, signboards, house numbers, house name plates and movie posters were used to collect images. The dataset contains 5000 cropped word images from scene text and born-digital images. |
| Online Handwriting Recognition | https://cvit.iiit.ac.in/research/projects/cvit-projects/online-handwriting-recognition-using-depth-sensors | IIITH | We have prepared a dataset containing 1,560 characters and 400 words with the intention of providing a common benchmark for air handwriting character recognition and allied research. |
| Classification of Boundaries of an RGBD Image | https://cvit.iiit.ac.in/research/projects/cvit-projects/semantic-classification-of-boundaries-of-an-rgbd-image | IIITH | We use both image and depth cues to infer the labels of edge pixels. We start with a set of edge pixels obtained from an edge detection algorithm and the goal is to assign one of the four labels to each of these edge pixels. Each edge pixel is uniquely mapped to one of the contour segments. Contour segments are sets of linked edge pixels. |
| Sports-10K and TV Series-1M Video Datasets | https://cvit.iiit.ac.in/research/projects/cvit-projects/sports-10k-and-tv-series-1m-video-datasets | IIITH | We introduce two large video datasets, namely Sports-10K and TV series-1M, to demonstrate scene text retrieval in the context of video sequences. The first is from sports video clips containing many advertisement signboards, and the second is a collection of TV series frames containing more than 1 million frames. |
| India Statistical Data | https://www.quandl.com/collections/india | Quandl | Multiple datasets |
| India Statistical Data | https://knoema.com/atlas/India | Knoema | Multiple datasets |
| AMEO 2015 | http://research.aspiringminds.com/resources/ | Aspiring Minds | It is a unique dataset which contains engineering graduates’ employment outcomes (salaries, job titles and job locations) along with standardized assessment scores in three fundamental areas: cognitive skills, technical skills and personality. |
| Data Science For Kids | https://drive.google.com/folderview?id=0B5e-wnFrLgTEUm9jaDc2ODV5Z3M&usp=sharing | Aspiring Minds | A dataset containing kids' ratings of random face cards on a scale of 1-5 according to their inclination to befriend the person on the card. These cards had distinguishing feature sets like old names & new names, gender and hobby type. |
| Aadhaar data catalog | https://data.uidai.gov.in/uiddatacatalog/dataCatalogHome.do | Govt. Of India | The Aadhaar data catalog is a place to view numerous datasets generated in the UIDAI ecosystem. It can support your own research and applications built on data collected at the national level. Datasets are available as CSV. |
| Open Government Data (OGD) Platform India | https://data.gov.in/ | Govt. Of India | It is a platform supporting the Open Data initiative of the Government of India. The portal is intended to be used by Government of India Ministries/Departments and their organizations to publish datasets, documents, services, tools and applications collected by them for public use. |
| Code Data Set + Programming Features API | Mail to: research@aspiringminds.com | Aspiring Minds | We have a data set of more than 100,000 programs in C, C++ and Java. We also have data sets of human-graded programs in C and Java for various problems. In our KDD 2014 paper, we describe a new grammar to extract meaningful features from programs, which are highly predictive of the algorithm used to solve the problem. |