Data Sets
Introduction
We want data sets from India to be readily available to practitioners across the world for research and development purposes. We host some data sets below. Want us to host your open data set? Fill in the following form!
Name | Link/Email | Institution/Organisation | Description |
---|---|---|---|
Multiword Bengali Expression | http://cse.iitkgp.ac.in/~tanmoyc/Tagged_MWE | IIT Kharagpur | |
Soil and Water Assessment Tool | http://swat.tamu.edu/software/links/india-dataset/ | Soil and Water Assessment Tool | |
Data Meet | nisha@datameet.org | Data Meet | |
Collaborative Research in Computational Neuroscience | https://crcns.org/data-sets | CRCNS | Multiple datasets |
Reserve Bank Of India | https://www.rbi.org.in/Scripts/Statistics.aspx | Govt. Of India | This section provides data on various aspects of the Indian economy, banking and finance. Current data (defined as data for the past year) is available at the links on that page; researchers may also access data series through the Database on Indian Economy link on the same page. |
Central Board of Excise and Customs | https://www.icegate.gov.in/DailyList/DL | Govt. Of India | |
iAWE | http://iawe.github.io/ | Nipun Batra (IIIT-D) | Our dataset characterises the unique aspects of energy, water and network in India. It was published as part of our BuildSys 2013 paper. |
National Portal Of India | https://india.gov.in/ | Govt. Of India | The objective behind the Portal is to provide a single window access to the information and services being provided by the Indian Government for citizens and other stakeholders. |
Bangalore Open Data | http://openbangalore.org/ | | It is a repository of data, code and related artifacts that I have collected in my personal capacity. It is targeted at data enthusiasts, data scientists, researchers and developers who are interested in public data related to Bangalore. |
DataMeet Repository | https://github.com/datameet | | Multiple datasets |
Indian Geo-platform (ISRO) | http://bhuvan.nrsc.gov.in/data/download/index.php | Govt. Of India | |
India Trading Economics Data | http://www.tradingeconomics.com/india/indicators | | |
Census Of India | http://www.censusindia.gov.in/2011census/population_enumeration.html | Govt. Of India | The Indian Census is the largest single source of a variety of statistical information on different characteristics of the people of India. To scholars and researchers in demography, economics, anthropology, sociology, statistics and many other disciplines, the Indian Census has been a fascinating source of data. |
Survey of India | http://www.surveyofindia.gov.in/pages/show/86-mapsdata | Govt. Of India | |
National Data Bank | http://mospi.gov.in/national_data_bank/index.htm | Govt. Of India | The National Data Bank of Socio-Religious categories was developed to give users single-window access to all data pertaining to various aspects of the socio-economic life of the population falling in different social/religious categories. |
IIT Delhi Iris Database | http://www4.comp.polyu.edu.hk/~csajaykr/IITD/Database_Iris.htm | IIT Delhi | The IIT Delhi Iris Database mainly consists of the iris images collected from the students and staff at IIT Delhi, New Delhi, India. |
Extreme Classification Repository | http://research.microsoft.com/en-us/um/people/manik/downloads/XC/XMLRepository.html | Microsoft Research | The objective in extreme multi-label learning is to learn a classifier that can automatically tag a datapoint with the most relevant subset of labels from an extremely large label set. This page provides benchmark datasets and code that can be used for evaluating the performance of extreme multi-label algorithms. |
IIIT 5K-word dataset | http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html | IIITH | Query words like billboards, signboard, house numbers, house name plates, movie posters were used to collect images. The dataset contains 5000 cropped word images from Scene Texts and born-digital images. |
Online Handwriting Recognition | https://cvit.iiit.ac.in/research/projects/cvit-projects/online-handwriting-recognition-using-depth-sensors | IIITH | We have prepared a dataset containing 1,560 characters and 400 words with the intention of providing a common benchmark for air handwriting character recognition and allied research. |
Classification of Boundaries of an RGBD Image | https://cvit.iiit.ac.in/research/projects/cvit-projects/semantic-classification-of-boundaries-of-an-rgbd-image | IIITH | We use both image and depth cues to infer the labels of edge pixels. We start with a set of edge pixels obtained from an edge detection algorithm and the goal is to assign one of the four labels to each of these edge pixels. Each edge pixel is uniquely mapped to one of the contour segments. Contour segments are sets of linked edge pixels. |
Sports-10K and TV Series-1M Video Datasets | https://cvit.iiit.ac.in/research/projects/cvit-projects/sports-10k-and-tv-series-1m-video-datasets | IIITH | We introduce two large video datasets, Sports-10K and TV Series-1M, to demonstrate scene text retrieval in the context of video sequences. The first is drawn from sports video clips containing many advertisement signboards, and the second is a collection of more than 1 million TV series frames. |
India Statistical Data | https://www.quandl.com/collections/india | Quandl | Multiple datasets |
India Statistical Data | https://knoema.com/atlas/India | Knoema | Multiple datasets |
AMEO 2015 | http://research.aspiringminds.com/resources/ | Aspiring Minds | It is a unique dataset which contains engineering graduates’ employment outcomes (salaries, job titles and job locations) along with standardized assessment scores in three fundamental areas - cognitive skills, technical skills and personality. |
Data Science For Kids | https://drive.google.com/folderview?id=0B5e-wnFrLgTEUm9jaDc2ODV5Z3M&usp=sharing | Aspiring Minds | A dataset containing kids' rating of random face cards on a scale of 1-5 according to their inclination to befriend the person on the card. These cards had distinguishing feature sets like old names & new names, gender and hobby type. |
Aadhaar data catalog | https://data.uidai.gov.in/uiddatacatalog/dataCatalogHome.do | Govt. Of India | The Aadhaar data catalog is a place to view numerous datasets generated in the UIDAI ecosystem. It can help you build your own research and applications on data collected at the national level. Datasets are available in CSV format. |
Open Government Data (OGD) Platform India | https://data.gov.in/ | Govt. Of India | It is a platform supporting the Open Data initiative of the Government of India. The portal is intended to be used by Government of India Ministries/Departments and their organizations to publish datasets, documents, services, tools and applications collected by them for public use. |
Code Data Set + Programming Features API | research@aspiringminds.com | Aspiring Minds | We have a data set of more than 100,000 programs in C, C++ and Java. We also have data sets of human-graded programs in C and Java for various problems. In our KDD 2014 paper, we describe a new grammar to extract meaningful features from programs that are highly predictive of the algorithm used to solve the problem. |