Machine Learning India

Fostering data science and machine learning in India

Are you a student, professor, CEO or Maschinenmensch? Subscribe to ml-india's Google group to join the discussion and receive updates, news and resources about India's ML ecosystem. Click here

Resources

1. Aspiring Minds’ Employability Outcomes 2015 (AMEO 2015)
It is a unique dataset which contains engineering graduates’ employment outcomes (salaries, job titles and job locations) along with standardized assessment scores in three fundamental areas - cognitive skills, technical skills and personality.

Coupled with biodata information, AMEO 2015 provides an opportunity for a unique and comprehensive study of the entry-level labor market. The data can be used not only to build an accurate salary predictor, but also to understand what influences salaries and job titles in the labor market.

Click here to download the dataset.

Click here to view Sahil Shekhar’s Master’s thesis at Harvard Kennedy School, which uses AMEO.

Click here to view a project report that uses AMEO, written by a group of students from a Machine Learning class at NYU.
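As a rough illustration of the salary-predictor idea mentioned above, here is a minimal sketch of a one-variable least-squares baseline. The variable names and the toy numbers are assumptions made up for this example; they are not taken from the AMEO 2015 files, whose actual columns you should check after downloading.

```python
# Minimal sketch: fit salary = slope * score + intercept by least squares.
# The toy rows below are invented for illustration and are NOT AMEO data.

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new score."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical assessment scores and annual salaries.
toy_scores = [400, 450, 500, 550, 600]
toy_salaries = [200000, 250000, 300000, 350000, 400000]

model = fit_line(toy_scores, toy_salaries)
estimate = predict(model, 525)
```

A real study would use many predictors (assessment scores, biodata) and hold out part of the data for evaluation; a single-variable fit like this one is only a starting point for comparison.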

2. Data Science For Kids
We conduct machine learning workshops with kids in which, as part of the exercise, they rate random face cards on a scale of 1-5 according to their inclination to befriend the person on the card. The cards have distinguishing feature sets such as old names and new names, gender and hobby type. To read more about the workshop and its experiences, click here.

We collected the ratings data, which you can use to design your own predictor and compare the results. To download the dataset, click here.
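One simple predictor you could start from is sketched below: it averages the ratings seen for each feature value and combines those averages for a new card. The field names (`name_style`, `gender`, `hobby`, `rating`) and the toy rows are hypothetical and chosen only for illustration; the downloaded dataset may be organized differently.

```python
from collections import defaultdict

# Hypothetical field names; adjust them to match the actual dataset.
FEATURES = ["name_style", "gender", "hobby"]


def fit_mean_ratings(rows, feature_names):
    """Compute the mean rating per (feature, value) pair and the global mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    total = 0.0
    for row in rows:
        total += row["rating"]
        for name in feature_names:
            key = (name, row[name])
            sums[key] += row["rating"]
            counts[key] += 1
    means = {key: sums[key] / counts[key] for key in sums}
    return means, total / len(rows)


def predict_rating(model, card, feature_names):
    """Average the per-feature means; fall back to the global mean if unseen."""
    means, global_mean = model
    estimates = [means.get((name, card[name]), global_mean)
                 for name in feature_names]
    return sum(estimates) / len(estimates)


# Invented toy ratings, not the workshop data.
toy_rows = [
    {"name_style": "old", "gender": "F", "hobby": "sports", "rating": 4},
    {"name_style": "new", "gender": "M", "hobby": "sports", "rating": 5},
    {"name_style": "old", "gender": "M", "hobby": "reading", "rating": 2},
    {"name_style": "new", "gender": "F", "hobby": "reading", "rating": 3},
]

model = fit_mean_ratings(toy_rows, FEATURES)
new_card = {"name_style": "new", "gender": "F", "hobby": "sports"}
predicted = predict_rating(model, new_card, FEATURES)
```

This baseline ignores interactions between features, so it is mainly useful as a reference point when comparing against a learned model.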

3. Programming Features API
In our KDD 2014 paper, we describe a new grammar for extracting meaningful features from programs that are highly predictive of the algorithm used to solve the problem. We show that these features work well with supervised learning, and they support many more uses than naive features such as keyword counts, AST height and syntax errors. They are currently available for C, C++, Java and Python. Want to try them out? We are happy to provide an API. Write to us!
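For comparison, the naive features named above (keyword counts, AST height, syntax errors) can be computed for Python source with the standard library alone. The sketch below shows only those naive baselines; it is not the grammar-based feature set from the paper, and the function names are made up for this example.

```python
import ast
import io
import keyword
import tokenize
from collections import Counter


def keyword_counts(source):
    """Count occurrences of each Python keyword in the source text."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts[tok.string] += 1
    return dict(counts)


def ast_height(node):
    """Height of the syntax tree rooted at node (a leaf has height 1)."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return 1
    return 1 + max(ast_height(child) for child in children)


def naive_features(source):
    """Return the three naive features for one Python program."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Programs that do not parse get no tree-based features.
        return {"syntax_error": True, "ast_height": 0, "keyword_counts": {}}
    return {
        "syntax_error": False,
        "ast_height": ast_height(tree),
        "keyword_counts": keyword_counts(source),
    }


sample = "for i in range(3):\n    if i:\n        print(i)\n"
features = naive_features(sample)
broken = naive_features("def (:")
```

Features like these say little about what a program actually computes, which is the gap the grammar-based features are meant to address.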

4. Code Data Set
We have a data set of more than 100,000 programs in C, C++ and Java. We also have data sets of human-graded programs in C and Java for various problems. Want to play with them? Write to us!

5. Open Data Resources
We’ve also listed some websites that provide open data on a variety of subjects. Curious researchers can download and explore data from these sources and may come up with interesting problems that can be solved using data science and machine learning.

https://www.data.gov/
https://www.quandl.com/collections/india
https://data.gov.in/
https://india.gov.in/
http://www.archive.india.gov.in/spotlight/spotlight.php#tab=tab-1
http://mospi.nic.in/Mospi_New/site/home.aspx
http://www.surveyofindia.gov.in/
http://censusindia.gov.in/
http://www.tradingeconomics.com/india/indicators
http://planningcommission.nic.in/
http://www.mapsofindia.com/
http://planningcommissionarchive.nic.in/
http://bhuvan.nrsc.gov.in/bhuvan_links.php
http://www.transparency.org/country#IND
http://data.worldbank.org/country/india
https://github.com/datameet/india-election-data
https://github.com/datameet/maps
https://github.com/datameet/pincodes
http://openbangalore.org/
https://github.com/datameet/openpostbox
https://github.com/datameet/Bihar-AC-Election-Report-Cards
http://www.transparentchennai.com/database/
https://data.uidai.gov.in/uiddatacatalog/dataCatalogHome.do
https://www.icegate.gov.in/DailyList/DL
https://github.com/datameet/twentyfourteen-child
https://github.com/datameet/Pune_Municpal_Data
https://github.com/datameet/Pune_wards
https://github.com/datameet/pune
https://github.com/datameet/openpostboxindia
https://github.com/datameet/quest2
https://github.com/datameet/cricket
https://github.com/datameet/opendata.json
https://github.com/datameet/logo
https://github.com/datameet/plenario
https://github.com/datameet/datascience
https://github.com/datameet/daksh
https://rbi.org.in/
https://knoema.com/
https://wellcome.ac.uk/what-we-do/topics/data-sharing
https://crcns.org/
http://www.fuse-data.com/data-markets/#datamarket
http://www.findthedata.com/
https://factual.com/
http://www.google.ca/publicdata/directory
http://opendata.arcgis.com/

Apart from this, we have tested more than two million candidates in language, cognitive skills, personality and functional skills. We also have various data sets on the company performance of individuals whom we have assessed. If you can think of joint research projects around this data, do write to us.
Also, we are a data-hungry group, so if you can provide us with any kind of data and have joint research goals in mind, do write to us.


ML-India newsletter

Are you a student, professor, CEO or Maschinenmensch? Subscribe to ml-india's Google group to join the discussion and receive updates, news and resources about India's ML ecosystem. Click here
Also, follow @ml_india on Twitter!