1. Aspiring Minds’ Employability Outcomes 2015 (AMEO 2015)
It is a unique dataset that contains engineering graduates’ employment outcomes (salaries, job titles and job locations) along with standardized assessment scores in three fundamental areas: cognitive skills, technical skills and personality.
Coupled with biodata information, AMEO 2015 provides an opportunity for a unique and comprehensive study of the entry-level labor market. The data can be used not only to build an accurate salary predictor, but also to understand what influences salaries and job titles in the labor market.
Click here to download the dataset.
Click here to view Sahil Shekhar’s Master’s thesis at Harvard Kennedy School, which uses AMEO.
Click here to view a project report that uses AMEO, written by a group of students in a Machine Learning class at NYU.
2. Data Science For Kids
We conduct machine learning workshops with kids in which, as part of an exercise, they rate random face cards on a scale of 1-5 according to how inclined they are to befriend the person on the card. The cards have distinguishing feature sets such as old names and new names, gender and hobby type. To read more about the workshop and its experiences, click here.
We collected this ratings data, which you can use to design your own predictor and compare the results. To download the dataset, click here.
3. Programming Features API
In our KDD 2014 paper, we describe a new grammar for extracting meaningful features from programs. These features are highly predictive of the algorithm used to solve the problem, and we show that they work very well with supervised learning. They can do far more than naive features such as keyword counts, AST height and syntax errors. They are currently available for C, C++, Java and Python. Want to try them out? We are happy to provide an API. Write to us!
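For comparison, the naive baseline features named above (keyword counts, AST height and a syntax-error flag) can be sketched in a few lines. This is only an illustrative sketch for Python source using the standard library; the function names `ast_height` and `naive_features` are our own hypothetical choices, and this is not the grammar-based feature set from the paper or the API mentioned above.

```python
import ast
import io
import keyword
import tokenize
from collections import Counter


def ast_height(node):
    """Height of the AST rooted at `node` (a node with no children has height 1)."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return 1
    return 1 + max(ast_height(child) for child in children)


def naive_features(source):
    """Compute naive baseline features for a piece of Python source code.

    Hypothetical helper: returns keyword counts, AST height and a
    syntax-error flag. Programs that fail to parse get empty counts
    and a height of 0.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return {"keyword_counts": {}, "ast_height": 0, "syntax_error": True}

    # Count reserved words by scanning NAME tokens of the parsed program.
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            counts[tok.string] += 1

    return {
        "keyword_counts": dict(counts),
        "ast_height": ast_height(tree),
        "syntax_error": False,
    }


if __name__ == "__main__":
    sample = (
        "def f(xs):\n"
        "    total = 0\n"
        "    for x in xs:\n"
        "        if x > 0:\n"
        "            total += x\n"
        "    return total\n"
    )
    print(naive_features(sample))
```

Features like these say little about which algorithm a program implements, which is why they serve only as a baseline here.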
4. Code Data Set
We have a data set of more than 100,000 programs in C, C++ and Java. We also have data sets of human-graded programs in C and Java for various problems. Want to play with them? Write to us!
5. Open Data Resources
We’ve also listed out some websites which provide open data on a variety of subjects. Curious researchers can download and fiddle with data from these sources and probably come up with some enticing problems that could be solved using data science and machine learning.
Apart from this, we have tested more than two million candidates in language, cognitive skills, personality and functional skills. We also have various data sets on the performance, at their companies, of individuals whom we have assessed. If you can think of joint research projects around this data, do write to us.
Also, we are a data-hungry group, so if you can provide us with any kind of data and have joint research goals in mind, do write to us.
Are you a student, professor, CEO or Maschinenmensch? Subscribe to ml-india's Google group to join the discussion and receive updates, news and resources about India's ML ecosystem. Click here
Also, follow @ml_india on Twitter!