Software & Data
Linguistic Features in Text (LiFT)
A library for extracting linguistic features from textual data.
Software

Chinese Educational Short Answers (CESA)
Chinese short answers in the physics and computer science domains, collected from students at Zhengzhou University.
Dataset

ASAP_ZH
Prompts 1, 2 and 10 of the original ASAP dataset, translated into Chinese using the Google Translate API.
Dataset

ASAP_DE
Prompts 1, 2 and 10 of the original ASAP dataset, translated into German and rescored.
Dataset

Implicitness Annotations of English Short Answers
Short answer data annotated for linguistic phenomena pertaining to the implicitness of language.
Dataset

Overview of Datasets for Automatic Scoring
| Corpus name | Link/Paper | Population | Language | Task | Modality | Prompts | Answers | Labels |
|---|---|---|---|---|---|---|---|---|
| ASAP-AES | Dataset | high school students (grade 7 to 10) | English | Essay | ? | 7 | 13000 | varies per prompt (holistic only) |
| ASAP-AES++ | Dataset Paper | high school students (grade 7 to 10) | English | Essay | ? | 6 | 13000 | varies per prompt (with fine-grained traits) |
| Kaggle ELL Feedback Prize | Dataset Competition | high school students (grade 8 to 12) | English | Essay | ? | 1 | 2700 | 1-5 for 6 trait scores |
| ASAP-SAS | Dataset | high school students (mainly grade 10) | English | Short-answers (SLA, biology, sciences) | ? | 10 | | 0-2/0-3 |
| ASAP-DE | Paper | crowdworkers (unclear) | German | Short-answers (SLA, biology, sciences) | typewritten | 3 | 903 | 0-2/0-3 |
| Powergrading | Paper | unclear | English | Short-answers (US immigration test) | typewritten | 10 | 6980 | binary |
| SRA beetle | Paper Paper | students (native) | English | Short-answers (sciences) | typewritten | 56 | 3000 | 2/3/5-way entailment labels |
| SRA SciEntsBank | Paper Paper | students (native) | English | Short-answers (sciences) | handwritten | 197 | 10000 | 2/3/5-way entailment labels |
| AR-ASAG | Paper Dataset | university students (native?) | Arabic | Short-answers (cybercrime) | ? | 48 | 2133 | 0-5 points |
| PT-SAG | Paper | school students (12-14 years) | Portuguese | Short-answers (biology) | typewritten, some handwritten | 15 | 3675 | 0-3 points |
| CSSAG | Paper | university students (mostly native) | German | Short-answers (computer science) | typewritten | 31 | 1926 | mostly 0-1 or 0-2 in 0.5 steps |
| CS (Mohler) | Paper | CS students in the US | English | Short-answers (computer science) | typewritten | 21 | 630 | 0-5 points (0.5 point steps) |
| CREG | Paper | language learners | German | Short-answers (Reading comprehension) | handwritten | 177 | 1032 | binary + 5 diagnostic labels |
| CREE | Paper | language learners | English | Short-answers (Reading comprehension) | handwritten | 62 | 566 | binary + 6 diagnostic labels |
| Indonesian SAS/Ukara | Dataset Paper | native? | Indonesian | Short-answers (opinion questions) | unknown | 2 | 1032 | binary |
| SweLL | Paper | language learners | Swedish | Essay | handwritten | unknown | growing | CEFR levels |
| Falko | Homepage | language learners (+ native control group) | German | Essay (argumentative) | typewritten | 4 | 248 (+95 from natives) | B2/C1/C2 |
| COPLE-2 | Paper | language learners | Portuguese | Essay (various genres) | handwritten | multiple | 966 | A1/A2/B1/B2/C1 |
| CESA | Paper | university students | Chinese | Short-answers (physics & computer science) | typewritten | 5 | 1800 | 0-2 points |
| ASAP-ZH | Paper | high school students | Chinese | Short-answers (sciences) | handwritten | 3 | 942 | 0-2/0-3 |
| SAF | Paper Dataset | college students, job applicants | English, German | Short-answers (tech, pre-job training) | typewritten | 54 | 4519 | 0.0-1.0 |
| GLUPS | Paper Dataset | school children 11 to 12 years | Arabic | Short-answers (religious) | typewritten | 18 | 1276 | 0-2 points |
| MindReading | Paper Dataset | school children 7 to 14 years | English | Short-answers (explain behaviour) | handwritten | 10 | 11311 | 0-2 points |
| Essay-BR | Paper Dataset | high school | Portuguese | Essay | typewritten | 86 | 4570 | 0-1000 |
| AES-ENEM | Paper Dataset | high school | Portuguese | Essay | typewritten | 127 | 3586 | 5 traits (0-200 points) |