via The Verge
“Most of the people who speak English in the world or produce English text are non-native speakers,” Yevgeni Berzak, a graduate student in electrical engineering and computer science who led the project, said in a statement. “This characteristic is often overlooked when we study English scientifically or when we do natural-language processing for English.”
A team of annotators at MIT created the database from 5,124 sentences written by English-as-a-second-language students, annotating grammatical errors and sentence structure for each. The writers’ native languages are spoken by 40% of the world’s population, a breadth intended to make computers, and ultimately natural-language processing, more accurate and all-encompassing than they have been in the past.