Around 7,000 languages are in use today, and one language dies roughly every two weeks. UNESCO says that half of the world’s languages may vanish within a century. In my home country of India, 220 languages have died in the last 50 years, and 197 languages are endangered.
Open science is advancing scientific research by enabling individuals and organizations to collaborate and exchange knowledge that improves each other’s work. One area that could use this kind of help is the preservation of native languages around the world.
As a kid, I grew up in a small town in the Indian state of Odisha, where at least 30% of the population spoke Santali. The remaining 70%, including my family, spoke Odia and never attempted to learn Santali. We didn’t have to, because Odia was the language of school and of the wider professional marketplace.
Language dominance happens, and that in itself isn’t the problem. The problem is that we need to make sure we also preserve the older languages, so that we don’t lose the stories and pieces of culture inherent in them. How can we do that? For Santali, we were not prepared: it took almost 15 years for the first Unicode-compliant font (Unicode being the global, universal character-encoding standard) to be released for it, by Google. I then led a project to build the second Unicode-compliant, Open Font License (OFL)-licensed font for Santali. It was designed by Indian typographer Pooja Saxena, and its input tools were developed by Jnanaranjan Sahoo and Nasim Ali.
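What "Unicode-compliant" buys a language can be illustrated with a small sketch. Santali is commonly written in the Ol Chiki script, which Unicode encodes in the block U+1C50 to U+1C7F; a compliant font draws glyphs for those standard code points, so text typed on one device displays, searches, and processes correctly on another. The Python snippet below uses only the standard `unicodedata` module to inspect a few Ol Chiki characters. The helper names `is_ol_chiki` and `describe` are hypothetical, written for this illustration only.

```python
import unicodedata

# Ol Chiki, the script commonly used to write Santali, occupies the
# Unicode block U+1C50..U+1C7F.
OL_CHIKI_START = 0x1C50
OL_CHIKI_END = 0x1C7F


def is_ol_chiki(ch: str) -> bool:
    """Return True if the single character ch falls in the Ol Chiki block."""
    return OL_CHIKI_START <= ord(ch) <= OL_CHIKI_END


def describe(text: str) -> list[tuple[str, str, str]]:
    """List (code point, Unicode name, UTF-8 bytes as hex) for each character."""
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            ch.encode("utf-8").hex(),
        ))
    return rows


if __name__ == "__main__":
    # OL CHIKI LETTER LA, OL CHIKI DIGIT ONE, OL CHIKI DIGIT ZERO
    sample = "\u1C5A\u1C51\u1C50"
    for row in describe(sample):
        print(*row)
    # Ol Chiki digits are standard Unicode decimal digits, so generic
    # software can read them as numbers without Santali-specific code.
    print(int("\u1C51\u1C50"))
```

The point of the sketch is that nothing in it is specific to Santali: because the characters are standardized, ordinary tools already understand them, and what remained missing for years was a font to display them.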
Languages die out for many reasons, including:
- In a multicultural and multilingual society, official and major languages have the advantage of government funding. They are used, for instance, as the medium of instruction in schools and in scientific research.
- Minority languages are typically spoken by indigenous peoples or by those living in poverty.
- There is a lack of digital tools to help communities keep using and sharing their native, minority languages. As more and more storytelling happens online, digital tools and standards become more important.
How the open source way can save these languages:
- We need a map of languages around the world with the associated challenges of preserving each language.
- We need collaboration tools and technologies for working closely with endangered language communities.
- We need Open Educational Resources (OERs), such as manuals and other materials, to train people in language documentation.
- We need open, annotated multimedia documentation of both the present (e.g., news coverage of current events and other socio-economic happenings) and the past (folk literature and cultural phenomena), to widen the reach of these languages.
- We need digital tools for building linguistic applications (e.g., machine learning and natural language processing tools such as text-to-speech), which can amplify multimedia content and help build digital accessibility tools such as screen readers.
Work is being done, and there is more to do. Groundbreaking initiatives are already under way to preserve native languages in digital form. Some notable ones are:
- Living Tongues Institute for Endangered Languages
- Open Language Archives Community (OLAC)
- TVMalintzi, a YouTube-based television channel in Nahuatl
- Research and journalism on the subject, like The future of America’s endangered languages by Matt Hansen
- Global Oneness Project
- Wikitongues, which helps communities record video narratives in native languages
- National Endowment For The Humanities & Native Americans
- Openwords, an open source language learning app
- Digital Language Diversity Project (DLDP), for building OERs for language preservation
- Treasure Language Storytelling and a group of other language preservation projects by linguist Dr. Steven Bird
- StoryCorps for recording storytelling
Because the rate of language extinction is so high, constant intervention is needed. I have personally started a project called OpenSpeaks to build OERs and help citizen archivists create multimedia language documentation. I hope to curate content for a broader audience and use it to build digital accessibility and other tools that will help native-language speakers access knowledge and contribute to the commons.
This article was created and distributed under a Creative Commons license (CC BY-SA 2.0) and initially published on opensource.com.