Abstract |
In a text-based discovery and analytical environment, high-quality textual representation is needed to support discovery of and research on spoken content. The growing volume of human thoughts and ideas captured as digitally recorded speech highlights the need for efficient generation of high-quality text representations of spoken content. The most cost-effective method of producing textual representations is the speech recognition system. While much progress has been made in speaker-dependent (e.g., speaker-trained) speech recognition systems, they produce poor-quality results when applied in domain-agnostic and speaker-independent contexts (e.g., digitally recorded spoken content posted to the web). Results generated by domain-agnostic and speaker-independent language models are not usable for discovery or analysis. The poor-quality results are due in part to the misalignment between domain-specific vocabularies and the domain-agnostic dictionaries used for acoustic pattern matching in speech recognition systems. Speech recognition is a complex field, and language models are only one of the four major components of a speech recognition system. Current speech recognition systems use language models that typically represent a non-domain-specific vocabulary of 1,000 words, which is considered a large language space in speech recognition systems. This paper reports on exploratory research designed to test the quality improvements that may be achieved by developing domain-focused phonemic vocabularies. The research relies on human knowledge engineering methods to model domain-specific languages. It uses the Atlas.ti application to extract and model religious language, and the Logios application to convert the text vocabulary of 25,000 words to a phonemic representation. The test corpus consists of digitally recorded spoken religious sermons.
|