Julius is a lightweight open-source Speech Recognition Engine
Table of Content
"Julius" is a high-performance, small-footprint large vocabulary continuous speech recognition (LVCSR) decoder software for speech-related researchers and developers. It is primarily written for C programming language.
The algorithm is based on 2-pass tree-trellis search, which fully incorporates major decoding techniques such as tree-organized lexicon, 1-best / word-pair context approximation, rank/score pruning, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning, Gaussian selection, etc.
Julius main platforms are Linux and other Unix-based system, as well as Windows, Mac, Androids and other platforms.
Features
- An open-source LVCSR software (BSD 3-clause license).
- Real-time, hi-speed, accurate recognition based on 2-pass strategy.
- Low memory requirement: less than 32MBytes required for work area (<64MBytes for 20k-word dictation with on-memory 3-gram LM).
- Supports LM of N-gram with arbitrary N. Also supports rule-based grammar, and word list for isolated word recognition.
- Language and unit-dependent: Any LM in ARPA standard format and AM in HTK ascii hmm definition format can be used.
- Highly configurable: can set various search parameters. Also alternate decoding algorithm (1-best/word-pair approx., word trellis/word graph intermediates, etc.) can be chosen.
- List of major supported features:
- On-the-fly recognition for microphone and network input
- GMM-based input rejection
- Successive decoding, delimiting input by short pauses
- N-best output
- Word graph output
- Forced alignment on word, phoneme, and state level
- Confidence scoring
- Server mode and control API
- Many search parameters for tuning its performance
- Character code conversion for result output.
- (Rev. 4) Engine becomes Library and offers simple API
- (Rev. 4) Long N-gram support
- (Rev. 4) Run with forward / backward N-gram only
- (Rev. 4) Confusion network output
- (Rev. 4) Arbitrary multimodel decoding in a single thread.
- (Rev. 4) Rapid isolated word recognition
- (Rev. 4) User-defined LM function embedding
- DNN-based decoding, using front-end module for frame-wise state probability calculation for flexibility.
Licenses
This code is made available under the modified BSD License (BSD-3-Clause License).