Cheshire Continues

One of Mad Wonder’s very first projects, Cheshire, is continuing development. That’s the good news. But there is also, let’s call it “less” good news.

The original version of Cheshire was just a graphical wrapper over someone else’s command-line program. While it was possible to achieve some decent results with this setup, it was never a satisfying solution. After getting some more experience with C++, and specifically with building and importing DLLs, we decided to take the revised version of Cheshire in a different direction.

After some experimentation, we were able to implement a working solution using a custom C++ wrapper for the SphinxBase and PocketSphinx libraries. The wrapper is primarily focused on providing accessible mapping to higher-level languages, so that PocketSphinx can be effectively leveraged using more modern code. Our current prototype for testing this implementation is constructed in Python, with a custom object-oriented Python library calling the C++ processes and mapping the returned data to Python objects.
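To give a rough idea of what "mapping the returned data to Python objects" can look like, here is a minimal sketch using Python's standard ctypes module. It is not Cheshire's actual wrapper: the struct layout and the names CWordSegment, WordSegment, segments_from_c and fake_native_result are all hypothetical, and the native call is faked in Python so the example runs without any compiled library.

```python
import ctypes
from dataclasses import dataclass


class CWordSegment(ctypes.Structure):
    """Hypothetical C struct a native wrapper might return per recognized word."""
    _fields_ = [
        ("word", ctypes.c_char_p),      # NUL-terminated UTF-8 string
        ("start_frame", ctypes.c_int),  # first audio frame of the word
        ("end_frame", ctypes.c_int),    # last audio frame of the word
    ]


@dataclass
class WordSegment:
    """Plain Python object the rest of the program works with."""
    word: str
    start_frame: int
    end_frame: int


def segments_from_c(array, count):
    """Copy `count` C structs into ordinary Python objects."""
    return [
        WordSegment(
            word=array[i].word.decode("utf-8"),
            start_frame=array[i].start_frame,
            end_frame=array[i].end_frame,
        )
        for i in range(count)
    ]


def fake_native_result():
    """Stand-in for the native call (assumption: it yields an array plus a count)."""
    data = [(b"hello", 0, 42), (b"world", 43, 90)]
    array = (CWordSegment * len(data))()
    for i, (word, start, end) in enumerate(data):
        array[i].word = word
        array[i].start_frame = start
        array[i].end_frame = end
    return array, len(data)


array, count = fake_native_result()
segments = segments_from_c(array, count)
```

With a real library, the array would come from a function loaded through ctypes.CDLL instead of fake_native_result. Copying into plain objects right away means the Python side never holds on to memory owned by the C++ code.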

The upside to all of this is that we now have a fairly effective solution built on an open-source voice recognition engine that performs reasonably well and runs on a wide range of platforms. This improves on the previous solution in several ways. Most notably, the original version of Cheshire would only work on Windows, as that was the only platform compatible with the command-line program it used.

The drawback to this change is that PocketSphinx can be a bit more difficult to work with. It requires more heavy lifting to tune performance and, most importantly, to deal with transcriptions and dictionaries. Loading a full language dictionary makes the program run much slower and produces less accurate results. Loading a transcription requires that you parse it, convert it to a compatible dictionary, and load that dictionary into the interpreter. This is entirely possible, but none of it is automatic, so it takes some extra work. This is the step we are presently working on. It requires a few extra C++ functions, and mainly some extra text parsing in Python.
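As an illustration of the parsing step, here is a minimal sketch of one way to cut a full pronunciation dictionary down to just the words in a transcription. It assumes the plain-text, CMUdict-style format that PocketSphinx dictionaries use (a word followed by its phonemes, with alternate pronunciations marked like "hello(2)"). The function names and the sample entries are hypothetical, and this is not necessarily how Cheshire will end up doing it.

```python
import re


def transcript_words(transcript):
    """Return the unique words in a transcript, lowercased, in first-seen order."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return list(dict.fromkeys(words))


def reduce_dictionary(transcript, dictionary_lines):
    """Keep only the dictionary entries whose word appears in the transcript.

    Returns (kept_lines, missing_words). Missing words have no pronunciation
    in the full dictionary and would need to be added by hand.
    """
    ordered = transcript_words(transcript)
    wanted = set(ordered)
    kept, found = [], set()
    for line in dictionary_lines:
        line = line.strip()
        if not line or line.startswith(";;;"):  # skip blanks and comments
            continue
        head = line.split(None, 1)[0]
        # Alternate pronunciations look like "hello(2)"; strip that suffix.
        base = re.sub(r"\(\d+\)$", "", head).lower()
        if base in wanted:
            kept.append(line)
            found.add(base)
    missing = [w for w in ordered if w not in found]
    return kept, missing


# Tiny made-up stand-in for a full dictionary file.
FULL_DICTIONARY = [
    "hello HH AH L OW",
    "hello(2) HH EH L OW",
    "world W ER L D",
    "cat K AE T",
]

kept, missing = reduce_dictionary("Hello, world! Hello Cheshire.", FULL_DICTIONARY)
```

The kept lines can be written out to a small dictionary file for the recognizer to load, while the missing list flags words (here, "cheshire") that still need a pronunciation.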

Because we have had to shift to a new interpreter, it will be a while before we have any prototypes to report on. The initial plan is to get a working prototype with a workflow geared toward internal use. We have some plans for producing an animated series of tutorials and essays, and what we’re shooting for now is a version of Cheshire that lets us do that work quickly and efficiently. As more progress is made, we will begin shifting over to refining the interface and making the user experience easier and smoother for general use. We are hoping to eventually have versions of Cheshire that can run as plug-ins for multiple graphical applications and game engines.