Graphic Animation and Speech Synthesis Applications
for the Auditory and Visually Disabled

by
Josef E. Pfauntsch and Charles M. Shub
University of Colorado at Colorado Springs

1. ABSTRACT

Traditionally, computer science courses focus on specific concepts. Entry level courses precede more advanced courses, and knowledge is built in stages. Most computer science departments offer students several areas of interest to choose from. Often a department's curriculum is structured to assure that participating students attain a breadth of knowledge. This requirement sometimes fosters a shallow understanding of many areas. Integrating learned material into a course that allows students to apply theory and derive solutions for problems where textbook solutions are not available is still considered a novel experiment. Special projects and independent study courses offer prospective graduates an opportunity to combine learned skills in a useful application. One such project is described here.

2. INTRODUCTION

Perhaps the most important driving force behind the desire to master computing science skills is the wish to contribute by developing an application for use by others. Programs of educational or commercial value are no longer trivial in nature. User friendly interfaces for many programs account for half of the effort that must be spent during the engineering cycle. These interfaces require detailed knowledge in the areas of human engineering, display layout, and raster graphics. The other half of the effort goes into the design and development of the data structures and algorithms of the application.

The growing complexity of new applications contributes to the need for multiple talents. It is for this reason that teams are created to "put together" applications. Team members often have some overlapping skills. These are used intuitively to keep track of development events and to question design alternatives.
More importantly, each team member usually has a special area of interest or special skills. That specialty is usually the reason the individual was selected as a contributing team member. Smith [SMIT85] refers to this as specialization and division of labor.

An alternative to the team effort for designing applications is the individual progression approach, where the project is assigned to one individual. While the team approach may result in a shorter development cycle, the individual approach tends to produce a more versatile computer professional. Projects often contribute to the total understanding of computer systems. This paper shows some of the advantages of the second approach by describing a project undertaken at the University of Colorado at Colorado Springs.

3. THE PROJECT

The project is described in three phases. First the requirements are detailed. Next come the specifications designed to meet the given requirements. Finally, the specifics of the implementation are presented.

3.1 Project Requirements

The requirements were deliberately vague to allow for creativity and innovation. They consisted of the delineation of an application area, the selection of computer hardware, and the choice of a programming language. These requirements are elaborated below.

3.1.1 Application_Requirements: The direction for this project was minimal. The intent was to take advantage of student initiative, as can be done with students who are high achievers.

The directions were to use the capabilities of the Commodore Amiga Personal Computer [PECK85] for advancing the science of computer education. Specifically, the direction was to use the Amiga PC to meet a special education need of students with physical disabilities. No other guidance was provided.
Given the proliferation of educational software, special education is perhaps one of the few remaining areas in education where a specific innovative application can be targeted for automated teaching. This may be due to many factors. Important in the context of this project, however, was that a specific focus within special education could be readily defined in a short time. Another advantage was that, with appropriate focus, a project could be completed within a year. Unfortunately, the dynamics of an evolving operating system and programming environment for the Amiga made the project take longer than expected. These factors also contributed to the eventual need to seek assistance during certain phases of the project.

The choice of special education as an area of focus was motivated by two factors. First, there is a profound lack of educational software for special education [NAIM87]. Second was a deep personal commitment to special education. Both authors have handicapped children. This provides unusual insight into the challenges facing handicapped children.

3.1.2 Hardware_Selection_Requirements: Three factors drove the selection of the Amiga PC. First, and perhaps most significant, was its capability to render graphics. Second was its low cost of less than a thousand dollars per system, coupled with its availability at the University. Finally, the Amiga could generate synthesized speech.

3.1.3 Software_Selection_Requirements: No language or development environment was specified. The absence of a programming environment requirement left the choice of software to the student. The language used at UCCS to teach algorithms and data structures is Modula-2. This was the key to selecting that same language as the development tool. In retrospect, the software selection was wrong, since the operating system was written in the "C" programming language and the system interfaces assumed "C" language conventions.
3.2 Project Specification

From the minimal requirements, the project specification was brainstormed. Communication between deaf individuals and hearing individuals can be enhanced by providing a translator from the written word (entered from a keyboard or read from a data file) to manual signs. Most deaf individuals can communicate through manual signs. Also helpful would be translators from words to lip animation and from words to auditory output. Not only would such a system ease communication, but it could also be helpful in teaching the International Sign Alphabet as well as the correct lip formations for the production of sound. Instrumental in this choice was a recent compilation of papers published by IEEE [LEVI80].

3.3 Project Implementation

The major functional components of the system include the graphic display of the signs, the speech output, the lip animation components, the text to speech translator, the text display, and the user interface. Also of importance are the internal data structures and external data file structures used to support and implement these functions. These are described below.

3.3.1 Graphic_Display_Development: The design of the hand signs took almost a complete semester. Over a hundred thousand vectors were input manually during that period. Diligence must be a shared trait among developers. Three generations of graphic hands evolved and died. The final set of vectors, colors, and data types is less than 20,000 bytes of data. The functions that operate on the data include line drawing, polygon drawing, area filling, and color changes. The hand display was placed into a window structure so it could be run as a stand-alone process.

Several methods for displaying the signs were explored. The criterion was realism. Initially, in an attempt to be as realistic as possible, anatomically correct bone structures were generated. These were then used to establish sets of permissible positions for each of the digits.
A skin was then draped over the bones and animation of the hand was attempted. While the representation of the hand was correct, any movement involving more than a single digit was too slow. This method was then abandoned and a new approach developed.

Since the bones were not visible anyway, the skin covering was used as the next data structure. Performance of the animation process improved; however, the visual effect of a graphic hand undergoing a metamorphic transformation was not realistic. Imagine a finger shrinking to assume a closed position and another finger growing to become extended. The next approach included a "base" hand. Here a core hand was developed that was used as an intermediate step in every transition from one position to the next. The resulting displays were much improved over the previous attempts.

Evaluation of the graphic hands by special educators provided the clue to the final phase of hand design. The best visual effect was obtained when the base hand was removed from the sequence and completely formed signs were displayed without intermediate steps. This is because the human eye does not focus on the intermediate positions, but on the final position. The final hand shape is more important than how the hand arrived at that shape. This is the way characters are signed and understood. They are representations of the character they sign.

The method used to input the hands was innovative enough to be included here. The first abortive attempt involved tracing hands with grease pencil onto transparencies and using a mouse to digitize the tracings. This did not provide enough realism. Free hand drawing was also tried and found lacking. The eventual solution involved photographing the sign alphabet as displayed by the daughter of the first author. A 35mm slide projector was used to project the photographs on the screen, and the outline was digitized using mouse input.
After all vectors were complete, color was added.

Once the decision to preserve the existing signs was made, an effort was made to improve them and to reduce the number of vectors to an absolute minimum. Now each hand was generated on a newly blanked window background and redrawn for each occurrence. The display capability of the Amiga allowed the redrawing of hands in real time. In fact, the entire alphabet ("a".."z"), the Arabic numerals ("0".."9"), and the international sign for "I love you" [GUST80] [RIEK78] can be displayed in under three seconds - fast enough to render the displays incomprehensible. Since such speed would never be needed, the spare processor capacity could be put to other uses.

The graphic mouth display was easy to generate following the thorough introduction to vector graphics during the hand design phase. The lips are generated algorithmically from mouth width and mouth height parameters supplied during speech output by the voice generator [ERBE78]. These lips were placed into a separate window and can be turned on and off. Several versions of lips were evaluated and a final version was decided on within weeks.

3.3.2 Speech_Output_and_Lip_Animation: The challenge was to integrate the existing parts with the added capabilities of producing synthesized speech and synchronizing lip animation in real time. The experts in special education advised that residual hearing in most profoundly deaf individuals can be fine tuned and used effectively. Visual reinforcement of the spoken words through lip animation could also be used to increase comprehension of voice signals. It is important that lip animation and speech output be synchronized to achieve realism.

This phase of the development cycle was quite complex. As a result, a thorough understanding of system deadlock was gained. Process communication for speech output was linked to the graphic driver.
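The lip generation described above, driven by a mouth width and a mouth height parameter, might be sketched as follows. This is only an illustration in Python rather than the project's Modula-2: the paper does not give its actual formula, so the elliptical outline, the function name lip_outline, and the segments parameter are all assumptions introduced here.

```python
import math


def lip_outline(width, height, cx=0.0, cy=0.0, segments=16):
    """Return a closed polygon approximating the mouth opening.

    Assumption: the opening is modeled as an ellipse whose axes are the
    mouth width and mouth height. The original system's shape rule is
    not documented in the paper.
    """
    if segments < 3:
        raise ValueError("need at least 3 segments to form a polygon")
    rx, ry = width / 2.0, height / 2.0
    points = []
    for i in range(segments):
        angle = 2.0 * math.pi * i / segments
        # One vertex per segment, walking once around the ellipse.
        points.append((cx + rx * math.cos(angle), cy + ry * math.sin(angle)))
    return points
```

A smaller segments value means fewer vectors to draw and fill for each mouth shape, which is the kind of trade-off between appearance and drawing time that the implementation had to make.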
For each set of mouth shape parameters generated, a new pair of lips had to be generated and drawn. Initially the graphic process interfered with fluid speech. The graphic display functions underwent surgery to improve performance. Inline code was added to reduce the frequency of calls to library routines, and the number of vectors that had to be drawn and filled was reduced. When additional improvements in the graphic display module were no longer possible, the display still flickered and was unsatisfactory.

The next step was to balance process priorities between speech and graphic output. This approach allowed evaluation of acceptable speech and realistic graphics - but not at the same time. Finally, as a last resort, the speech output device was allowed to drop some of the transitional mouth shapes without an attempt to draw each one, akin to the simplifications made in the hand displays. This solution was acceptable. Now fewer lip shapes were generated, without any adverse effects. Performance and appearance results were determined through observation.

3.3.3 Text_to_Speech_Translation: Software that translates English text into speech phonemes and subsequently into synthesized speech output is provided with the Amiga PC libraries. A soft voice model of the human vocal production mechanism was developed by the Advanced Research Projects Agency under Department of Defense funding [FONS81] [PECK85]. This model, with minor modifications, is available for use on the Amiga. It involves representing the International Phonetic Alphabet as ASCII character combinations, since most keyboards do not offer phonetic symbols.

English is not predictable enough to lend itself to complete algorithmic interpretation, so an exception processing routine was designed. This word exception module filters all text sent to the translator and substitutes words matched within the exception word list.
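A minimal sketch of such an exception filter is given here in Python rather than the project's Modula-2. The data structures section of this paper reports that multiway trees were used for the exception search, so the sketch uses a tree with one branch per character. The node layout, the names ExceptionTree and filter_text, and the whitespace-only word splitting are assumptions made for illustration.

```python
class ExceptionTree:
    """Multiway tree keyed by character (assumed layout, not the original)."""

    def __init__(self):
        self.children = {}       # one branch per possible next character
        self.replacement = None  # set only where an exception word ends

    def add(self, word, replacement):
        node = self
        for ch in word.lower():
            node = node.children.setdefault(ch, ExceptionTree())
        node.replacement = replacement

    def lookup(self, word):
        """Return the respelling for word, or None if it is not listed."""
        node = self
        for ch in word.lower():
            node = node.children.get(ch)
            if node is None:
                return None
        return node.replacement


def filter_text(text, tree):
    """Substitute every whitespace-separated word found in the tree.

    Simplification: punctuation attached to a word is not stripped.
    """
    out = []
    for word in text.split():
        replacement = tree.lookup(word)
        out.append(replacement if replacement is not None else word)
    return " ".join(out)
```

With the entry tree.add("children", "chill dren"), the call filter_text("the children play", tree) yields "the chill dren play", while "child" passes through unchanged because no exception ends at that node.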
How this is done is discussed in the section on exception lists. The user decides what sounds right and, by including the modified words in the exception list, forces the translation of words into dialectic or lingual preferences. This could easily be used to customize accents and foreign languages. Data entry is by adding words to the exception word list using a text editor.

This module enhanced the usefulness of the application. Signing, speaking, and lip animation were complete. Functions to allow the user to select text files from directories were added. The system could now be used by visually disabled students to speak text files for them. It is an effective text to speech translator.

3.3.4 Text_Display: Text is displayed visually through use of a low resolution text font and through a vector generated large display font. During the signing of text files, the text is scrolled smoothly across the bottom of the screen from right to left. During the spelling of words from one of the word files, the large vector display fonts are used. The vector font was created using the same mechanism used to generate the hands. These characters are two inches in height and are thickened for increased boldness by redrawing each character several times at small offsets.

3.3.5 Human_Computer_Interface: This module is the heart of the system. Here the desired states for program flow are input by the user and processed by the program. This module doubled the effort spent on this project. The size of the source code also doubled. In retrospect, the time invested here was most worthwhile. It is the man-machine interface that makes or breaks an application. A brief description is provided below.

The Amiga is designed with the user in mind; it comes with a mouse and a keyboard. The mouse has two buttons. Pressing the right mouse button can cause an application to respond with a drop-down menu.
The menu structure for this application comprises three menus, namely the System Menu, the Options Menu, and the Input Selection Menu.

The System Menu provides alternatives for requesting information, allows the user to reset all parameters to their defaults, gives the user access to the system color tables, lets the system run in a demonstration mode, and can terminate the application.

The color manipulation function requires explanation. This utility was incorporated from a public domain source and adapted for use. There are thirty-two colors available. Each can be selected by the user and modified at will. The red, green, and blue components that make up a color can each be adjusted using a slide potentiometer operated with the mouse. A student can thereby modify the colors used to display the hands, lips, text, and backgrounds to add a personal touch.

The Options Menu allows the user to select from voice control, display timer control, practice or test control, and word length selections. The voice control module opens a window with additional options that allow control of volume, speech rate, voice pitch, and sampling frequency. There are options to escape and return to default values, to set robotic or human voice synthesis (this is actually monotonic or with emphasis), to activate the lips or turn them off, and to change between male and female voices.

The timer options allow the student to adjust the display time for the hands, the duration of the examination display data, the time between display and answer, and the text scroll speed, and provide an escape to use the old or default settings.

The student can also select between practice and examination drills, and can select the desired length of the words for display. A random word length function is also included.

The Input Selection Menu allows the student to toggle between regular and special words.
The special word list will be described in more detail later. An idle graphic display function allows the display of graphic line draw routines at random when the machine is not otherwise in use. For text to speech translation the hand display window can be deactivated. Input for translation and lip/hand display can be selected to come from the keyboard or from text files.

3.3.6 Data_Structures: The self imposed constraint of supporting real time animation forced efficient access to the data structures for graphic display. Indexed arrays of dynamic display lists were chosen for the hand animation and the large font text display. The lip shapes are generated algorithmically because of their dynamic behavior. The exception module search algorithm was implemented using multiway trees for rapid access to exception words [KNUT69]. Nested multiway trees are being considered for improving performance if needed.

3.3.7 Directory_File_Structures: There are two input directories associated with this application. The first is the word directory. Here the word lists for drill or practice are stored. The special word list and the exception dictionary are also stored here. The other directory contains the text files that can be translated and displayed graphically using hands and lips.

The text directory merely allows a user access to any text file while the application is running. The word directory, on the other hand, deserves further explanation. There are nine regular word lists, ranging from two characters to ten characters in length. A student (or teacher) can alter the content of any word list at will. The lists can be expanded or shrunk. The application loads and indexes them dynamically.

The special word list is somewhat different. It is there to allow the teaching of context frames. Multiple words are allowed on a single line. These words can be related, such as animals in a zoo.
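The loading and indexing of the word lists described above could look roughly like the following Python sketch. The on-disk file format is not given in the paper, so the function names, the one-word-per-entry input, and the whitespace-separated layout of a special list line are assumptions made for illustration only.

```python
from collections import defaultdict


def index_by_length(words, min_len=2, max_len=10):
    """Group words by length, keeping only the lengths covered by the
    nine regular lists (two to ten characters)."""
    index = defaultdict(list)
    for word in words:
        word = word.strip()
        if min_len <= len(word) <= max_len:
            index[len(word)].append(word)
    return dict(index)


def parse_special_list(lines):
    """Treat each non-blank line as one context frame of related words.

    Assumption: the words on a line are separated by whitespace.
    """
    return [line.split() for line in lines if line.strip()]
```

Because the index is rebuilt from whatever the lists contain, a teacher can grow or shrink any list with a text editor and the drill for a given word length simply picks from the matching bucket.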
They can also be sound-alike or look-alike words. The words in this list can be of different lengths.

This capability was added in hindsight, but for good reasons. The authors never thought that a student attempting to learn signing would develop a system to score well without learning the words, but the sequential nature of spelling words character by character lends itself to partial recognition. One need only remember the final character to peg the word when the multiple choices are displayed.

The exception word list was created to allow a student to re-arrange a word in an English like manner, without forcing the student to learn yet another language (phonetics). For example, the word child is translated and spoken quite well, but children is spoken as child ren. To modify the way children is pronounced by the system, one merely spells the word the way it sounds, so children becomes chill dren. This exception word list can be used quite effectively to include acronyms or to speak languages not supported by the text to speech algorithm.

4. FUTURE DIRECTIONS

Digitized speech reproduction would be ideal for teaching speech. It does, however, require massive amounts of storage to achieve the vocabulary needed to make a program effective. Synthesized speech was found to be an acceptable alternative to digitized speech, and quite usable when augmented with the exception capability. While it is not part of the application at this time, digitized speech output may become another project in the future.

5. CONCLUSIONS

This project presented many challenges. There were times when instant insanity waited around the corner. It is now done, and we are richer - in knowledge. We are also more appreciative of the complexities of the human speech production mechanisms and believe that many years will pass before language processing is conquered.
This project has also contributed to special education, a field often neglected for more profitable ventures. We feel good about the progress and the project.

6. REFERENCES

[ERBE78] Erber, Norman P., and DeFilippo, Carol Lee, "Voice/Mouth Synthesis and Tactual/Visual Perception of /pa, ba, ma/," J. Acoust. Soc. Am., Vol. 64, No. 4, October 1978, pages 1015-1019.

[FONS81] Fons, Katherine, and Gargagliano, Tim, "Articulate Automata: An Overview of Voice Synthesis," Byte, Vol. 6, No. 2, February 1981, pages 164-187.

[GUST80] Gustason, Gerilee, Pfetzing, Donna, and Zawolkow, Esther, "Signing Exact English," Modern Signs Press, Los Alamitos, CA, 1980.

[KNUT69] Knuth, Donald E., "The Art of Computer Programming; Volume I: Fundamental Algorithms," Addison Wesley, 1969.

[LEVI80] Levitt, Harry, Pickett, James M., and Houde, Robert A., "Sensory Aids for the Hearing Impaired," IEEE Press, 1980.

[NAIM87] Naiman, Adeline, "A Hard Look at Educational Software," Byte, Vol. 12, No. 2, February 1987.

[PECK85] Peck, Rob, Sassenrath, Carl, and Deyl, Susan, "Amiga ROM Kernel Manual," Commodore-Amiga, Inc., 1985.

[RIEK78] Riekehof, Lottie L., "The Joy of Signing," Gospel Publishing, Springfield, MO, 1978.

[SMIT85] Smith, Adam, "Wealth of Nations," Random House Edition.