The Digital Signal Processing Lab @ UCSD


Thesis Title


High-rate Optimized Quantization Structures and Speaker-Dependent Wideband Speech Coding


Thesis Abstract


Modern coding applications, such as wideband speech, are characterized by sources with large dimensions and unknown statistics, complicated distortion measures, and the need for high-quality quantization. However, the complexity of quantization systems must be kept in check as the dimension grows, which calls for flexible quantization structures. These structures, in turn, require an automatic training method that can infer statistics from example data and balance the various factors to optimize performance. The development of efficient, flexible quantization structures also opens up new coding applications, such as speaker-dependent coding. This approach promises improved performance but presents a variety of implementation challenges. The first part of this dissertation presents a variety of structured quantizers that strike different balances between complexity and performance. These include the scalar transform coder, which is augmented with a flexible companding scalar quantizer based on Gaussian mixtures. Next, several extensions to the Gaussian Mixture Vector Quantizer (GMVQ) system for recursive coding are examined. Training techniques for these systems are developed based on high-rate quantization theory, which provides a tractable objective function for use in automatic design. This replaces the ad hoc methods used to design structured quantizers with a data-driven approach that can incorporate various distortion measures and structures. The performance of the systems is demonstrated on the problem of wideband speech spectrum coding.
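To make the data-driven design idea concrete, the sketch below fits a Gaussian mixture model to example training vectors and applies a standard high-rate bit-allocation rule to one component's covariance. The use of scikit-learn, the model size, and the synthetic stand-in data are assumptions made for illustration only, not the dissertation's implementation.

    # Illustrative sketch: GMM fitting as the data-driven training step,
    # plus a classical high-rate bit allocation for one Gaussian component.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_gmm(train_vectors, n_components=8, seed=0):
        """Fit a full-covariance GMM to example data (the data-driven step)."""
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="full",
                              random_state=seed)
        gmm.fit(train_vectors)
        return gmm

    def highrate_bit_allocation(cov, total_bits):
        """High-rate allocation for one Gaussian component:
        b_i = B/d + 0.5*log2(sigma_i^2 / geometric-mean variance),
        with the eigenvalues of the covariance used as variances."""
        variances = np.linalg.eigvalsh(cov)
        d = variances.size
        geo_mean = np.exp(np.mean(np.log(variances)))
        return total_bits / d + 0.5 * np.log2(variances / geo_mean)

    # Example usage with synthetic stand-in data (10-dimensional vectors).
    X = np.random.default_rng(0).normal(size=(5000, 10))
    gmm = train_gmm(X)
    bits = highrate_bit_allocation(gmm.covariances_[0], total_bits=30)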


The second part of this dissertation considers speaker-dependent wideband speech coding. Using the GMVQ system and training approach developed in the first part, a study of the performance benefits of speaker-dependent coding in the CELP framework is undertaken. The three main types of CELP parameters (spectrum, adaptive codebook, and fixed codebook) are all investigated, and the gains are quantified. Next, a number of implementation issues related to speaker-dependent coding are addressed. A safety-net approach is utilized to provide robustness, and its implementation in the context of GMVQ is explored. Several online training architectures are presented that strike different balances among training complexity, communication overhead, and performance. As components of these architectures, techniques for training on quantized data and for recursive learning are examined.
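As an illustration of the safety-net idea, the sketch below encodes each vector with both a speaker-dependent quantizer and a generic fallback quantizer, keeps the reconstruction with lower distortion, and signals the choice with one bit. The quantizer interfaces, the toy quantizers in the usage line, and the squared-error distortion are assumptions made for this example, not the dissertation's code.

    # Illustrative safety-net selection between two quantizers.
    import numpy as np

    def safety_net_encode(x, speaker_dep_quantize, generic_quantize):
        """Return (flag, reconstruction): flag=1 if the speaker-dependent
        quantizer gave lower distortion, 0 if the generic safety-net
        quantizer was used instead."""
        x_sd = speaker_dep_quantize(x)
        x_gen = generic_quantize(x)
        err_sd = np.sum((x - x_sd) ** 2)    # squared-error distortion (assumed)
        err_gen = np.sum((x - x_gen) ** 2)
        return (1, x_sd) if err_sd <= err_gen else (0, x_gen)

    # Toy usage: rounding stands in for the speaker-dependent quantizer,
    # zeroing stands in for the generic fallback.
    x = np.array([0.2, 1.7, -0.4])
    flag, x_hat = safety_net_encode(x, np.round, lambda v: np.zeros_like(v))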


Year of Graduation: 2007