MPEG-4 Audio Transceiver
H.T. How, T.H. Liew and L. Hanzo
Dept. of Electr. and Comp. Sc.,Univ. of Southampton, SO17 1BJ, UK.
Tel: +44-1703-593 125, Fax: +44-1703-594 508
Email: [email protected],http://www-mobile.ecs.soton.ac.uk
Abstract|AnMPEG-4basedSpace-Time(ST)coded
Or-thogonal Frequency Division Multiplexing (OFDM) audio
transceiver is proposed. The high-quality MPEG-4 audio
codec is operated at bit rates between 16 and 64 kbit/s
perchannelandcombinedwithturbochannelcodesaswell
as space time codes. As expected, the space-time coding
scheme, using two transmitters and one receiver
outper-formed the conventional one transmitter, one receiver
ar-rangementbyabout4dBinchannelSNRterms,when
main-taininganerror-freeaudioquality.
I. Introduction
Themostsevere limitationsofwirelesscommunications
areimposedbythehostile,time-variantnatureofthe
wire-less channel. Hence a range of channel-qualitycontrolled
adaptive features havebeen incorporatedin a numberof
major cellular standards worldwide [1], including IS-95
CDMA, cdma2000andUMTS W-CDMA, IS-136TDMA,
theGeneral PacketRadioService ofGSM andin the
En-hancedDataratesforGlobalEvolution(EDGE)schemes.
Further near-instantaneouslyadaptivesystems havebeen
proposedin[2]-[4].
In this contribution the underlying trade-os of using
themulti-rateMPEG-4Transform-domainWeighted
Inter-leaved Vector Quantization (TWINVQ) audio codec [5],
in conjunction with turbo-coded and space-time coded,
adaptiveBinary PhaseShift Keying(BPSK),Quadrature
Phase Shift Keying (QPSK), 16-levelQuadrature
Ampli-tudeModulation(16QAM)assistedOrthogonalFrequency
Division Multiplex(OFDM)systemsareinvestigated.
Thepaperisstructuredasfollows. SectionIIprovidesa
briefsystemoverview,whiletheassociatedsystem
param-etersaretabulatedinTablesIandII. SectionIIIdescribes
the space-time coding scheme applied in our system.
Fi-nally, before concluding, our system performance results
aresummarisedinSectionV.
II. System Overview
Figure 1 shows the schematic of the turbo-coded and
space-time coded OFDMsystem[2]. Thesourcebits
gen-erated by the MPEG-4TWINVQ encoder [5] are passed
totheturboencoderusing thehalf-rate,constraintlength
threeturboconvolutionalencoderTC(2,1,3),employingan
octalgeneratorpolynomialof(7,5). Theencodedbitswere
channelinterleavedandpassedtothemodulator. Inorder
VTCFall,NewJersey,USA,2001
to counteract thenear instantaneouschannelquality
uc-tuations, burst-by-burst adaptive Quadrature Amplitude
Modulation (QAM) was combined with space-time
cod-ing [2]. Specically, the choice of the modulation scheme
tobeusedbythetransmitterforitsnextOFDMsymbolis
determinedbythechannelqualityestimateofthereceiver
basedon thecurrentOFDM symbol. Here,perfect
chan-nelqualityestimationandperfectsignallingoftherequired
modemmodeswereassumed. Inordertosimplifythetask
ofsignallingtherequiredmodulationmodes,weemployed
thesubband-adaptiveOFDM transmissionschemeof[14].
Morespecically,thetotalOFDM symbolbandwidthwas
dividedintoequi-widthsubbands,hostingagivennumber
ofsubcarriersandhavingasimilar channel quality,where
thesamemodem modewasassigned. Themodulated
sig-nals were then passed to the encoder of the space-time
blockcodeG
2
[6],[2],whichemploystwotransmittersand
onereceiver. Thespace-timeencodedsignalswereOFDM
modulatedandtransmittedbythecorrespondingantennas.
The received signals were OFDM demodulated and
passedtothespace-timedecoders. LogarithmicMaximum
A Posteriori(Log-MAP) decoding [7], [2] of the received
space-timesignalswasperformed,inordertoprovide
soft-outputs for the TC(2,1,3) turbo decoder. The received
bits were then channel deinterleaved and passed to the
TC decoder, which again, employs the Log-MAP
decod-ingalgorithm.Thedecodedbitswerenallypassedtothe
TWINVQ decoder for obtaining the reconstructed audio
signal.
TableIandIIgivesanoverviewoftheproposedsystem's
parameters. Thetransmissionparametershave been
par-tiallyharmonisedwiththoseoftheTDD-modeofthe
Pan-EuropeanUMTSsystem[8]. Thesamplingrateisassumed
to be1.3 MHz,leadingto a1024 subcarrierOFDM
sym-bol. Thechannelmodelusedwasthefour-pathCOST207
Typical Urban (TU) channel impulse response (CIR) [9],
whereeachimpulsewassubjectedtoindependentRayleigh
fadinghavinganormalisedDopplerfrequencyof2:2510 6
,
correspondingto apedestrianscenarioatawalkingspeed
of3mph.
Thechannelencoderisaconvolutionalconstituent
cod-ingbasedturboencoder[10],employingblockturbo
inter-leaversand a pseudo-random channel interleaver. Again,
theconstituentRecursiveSystematicConvolutional(RSC)
gen-Source
Encoder
Channel
Interleaver
Modulator
IFFT
IFFT
Channel
Source
Decoder
Turbo
Decoder
Channel
Deinterleaver
Demodulator
Space Time
Decoder
FFT
FFT
Encoder
Space Time
Turbo
Encoder
Fig.1. Schematicoverviewoftheturbo-codedandspace-timecodedOFDMsystem.
ChannelCodedBits/Packet 928 1856 3712
ChannelCodedBitRate(kbit/s) 39.9 79.9 159.9
SourceCodedBits/Packet 372 743 1486
SourceCodedBitRate(kbit/s) 16 32 64
ModulationMode BPSK QPSK 16QAM
MinimumChannelSNR for1%FER(dB) 4.3 7.2 12.4
MinimumChannelSNR for5%FER(dB) 2.7 5.8 10.6
TABLEI
GenericSystemParameters
erator polynomial of (7,5) [2]. Eight iterations are
per-formed at the decoder, utilising the MAP-algorithm and
the Log-Likelihood Ratio (LLR) soft inputs provided by
thedemodulator[2].
TheMPEG-4audioschemeisthemostrecent
compres-sion standard completed in 1999 [5], which supports the
encoding of speech signals at bit rates from 2 kbit/s to
24 kbit/s. It is alsocapable of encoding general audioat
ratesrangingfromverylowbitratestohighrates.
Specif-ically,audiocodingissupportedatarateof8kbit/sanda
bandwidthof4kHz,butthestandardalsoincludes
broad-castqualityaudiomonauralaswellasmultichannel
stereo-phoniccongurations[5]. TheTWINVQscheme[5],[11]is
oneoftheencodingtoolsdesignedforcodingaudiosignals
in MPEG-4, which was standardised for encoding audio
signals at extremely low bit rates. Hence the MPEG-4
TWINVQ[11], [12] codec has beenemployed in the
pro-posedsystem,whichcanbeprogrammedtooperateatthe
bitratesbetween16and64kbit/s,asshowninTableI.
III. Space-Time Coding
Traditionally,themosteectivetechniqueofcombating
fading has been the exploitation of diversity [13].
Diver-sitytechniques canbedividedintothreebroadcategories,
namelytemporaldiversity, frequencydiversityandspatial
diversity. Temporal and frequencydiversity introduce
re-dundancyinthetimeand/orfrequencydomain,which
re-sults in a loss of bandwidth eÆciency. Examples of
spa-tialdiversityare constitutedbymultiple transmit-and/or
receive-antenna based systems. Transmit-antenna
diver-sity relies on employing multiple antennas at the
trans-mitterandhenceitissuitablefordownlinktransmissions,
sincehavingmultipletransmitantennasatthebasestation
is certainly feasible. By contrast, receive-antenna
diver-SystemParameters Value
CarrierFrequency 1.9GHz
SamplingRate 3.78MHz
Channel
ImpulseResponse COST207
NormalisedDoppler Frequency 2:2510 6
OFDM
NumberofSubcarriers 1024
OFDMSymbols/Packet 1
OFDMSymbolDuration (1024+64)
x1/(3:7810 6
)
GuardPeriod 64samples
ModulationScheme BPSK/QPSK/16QAM
Space TimeCoding
Numberoftransmitters 2
Numberofreceivers 1
Channel Coding TurboConvolutional
ConstraintLength 3
CodeRate 0.5
GeneratorPolynomials 7,5
TurboInterleaverLength 464/928/1856/2784
DecodingAlgorithm LogMAP
NumberofIterations 8
Source Coding MPEG-4TWINVQ
BitRates(kbit/s) 16-64
AudioFrameLength(ms) 23.22
SamplingRate(kHz) 44.1
TABLEII
1 Tx 1 Rx
0
128
256
384
512
Subcarrier
index
0
25
50
75 100
125
150
Transmission
frame
(Time)
4
5
6
7
8
9
10
11
12
13
14
Instantenous
SNR
(dB)
2 Tx 1 Rx
0
128
256
384
512
Subcarrier
index
0
25
50
75 100
125
150
Transmission
frame
(Time)
4
5
6
7
8
9
10
11
12
13
14
Instantaneous
SNR
(dB)
Fig.2. InstantaneouschannelSNRof512-subcarrierOFDMsymbolsfortheone-transmitter,one-receiver(1Tx1Rx)andforthespace-time
blockcodedtwo-transmitter,one-receiver(2Tx1Rx)scenariosfortransmissionovertheCOST207TUchannelatanaverageSNRof10
dB.
sityemploysmultipleantennasatthereceiverforacquiring
multiple copies ofthe transmittedsignals,which are then
combinedinorderto mitigatethechannel-inducedfading.
Spacetimecoding[13],[6],[2]isaspecicformof
trans-mit antenna diversity, which aims for usefully exploiting
the multipath phenomenon experienced by signals
prop-agating through the dispersive mobile channel. This is
achievedbycombiningmultipletransmissionantennaswith
appropriate signal processing at the receiver, in order to
provide diversity and coding gain over uncoded
single-antennascenarios.
In this system, we employ a two-transmitter and
one-receiver conguration, in conjunction with turbo coding
[10]. InFigure2,weshowtheinstantaneouschannelSNR
experiencedbythe512-subcarrierOFDMmodemfora
one-transmitter, one-receiver scheme and for the space time
blockcodeG
2
[6]using twotransmittersand onereceiver
for transmission over the COST207 Typical Urban (TU)
channel. TheaveragechannelSNRwas10dB.Wecansee
inFigure2thatthevariationoftheinstantaneouschannel
SNR for aone-transmitter, one-receiverscheme is severe.
TheinstantaneouschannelSNRmaybecomeaslowas4dB
duetothedeepfadesinictedbythechannel. Ontheother
hand,wecanseethatforthespace-timeblockcodeG
2
us-ingonereceiverthevariationoftheinstantaneouschannel
SNRislesssevere. Explicitly,byemployingmultiple
trans-mitantennasinFigure2,wehavesignicantlyreducedthe
eect of thechannels' deepfades. Whilstspace-time
cod-ing endeavours to mitigate the fading-related time- and
frequency-domain channel-qualityuctuationsat the cost
ofincreasingthetransmitter'scomplexity,adaptive
modu-lationattemptstoaccommodatethesechannelquality
uc-tuationsbyappropriatelycontrollingtheAQAMmodes,as
itwillbeoutlinedinthenextsection.
IV. Adaptive Modulation
In order to accommodate the time- and
frequency-domain channel quality variations, a multi-mode system
is desirable, which allows us to switch between a set of
dierent source- and channel coders as well as
transmis-sion parameters,depending on the instantaneous channel
quality[14],[2],[3],[4].
In the proposed system, wehave dened three
operat-ing modes, which correspond to the uncoded audio bit
rates of 16, 32 and 64 kbit/s. This corresponds to 372,
743and 1486 bitsper23.22 ms audio frame. In
conjunc-tion with half-rate channel coding and also allowing for
checksumsandsignallingoverheads,thenumberof
trans-mitted turbo coded bits per OFDM symbol is 928, 1856
and 3712 for the three dierent source-coded modes,
re-spectively. Again, these bit ratesare also summarised in
Table I. Each transmission mode uses a dierent
mod-ulation scheme, depending on the instantaneous channel
conditions. Itis benecial,if thetransceivercandrop its
source rate, for example from 64 kbit/s to 32 kbit/sand
invokeQPSK modulationinsteadof 16QAM,while
main-tainingthesamebandwidth. Hence,during good channel
conditions,thehigherthroughput,higheraudioqualitybut
less robustmode of operations canbe invoked, while the
morerobustbutloweraudioqualityBPSK/16kbit/smode
canbeappliedduringdegradingchannelconditions.
Figure 3 shows the FER observed for all three audio
codingmodes versusthechannel BERthat waspredicted
bythereceiverduringthe channelqualityestimation
pro-cess. TherationalebehindusingtheFER,ratherthanthe
BER for estimating the expected channel quality of the
nexttransmitted OFDM symbol isthat the MPEG-4
au-dio codechas to dropthe specic turbo-decoded received
OFDMsymbols,whichcontainedtransmissionerrors. This
is because corrupted audio packets would result in
0.0
0.05
0.1
0.15
0.2
Channel BER
10
-4
2
5
10
-3
2
5
10
-2
2
5
10
-1
2
5
10
0
FER
64kbit/s mode
32kbit/s mode
16kbit/s mode
Fig.3. FERagainstchannelBERoftheOFDM/G2/TC(2,1,3)
sys-temfortransmissionovertheCOST207TUchannel.
0
2
4
6
8
10
12
14
16
18
20
SNR (dB)
0.0
0.5
1.0
1.5
2.0
2.5
3.0
BPS
FER=10
-2
FER=10
-3
Fixed
AQAM
Fig.4. BPSperformancecomparisonbetweentheadaptiveandxed
modulationschemes.
wasobservedforanestimatedchannelerrorrateofabout
4%forthe16and32kbit/smodes,whileaBERofover5%
wastolerable for the 64 kbit/s mode. This was, because
the numberofbits perOFDM symbol wasquadrupledin
the 16QAM mode compared to the BPSK mode, which
substantiallyincreasedtheturbocodec'sperformancedue
to beneting from alonger interleaverwithout increasing
theaudiosystem'sdelay.
In Figure 4, we show the Bits Per Symbol (BPS)
throughputperformancecomparisonbetweenthe
subband-adaptive and xed OFDM modulation schemes. From
the gure we can see that at a low BPS throughput the
adaptive modulation scheme outperforms the xed
mod-ulation scheme. However, as the BPS throughput of the
system increases, the xed modulation schemes are
pre-ferred. This is, becauseadaptivemodulation is
advanta-geous, when there are high channel quality variations, as
in the one-transmitter,onereceiverscheme. However, we
hasbeensignicantlyreducedbyemployingtwoG
2
space-time transmitters. Therefore, the advantages of adaptive
modulationerodedduetothereducedchannelquality
vari-ations experienced inthe contextof thespace-time coded
system. Twodierent-complexitysystemdesignprinciples
can be proposed as a consequence. The rst system is
the one-transmitter, one receiver scheme (1T-1R), which
mitigates the severe variation of the channel quality by
employingadaptive modulation. Bycontrast, wecan
de-signamorecomplex(2T-2R)space-timesystem,which
em-ploysxedmodulationschemes,sincenosubstantial
bene-tsaccruefrom employingadaptivemodulation, once the
fading-inducedchannel-qualityuctuations havebeen
suf-cientlymitigatedbytheG
2
space-timecode. Intherest
of this contribution, we have opted for investigating the
performanceofthespace-time coded system,requiringan
increased complexity in conjunction with dierent
modu-lationmodes,namelyBPSK,QPSKand16QAM.
V. System Performance
Asmentionedbefore,thedetailedsubsystemparameters
used in ourspace-time coded OFDM systemare listed in
TableII. Again, the channel prole used was the COST
207TUchannel[9]havingfourpathsandamaximum
dis-persionof4:5s,whereeachpathwasfadedindependently
ataDopplerfrequencyof2:2510 6
Hz.
Wefoundthattheassociatedaudioqualityexpressedin
termsof the Segmental SNR (SEGSNR) degradationwas
deemed to be perceptually objectionable for frame error
ratesin excess of5%. For thesakeof maintainingahigh
audioquality,amaximumtargetFERof1%wasstipulated
based on our informal listening tests. Again, corrupted
framesweredroppedand thepreviousuncorruptedframe
wasusedforreplacingthedroppedframe.
TheBERisplottedversusthechannelSNR inFigure5
forthethreedierentmodesofoperation. Theresults
us-ingtheconventionalone-transmitter,one-receiverscenario
were also shown in conjunction with the 1, 2and 4BPS
xed modem modes as abenchmarker. The employment
of space time coding improved the system's performance
signicantly, givingan approximately3 dB channel SNR
improvement at a FER of 1%. As expected, the lowest
throughput BPSK/16kbit/s mode was morerobust, than
the QPSK/32kbit/s and the 16QAM/64kbit/s
congura-tions,albeitdeliveringaloweraudioquality.
The overall SEGSNR versuschannel SNR performance
oftheproposed audiotransceiveris displayedinFigure 6,
again,employingG
2
space-timecodingusingtwo
transmit-tersandonereceiver. Thebenchmarkerusingthe
conven-tional one-transmitter,one-receiverschemewasalso
char-acterised in the gure. Weobserve that the employment
of space time coding provides a substantial improvement
in terms of maintaining an error free audio performance.
Specically,anSNR advantageof4dBwasrecorded
com-pared to the conventional one-transmitter, one-receiver
benchmarker for all three modulation modes.
0
2
4
6
8
10
12
14
16
18
Channel SNR (dB)
10
-4
2
5
10
-3
2
5
10
-2
2
5
10
-1
2
5
10
0
BER
Conventional 1T-1R
Space Time 2T-1R
16QAM/64kbit/s
QPSK/32kbit/s
BPSK/16kbit/s
Fig. 5. BER versus Channel SNR performance of the
G
2
/OFDM/TC(2,1,3)scheme(2T-2R)fortransmissionoverthe
COST207 TUchannel, incomparison to the conventional
one-transmitter,one-receiver(1T-1R)benchmarker.
0
2
4
6
8
10
12
14
16
Channel SNR(dB)
6
8
10
12
14
16
18
20
22
SEGSNR
(dB)
Space Time 2T-1R
Conventional 1T-1R
16QAM/64kbit/s
QPSK/32kbit/s
BPSK/16kbit/s
Fig. 6. SEGSNR versus channel SNR performance of the
G2/OFDM/TC(2,1,3)scheme(2T-2R)fortransmissionoverthe
COST207 TUchannel, incomparison to the conventional
one-transmitter,one-receiver(1T-1R)benchmarker.
space-time coding, namely onthe curvesdrawnin dotted
lines, the 16QAM/64kbit/s mode was shown to
outper-form theQPSK/32kbit/s scheme in terms of both
objec-tiveand subjectiveaudio quality forchannel SNRs in
ex-cess of about 10 dB. At a channel SNR of about 9 dB,
where the16QAMandQPSKSEGSNRcurvescrosseach
other in Figure6, itis preferableto invoketheinherently
loweraudioquality,butunimpairedQPSKmodeof
opera-tion. Similarly, ataround5dB whentheQPSK/32kbit/s
scheme's performance startsto degrade,itis better to
in-vokethe unimpairedBPSK/ 16kbit/s mode ofoperation,
in ordertoavoidthechannel-inducedaudioartifacts.
Inconclusion,aturbocodedandspace-timecoded
adap-tiveversusxedmodulationbasedOFDMassisted
MPEG-4 audio system has been investigated. The space-time
coded, two-transmitter, one-receiver conguration
out-performed the conventional one-transmitter, one-receiver
scheme by about 4dB in channel SNR terms, when
com-municatingoverthehighly dispersiveCOST207TU
chan-nel. Inourfuture work, unequalerrorprotectionassisted
schemeswill bebenchmarkedagainsttheproposedpacket
droppingregimeofthis contribution.
References
[1] S.Nanda,K.Balachandran, andS.Kumar,\Adaptation
tech-niquesinwirelesspacketdataservices,"IEEECommun.Mag.,
pp.54{64,Jan.2000.
[2] L.Hanzo, T.H.Liew,B.L.Yeap: TurboCoding,Turbo
Equal-isation and Space-Time Coding for Transmission over Fading
Channels,JohnWiley,toappearin2001
[3] L.Hanzo,F.C.A.Somerville,J.P.Woodard:VoiceCompression
andCommunications:PrinciplesandApplicationsforFixedand
WirelessChannels;IEEEPress-JohnWiley, 1
[4] L.Hanzo, P.Cherriman, J. Streit: WirelessVideo
Communi-cations: SecondtoThirdGenerationandBeyond,IEEE Press,
2001 2
[5] ISO/IEC JTC1/SC29/WG11/N2203, MPEG-4 Audio
Ver-sion 1 Final Committee Draft 14496-3, http://www.tnt.
uni-hannover.de/projec t/m peg/ aud io/d ocu ment s/, March
1998.
[6] S.Alamouti,\Asimpletransmitdiversitytechniqueforwireless
communications,"IEEEJournalonSelected Areasin
Commu-nications,vol.16,pp.1451{1458,October1998.
[7] L.Bahl,J.Cocke,F.Jelinek,andJ.Raviv,\Optimaldecoding
oflinearcodesforminimizingsymbolerrorrate,"IEEE
Trans-actions on Information Theory, vol. 20, pp. 284{287, March
1974.
[8] H. Holmaand A.Toskala,WCDMAfor UMTS. John
Wiley-IEEEPress,April2000.
[9] M. Failli, \Digital land mobile radio communications COST
207", Technical Report, European Commission, Luxembourg,
1989.
[10] C.Berrou,A.Glavieux,andP.Thitimajshima,\NearShannon
limit error-correcting coding and decoding: Turbo codes," in
ProceedingsofICC,pp.1064{1070,1993.
[11] N.Iwakami,T.Moriya,andS.Miki,\High-qualityaudio
cod-ingatlessthan 64kbit/sbyusingtransform-domainweighted
interleave vector quantization (TwinVQ)," in Proceedings of
ICASSP,pp.3095{3098,May1995.
[12] T.Moriya,N.Iwakami,A.Jin,K.Ikeda, and S.Miki, \A
de-sign of transform coder for both speech and audio signals at
1 bit/sample,"inProceedings of ICASSP,pp.1371{1374, Apr
1997.
[13] V.Tarokh,N.Seshadri,andA.Calderbank,\Space-timecodes
for highdata ratewirelesscommunication: Performance
crite-rionandcodeconstruction,"IEEETransactionsonInformation
Theory,vol.44,pp.744{765,March1998.
[14] L.Hanzo,W.T.Webband T.Keller,Single and Multicarrier
Quadrature Amplitude Modulation : Principles and
Applica-tionsforPersonalCommunications,WLANsandBroadcasting
(2ndEd.). JohnWiley-IEEEPress,April2000.
[15] L.HanzoandJ.P.Woodard,\AnIntelligentMultimodeVoice
CommunicationsSystemForIndoors Communications,"IEEE
TransactionsonVehicularTechnology,vol.44,pp.735{749,Nov
1995.
[16] T.Keller,M.Muenster,andL.Hanzo,\ATurbo-Coded
Burst-By-BurstAdaptiveWidebandSpeechTransceiver,"IEEE
Jour-nal on Selected Areas in Communications, vol. 18, pp. 2363{
2372,Nov2000.
1
For detailed contents please refer to
http://www-mobile.ecs.soton.ac.uk
2
For detailed contents please refer to