End2End sound localization model

Reference:

P. Vecchiotti, N. Ma, S. Squartini, and G. J. Brown, “End-to-end binaural sound localisation from the raw waveform,” in 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 451–455.

Only the WaveLoc-GTF variant is implemented.

Model

Training

Dataset

BRIR

The Surrey binaural room impulse response (BRIR) database is used, which includes an anechoic room and 4 reverberant rooms.

Room	A	B	C	D
RT_60 (s)	0.32	0.47	0.68	0.89
DRR (dB)	6.09	5.31	8.82	6.12

Sound source

TIMIT database

Sentences per azimuth

Train	Validate	Evaluate
24	6	15
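A binaural training signal is commonly produced by convolving a mono utterance with the left and right channels of a BRIR. The sketch below illustrates that idea with NumPy only. It is an assumption about how such a dataset can be generated, not a copy of the scripts in gen_dataset; the function name `spatialize` and the array layouts are hypothetical.

```python
import numpy as np


def spatialize(mono, brir):
    """Convolve a mono signal with a two-channel BRIR.

    mono: 1-D array of samples (e.g. one TIMIT sentence).
    brir: array of shape [n_taps, 2], columns are left and right ears
          (hypothetical layout, chosen for this sketch).
    Returns an array of shape [n_samples + n_taps - 1, 2].
    """
    mono = np.asarray(mono, dtype=float)
    brir = np.asarray(brir, dtype=float)
    left = np.convolve(mono, brir[:, 0])   # full linear convolution, left ear
    right = np.convolve(mono, brir[:, 1])  # full linear convolution, right ear
    return np.stack([left, right], axis=1)
```

For long utterances and long impulse responses, an FFT-based convolution would be faster; `np.convolve` is used here only to keep the example dependency-free.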

Multi-conditional training (MCT)

For each reverberant room, the other 3 reverberant rooms and the anechoic room are used for training.
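The room split described above can be sketched as a simple leave-one-room-out loop. This is a minimal illustration, not the code in train_mct.py; the label `"Anechoic"` and the function name `mct_split` are hypothetical.

```python
# Room labels follow the table in the Dataset section.
# "Anechoic" is a hypothetical label for the anechoic room.
REVERB_ROOMS = ["A", "B", "C", "D"]
ANECHOIC = "Anechoic"


def mct_split(test_room):
    """Return (train_rooms, test_room) for one multi-conditional training fold."""
    if test_room not in REVERB_ROOMS:
        raise ValueError(f"unknown reverberant room: {test_room!r}")
    # Train on the anechoic room plus every reverberant room except the held-out one.
    train_rooms = [ANECHOIC] + [r for r in REVERB_ROOMS if r != test_room]
    return train_rooms, test_room


for room in REVERB_ROOMS:
    train_rooms, held_out = mct_split(room)
    print(f"evaluate on {held_out}: train on {train_rooms}")
```

Each model therefore never sees the room it is evaluated on during training.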

Training curves

Evaluation

Root mean square error (RMSE) is used as the performance metric. For each reverberant room, the evaluation was run 3 times, with the test dataset regenerated each time, to obtain more stable results.
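As a rough sketch, the metric can be computed as follows. How the 3 runs are combined is not stated here, so averaging the per-run RMSE is an assumption; the function names are hypothetical and do not come from evaluate_mct.py.

```python
import numpy as np


def rmse(est_azi, true_azi):
    """Root mean square error between estimated and true azimuths."""
    est = np.asarray(est_azi, dtype=float)
    true = np.asarray(true_azi, dtype=float)
    return float(np.sqrt(np.mean((est - true) ** 2)))


def mean_rmse_over_runs(runs):
    """Average the RMSE over several evaluation runs.

    runs: iterable of (estimated, true) pairs, one pair per run.
    Assumption: the runs are combined by a plain mean of per-run RMSE.
    """
    return float(np.mean([rmse(est, true) for est, true in runs]))
```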

Since the binaural signal is fed directly to the model without extra preprocessing, and speech may contain short pulses, localization results are reported per chunk rather than per frame. Each chunk consists of 25 consecutive frames.
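One way to turn frame-level outputs into chunk-level estimates is sketched below. The chunk size of 25 comes from the text; the aggregation rule (averaging the frame outputs within a chunk, then taking the argmax) and dropping an incomplete final chunk are assumptions made for this example, and `chunk_azimuth_estimates` is a hypothetical name.

```python
import numpy as np

CHUNK_SIZE = 25  # frames per chunk, as stated above


def chunk_azimuth_estimates(frame_probs, chunk_size=CHUNK_SIZE):
    """Aggregate per-frame outputs into one azimuth class index per chunk.

    frame_probs: array of shape [n_frames, n_azimuths] with per-frame scores.
    Returns an integer array of shape [n_chunks].
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    n_frames, n_azimuths = frame_probs.shape
    n_chunks = n_frames // chunk_size  # assumption: drop the incomplete tail
    trimmed = frame_probs[: n_chunks * chunk_size]
    chunks = trimmed.reshape(n_chunks, chunk_size, n_azimuths)
    # Assumption: average the frame outputs in each chunk, then pick the best class.
    return chunks.mean(axis=1).argmax(axis=1)
```

Averaging over a chunk means that a single frame dominated by a short pulse cannot change the reported azimuth on its own.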

My result vs. paper

Reverberant room	A	B	C	D
My result	1.5	2.0	1.4	2.7
Result in paper	1.5	3.0	1.7	3.5

Main dependencies

tensorflow-1.14
pysofa (can be installed by pip)
BasicTools (in my other repository)
