IWANO Koji | Tokyo City University Researchers Information System

IWANO Koji

Books etc
No.	Title	Author Type	Publisher	Publication date	Range	ISBN
1	DSP for In-Vehicle and Mobile Systems	Contributor	Springer-Verlag	Jan. 2005	Chapter 9, pp.139-152, Noise Robust Speech Recognition Using Prosodic Information	0387229787
2	Spoken Multimodal Human-Computer Dialogue in Mobile Environments	Contributor	Springer-Verlag	Jan. 2005	Chapter 3, pp.37-53, A Robust Multimodal Speech Recognition Method Using Optical Flow Analysis	1402030738
3	Text to Speech Synthesis - New Paradigms and Advances -	Contributor	Prentice Hall PTR	Jul. 2004	Chapter 8, pp.155-173, Prosody Control for HMM-Based Japanese TTS	013145661X

Published Papers
No.	Title	Journal	Vol	No	Start Page	End Page	Publication date	DOI	Referee
1	Multimodal Speech Recognition Using Mouth Images from Depth Camera	Proc. APSIPA			1233	1236	Dec. 2017	https://doi.org/10.1109/APSIPA.2017.8282227	Refereed
2	Error correction using long context match for smartphone speech recognition	IEICE Transactions on Information and Systems	E98D	11	1932	1942	Nov. 1, 2015	https://doi.org/10.1587/transinf.2015EDP7179	Refereed
3	Error Correction Using Long Context Match for Smartphone Speech Recognition	IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS	E98D	11	1932	1942	Nov. 2015	https://doi.org/10.1587/transinf.2015EDP7179	Refereed
4	AN EFFICIENT ERROR CORRECTION INTERFACE FOR SPEECH RECOGNITION ON MOBILE TOUCHSCREEN DEVICES	2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014			454	459	2014		Refereed
5	Simple Gesture-based Error Correction Interface for Smartphone Speech Recognition	15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4			1194	1198	2014		Refereed
6	Feature normalization based on non-extensive statistics for speech recognition	Speech Communication	55	5	587	599	Jun. 2013	https://doi.org/10.1016/j.specom.2013.02.004	Refereed
7	A noise-robust speech recognition approach incorporating normalized speech/non-speech likelihood into hypothesis scores	SPEECH COMMUNICATION	55	2	377	386	Feb. 2013	https://doi.org/10.1016/j.specom.2012.10.001	Refereed
8	Detection of overlapped speech using lapel microphones in meeting	Speech Communication	55	10	941	949	2013	https://doi.org/10.1016/j.specom.2013.06.013	Refereed
9	Spectral subtraction based on non-extensive statistics for speech recognition	IEICE Transactions on Information and Systems	E96-D	8	1774	1782	2013	https://doi.org/10.1587/transinf.E96.D.1774	Refereed
10	Q-Gaussian based spectral subtraction for robust speech recognition	13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3			1254	1257	2012		Refereed
11	Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity	13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3			1498	1501	2012		Refereed
12	VAD-measure-embedded Decoder with Online Model Adaptation	11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4			3122	+	2010		Refereed
13	Robust Speech Recognition Using VAD-measure-embedded Decoder	INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5			2203	+	2009		Refereed
14	GENERALIZATION OF SPECIALIZED ON-THE-FLY COMPOSITION	2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS			4317	4320	2009		Refereed
15	Optimization of On-the-Fly Composition for WFST-Based Speech Recognition Decoders	The IEICE Transactions on Information and Systems	J92-D	7	1026	1035	2009		Not refereed
16	Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance	COMPUTER SPEECH AND LANGUAGE	22	2	171	184	Apr. 2008	https://doi.org/10.1016/j.csl.2007.07.003	Refereed
17	Evaluation of a noise-robust multi-stream speaker verification method using F(0) information	IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS	E91D	3	549	557	Mar. 2008	https://doi.org/10.1093/ietisy/e91-d.3.549	Refereed
18	Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages	EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING	2008	Article ID 573832	7 pages		2008	https://doi.org/10.1155/2008/573832	Refereed
19	Thai Broadcast News Corpus Construction and Evaluation	SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008			1249	1254	2008		Refereed
20	Implementation and Evaluation of Fast On-the-fly WFST Composition Algorithms	INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5			2110	2113	2008		Refereed
21	Combining Gaussian mixture model with Global Variance term to improve the quality of an HMM-based polyglot speech synthesizer	2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3			1241	+	2007		Refereed
22	Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images	EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING	2007	Article ID 64506	9 pages		2007	https://doi.org/10.1155/2007/64506	Refereed
23	The effect of spectral space reduction in spontaneous speech on recognition performances	2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3			473	+	2007		Refereed
24	Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition	INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4			89	92	2007		Refereed
25	Presentation-Content Retrieval Integrated with the Speech Information	IEICE Transactions on Information and Systems	J90-D	2	209	222	2007		Not refereed
26	New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer	SPEECH COMMUNICATION	48	10	1227	1242	Oct. 2006	https://doi.org/10.1016/j.specom.2006.05.003	Refereed
27	Sentence-extractive automatic speech summarization and evaluation techniques	SPEECH COMMUNICATION	48	9	1151	1161	Sep. 2006	https://doi.org/10.1016/j.specom.2006.04.005	Refereed
28	A stream-weight and threshold estimation method using adaboost for multi-stream speaker verification	2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13			5939	5942	2006		Refereed
29	A Weight Estimation Method Using LDA for Multi-Band Speech Recognition	INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5			2534	2537	2006		Refereed
30	A stream-weight and threshold estimation method using Adaboost for multi-stream speaker verification	2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS			1081	+	2006		Refereed
31	Analysis and recognition of spontaneous speech using Corpus of Spontaneous Japanese	SPEECH COMMUNICATION	47	1-2	208	219	Sep. 2005	https://doi.org/10.1016/j.specom.2005.02.010	Refereed
32	Sentence extraction-based presentation summarization techniques and evaluation metrics	2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5			1065	1068	2005		Refereed
33	A ROBUST MULTIMODAL SPEECH RECOGNITION METHOD USING OPTICAL FLOW ANALYSIS	SPOKEN MULTIMODAL HUMAN-COMPUTER DIALOGUE IN MOBILE ENVIRONMENTS	28		37	53	2005		Refereed
34	Why is the recognition of spontaneous speech so hard?	TEXT, SPEECH AND DIALOGUE, PROCEEDINGS	3658		9	22	2005		Refereed
35	A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization	2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5			469	472	2005		Refereed
36	Polyglot synthesis using a mixture of monolingual corpora	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings	I		I1	I4	2005	https://doi.org/10.1109/ICASSP.2005.1415035	Refereed
37	Noise robust speech recognition using F-0 contour information	IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS	E87D	5	1102	1109	May. 2004		Refereed
38	Multi-modal speech recognition using optical-flow analysis for lip images	JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY	36	2-3	117	124	Feb. 2004		Refereed
39	A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs	2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS			857	860	2004		Refereed
40	Unsupervised class-based language model adaptation for spontaneous speech recognition	2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS			236	239	2003		Refereed
41	Parallel computing-based architecture for mixed-initiative spoken dialogue	FOURTH IEEE INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, PROCEEDINGS			53	58	2002		Refereed
42	Ubiquitous speech processing	2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS			13	16	2001		Refereed
43	Detection of prosodic word boundaries by statistical modeling of mora transitions of fundamental frequency contours and its use for continuous speech recognition	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings	3		1763	1766	2000	https://doi.org/10.1109/ICASSP.2000.862094	Refereed
44	Integration of Prosodic Word Boundary Detection to Unlimited-Vocabulary Speech Recognition	IEICE Transactions on Information and Systems	J83-D-II	10	1977	1985	2000		Not refereed
45	A Statistical Modeling of Fundamental Frequency Contours in Moraic Unit and Its Use for the Detection of Prosodic Word Boundaries	IPSJ Journal	40	4	1356	1364	1999		Not refereed

MISC
No.	Title	Journal	Vol	No	Start Page	End Page	Publication date
1	Analysis of effects of voice mimicry on speaker verification and acoustic features of the imitated voices	IEICE technical report. Speech	114	411	43	48	Jan. 22, 2015
2	Error Correction Using Long Context Match for Smartphone Speech Recognition	IEICE technical report. Speech	114	365	117	122	Dec. 15, 2014
3	Error Correction Using Long Context Match for Smartphone Speech Recognition	IPSJ SIG Notes	2014	22	1	6	Dec. 8, 2014
4	Detecting Overlapped Speech in Meeting Recorded by Lapel Microphones		2012	6	1	6	Jul. 12, 2012
5	Two-pass Approach for Recognizing Code-Switching Speech	Technical report of IEICE. PRMU	111	430	225	229	Feb. 2, 2012
6	Nonlinear Normalization Using q-Logarithm for Robust Speech Recognition	IEICE technical report	111	153	45	50	Jul. 14, 2011
7	Noise-robust speech recognition decoder using speech/non-speech confidence measures	IEICE technical report	110	81	49	54	Jun. 10, 2010
8	A Prosody Adaptation Method for HMM-based Speech Synthesis Achieving High Naturalness and Individuality		2010	12	1	6	Feb. 5, 2010
9	A mean F_0 speaker adaptation method for regression model-based F_0 contour generation	IEICE technical report	109	99	87	92	Jun. 17, 2009
10	A study on prosody control for spontaneous speech synthesis		2009	23	1	8	May. 14, 2009
11	Speeding up fundamental frequency information extraction by Hough transform for noise-robust speech recognition	IEICE technical report	108	422	19	24	Jan. 22, 2009
12	Improvements and evaluations of on-the-fly WFST composition in speech recognition	IPSJ SIG Notes	2008	102	29	34	Oct. 17, 2008
13	Accent analysis for Mandarin large vocabulary continuous speech recognition		38	2	123	127	Mar. 20, 2008
14	Initial Evaluation of the Drivers' Japanese Speech Corpus in a Car Environment	IEICE technical report	107	551	93	98	Mar. 13, 2008
15	Accent Analysis for Mandarin Large Vocabulary Continuous Speech Recognition	IEICE technical report	107	551	87	91	Mar. 13, 2008
16	Speaker verification using multi-stream HMMs with dimensionally weighted feature vectors	IEICE technical report. Speech	107	406	43	47	Dec. 13, 2007
17	A Study on the Statistical Models for HMM-Based Spontaneous Speech Synthesis	IEICE technical report	107	77	13	18	May. 24, 2007
18	A Study on Multimodal Speech Recognition for Spoken Dialogue Systems	IEICE technical report	107	77	19	24	May. 24, 2007
19	Using presentation slide information for lecture speech recognition	IPSJ SIG Notes	2006	136	221	226	Dec. 22, 2006
20	Using presentation slide information for lecture speech recognition		106	442	43	48	Dec. 15, 2006
21	The Analysis of Acoustic and Linguistic Characteristics in Spontaneous Japanese	IEICE technical report	106	78	19	24	May. 19, 2006
22	An LDA-based Weight Estimation Method for Multi-Band Speech Recognition	IEICE technical report	106	78	13	18	May. 19, 2006
23	HMM-based speaker adaptable polyglot synthesizer : Development and evaluation	IEICE technical report	105	494	127	132	Dec. 22, 2005
24	Spoken dialogue system robust against speech variations based on massively parallel computing	IEICE technical report	105	494	1	6	Dec. 22, 2005
25	Spoken dialogue system robust against speech variations based on massively parallel computing	IPSJ SIG Notes	2005	127	91	96	Dec. 22, 2005
26	HMM-based speaker adaptable polyglot synthesizer : Development and evaluation	IPSJ SIG Notes	2005	127	217	222	Dec. 22, 2005
27	A threshold optimization method based on Adaboost for multi-stream speaker verification	IPSJ SIG Notes	2005	127	1	6	Dec. 21, 2005
28	A threshold optimization method based on Adaboost for multi-stream speaker verification	IEICE technical report	105	495	1	6	Dec. 21, 2005
29	Sentence Extraction-Based Speech Summarization Methods and Objective Evaluation Techniques	IEICE technical report. Speech	105	132	1	6	Jun. 16, 2005
30	Language Model Adaptation for ASR Using Machine-Translated Data	IEICE technical report. Speech	105	132	19	23	Jun. 16, 2005
31	Toward realization of HMM-based spontaneous speech synthesis	IEICE technical report. Speech	105	98	25	30	May. 20, 2005
32	A study on automatic lecture segmentation for indexing purposes		2005	1	7	8	Mar. 8, 2005
33	Addition of new languages to a polyglot HMM-based synthesizer		2005	1	197	198	Mar. 8, 2005
34	Evaluation of speech summarization techniques using objective metrics		2005	1	3	4	Mar. 8, 2005
35	Analysis of cepstral features of Japanese spontaneous speech using Mahalanobis distance		2005	1	231	232	Mar. 8, 2005
36	A stream-weight optimization method for audio-visual speech recognition in real environments	IPSJ SIG Notes	2005	12	29	34	Feb. 4, 2005
37	A stream-weight optimization method based on boosting for multi-stream speaker verification	IEICE technical report. Natural language understanding and models of communication	104	539	85	90	Dec. 21, 2004
38	A stream-weight optimization method based on boosting for multi-stream speaker verification	IPSJ SIG Notes	2004	131	175	180	Dec. 21, 2004
39	Analysis of acoustic characteristics in spontaneous speech using Corpus of Spontaneous Japanese	IPSJ SIG Notes	2004	103	7	12	Oct. 22, 2004
40	Use of F0 information for noise-robust speaker verification	IPSJ SIG Notes	2004	57	31	36	May. 28, 2004
41	Use of F_0 information for noise-robust speaker verification	IEICE technical report. Speech	104	87	1	6	May. 21, 2004
42	Investigation of a stream-weight optimization method for multi-modal speech recognition	IPSJ SIG Notes	2003	124	241	246	Dec. 18, 2003
43	Noise-robust speech recognition using band-dependent weighted likelihood	IPSJ SIG Notes	2003	124	19	24	Dec. 18, 2003
44	Investigation of a stream-weight optimization method of multi-modal speech recognition	IEICE technical report. Natural language understanding and models of communication	103	517	241	246	Dec. 11, 2003
45	Noise-robust speech recognition using band-dependent weighted likelihood	IEICE technical report. Natural language understanding and models of communication	103	517	19	24	Dec. 11, 2003
46	Multi-Modal Person Authentication Using Speech and Ear Images	IEICE technical report. Speech	103	94	25	30	May. 30, 2003
47	A Multi-Modal Speech Recognition Using Side-Face Images	IPSJ SIG Notes	2003	58	61	66	May. 27, 2003
48	Use of Prosodic Information for Noise-Robust Speech Recognition	IPSJ SIG Notes	2003	58	55	60	May. 27, 2003
49	A Rapid Listening System for Presentations Using Automatic Speech Summarization Techniques	IPSJ SIG Notes	2003	57	83	88	May. 26, 2003
50	Improvement of visual features for multi-modal speech recognition		2003	1	195	196	Mar. 18, 2003
51	Multi-modal speaker verification using speech and face images		2003	1	107	108	Mar. 18, 2003
52	Improving naturalness using residual excitation for HMM-based speech synthesis		2003	1	241	242	Mar. 18, 2003
53	Multi-modal speaker verification using speech and ear images		2003	1	109	110	Mar. 18, 2003
54	Unsupervised batch-type topic adaptation for language models		2003	1	129	130	Mar. 18, 2003
55	Unsupervised batch-type adaptation method for language models	IPSJ SIG Notes	2002	121	183	188	Dec. 16, 2002
56	Unsupervised batch-type adaptation method for language models	IEICE technical report. Natural language understanding and models of communication	102	528	19	24	Dec. 13, 2002
57	Robust F_0 Extraction for Noisy Environments and Its Use for Speech Recognition	Technical report of IEICE. EA	102	33	37	42	Apr. 19, 2002
58	Parallel computing-based meeting speech recognition system with incremental on-line speaker adaptation		2002	1	105	106	Mar. 18, 2002
59	Evaluation of multi-modal speech recognition in real environments		2002	1	151	152	Mar. 18, 2002
60	A study on multi-modal speech recognition using optical-flow analysis		2002	10	33	38	Feb. 1, 2002
61	Robust Pitch Extraction for Noisy Environments Using Hough Transformation	IPSJ SIG Notes	2001	100	9	14	Oct. 19, 2001
62	A Study on F0 Contour Generation Factors Using Categorical Multiple Regression	IPSJ SIG Notes	2001	100	15	20	Oct. 19, 2001
63	Meeting speech recognition system using parallel computing		2001	2	113	114	Oct. 1, 2001
64	A study on pitch contour generation factors using categorical multiple regression		2001	2	221	222	Oct. 1, 2001
65	Multimodal speech recognition using optical-flow analysis		2001	2	27	28	Oct. 1, 2001
66	Pitch extraction using Hough transformation under noisy environments		2001	2	209	210	Oct. 1, 2001
67	Development and Evaluation of a Spoken Dialog System Using Spontaneous Speech	IPSJ SIG Notes	2001	55	79	86	Jun. 1, 2001
68	Use of Prosodic Word Boundary Information for Unlimited-Vocabulary Speech Recognition	IEICE technical report. Natural language understanding and models of communication	99	524	73	78	Dec. 21, 1999
69	Use of Prosodic Word Boundary Information for Unlimited-Vocabulary Speech Recognition	IPSJ SIG Notes	1999	108	205	210	Dec. 20, 1999
70	Recognition of Family and Given Names of Unlimited Vocabulary Based on Prosodic Word Boundary Detection		1999	1	151	152	Mar. 1, 1999
71	Recognizing Accent Types and Detecting Prosodic Word Boundaries Using Statistical Models of Moraic Transition	IEICE technical report. Speech	98	106	1	8	Jun. 12, 1998
72	Expression of Accent Phrases by Statistical Models of Moraic Transition		1998	1	153	154	Mar. 1, 1998
73	Improvements in Syntactic Boundary Detection by Statistical Models of Moraic Transition		1997	2	133	134	Sep. 1, 1997
74	Detecting Syntactic Boundaries Using Statistical Models of Moraic Transition	IEICE technical report. Speech	97	114	33	40	Jun. 19, 1997

Conference Activities & Talks
No.	Title	Conference	Publication date	Promoter	Venue
1	Multimodal Speech Recognition and Analysis of Spontaneous Speech - Memories of Research in Furui Laboratory -		Jan. 26, 2023
2	Analysis of Effects of Voice Mimicry Attack by Professional/Non-Professional Impersonators on Deep Learning Based Speaker Verification	The 12th Symposium on Biometrics, Recognition and Authentication	Nov. 16, 2022
3	Noise-Tolerant Time-Domain Speech Separation with Noise Bases	Asia-Pacific Signal and Information Processing Association Annual Summit and Conference	Dec. 16, 2021
4	Noise-robust time-domain speech separation with basis signals for noise		Mar. 3, 2021
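5	Team Takoyaki submission for VoxCeleb Speaker Recognition Challenge 2020	The VoxSRC Workshop 2020	Oct. 2020
6	A Kinect-based Multimodal Person Authentication System with User Existence Confirmation	IEICE Technical Report	Mar. 11, 2018
7	Multimodal speech recognition using mouth images from depth camera	Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017	Feb. 5, 2018
8	Neural network-based estimation of degree of feeling that natural objects appear in photographic images	IEICE Technical Report	Sep. 28, 2017
9	Multimodal Speech Recognition Based on a Deep Autoencoder Using Lip Depth Images	Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM)	Sep. 11, 2017
10	Analysis of the Effects of Voice Mimicry by Professional Impersonators on Speaker Verification and of Its Acoustic Features	IEICE Technical Report	Aug. 23, 2017
11	Multimodal Speech Recognition with a Deep Autoencoder Using Depth Images of the Lips	IPSJ SIG Technical Reports (Web)	Jul. 20, 2017
12	Analysis of the Effects of Voice Mimicry Attacks by Professional Impersonators on Speaker Verification	Proceedings of the IPSJ National Convention	Mar. 16, 2017
13	Automatic Alignment for Analyzing the Relationship between Melody and Lyric Accent in Japanese Songs	Proceedings of the IPSJ National Convention	Mar. 16, 2017
14	Multimodal Person Identification in Video Using Speaker Recognition and Face Image Recognition	Proceedings of the Meeting of the Acoustical Society of Japan (CD-ROM)	Mar. 1, 2017
15	Analysis of Voice Imitation by Professional/Non-Professional Impersonators Based on Kullback–Leibler Divergence between Acoustic Models	Joint Meeting of Acoustical Society of America and Acoustical Society of Japan	Nov. 2016
16	Improving the Performance of Conversation Group Detection and Speaker Determination for Conversational Speech Recorded with Multiple Smartphones	IEICE Technical Report	Aug. 17, 2016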
17	Music retrieval based on time structure information of musical instruments and musical instrument activity detection using Deep Neural Network	IPSJ SIG Technical Reports (Web)	May. 14, 2016
18	TokyoTech at MediaEval 2016 Multimodal Person Discovery in Broadcast TV task	CEUR Workshop Proceedings	Jan. 1, 2016
19	An efficient error correction interface for speech recognition on mobile touchscreen devices	2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings	Apr. 1, 2014
20	Simple gesture-based error correction interface for smartphone speech recognition	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	Jan. 1, 2014
21	Q-Gaussian based spectral subtraction for robust speech recognition	13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012	Dec. 1, 2012
22	Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity		2012
23	An Efficient Prosody Adaptation Method and Its Application to HMM-based Speech Synthesis		2010
24	Recent Development of WFST-Based Speech Recognition Decoder		2009
25	Robust Speech Recognition Using VAD-Measure-Embedded Decoder		2009
26	Generalization of Specialized On-the-fly Composition		2009
27	Development of a WFST based Speech Recognition System for a Resource Deficient Language Using Machine Translation		2009
28	Noise Robust Speech Recognition Using Spectral Subtraction and F0 Information Extracted by Hough Transform		2009
29	Accent Analysis for Mandarin Large Vocabulary Continuous Speech Recognition		2008
30	Initial Evaluation of the Drivers' Japanese Speech Corpus in a Car Environment		2008
31	Development of a Speech Recognition System for Icelandic Using Machine Translated Text		2008
32	Thai Broadcast News Corpus Construction and Evaluation		2008
33	Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition		2007
34	Combining Gaussian Mixture Model with Global Variance Term to Improve the Quality of an HMM-Based Polyglot Speech Synthesizer		2007
35	Development of a Speech Recognition System Using a Sparse Training Corpus		2007
36	Acoustic and Linguistic Characterization of Spontaneous Speech		2007
37	The Effect of Spectral Space Reduction in Spontaneous Speech on Recognition Performances		2007
38	A Stream-Weight and Threshold Estimation Method Using Adaboost for Multi-Stream Speaker Verification		2006
39	A Large Vocabulary Continuous Speech Recognition System for Indonesian Language		2006
40	Acoustic and Linguistic Characterization of Spontaneous Speech		2006
41	Progress on a Speaker Adaptable Polyglot Synthesizer		2006
42	A Weight Estimation Method Using LDA for Multi-Band Speech Recognition		2006
43	New Approach to Polyglot Synthesis: How to Speak Any Language with Anyone's Voice		2006
44	Why is Automatic Recognition of Spontaneous Speech So Difficult?		2006
45	Cross-Language Synthesis with a Polyglot Synthesizer		2005
46	Multimodal Speaker Verification Using Ear Image Features Extracted by PCA and ICA		2005
47	Language Model Adaptation for Resource Deficient Language Using Translated Data		2005
48	Stream-Weight Optimization by LDA and Adaboost for Multi-Stream Speaker Verification		2005
49	Cluster-Based Modeling for Ubiquitous Speech Recognition		2005
50	Analysis of Spectral Space Reduction in Spontaneous Speech and Its Effects on Speech Recognition Performance		2005
51	Why Is the Recognition of Spontaneous Speech so Hard?		2005
52	Sentence Extraction-Based Presentation Summarization Techniques and Evaluation Metrics		2005
53	Sentence Extraction-Based Automatic Speech Summarization and Evaluation Techniques		2005
54	Toward Robust Multimodal Speech Recognition		2005
55	Speaker Adaptable Multilingual Synthesis		2005
56	Polyglot Synthesis Using a Mixture of Monolingual Corpora		2005
57	A Stream-Weight Optimization Method for Multi-Stream HMMs Based on Likelihood Value Normalization		2005
58	Improvement of Audio-Visual Speech Recognition in Cars		2004
59	A Stream-Weight Optimization Method for Audio-Visual Speech Recognition Using Multi-Stream HMMs		2004
60	Audio-Visual Speech Recognition Using New Lip Features Extracted from Side-Face Images		2004
61	Noise-Robust Speaker Verification Using F0 Features		2004
62	Unsupervised Class-Based Language Model Adaptation for Spontaneous Speech Recognition		2003
63	Unsupervised Language Model Adaptation Using Word Classes for Spontaneous Speech Recognition		2003
64	Noise Robust Speech Recognition Using Prosodic Information		2003
65	Audio-Visual Speech Recognition Using Lip Movement Extracted from Side-Face Images		2003
66	Audio-Visual Person Authentication Using Speech and Ear Images		2003
67	A Robust Multi-Modal Speech Recognition Method Using Optical-Flow Analysis		2002
68	Noise Robust Speech Recognition Using F0 Contour Extracted by Hough Transform		2002
69	Speech-Rate-Variable HMM-Based Japanese TTS System		2002
70	Parallel Computing-Based Architecture for Mixed-Initiative Spoken Dialogue		2002
71	Bimodal Speech Recognition Using Lip Movement Measured by Optical-Flow Analysis		2001
72	Ubiquitous Speech Processing		2001
73	Continuous Speech Recognition of Japanese Using Prosodic Word Boundaries Detected by Mora Transition Modeling of Fundamental Frequency Contours		2001
74	Detection of Prosodic Word Boundaries by Statistical Modeling of Mora Transitions of Fundamental Frequency Contours and Its Use for Continuous Speech Recognition		2000
75	Modeling and Generation of Accentual Phrase F0 Contours Based on Discrete HMMs Synchronized at Mora-unit Transitions		2000
76	Prosodic Word Boundary Detection Using Mora Transition Modeling of Fundamental Frequency Contours -Speaker Independent Experiments-		1999
77	Speaker-Independent Detection of Prosodic Word Boundary Using Mora Transition Modeling of Fundamental Frequency Contours		1999
78	Prosodic Word Boundary Detection Using Statistical Modeling of Moraic Fundamental Frequency Contours and Its Use for Continuous Speech Recognition		1999
79	Accent Type Recognition and Syntactic Boundary Detection of Japanese Using Statistical Modeling of Moraic Transitions of Fundamental Frequency Contours		1998
80	Representing Prosodic Words Using Statistical Models of Moraic Transition of Fundamental Frequency Contours of Japanese		1998
81	A Method of Representing Fundamental Frequency Contours of Japanese Using Statistical Models of Moraic Transition		1997
82	Detecting Phrase Boundaries by Low-Pass Filtering of Fundamental Frequency Contours		1997
83	Use of Prosodic Features in Speech Recognition		1996

Awards & Honors
No.	Publication date	Association	Prize	Subtitle
1	Nov. 2022	The 12th Symposium on Biometrics, Recognition and Authentication	Best Presentation Award	Analysis of Effects of Voice Mimicry Attack by Professional/Non-Professional Impersonators on Deep Learning Based Speaker Verification
2	Nov. 2018	The 8th Symposium on Biometrics, Recognition and Authentication	Best Presentation Award	Analysis of Effects of Voice Mimicry Attack by Professional/Non-Professional Impersonators on Speaker Verification

Research Grants & Projects
No.	Offer organization	System name	Title	Fund classification	Date
1			A Study on High-Performance Speaker Verification in Wearable Computing Environments	Competitive research funding	2009 - NOW
2			A Study on High-Performance Speech Recognition Decoder Based on Weighted Finite-State Transducers	Competitive research funding	2008 - NOW
3	Japan Society for the Promotion of Science	Grants-in-Aid for Scientific Research	Speech Recognition for Computer-Supported Conference Systems under Ubiquitous/Wearable Computing Environment		2000 - 2002
