Published Papers |
No. | Title | Journal | Vol | No | Start Page | End Page | Publication date | DOI | Referee |
1 | Comparative Analysis of Voice Mimicry Attacks by High- and Low-Skilled Imitators on Speaker Verification Systems | 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | | | 1 | 6 | Dec. 3, 2024 | https://doi.org/10.1109/apsipaasc63619.2025.10848693 | Refereed |
2 | LDMSE: Low Computational Cost Generative Diffusion Model for Speech Enhancement | 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | | | 1 | 6 | Dec. 3, 2024 | https://doi.org/10.1109/apsipaasc63619.2025.10849051 | Refereed |
3 | Noise-Tolerant Time-Domain Speech Separation with Noise Bases | 2021 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | | | 624 | 629 | Dec. 16, 2021 | | Refereed |
4 | Multimodal speech recognition using mouth images from depth camera | 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | | | 1233 | 1236 | Dec. 2017 | https://doi.org/10.1109/apsipa.2017.8282227 | Refereed |
5 | TokyoTech at MediaEval 2016 Multimodal Person Discovery in Broadcast TV task | CEUR Workshop Proceedings | 1739 | | | | 2016 | | |
6 | Error Correction Using Long Context Match for Smartphone Speech Recognition | IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | E98D | 11 | 1932 | 1942 | Nov. 2015 | https://doi.org/10.1587/transinf.2015EDP7179 | Refereed |
7 | Error Correction Using Long Context Match for Smartphone Speech Recognition | IEICE Transactions on Information and Systems | E98.D | 11 | 1932 | 1942 | 2015 | https://doi.org/10.1587/transinf.2015edp7179 | Refereed |
8 | AN EFFICIENT ERROR CORRECTION INTERFACE FOR SPEECH RECOGNITION ON MOBILE TOUCHSCREEN DEVICES | 2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014 | | | 454 | 459 | 2014 | | Refereed |
9 | Simple Gesture-based Error Correction Interface for Smartphone Speech Recognition | 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | | | 1194 | 1198 | 2014 | | Refereed |
10 | Feature normalization based on non-extensive statistics for speech recognition | Speech Communication | 55 | 5 | 587 | 599 | Jun. 2013 | https://doi.org/10.1016/j.specom.2013.02.004 | Refereed |
11 | A noise-robust speech recognition approach incorporating normalized speech/non-speech likelihood into hypothesis scores | SPEECH COMMUNICATION | 55 | 2 | 377 | 386 | Feb. 2013 | https://doi.org/10.1016/j.specom.2012.10.001 | Refereed |
12 | Detection of overlapped speech using lapel microphones in meeting | Speech Communication | 55 | 10 | 941 | 949 | 2013 | https://doi.org/10.1016/j.specom.2013.06.013 | Refereed |
13 | Spectral subtraction based on non-extensive statistics for speech recognition | IEICE Transactions on Information and Systems | E96-D | 8 | 1774 | 1782 | 2013 | https://doi.org/10.1587/transinf.E96.D.1774 | Refereed |
14 | Q-Gaussian based spectral subtraction for robust speech recognition | 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | | | 1254 | 1257 | 2012 | | Refereed |
15 | Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity | 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | | | 1498 | 1501 | 2012 | | Refereed |
16 | An efficient prosody adaptation method and its application to HMM-based speech synthesis | APSIPA ASC 2010 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | | | 82 | 85 | 2010 | | |
17 | VAD-measure-embedded Decoder with Online Model Adaptation | 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | | | 3122 | + | 2010 | | Refereed |
18 | GENERALIZATION OF SPECIALIZED ON-THE-FLY COMPOSITION | 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS | | | 4317 | 4320 | 2009 | | Refereed |
19 | Optimization of On-the-Fly Composition for WFST-Based Speech Recognition Decoders | The IEICE Transactions on Information and Systems | Vol.J92-D | No.7 | 1026 | 1035 | 2009 | | Not refereed |
20 | Development of a WFST based speech recognition system for a resource deficient language using machine translation | APSIPA ASC 2009 - Asia-Pacific Signal and Information Processing Association 2009 Annual Summit and Conference | | | 50 | 56 | 2009 | | |
21 | Noise robust speech recognition using spectral subtraction and F0 information extracted by Hough transform | APSIPA ASC 2009 - Asia-Pacific Signal and Information Processing Association 2009 Annual Summit and Conference | | | 631 | 634 | 2009 | | |
22 | Robust Speech Recognition Using VAD-measure-embedded Decoder | INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | | | 2203 | + | 2009 | | Refereed |
23 | Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance | COMPUTER SPEECH AND LANGUAGE | 22 | 2 | 171 | 184 | Apr. 2008 | https://doi.org/10.1016/j.csl.2007.07.003 | Refereed |
24 | Evaluation of a noise-robust multi-stream speaker verification method using F0 information | IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | E91D | 3 | 549 | 557 | Mar. 2008 | https://doi.org/10.1093/ietisy/e91-d.3.549 | Refereed |
25 | Implementation and Evaluation of Fast On-the-fly WFST Composition Algorithms | INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | | | 2110 | 2113 | 2008 | | Refereed |
26 | Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages | EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING | Vol.2008 | Article ID 573832 | 7 pages | | 2008 | https://doi.org/10.1155/2008/573832 | Refereed |
27 | Thai Broadcast News Corpus Construction and Evaluation | SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008 | | | 1249 | 1254 | 2008 | | Refereed |
28 | Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images | EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING | Vol.2007 | Article ID 64506 | 9 pages | | 2007 | https://doi.org/10.1155/2007/64506 | Refereed |
29 | Combining Gaussian mixture model with Global Variance term to improve the quality of an HMM-based polyglot speech synthesizer | 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | | | 1241 | + | 2007 | | Refereed |
30 | The effect of spectral space reduction in spontaneous speech on recognition performances | 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | | | 473 | + | 2007 | | Refereed |
31 | Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition  | INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 | | | 89 | 92 | 2007 | | Refereed |
32 | Presentation-Content Retrieval Integrated with the Speech Information | IEICE Transactions on Information and Systems | Vol.J90-D | No.2 | 209 | 222 | 2007 | | Not refereed |
33 | New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer | SPEECH COMMUNICATION | 48 | 10 | 1227 | 1242 | Oct. 2006 | https://doi.org/10.1016/j.specom.2006.05.003 | Refereed |
34 | Sentence-extractive automatic speech summarization and evaluation techniques | SPEECH COMMUNICATION | 48 | 9 | 1151 | 1161 | Sep. 2006 | https://doi.org/10.1016/j.specom.2006.04.005 | Refereed |
35 | A stream-weight and threshold estimation method using Adaboost for multi-stream speaker verification | 2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS | | | 1081 | + | 2006 | | Refereed |
36 | A stream-weight and threshold estimation method using adaboost for multi-stream speaker verification | 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 | | | 5939 | 5942 | 2006 | | Refereed |
37 | A Weight Estimation Method Using LDA for Multi-Band Speech Recognition | INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | | | 2534 | 2537 | 2006 | | Refereed |
38 | Analysis and recognition of spontaneous speech using Corpus of Spontaneous Japanese | SPEECH COMMUNICATION | 47 | 1-2 | 208 | 219 | Sep. 2005 | https://doi.org/10.1016/j.specom.2005.02.010 | Refereed |
39 | Language model adaptation for resource deficient languages using translated data | 9th European Conference on Speech Communication and Technology | | | 1329 | 1332 | 2005 | | |
40 | Multimodal speaker verification using ear image features extracted by PCA and ICA | Lecture Notes in Computer Science | 3546 | | 588 | 596 | 2005 | https://doi.org/10.1007/11527923_61 | |
41 | Cross-language synthesis with a polyglot synthesizer | 9th European Conference on Speech Communication and Technology | | | 1477 | 1480 | 2005 | | |
42 | Cluster-based modeling for ubiquitous speech recognition | 9th European Conference on Speech Communication and Technology | | | 2865 | 2868 | 2005 | | |
43 | Polyglot synthesis using a mixture of monolingual corpora | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | I | | I1 | I4 | 2005 | https://doi.org/10.1109/ICASSP.2005.1415035 | Refereed |
44 | A stream-weight optimization method for multi-stream HMMS based on likelihood value normalization | 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5 | | | 469 | 472 | 2005 | | Refereed |
45 | Sentence extraction-based presentation summarization techniques and evaluation metrics | 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5 | | | 1065 | 1068 | 2005 | | Refereed |
46 | A ROBUST MULTIMODAL SPEECH RECOGNITION METHOD USING OPTICAL FLOW ANALYSIS | SPOKEN MULTIMODAL HUMAN-COMPUTER DIALOGUE IN MOBILE ENVIRONMENTS | 28 | | 37 | 53 | 2005 | | Refereed |
47 | Why is the recognition of spontaneous speech so hard? | TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 3658 | | 9 | 22 | 2005 | | Refereed |
48 | Stream-weight optimization by LDA and adaboost for multi-stream speaker verification | 9th European Conference on Speech Communication and Technology | | | 2185 | 2188 | 2005 | | |
49 | Analysis of spectral space reduction in spontaneous speech and its effects on speech recognition performances | 9th European Conference on Speech Communication and Technology | | | 3381 | 3384 | 2005 | | |
50 | Noise robust speech recognition using F0 contour information | IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | E87D | 5 | 1102 | 1109 | May. 2004 | | Refereed |
51 | Multi-modal speech recognition using optical-flow analysis for lip images | JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 36 | 2-3 | 117 | 124 | Feb. 2004 | | Refereed |
52 | Noise-robust speaker verification using F0 features | 8th International Conference on Spoken Language Processing, ICSLP 2004 | | | 1417 | 1420 | 2004 | | |
53 | A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMS | 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS | | | 857 | 860 | 2004 | | Refereed |
54 | Unsupervised class-based language model adaptation for spontaneous speech recognition | 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS | | | 236 | 239 | 2003 | | Refereed |
55 | Parallel computing-based architecture for mixed-initiative spoken dialogue | FOURTH IEEE INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, PROCEEDINGS | | | 53 | 58 | 2002 | | Refereed |
56 | Noise robust speech recognition using F0 contour extracted by Hough transform | 7th International Conference on Spoken Language Processing, ICSLP 2002 | | | 941 | 944 | 2002 | | |
57 | Ubiquitous speech processing | 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS | | | 13 | 16 | 2001 | | Refereed |
58 | Integration of Prosodic Word Boundary Detection to Unlimited-Vocabulary Speech Recognition | IEICE Transactions on Information and Systems | Vol.J83-D-II | No.10 | 1977 | 1985 | 2000 | | Not refereed |
59 | Detection of prosodic word boundaries by statistical modeling of mora transitions of fundamental frequency contours and its use for continuous speech recognition | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | 3 | | 1763 | 1766 | 2000 | https://doi.org/10.1109/ICASSP.2000.862094 | Refereed |
60 | A Statistical Modeling of Fundamental Frequency Contours in Moraic Unit and Its Use for the Detection of Prosodic Word Boundaries | IPSJ Journal | Vol.40 | No.4 | 1356 | 1364 | 1999 | | Not refereed |
61 | Prosodic word boundary detection using statistical modeling of moraic fundamental frequency contours and its use for continuous speech recognition | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | 1 | | 133 | 136 | 1999 | https://doi.org/10.1109/icassp.1999.758080 | |
MISC |
No. | Title | Journal | Vol | No | Start Page | End Page | Publication date |
1 | Analysis of effects of voice mimicry on speaker verification and acoustic features of the imitated voices  | IEICE technical report. Speech | 114 | 411 | 43 | 48 | Jan. 22, 2015 |
2 | Error Correction Using Long Context Match for Smartphone Speech Recognition  | IEICE technical report. Speech | 114 | 365 | 117 | 122 | Dec. 15, 2014 |
3 | Error Correction Using Long Context Match for Smartphone Speech Recognition  | IPSJ SIG Notes | 2014 | 22 | 1 | 6 | Dec. 8, 2014 |
4 | Detecting Overlapped Speech in Meeting Recorded by Lapel Microphones  | | 2012 | 6 | 1 | 6 | Jul. 12, 2012 |
5 | Two-pass Approach for Recognizing Code-Switching Speech  | Technical report of IEICE. PRMU | 111 | 430 | 225 | 229 | Feb. 2, 2012 |
6 | Nonlinear Normalization Using q-Logarithm for Robust Speech Recognition  | IEICE technical report | 111 | 153 | 45 | 50 | Jul. 14, 2011 |
7 | Noise-robust speech recognition decoder using speech/non-speech confidence measures  | IEICE technical report | 110 | 81 | 49 | 54 | Jun. 10, 2010 |
8 | A Prosody Adaptation Method for HMM-based Speech Synthesis Achieving High Naturalness and Individuality | | 2010 | 12 | 1 | 6 | Feb. 5, 2010 |
9 | A mean F_0 speaker adaptation method for regression model-based F_0 contour generation  | IEICE technical report | 109 | 99 | 87 | 92 | Jun. 17, 2009 |
10 | A study on prosody control for spontaneous speech synthesis  | | 2009 | 23 | 1 | 8 | May. 14, 2009 |
11 | Speeding up fundamental frequency information extraction by Hough transform for noise-robust speech recognition  | IEICE technical report | 108 | 422 | 19 | 24 | Jan. 22, 2009 |
12 | Improvements and evaluations of on-the-fly WFST composition in speech recognition  | IPSJ SIG Notes | 2008 | 102 | 29 | 34 | Oct. 17, 2008 |
13 | Accent analysis for Mandarin large vocabulary continuous speech recognition  | | 38 | 2 | 123 | 127 | Mar. 20, 2008 |
14 | Accent Analysis for Mandarin Large Vocabulary Continuous Speech Recognition  | IEICE technical report | 107 | 551 | 87 | 91 | Mar. 13, 2008 |
15 | Initial Evaluation of the Drivers' Japanese Speech Corpus in a Car Environment  | IEICE technical report | 107 | 551 | 93 | 98 | Mar. 13, 2008 |
16 | Speaker verification using multi-stream HMMs with dimensionally weighted feature vectors | IEICE technical report. Speech | 107 | 406 | 43 | 47 | Dec. 13, 2007 |
17 | A Study on Multimodal Speech Recognition for Spoken Dialogue Systems  | IEICE technical report | 107 | 77 | 19 | 24 | May. 24, 2007 |
18 | A Study on the Statistical Models for HMM-Based Spontaneous Speech Synthesis  | IEICE technical report | 107 | 77 | 13 | 18 | May. 24, 2007 |
19 | Using presentation slide information for lecture speech recognition  | IPSJ SIG Notes | 2006 | 136 | 221 | 226 | Dec. 22, 2006 |
20 | Using presentation slide information for lecture speech recognition | | 106 | 442 | 43 | 48 | Dec. 15, 2006 |
21 | The Analysis of Acoustic and Linguistic Characteristics in Spontaneous Japanese  | IEICE technical report | 106 | 78 | 19 | 24 | May. 19, 2006 |
22 | An LDA-based Weight Estimation Method for Multi-Band Speech Recognition  | IEICE technical report | 106 | 78 | 13 | 18 | May. 19, 2006 |
23 | HMM-based speaker adaptable polyglot synthesizer : Development and evaluation  | IEICE technical report | 105 | 494 | 127 | 132 | Dec. 22, 2005 |
24 | HMM-based speaker adaptable polyglot synthesizer : Development and evaluation  | IPSJ SIG Notes | 2005 | 127 | 217 | 222 | Dec. 22, 2005 |
25 | Spoken dialogue system robust against speech variations based on massively parallel computing  | IPSJ SIG Notes | 2005 | 127 | 91 | 96 | Dec. 22, 2005 |
26 | Spoken dialogue system robust against speech variations based on massively parallel computing  | IEICE technical report | 105 | 494 | 1 | 6 | Dec. 22, 2005 |
27 | A threshold optimization method based on Adaboost for multi-stream speaker verification  | IPSJ SIG Notes | 2005 | 127 | 1 | 6 | Dec. 21, 2005 |
28 | A threshold optimization method based on Adaboost for multi-stream speaker verification  | IEICE technical report | 105 | 495 | 1 | 6 | Dec. 21, 2005 |
29 | Sentence Extraction-Based Speech Summarization Methods and Objective Evaluation Techniques  | IEICE technical report. Speech | 105 | 132 | 1 | 6 | Jun. 16, 2005 |
30 | Language Model Adaptation for ASR Using Machine-Translated Data  | IEICE technical report. Speech | 105 | 132 | 19 | 23 | Jun. 16, 2005 |
31 | Toward realization of HMM-based spontaneous speech synthesis  | IEICE technical report. Speech | 105 | 98 | 25 | 30 | May. 20, 2005 |
32 | A study on automatic lecture segmentation for indexing purposes | | 2005 | 1 | 7 | 8 | Mar. 8, 2005 |
33 | Analysis of cepstral features of Japanese spontaneous speech using Mahalanobis distance | | 2005 | 1 | 231 | 232 | Mar. 8, 2005 |
34 | Addition of new languages to a polyglot HMM-based synthesizer | | 2005 | 1 | 197 | 198 | Mar. 8, 2005 |
35 | Evaluation of speech summarization techniques using objective metrics | | 2005 | 1 | 3 | 4 | Mar. 8, 2005 |
36 | A stream-weight optimization method for audio-visual speech recognition in real environments  | IPSJ SIG Notes | 2005 | 12 | 29 | 34 | Feb. 4, 2005 |
37 | A stream-weight optimization method based on boosting for multi-stream speaker verification  | IEICE technical report. Natural language understanding and models of communication | 104 | 539 | 85 | 90 | Dec. 21, 2004 |
38 | A stream-weight optimization method based on boosting for multi-stream speaker verification | IPSJ SIG Notes | 2004 | 131 | 175 | 180 | Dec. 21, 2004 |
39 | Analysis of acoustic characteristics in spontaneous speech using Corpus of Spontaneous Japanese | IPSJ SIG Notes | 2004 | 103 | 7 | 12 | Oct. 22, 2004 |
40 | Use of F0 information for noise-robust speaker verification | IPSJ SIG Notes | 2004 | 57 | 31 | 36 | May. 28, 2004 |
41 | Use of F_0 information for noise-robust speaker verification  | IEICE technical report. Speech | 104 | 87 | 1 | 6 | May. 21, 2004 |
42 | Investigation of a stream-weight optimization method for multi-modal speech recognition | IPSJ SIG Notes | 2003 | 124 | 241 | 246 | Dec. 18, 2003 |
43 | Noise-robust speech recognition using band-dependent weighted likelihood | IPSJ SIG Notes | 2003 | 124 | 19 | 24 | Dec. 18, 2003 |
44 | Investigation of a stream-weight optimization method of multi-modal speech recognition  | IEICE technical report. Natural language understanding and models of communication | 103 | 517 | 241 | 246 | Dec. 11, 2003 |
45 | Noise-robust speech recognition using band-dependent weighted likelihood  | IEICE technical report. Natural language understanding and models of communication | 103 | 517 | 19 | 24 | Dec. 11, 2003 |
46 | Multi-Modal Person Authentication Using Speech and Ear Images  | IEICE technical report. Speech | 103 | 94 | 25 | 30 | May. 30, 2003 |
47 | A Multi-Modal Speech Recognition Using Side-Face Images | IPSJ SIG Notes | 2003 | 58 | 61 | 66 | May. 27, 2003 |
48 | Use of Prosodic Information for Noise-Robust Speech Recognition | IPSJ SIG Notes | 2003 | 58 | 55 | 60 | May. 27, 2003 |
49 | A Rapid Listening System for Presentations Using Automatic Speech Summarization Techniques  | IPSJ SIG Notes | 2003 | 57 | 83 | 88 | May. 26, 2003 |
50 | Unsupervised batch-type topic adaptation for language models | | 2003 | 1 | 129 | 130 | Mar. 18, 2003 |
51 | Improving naturalness using residual excitation for HMM-based speech synthesis | | 2003 | 1 | 241 | 242 | Mar. 18, 2003 |
52 | Improvement of visual features for multi-modal speech recognition | | 2003 | 1 | 195 | 196 | Mar. 18, 2003 |
53 | Multi-modal speaker verification using speech and face images | | 2003 | 1 | 107 | 108 | Mar. 18, 2003 |
54 | Multi-modal speaker verification using speech and ear images | | 2003 | 1 | 109 | 110 | Mar. 18, 2003 |
55 | Unsupervised batch-type adaptation method for language models | IPSJ SIG Notes | 2002 | 121 | 183 | 188 | Dec. 16, 2002 |
56 | Unsupervised batch-type adaptation method for language models  | IEICE technical report. Natural language understanding and models of communication | 102 | 528 | 19 | 24 | Dec. 13, 2002 |
57 | Robust F_0 Extraction for Noisy Environments and Its Use for Speech Recognition  | Technical report of IEICE. EA | 102 | 33 | 37 | 42 | Apr. 19, 2002 |
58 | Parallel computing-based meeting speech recognition system with incremental on-line speaker adaptation | | 2002 | 1 | 105 | 106 | Mar. 18, 2002 |
59 | Evaluation of multi-modal speech recognition in real environments | | 2002 | 1 | 151 | 152 | Mar. 18, 2002 |
60 | A study on multi-modal speech recognition using optical-flow analysis | | 2002 | 10 | 33 | 38 | Feb. 1, 2002 |
61 | A Study on F0 Contour Generation Factors Using Categorical Multiple Regression  | IPSJ SIG Notes | 2001 | 100 | 15 | 20 | Oct. 19, 2001 |
62 | Robust Pitch Extraction for Noisy Environments Using Hough Transformation  | IPSJ SIG Notes | 2001 | 100 | 9 | 14 | Oct. 19, 2001 |
63 | Pitch extraction using Hough transformation under noisy environments | | 2001 | 2 | 209 | 210 | Oct. 1, 2001 |
64 | A study on pitch contour generation factors using categorical multiple regression. | | 2001 | 2 | 221 | 222 | Oct. 1, 2001 |
65 | Multimodal speech recognition using optical-flow analysis | | 2001 | 2 | 27 | 28 | Oct. 1, 2001 |
66 | Meeting speech recognition system using parallel computing. | | 2001 | 2 | 113 | 114 | Oct. 1, 2001 |
67 | Development and Evaluation of a Spoken Dialog System Using Spontaneous Speech  | IPSJ SIG Notes | 2001 | 55 | 79 | 86 | Jun. 1, 2001 |
68 | Use of Prosodic Word Boundary Information for Unlimited-Vocabulary Speech Recognition  | IEICE technical report. Natural language understanding and models of communication | 99 | 524 | 73 | 78 | Dec. 21, 1999 |
69 | Use of Prosodic Word Boundary Information for Unlimited-Vocabulary Speech Recognition | IPSJ SIG Notes | 1999 | 108 | 205 | 210 | Dec. 20, 1999 |
70 | Recognition of Family and Given Names of Unlimited Vocabulary Based on Prosodic Word Boundary Detection | | 1999 | 1 | 151 | 152 | Mar. 1, 1999 |
71 | Recognizing Accent Types and Detecting Prosodic Word Boundaries Using Statistical Models of Moraic Transition | IEICE technical report. Speech | 98 | 106 | 1 | 8 | Jun. 12, 1998 |
72 | Expression of Accent Phrases by Statistical Models of Moraic Transition | | 1998 | 1 | 153 | 154 | Mar. 1, 1998 |
73 | Improvements in Syntactic Boundary Detection by Statistical Models of Moraic Transition | | 1997 | 2 | 133 | 134 | Sep. 1, 1997 |
74 | Detecting Syntactic Boundaries Using Statistical Models of Moraic Transition | IEICE technical report. Speech | 97 | 114 | 33 | 40 | Jun. 19, 1997 |
Conference Activities & Talks |
No. | Title | Conference | Publication date | Promoter | Venue |
1 | Multimodal Speech Recognition and Analysis of Spontaneous Speech - Memories of Research in Furui Laboratory - | | Jan. 26, 2023 | | |
2 | Analysis of Effects of Voice Mimicry Attack by Professional/Non-Professional Impersonators on Deep Learning Based Speaker Verification | The 12th Symposium on Biometrics, Recognition and Authentication | Nov. 16, 2022 | | |
3 | Noise-robust time-domain speech separation with basis signals for noise | | Mar. 3, 2021 | | |
4 | Team Takoyaki submission for VoxCeleb Speaker Recognition Challenge 2020 | the VoxSRC Workshop 2020 | Oct. 2020 | | |
5 | A Kinect-based Multimodal Person Authentication System with User Existence Confirmation | IEICE Technical Report | Mar. 11, 2018 | | |
6 | Multimodal speech recognition using mouth images from depth camera | Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017 | Feb. 5, 2018 | | |
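7 | Neural network-based estimation of degree of feeling that natural objects appear in photographic images | IEICE Technical Report | Sep. 28, 2017 | | |
8 | Multimodal speech recognition based on a deep autoencoder using lip depth images | Proceedings of the Acoustical Society of Japan Meeting (CD-ROM) | Sep. 11, 2017 | | |
9 | Analysis of the effects of voice mimicry by professional impersonators on speaker verification and of its acoustic features | IEICE Technical Report | Aug. 23, 2017 | | |
10 | Multimodal speech recognition with a deep autoencoder using depth images of the lips | IPSJ SIG Technical Reports (Web) | Jul. 20, 2017 | | |
11 | Automatic alignment for analyzing the relationship between melody and lyric accent in Japanese songs | Proceedings of the IPSJ National Convention | Mar. 16, 2017 | | |
12 | Analysis of the effects of voice mimicry attacks by professional impersonators on speaker verification | Proceedings of the IPSJ National Convention | Mar. 16, 2017 | | |
13 | Multimodal person identification in video using speaker recognition and face image recognition | Proceedings of the Acoustical Society of Japan Meeting (CD-ROM) | Mar. 1, 2017 | | |
14 | Analysis of Voice Imitation by Professional/Non-Professional Impersonators Based on Kullback–Leibler Divergence between Acoustic Models | Joint Meeting of Acoustical Society of America and Acoustical Society of Japan | Nov. 2016 | | |
15 | Improving the performance of conversation group detection and speaker determination for conversational speech recorded with multiple smartphones | IEICE Technical Report | Aug. 17, 2016 | | |
16 | Music retrieval based on time structure information of musical instruments and musical instrument activity detection using Deep Neural Network | IPSJ SIG Technical Reports (Web) | May. 14, 2016 | | |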
17 | TokyoTech at MediaEval 2016 Multimodal Person Discovery in Broadcast TV task | CEUR Workshop Proceedings | Jan. 1, 2016 | | |
18 | An efficient error correction interface for speech recognition on mobile touchscreen devices | 2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings | Apr. 1, 2014 | | |
19 | Simple gesture-based error correction interface for smartphone speech recognition | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | Jan. 1, 2014 | | |
20 | Q-Gaussian based spectral subtraction for robust speech recognition | 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 | Dec. 1, 2012 | | |
21 | Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity | | 2012 | | |
22 | An Efficient Prosody Adaptation Method and Its Application to HMM-based Speech Synthesis | | 2010 | | |
23 | Generalization of Specialized On-the-fly Composition | | 2009 | | |
24 | Recent Development of WFST-Based Speech Recognition Decoder | | 2009 | | |
25 | Robust Speech Recognition Using VAD-Measure-Embedded Decoder | | 2009 | | |
26 | Noise Robust Speech Recognition Using Spectral Subtraction and F0 Information Extracted by Hough Transform | | 2009 | | |
27 | Development of a WFST based Speech Recognition System for a Resource Deficient Language Using Machine Translation | | 2009 | | |
28 | Development of a Speech Recognition System for Icelandic Using Machine Translated Text | | 2008 | | |
29 | Accent Analysis for Mandarin Large Vocabulary Continuous Speech Recognition | | 2008 | | |
30 | Thai Broadcast News Corpus Construction and Evaluation | | 2008 | | |
31 | Initial Evaluation of the Drivers' Japanese Speech Corpus in a Car Environment | | 2008 | | |
32 | Development of a Speech Recognition System Using a Sparse Training Corpus | | 2007 | | |
33 | Acoustic and Linguistic Characterization of Spontaneous Speech | | 2007 | | |
34 | Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition | | 2007 | | |
35 | The Effect of Spectral Space Reduction in Spontaneous Speech on Recognition Performances | | 2007 | | |
36 | Combining Gaussian Mixture Model with Global Variance Term to Improve the Quality of an HMM-Based Polyglot Speech Synthesizer | | 2007 | | |
37 | New Approach to Polyglot Synthesis: How to Speak Any Language with Anyone's Voice | | 2006 | | |
38 | Acoustic and Linguistic Characterization of Spontaneous Speech | | 2006 | | |
39 | Progress on a Speaker Adaptable Polyglot Synthesizer | | 2006 | | |
40 | A Stream-Weight and Threshold Estimation Method Using Adaboost for Multi-Stream Speaker Verification | | 2006 | | |
41 | A Large Vocabulary Continuous Speech Recognition System for Indonesian Language | | 2006 | | |
42 | A Weight Estimation Method Using LDA for Multi-Band Speech Recognition | | 2006 | | |
43 | Why is Automatic Recognition of Spontaneous Speech So Difficult? | | 2006 | | |
44 | Multimodal Speaker Verification Using Ear Image Features Extracted by PCA and ICA | | 2005 | | |
45 | Language Model Adaptation for Resource Deficient Language Using Translated Data | | 2005 | | |
46 | Cross-Language Synthesis with a Polyglot Synthesizer | | 2005 | | |
47 | Stream-Weight Optimization by LDA and Adaboost for Multi-Stream Speaker Verification | | 2005 | | |
48 | Cluster-Based Modeling for Ubiquitous Speech Recognition | | 2005 | | |
49 | Analysis of Spectral Space Reduction in Spontaneous Speech and Its Effects on Speech Recognition Performance | | 2005 | | |
50 | Why Is the Recognition of Spontaneous Speech so Hard? | | 2005 | | |
51 | Sentence Extraction-Based Presentation Summarization Techniques and Evaluation Metrics | | 2005 | | |
52 | Sentence Extraction-Based Automatic Speech Summarization and Evaluation Techniques | | 2005 | | |
53 | Toward Robust Multimodal Speech Recognition | | 2005 | | |
54 | Speaker Adaptable Multilingual Synthesis | | 2005 | | |
55 | Polyglot Synthesis Using a Mixture of Monolingual Corpora | | 2005 | | |
56 | A Stream-Weight Optimization Method for Multi-Stream HMMs Based on Likelihood Value Normalization | | 2005 | | |
57 | Improvement of Audio-Visual Speech Recognition in Cars | | 2004 | | |
58 | A Stream-Weight Optimization Method for Audio-Visual Speech Recognition Using Multi-Stream HMMs | | 2004 | | |
59 | Audio-Visual Speech Recognition Using New Lip Features Extracted from Side-Face Images | | 2004 | | |
60 | Noise-Robust Speaker Verification Using F0 Features | | 2004 | | |
61 | Unsupervised Class-Based Language Model Adaptation for Spontaneous Speech Recognition | | 2003 | | |
62 | Unsupervised Language Model Adaptation Using Word Classes for Spontaneous Speech Recognition | | 2003 | | |
63 | Noise Robust Speech Recognition Using Prosodic Information | | 2003 | | |
64 | Audio-Visual Speech Recognition Using Lip Movement Extracted from Side-Face Images | | 2003 | | |
65 | Audio-Visual Person Authentication Using Speech and Ear Images | | 2003 | | |
66 | A Robust Multi-Modal Speech Recognition Method Using Optical-Flow Analysis | | 2002 | | |
67 | Noise Robust Speech Recognition Using F0 Contour Extracted by Hough Transform | | 2002 | | |
68 | Speech-Rate-Variable HMM-Based Japanese TTS System | | 2002 | | |
69 | Parallel Computing-Based Architecture for Mixed-Initiative Spoken Dialogue | | 2002 | | |
70 | Bimodal Speech Recognition Using Lip Movement Measured by Optical-Flow Analysis | | 2001 | | |
71 | Ubiquitous Speech Processing | | 2001 | | |
72 | Continuous Speech Recognition of Japanese Using Prosodic Word Boundaries Detected by Mora Transition Modeling of Fundamental Frequency Contours | | 2001 | | |
73 | Detection of Prosodic Word Boundaries by Statistical Modeling of Mora Transitions of Fundamental Frequency Contours and Its Use for Continuous Speech Recognition | | 2000 | | |
74 | Modeling and Generation of Accentual Phrase F0 Contours Based on Discrete HMMs Synchronized at Mora-unit Transitions | | 2000 | | |
75 | Prosodic Word Boundary Detection Using Mora Transition Modeling of Fundamental Frequency Contours -Speaker Independent Experiments- | | 1999 | | |
76 | Speaker-Independent Detection of Prosodic Word Boundary Using Mora Transition Modeling of Fundamental Frequency Contours | | 1999 | | |
77 | Prosodic Word Boundary Detection Using Statistical Modeling of Moraic Fundamental Frequency Contours and Its Use for Continuous Speech Recognition | | 1999 | | |
78 | Accent Type Recognition and Syntactic Boundary Detection of Japanese Using Statistical Modeling of Moraic Transitions of Fundamental Frequency Contours | | 1998 | | |
79 | Representing Prosodic Words Using Statistical Models of Moraic Transition of Fundamental Frequency Contours of Japanese | | 1998 | | |
80 | Detecting Phrase Boundaries by Low-Pass Filtering of Fundamental Frequency Contours | | 1997 | | |
81 | A Method of Representing Fundamental Frequency Contours of Japanese Using Statistical Models of Moraic Transition | | 1997 | | |
82 | Use of Prosodic Features in Speech Recognition | | 1996 | | |