Watanabe Naoki

Graduate School of Science, Technology and Innovation / Department of Science, Technology and Innovation, Assistant Professor

Researcher Information

Research Areas

  • Informatics / Biological, health, and medical informatics

Research Activity Information

Award

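  • Oct. 2021 1st Research Results Presentation Meeting, Engineering Biology Research Center, Kobe University, Best Poster Award, Development of a machine learning model for enzyme reaction prediction
    Naoki Watanabe, Masahiro Murata, Masaki Yamamoto, Chiaki Ogino, Akihiko Kondo, Michihiro Araki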

Paper

  • Thien Vu, Yoshihiro Kokubo, Mai Inoue, Masaki Yamamoto, Attayeb Mohsen, Agustin Martin-Morales, Research Dawadi, Takao Inoue, Jie Ting Tay, Mari Yoshizaki, Naoki Watanabe, Yuki Kuriya, Chisa Matsumoto, Ahmed Arafa, Yoko M Nakao, Yuka Kato, Masayuki Teramoto, Michihiro Araki
    Background: Coronary heart disease (CHD) is a major cause of morbidity and mortality worldwide. Identifying key risk factors is essential for effective risk assessment and prevention. A data-driven approach using machine learning (ML) offers advanced techniques to analyze complex, nonlinear, and high-dimensional datasets, uncovering novel predictors of CHD beyond the limitations of traditional models, which rely on predefined variables. Objective: This study aims to evaluate the contribution of various risk factors to CHD, focusing on both established and novel markers, using ML techniques. Methods: The study recruited 7672 participants aged 30-84 years from Suita City, Japan, between 1989 and 1999. Participants were monitored for cardiovascular events over an average of 15 years. After excluding individuals with missing outcome data and eliminating unnecessary variables, 7260 participants and 28 variables were included in the analysis. Five ML models were applied to predict CHD incidence: logistic regression, random forest (RF), support vector machine, Extreme Gradient Boosting, and Light Gradient-Boosting Machine. Model performance was evaluated using accuracy, sensitivity, specificity, precision, area under the curve, F1-score, calibration curves, observed-to-expected ratios, and decision curve analysis. Additionally, Shapley Additive Explanations (SHAP) were used to interpret the prediction models and understand the contribution of various risk factors to CHD. Results: Among 7260 participants, 305 (4.2%) were diagnosed with CHD. The RF model demonstrated the highest performance, with an accuracy of 0.73 (95% CI 0.64‐0.80), sensitivity of 0.74 (95% CI 0.62‐0.84), specificity of 0.72 (95% CI 0.61‐0.83), and an area under the curve of 0.73 (95% CI 0.65‐0.80). RF also showed excellent calibration, with predicted probabilities closely aligning with observed outcomes, and provided substantial net benefit across a range of risk thresholds, as demonstrated by decision curve analysis. SHAP analysis elucidated key predictors of CHD, including the intima-media thickness (IMT_cMax) of the common carotid artery, blood pressure, lipid profiles (non–high-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and triglycerides), and estimated glomerular filtration rate. Novel risk factors identified as significant contributors to CHD risk included lower calcium levels, elevated white blood cell counts, and body fat percentage. Furthermore, a protective effect was observed in women, suggesting that gender-specific risk assessment strategies may be needed in future cardiovascular health evaluations. Conclusions: We developed a model to predict CHD using ML and applied SHAP methods for interpretation. This approach highlights the multifactorial nature of CHD risk evaluation, aiming to support health care professionals in identifying risk factors and formulating effective prevention strategies.
    JMIR Publications Inc., May 2025, JMIR Cardio, 9, e68066
    Scientific journal

  • Research Dawadi, Thien Vu, Jie Ting Tay, Phap Tran Ngoc Hoang, Ai Oya, Masaki Yamamoto, Naoki Watanabe, Yuki Kuriya, Michihiro Araki
    Springer Science and Business Media LLC, May 2025, Discover Artificial Intelligence, 5(1), English
    [Refereed]
    Scientific journal

  • Thien Vu, Research Dawadi, Masaki Yamamoto, Jie Ting Tay, Naoki Watanabe, Yuki Kuriya, Ai Oya, Phap Ngoc Hoang Tran, Michihiro Araki
    Springer Science and Business Media LLC, Feb. 2025, BMC Medical Informatics and Decision Making, 25(1), English
    [Refereed]
    Scientific journal

  • Mari Yoshizaki, Yuki Kuriya, Masaki Yamamoto, Naoki Watanabe, Michihiro Araki
    Aims: Health food products (HFPs) are foods and products related to maintaining and promoting health. HFPs may sometimes cause unforeseen adverse health effects by interacting with drugs. Given the importance of information on interactions between HFPs and drugs, this study aimed to establish a workflow to extract information on drug-HFP interactions (DHIs) from open resources. Methods: First, information on drugs, enzymes, their interactions, and known DHIs was collected from multiple public databases and literature sources. Next, a network consisting of enzymes, HFPs, and drugs was constructed, assuming enzymes as candidate hubs in drug-HFP interactions (Method 1). Furthermore, we developed methods that analyze the biomedical context of each drug and HFP to predict potential DHIs from among those obtained in Method 1 by applying BioWordVec, a widely used biomedical terminology quantifier (Methods 2-1 and 2-2). Results: Method 1 identified 44,965 DHIs (30% known), involving 38 metabolic enzymes, 157 HFPs, and 1256 drugs. Method 2-1 selected 7401 DHIs (17% known) from the DHIs of Method 1, while Method 2-2 selected 2819 DHIs (30% known). Because these methods rest on different assumptions (Method 2-1 specifically selects HFPs interacting with specific enzymes, whereas Method 2-2 specifically selects HFPs with functions similar to those of drugs), the proposed methods extracted a wide variety of DHIs. Conclusions: By integrating the results of language processing techniques with those of network analysis, a workflow to efficiently extract unknown and known DHIs was constructed.
    Wiley, Mar. 2024, British Journal of Clinical Pharmacology, 90(6), 1514 - 1524, English
    [Refereed]
    Scientific journal

  • Naoki Watanabe, Masaki Yamamoto, Masahiro Murata, Yuki Kuriya, Michihiro Araki
    Motivation: Enzymes are key targets for biosynthesizing functional substances in metabolic engineering. Therefore, various machine learning models have been developed to predict Enzyme Commission numbers, one type of enzyme annotation. However, previously reported models predict sequences containing numerous consecutive identical amino acids, which are found among unannotated sequences, to be enzymes. Results: Here, we propose EnzymeNet for the prediction of complete Enzyme Commission numbers using Residual Neural Networks. EnzymeNet can exclude the exceptional sequences described above. Several EnzymeNet models were built and optimized to explore the best conditions for removing such sequences. As a result, the models achieved higher prediction accuracy than previously reported models, with a macro F1 score of up to 0.850. Moreover, even enzyme sequences with low similarity to the training data, which were difficult to predict using the reported models, could be predicted extensively using EnzymeNet models. The robustness of EnzymeNet models will lead to the discovery of novel enzymes for the biosynthesis of functional compounds using microorganisms. Availability: The source code of EnzymeNet models is freely available at https://github.com/nwatanbe/enzymenet. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
    Lead, Oxford University Press (OUP), Nov. 2023, Bioinformatics Advances, 3(1), vbad173, English
    [Refereed]
    Scientific journal

  • Naoki Watanabe, Yuki Kuriya, Masahiro Murata, Masaki Yamamoto, Masayuki Shimizu, Michihiro Araki
    The number of unannotated protein sequences is increasing explosively owing to advances in genome sequencing technology. A more comprehensive understanding of protein functions for protein annotation requires the discovery of new features that cannot be captured by conventional methods. Deep learning can extract important features from input data and predict protein functions based on those features. Here, protein feature vectors generated by three deep learning models are analyzed using Integrated Gradients to explore important features at amino acid sites. As a case study, prediction and feature extraction models for UbiD enzymes were built using these models. The important amino acid residues extracted from the models differed from the secondary structures, conserved regions, and active sites known for UbiD. Interestingly, different amino acid residues within UbiD sequences were regarded as important depending on the type of model and sequence. The Transformer models focused on more specific regions than the other models. These results suggest that each deep learning model captures aspects of protein features that differ from existing knowledge and has the potential to reveal new principles of protein function. This study will help to extract new protein features for other protein annotations.
    Lead, MDPI AG, May 2023, Biology, 12(6), 795
    [Refereed]
    Scientific journal

  • Yuki Kuriya, Masahiro Murata, Masaki Yamamoto, Naoki Watanabe, Michihiro Araki
    Omics data have been acquired, and metabolic simulation and analysis methods using these data have been actively researched and developed. However, acquiring such data each time the medium composition, culture conditions, or target organism changes is laborious. Therefore, in this study, we aimed to extract and estimate the variables important for predicting the metabolic flux distribution, as the state of cell metabolism, and the number of such variables needed, by flux sampling with a genome-scale metabolic model (GSM) and analysis of the samples. Acetic acid production from glucose in Escherichia coli with the GSM iJO1366 was used as a case study. Flux sampling with OptGP under 1000 patterns of constraints on substrate, product, and growth fluxes produced a wider sample than the default case. The analysis also suggested that the fluxes of iron ions, O2, CO2, and NH4+ were important for predicting the metabolic flux distribution. Additionally, comparison with literature values from 13C-MFA, using CO2 emission flux as an example of an important flux, suggested that the important fluxes obtained by this method were valid for predicting the flux distribution. Thus, the method of this study was useful for extracting variables important for predicting flux distribution, suggesting that it could help reduce the number of variables measured in experiments.
    MDPI AG, May 2023, Bioengineering, 10(6), 636
    [Refereed]
    Scientific journal

  • Naoki Watanabe, Masaki Yamamoto, Masahiro Murata, Christopher J. Vavricka, Chiaki Ogino, Akihiko Kondo, Michihiro Araki
    Lead, American Chemical Society (ACS), Sep. 2022, The Journal of Physical Chemistry B, 126(36), 6762 - 6770
    [Refereed]
    Scientific journal

  • Christopher J. Vavricka, Shunsuke Takahashi, Naoki Watanabe, Musashi Takenaka, Mami Matsuda, Takanobu Yoshida, Ryo Suzuki, Hiromasa Kiyota, Jianyong Li, Hiromichi Minami, Jun Ishii, Kenji Tsuge, Michihiro Araki, Akihiko Kondo, Tomohisa Hasunuma
    Engineering the microbial production of secondary metabolites is limited by the known reactions of correctly annotated enzymes. Therefore, the machine learning discovery of specialized enzymes offers great potential to expand the range of biosynthesis pathways. Benzylisoquinoline alkaloid production is a model example of metabolic engineering with potential to revolutionize the paradigm of sustainable biomanufacturing. Existing bacterial studies utilize a norlaudanosoline pathway, whereas plants contain a more stable norcoclaurine pathway, which is exploited in yeast. However, committed aromatic precursors are still produced using microbial enzymes that remain elusive in plants, and additional downstream missing links remain hidden within highly duplicated plant gene families. In the current study, machine learning is applied to predict and select plant missing link enzymes from homologous candidate sequences. Metabolomics-based characterization of the selected sequences reveals potential aromatic acetaldehyde synthases and phenylpyruvate decarboxylases in reconstructed plant gene-only benzylisoquinoline alkaloid pathways from tyrosine. Synergistic application of the aryl acetaldehyde producing enzymes results in enhanced benzylisoquinoline alkaloid production through hybrid norcoclaurine and norlaudanosoline pathways.
    Springer Science and Business Media LLC, Mar. 2022, Nature Communications, 13, 1405
    [Refereed]
    Scientific journal

  • Naoki Watanabe, Masahiro Murata, Teppei Ogawa, Christopher J. Vavricka, Akihiko Kondo, Chiaki Ogino, Michihiro Araki
    Lead, American Chemical Society (ACS), Feb. 2020, Journal of Chemical Information and Modeling, 60(3), 1833 - 1843, English
    [Refereed]
    Scientific journal

Books and Other Publications

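  • Frontiers of the Fusion of AI and Biotechnology (AIとバイオの融合最前線)
    Yuki Kuriya, Naoki Watanabe, Michihiro Araki
    Contributor, Information analysis technologies and AI utilization for the efficient creation of smart cells, CMC Publishing, May 2025, Japanese, ISBN: 9784781318677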

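  • Development of Technologies for Producing Useful Substances Using Microorganisms (微生物を用いた有用物質生産技術の開発)
    Naoki Watanabe, Michihiro Araki
    Contributor, Technologies for enzyme discovery and improvement using computational science, Technical Information Institute, Jun. 2024, Japanese, ISBN: 9784867980231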

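  • Manufacturing Technologies for Useful Substances Using Microorganisms (微生物を活用した有用物質の製造技術)
    Naoki Watanabe, Michihiro Araki
    Contributor, Information analysis technologies and AI utilization for the efficient creation of smart cells, CMC Publishing, May 2023, Japanese, ISBN: 9784781317403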

Lectures, oral presentations, etc.

  • Development of enzyme annotation prediction model using residual neural networks for enzyme candidate selection
    Naoki Watanabe, Masaki Yamamoto, Masahiro Murata, Yuki Kuriya, Michihiro Araki
    1st Asia & Pacific Bioinformatics Joint Conference, Oct. 2024, English
    Poster presentation

  • Extraction of Different Protein Features Among Multiple Deep Learning Models for Protein Annotations
    Naoki Watanabe, Yuki Kuriya, Masahiro Murata, Masaki Yamamoto, Masayuki Shimizu, Michihiro Araki
    The 16th Asian Congress on Biotechnology, Oct. 2023, English, Biotechnology Center of Ho Chi Minh City, Viet Nam, International conference
    Poster presentation

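  • Development of a machine learning model for broad-range enzyme reaction prediction
    Naoki Watanabe, Masaki Yamamoto, Masahiro Murata, Christopher J. Vavricka, Chiaki Ogino, Akihiko Kondo, Michihiro Araki
    The 74th Annual Meeting of the Society for Biotechnology, Japan, Oct. 2022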

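  • Development of a machine learning model for enzyme reaction prediction
    Naoki Watanabe, Masaki Yamamoto, Masahiro Murata, Chiaki Ogino, Akihiko Kondo, Michihiro Araki
    1st Research Results Presentation Meeting, Engineering Biology Research Center, Kobe University, Oct. 2021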

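  • Exploration and evaluation of methods for constructing machine learning-based models for substrate-enzyme reaction prediction
    Naoki Watanabe, Masahiro Murata, Chiaki Ogino, Akihiko Kondo, Michihiro Araki
    The 71st Annual Meeting of the Society for Biotechnology, Japan, Sep. 2019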

  • Machine Learning Prediction of EC Number of Enzymes
    Naoki Watanabe, Masahiro Murata, Chiaki Ogino, Akihiko Kondo, Michihiro Araki
    The 9th International Symposium of Innovative BioProduction Kobe, Nov. 2018, English
    Poster presentation

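  • Prediction of enzyme EC numbers using machine learning
    Naoki Watanabe, Masahiro Murata, Chiaki Ogino, Akihiko Kondo, Michihiro Araki
    The 70th Annual Meeting of the Society for Biotechnology, Japan, Sep. 2018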
    Oral presentation