Directory of Researchers

YOKOKAWA Mitsuo
Organization for Advanced and Integrated Research
Professor
Engineering / Other Field
Last Updated: 2022/04/15

Researcher Profile and Settings

Affiliation

  • <Faculty / Graduate School / Others>

    Organization for Advanced and Integrated Research
  • <Related Faculty / Graduate School / Others>

    Graduate School of System Informatics / Department of Computational Science, Center for Computational Social Science (CCSS), Education Center on Computational Science and Engineering

Teaching

  • Graduate School of System Informatics / Department of Computational Science, 2021, Large-scale Simulations A1
  • Graduate School of System Informatics / Department of Computational Science, 2021, Large-scale Simulations A2
  • Graduate School of System Informatics / Department of Computational Science, 2021, Advanced Course on Large-Scale Parallel Processing Technologies
  • Graduate School of System Informatics, 2021, High Performance Computing
  • Graduate School of System Informatics, 2021, Advanced Science and Technology B
  • Graduate School of System Informatics / Department of Computational Science, 2021, Fundamental Theory of Computational Science
  • Graduate School of System Informatics / Department of Computational Science, 2021, Applied Large-scale Simulation
  • Faculty of Engineering / Department of Computer Science and Systems Engineering, 2021, Introduction to Computer Science
  • Faculty of Engineering / Department of Computer Science and Systems Engineering, 2021, Parallel Computations

Research Activities

Research Interests

  • High performance computing, Parallel algorithms

Research Areas

  • Informatics / High-performance computing

Awards

  • Jun. 2014 The Institute of Electronics, Information and Communication Engineers, Achievement Award 2013, Research and Development of the K Computer

    SHOJI Fumiyoshi, KUSANO Yoshihiro, YOKOKAWA Mitsuo

    The distributed-memory parallel supercomputer K computer is a large-scale computing system of 88,128 CPUs (705,024 cores) that draws on a range of new technologies: HPC-ACE for fast execution of scientific computations, a direct interconnection network called Tofu, and power-efficient packaging and cooling schemes. It was developed and built by RIKEN and Fujitsu Limited from FY2006 to FY2012; in October 2011 it achieved its target LINPACK performance of 10.51 petaflops and was recognized as the world's fastest on the TOP500 list. In the HPC Challenge benchmarks it likewise achieved world-best results: 9.796 PFLOPS in HPL, 472 GUPS in Global RandomAccess, 3,857 TByte/s in EP Stream, and 205.9 TFLOPS in Global FFT. As originally planned, in June 2012 …

    Japan society

  • Feb. 2011 The Japan Society of Fluid Mechanics, Award for Outstanding Paper in Fluid Mechanics, Energy dissipation rate and energy spectrum in high resolution direct numerical simulations of turbulence in a periodic box

    Kaneda Yukio, Ishihara Takashi, Yokokawa Mitsuo, Itakura Ken'ichi, Uno Atsuya

    Using the Earth Simulator, the authors performed a large-scale DNS (4096^3 grid points) of homogeneous isotropic turbulence, numerically realizing a high-Reynolds-number turbulent state that is difficult to attain even experimentally and building a turbulence database containing a well-developed inertial subrange. From it they clearly demonstrated fundamental findings: the energy dissipation rate becomes independent of the viscosity in the high-Reynolds-number limit, and the energy spectrum in the inertial subrange deviates significantly from Kolmogorov's -5/3 law. Their computation was also awarded the 2002 Gordon Bell Special Prize for achieving 16.4 TFlops with a spectral method on the Earth Simulator. Furthermore, through detailed and careful analysis of the resulting turbulence database, the group has produced numerous findings on turbulence statistics, and in the 2009 Annu. Rev. Fluid Mech. …

    Japan society

  • Nov. 2011 Association for Computing Machinery, ACM Gordon Bell Prize, First principles calculation of electronic states of a silicon nanowire with 100,000 atoms on the K computer

    Y. Hasegawa, J. Iwata, M. Tsuji, D. Takahashi, A. Oshiyama, K. Minami, T. Boku, F. Shoji, A. Uno, M. Kurokawa, H. Inoue, I. Miyoshi, and M. Yokokawa

  • Nov. 2005 ACM/IEEE SC|05 International Conference for High Performance Computing, Networking and Storage, Best Technical Paper Award, Full Electron Calculation Beyond 20,000 Atoms: Ground Electronic State of Photosynthetic Proteins

    T. Ikegami, T. Ishida, D.G. Fedorov, K. Kitaura, Y. Inadomi, H. Umeda, M. Yokokawa, S. Sekiguchi

  • Nov. 2002 Association for Computing Machinery, ACM Gordon Bell Prize (Special), 16.4-Tflops Direct Numerical Simulation of Turbulence by a Fourier Spectral Method on the Earth Simulator

    M. Yokokawa, K. Itakura, A. Uno, T. Ishihara, Y. Kaneda

  • May 2004 Information Processing Society of Japan, IPSJ Industrial Achievement Award, Development of the Earth Simulator, a Large-Scale Parallel Vector Supercomputer

    佐藤哲也, 北脇重宗, 横川三津夫, 平野哲, 松本寛

  • Nov. 2002 Association for Computing Machinery, ACM Gordon Bell Prize (Special award for language), 14.9 Tflop/s Three-dimensional Fluid Simulation for Fusion Science with HPF on the Earth Simulator

    Hitoshi Sakagami, Hitoshi Murai, Yoshiki Seo, Mitsuo Yokokawa

  • Nov. 2002 Association for Computing Machinery, ACM Gordon Bell Prize (Peak performance), A 26.58 Tflops Global Atmospheric Simulation with the Spectral Transform Method on the Earth Simulator

    Satoru Shingu, Yoshinori Tsuda, Wataru Ohfuchi, Kiyoshi Otsuka, Hiroshi Takahara, Takashi Hagiwara, Shin-ichi Habata, Hiromitsu Fuchigami, Masayuki Yamada, Yuji Sasaki, Kazuo Kobayashi, Mitsuo Yokokawa, Hiroyuki Itoh

Published Papers

  • Yujiro Takenaka, Mitsuo Yokokawa, Takashi Ishihara, Kazuhiko Komatsu, Hiroaki Kobayashi

    Springer International Publishing, 2021, Sustained Simulation Performance 2019 and 2020, 51 - 59

    In book

  • Takashi Ishihara, Yukio Kaneda, Koji Morishita, Mitsuo Yokokawa, Atsuya Uno

    American Physical Society (APS), 27 Oct. 2020, Physical Review Fluids, 5 (10), English

    [Refereed]

    Scientific journal

  • Mitsuo Yokokawa, Ayano Nakai, Kazuhiko Komatsu, Yuta Watanabe, Yasuhisa Masaoka, Yoko Isobe, Hiroaki Kobayashi

    Lead, IEEE, May 2020, 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), English

    [Refereed]

    International conference proceedings

  • Takamasa Hasama, Toshihide Saka, Yoshiaki Itoh, Koji Kondo, Manabu Yamamoto, Tetsuro Tamura, Mitsuo Yokokawa

    To evaluate the aerodynamic instability of buildings considering their high-order and torsional oscillation modes, for which the usual wind tunnel experiments are difficult to perform, a fluid-structure interaction (FSI) analysis code combining a multi-degree-of-freedom structure model with large-eddy simulation was developed. The calculation results obtained in this study were compared with the results of wind tunnel tests conducted in a previous study using a building model (average breadth 0.14 m x length 0.28 m x height 1 m; scale 1/600) with a multi-degree-of-freedom structure. The comparison showed that the FSI analysis results corresponded well with those of the wind tunnel tests: for each spectrum of response displacement (along-wind, across-wind, and torsional directions) at the top mass node, the frequency distribution and the peak frequencies of the first and second oscillation modes were in good agreement between the calculation results and the wind tunnel test results. As for the amplitude of the top displacement of the building, the results for both the along- and across-wind directions showed good correspondence. For the torsional direction, the calculation results reproduced a torsional flutter oscillation at a slightly lower wind speed than that observed in the wind tunnel experiments.

    Elsevier {BV}, Feb. 2020, Journal of Wind Engineering and Industrial Aerodynamics, 197, 104052 - 104052, English

    [Refereed]

    Scientific journal

  • Mitsuo Yokokawa, Koji Morishita, Takashi Ishihara, Atsuya Uno, Yukio Kaneda

    Springer International Publishing, 2019, ICCS 2019. Lecture Notes in Computer Science, 11539, 587 - 595

    [Refereed]

  • Parallelization of a Preconditioned Conjugate Gradient Method Appearing in Seismic Response Simulations of Buildings

    GOTO Kei, YOKOKAWA MITSUO, SAKA Toshihide

    In earthquake-prone Japan, strong seismic resistance is demanded of buildings. To examine it, numerical simulations of the seismic response of buildings and the ground are performed. The simulation code treated in this study discretizes the building and the ground with a three-dimensional finite element method and applies the average acceleration method to the equations of motion set up at each node. The resulting system of linear equations is solved with the PSCCG method, a variant of the preconditioned conjugate gradient method. In this work, the PSCCG method, which accounts for most of the execution time of the simulation, was parallelized across processes. Fixing the number of threads to one and increasing the number of processes, we evaluated the process-parallel implementation and confirmed a speedup of up to 3.2x on 8 processes relative to the execution time on 2 processes.

    Information Processing Society of Japan, Dec. 2018, The 167th IPSJ SIG High Performance Computing Meeting, 2018-HPC-167 (28), 1 - 5, Japanese

    Symposium

  • On the Optimal Relaxation Parameter for the Relaxed Supernodal Multifrontal Method

    NAKANO Tomoki, YOKOKAWA MITSUO, FUKAYA Takeshi, YAMAMOTO Yusaku

    Many problems in numerical simulation reduce to solving systems of linear equations obtained by discretizing partial differential equations, and in many cases solving these systems accounts for most of the total simulation time, so solving them fast is very important. This study deals with the Cholesky factorization, which is applicable to symmetric positive definite matrices. Among the several methods for Cholesky factorization of sparse matrices, this paper uses the relaxed supernodal multifrontal method, in which the relaxation parameter, the upper bound on the number of zero elements treated as nonzeros when two supernodes are merged, has a large influence on performance. Aiming to find the optimal value of this parameter, on a single core each of an Intel Xeon (Ivy Bridge-EX) and an Intel Xeon Phi (Knights Landing, KNL) …

    Information Processing Society of Japan, Dec. 2018, The 167th IPSJ SIG High Performance Computing Meeting, 2018-HPC-167 (28), 1 - 8, Japanese

    Symposium

  • Report on the 2017 Remote Interactive Lecture Series "Fundamentals of Computational Biology" Delivered from Kobe

    渡邉博文, 鈴木洋介, 八木 学, 石野麻由子, 土井陽子, 江口至洋, TANAKA SHIGENORI, TSURUTA HIROKI, 白井剛, MORI ICHIRO, USUI HIDEYUKI, YOKOKAWA MITSUO

    Computational biology is an interdisciplinary research field, advancing rapidly in recent years, that combines computational science with medicine, agriculture, engineering, and science toward understanding life. As its research spreads into various fields and industry, opportunities to acquire comprehensive foundational knowledge are in demand. The Education Center on Computational Science and Engineering of Kobe University, in cooperation with related institutions, began distributing the remote interactive lecture series "Fundamentals of Computational Biology" nationwide in 2014, and last year accepted 600 course registrations. This paper reports on "Fundamentals of Computational Biology IV" held in 2017 and on a deep learning tutorial held as a special session focusing on AI and deep learning, topics that have recently attracted attention. Enrollment continues to grow year by year, and the series has been highly rated in questionnaires.

    Nov. 2018, Proceedings of the 2018 Annual Conference of the Academic eXchange for Information Environment and Strategy (AXIES), 1 - 4, Japanese

    Symposium

  • Evaluation of Various Time Integration Schemes for Turbulence DNS

    MATSUZAKI Tsuguo, OKAMOTO Naoya, YOKOKAWA MITSUO, KANEDA Yukio

    In direct numerical simulation (DNS) of homogeneous isotropic turbulence by the Fourier spectral method, most of the computation time is spent on three-dimensional discrete Fourier transforms (3D-FFT). Although high performance through parallelization is expected, the all-to-all communication of parallel 3D-FFT is ill-suited to the network topologies of recent supercomputers, so communication dominates the computation time, making the large-scale turbulence DNS at high Reynolds numbers needed to clarify the universal statistical laws of turbulence practically difficult. This study aims to shorten the computation time of turbulence DNS by reducing the number of 3D-FFT invocations, using various time integration schemes that can replace the fourth-order Runge-Kutta method commonly used in DNS. In this paper, we evaluate the schemes by examining the statistics of the turbulent fields they produce …

    Information Processing Society of Japan, Sep. 2018, The 166th IPSJ SIG High Performance Computing Meeting, 2018-HPC-166 (8), 1 - 7, Japanese

    Symposium

  • Performance Evaluation of 3-D FFT with Multi-Axis Decomposition on Many-Core Processors

    AOKI Masaaki, IMAMURA Toshiyuki, YOKOKAWA MITSUO, HIROTA Yusuke

    For parallel three-dimensional fast Fourier transforms (FFT), we derived the communication costs of one-axis decomposition, two-axis decomposition, and two kinds of three-axis decomposition with different communication methods from a communication-cost model, and determined which decomposition method is optimal. We also evaluated the performance of 3-D FFTs with the different decomposition methods on the K computer and Oakforest-PACS.

    Information Processing Society of Japan, Mar. 2018, The 162nd IPSJ SIG High Performance Computing Meeting, 2017-HPC-163 (29), 1 - 7, Japanese

    Symposium
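
    The kind of communication-cost comparison described in the abstract above can be illustrated with a toy model (illustrative only; the decomposition names come from the abstract, but the formulas and figures below are simplifying assumptions, not the paper's model):

    ```python
    # Toy model: per-process communication volume of a parallel 3-D FFT on an
    # n^3 grid over p processes, assuming each global transpose is an
    # all-to-all that moves the whole local portion of the grid once.

    def words_per_process(n, p):
        """Local grid points per process for an n^3 grid on p processes."""
        return n**3 // p

    def comm_volume(n, p, transposes):
        """Total words each process exchanges over all global transposes."""
        return transposes * words_per_process(n, p)

    n, p = 1024, 512
    slab = comm_volume(n, p, transposes=1)    # 1-axis (slab): one all-to-all over all p
    pencil = comm_volume(n, p, transposes=2)  # 2-axis (pencil): two all-to-alls, each
                                              # within one row/column of a
                                              # sqrt(p) x sqrt(p) process grid
    # A slab decomposition also caps p at n (here 1024); the pencil
    # decomposition scales to p = n^2 processes at the cost of more traffic,
    # and its smaller all-to-all groups can map better onto torus networks.
    ```

    Under these assumptions, the pencil decomposition moves twice the data per process but admits far more parallelism, which is the trade-off the paper's cost model quantifies.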

  • Kenya Yamada, Takahiro Katagiri, Hiroyuki Takizawa, Minami Kazuo, Mitsuo Yokokawa, Toru Nagai, Masao Ogino

    In numerical libraries for sparse matrix operations, there are many tuning parameters related to implementation selection. Selection of different tuning parameters could result in totally different performance. Moreover, optimal implementation depends on the sparse matrices to be operated. It is difficult to find optimal implementation without executing each implementation and thereby examining its performance on a given sparse matrix. In this study, we propose an implementation selection method for sparse iterative algorithms and preconditioners in a numerical library using deep learning. The proposed method uses full color images to represent the features of a sparse matrix. We present an image generation method for partitioning a given matrix (to generate its feature image) so that the value of each matrix element is considered in the implementation selection. We then evaluate the effectiveness of the proposed method by conducting a numerical experiment. In this experiment, the accuracy of implementation selection is evaluated. The training data comprise a pair of sparse matrix and its optimal implementation. The optimal implementation of each sparse matrix in the training data is obtained in advance by executing every implementation and getting the best one. The experimental results obtained using the proposed method show that the accuracy of selecting the optimal implementation of each sparse matrix is 79.5%.

    IEEE, 2018, 2018 SIXTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING WORKSHOPS (CANDARW 2018), 2018, 257 - 262, English

    [Refereed]

    International conference proceedings

  • Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA

    Kazuhiko Komatsu, Shintaro Momose, Yoko Isobe, Osamu Watanabe, Akihiro Musa, Mitsuo Yokokawa, Toshikazu Aoyama, Masayuki Sato, Hiroaki Kobayashi

    A new SX-Aurora TSUBASA vector supercomputer has been released, and it features a new system architecture and a new execution model to achieve high sustained performance, especially for memory-intensive applications. In SX-Aurora TSUBASA, the vector host (VH) of a standard x86 Linux node is attached to the vector engine (VE) of the newly developed vector processor. An application is executed on the VE, and only system calls are offloaded to the VH. This new execution model can avoid redundant data transfers between the VH and VE that can easily become a bottleneck in the conventional execution model. This paper examines the potential of SX-Aurora TSUBASA. First, the basic performance is clarified by evaluating benchmark programs. Then, the effectiveness of the new execution model is examined by using a microbenchmark. Finally, the potential of SX-Aurora TSUBASA is clarified through evaluations of practical applications.

    ASSOC COMPUTING MACHINERY, 2018, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE, AND ANALYSIS (SC'18), 2018, 1 - 12, English

    [Refereed]

    International conference proceedings

  • Fluid-Structure Interaction Analysis for Building using Multi-Degree of Freedom Structure Model

    HASAMA Takamasa, SAKA Toshihide, ITOU Yoshiaki, KONDO Koji, YAMAMOTO Manabu, TAMURA Tetsuro, YOKOKAWA Mitsuo

    In order to evaluate the high-order oscillation mode and the torsional oscillation mode of buildings, on which it is difficult to perform normal wind tunnel experiments, the authors have developed an FSI analysis code combining a multi-degree-of-freedom model and LES, and have compared the FSI analysis results with those of wind tunnel tests conducted in a previous study using a …

    The Japan Society of Fluid Mechanics, Dec. 2017, Proceedings of the 31st CFD Symposium, 2017, 1 - 5, Japanese

    Symposium

  • Parallelization of a Direct Method for Systems of Linear Equations with One-Way Dissection Ordering

    NAKANO Tomoki, YOKOKAWA MITSUO, FUKAYA Takeshi, YAMAMOTO Yusaku

    Taking the two-dimensional Poisson equation as a model problem for the time-dependent Schrödinger equation, we report performance evaluation results for several sparse-matrix storage formats in a parallel direct solver for systems of linear equations whose coefficient matrices are symmetric positive definite sparse matrices ordered by one-way dissection. We also propose a new skyline storage format and confirm its effectiveness.

    Information Processing Society of Japan, Dec. 2017, The 162nd IPSJ SIG High Performance Computing Meeting, 2017-HPC-162 (19), 1 - 10, Japanese

    Symposium

  • An Investigation of Scalability on KNL Using FFT Kernels

    AOKI Masaaki, HIROTA Yusuke, IMAMURA Toshiyuki, YOKOKAWA MITSUO

    With a view to 3-D FFT applications on the second-generation Intel Xeon Phi, Knights Landing (KNL), which has attracted attention in recent years, we first investigate the memory performance of KNL using basic kernel codes such as the STREAM benchmark, a C port of FFTE, and FFTW. We then analyze the parallel performance, especially the scalability, of 3D-FFT processing on KNL.

    Information Processing Society of Japan, Sep. 2017, The 161st IPSJ SIG High Performance Computing Meeting, 2017-HPC-161 (16), 1 - 7, Japanese

    Symposium

  • On Parallelization of Linear Solvers with a Modified Reverse Cuthill-McKee Method that Reduces the Number of Colors

    HYOTANI Kentaro, YOKOKAWA MITSUO

    Focusing on the reverse Cuthill-McKee (RCM) method, we proposed the multiple-initial-point RCM method (MIP-RCM), which starts the initial coloring from several nodes, as an improvement that reduces parallelization overhead. Comparing the computation times of MIP-RCM and conventional RCM for the conjugate residual method with symmetric Gauss-Seidel preconditioning, MIP-RCM was faster on eight problems. In particular, comparing thread-parallel iteration times for the thermal2 problem, MIP-RCM on 12 threads achieved a speedup of about 2.93x over the fastest RCM time, obtained on 2 threads, confirming the effectiveness of the proposed method.

    Information Processing Society of Japan, Apr. 2017, The 159th IPSJ SIG High Performance Computing Meeting, 2017-HPC-159 (3), 1 - 6, Japanese

    Symposium
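
    The reverse Cuthill-McKee ordering underlying the entry above can be sketched in a few lines (a minimal serial sketch of plain RCM on a toy graph; the paper's MIP-RCM variant, which starts the coloring from multiple initial points for parallelization, is not reproduced here):

    ```python
    from collections import deque

    def cuthill_mckee(adj, start):
        """Cuthill-McKee ordering: BFS from `start`, visiting the unvisited
        neighbours of each node in order of increasing degree.
        `adj` is an undirected graph given as {node: set_of_neighbours}."""
        visited = {start}
        order = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            # tie-break by node id so the ordering is deterministic
            for v in sorted(adj[u] - visited, key=lambda w: (len(adj[w]), w)):
                visited.add(v)
                order.append(v)
                queue.append(v)
        return order

    def reverse_cuthill_mckee(adj, start):
        """RCM ordering: the Cuthill-McKee ordering, reversed."""
        return cuthill_mckee(adj, start)[::-1]

    def bandwidth(adj, order):
        """Matrix bandwidth induced by numbering the nodes in `order`."""
        pos = {v: i for i, v in enumerate(order)}
        return max(abs(pos[u] - pos[v]) for u in adj for v in adj[u])

    # A path graph with scrambled labels: 0-5-2-4-1-3.
    adj = {0: {5}, 5: {0, 2}, 2: {5, 4}, 4: {2, 1}, 1: {4, 3}, 3: {1}}
    ```

    On this example the natural numbering gives bandwidth 5, while the RCM numbering reduces it to 1, which is what makes the subsequent factorization or colored Gauss-Seidel sweeps cheaper.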

  • Evaluation of Data Transfer Performance over an Inter-Building Network

    UNO Atsuya, IWAMOTO Mitsuo, YAGI Manabu, YOKOKAWA MITSUO

    As HPC systems have grown in scale in recent years, simulation results have become enormous. Visualization and similar techniques are often used to analyze these huge results efficiently, frequently on systems equipped with dedicated visualization hardware, in which case data must be exchanged with the system on which the simulation was run. When both systems are installed at the same site, shared storage suffices; but when a system at a different site is used, the data must be transferred over a network, and fast data transfer is required. We report an evaluation of data transfer performance over a direct network connection between the K computer and the visualization compute server π-VizStudio installed at Kobe University's computational science center adjacent to the K computer.

    Information Processing Society of Japan, Mar. 2017, The 158th IPSJ SIG High Performance Computing Meeting, 2017-HPC-158 (14), 1 - 6, Japanese

    Symposium

  • Remote Interactive Lecture Series "Fundamentals of Computational Biology" Delivered from Kobe

    渡邉博文, 鈴木洋介, 近藤洋隆, 石野麻由子, 土井陽子, 江口至洋, TANAKA SHIGENORI, TSURUTA HIROKI, 白井剛, MORI ICHIRO, USUI HIDEYUKI, YOKOKAWA MITSUO

    2017, Proceedings of the 2017 Annual Conference of the Academic eXchange for Information Environment and Strategy (AXIES), TF1-2, 1 - 5, Japanese

    Symposium

  • Space-Time Parallel Performance Evaluation of the Parareal Method with Domain Decomposition on a Diffusion Problem

    IMAMURA Seigo, ONO Kenji, IIZUKA Mikio, YOKOKAWA MITSUO

    As the parallelism of supercomputers grows with many-core CPUs, spatial domain decomposition alone is becoming insufficient as a source of parallelism, and time-parallel computation methods have therefore attracted attention in recent years. In this study we applied the Parareal method, one useful such solver, to a diffusion problem as an example of a parabolic partial differential equation, and investigated the behavior of the parallel speedup. In an evaluation on the K computer with 100-way time parallelism, a measured speedup of 8.5x was obtained against the maximum of 14x estimated from a speedup model. We also evaluated large-scale parallel performance combining spatial parallelism by domain decomposition with time parallelism by the Parareal method: when 64-way spatial parallelism was run on 64 nodes, a 13.5x speedup over sequential computation was obtained, and further applying 100-way Parareal parallelism …

    Information Processing Society of Japan, Dec. 2016, The 157th IPSJ SIG High Performance Computing Meeting, 2016-HPC-152 (19), 1 - 7, Japanese

    Symposium
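
    The Parareal predictor-corrector evaluated above can be sketched on a scalar model problem (a minimal illustrative sketch, not the paper's diffusion solver; the scalar decay ODE and the backward-Euler coarse propagator are assumptions chosen for brevity):

    ```python
    import math

    def parareal(lam, T, n, iters):
        """Parareal iteration for the scalar decay ODE y' = -lam*y, y(0) = 1,
        on n time slices of width dt = T/n.  F is the fine propagator over one
        slice (here simply the exact solution); G is a cheap coarse propagator
        (one backward-Euler step)."""
        dt = T / n
        F = lambda y: y * math.exp(-lam * dt)   # fine: exact over one slice
        G = lambda y: y / (1.0 + lam * dt)      # coarse: backward Euler

        # Initial guess: one sequential coarse sweep.
        U = [1.0]
        for _ in range(n):
            U.append(G(U[-1]))

        for _ in range(iters):
            Fold = [F(u) for u in U[:-1]]       # fine solves: parallel in time
            Gold = [G(u) for u in U[:-1]]
            V = [1.0]
            for i in range(n):                  # cheap sequential correction
                V.append(G(V[-1]) + Fold[i] - Gold[i])
            U = V
        return U

    # After k iterations the first k+1 slice values match the fine solution,
    # so with iters = n the result reproduces the fine (here exact) solution.
    approx = parareal(lam=1.0, T=1.0, n=10, iters=10)[-1]
    ```

    The speedup comes from running the expensive fine solves of all slices concurrently while only the cheap coarse sweep stays sequential, which is the trade-off the entry's speedup model captures.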

  • OHICHI Tomomi, TERAI Masaaki, YOKOKAWA Mitsuo

    We developed an Eclipse plug-in tool named STView for visualizing the program structures of Fortran source codes to help improve the performance of programs on a supercomputer. To create a tree that represents program structures, STView uses an abstract syntax tree (AST) generated by Photran and filters the tree, because the AST has many nontrivial nodes for tokens. While ord…

    ACM, Nov. 2016, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), 1 - 2, English

    [Refereed]

    International conference proceedings

  • Wind-Induced Response Evaluation using Fluid-Structure Interaction analysis for High-rise Building with Complex Surface Shape

    HASAMA Takamasa, ITOU Yoshiaki, YAMAMOTO Manabu, SAKA Toshihide, KONDO Koji, TAMURA Tetsuro, YOKOKAWA Mitsuo

    Architectural Institute of Japan, Aug. 2016, Summaries of Technical Papers of the AIJ Annual Meeting, 1 - 2, Japanese

    Symposium

  • Performance Estimation of Programs by an Extension of the RoofLine Model Considering Cache Effects

    MINAMI Kazuo, INOUE Shunsuke, Chiba Syuichi, KUMAHATA Kiyoshi, YOKOKAWA Mitsuo

    Tuning techniques that raise the sustained performance of programs on a single processor are very important in high performance computing on modern supercomputers, so predicting the marginal performance of programs, to know to what extent they can be tuned, is a major concern. The roofline model estimates the marginal performance of programs well if performance is limited by effective memory bandwidth; the model, however, cannot be applied to performance prediction when L2 cache accesses increase. In this paper, we propose a new model for predicting marginal performance that can be applied when accesses to the L2 cache increase. The new model is found to work well for marginal performance prediction when applied to actual programs on the K computer and other systems.

    Information Processing Society of Japan, 14 Jul. 2016, IPSJ Transactions on Advanced Computing Systems (ACS), 9 (2), 1 - 14, Japanese

    [Refereed]

  • Wind Pressure Prediction by Large-Eddy Simulation for a High-Rise Building with Inner Balcony and Corner Cut

    HASAMA Takamasa, ITOU Yoshiaki, KONDO Koji, YAMAMOTO Manabu, TAMURA Tetsuro, YOKOKAWA Mitsuo

    To evaluate the prediction accuracy of wind pressure on a high-rise building in an urban area using large-eddy simulation (LES), the results of an LES calculation and a wind tunnel experiment for a high-rise building with a complex surface shape comprising an inner balcony and corner cut were compared. Approximately 140 million calculation meshes on the K computer, the fourth fastest …

    Northeastern University, Jun. 2016, Proceedings of the 8th International Colloquium on Bluff Body Aerodynamics and Applications (BBAA VIII), 1 - 10, English

    [Refereed]

    Symposium

  • Imamura, Seigo, Ono, Kenji, Yokokawa, Mitsuo

    Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on Sparc VIIIfx was 7.6 times higher than that of a naive implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.

    TAYLOR & FRANCIS LTD, 2016, International Journal of Computational Fluid Dynamics, 30 (6), 395 - 401, English

    [Refereed]

    Scientific journal
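
    The idea of amortizing the coefficient-matrix load over multiple right-hand sides, as in the ensemble-computing entry above, can be sketched as follows (a minimal illustration using Jacobi iteration and NumPy; the paper's solvers, matrices, and hardware are different, so everything concrete here is an assumption):

    ```python
    import numpy as np

    def jacobi_multi(A, B, iters=200):
        """Solve A X = B for all columns of B simultaneously with Jacobi
        iteration, so A is traversed once per sweep and applied as a
        matrix-matrix product instead of one matrix-vector product per
        right-hand side.  A must be strictly diagonally dominant to
        guarantee convergence."""
        D = np.diag(A)                 # diagonal of A
        R = A - np.diag(D)             # off-diagonal remainder
        X = np.zeros_like(B)
        for _ in range(iters):
            X = (B - R @ X) / D[:, None]
        return X

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 5.0, 2.0],
                  [0.0, 2.0, 6.0]])
    # Two right-hand sides built from known solutions, stacked as columns.
    B = np.stack([A @ np.array([1.0, 2.0, 3.0]),
                  A @ np.array([-1.0, 0.5, 2.0])], axis=1)
    X = jacobi_multi(A, B)
    ```

    The single `R @ X` matrix-matrix product per sweep is where the saving comes from: the matrix is streamed through memory once for all right-hand sides instead of once per vector.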

  • Ishihara, Takashi, Morishita, Koji, Yokokawa, Mitsuo, Uno, Atsuya, Kaneda, Yukio

    A study is made of the energy spectrum E(k) of turbulence on the basis of high-resolution direct numerical simulations (DNSs) of forced incompressible turbulence in a periodic box using a Fourier spectral method, with the number of grid points and the Taylor-scale Reynolds number R_lambda up to 12288^3 and approximately 2300, respectively. The DNS data show that there is a wave-number range (approximately 5 x 10^-3 < k*eta < 2 x 10^-2) in which E(k) fits approximately well to Kolmogorov's k^(-5/3) scaling, where eta is the Kolmogorov length scale. However, a close inspection shows that the exponent is a little smaller than -5/3, and E(k) in the range fits to E(k)/[epsilon^(2/3) k^(-5/3)] = c (kL)^m, where epsilon is the mean energy dissipation rate per unit mass, L is the integral length scale, and m ≈ -0.12. The coefficient c is independent of k but has an R_lambda dependence of the form c = C R_lambda^zeta, where C ≈ 0.9 and zeta ≈ 0.14.

    AMER PHYSICAL SOC, 2016, Physical Review Fluids, 1 (8), 082403(R), English

    [Refereed]

    Scientific journal

  • Shoji, Fumiyoshi, Tanaka, Katsuyuki, Matsushita, Satoshi, Takitsuka, Hiroyuki, Tsukamoto, Toshiyuki, Yokokawa, Mitsuo

    We report our operational experience of improving the energy efficiency of the power supply and cooling facilities of the K computer. By optimizing the number of active air handlers, the blowout temperature and the number of fans, the power consumption of the air handlers was reduced by approximately 40 %. We next considered improvements to the energy efficiency of the gas turbine power generators. After analyzing the long-term power generation profile, we found that the efficiency of the gas turbine power generators could be improved by more than 50 %. To increase the energy efficiency of the cooling towers, we considered a range of ways of improving the ventilation around the cooling towers and finally decided to remove a section of wall panels from the chiller building. Preliminary results suggest that the modification has had a positive effect on efficiency.

    SPRINGER HEIDELBERG, 2016, Computer Science-Research and Development, 31 (4), 235 - 243, English

    [Refereed]

    Scientific journal

  • Application of Iterative Methods for Large Systems of Linear Equations to Real Problems and Their Performance Evaluation

    HYOTANI Kentaro, YOKOKAWA MITSUO

    In this report, various iterative methods were applied to real problems and their performance was evaluated. Numerical experiments on large symmetric sparse matrices with different condition numbers, arising in seismic response analysis of the ground and of large structures on it, showed that which iterative method is effective differs depending on the matrix.

    Information Processing Society of Japan, Dec. 2015, The 152nd IPSJ SIG High Performance Computing Meeting, 152, Japanese

    Symposium

  • Coherent Vortex Structures and Their Time Evolution in High-Reynolds-Number Turbulence

    ISHIHARA Takashi, UNO Atsuya, MORISHITA Koji, YOKOKAWA MITSUO, KANEDA Yukio

    High-vorticity regions obtained by large-scale direct numerical simulations, with up to 12288^3 grid points, of forced incompressible turbulence in a periodic box are visualized using the method developed for handling such large-scale data. The visualization shows that in high-Reynolds-number turbulence, strong micro-scale vortices are dense in clusters of various sizes up to …

    The Japan Society of Fluid Mechanics, Dec. 2015, Proceedings of the 29th CFD Symposium, 2015, 1 - 5, Japanese

    Symposium

  • Implementation and Performance Evaluation of Iterative Solvers for Multiple Systems of Linear Equations with a Common Coefficient Matrix

    IMAMURA Seigo, ONO Kenji, YOKOKAWA MITSUO

    Capacity computing is a promising scenario for improving performance on upcoming exascale supercomputers. Ensemble computing is an instance of it and involves multiple linear systems associated with a common coefficient matrix. We implemented solvers that reduce the load cost of the coefficient matrix by solving the systems at the same time, and improved the performance of several iterative solvers. The maximum performance …

    The Japan Society of Fluid Mechanics, Dec. 2015, Proceedings of the 29th CFD Symposium, 2015, 1 - 5, Japanese

    Symposium

  • Wind Pressure Prediction by LES for a High-Rise Building with an Inner Balcony and Corner Cut

    HASAMA Takamasa, ITOU Yoshiaki, KONDO Koji, YAMAMOTO Manabu, TAMURA Tetsuro, YOKOKAWA Mitsuo

    To evaluate the prediction accuracy of wind pressure on a high-rise building in an urban area using large-eddy simulation (LES), LES calculations and wind tunnel experiments were compared for a high-rise building with a complex surface shape consisting of an inner balcony and corner cut. As a result of resolving the complex surface shape, the complex flow features inside the …

    The Japan Society of Fluid Mechanics, Dec. 2015, Proceedings of the 29th CFD Symposium, 2015, 1 - 5, Japanese

    Symposium

  • RA-003 Performance Estimation of Programs by an Extension of the RoofLine Model considering Cache Effects

    Minami Kazuo, Inoue Shunsuke, Chiba Syuichi, Yokokawa Mitsuo

    Forum on Information Technology, 24 Aug. 2015, Proceedings of the Forum on Information Technology, 14 (1), 13 - 19, Japanese

  • Performance Evaluation of Programs by an Extension of the Roofline Model Considering Cache Effects

    MINAMI Kazuo, INOUE Shunsuke, Chiba Syuichi, YOKOKAWA Mitsuo

    To estimate the limit of a program's execution performance, the roofline model, parameterized by processor peak performance, memory bandwidth, and operational intensity (Flop/Byte), has been proposed. The roofline model's estimates agree well with measured performance for memory-bound programs, but as cache accesses increase, the estimated and measured performance diverge. In this report, we propose a method, based on the coding of the program, for estimating the execution performance of kernel programs in which cache accesses increase. Evaluating the execution performance of several kernel loops on the K computer showed that the method is applicable to sustained-performance estimation.

    Information Processing Society of Japan, Aug. 2015, Forum on Information Technology 2015, 2015, 13 - 19, Japanese

    [Refereed]

    Research society

  • Implementation of a Fortran Structure-Tree Display System on Eclipse

    OHICHI Tomomi, YOKOKAWA Mitsuo, TERAI Masaaki, MINAMI Kazuo

    Before tuning or parallelizing a program, its structure must be understood. In particular, understanding the structure of the loops, branches, and procedure calls in a program makes it easy to grasp the structure of the program as a whole. In this study, aiming to improve the efficiency of understanding the structure of programs written in FORTRAN 77 and Fortran 90, we built a support tool (STView) on the free integrated development environment Eclipse that extracts loops, branches, and procedure calls from a program and visualizes them as a tree.

    Information Processing Society of Japan, 12 May 2015, Proceedings of the High Performance Computing and Computational Science Symposium (HPCS2015), 2015 (2015), 94 - 94, Japanese

    [Refereed]

  • On the Performance Analysis and Implementation of the Dongarra-Wilkinson Method for Tridiagonalization

    KUDO Shuhei, YAMAMOTO Yusaku, YOKOKAWA Mitsuo

    We propose a new tridiagonalization method for real symmetric matrices (the D-W method), derived from Dongarra's blocked method and Wilkinson's techniques. Dongarra's blocked method can speed up tridiagonalization by replacing half of the operations with matrix-matrix multiplications, but the other half (matrix-vector multiplications) remains, and that restricts the performance …

    Information Processing Society of Japan, May 2015, Proceedings of the High Performance Computing and Computational Science Symposium (HPCS2015), 2015, 19 - 28, Japanese

    [Refereed]

    Research society

  • Ishihara, Takashi, Enohata, Kei, Morishita, Koji, Yokokawa, Mitsuo, Ishii, Katsuya

    Statistics on the motion of small heavy (inertial) particles in turbulent flows with a high Reynolds number are physically fundamental to understanding realistic turbulent diffusion phenomena. An accurate parallel algorithm for tracking particles in large-scale direct numerical simulations (DNSs) of turbulence in a periodic box has been developed to extract accurate statistics on the motion of inertial particles. The tracking accuracy of the particle motion is known to primarily depend on the spatial resolution of the DNS for the turbulence and the accuracy of the interpolation scheme used to calculate the fluid velocity at the particle position. In this study, a DNS code based on the Fourier spectral method and two-dimensional domain decomposition method was developed and optimised for the K computer. An interpolation scheme based on cubic splines is implemented by solving tridiagonal matrix problems in parallel.

    SPRINGER-VERLAG BERLIN, 2015, Parallel Computing Technologies (Pact 2015), 9251, 522 - 527, English

    [Refereed]

    International conference proceedings
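
    The tridiagonal systems mentioned in the abstract above, which arise when computing cubic-spline coefficients, are classically solved with the Thomas algorithm; the serial kernel looks like this (a generic sketch, not the paper's parallel implementation):

    ```python
    def thomas(a, b, c, d):
        """Solve a tridiagonal system by the Thomas algorithm:
        sub-diagonal a (length n-1), diagonal b (length n),
        super-diagonal c (length n-1), right-hand side d (length n).
        Assumes the system needs no pivoting (e.g. diagonally dominant,
        as the spline systems are)."""
        n = len(b)
        cp = [0.0] * (n - 1)   # modified super-diagonal
        dp = [0.0] * n         # modified right-hand side
        cp[0] = c[0] / b[0]
        dp[0] = d[0] / b[0]
        for i in range(1, n):                       # forward elimination
            m = b[i] - a[i - 1] * cp[i - 1]
            if i < n - 1:
                cp[i] = c[i] / m
            dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / m
        x = [0.0] * n
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):              # back substitution
            x[i] = dp[i] - cp[i] * x[i + 1]
        return x
    ```

    The forward and backward sweeps are inherently sequential, which is exactly why a parallel variant is needed on a machine like the K computer, as the entry describes.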

  • Performance Estimation of the Programs by the Extension of the RoofLine Model adding the Cache Effect

    Kazuo Minami, Shunsuke Inoue, Shuichi Chiba, Mitsuo Yokokawa

    The roofline model has been proposed in order to estimate the marginal performance of programs from features of computer systems such as peak performance, memory bandwidth, and operational intensity. The performance estimated by the model agrees well with the measured performance when programs access memory devices directly; however, a difference between estimated and measured performance appears when a program's cache accesses increase. In this paper, we extend the roofline model to a new one that can be applied to performance estimation of programs in which many cache accesses occur. Comparison with measured performance shows that the new model can estimate the sustained performance of various kernel loops on the K computer.

    Information Processing Society of Japan (IPSJ), 02 Dec. 2014, IPSJ SIG Notes, 2014 (30), 1 - 9, Japanese
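
    The basic Roofline bound that the extended model above starts from can be written in a few lines (the peak and bandwidth figures below are illustrative assumptions, roughly in the range of a single K computer node; the paper's cache-aware extension adds further terms that are not shown here):

    ```python
    # Basic Roofline model: attainable FLOP/s is capped either by peak
    # arithmetic throughput or by memory bandwidth times operational
    # intensity (FLOPs performed per byte moved from memory).

    def roofline(peak_flops, mem_bw, intensity):
        """Attainable performance (FLOP/s) under the basic Roofline model."""
        return min(peak_flops, mem_bw * intensity)

    peak = 128e9   # peak throughput, FLOP/s (assumed figure)
    bw = 64e9      # memory bandwidth, B/s (assumed figure)

    low = roofline(peak, bw, 0.5)   # memory-bound kernel
    high = roofline(peak, bw, 8.0)  # compute-bound kernel
    ```

    The ridge point sits at intensity peak/bw (here 2 Flop/Byte); kernels below it are memory-bound, which is the regime where the paper observes the basic model works and the cache-heavy regime is where its extension becomes necessary.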

  • Optimization of a Direct Numerical Simulation Code for Homogeneous Isotropic Turbulence for the K Computer

    MORISHITA Koji, YOKOKAWA Mitsuo, Uno Atsuya, ISHIHARA Takashi, KANEDA Yukio

    To realize very-large-scale direct numerical simulation (DNS) of homogeneous isotropic turbulence on the K computer, currently Japan's fastest supercomputer, a DNS code for homogeneous isotropic turbulence based on the Fourier spectral method, originally developed for the Earth Simulator, was ported to and optimized for the K computer. In porting, the data distribution scheme was changed from the conventional one-dimensional decomposition to a two-dimensional decomposition expected to allow more efficient all-to-all communication. As a result, a very-large-scale DNS with a maximum of 12288^3 grid points was successfully carried out using 192 x 128 nodes of the K computer.

    IPSJ, 02 Dec. 2014, IPSJ SIG Technical Report, 2014 (17), 1 - 5, Japanese

  • Utilization of High-Performance Computers for Predicting Wind Pressure on High-Rise Buildings in an Actual City Block

    HASAMA Takamasa, ITOU Yoshiaki, KONDO Koji, YAMAMOTO Manabu, TAMURA Tetsuro, KAWAMOTO Yoichi, YOKOKAWA Mitsuo

    This study investigated the influence of an actual urban block area on the wind pressure prediction of a target high-rise building using large-eddy simulation (LES), and introduced the utilization of a high-performance computer (HPC) for LES. First, four LES cases were carried out using 3072-way parallel calculation and compared with four wind tunnel experiments, respectively: (1) No- …

    Dec. 2014, Proceedings of the 28th CFD Symposium, 1 - 8, Japanese

    Symposium

  • Hasegawa, Yukihiro, Iwata, Jun-Ichi, Tsuji, Miwako, Takahashi, Daisuke, Oshiyama, Atsushi, Minami, Kazuo, Boku, Taisuke, Inoue, Hikaru, Kitazawa, Yoshito, Miyoshi, Ikuo, Yokokawa, Mitsuo

    Silicon nanowires are potentially useful in next-generation field-effect transistors, and it is important to clarify the electron states of silicon nanowires to know the behavior of new devices. Computer simulations are promising tools for calculating electron states. Real-space density functional theory (RSDFT) code performs first-principles electronic structure calculations. To obtain higher performance, we applied various optimization techniques to the code: multi-level parallelization, load balance management, sub-mesh/torus allocation, and a message-passing interface library tuned for the K computer. We measured and evaluated the performance of the modified RSDFT code on the K computer. A 5.48 petaflops (PFLOPS) sustained performance was measured for an iteration of a self-consistent field calculation for a 107,292-atom Si nanowire simulation using 82,944 compute nodes, which is 51.67% of the K computer's peak performance of 10.62 PFLOPS. This scale of simulation enables analysis of the behavior of a silicon nanowire with a diameter of 10-20 nm.

    SAGE PUBLICATIONS LTD, 2014, International Journal of High Performance Computing Applications, 28 (3), 335 - 355, English

    [Refereed]

    Scientific journal

  • Yamamoto, Keiji, Uno, Atsuya, Murai, Hitoshi, Tsukamoto, Toshiyuki, Shoji, Fumiyoshi, Matsui, Shuji, Sekizawa, Ryuichi, Sueyasu, Fumichika, Uchiyama, Hiroshi, Okamoto, Mitsuo, Ohgushi, Nobuo, Takashina, Katsutoshi, Wakabayashi, Daisuke, Taguchi, Yuki, Yokokawa, Mitsuo

    The K computer, released on September 29, 2012, is a large-scale parallel supercomputer system consisting of 82,944 compute nodes. We have been able to resolve a significant number of operation issues since its release. Some system software components have been fixed and improved to obtain higher stability and utilization. We achieved 94% service availability because of a low hardware failure rate and approximately 80% node utilization by careful adjustment of operation parameters. We found that the K computer is an extremely stable and high utilization system.

    ELSEVIER SCIENCE BV, 2014, 2014 International Conference on Computational Science, 29, 576 - 585, English

    [Refereed]

    International conference proceedings

  • Design and Evaluation of K Computer

    SHIMIZU Toshiyuki, AJIMA Yuichiro, YOSHIDA Toshio, ASATO Akira, SHIDA Naoyuki, MIURA Kenichi, SUMIMOTO Shinji, NAGAYA Tadao, MIYOSHI Ikuo, AOKI Masaki, HARAGUCHI Masatoshi, YAMANAKA Eiji, MIYAZAKI Hiroyuki, KUSANO Yoshihiro, SHINJO Naoki, OINAGA Yuji, UNO Atsuya, KUROKAWA Motoyoshi, TSUKAMOTO Toshiyuki, MURAI Hitoshi, SHOJI Fumiyoshi, INOUE Shunsuke, KURODA Akiyoshi, TERAI Masaaki, HASEGAWA Yukihiro, MINAMI Kazuo, YOKOKAWA Mitsuo

    We describe the design and evaluation of the K computer. The K computer is a 10-PFLOPS-class supercomputer intended for use across a wide range of fields. Our design concepts were: (1) adoption of a general-purpose CPU architecture together with high single-CPU performance; (2) dedicated development of a highly scalable interconnect; (3) introduction of technologies to counter the explosion of parallelism; and (4) high reliability, flexible operability, and low power consumption. The system was completed in 2011. The SPARC64 VIIIfx CPU for HPC and the highly scalable Tofu interconnect were developed specifically for the system, and VISIMPACT was implemented as a technology against the explosion of parallelism. High reliability, flexible operability, and low power consumption were achieved through measures such as the cooling system and the job manager. The K computer took first place on the TOP500 lists of June and November 2011, and achieved high sustained performance on several applications.

    The Institute of Electronics, Information and Communication Engineers, Oct. 2013, The IEICE transactions on information and systems (Japanese edition), 96 (10), 2118 - 2129, Japanese

    [Refereed]

  • Performance Tuning of a Lattice QCD Code on a Node of the K computer

    TERAI MASAAKI, ISHIKAWA KEN-ICHI, SUGISAKI YOSHINORI, MINAMI KAZUO, SHOJI FUMIYOSHI, NAKAMURA YOSHIFUMI, KURAMASHI YOSHINOBU, YOKOKAWA MITSUO

    Lattice QCD is a first-principles calculation that solves the dynamics of quarks and gluons based on the strong interaction. The calculation is performed on a four-dimensional space-time discretized into a lattice, and requires a huge number of inversions of the sparse matrix derived from the Wilson-Dirac equation. In this study, the Lattice QCD code LDDHMC uses domain decomposition HMC

    Information Processing Society of Japan, 25 Sep. 2013, IPSJ Transactions on Advanced Computing Systems (ACS), 6 (3), 43 - 57, Japanese

    [Refereed]

  • MITSUO YOKOKAWA

    This paper proposes the design of ultra-scalable MPI collective communication for the K computer, which consists of 82,944 computing nodes and is the world's first system over 10 PFLOPS. The nodes are connected by the Tofu interconnect, which introduces a six-dimensional mesh/torus topology. Existing MPI libraries, however, perform poorly on such a direct network system since they assume typical cluster environments. Thus, we design collective algorithms optimized for the K computer. In the design of the algorithms, we place importance on collision-freeness for long messages and low latency for short messages. The long-message algorithms use multiple RDMA network interfaces and consist of neighbor communication in order to gain high bandwidth and avoid message collisions. On the other hand, the short-message algorithms are designed to reduce software overhead, which comes from the number of relaying nodes. The evaluation results on up to 55,296 nodes of the K computer show that the new implementation outperforms the existing one for long messages by a factor of 4 to 11. It also shows that the short-message algorithms complement the long-message ones. © 2012 Springer-Verlag.

    Springer-Verlag, May 2013, Computer Science - Research and Development, 28 (2-3), 147 - 155, English

    [Refereed]

    Scientific journal

  • A Study of a Master-Worker Programming Model on the K Computer

    MURAI HITOSHI, MINAMI KAZUO, YOKOKAWA MITSUO, UMEDA HIROAKI, SATO MITSUHISA, TSUJI MIWAKO, INADOMI YUUICHI, AOYAGI MUTSUMI, NAKAJIMA MAKOTO

    For the stable and effective use of massively parallel computers, fault tolerance is important not only in the system itself but also in the applications that users develop. We are studying a master-worker programming model with fault-tolerance capabilities for the K computer. This report describes the status of three efforts: (1) realizing fault tolerance by extending the K computer's job-management mechanism and MPI library; (2) realizing a master-worker programming model based on MPI's dynamic process creation and remote procedure calls; and (3) implementing and evaluating a fragment molecular orbital code based on this model.

    Information Processing Society of Japan, Feb. 2013, IPSJ SIG Technical Report, 2013-HPC-138 (26), 1 - 6, Japanese

    Symposium

  • Performance Tuning of a Lattice QCD code on a node of the K computer

    TERAI MASAAKI, ISHIKAWA KEN-ICHI, SUGISAKI YOSHINORI, MINAMI KAZUO, SHOJI FUMIYOSHI, NAKAMURA YOSHIFUMI, KURAMASHI YOSHINOBU, YOKOKAWA MITSUO

    Lattice QCD is a first-principles calculation that solves the dynamics of quarks and gluons based on the strong interaction. The calculation is performed on a four-dimensional space-time discretized into a lattice, and requires a huge number of inversions of the sparse matrix derived from the Wilson-Dirac equation. In this study, the Lattice QCD code LDDHMC uses domain decomposition HMC

    Information Processing Society of Japan, 08 Jan. 2013, HPCS2013, 2013 (2013), 34 - 43, Japanese

  • Tokuhisa, Atsushi, Arai, Junya, Joti, Yasumasa, Ohno, Yoshiyuki, Kameyama, Toyohisa, Yamamoto, Keiji, Hatanaka, Masayuki, Gerofi, Balazs, Shimada, Akio, Kurokawa, Motoyoshi, Shoji, Fumiyoshi, Okada, Kensuke, Sugimoto, Takashi, Yamaga, Mitsuhiro, Tanaka, Ryotaro, Yokokawa, Mitsuo, Hori, Atsushi, Ishikawa, Yutaka, Hatsui, Takaki, Go, Nobuhiro

    Single-particle coherent X-ray diffraction imaging using an X-ray free-electron laser has the potential to reveal the three-dimensional structure of a biological supra-molecule at sub-nanometer resolution. In order to realise this method, it is necessary to analyze as many as 1 × 10^6 noisy X-ray diffraction patterns, each for an unknown random target orientation. To cope with the severe quantum noise, patterns need to be classified according to their similarities, and similar patterns averaged to improve the signal-to-noise ratio. A high-speed scalable scheme has been developed to carry out classification on the K computer, a 10 PFLOPS supercomputer at RIKEN Advanced Institute for Computational Science. It is designed to work on a real-time basis with the experimental diffraction pattern collection at the X-ray free-electron laser facility SACLA, so that the results of classification can be fed back to optimize experimental parameters during the experiment. The present status of our effort in developing the system and a result of its application to a set of simulated diffraction patterns are reported. About 1 × 10^6 diffraction patterns were successfully classified by running 255 separate 1 h jobs in 385-node mode.
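
    Similarity-based classification of noisy patterns, as described above, can be illustrated with a toy greedy scheme. This is a stand-in sketch, not the paper's scalable K computer implementation; the normalised cross-correlation measure and the `threshold` parameter are assumptions for illustration.

```python
import numpy as np

def classify_patterns(patterns, threshold=0.8):
    """Greedily group patterns by normalised cross-correlation (NCC).

    Each pattern joins the first existing class whose representative it
    matches above `threshold`; otherwise it founds a new class.  Averaging
    the members of a class would then raise the signal-to-noise ratio.
    """
    def ncc(a, b):
        a = (a - a.mean()).ravel()
        b = (b - b.mean()).ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    reps, labels = [], []
    for p in patterns:
        for k, r in enumerate(reps):
            if ncc(p, r) >= threshold:
                labels.append(k)
                break
        else:                      # no class matched: start a new one
            reps.append(p)
            labels.append(len(reps) - 1)
    return labels
```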

    WILEY-BLACKWELL, 2013, Journal of Synchrotron Radiation, 20 (6), 899 - 904, English

    [Refereed]

    Scientific journal

  • The Design of MPI Communication Facility for K computer

    SUMIMOTO SHINJI, KAWASHIMA TAKAHIRO, SHIDA NAOYUKI, OKAMOTO TAKAYUKI, MIURA KENICHI, UNO ATSUYA, KUROKAWA MOTOYOSHI, SHOJI FUMIYOSHI, YOKOKAWA MITSUO

    This paper describes the design of high-performance MPI communication which enables high-performance communication with minimized memory usage on the 82,944-node K computer. The Tofu interconnect of the K computer uses a six-dimensional torus/mesh direct topology to realize higher performance and availability on a system of hundreds of thousands of nodes. However, in such an ultra-scale system, c

    Information Processing Society of Japan, 09 May 2012, SACSIS2012, 2012 (2012), 237 - 244, Japanese

  • Implementation and Evaluation of MPI_Allreduce on the K computer

    MATSUMOTO YUKI, ADACHI TOMOYA, SUMIMOTO SHINJI, NANRI TAKESHI, SOGA TAKESHI, UNO ATSUYA, KUROKAWA MOTOYOSHI, SHOJI FUMIYOSHI, YOKOKAWA MITSUO

    This paper reports a method of speeding up MPI collective communication on the K computer, which consists of 82,944 computing nodes connected by a 6D direct network, named the Tofu interconnect. Existing MPI libraries, however, do not have topology-aware algorithms that perform well on such a direct network. Thus, an Allreduce collective algorithm, named Trinaryx3, is designed and implemented in the MPI library for the K computer. The algorithm is optimized for a torus network and enables utilizing multiple RDMA engines, one of the strengths of the K computer. The evaluation results show that the new implementation achieves five times higher bandwidth than the existing one.
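
    For context, the topology-agnostic baseline that torus-aware algorithms such as Trinaryx3 are measured against is the textbook reduce-scatter/allgather ring Allreduce. The sketch below emulates it serially; it is NOT the Trinaryx3 schedule of the paper, whose details are specific to the Tofu torus and its multiple RDMA engines.

```python
import numpy as np

def ring_allreduce(vectors):
    """Emulated ring Allreduce: reduce-scatter followed by allgather.

    `vectors` is a list of equal-length arrays, one per simulated rank.
    Every rank ends up holding the element-wise sum of all inputs.
    """
    p = len(vectors)
    # Each rank splits its vector into p blocks.
    data = [list(np.array_split(np.asarray(v, dtype=float), p)) for v in vectors]

    # Reduce-scatter: at step s, rank r sends its partial sum of block
    # (r - s) % p to rank (r + 1) % p, which accumulates it.  After p-1
    # steps, rank r holds the complete sum of block (r + 1) % p.
    for s in range(p - 1):
        for r in range(p):
            b = (r - s) % p
            data[(r + 1) % p][b] = data[(r + 1) % p][b] + data[r][b]

    # Allgather: the completed blocks circulate once around the ring.
    for s in range(p - 1):
        for r in range(p):
            b = (r + 1 - s) % p
            data[(r + 1) % p][b] = data[r][b]

    return [np.concatenate(blocks) for blocks in data]
```

    Each rank sends and receives only vector_length/p elements per step, which is why reduce-scatter/allgather rings are bandwidth-optimal; topology-aware variants additionally arrange the steps so that messages never collide on shared links.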

    Information Processing Society of Japan, 09 May 2012, IPSJ Transactions on Advanced Computing Systems (ACS), 5 (2012), 245 - 253, Japanese

  • Masaaki Terai, Hitoshi Murai, Kazuo Minami, Mitsuo Yokokawa, Eiji Tomiyama

    Given that scientific computer programs are becoming larger and more complicated, high performance application developers routinely examine the program structure of their source code to improve their performance. We have developed K-scope, a source code analysis tool that can be used to improve code performance. K-scope has a graphical user interface that visualizes the program structures of Fortran 90 and FORTRAN 77 source code and enables static data-flow analysis. To develop the tool, we adopted the filtered abstract syntax tree (filtered-AST) model with Java to visualize the program structure efficiently. Filtered-AST, which extends the AST in the structured programming model by abstract block structuring, is suitable for visualizing program structures. Based on this model, K-scope has been developed as an experimental implementation. It constructs filtered-AST objects from both source and intermediate code generated by the front-end of the XcalableMP compiler. We provide illustrations of the graphical user interface and give detailed examples of the tool applied to an actual application code. © 2012 IEEE.

    2012, Proceedings of the International Conference on Parallel Processing Workshops, 434 - 443, English

    [Refereed]

    International conference proceedings

  • Overview of the K computer System

    Miyazaki, Hiroyuki, Kusano, Yoshihiro, Shinjou, Naoki, Shoji, Fumiyoshi, Yokokawa, Mitsuo, Watanabe, Tadashi

    RIKEN and Fujitsu have been working together to develop the K computer, with the aim of beginning shared use by the fall of 2012, as a part of the High-Performance Computing Infrastructure (HPCI) initiative led by Japan's Ministry of Education, Culture, Sports, Science and Technology (MEXT). Since the K computer involves over 80 000 compute nodes, building it with lower power consumption and high reliability was important from the availability point of view. This paper describes the K computer system and the measures taken for reducing power consumption and achieving high reliability and high availability. It also presents the results of implementing those measures.

    FUJITSU LTD, 2012, Fujitsu Scientific & Technical Journal, 48 (3), 255 - 265, English

    [Refereed]

    Scientific journal

  • Takahashi, Daisuke, Uno, Atsuya, Yokokawa, Mitsuo, IEEE

    In this paper, we propose an implementation of a parallel one-dimensional fast Fourier transform (FFT) on the K computer. The proposed algorithm is based on the six-step FFT algorithm, which can be altered into the recursive six-step FFT algorithm to reduce the number of cache misses. The recursive six-step FFT algorithm improves performance by utilizing the cache memory effectively. We use the recursive six-step FFT algorithm to implement the parallel one-dimensional FFT algorithm. The performance results of one-dimensional FFTs on the K computer are reported. We successfully achieved a performance of over 18 TFlops on 8192 nodes of the K computer (82944 nodes, 128 GFlops/node, 10.6 PFlops peak performance) for a 2(41)-point FFT.
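
    The six-step framework underlying the recursive variant above is standard (Bailey): a length-N transform is split into n1 × n2 column/row FFTs with a twiddle-factor multiplication in between, so each sub-FFT fits in cache. A minimal non-recursive numpy rendering, with the transposes of steps 1 and 4 folded into the axis choices:

```python
import numpy as np

def six_step_fft(x, n1, n2):
    """1-D FFT of length N = n1*n2 via the six-step decomposition.

    Input index j = j1*n2 + j2, output index k = k1 + n1*k2.
    """
    N = n1 * n2
    A = x.reshape(n1, n2)                      # A[j1, j2]
    # Steps 1-2: length-n1 FFTs down the columns (transpose is implicit
    # in transforming along axis 0 rather than physically reordering).
    B = np.fft.fft(A, axis=0)                  # B[k1, j2]
    # Step 3: twiddle factors W_N^(j2*k1).
    k1 = np.arange(n1)[:, None]
    j2 = np.arange(n2)[None, :]
    B = B * np.exp(-2j * np.pi * k1 * j2 / N)
    # Steps 4-5: length-n2 FFTs along the rows.
    C = np.fft.fft(B, axis=1)                  # C[k1, k2]
    # Step 6: transpose so that X[k1 + n1*k2] = C[k1, k2].
    return C.T.reshape(-1)
```

    In a production code the transposes are performed explicitly (and recursively, in the paper's variant) precisely to keep each stage's working set contiguous in cache; numpy's strided access hides that bookkeeping here.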

    IEEE COMPUTER SOC, 2012, 2012 Ieee 14th International Conference on High Performance Computing and Communications & 2012 Ieee 9th International Conference on Embedded Software and Systems (Hpcc-Icess), 344 - 350, English

    [Refereed]

    Scientific journal

  • Yokokawa, Mitsuo, IEEE

    The K computer is a distributed memory supercomputer system with 82,944 compute nodes and 5,184 I/O nodes that was jointly developed by RIKEN and Fujitsu as a Japanese national project. Its development began in 2006 and was completed in June 2012. By achieving a LINPACK performance of 10.51 peta-FLOPS, the K computer ranked first on two consecutive TOP500 lists, in June and November 2011. During its adjustment, part of the K computer was made available, with gradually increasing computing resources, to experts in computational science on a trial basis and was used for performance optimization of users' application codes.

    IEEE, 2012, 2012 Third International Conference on Networking and Computing (Icnc 2012), 21 - 22, English

    [Refereed]

    Scientific journal

  • The K Computer - Toward Its Productive Applications to Our Life

    Yokokawa, Mitsuo, IEEE

    IEEE, 2012, 2012 Sc Companion: High Performance Computing, Networking, Storage and Analysis (Scc), 1673 - 1701, English

    [Refereed]

    Scientific journal

  • Yukihiro Hasegawa, Jun-Ichi Iwata, Miwako Tsuji, Daisuke Takahashi, Atsushi Oshiyama, Kazuo Minami, Taisuke Boku, Fumiyoshi Shoji, Atsuya Uno, Motoyoshi Kurokawa, Hikaru Inoue, Ikuo Miyoshi, Mitsuo Yokokawa

    Real space DFT (RSDFT) is a simulation technique most suitable for massively-parallel architectures to perform first-principles electronic-structure calculations based on density functional theory. We here report unprecedented simulations on the electron states of silicon nanowires with up to 107,292 atoms carried out during the initial performance evaluation phase of the K computer being developed at RIKEN. The RSDFT code has been parallelized and optimized so as to make effective use of the various capabilities of the K computer. Simulation results for the self-consistent electron states of a silicon nanowire with 10,000 atoms were obtained in a run lasting about 24 hours and using 6,144 cores of the K computer. A 3.08 peta-flops sustained performance was measured for one iteration of the SCF calculation in a 107,292-atom Si nanowire calculation using 442,368 cores, which is 43.63% of the peak performance of 7.07 peta-flops. Copyright 2011 ACM.

    2011, Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, English

    [Refereed]

    International conference proceedings

  • Akinori Yonezawa, Tadashi Watanabe, Mitsuo Yokokawa, Mitsuhisa Sato, Kimihiko Hirao

    Advanced Institute for Computational Science (AICS) was created in July 2010 at RIKEN under the supervision of Japanese Ministry of Education, Culture, Sports, Science, and Technology (MEXT) in order to establish the national center of excellence (COE) for high-performance computing and to operate the 10 petaflops class supercomputer called "K", manufactured by Fujitsu, Ltd. This paper presents AICS in the context of the national high-performance computing infrastructure of Japan. Furthermore, some notable simulation research using the K computer will be briefly discussed. Copyright 2011 ACM.

    2011, State of the Practice Reports, SC'11, English

    [Refereed]

    International conference proceedings

  • Mitsuo Yokokawa, Fumiyoshi Shoji, Atsuya Uno, Motoyoshi Kurokawa, Tadashi Watanabe

    The K computer is a distributed memory supercomputer system consisting of more than 80,000 compute nodes which is being developed by RIKEN as a Japanese national project. Its performance is aimed at achieving 10 peta-flops sustained in the LINPACK benchmark. The system is under installation and adjustment. The whole system will be operational in 2012. © 2011 IEEE.

    2011, Proceedings of the International Symposium on Low Power Electronics and Design, 371 - 372, English

    [Refereed]

    International conference proceedings

  • Tsutomu Ikegami, Toyokazu Ishida, Dmitri G. Fedorov, Kazuo Kitaura, Yuichi Inadomi, Hiroaki Umeda, Mitsuo Yokokawa, Satoshi Sekiguchi

    All-electron calculations were performed on the photosynthetic reaction center of Blastochloris viridis, using the fragment molecular orbital (FMO) method. The protein complex of 20,581 atoms and 77,754 electrons was divided into 1398 fragments, and the two-body expansion of FMO/6-31G* was applied to calculate the ground state. The excited electronic states of the embedded electron transfer system were separately calculated by the configuration interaction singles approach with the multilayer FMO method. Despite the structural symmetry of the system, asymmetric excitation energies were observed, especially on the bacteriopheophytin molecules. The asymmetry was attributed to electrostatic interaction with the surrounding proteins, in which the cytoplasmic side plays a major role. (C) 2009 Wiley Periodicals, Inc. J Comput Chem 31: 447-454, 2010
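
    The two-body FMO expansion referred to above assembles the total energy from monomer and dimer energies, E = Σ_I E_I + Σ_{I<J} (E_IJ − E_I − E_J). The sketch below shows only this bookkeeping; in real FMO every fragment calculation is additionally embedded in the electrostatic potential of all other fragments, which is not modelled here, and the energies in the usage are made-up numbers.

```python
def fmo2_total_energy(monomer_E, dimer_E):
    """Two-body FMO (FMO2) total energy from fragment energies.

    monomer_E: dict fragment -> E_I
    dimer_E:   dict (I, J) with I < J -> E_IJ
    Returns sum_I E_I + sum_{I<J} (E_IJ - E_I - E_J).
    """
    total = sum(monomer_E.values())
    for (i, j), e_ij in dimer_E.items():
        # Each pair contributes only its interaction correction.
        total += e_ij - monomer_E[i] - monomer_E[j]
    return total
```

    Because every monomer and dimer energy is an independent quantum-chemistry job, this decomposition is what makes the method embarrassingly parallel across the fragment list, as exploited in the cluster runs described above.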

    WILEY, Jan. 2010, JOURNAL OF COMPUTATIONAL CHEMISTRY, 31 (2), 447 - 454, English

    [Refereed]

    Scientific journal

  • GridFMO - Quantum chemistry of proteins on the Grid

    Ikegami, Tsutomu, Maki, Jun, Takami, Toshiya, Tanaka, Yoshio, Yokokawa, Mitsuo, Sekiguchi, Satoshi, Aoyagi, Mutsurni, IEEE

    A GridFMO application was developed by combining the fragment molecular orbital (FMO) method of GAMESS with Grid technology. With GridFMO, quantum calculations of macromolecules become possible by using a large amount of computational resources collected from many moderate-sized cluster computers. A new middleware suite was developed based on Ninf-G, whose fault tolerance and flexible resource management were found to be indispensable for long-term calculations. GridFMO was used to draw ab initio potential energy curves of a protein motor system with 16,664 atoms. For the calculations, 10 cluster computers over the Pacific rim were used, sharing the resources with other users via batch queue systems on each machine. A series of 14 GridFMO calculations were conducted for 70 days, coping with more than 100 problems cropping up. The FMO curves were compared against molecular mechanics (MM), and it was confirmed that (1) the FMO method is capable of drawing smooth curves despite several cut-off approximations, and that (2) the MM method is reliable enough for molecular modeling.

    IEEE, 2007, 2007 8th Ieee/acm International Conference on Grid Computing, 50 - +, English

    [Refereed]

    Scientific journal

  • Ishihara, T., Kaneda, Y., Yokokawa, M., Itakura, K., Uno, A.

    One-point statistics of velocity gradients and Eulerian and Lagrangian accelerations are studied by analysing the data from high-resolution direct numerical simulations (DNS) of turbulence in a periodic box, with up to 4096^3 grid points. The DNS consist of two series of runs; one with k_max·η ≈ 1 (Series 1) and the other with k_max·η ≈ 2 (Series 2), where k_max is the maximum wavenumber and η the Kolmogorov length scale. The maximum Taylor-microscale Reynolds number R_λ in Series 1 is about 1130, and it is about 675 in Series 2. Particular attention is paid to the possible Reynolds number (Re) dependence of the statistics. The visualization of the intense-vorticity regions shows that the turbulence field at high Re consists of clusters of small intense-vorticity regions, and their structure is to be distinguished from that of small eddies. The possible dependence on Re of the probability distribution functions of velocity gradients is analysed through the dependence on R_λ of the skewness and flatness factors (S and F). The DNS data suggest that the R_λ dependence of S and F of the longitudinal velocity gradients fits well with a simple power law: S ~ -0.32 R_λ^0.11 and F ~ 1.14 R_λ^0.34, in fairly good agreement with previous experimental data. They also suggest that all the fourth-order moments of velocity gradients scale with R_λ similarly to each other at R_λ > 100, in contrast to R_λ < 100. Regarding the statistics of time derivatives, the second-order time derivatives of turbulent velocities are more intermittent than the first-order ones for both the Eulerian and Lagrangian velocities, and the Lagrangian time derivatives of turbulent velocities are more intermittent than the Eulerian time derivatives, as would be expected. The flatness factor of the Lagrangian acceleration is as large as 90 at R_λ ≈ 430. The flatness factors of the Eulerian and Lagrangian accelerations increase with R_λ approximately as R_λ^(α_E) and R_λ^(α_L), respectively, where α_E ≈ 0.5 and α_L ≈ 1.0, while those of the second-order time derivatives of the Eulerian and Lagrangian velocities increase approximately as R_λ^(β_E) and R_λ^(β_L), respectively, where β_E ≈ 1.5 and β_L ≈ 3.0.

    CAMBRIDGE UNIV PRESS, 2007, Journal of Fluid Mechanics, 592, 335 - 366, English

    [Refereed]

    Scientific journal

  • YOKOKAWA Mitsuo

    Atomic Energy Society of Japan, 2006, Journal of the Atomic Energy Society of Japan, 48 (11), 877 - 880, Japanese

  • Tsutomu Ikegami, Toyokazu Ishida, Dmitri G. Fedorov, Kazuo Kitaura, Yuichi Inadomi, Hiroaki Umeda, Mitsuo Yokokawa, Satoshi Sekiguchi

    A full-electron calculation for the photosynthetic reaction center of Rhodopseudomonas viridis was performed by using the fragment molecular orbital (FMO) method on a massive cluster computer. The target system contains 20,581 atoms and 77,754 electrons, and was divided into 1,398 fragments. According to the FMO prescription, calculations of the fragments and of pairs of fragments were conducted to obtain the electronic state of the system. The calculation at the RHF/6-31G* level of theory took 72.5 hours with 600 CPUs. The CPUs were grouped into several workers, to which the calculations of the fragments were dispatched. An uneven CPU grouping, in which two types of workers are generated, was shown to be efficient. © 2005 IEEE.

    2005, Proceedings of the ACM/IEEE 2005 Supercomputing Conference, SC'05, 2005, English

    [Refereed]

    International conference proceedings

  • Kaneda, Y, Yokokawa, M, Winter, G, Ecer, A, Satofuka, N, Periaux, J, Fox, P

    This chapter discusses a direct numerical simulation (DNS) study of canonical turbulence. Incompressible turbulence obeying the Navier-Stokes (N-S) equation under periodic boundary conditions (BC) is widely regarded as one of the most canonical types of turbulence. It keeps the essence of the turbulence dynamics: (1) the nonlinear convection effect associated with the fluid motion, (2) dissipativity, and (3) mass conservation, which is equivalent to the incompressibility or the so-called solenoidal condition in incompressible fluid. Underlying the study of turbulence in such a simple geometry is the idea of the Kolmogorov hypotheses, according to which the small-scale statistics in fully developed turbulence at sufficiently high Reynolds number Re are universal and insensitive to the details of large-scale conditions. The DNS of incompressible homogeneous turbulence was performed under periodic boundary conditions with the number of grid points up to 1024^3 on the VPP5000 system at the Information Technology Center, Nagoya University, and DNS with up to 4096^3 grid points on the Earth Simulator (ES). The DNS is based on a spectral method free from alias error. Sustained performance of 16.4 Tflops was achieved in the DNS with 2048^3 grid points and double precision arithmetic on the ES. © 1996 Elsevier B.V. All rights reserved.

    Elsevier B.V., 2005, Parallel Computational Fluid Dynamics: Multidisciplinary Applications, 23 - 32, English

    [Refereed]

    Scientific journal

  • Habata, S, Umezawa, K, Yokokawa, M, Kitawaki, S

    The Earth Simulator (ES), developed under the Japanese government's initiative "Earth Simulator project", is a highly parallel vector supercomputer system. In this paper, an overview of the ES, its architectural features, hardware technology and the results of performance evaluation are described. In May 2002, the ES was acknowledged to be the most powerful computer in the world: 35.86 teraflop/s for the LINPACK HPC benchmark and 26.58 teraflop/s for an atmospheric general circulation code (AFES). Such remarkable performance may be attributed to the following three architectural features: vector processors, shared memory, and a high-bandwidth non-blocking crossbar interconnection network. The ES consists of 640 processor nodes (PN) and an interconnection network (IN), which are housed in 320 PN cabinets and 65 IN cabinets. The ES is installed in a specially designed building, 65 m long, 50 m wide and 17 m high. In order to accomplish this advanced system, many kinds of hardware technologies have been developed, such as high-density and high-frequency LSI, high-frequency signal transmission, high-density packaging, and a high-efficiency cooling and power supply system with low noise, so as to reduce the whole volume of the ES and its total power consumption. For highly parallel processing, a special synchronization mechanism connecting all nodes, the Global Barrier Counter (GBC), has been introduced. © 2004 Elsevier B.V. All rights reserved.

    2004, Parallel Computing, 30 (12), 1287 - 1313, English

    [Refereed]

    Scientific journal

  • The earth simulator system

    Habata, S, Yokokawa, M, Kitawaki, S

    2003, Nec Research & Development, 44 (1)

    [Refereed]

    Scientific journal

  • The development of the Earth Simulator

    Habata, H, Yokokawa, M, Kitawaki, S

    2003, Ieice Transactions on Information and Systems, E86D (10)

    [Refereed]

    Scientific journal

  • Yokokawa, M, Matsuno, K, Ecer, A, Periaux, J, Satofuka, N, Fox, P

    The Earth Simulator (ES) is an ultra high-speed supercomputer. The research and development of the ES was initiated in 1997 as one of the goals in the Earth Simulator project aiming at promotion of research for understanding and prediction of global environmental changes. The ES is a parallel computer system of the distributed-memory type, and consists of 640 processor nodes connected by 640 × 640 single-stage crossbar switches. Each processor node is a shared memory system composed of eight vector processors. The total peak performance and main memory capacity are 40 Tflop/s and 10 TB, respectively. LSI technology of 0.15 µm CMOS has been adopted for its one-chip vector processor. Its development was successfully completed in February 2002, achieving a remarkable sustained performance of 35.86 Tflop/s (88% of the peak) in the Linpack benchmark program.

    Elsevier Inc., 2003, Parallel Computational Fluid Dynamics: New Frontiers and Multi-Disciplinary Applications, Proceedings, 131 - 138, English

    [Refereed]

    Scientific journal

  • Shingu, S, Fuchigami, H, Yamada, M, Tsuda, Y, Yoshioka, M, Ohfuchi, W, Nakamura, H, Yokokawa, M, Matsuno, K, Ecer, A, Periaux, J, Satofuka, N, Fox, P

    An atmospheric general circulation model (AGCM) for climate studies was developed for the Earth Simulator (ES). The model, called AFES, is based on the CCSR/NIES AGCM and is a global three-dimensional hydrostatic model using the spectral transform method. AFES is optimized for the architecture of the ES. We achieved high sustained performance by executing AFES with T1279L96 resolution on the ES. A performance of 26.58 Tflops was achieved in the execution of the main time-step loop using all 5120 processors (640 nodes) of the ES. This performance corresponds to 64.9% of the theoretical peak performance of 40.96 Tflops. The T1279 resolution, equivalent to about 10 km grid intervals at the equator, is very close to the highest resolution at which the hydrostatic approximation is valid. To the best of our knowledge, no other model simulation of the global atmosphere has ever been performed with such super-high resolution. Currently, such a simulation is possible only on the ES with AFES. In this paper we describe the optimization method, computational performance and calculated results of the test runs.

    Elsevier Inc., 2003, Parallel Computational Fluid Dynamics: New Frontiers and Multi-Disciplinary Applications, Proceedings, 79 - 86, English

    [Refereed]

    Scientific journal

  • MPI performance measurement on the earth simulator

    Uehara, H, Tamura, M, Yokokawa, M

    2003, Nec Research & Development, 44 (1)

    [Refereed]

    Scientific journal

  • Hitoshi Uehara, Masanori Tamura, Mitsuo Yokokawa

    Parallel programming is essential for large-scale scientific simulations, and MPI is intended to be a de facto standard API for this kind of programming. Since MPI has several functions that exhibit similar behaviors, programmers often have difficulty in choosing the appropriate function for their programs. An MPI benchmark program library named MBL has been developed for gathering performance data for various MPI functions. It measures the performance of MPI-1 and MPI-2 functions under several communication patterns. MBL has been applied to the measurement of MPI performance in the Earth Simulator. It is confirmed that a maximum throughput of 11.7GB/s is obtained in inter-node communications in the Earth Simulator. © 2002 Springer Berlin Heidelberg.
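
    A common way to interpret point-to-point figures like the 11.7 GB/s above is the Hockney latency-bandwidth model, t = latency + size/peak, under which the observed bandwidth approaches the peak only for large messages. A small sketch; the latency and peak values in the usage note are illustrative assumptions, not measurements from MBL.

```python
def effective_bandwidth(size_bytes, latency_s, peak_bytes_per_s):
    """Hockney model: transfer time t = latency + size/peak,
    so the bandwidth actually observed for a message is size/t."""
    t = latency_s + size_bytes / peak_bytes_per_s
    return size_bytes / t
```

    For an assumed 10 µs latency and an 11.7 GB/s peak link, an 8-byte message achieves well under 1 MB/s (pure latency), while a 1 GiB message recovers more than 99% of the peak; benchmark suites such as MBL sweep the message size to expose exactly this transition.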

    2002, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2327, 219 - 230, English

    [Refereed]

    International conference proceedings

  • M. Yokokawa

    The Earth Simulator is an ultra high-speed supercomputer. The research and development of the Earth Simulator was initiated in 1997 as one of the approaches in the Earth Simulator project which aims at promotion of research and development for understanding and prediction of global environmental changes. The Earth Simulator is a distributed memory parallel system which consists of 640 processor nodes connected by a single-stage full crossbar switch. Each processor node is a shared memory system composed of eight vector processors. The total peak performance and main memory capacity are 40Tflop/s and 10TB, respectively. In this paper, a concept of the Earth Simulator and an outline of the Earth Simulator system are described.

    IEEE Computer Society, 2000, Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems, 2001-, 93 - 99, English

    [Refereed]

    International conference proceedings

  • Development of the "Earth Simulator"

    Kawai, S, Yokokawa, M, Ito, H, Shingu, S, Tani, K, Yoshida, K, Keyes, DE, Ecer, A, Periaux, J, Satofuka, N

    2000, Parallel Computational Fluid Dynamics: Towards Teraflops, Optimization, and Novel Formulations

    [Refereed]

    Scientific journal

  • Mitsuo Yokokawa, Shinichi Habata, Shinichi Kawai, Hiroyuki Ito, Keiji Tani, Hajime Miyoshi

    The Earth Simulator is an ultra high-speed supercomputer. The research and development of the Earth Simulator started in 1997 as one of the approaches in the Earth Simulator project, which aims at the promotion of research and development for understanding and prediction of global environmental change. Conceptual design and basic design of the Earth Simulator have been finished so far. According to the design, the Earth Simulator is a distributed memory parallel system which consists of 640 processor nodes connected by an internode crossbar switch. Each processor node is a shared memory system composed of eight vector processors. The total peak performance and main memory capacity are 40 Tflop/s and 10 TB, respectively. In this paper, the concepts of the Earth Simulator system and the outline of the basic design are presented.

    Springer Verlag, 1999, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1615, 269 - 280, English

    [Refereed]

    International conference proceedings

  • Yokokawa, M, Schneider, DE, Watanabe, T, Kaburaki, H

    1996, Parallel Computational Fluid Dynamics

    [Refereed]

    Scientific journal

  • WATANABE, T, KABURAKI, H, YOKOKAWA, M

    In response to the preceding Comment by Garcia, Baras, and Mansour [Phys. Rev. E 51, 3784 (1995)], we evaluate the Rayleigh number by taking the temperature jump at the wall into consideration. It is shown that a good agreement between the direct simulation Monte Carlo results and the linear stability theory is obtained by using the diffuse boundary condition, while there is a slight discrepancy in the case of the semislip boundary condition. © 1995 The American Physical Society.

    1995, Physical Review E, 51 (4), 3786 - 3787, English

    [Refereed]

    Scientific journal

  • Parallelization of a Fourier pseudospectral CFD code

    Yokokawa, M, Watanabe, T, Kaburaki, H, INFORMAT PROC SOC

    1995, 1995 International Workshop on Computer Performance Measurement and Analysis (Permean '95), Proceedings

    [Refereed]

    Scientific journal

  • WATANABE, T, KABURAKI, H, MACHIDA, M, YOKOKAWA, M

    The transition between heat conduction and convection in the two-dimensional Rayleigh-Bénard system is simulated using the direct simulation Monte Carlo method. Long-range correlations of temperature fluctuations are found to grow in the transition. © 1995 The American Physical Society.

    1995, Physical Review E, 52 (2), 1601 - 1605

    [Refereed]

    Scientific journal

  • Study of Rayleigh-Benard Instability Using the Direct Simulation Monte Carlo Method

    WATANABE Tadashi, KABURAKI Hideo, MACHIDA Masahiko, YOKOKAWA Mitsuo

    The Rayleigh-Bénard instability, in which a transition from heat conduction to convective heat transfer occurs, was investigated by the direct simulation Monte Carlo method. The basic equations and computational method are described in detail, and it is shown that the critical Rayleigh number obtained from molecular-level computation agrees with the value derived from the linear instability theory of the macroscopic hydrodynamic equations. Furthermore, it is shown that even when the flow field still exhibits the conduction state during the transition near the critical Rayleigh number, the spatial correlation of temperature fluctuations already indicates a shift toward the convection state, and that the characteristic distance over which fluctuations exert influence is small in the stable conduction and convection states and becomes large only during the transition.

    01 Oct. 1994, Therm. Sci. Eng., 2 (4), 17 - 24, Japanese

  • WATANABE, T, KABURAKI, H, YOKOKAWA, M

    The transition from conduction to convection in the two-dimensional Rayleigh-Bénard system has been simulated using the direct simulation Monte Carlo method, where the diffuse reflection boundary conditions are strictly applied at the top and bottom walls. It is shown that the determined critical Rayleigh number agrees well with that obtained by the macroscopic hydrodynamic equations. © 1994 The American Physical Society.

    1994, Physical Review E, 49 (5), 4060 - 4064, English

    [Refereed]

    Scientific journal

  • KABURAKI, H, YOKOKAWA, M

    1994, Molecular Simulation, 12 (3-6), 441 - 444, English

    [Refereed]

    Scientific journal

  • ASAI, K, ISHIGURO, M, AKIMOTO, M, YOKOKAWA, M

    1990, Journal of Nuclear Science and Technology, 27 (7), 683 - 686, English

    [Refereed]

    Scientific journal

  • Yukio Kaneda, Takashi Ishihara, Koji Morishita, Mitsuo Yokokawa, Atsuya Uno

    In high-Reynolds-number turbulence the spatial distribution of velocity fluctuation at small scales is strongly non-uniform. In accordance with the non-uniformity, the distributions of the inertial and viscous forces are also non-uniform. According to direct numerical simulation (DNS) of forced turbulence of an incompressible fluid obeying the Navier-Stokes equation in a periodic box at the Taylor microscale Reynolds number R_λ ≈ 1100, the average ⟨R_loc⟩ over the space of the 'local Reynolds number' R_loc, which is defined as the ratio of inertial to viscous forces at each point in the flow, is much smaller than the conventional 'Reynolds number' given by Re = UL/ν, where U and L are the characteristic velocity and length of the energy-containing eddies, and ν is the kinematic viscosity. While both conditional averages of the inertial and viscous forces for a given squared vorticity ω² increase with ω² at large ω², the conditional average of R_loc is almost independent of ω². A comparison of the DNS field with a random structureless velocity field suggests that the increase in the conditional average of R_loc with ω² at large ω² is suppressed by the Navier-Stokes dynamics. Something similar is also true for the conditional averages for a given local energy dissipation rate per unit mass. Certain features of intermittency effects, such as that on the Re dependence of ⟨R_loc⟩, are explained by a multi-fractal model by Dubrulle (J. Fluid Mech., vol. 867, 2019, P1).

    CAMBRIDGE UNIV PRESS, Oct. 2021, JOURNAL OF FLUID MECHANICS, 929, English

    [Refereed]

    Scientific journal

  • Toshiyuki Imamura, Masaaki Aoki, Mitsuo Yokokawa

    This work introduces a new idea of batched 3D-FFT, with a survey of data decomposition methods and a review of state-of-the-art high-performance parallel FFT libraries. It is also argued that the typical usage of multiple FFTs lends itself to batched execution. The batched 3D-FFT kernel, run on the K computer, shows a 45.9% speedup when N and P are 2048³ and 128, respectively. Batched FFT allows the developer to take advantage of a flexible internal data layout and scheduling to improve total performance.

    IOS PRESS, 2020, PARALLEL COMPUTING: TECHNOLOGY TRENDS, 36, 169 - 178, English

    [Refereed]

    International conference proceedings
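
The pencil decomposition that parallel 3D-FFT libraries (including batched kernels like the one above) build on can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the function name, the chunk size `batch`, and the use of NumPy are all assumptions of ours. The 3D transform is composed of independent 1D FFTs along one axis at a time, applied in batches of pencils (real libraries interleave these passes with transposes and communication):

```python
import numpy as np

def fft3d_pencil(a, batch=4):
    """3-D FFT built from batched 1-D FFTs, one axis at a time.

    Illustrative sketch only: a real distributed FFT interleaves these
    passes with data transposes/communication between nodes.
    """
    for axis in range(3):
        other = (axis + 1) % 3  # axis along which we chunk the batch
        out = np.empty(a.shape, dtype=complex)
        n = a.shape[other]
        # process the independent 1-D transforms in batches of pencils,
        # mimicking a batched kernel that could overlap compute and I/O
        for s in range(0, n, batch):
            sl = [slice(None)] * 3
            sl[other] = slice(s, s + batch)
            out[tuple(sl)] = np.fft.fft(a[tuple(sl)], axis=axis)
        a = out
    return a
```

Because the 1D transforms along one axis are independent across the other axes, the chunked result matches a direct `np.fft.fftn` on the same array.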

MISC

  • Large-scale Direct Numerical Simulations of Homogeneous Isotropic Turbulence

    石原 卓, 横川 三津夫, 森下 浩二, 宇野 篤也, 金田 行雄

    小宮山印刷工業, Jun. 2019, シミュレーション = Journal of the Japan Society for Simulation Technology, 38 (2), 74 - 78, Japanese

    Introduction scientific journal

  • Report on HPC Asia 2018

    横川 三津夫

    15 Aug. 2018, 情報処理, 59 (9), 856 - 857, Japanese

  • 20091 Evaluation of Aerodynamically Unstable Vibration of High-Rise Buildings by Fluid-Structure Interaction Analysis Using a Multi-Degree-of-Freedom Structural Model and LES

    挾間 貴雅, 坂 敏秀, 伊藤 嘉晃, 山本 学, 近藤 宏二, 田村 哲郎, 横川 三津夫

    日本建築学会, 20 Jul. 2018, 構造I, (2018), 181 - 182, Japanese

  • Fluid-Structure Interaction Analysis System Using Multi-Degree-of-Freedom Structure Model

    挾間 貴雅, 坂 敏秀, 伊藤 嘉晃, 近藤 宏二, 山本 学, 田村 哲郎, 横川 三津夫

    鹿島技術研究所, Dec. 2017, 鹿島技術研究所年報 Annual report, Kajima Technical Research Institute, Kajima Corporation, 65, 135 - 140, Japanese

  • Communication Performance Evaluation of the XcalableMP Language on the Earth Simulator

    上原 均, 村井 均, 横川 三津夫

    29 May 2017, ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集, (2017), 13 - 13, Japanese

  • Performance Evaluation of Parareal Method for Large-Scale Spatio-Temporal Computation

    今村 成吾, 飯塚 幹夫, 小野 謙二, 横川 三津夫

    To achieve high performance on recent supercomputer architectures with numerous cores, various levels of parallelism must be incorporated into a parallel simulation in order to break through the performance saturation of a pure domain-decomposition approach. Parallel-in-Time (PinT) is one of the most promising candidates for this. In this paper, the authors demonstrate that the Parareal method is a simple but effective implementation of PinT and that its pipelined version shows higher performance. A performance evaluation conducted on the K computer with up to 8,000 nodes found that the pipelined Parareal method combined with domain decomposition ran 222 times faster than sequential time integration.

    日本計算工学会, May 2017, 計算工学講演会論文集 Proceedings of the Conference on Computational Engineering and Science, 22, 4p, Japanese
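
For illustration, the Parareal iteration described in this abstract can be sketched on a scalar test problem. This is a minimal sketch for dy/dt = λy, not the paper's code; the function name, the explicit-Euler coarse/fine propagators, and all parameters are our assumptions. Each iteration runs the expensive fine propagator independently on every time slice (the parallel-in-time step), then corrects with a cheap sequential coarse sweep:

```python
def parareal(y0, lam, T, n_slices, n_iter, fine_sub=100):
    """Parareal sketch for dy/dt = lam * y on [0, T]."""
    dt = T / n_slices

    def coarse(y, h):
        # one explicit Euler step: the cheap propagator G
        return y * (1.0 + lam * h)

    def fine(y, h):
        # many small Euler steps: the accurate propagator F
        sub = h / fine_sub
        for _ in range(fine_sub):
            y = y * (1.0 + lam * sub)
        return y

    # iteration 0: a sequential coarse sweep gives the initial guess
    U = [y0]
    for n in range(n_slices):
        U.append(coarse(U[n], dt))

    for _ in range(n_iter):
        # the fine solves are independent per slice -> parallel in time
        F = [fine(U[n], dt) for n in range(n_slices)]
        G_old = [coarse(U[n], dt) for n in range(n_slices)]
        V = [y0]
        for n in range(n_slices):
            # predictor-corrector update of the Parareal iteration
            V.append(coarse(V[n], dt) + F[n] - G_old[n])
        U = V
    return U[-1]
```

After k iterations the first k slices coincide with the sequential fine solution, so with n_iter = n_slices the method reproduces sequential fine time integration exactly; the payoff is that fewer iterations usually suffice.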

  • Fluid-Structure Interaction Analysis of High-Rise Building with Complex Surface Shape

    挾間 貴雅, 伊藤 嘉晃, 近藤 宏二, 坂 敏秀, 山本 学, 田村 哲郎, 横川 三津夫

    鹿島技術研究所, Nov. 2016, 鹿島技術研究所年報 Annual report, Kajima Technical Research Institute, Kajima Corporation, 64, 150 - 155, Japanese

  • Wind Response Evaluation of High-Rise Buildings with Complex Surface Shapes by Fluid-Structure Interaction Analysis

    挾間 貴雅, 伊藤 嘉晃, 山本 学, 坂 敏秀, 近藤 宏二, 田村 哲郎, 横川 三津夫

    日本建築学会, 24 Aug. 2016, 構造I, (2016), 263 - 264, Japanese

  • Evaluation of the XcalableMP Language on the Earth Simulator

    上原 均, 横川 三津夫, 村井 均

    30 May 2016, ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集, (2016), 62 - 62, Japanese

  • Study of Parallel Computational Model for Post-Peta Scale Computer System

    上原 均, 横川 三津夫, 村井 均, 板倉 憲一, 浅野 俊幸

    Center for Earth Information Science and Technology, Japan Agency for Marine-Earth Science and Technology, Apr. 2016, Annual report of the earth simulator, 269 - 273, Japanese

  • Vortex Clusters and Their Time Evolution in High-Reynolds-Number Turbulence

    ISHIHARA TAKASHI, UNO ATSUYA, MORISHITA KOJI, YOKOKAWA MITSUO, KANEDA YUKIO

    日本流体力学会, Apr. 2016, ながれ, 35 (2), 109 - 113, Japanese

    Introduction scientific journal

  • Wind Pressure Prediction Using Large-Eddy Simulation for High-rise Building with Complex Surface Shape

    挾間 貴雅, 伊藤 嘉晃, 近藤 宏二, 山本 学, 田村 哲郎, 横川 三津夫

    鹿島技術研究所, Nov. 2015, 鹿島技術研究所年報 Annual report, Kajima Technical Research Institute, Kajima Corporation, 63, 137 - 142, Japanese

  • Effort to Practice Wind Resistant Design Using Computational Fluid Dynamics

    近藤 宏二, 挾間 貴雅, 伊藤 嘉晃, 山本 学, 中山 かほる, 鈴木 雅靖, 田村 哲郎, 河合 英徳, 川本 陽一, 横川 三津夫, 坪倉 誠, 大西 慶治, バレ ラフール

    鹿島技術研究所, Nov. 2015, 鹿島技術研究所年報 Annual report, Kajima Technical Research Institute, Kajima Corporation, 63, 1 - 14, Japanese

  • 20066 Wind Pressure Prediction for High-rise Building with Complex Surface Shape using Large-Eddy Simulation

    HASAMA Takamasa, ITOH Yoshiaki, KONDO Koji, YAMAMOTO Manabu, TAMURA Tetsuro, YOKOKAWA Mitsuo

    Architectural Institute of Japan, 04 Sep. 2015, Summaries of technical papers of annual meeting, (2015), 131 - 132, Japanese

  • Ishihara T., Morishita K., Yokokawa M., Uno A., Kaneda Y.

    The Physical Society of Japan, 2015, Meeting Abstracts of the Physical Society of Japan, 70 (0), 2807 - 2807, Japanese

  • Development of K-scope, a Tuning Support Tool for Fortran Codes

    寺井 優晃, 富山 栄治, 村井 均, 熊畑 清, 濱田 信次, 井上 俊介, 黒田 明義, 長谷川 幸弘, 南 一生, 横川 三津夫

    08 Jan. 2013, ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集, (2013), 84 - 84, Japanese

  • Experiments at the X-ray free-electron laser facility SACLA in cooperation with the K computer

    SUGIMOTO Takashi, JOTI Yasumasa, MAYAMA Kou, OHATA Tohru, SAKAMOTO Tatsuaki, TANAKA Ryotaro, YAMAGA Mitsuhiro, HATSU Takaki, ISHIKAWA Yutaka, KAMEYAMA Toshihisa, KUROKAWA Motoyoshi, SHOJI Fumiyoshi, YOKOKAWA Mitsuo, NISHIKAWA Takeshi

    The X-ray free-electron laser facility SACLA delivers ultra-short-pulse, highly brilliant, coherent X-rays. X-ray coherent diffraction imaging, one of the scientific applications of SACLA, aims at the structural analysis of noncrystalline proteins. To reconstruct the three-dimensional structure of proteins, several million diffraction images have to be analyzed using supercomputers such as the K computer. Toward such cooperative experiments, we performed data-transfer tests from SACLA to the FOCUS and e-Science supercomputers.

    The Institute of Electronics, Information and Communication Engineers, 05 Oct. 2012, IEICE technical report. Internet Architecture, 112 (236), 25 - 30, Japanese

  • Evaluation of Job Scheduling with File Staging

    宇野篤也, 庄司文由, 横川三津夫

    Large-scale systems such as the K computer and the Earth Simulator adopt a two-level file system in order to secure file I/O performance for the compute nodes, and they incorporate into job scheduling a file-staging mechanism that moves files between the file systems as part of job execution. This report evaluates, using a software job simulator, the effects of file staging on job scheduling.

    26 Sep. 2012, 研究報告ハイパフォーマンスコンピューティング(HPC), 2012 (22), 1 - 6, Japanese

  • The K Computer:0. Foreword

    横川 三津夫

    15 Jul. 2012, 情報処理, 53 (8), 752 - 753, Japanese

  • The K Computer:1. Introduction to the Next-Generation Supercomputer Project

    YOKOKAWA MITSUO, WATANABE TADASHI

    Development of the next-generation supercomputer (nicknamed the K computer) started in fiscal 2006 as a seven-year plan, with the goal of "developing, using, and disseminating the world's most advanced and highest-performance next-generation supercomputer and its application technologies." This article outlines the project up to completion of the system: the development policy and history, the decision on the system configuration and its revision, and manufacturing and performance verification.

    Information Processing Society of Japan, 15 Jul. 2012, 情報処理, 53 (8), 754 - 758, Japanese

    Introduction scientific journal

  • Energy Dissipation Rate and Energy Spectrum in High Resolution Direct Numerical Simulations of Turbulence in a Periodic Box

    KANEDA Yukio, ISHIHARA Takashi, YOKOKAWA Mitsuo, ITAKURA Ken'ichi, UNO Atsuya

    日本流体力学会, 25 Jun. 2012, ながれ : 日本流体力学会誌, 31 (3), 241 - 244, Japanese

    Report scientific journal

  • Reduction of Execution Time of RMATT for Communication Time Optimization for Large Scale Computation

    今出 広明, 平本 新哉, 三浦 健一, 住元 真司, 黒川 原佳, 横川 三津夫, 渡邊 貞

    17 Jan. 2012, ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集, (2012), 93 - 100, Japanese

  • Status of the development of K computer

    横川 三津夫

    17 Jan. 2012, ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集, (2012), 47 - 47, Japanese

  • Performance Tuning and Evaluation of Sparse matrix-vector multiplication on the K computer

    南 一生, 井上 俊介, 堤 重信, 前田 拓人, 長谷川 幸弘, 黒田 明義, 寺井 優晃, 横川 三津夫

    17 Jan. 2012, ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集, (2012), 23 - 31, Japanese

  • The K Computer "Kei"

    横川 三津夫

    日本電気協会, Dec. 2011, 電気協会報, (1044), 14 - 16, Japanese

  • TECHNO TREND : The New World Record of Super Computer "KEI"

    渡邊 貞, 横川 三津夫, 青木 孝

    防衛技術協会, Dec. 2011, 防衛技術ジャーナル, 31 (12), 5 - 11, Japanese

  • Implementation and Evaluation of MPI_Allreduce on the K computer

    Yuki Matsumoto, Tomoya Adachi, Minoru Tanaka, Shinji Sumimoto, Takeshi Soga, Takeshi Nanri, Atsuya Uno, Motoyoshi Kurokawa, Fumiyoshi Shoji, Mitsuo Yokokawa

    This paper reports a method of speeding up MPI collective communication on the K computer, which consists of more than 80 thousand computing nodes connected by a direct network. Almost all existing MPI libraries implement only algorithms optimized for indirect networks. However, such algorithms perform poorly on a direct network because of message collisions. Thus, in order to achieve high performance on a direct network, it is necessary to implement collective algorithms optimized for the network topology. In this paper, the Trinaryx3 Allreduce algorithm is designed and implemented in the MPI library for the K computer. The algorithm is optimized for the torus network and can utilize multiple RDMA engines, one of the strengths of the K computer. The evaluation shows that the new implementation achieves five times higher bandwidth than the existing one optimized for indirect networks.

    Information Processing Society of Japan (IPSJ), 21 Nov. 2011, IPSJ SIG Notes, 2011 (6), 1 - 10, Japanese

  • Hirao Kimihiko, Yokokawa Mitsuo

    The Advanced Institute for Computational Science (AICS) was established in Kobe on July 1, 2010. This new organization is responsible for operating the Next-Generation Supercomputer-named "K" after the character 京, which stands for 10 to the 16th power-and for carrying out R & D in computational science and technology. Our mission is to get the maximum potential use out of the "K computer" to propel Japan to a leading position in the world of computational science and technology. Petascale computing hardware is just around the corner. Petascale resources will enable us to enter a new era of modeling. The supercomputer is an essential tool for contemporary science and technology. The potential it offers for expanding basic research in the study of the universe, elementary particles, materials science and the life sciences is clear. But the supercomputer is equally essential to a wide range of advanced science and technology that is directly related to our daily lives. We are in the midst of a fierce global competition to develop and use the most advanced supercomputers. AICS will be working to further science and technology, buttress Japan's competitive edge in science, and respond to the needs of both the Japanese people and the global community. Our vision is for AICS to become a mecca of computational science, a converging point of global knowledge that attracts scientists from around the world. We hope to produce exciting results that will amaze the world and the Japanese people.

    The Physical Society of Japan, 2011, Butsuri, 66 (7), 524 - 528, Japanese

  • YOKOKAWA Mitsuo, SHOJI Fumiyoshi

    A supercomputer is a computer that performs scientific and technical computations at high speed, and it is an indispensable infrastructure tool of computational science for the future development of science and technology. Since fiscal 2006, RIKEN has been developing, within the Next-Generation Supercomputer project, a world-fastest-class general-purpose supercomputer (nicknamed the K computer) exceeding 10 petaflops of LINPACK performance. This article outlines the K computer, whose manufacturing has already begun.

    Atomic Energy Society of Japan, 01 Dec. 2010, Journal of the Atomic Energy Society of Japan, 52 (12), 782 - 786, Japanese

  • Present State of Development Project of a Next-Generation Supercomputer System

    YOKOKAWA Mitsuo

    日本計算工学会, 31 Jan. 2008, Journal of the Japan Society for Computational Engineering and Science, 13 (1), 1733 - 1735, Japanese

  • Grid-Large Scale Quantum Chemistry on the Grid

    IKEGAMI TSUTOMU, MAKI JUN, TAKAMI TOSHIYA, TANAKA YOSHIO, YOKOKAWA MITSUO, SEKIGUCHI SATOSHI, AOYAGI MUTSUMI

    Grid technology was used to develop the GridFMO application, which performs quantum chemical calculations in a distributed parallel environment. The Fragment Molecular Orbital (FMO) method was employed to obtain accurate electronic states of proteins. To support long-running calculations, a Grid middleware with high fault tolerance and flexible resource management was developed on top of the lower-level middleware Ninf-G. Uniting 10 cluster computers around the Pacific rim, 14 GridFMO calculations were conducted over a period of 70 days, while sharing the machines with other users via the batch queue systems on each machine. The importance of the fault tolerance and the resource management was demonstrated through the experiment.

    Information Processing Society of Japan (IPSJ), 15 May 2007, 情報処理学会論文誌コンピューティングシステム(ACS), 48 (8), 83 - 93, Japanese

  • Design of Data Repository System for 3-D Full-Scale Earthquake Testing Facility

    TANIMURA YUSUKE, TANAKA YOSHIO, YOKOKAWA MITSUO, SEKIGUCHI SATOSHI

    A data repository system, called EDgrid Central, is designed for storing the huge amount of experimental data produced by a 3-D full-scale earthquake testing facility. EDgrid Central provides large storage capacity and implements a data model for shake tests in the backend. The frontend is a portal where users retrieve the stored data by metadata search and bulk download. The system builds on NEEScentral, developed by the NEES project in the United States, enhancing its search and download functionality according to the EDgrid users' requirements. EDgrid Central gives facility sites a permanent repository of shaking-table experiments, and it also enables civil engineering researchers to share their data and reports in their daily activities.

    Information Processing Society of Japan (IPSJ), 27 Feb. 2006, IPSJ SIG Notes, 2006 (20), 115 - 120, Japanese

  • Temperature Distribution in the Cluster

    SHIMIZU TOSHIYUKI, SATOH SATOSHI, KODAMA YUETSU, KUDOH TOMOHIRO, YOKOKAWA MITSUO

    When a large-scale cluster must be stopped safely in cases such as a power failure, the loss of air conditioning during shutdown also poses a problem. In this paper, the two-dimensional distribution and trends of room temperature are surveyed, and the safety of a procedure that shuts the system down without backup air conditioning is confirmed.

    Information Processing Society of Japan (IPSJ), 07 Mar. 2005, IPSJ SIG Notes, 162 (19), 127 - 132, Japanese

  • Ishihara, T, Kaneda, Y, Yokokawa, M, Itakura, K, Uno, A

    The energy spectrum in the near-dissipation range of turbulence is studied by analyzing the data of a series of high-resolution direct numerical simulations of incompressible homogeneous turbulence in a periodic box, with the Taylor micro-scale Reynolds number R_λ and resolution ranging up to about 675 and 4096³, respectively. The spectra in this Reynolds number range fit well to the form C(kη)^α exp(−βkη) in the wavenumber range 0.5 ≲ kη ≲ 1.5, where η is the Kolmogorov dissipation length scale and C, α and β are constants independent of k. The values of α and β decrease monotonically with R_λ, and they are consistent with the conjecture that they approach constants as R_λ → ∞, but the approach, especially that of β, is slow.

    PHYSICAL SOC JAPAN, 2005, Journal of the Physical Society of Japan, 74 (5), 1464 - 1471, English

    [Refereed]
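
The fit described in this abstract is a linear least-squares problem in log space, since ln E = ln C + α ln(kη) − βkη. The sketch below illustrates that reduction on synthetic data; the function name and all numerical values are our own, not taken from the paper:

```python
import numpy as np

def fit_near_dissipation_spectrum(k_eta, E):
    """Fit E = C * (k*eta)**alpha * exp(-beta * k*eta) by linear least
    squares on ln E = ln C + alpha*ln(k*eta) - beta*(k*eta).

    Illustrative sketch only; returns (C, alpha, beta).
    """
    x = np.asarray(k_eta, dtype=float)
    y = np.log(np.asarray(E, dtype=float))
    # design matrix for the three unknowns ln C, alpha, beta
    A = np.column_stack([np.ones_like(x), np.log(x), -x])
    (lnC, alpha, beta), *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.exp(lnC), alpha, beta
```

On noiseless synthetic data the fit recovers the generating parameters exactly; on DNS spectra one would restrict `k_eta` to the quoted window 0.5 ≲ kη ≲ 1.5 before fitting.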

  • Aoyama, T, Ishihara, T, Kaneda, Y, Yokokawa, M, Itakura, K, Un, A

    The statistics of energy transfer are studied by using the data of a series of high-resolution direct numerical simulations of incompressible homogeneous turbulence in a periodic box, with the Taylor micro-scale Reynolds number R_λ and number of grid points up to approximately 1130 and 4096³, respectively. The data show that the energy transfer T across the wavenumber k is highly intermittent, and the skewness S and flatness F of T increase with k approximately as S ∝ (kL)^{α_S}, F ∝ (kL)^{α_F} in the inertial subrange, where α_S ≈ 2/3, α_F ≈ 1 and L is the characteristic length scale of the energy-containing eddies. The comparison between the statistics of T, the energy dissipation rate ε and its average ε_r over a domain of scale r shows that T is less intermittent than ε, while there is a certain similarity between the probability distribution functions of T and ε_r.

    PHYSICAL SOC JAPAN, 2005, Journal of the Physical Society of Japan, 74 (12), 3202 - 3212, English

    [Refereed]

  • Performance Evaluation of AIST Supercluster P-32 by Linpack

    SAMUKAWA HIKARU, FUJIMOTO YASUSHI, TATEBE OSAMU, KODAMA YUETSU, YOKOKAWA MITSUO, KUDOH TOMOHIRO, SEKIGUCHI SATOSHI

    The AIST supercluster installed at the Grid Technology Research Center consists of three systems: P-32, M-64, and F-32. We ran the Linpack benchmark on the P-32 cluster, the largest of the three. The measured performance of 6.155 Tflop/s corresponds to 75% of the theoretical peak performance. This paper reports how an appropriate combination of parameters for the HPL program was found efficiently, based on an analysis of the computation and communication time behavior of HPL for large-scale problems, and then describes the effects of the NUMA capability of the Linux kernel on Linpack performance, as revealed through the benchmark.

    Information Processing Society of Japan (IPSJ), 30 Jul. 2004, IPSJ SIG Notes, 99 (81(HPC-99)), 163 - 168, Japanese

  • Earth Simulator:A Hardware Overview of the Earth Simulator

    HABATA Shinichi, YOKOKAWA Mitsuo, KITAWAKI Shigemune

    Information Processing Society of Japan (IPSJ), 15 Feb. 2004, IPSJ Magazine, 45 (2), 116 - 121, Japanese

  • Yokokawa Mitsuo

    The Japan Society for Industrial and Applied Mathematics, 2004, Bulletin of the Japan Society for Industrial and Applied Mathematics, 14 (4), 395 - 395, Japanese

  • High-resolution direct numerical simulation of turbulence - Spectra of fourth-order velocity moments

    Kaneda, Y, Ishihara, T, Yokokawa, M, Itakura, K, Uno, A, Smits, AJ

    High-resolution direct numerical simulations (DNSs) of incompressible turbulence based on an alias-free spectral method were performed on the Earth Simulator. Statistics of turbulence are studied by a DNS on 1024³ grid points, with special emphasis on the spectra of moments fourth order in the velocity. A brief review is given of some results of the preliminary analysis of the data of DNSs with up to 2048³ grid points.

    SPRINGER, 2004, Iutam Symposium on Reynolds Number Scaling in Turbulent Flow, 74, 155 - 162, English

    [Refereed]

  • Itakura, K, Uno, A, Yokokawa, M, Ishihara, T, Kaneda, Y

    The Earth Simulator (ES) is an SMP cluster system. There are two types of parallel programming models available on the ES. One is a flat programming model, in which a parallel program is implemented with MPI interfaces only, both within an SMP node and among nodes. The other is a hybrid programming model, in which a parallel program uses thread programming within an SMP node and MPI programming among nodes simultaneously. It is generally known to be difficult to obtain the same high level of performance with the hybrid programming model as can be achieved with the flat programming model. In this paper, we have evaluated the scalability of a code for direct numerical simulation of the Navier-Stokes equations on the ES. The hybrid programming model achieves a sustained performance of 346.9 Gflop/s, while the flat programming model achieves 296.4 Gflop/s with 16 PNs of the ES for a DNS problem size of 256³. For small-scale problems, however, the hybrid programming model is less efficient because of microtasking overhead. It is shown that the hybrid programming model has an advantage on the ES for larger problems. (C) 2004 Elsevier B.V. All rights reserved.

    ELSEVIER SCIENCE BV, 2004, Parallel Computing, 30 (12), 1329 - 1343, English

    [Refereed]

  • Construction of a Portal Site for MD Stencil on the Grid

    YAMAMOTO NAOTAKA, SHIMIZU FUTOSHI, YOKOKAWA MITSUO, SEKIGUCHI SATOSHI, KABURAKI HIDEO, HIMENO RYUTARO, KUROKAWA MOTOYOSHI, TAKEI TOSHIFUMI, MATSUMOTO HIDEKI

    Large-scale scientific computations require access to a computer center that has supercomputers or large-scale cluster systems. However, such centers are often unfamiliar environments for executing jobs, which can delay research. A portal site can provide scientific researchers with a friendlier computational environment. In this study, we constructed a portal site for molecular dynamics (MD) simulations with two application components, using the Grid PSE Builder, a framework under development for grid-enabled problem solving environments (Grid PSE). The two components are an MD simulation component using the parallel MD Stencil and an image generator for snapshots of MD simulations. PSE users can choose one of the components and then move to the job submission, job status, and results download pages of the portal. The image generator provides a feature for monitoring a simulation as an animation in the web browser. We confirmed the effectiveness of the portal by applying it to a simulation of the intrinsic transformation of a vacancy dislocation loop in a copper crystal.

    Information Processing Society of Japan (IPSJ), 16 Oct. 2003, IPSJ SIG Notes, 96, 55 - 60, Japanese

  • MPI Performance Evaluation on the Earth Simulator

    UEHARA HITOSHI, TAMURA MASANORI, ITAKURA KEN'ICHI, YOKOKAWA MITSUO

    The Earth Simulator is an ultra high-speed supercomputer which was developed for global environmental change simulations. Achieving high-performance computing on large-scale distributed-memory parallel computers such as the Earth Simulator requires optimizing the communication processing in user applications, and such optimization in turn requires evaluating the performance of the communication methods. On the Earth Simulator, the Message Passing Interface (MPI) is supported as the communication method. We have evaluated the performance of the MPI-1/MPI-2 functions on the Earth Simulator in detail using MBL, which was developed for measuring MPI performance on various parallel computers. The results show that the maximum throughputs of ping-pong communication using MPI_Send are 14.8 GB/s within a node and 11.8 GB/s between two nodes. Latencies of MPI_Send and MPI_Put are 5.58 microseconds and 6.36 microseconds, respectively. When one MPI process is run per node across 512 nodes, the latencies of MPI_Barrier and MPI_Win_fence are 3.25 microseconds and 223.75 microseconds, respectively. We found that MPI on the Earth Simulator has excellent performance.

    Information Processing Society of Japan (IPSJ), 15 Jan. 2003, 情報処理学会論文誌. ハイパフォーマンスコンピューティングシステム, 44 (6), 24 - 34, Japanese

  • Ishihara, T, Kaneda, Y, Yokokawa, M, Itakura, K, Uno, A

    The spectra of quantities quadratic in the velocity field, including the energy dissipation rate ε per unit mass, the enstrophy ω² and the pressure p, were measured using data obtained from direct numerical simulations (DNSs) of incompressible turbulence in a periodic box with up to 2048³ grid points. These simulations were performed on the Earth Simulator computing system. The spectra for ε, ω² and p exhibited a wavenumber range in which the spectra scale with the wavenumber k as ∝ k^{−a}. The exponent a for p was about 1.81, which is in good agreement with the value obtained by assuming the joint probability distribution of the velocity field to be Gaussian, while the values of a for ε and ω² were about 2/3, very different from the Gaussian-approximation values.

    PHYSICAL SOC JAPAN, 2003, Journal of the Physical Society of Japan, 72 (5), 983 - 986, English

    [Refereed]

  • Kaneda, Y, Ishihara, T, Yokokawa, M, Itakura, K, Uno, A

    High-resolution direct numerical simulations (DNSs) of incompressible homogeneous turbulence in a periodic box with up to 4096³ grid points were performed on the Earth Simulator computing system. DNS databases, including the present results, suggest that the normalized mean energy dissipation rate per unit mass tends to a constant, independent of the fluid kinematic viscosity ν as ν → 0. The DNS results also suggest that the energy spectrum in the inertial subrange almost follows the Kolmogorov k^{−5/3} scaling law, where k is the wavenumber, but the exponent is steeper than −5/3 by about 0.1. (C) 2003 American Institute of Physics.

    AMER INST PHYSICS, 2003, Physics of Fluids, 15 (2), L21 - L24, English

    [Refereed]

  • Performance tuning of a CFD code on the earth simulator

    Itakura, K, Uno, A, Yokokawa, M, Saito, M, Ishihara, T, Kaneda, Y

    High-resolution direct numerical simulations (DNSs) of incompressible turbulence with numbers of grid points up to 2048³ have been executed on the Earth Simulator (ES). The DNSs are based on the Fourier spectral method, so that the equation for mass conservation is accurately solved. In DNSs based on the spectral method, most of the computation time is consumed in calculating the three-dimensional (3D) fast Fourier transform (FFT). In this paper, we tuned the 3D-FFT algorithm for the Earth Simulator and achieved a DNS performance of 16.4 Tflops on 2048³ grid points.

    NEC CORP, 2003, Nec Research & Development, 44 (1), 115 - 120, English

    [Refereed]

  • News: World's Largest Turbulence Simulation on the Earth Simulator

    横川 三津夫

    丸善, Oct. 2002, パリティ, 17 (10), 49 - 51, Japanese

  • Scalability Evaluation of Direct Numerical Simulation on Earth Simulator

    UNO ATSUYA, ITAKURA KEN'ICHI, YOKOKAWA MITSUO, ISHIHARA TAKASHI, KANEDA YUKIO

    There are two programming models on the shared-memory architecture. One, called flat programming, uses MPI only; the other, called hybrid programming, uses MPI and shared-memory models simultaneously. In general, it is difficult for hybrid programming to outperform flat programming. In this study, we evaluated the scalability of large-scale direct numerical simulations of the Navier-Stokes equations on the Earth Simulator. As a result, hybrid programming could outperform flat programming on the Earth Simulator. We also discuss tuning strategies for obtaining higher performance on the Earth Simulator.

    Information Processing Society of Japan (IPSJ), 21 Aug. 2002, IPSJ SIG Notes, 91, 55 - 60, Japanese

  • Performance Evaluation of Hybrid Programming on Earth Simulator

    ITAKURA KEN'ICHI, UNO ATSUYA, UEHARA HITOSHI, SAITO MINORU, YOKOKAWA MITSUO

    The Earth Simulator has 640 processor nodes and its peak performance is 40 Tflop/s. Each node has 8 vector processors, each with 8 Gflop/s peak performance, and 16 GByte of shared main memory. There are two programming methods on a node of the Earth Simulator. One, called flat programming, is MPI on the shared-memory architecture; the other, called hybrid programming, is "microtask" processing with automatic parallelization by the compiler. In this study, we have evaluated three basic performance aspects. The first is the calculation time within a node with 8 vector processors, including microtask start and close overhead or MPI barriers. The second is the data transfer time between two nodes with 1-by-1 or 8-by-8 MPI processes. The last is the time for an application combining calculation and data transfer. Finally, we evaluated an application program performing large-scale direct numerical simulations of the Navier-Stokes equations. Most of the calculation time of this application is spent in the three-dimensional FFT. The total running times with 8 nodes (64 APs) are 4 and 30 seconds for the 256³ and 512³ problem sizes, respectively. Since the difference between the two programming models is about one second, hybrid programming achieves almost the same performance as flat programming.

    Information Processing Society of Japan (IPSJ), 27 May 2002, IPSJ SIG Notes, 90, 19 - 24, Japanese

  • Features of the HPF/ES Parallelizing compiler for irregular problems

    MURAI HITOSHI, ANAN NORIHISA, HAYASHI YASUHARU, SUEHIRO KENJI, SEO YOSHIKI, OKUDA HIROSHI, YOKOKAWA MITSUO

    We implemented a feature for irregular problems, called HALO, in the HPF/ES compiler on the Earth Simulator. HALO supports irregular access to and communication of an array, and makes it possible to write efficient parallel programs for irregular problems easily. This paper describes the usage and implementation of HALO and shows its evaluation results on the Earth Simulator. A benchmark program for the finite element method parallelized with HALO ran more than 10 times faster than the version parallelized without HALO on the Earth Simulator.

    Information Processing Society of Japan (IPSJ), 27 May 2002, IPSJ SIG Notes, 90, 61 - 66, Japanese

  • 16.4TFlops direct numerical simulation of turbulence by a Fourier spectral method on the Earth Simulator

    Mitsuo Yokokawa, Ken'ichi Itakura, Atsuya Uno, Takashi Ishihara, Yukio Kaneda

    2002, Proc. IEEE/ACM SC2002 Conf., -, English

    [Refereed]

    Summary national conference

  • 地球シミュレータによる乱流のUltra Simulation

    金田行雄, 石原 卓, 横川三津夫, 板倉憲一, 宇野篤也

    2002, 九大応力研研究集会報告 14ME-S1, 90-97, Japanese

    Technical report

  • 乱流のUltra Simulation

    金田行雄, 石原 卓, 横川三津夫, 板倉憲一, 宇野篤也

    2002, 第6回シミュレーション・サイエンス・シンポジウム及び核融合科学研究所共同研究「大型シミュレーション研究」合同研究会集録NIFS-PROC-52, 24-27, Japanese

    Technical report

  • 地球シミュレータ上の一様等方性乱流シミュレーション

    横川三津夫, 斎藤 実, 石原 卓, 金田行雄

    2002, ハイパフォーマンスコンピューティングと計算科学シンポジウムHPCS2002, 125-131, Japanese

    Technical report

  • High Speed Calculation for Solid Molecular Dynamics on Vector Processors

    ITAKURA KEN'ICHI, YOKOKAWA MITSUO, SHIMIZU DAISHI, KIMIZUKA HAJIME, KABURAKI HIDEO

    The Earth Simulator, which is under development, has 640 processor nodes and a peak performance of 40 Tflop/s. Each node has 8 vector processors, each with a peak performance of 8 Gflop/s, and 16 GB of shared main memory. In this study, we evaluated the performance of a solid molecular dynamics simulation on an SMP node of the Earth Simulator. In molecular dynamics simulation, each particle is influenced by all particles within a cut-off region, and these pairs of particles are represented by a matrix. Two matrix representations, the compressed row form and the jagged diagonal form, are considered for vectorization. For the force calculation over all pairs, the jagged diagonal form performs better than the compressed row form on a vector processor because its vector length is longer. However, the computational cost of converting the normal matrix form to the jagged diagonal form is quite high, so the total performance with the jagged diagonal form is low. The speedup by parallelization with the compressed row form is 2.4 to 2.7 with 8 vector processors.

    Information Processing Society of Japan (IPSJ), 26 Oct. 2001, IPSJ SIG Notes, 88, 67 - 72, Japanese
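    The jagged diagonal form discussed in the abstract can be sketched in a few lines. This is an illustrative sketch, not the paper's code (the function names are hypothetical): rows are permuted by descending number of nonzeros, and the k-th nonzero of every row is gathered into one "jagged diagonal", so the innermost loop of a matrix-vector product runs over up to all rows at once instead of over one short row — the long-vector property the abstract attributes to this format.

```python
# Sketch of the jagged diagonal (JAD) sparse format and a matrix-vector
# product over it. Hypothetical helper names, not the paper's code.

def to_jagged_diagonal(rows):
    """rows: list of lists of (col, val) pairs, one list per matrix row.
    Returns (perm, jdiags): perm orders rows by descending nonzero count,
    and jdiags[k] is the k-th jagged diagonal as (row, col, val) triples."""
    perm = sorted(range(len(rows)), key=lambda r: -len(rows[r]))
    maxlen = len(rows[perm[0]]) if rows else 0
    jdiags = []
    for k in range(maxlen):
        diag = [(r, rows[r][k][0], rows[r][k][1])
                for r in perm if k < len(rows[r])]
        jdiags.append(diag)
    return perm, jdiags

def jad_matvec(n, jdiags, x):
    """y = A x, looping over jagged diagonals; each diagonal is one
    long inner loop, well suited to a vector processor."""
    y = [0.0] * n
    for diag in jdiags:
        for r, c, v in diag:
            y[r] += v * x[c]
    return y
```

    For a 3x3 example with row lengths 2, 1 and 3, the first jagged diagonal spans all three rows, whereas a compressed-row (CSR) loop would average only two elements per row — the vector-length advantage the abstract describes, at the price of the conversion cost it also notes.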

  • MBL2 : MPI benchmark program library for MPI-2

    UEHARA Hitoshi, TSUDA Yoshinori, YOKOKAWA Mitsuo

    MPI is one of the major message communication interfaces for application programs. MPI consists of MPI-1, the basic specification, and MPI-2, its extensions. Several benchmark programs for MPI-1 have already been proposed; however, benchmark programs for MPI-2 are few and their measurement coverage is limited. We have developed an MPI benchmark program library for MPI-2 (MBL2), which measures the detailed performance of the MPI-I/O and RMA functions of MPI-2. In this report, we describe MBL2 and the MPI-2 performance data that we measured with it on the VPP5000 and SX-5.

    Information Processing Society of Japan (IPSJ), 25 Jul. 2001, IPSJ SIG Notes, 87, 67 - 72, Japanese

  • Yokokawa Mitsuo

    The Japan Society for Industrial and Applied Mathematics, 2001, Bulletin of the Japan Society for Industrial and Applied Mathematics, 11 (1), 79 - 81, Japanese

  • YOKOKAWA Mitsuo, SAITO Minoru, HAGIWARA Takashi, ISOBE Yoko, JINGUJI Satoshi

    The Earth Simulator is a distributed-memory parallel system consisting of 640 processor nodes connected by a full crossbar network. Each processor node is a shared-memory system composed of eight vector processors. The total peak performance and main memory capacity are 40 Tflops and 10 TB, respectively. A performance prediction system, GS3, has been developed for the Earth Simulator to estimate the sustained performance of programs. To validate the accuracy of the vector performance predicted by GS3, the processing times estimated by GS3 for three groups of kernel loops were compared with those measured on an SX-4. The absolute relative errors of the processing time average 0.89%, 1.42% and 6.81% for the three groups. The sustained performance of the three groups on a processor of the Earth Simulator, as estimated by GS3, averages 5.94 Gflops, 3.76 Gflops and 2.17 Gflops.

    JAPAN SOCIETY FOR COMPUTATIONAL ENGINEERING AND SCIENCE, 2001, Transactions of the Japan Society for Computational Engineering and Science, 2001 (0), 20010040 - 20010040, Japanese

  • Earth Simulator Project: Visualizing an Aspect of the Future of the Earth by a Supercomputer

    Mitsuo YOKOKAWA, Keiji TANI

    As part of a project promoting research on the prediction of global environmental change through the trinity of process (basic science) research, observation, and computer simulation, the Science and Technology Agency is developing the ultra-high-speed parallel computer "Earth Simulator" for atmospheric general circulation simulation. This article outlines the hardware, system software, and application software of the Earth Simulator.

    Information Processing Society of Japan (IPSJ), 15 Apr. 2000, IPSJ Magazine, 41 (4), 369 - 374, Japanese

  • Earth Simulator Project: Seeking a Guide Line for the Symbiosis between the Earth (Gaia) and Human Beings

    Keiji TANI, Mitsuo YOKOKAWA

    As part of a project promoting research on the prediction of global environmental change, the Science and Technology Agency is developing the ultra-high-speed parallel computer "Earth Simulator" for atmospheric general circulation simulation. This article explains the need for the Earth Simulator, its target applications, the requirements it must satisfy as a computer, the development schedule, and its position among high-performance computer development programs worldwide.

    Information Processing Society of Japan (IPSJ), 15 Mar. 2000, IPSJ Magazine, 41 (3), 249 - 254, Japanese

  • An Evaluation of HPF Implementation on Cenju-4

    TAKAHASHI MASAKI, SUEHIRO KENJI, SEO YOSHIKI, YOKOKAWA MITSUO

    High Performance Fortran (HPF) is considered one of the major parallel programming interfaces, along with the Message Passing Interface (MPI). HPF is a high-level data parallel language designed to provide a clear and easily understood programming interface. Users can parallelize their sequential programs mainly by inserting directives specifying data mapping onto distributed memories. We plan to adopt HPF as a common parallel programming interface on the Earth Simulator, a distributed-memory parallel system under development that mainly targets earth science. In this paper, we evaluate the efficiency and expressiveness of HPF by parallelizing two application programs originally developed for sequential execution. The evaluation results show that users can obtain good scalability with HPF programming at relatively small effort.

    Information Processing Society of Japan (IPSJ), 03 Dec. 1999, IPSJ SIG Notes, 79, 49 - 54, Japanese

  • Development of Performance Estimation System for the Earth Simulator

    YOKOKAWA MITSUO, SHINGU SATORU, HAGIWARA TAKASHI, ISOBE YOKO, TAKAHASHI MASAKI, KAWAI SHINICHI, TANI KEIJI, MIYOSHI HAJIME

    The Earth Simulator is a distributed-memory parallel system which consists of 640 processor nodes connected by a crossbar network. Each processor node is a shared-memory system composed of eight vector processors. The total peak performance and main memory capacity are 40 Tflop/s and 10 TB, respectively. A software simulator (GSSS) for the Earth Simulator and similar computers has been developed to estimate the sustained performance of programs. To validate the accuracy of the software simulator, the processing times for some kernel DO loops estimated by the GSSS are compared with those measured on an SX-4. The absolute relative error of the processing time is about 1% on average. The sustained performance of the kernel loops on the Earth Simulator has been estimated by the GSSS, and a performance of 4.18 Gflop/s on average is obtained.

    Information Processing Society of Japan (IPSJ), 04 Mar. 1999, IPSJ SIG Notes, 132 (21), 55 - 60, Japanese

  • Development of Parallel Libraries

    SHIMIZU Futoshi, SASAKI Makoto, ICHIHARA Kiyoshi, KISHIDA Norio, SUZUKI Soichiro, SATO Shigeru, TANAKA Yasuhisa, YOKOKAWA Mitsuo, KABURAKI Hideo

    In recent years, since the speed of single processor elements is approaching its upper limit, parallelism is one of the solutions for massive numerical simulations, and efficient, portable parallel libraries are needed. Using MPI and PVM, which are not limited to a particular architecture, we are developing parallel subroutine libraries for use on parallel vector processors. We report here the development of a parallel subroutine library for the eigenvalue problem of a real symmetric matrix based on the Householder transformation and the bisection method. The matrix is partitioned into columns by a column-wise cyclic decomposition scheme, and all elements of the symmetric matrix are stored in order to reduce data exchanges among processors. For the Householder transformation using eight processors on the Paragon, a speedup ratio of 6.0 has been achieved for a matrix of 2000×2000 elements. For a matrix of 4000×4000 elements, the ratio is 4.2 on the VPP300.

    Information Processing Society of Japan (IPSJ), 28 Aug. 1996, IPSJ SIG Notes, 62, 129 - 134, Japanese
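    The column-wise cyclic decomposition described in the abstract is simple to state in code. The following is an illustrative sketch with hypothetical helper names, not the library's implementation: column j is owned by processor j mod p, so the shrinking trailing submatrix of the Householder reduction stays spread over all processors rather than draining the low-numbered ones.

```python
# Sketch of column-wise cyclic decomposition of an n-column matrix over
# p processors. Hypothetical helpers, not the library's code.

def owner(j, p):
    """Processor that owns column j under cyclic distribution."""
    return j % p

def my_columns(rank, p, n):
    """All columns owned by this processor."""
    return list(range(rank, n, p))

def my_active_columns(rank, p, n, k):
    """Columns this processor still owns in the trailing submatrix at
    Householder step k (columns j >= k remain active)."""
    return [j for j in range(rank, n, p) if j >= k]
```

    With a contiguous block distribution, processors owning the leading columns would fall idle as the reduction proceeds; under the cyclic scheme the active column counts per processor differ by at most one at every step.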

  • Parallelization of a Numerical Simulation Code on Isotropic Turbulence

    SATO Shigeru, YOKOKAWA Mitsuo, WATANABE Tadashi, KABURAKI Hideo

    日本計算工学会, 29 May 1996, Proceedings of the conference on computational engineering and science, 1 (1), 97 - 100, Japanese

  • PARALLELIZATION OF LATTICE BOLTZMANN CODES

    SUZUKI Soichiro, YOKOKAWA Mitsuo, KABURAKI Hideo

    日本計算工学会, 29 May 1996, Proceedings of the conference on computational engineering and science, 1 (1), 101 - 104, Japanese

  • VECTORIZATION AND VECTOR PARALLELIZATION OF MD SIMULATION ON VPP500

    TANAKA Yasuhisa, YOKOKAWA Mitsuo, KABURAKI Hideo

    日本計算工学会, 29 May 1996, Proceedings of the conference on computational engineering and science, 1 (1), 105 - 108, Japanese

  • DSMC ANALYSIS FOR THE RAYLEIGH-BENARD FLOW BY PARALLEL COMPUTING

    KISHIDA Norio, YOKOKAWA Mitsuo, WATANABE Tadashi, KABURAKI Hideo

    日本計算工学会, 29 May 1996, Proceedings of the conference on computational engineering and science, 1 (1), 117 - 120, Japanese

  • PARALLELIZATION OF PRESSURE EQUATION SOLVER FOR INCOMPRESSIBLE N-S EQUATION

    ICHIHARA Kiyoshi, YOKOKAWA Mitsuo, KABURAKI Hideo

    日本計算工学会, 29 May 1996, Proceedings of the conference on computational engineering and science, 1 (1), 377 - 380, Japanese

  • YOKOKAWA Mitsuo

    The successive overrelaxation method, also called the SOR method, is one of the iterative methods for solving a linear system of equations, and has been used in many programs. With the advent of vector processors, the SOR method is executed efficiently in parallel with the red-black or hyperplane ordering on vector processors. In this paper, a parallel scheme termed the 4-color SOR method is revised and compared with the natural and red-black SOR methods on a multiprocessor system. The 4-color SOR method has the highest parallel performance of the three. The parallel-vector calculation of the 4-color method with 4 processors is about 10 times faster than the scalar calculation with one processor.

    The Japan Society of Mechanical Engineers, 1990, TRANSACTIONS OF THE JAPAN SOCIETY OF MECHANICAL ENGINEERS Series B, 56 (524), 1062 - 1065, Japanese
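    The multi-color orderings compared in this paper share one idea: points of one color depend only on points of other colors, so each colored sweep is fully parallel. For the common 5-point Laplacian the two-color (red-black) case suffices and can be sketched as follows — an illustrative sketch for a 2D Poisson model problem, not the paper's code, and the function name is hypothetical.

```python
# Red-black ordered SOR for -Laplacian(u) = f on the unit square,
# 5-point stencil, homogeneous Dirichlet boundary, n x n interior points.
# Red points (i+j even) depend only on black points (i+j odd) and vice
# versa, so each half-sweep is order-independent and parallelizable.

def sor_red_black(n, f, omega=1.5, iters=200):
    h = 1.0 / (n + 1)
    u = [[0.0] * (n + 2) for _ in range(n + 2)]  # includes boundary zeros
    for _ in range(iters):
        for color in (0, 1):                     # red sweep, then black sweep
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if (i + j) % 2 != color:
                        continue
                    # Gauss-Seidel value from the four neighbors
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1]
                                 + h * h * f(i * h, j * h))
                    u[i][j] += omega * (gs - u[i][j])   # overrelaxation
    return u
```

    A 4-color variant assigns one of four colors per point so that wider stencils (where red-black is not a valid split) still decouple; the sweep structure is the same with four half-sweeps instead of two.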

  • KUNUGI Tomoaki, YOKOKAWA Mitsuo

    Many benchmark problems for the numerical analysis of fluid flow have been proposed. In this paper, a lid-driven cavity flow, one of the most famous benchmark problems, is examined. Four numerical schemes are compared with each other in terms of grid dependency and accuracy of the solutions. From the viewpoint of the interaction between neighboring computational cells, we choose the 'CONDIF: Controlled Numerical Diffusion with Internal Feedback' approach developed by Runchal and check this scheme on the cavity flow problem. Consequently, it is found that the CONDIF approach is very stable with respect to the grid Peclet number condition and that the time step is not sensitive to the Courant condition.

    The Japan Society of Mechanical Engineers, 1989, TRANSACTIONS OF THE JAPAN SOCIETY OF MECHANICAL ENGINEERS Series B, 55 (515), 1823 - 1828, Japanese

  • KUNUGI Tomoaki, YOKOKAWA Mitsuo, SUMIYOSHI Makoto, AKIYAMA Mitsunobu, NAKANISHI Jun-ichi

    Three-dimensional laminar flows in various curved pipes (i.e., Rc/a=2 and 9 for 180° curved pipes and Rc/a=2 for 90° curved pipes) are numerically simulated using the full Navier-Stokes equations. The boundary-fitted coordinate system is used in order to treat complicated boundary configurations such as a strongly curved pipe. The obtained flow patterns and the magnitude of the secondary flow in the case of Rc/a=9 for a 180° curved pipe are in good agreement with the experimental results. The elliptic nature of the basic equations is very important in simulating these recirculating flows, especially in the case of small curvature. Two separation regions, one occurring on the outside wall near the inlet and the other on the inside wall at the outlet, are found in the case of Rc/a=2. The crossover point between the shear stress maxima on the inside and outside of the pipe is independent not only of the curvature and the bending angle of the pipe but also of the initial profile in the straight pipe.

    The Japan Society of Mechanical Engineers, 1989, TRANSACTIONS OF THE JAPAN SOCIETY OF MECHANICAL ENGINEERS Series B, 55 (518), 3011 - 3018, Japanese

  • KUNUGI Tomoaki, YOKOKAWA Mitsuo, AKIYAMA Mitsunobu, NAKANISHI Jun-ichi

    Three-dimensional laminar flows in various curved pipes(i.e., Rc/a=2, 4, 7 and 9 for curved 180 degree pipes and Rc/a=2 and 4 for curved 90 degree pipes) were numerically simulated by using time-dependent incompressible Navier-Stokes equations. The boundary-fitted coordinate system was used in order to treat the complicated boundary configurations of strongly curved pipes. The obtained flow patterns and the magnitude of the secondary flow in case of Rc/a=9 for curved 180 degree pipe were in good agreement with the experimental results. The ellipticity of the basic equation was very important to simulate these recirculating flows, especially when the curvature ratio was small. Velocity and pressure fields were visualized by a pseudo-color technique on a workstation.

    THE VISUALIZATION SOCIETY OF JAPAN, 1988, JOURNAL OF THE FLOW VISUALIZATION SOCIETY OF JAPAN, 8 (30), 201 - 204, Japanese

  • ISHIKAWA Hirohiko, YOKOKAWA Mitsuo, ASAI Kiyoshi

    This paper describes the reduction of the computation time for the large sparse linear systems obtained by discretizing a three-dimensional Poisson equation with the finite difference method. The equation arises in wind field calculations, which are needed for evaluating the environmental consequences of radioactive effluents.
    Various iterative methods, such as the ICCG, MICCG, ILUCR and MILUCR methods, are applied to solving the linear systems and are compared with the SOR method. The optimum value of the acceleration factor of the SOR method can be obtained numerically according to the atmospheric stability of each nuclear site, and the number of iterations is minimized by using this optimum value.
    The computation time of the MICCG and MILUCR methods is half that of the SOR method. The ILUCR method is also better than the SOR method, because it needs no acceleration factor and its computation time is shorter. The use of a vector computer drastically reduces the computation time, and all the iterative methods are then applicable. On a scalar computer, however, the MILUCR and MICCG methods are preferable because they halve the computation time of the SOR method.

    Atomic Energy Society of Japan, 1987, Journal of the Atomic Energy Society of Japan / Atomic Energy Society of Japan, 29 (2), 158 - 163, Japanese

  • A Readily Available Problem Description Language for Mathematical Programming and Its Processing System

    FUJII MINORU, SAITO HIROKAZU, YOKOKAWA MITSUO, SATO OSAMU, YASUKAWA SHIGERU

    Thanks to faster computers and improved algorithms, linear mathematical programming problems such as linear programs and mixed-integer programs can now be solved in little computation time, except for some very large problems. However, because no language has been available for describing mathematical programming problems simply, many users spend a great deal of time preparing input data for the computer. We have therefore developed PDL/MP, a very simple, scientific-computation-style language for describing mathematical programming problems. Because problems can be written in a form close to mathematical notation, the language can be learned, and problems described and modified, in a very short time. The PDL/MP processing system interprets problems written in PDL/MP and automatically generates input data for the MPS-family software widely used around the world. This paper describes the outline of PDL/MP, its processing system, and application examples.

    Information Processing Society of Japan (IPSJ), 15 Sep. 1986, IPSJ Journal, 27 (9), 880 - 891, Japanese

  • An Optimal Design of a Computer Complex in an Area Including Vector Computers

    FUJII MINORU, YOKOKAWA MITSUO

    With the recent diversification and growth of computer use, more and more users install multiple computers at a single site. This paper focuses on the configuration design of such a site-wide computer complex and describes a mixed-integer programming model that finds the minimum-cost computer configuration capable of processing a given computational demand. Compared with existing models, it has the following features: (i) vector computers can be treated as candidate machines; (ii) using a two-level objective function, the optimal configuration and the optimal job-load allocation under that configuration are obtained simultaneously; (iii) multiple categories of operation modes and operating hours can be set, allowing various operational constraints to be expressed; and (iv) being a linear model, it can easily be solved with widely used general-purpose mathematical programming software.

    Information Processing Society of Japan (IPSJ), 15 Sep. 1985, IPSJ Journal, 26 (5), 807 - 814, Japanese

Books etc

  • Contemporary High Performance Computing: From Petascale toward Exascale, Volume Two, Chapter 5 "The K Computer"

    YOKOKAWA Mitsuo, SHOJI Fumiyoshi, HASEGAWA Yukihiro

    Others, Taylor & Francis Inc., Apr. 2015, English, The K computer is a distributed-memory supercomputer system with 82,944 compute nodes and 5,184 I/O nodes, jointly developed by RIKEN and Fujitsu as a Japanese national project. Chapter 5 of the book describes the outline of the project, the architecture of the K computer, and the performance of applications., ISBN: 9781498700627

    Scholarly book

  • Feasibility Study of a Future HPC System for Memory Intensive Applications: Conceptual Design of Storage System

    ITAKURA Ken'ichi, YAMASHITA Akihiro, SATAKE Koji, UEHARA Hitoshi, UNO Atsuya, YOKOKAWA Mitsuo

    Others, Springer International Publishing AG, Mar. 2015, English, We started a feasibility study of high-end computing systems as a 2-year national project in 2012 for the exa-scale computing era. In order to realize an exa-scale system, it is extremely important to design a large-scale storage system with high bandwidth. We made a conceptual design of a mass storage system with high-speed I/O technology by studying future I/O requirements from s, ISBN: 9783319106250

    Scholarly book

Presentations

  • Hybrid Computation on Building Responses against Earthquake on a VH and VEs of SX-Aurora TSUBASA

    Mitsuo Yokokawa

    Workshop on Sustained Simulation Performance (WSSP2021), 17 Mar. 2021, English

    Oral presentation

  • 私たちに身近なスパコンを知ろう!

    YOKOKAWA Mitsuo

    スパコンを知る集い in 岐阜, Mar. 2019, Japanese, RIKEN, Gifu, Japan, Supercomputers may seem remote to many people, yet they support our daily lives in very familiar ways: the weather forecasts we see every morning, the design of cars and high-rise buildings, and the development of new drugs. Using familiar examples from everyday life, this talk explains where supercomputers actually play a role and why supercomputing and simulation matter, and introduces the world-leading K computer and the future that next-generation supercomputers will open up., Domestic conference

    Oral presentation

  • 私たちに身近なスパコンを知ろう!

    YOKOKAWA Mitsuo

    スパコンを知る集い in 山口, Jan. 2019, Japanese, RIKEN, Yamaguchi, Japan, Supercomputers may seem remote to many people, yet they support our daily lives in very familiar ways: the weather forecasts we see every morning, the design of cars and high-rise buildings, and the development of new drugs. Using familiar examples from everyday life, this talk explains where supercomputers actually play a role and why supercomputing and simulation matter, and introduces the world-leading K computer and the future that next-generation supercomputers will open up., Domestic conference

    Oral presentation

  • 私たちに身近なスパコンを知ろう!

    YOKOKAWA Mitsuo

    スパコンを知る集い in 水戸, Dec. 2018, Japanese, RIKEN, Mito, Japan, Supercomputers may seem remote to many people, yet they support our daily lives in very familiar ways: the weather forecasts we see every morning, the design of cars and high-rise buildings, and the development of new drugs. Using familiar examples from everyday life, this talk explains where supercomputers actually play a role and why supercomputing and simulation matter, and introduces the world-leading K computer and the future that next-generation supercomputers will open up., Domestic conference

    Oral presentation

  • 私たちに身近なスパコンを知ろう!

    YOKOKAWA Mitsuo

    スパコンを知る集い in 大津, Mar. 2018, Japanese, RIKEN, Otsu, Japan, Supercomputers may seem remote to many people, yet they support our daily lives in very familiar ways: the weather forecasts we see every morning, the design of cars and high-rise buildings, and the development of new drugs. Using familiar examples from everyday life, this talk explains where supercomputers actually play a role and why supercomputing and simulation matter, and introduces the world-leading K computer and the future that next-generation supercomputers will open up., Domestic conference

    Oral presentation

  • 私たちに身近なスパコンを知ろう!

    YOKOKAWA Mitsuo

    スパコンを知る集い in 大分, Mar. 2018, Japanese, RIKEN, Oita, Japan, Supercomputers may seem remote to many people, yet they support our daily lives in very familiar ways: the weather forecasts we see every morning, the design of cars and high-rise buildings, and the development of new drugs. Using familiar examples from everyday life, this talk explains where supercomputers actually play a role and why supercomputing and simulation matter, and introduces the world-leading K computer and the future that next-generation supercomputers will open up., Domestic conference

    Oral presentation

  • Energy-Preserving Parareal Algorithm for the Hamilton Equation

    Ishikawa Ai, Yaguchi Takaharu, Yokokawa Mitsuo

    SIAM Conference on Parallel Processing for Scientific Computing, Mar. 2018, English, International conference

    Nominated symposium

  • 私たちに身近なスパコンを知ろう!

    YOKOKAWA Mitsuo

    スパコンを知る集い in 長野, Dec. 2017, Japanese, RIKEN, Nagano, Japan, Supercomputers may seem remote to many people, yet they support our daily lives in very familiar ways: the weather forecasts we see every morning, the design of cars and high-rise buildings, and the development of new drugs. Using familiar examples from everyday life, this talk explains where supercomputers actually play a role and why supercomputing and simulation matter, and introduces the world-leading K computer and the future that next-generation supercomputers will open up., Domestic conference

    Oral presentation

  • Second-order structure function in high-resolution DNSs of turbulence - Where is the inertial subrange?

    ISHIHARA Takashi, KANEDA Yukio, MORISHITA Koji, YOKOKAWA Mitsuo, UNO Atsuya

    APS Division of Fluid Dynamics (Fall), Nov. 2017, English, American Physical Society, Denver, U.S.A., We report some results of a series of high-resolution direct numerical simulations (DNSs) of forced incompressible isotropic turbulence with up to 12288^3 grid points and Taylor microscale Reynolds number R_lambda up to 2300. The DNSs show that there exists a scale range, approximately at 100 < r/eta < 600 (eta is the Kolmogorov length scale), where the second-order longitudinal velocit, International conference

    Oral presentation

  • A parallel solver for a linear system with symmetric sparse matrix by one-way dissection ordering

    YOKOKAWA Mitsuo, NAKANO Tomoki, FUKAYA Takeshi, YAMAMOTO Yusaku

    Workshop on Sustained Simulation Performance, Oct. 2017, English, HLRS, University of Stuttgart, Stuttgart, Germany, A direct method for solving a linear system of equations is difficult to parallelize due to the recurrences in the computational sequence of the solver. In this talk, parallel computation is applied to a linear system with a symmetric sparse matrix ordered by the one-way dissection method. The performance of thread-based parallelization will be presented., International conference

    Oral presentation

  • Performance of DNS of canonical turbulence and some simulation results on the K computer

    YOKOKAWA Mitsuo, MORISHITA Koji, ISHIHARA Takashi, UNO Atsuya, KANEDA Yukio

    Russian Supercomputing Days 2017, Sep. 2017, English, Supercomputing Consortium of Russian Universities, Moscow, Russia, Large-scale direct numerical simulations (DNSs) of incompressible homogeneous turbulence in a periodic box with up to 12288^3 grid points were carried out on the K computer. The DNS code was parallelized using the Message Passing Interface (MPI) and OpenMP with a two-dimensional domain decomposition. Simulation results and performance will be presented in the talk., International conference

    Oral presentation
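    The two-dimensional (pencil) domain decomposition mentioned in the abstract can be illustrated with a small index-arithmetic helper. This is a sketch with hypothetical names, not the DNS code: the N^3 grid is split over a Pr x Pc process grid so that each MPI rank owns a "pencil" with full extent in one direction and near-equal blocks in the other two.

```python
# Sketch of a 2D (pencil) decomposition of an n^3 grid over pr*pc ranks.
# Hypothetical helpers, not the DNS code.

def block_range(n, nparts, part):
    """Indices [lo, hi) of the part-th of nparts near-equal blocks of n.
    The first (n mod nparts) blocks get one extra point."""
    base, rem = divmod(n, nparts)
    lo = part * base + min(part, rem)
    hi = lo + base + (1 if part < rem else 0)
    return lo, hi

def pencil(rank, pr, pc, n):
    """Local (y-range, z-range) of this rank; x extent is the full grid.
    Ranks are laid out row-major on the pr x pc process grid."""
    r, c = divmod(rank, pc)
    return block_range(n, pr, r), block_range(n, pc, c)
```

    A transpose between x-, y- and z-pencils (all-to-all along one process-grid axis) is what lets each 1D FFT of the 3D transform see its direction contiguously; the helper above only fixes who owns which block.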

  • 身近な現象をコンピュータで見てみよう

    YOKOKAWA Mitsuo

    兵庫県数学・理科甲子園ジュニア2017, Aug. 2017, Japanese, Hyogo Prefectural Board of Education, Kobe, Japan, "What is a supercomputer?" "How is it connected to our daily lives?" This talk explains the mechanism and applications of the supercomputer K, which has been widely reported on television and in newspapers., Domestic conference

    Oral presentation

  • スパコンって何だろう? 何が出来るんだろう?

    YOKOKAWA Mitsuo

    理化学研究所百年記念講演会, Jul. 2017, Japanese, Bando Kobe Youth Science Museum, RIKEN, Kobe, Japan, Daily weather forecasts, the development of drugs effective against influenza, research on the beginning of the universe... did you know that supercomputers are used for all of these? Supercomputers are in fact very familiar things, hard at work where we do not notice them. So how does a supercomputer differ from the PC at home or the smartphone everyone carries? And how can weather forecasting and drug development be done on a supercomputer? This talk unravels supercomputing, drawing on the story of the K computer developed by RIKEN and Fujitsu., Domestic conference

    Oral presentation

  • 離散偏導関数法と数値積分の併用

    南部 匡範, 谷口 隆晴, YOKOKAWA MITSUO

    第46回数値解析シンポジウム, 2017, Japanese, Domestic conference

    Oral presentation

  • Discrete partial derivative method with numerical integrations

    Nanbu Masanori, Yaguchi Takaharu, Yokokawa Mitsuo

    the International Conference on Scientific Computation And Differential Equations 2017 (SciCADE 2017), 2017, English, International conference

    Oral presentation

  • Performance Study on Two-Path Aliasing-Free Calculation of a Spectral DNS Code

    YOKOKAWA Mitsuo, MORISHITA Koji, UNO Atsuya, ISHIHARA Takashi, KANEDA Yukio

    The 23rd Workshop on Sustained Simulation Performance, Mar. 2016, English, Tohoku University, Sendai, Japan, The two-path aliasing-free calculation of a spectral DNS code is investigated on two parallel computers, the FX10 and SX-ACE., International conference

    Oral presentation

  • Energy spectrum in high Reynolds number turbulence - high resolution DNS results

    Koji Morishita, Takashi Ishihara, Yukio Kaneda, Mitsuo Yokokawa, Atsuya Uno

    68th Annual Meeting of the APS Division of Fluid Dynamics, Nov. 2015, English, Boston, USA, International conference

    Oral presentation

  • 大規模直接数値シミュレーションによる乱流のエネルギースペクトル

    石原 卓, 森下 浩二, 横川 三津夫, 宇野 篤也, 金田 行雄

    日本物理学会2015年秋季大会, Sep. 2015, Japanese, 吹田, Domestic conference

    Oral presentation

  • Performance Evaluation of an Iterative Method for Multiple Vectors Associated with a Large-Scale Sparse Matrix

    IMAMURA Seigo, ONO Kenji, YOKOKAWA Mitsuo

    27th International Conference on Parallel Computational Fluid Dynamics (Parallel CFD 2015), May 2015, English, McGill University, Montreal, Canada, This paper reports a high performance iterative method for multiple solution vectors associated with a common sparse coefficient matrix in stencil computation., International conference

    Oral presentation
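    The idea of amortizing one sparse matrix over many solution vectors can be sketched with a toy Jacobi iteration. This is illustrative only, not the paper's method or code (names hypothetical): each matrix row is loaded once per sweep and applied to all m right-hand sides, which raises the flop-to-byte ratio compared with m separate solves of the same system.

```python
# Jacobi iteration for A x = b with m right-hand sides sharing one sparse
# matrix A (constant diagonal + off-diagonal entries per row).
# Illustrative sketch; hypothetical helper, not the paper's code.

def jacobi_multi(diag, offdiag, b_multi, iters=80):
    """diag: the (constant) diagonal value of A.
    offdiag[i]: list of (j, a_ij) off-diagonal entries of row i.
    b_multi: list of m right-hand-side vectors. Returns m solutions."""
    n, m = len(offdiag), len(b_multi)
    x = [[0.0] * n for _ in range(m)]
    for _ in range(iters):
        xn = [[0.0] * n for _ in range(m)]
        for i in range(n):
            row = offdiag[i]          # row loaded once, reused for all m RHS
            for k in range(m):
                s = b_multi[k][i]
                for j, a in row:
                    s -= a * x[k][j]
                xn[k][i] = s / diag
        x = xn
    return x
```

    In stencil computations the matrix is implicit in the stencil coefficients, so the same reuse argument applies to the grid data rather than to stored matrix entries.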

  • High-efficiency direct numerical simulation of turbulence by a fourier spectral method on the K computer

    Koji Morishita, Mitsuo Yokokawa, Atsuya Uno, Takashi Ishihara, Yukio Kaneda

    27th International Conference on Parallel Computational Fluid Dynamics, May 2015, English, Montreal, CANADA, International conference

    Oral presentation

  • Direct numerical simulation of high reynolds number turbulence by the K computer

    T. Ishihara, K. Morishita, M. Yokokawa, A. Uno, Y. Kaneda

    JAPAN-RUSSIA WORKSHOP ON SUPERCOMPUTER MODELING, INSTABILITY AND TURBULENCE IN FLUID DYNAMICS(JR SMIT2015), Mar. 2015, English, Moscow, Russia, International conference

    Oral presentation

  • Energy Spectra of Higher Reynolds Number Turbulence by the DNS with up to 12288³ Grid Points

    Takashi Ishihara, Yukio Kaneda, Koji Morishita, Mitsuo Yokokawa

    67th Annual Meeting of the APS Division of Fluid Dynamics, Nov. 2014, English, San Francisco, USA, International conference

    Oral presentation

  • スーパーコンピュータ開発の経験から

    YOKOKAWA Mitsuo

    低温工学・超伝導学会第2回関西支部講演会, Jul. 2014, Japanese, Cryogenics and Superconductivity Society of Japan, Kansai Branch, Kobe, Computational science, which seeks to elucidate various phenomena through computer simulation, has become a powerful research approach standing alongside theory and experiment as the third methodology of science, and the supercomputer as its fundamental tool is becoming ever more important. This talk presents the development of two supercomputers built as national projects: the Earth Simulator and the K computer., Domestic conference

    Oral presentation

  • スーパーコンピュータ「京」の開発プロジェクトを終えて

    YOKOKAWA Mitsuo

    電子情報通信学会EMCJ/IEE-EMC講演会, Jun. 2014, Japanese, IEICE, Kobe, Computational science, which seeks to elucidate various phenomena through computer simulation, has become a powerful research approach standing alongside theory and experiment as the third methodology of science, and the supercomputer as its fundamental tool is becoming ever more important. This talk describes the development history and significance of the K computer, which was developed as a national project and was the first supercomputer in the world to exceed a LINPACK performance of 10 PFLOPS., Domestic conference

    [Invited]

    Invited oral presentation

  • スーパーコンピュータ「京」開発の軌跡とその意義

    YOKOKAWA MITSUO

    電気学会関西支部講演会, Mar. 2014, Japanese, IEEJ Kansai Branch, Kobe, Computational science, which seeks to elucidate various phenomena through computer simulation, has become a powerful research approach standing alongside theory and experiment as the third methodology of science, and the supercomputer as its fundamental tool is becoming ever more important. This talk describes the development history and significance of the K computer, which was developed as a national project and was the first supercomputer in the world to exceed a LINPACK performance of 10 PFLOPS., Domestic conference

    Public discourse

  • 「京」コンピュータを用いたカノニカル乱流の大規模直接数値シミュレーション

    ISHIHARA TAKASHI, MORISHITA KOJI, YOKOKAWA MITSUO, UNO ATSUYA, KANEDA YUKIO

    第27回数値流体力学シンポジウム, Dec. 2013, Japanese, 名古屋市, Domestic conference

    Oral presentation

  • Experiences of the Development of Supercomputers - Earth Simulator and K computer -

    YOKOKAWA MITSUO

    The 24th Magnetic Recording Conference (TMRC 2013), Aug. 2013, English, IEEE, Tokyo, The Earth Simulator was a distributed-memory parallel supercomputer using vector processors. The development project was started in 1997 and completed in 2002. The system was used to promote research into global climate change forecasts using computer simulations. The target performance was at least 5 teraflop/s for an atmospheric general circulation, International conference

    Keynote oral presentation

  • スーパーコンピュータ「京」とその応用について

    YOKOKAWA MITSUO

    第57回システム制御情報学会研究発表講演会, May 2013, Japanese, ISCIE, Kobe, The K computer, the core of the MEXT program for building the innovative High Performance Computing Infrastructure (HPCI), was developed jointly by RIKEN and Fujitsu and is a large-scale distributed parallel computer consisting of 88,128 nodes. In November 2011 it was the first in the world to achieve a LINPACK performance of 10 petaflops. It was completed in June 2012 and has been open for shared public use since September 2012. This talk outlines the development of the K computer and introduces its applications., Domestic conference

    Invited oral presentation

  • 世界一を達成したスーパーコンピュータ「京」

    YOKOKAWA MITSUO

    平成25年度産学連携シンポジウム, Apr. 2013, Japanese, Society of Project Management, Kansai Branch, Kobe, The supercomputer K, positioned as a key national technology in the Third Science and Technology Basic Plan and developed jointly by RIKEN and Fujitsu, was completed in June 2012, and shared public use began that September. K is a distributed-memory parallel computer system consisting of 82,944 compute nodes. Each compute node comprises one CPU (SPARC64 VIIIfx), 16 gigabytes of memory, and an interconnect LSI (ICC), and the inter-node connection (the Tofu interconnect) is a newly developed six-dimensional mesh/torus direct network. With a theoretical performance of 10.6 petaflops, K was the first in the world to achieve a LINPACK performance of 10 petaflops (ten quadrillion floating-point operations per second) and took first place in the TOP500 supercomputer ranking, Domestic conference

    Public discourse

  • The K Computer – Toward Its Productive Applications to Our Life

    YOKOKAWA MITSUO

    International Conference for High Performance Computing, Networking, Storage and Analysis (SC’12), Nov. 2012, English, ACM, IEEE, Salt Lake City, USA, No one doubts that computer simulations are now an indispensable technique for elucidating natural phenomena and designing artificial structures with the help of the growing power of supercomputers. Many countries are committed to having supercomputers as a fundamental tool for their national competitiveness. The Next-Generation Supercomputer Development Project was started in 2006 as a se, International conference

    [Invited]

    Invited oral presentation

Association Memberships

  • The Japan Society of Fluid Mechanics

  • The Japan Society for Industrial and Applied Mathematics

  • Information Processing Society of Japan

Research Projects

  • 横川 三津夫

    日本学術振興会, 科学研究費助成事業 基盤研究(C), 基盤研究(C), 神戸大学, 01 Apr. 2018 - 31 Mar. 2021, Principal investigator

    The aim of this research is to develop and evaluate fast solution methods on multi-core systems for a coupled building-ground seismic response simulation code. To examine the earthquake resistance of buildings, numerical simulations are performed that compute the seismic response of the ground and the buildings standing on it. Because such simulations target complicated domains, the usual approach is to set up the equations of motion at each point of a three-dimensional finite element mesh and to solve the large system of linear equations obtained by discretizing them. The global matrix obtained with the finite element method is sparse and irregular; it has been solved sequentially by a conjugate gradient method with partial preconditioning (the PSCCG method), which combines a partial Cholesky preconditioner exploiting the properties of the building and ground submatrices with a partial scaling preconditioner. The Cholesky part of the preconditioner is a direct method and hence difficult to parallelize, but the other parts can be executed in parallel. This research shortens the computation time of this solver by parallelizing it on multi-core systems. In FY2018, to examine the feasibility of the approach, a diffusion equation on a two-dimensional domain was discretized by the finite difference method as a model problem. By varying the diffusion coefficient from one part of the domain to another, a global matrix containing submatrices with large condition numbers was constructed. In parallel, a process-parallel implementation of the PSCCG computation in the existing time-evolution simulation code for coupled building-ground seismic analysis was carried out, and its parallel performance was evaluated with the number of threads fixed at one while increasing the number of processes.

    Competitive research funding

  • 強スケーリング性能を指向した計算物理向け超並列行列計算ライブラリの開発

    YAMAMOTO Yusaku, YOKOKAWA Mitsuo, HOSHI Takeo

    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B), Grant-in-Aid for Scientific Research (B), The University of Electro-Communications, 01 Apr. 2017 - 31 Mar. 2020

    (1) For the systems of linear equations arising in the electric-potential computation of plasma simulations, we developed a highly parallel incomplete-LU-factorization preconditioner program based on the block red-black ordering method and implemented it on GPUs. Performance evaluation showed an acceleration of more than 10x over execution on a multicore processor. Compared with MAGMA, a solver library for GPUs, a speedup of about 3x was observed for large problems.
    (2) We extended our middleware EigenKernel (https://github.com/eigenkernel/) for massively parallel generalized eigenvalue problems, realizing (i) benchmarking on Oakforest-PACS and (ii) strong-scaling performance prediction (extrapolation) by Markov chain Monte Carlo Bayesian inference. We also proposed a selective eigenpair computation algorithm with excellent massive parallelism, applied it to real problems in electronic structure calculation, demonstrated its usefulness, and released the code (https://github.com/lee-djl/k-ep). As an application study that exploits massively parallel eigenvalue solvers, we computed an organic device material (a pentacene thin film, a 10-nanometer-scale system) and obtained the quasi-localized wave functions directly relevant to device performance.
    (3) For the large systems of linear equations arising in seismic response simulations of buildings, we carried out a process-parallel implementation of the conjugate gradient method preconditioned with partially incomplete Cholesky decomposition and evaluated its parallel performance. In addition, for systems of linear equations whose coefficient matrices are sparse, symmetric, and positive definite, we applied a Cholesky factorization based on the relaxed supernodal multifrontal method, evaluated the computational performance with respect to the parameter that determines what is regarded as a supernode, and obtained guidelines for the optimal relaxation parameter.
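Achievement (1) above relies on red-black ordering to expose parallelism in otherwise sequential triangular sweeps. The sketch below is not the project's block red-black ILU code; it shows the simpler pointwise variant on a 5-point Laplace stencil, where every point of one color depends only on points of the other color, so each color can be updated concurrently.

```python
# Pointwise red-black ordering sketch for a 5-point Laplace stencil.
# Illustrative only: the project uses a *block* red-black ordering inside
# an incomplete-LU preconditioner on GPUs; this Gauss-Seidel sweep merely
# shows why the two-coloring makes each half-sweep fully parallel.

def red_black_sweep(u, f, h):
    """One Gauss-Seidel sweep in red-black order on an n x n interior grid.
    u: (n+2) x (n+2) grid including boundary values; f: n x n source term."""
    n = len(u) - 2
    for color in (0, 1):                       # 0 = red, 1 = black
        # All points of one color are independent -> parallelizable.
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if (i + j) % 2 == color:
                    u[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] +
                                      u[i][j-1] + u[i][j+1] +
                                      h * h * f[i-1][j-1])
    return u

# Tiny example: 3 x 3 interior grid, zero boundary, constant unit source.
n, h = 3, 0.25
u = [[0.0] * (n + 2) for _ in range(n + 2)]
f = [[1.0] * n for _ in range(n)]
for _ in range(200):                           # iterate to convergence
    red_black_sweep(u, f, h)
```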

  • Supporting performance-aware programming with machine learning techniques

    Hiroyuki Takizawa, Kobayashi Hiroaki, Suda Reiji, Okatani Takayuki, Egawa Ryusuke, Ohshima Satoshi

    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B), Grant-in-Aid for Scientific Research (B), Tohoku University, 01 Apr. 2016 - 31 Mar. 2019

    This work demonstrated several case studies of effectively using machine learning techniques to support High-Performance Computing (HPC) programming. Various code optimization problems can be solved by converting them into problems that machine learning has already been proven to solve. The work also clarified the importance of analyzing the target problems before applying machine learning, because a sufficient amount of training data is rarely available for code optimization problems. Like HPC programming, machine learning requires the knowledge and experience of human experts; in machine learning, however, the problem is already parameterized, and hence can be solved if sufficiently high computing performance is available.

  • An extended linear algebra library for electronic structure calculation and its optimization for many-core processors

    Yamamoto Yusaku, IMAMURA TOSHIYUKI, INADOMI YUICHI, Vajtersic Marian

    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B), Grant-in-Aid for Scientific Research (B), The University of Electro-Communications, 01 Apr. 2014 - 31 Mar. 2018

    In this project, we aimed at developing solvers for numerical linear algebra functions used in electronic structure calculations. The main achievements of this project are as follows. (1) We performed an error analysis of the CholeskyQR2 method, which is a promising communication-avoiding algorithm for the QR decomposition, and proved its numerical stability. (2) We developed a parallel linear equation solver based on the one-way dissection for quantum wave dynamics. (3) We developed a generalized eigenvalue solver EigenKernel, which is a hybrid solver that combines three eigenvalue solvers for dense matrices. These results will be useful for accelerating electronic structure calculations on many-core processors.
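The CholeskyQR2 method analyzed in achievement (1) computes a QR decomposition by Cholesky-factorizing the Gram matrix A^T A and then repeating the whole process once on the resulting Q factor, which restores the orthogonality lost in finite precision. A minimal pure-Python sketch, with dense loops standing in for the communication-avoiding parallel implementation:

```python
# CholeskyQR2 sketch in pure Python (illustrative, dense, sequential).
# One CholeskyQR pass: A^T A = L L^T, R = L^T, Q = A R^{-1}.
# Running it twice and merging the R factors yields a numerically
# orthogonal Q, which is the point of the CholeskyQR2 scheme.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(c) for c in zip(*A)]

def cholesky(G):
    """Lower-triangular L with G = L L^T (G symmetric positive definite)."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (s ** 0.5) if i == j else s / L[j][j]
    return L

def cholesky_qr(A):
    """One CholeskyQR pass on a tall-skinny matrix A."""
    R = transpose(cholesky(matmul(transpose(A), A)))
    n = len(R)
    Q = []
    for row in A:                  # each row q solves q R = row
        q = [0.0] * n
        for j in range(n):
            q[j] = (row[j] - sum(q[k] * R[k][j] for k in range(j))) / R[j][j]
        Q.append(q)
    return Q, R

def cholesky_qr2(A):
    """CholeskyQR2: apply CholeskyQR twice and merge the R factors."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, matmul(R2, R1)

# Tall-skinny example.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Q, R = cholesky_qr2(A)
```

In the parallel setting the only reductions needed are the small Gram matrices, which is why the method avoids communication compared with Householder QR.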

  • YOKOKAWA MITSUO, ISHIHARA Takashi

    Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C), Grant-in-Aid for Scientific Research (C), Kobe University, 01 Apr. 2014 - 31 Mar. 2017, Principal investigator

    A parallel direct numerical simulation (DNS) code was developed on the K computer for solving the Navier-Stokes equations in a box with periodic boundary conditions along the three orthogonal axes. The objective of the code is to simulate the behavior of homogeneous, isotropic, incompressible turbulent flow, or canonical turbulent flow, which is the most standard turbulent flow without boundary walls. A pseudo-spectral method is used for discretization in three-dimensional space, and a fourth-order Runge-Kutta method is used for temporal discretization. Hybrid parallelization with both OpenMP and MPI is adopted in the code. A DNS with 12,288^3 grid points was carried out on 37,376 compute nodes of the K computer with a sustained performance of 2.2%. Simulation data at a Reynolds number of about 2300 were obtained.

    Competitive research funding
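The pseudo-spectral method used in the DNS code evaluates spatial derivatives by transforming to Fourier space, multiplying each mode k by ik, and transforming back. A one-dimensional pure-Python sketch with a naive DFT (illustrative only; the actual code uses parallel 3-D FFTs):

```python
# Pseudo-spectral differentiation sketch in pure Python.
# Illustrative only: a naive O(n^2) DFT stands in for the parallel 3-D
# FFTs of the DNS code; the principle (multiply mode k by i*k) is the same.
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n for t in range(n)]

def spectral_derivative(u):
    """du/dx on a periodic grid over [0, 2*pi) via the DFT."""
    n = len(u)
    U = dft(u)
    # Wavenumbers in DFT order: 0, 1, ..., n/2-1, -n/2, ..., -1.
    k = list(range(n // 2)) + [-n // 2] + list(range(-n // 2 + 1, 0))
    dU = [1j * k[m] * U[m] for m in range(n)]
    return [v.real for v in idft(dU)]

# u = sin(x) on 16 periodic points; sin is band-limited, so the
# spectral derivative reproduces cos(x) to near machine precision.
n = 16
xs = [2.0 * math.pi * t / n for t in range(n)]
du = spectral_derivative([math.sin(x) for x in xs])
```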