中村あすか

ナカムラアスカ (Asuka Nakamura)

基本情報

所属: 千葉大学情報戦略機構特任研究員

J-GLOBAL ID: 202401011683187922
researchmap会員ID: R000074770

研究キーワード

研究分野

情報通信 / 高性能計算 /

論文

エージェント間距離の計算回数削減による視野を用いたSFMの高速化

片寄颯人, 中村あすか, 富永浩文, 前川仁孝

情報処理学会論文誌 65(1) 15-22 2024年1月15日査読有り

本論文は，視野を用いたSFM（Social Force Model）を高速化するために，エージェント間距離の計算回数を削減する手法を提案する．視野を用いたSFMは，人を運動方程式に基づいて移動するエージェントとしてモデル化し，その動きを再現する手法である．本手法の運動方程式は，視野範囲をエージェントの進行方向前方に扇形の領域として設定し，エージェントが視野範囲外から受ける力を0とする．このため，本手法では進行方向後方に存在するエージェントに対するエージェント間距離の計算が不要である．そこで，提案手法では，進行方向と視野角の関係性を利用し，エージェント間距離の計算回数を削減する．評価の結果，エージェント間距離の計算回数を削減した手法は，従来手法に対して解析時間が約1.25倍高速化することを確認した． This paper proposes a speedup method by reducing the number of distance calculations between agents for the Social Force Model (SFM) using the field of view. The SFM using the field of view is a method to simulate human behavior based on motion equations. The motion equation of the SFM using the field of view defined the field of view as the area fanning out in front of the agent's moving direction, and it ignores force receiving from outside the field of view. Thus, distance calculations between the agent and agents existing behind the agent are unnecessary in the SFM using the field of view. Therefore, the proposed method reduces distance calculations between agents using the moving direction of the agent and the field of view. As a result of the evaluation, the proposed method gives us about 1.25 times speedup.
Evaluation of Hierarchical Pincers Attack Search on Distributed Memory Systems

Asuka Nakamura, Hirobumi Tominaga, Yoshitaka Maekawa

2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW) 4 323-326 2018年11月査読有り筆頭著者
Evaluation of EMVA Using the Instruction-Level Parallelism on Tegra X1

Hirobumi Tominaga, Asuka Nakamura, Yoshitaka Maekawa

2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW) 70 239-242 2018年11月査読有り
タスクスケジューリング問題における探索ノード数削減によるPDF/IHS法の高速化

松瀨, 弘明, 中村, あすか, 富永, 浩文, 前川, 仁孝

情報処理学会論文誌コンピューティングシステム（ACS） 11(2) 17-26 2018年8月3日査読有り

本論文では，タスクスケジューリング問題を高速に解くために，レディ状態を削減したPDF/IHS（Parallelized Depth First/Implicit Heuristic Search）法の探索ノード数を削減するアルゴリズムを提案する．PDF/IHS法は，階層的挟み撃ち探索を用いた並列探索アルゴリズムであり，実行可能なタスクをPE（Processing Element）に割り当てる組合せをすべて列挙することで分枝限定法の探索木を生成する．このため，問題規模が大きくなるほど探索ノード数が膨大になり，探索ノード数の削減が必要となる．PDF/IHS法の探索ノード数を削減するために，不必要なレディ状態を割当てた部分問題を枝刈りする手法が提案されている．不必要なレディ状態を割り当てた部分問題を枝刈りする法は，割り当てたタスク情報のみを用いて枝刈りするため，探索する必要のない部分問題のすべてを枝刈りできない．そこで本論文では，探索する必要のないすべての部分問題を枝刈りするために，探索済みノード情報を利用する．評価の結果，探索済みノードを利用した枝刈りをすることで，スレッド数2のとき相乗平均約1.39倍高速に求解できることを確認した． This paper proposes a reduction algorithm of branching nodes for the PDF/IHS (Parallelized Depth First/Implicit Heuristic Search) using allocations of idle tasks method for solving task scheduling problems fast. The PDF/IHS is parallel search method using HPAS (Hierarchical Pincers Attack Search). The search method creates the search tree by enumerating all combinations of the allocatable tasks to PE (Processing Element). Therefore, it is necessary to reduce the number of search nodes to solve large-scale problems having many search nodes. We proposed the bounding method of the number of search nodes allocating unnecessary idle tasks to reduce search nodes of PDF/IHS. This method can not cut all of subproblems which do not need to search, because this method cuts subproblems using information of allocated tasks. Therefore, it is necessary to reduce all unnecessary nodes to perform further speedup. The proposed method reduces unnecessary subproblems by using searched subproblems. As a result of evaluation, the speedup ratio of the proposed method with the PDF/IHS is about 1.39 times on geometric average at 2 threads.
レジスタ最適化を用いたCUDAによる格子ボルツマン法の高速化手法

富永, 浩文, 中村, あすか, 前川, 仁孝

情報処理学会論文誌プログラミング（PRO） 11(2) 1-8 2018年6月26日査読有り

本論文では，CUDA（Compute Unifide Device Architecture）を用いた格子ボルツマン法（LBM：Lattice Boltzmann Method）を高速化するために，メモリアクセス遅延を削減する手法を提案する．格子ボルツマン法は，解析領域を格子に分割し離散化されたボルツマン方程式を解く手法である．ボルツマン方程式の計算は，周囲の格子点の情報を参照するため，メモリアクセスコストが高いメモリバウンドな計算である．このため，LBMのメモリアクセスコストを削減する手法の1つとしてテンポラルブロッキングが用いられている．CUDAによるテンポラルブロッキングを用いた格子ボルツマン法は，ブロックに分割した領域をスレッドブロックに割り当て時間発展方程式を計算する．本計算は，メモリアクセスのコストを抑えるが，シェアードメモリにデータを格納することで，シェアードメモリに対する同期処理やレイテンシによるアクセスコストが処理の大部分を占める．そこで，本論文では，メモリアクセスコストが低いレジスタを用いてテンポラルブロッキングを行うことで処理を高速化する手法を提案する．提案手法は，テンポラルブロッキングにおける複数時間ステップの計算をレジスタ上に保持して行うことで処理を高速化する． This paper proposes a speed up method of the Lattice Boltzmann Method (LBM) by improving the memory access delay. The LBM is a method of dividing the analysis area to the grid and solving the discretized Boltzmann equation. Calculation of the Boltzmann equation is a memory bound calculations that mean high memory access cost by referring to the surrounding grid points. For this reason, the temporal blocking method is one of the reducing methods of the memory access cost for the LBM. The LBM using temporal blocking by the CUDA calculates the time evolution equation by assigning divided areas to thread blocks. This calculation reduces the access cost to store the data in the shared memory. However, the synchronization and the memory access latency of shared memory occupy the majority of processing. Therefore, this presentation proposes a speed up method of temporal blocking using the registers whose access cost is low. The proposed method accelerates by storing calculation in the registers over multiple time steps of temporal blocking.
CUDA における JDS 形式疎行列ベクトル積に対するダイナミックパラレリズムの評価

長坂一生, 富永浩文, 中村あすか, 前川仁孝

xSIG (cross-disciplinary workshop on computing Systems, Infrastructures, and programming ) 2018年5月29日査読有り
タスクスケジューリング問題におけるレディ状態の割当て削減によるPDF/IHSの高速化

中村, あすか, 富永, 浩文, 前川, 仁孝

情報処理学会論文誌 58(3) 654-662 2017年3月15日査読有り筆頭著者

本論文は，タスクスケジューリング問題の厳密解法であるPDF/IHS（Parallel Depth First/Implicit Heuristic Search）の探索ノード数を削減するアルゴリズムを提案する．PDF/IHSは，階層的挟み撃ち探索を用いた分枝限定法の並列探索アルゴリズムであり，大規模なタスクスケジューリング問題を高速に解くためには探索ノード数の削減が必要となる．PDF/IHSの分枝操作は，スケジュールが未確定となる時刻に実行可能なタスクの処理またはレディ状態を割り当てる全組合せを部分問題として生成する．このため，不必要なレディ状態が割り当てられた部分問題が生成されることがある．そこで，本論文では，PDF/IHSの探索ノード数を削減するために，レディ状態を割り当てる部分問題のうち，最適解が得られないことが保障できる部分問題を枝刈りする．提案するアルゴリズムは，分枝操作で割り当てられたタスクの処理時間の情報から探索する必要のない部分問題を判定する．評価の結果，提案手法は，PDF/IHSに対して最大約96倍高速に求解できることを確認した． This paper proposes a reduction algorithm of branching nodes for the task scheduling algorithm PDF/IHS (Parallel Depth First/Implicit Heuristic Search). The PDF/IHS is a parallel branch and bound method using HPAS (Hierarchical Pincers Attack Search). For solving large-scale task scheduling problems fast using the PDF/IHS, it is necessary to not only using parallel search algorithms but reducing branching nodes. The branching operation of the PDF/IHS makes some subproblems by enumerating all combinations of the allocatable tasks and idle tasks at the time when the schedule is undecided. For this reason, the PDF/IHS may make subproblems assigned some unnecessary idle tasks. Therefore, for reduction of branching nodes of the PDF/IHS, the proposed method cuts some subproblems assigned idle tasks, whose schedule length is longer than others. The proposed algorithm determines unnecessity of searching subproblems using time of allocated task in the branching operation. As a result of evaluation, the speedup ratio of the proposal method compared with the PDF/IHS is about 96 times on the maximum.
GPUを用いた格子ボルツマン法のループ展開を利用した同期オーバヘッド削減による高速化

鈴木, 宏明, 富永, 浩文, 中村, あすか, 前川, 仁孝

情報処理学会論文誌プログラミング（PRO） 9(1) 11-11 2016年2月26日査読有り

本研究では，GPUを用いた格子ボルツマン法を高速化するために，ループ展開を利用して，GPUの実行効率を向上する手法を提案する．GPUを用いた格子ボルツマン法は，時間ステップごとにカーネルを起動し，複数のスレッドブロックが時間発展方程式を求解することで流体の流れを解析する．スレッドブロックは，メモリアクセスのコストを少なくするために，計算に必要なデータをグローバルメモリからシェアードメモリに格納し，シェアードメモリにあるデータのみ参照して計算する．シェアードメモリとグローバルメモリ間のデータ転送のようなスレッド間の同期処理は，計算時間の大部分を占めるため，オーバヘッドになる．そこで本研究では，同期オーバヘッドを削減するために，ループ展開を用いることで複数ステップ分の計算を1つのカーネルに割り当てる．本手法は，1回の時間ステップでシェアードメモリに格納するデータは多くなるが，グローバルメモリとシェアードメモリ間のデータ転送の回数やスレッド間の同期回数を削減できるため，高速化できる．最後に，提案手法の有効性をGeForce TITAN Blackで評価した．評価の結果，ループ展開した格子ボルツマン法は，ループ展開しない格子ボルツマン法に比べ，最大約3.1倍高速化することを確認した． This study proposes a speedup method by improving an efficiency of calculation of the Lattice Boltzmann Method on GPU using loop transformation. This method creates GPU kernels at each time step and analyzes fluid flow. GPU kernel creates thread blocks and calculates time evolution equations. Thread blocks store a data from global memory to shared memory in order to reduce a cost of memory access. Therefore, this method calculates by referring only the data in shared memory. Synchronization between threads, such as data transfer between global memory and shared memory is overhead in order to occupy the majority of calculations. So, this study assigns calculation of multiple time steps to one GPU kernel using loop transformation in order to reduce synchronization overhead. The proposed method stores more data in shared memory compared with the conventional method in a single time step. However, it is possible to reduce synchronization times of threads and data transfer between global memory and shared memory. Finally, the results of the evaluation using GeForce TITAN Black shows the proposed method gives us about 3.1 times speedup compared with the conventional method.
タスクスケジューリング問題の厳密解求解における探索ノード数削減アルゴリズム

中村, あすか, 前川, 仁孝

情報処理学会論文誌プログラミング（PRO） 7(1) 1-9 2014年1月22日査読有り筆頭著者

本論文は，分枝限定法を用いたタスクスケジューリング問題の厳密解法の1つであるDF/IHS法（Depth First/Implicit Heuristic Search method）の探索ノード数を削減するアルゴリズムを提案する．本手法を用いて大規模なタスクスケジューリング問題を高速に解くために，並列探索アルゴリズムが提案されているが，さらなる高速化を行うためには探索ノード数の削減が必要となる．DF/IHS法の分枝操作は，スケジュールが未確定となる時刻に実行可能なタスクの処理またはレディ状態を割り当てる全組合せを部分問題として生成する．このため，不必要なレディ状態が割り当てられた部分問題が生成されることがある．そこで，本論文では，DF/IHS法の探索ノード数を削減するために，レディ状態を割り当てる部分問題のうち，他の部分問題よりも短いスケジューリング長が得られない問題を探索することなしに判定し，探索木中に生成することを抑制する．提案するアルゴリズムは，同一のPEに割り当てられたタスクを融合することで，部分問題のタスクグラフを再定義する．次に，再定義したタスクグラフを比較することで，探索する必要のない部分問題の生成を中止する．本アルゴリズムは，暫定解や下界値を用いないため，探索の進行状況の影響を受けずに探索ノードを削減することができる． This paper proposes a reduction algorithm of branching nodes for DF/IHS (Depth First/Implicit Heuristic Search method) which is one of the efficient algorithms for solving task scheduling problems using branch and bound method. For solving large-scale task scheduling problems fast using the DF/IHS, it is necessary to not only using parallel search algorithms but reducing branching nodes. The branching operation of the DF/IHS makes some subproblems by enumerating all combinations of the allocatable tasks and idle tasks at the time when the schedule is undecided. For this reason, the DF/IHS may make subproblems assigned some unnecessary idle tasks. Therefore, for reduction of branching nodes of the DF/IHS, the proposed method picks out some subproblems, whose schedule length is longer than others, from among subproblems assigned idle tasks without search. The proposed algorithm redefines task graph of subproblem assigned idle tasks in terms of task-fusion. Then it does not make unnecessary subproblems using the comparison of the redefined task graphs. Since the algorithm does not use upper and lower bounds, it is able to reduce branching nodes without effects of the search order.
探索の重複領域削減による階層的挟み撃ち探索の高速化

中村あすか, 富永浩文, 前川仁孝

情報処理学会論文誌コンピューティングシステム（ACS） 4(4) 76-84 2011年10月5日査読有り筆頭著者

講演・口頭発表等

CUDAを用いた回路解析のためのランダムスパースな連立一次方程式の並列性抽出手法

富永浩文, 佐藤滉斗, 中村あすか, 前川仁孝

電気学会電子回路研究会 2025年3月11日
粒子の影響半径を用いた影響範囲の近似精度向上による解像度可変型MPS法の近傍判定回数削減

石井滉人, 中村あすか, 富永浩文, 前川仁孝

第195回HPC研究発表会（SWoPP2024） 2024年8月9日
シミュレーテッド量子アニーリングの並列化手法の評価

足原啓心, 中村あすか, 前川仁孝

第195回HPC研究発表会（SWoPP2024） 2024年8月8日
DF/IHS法の木構造を用いたタスクスケジューリング問題の下界値計算手法

薬師, 啓太, 中村, あすか, 前川, 仁孝

第86回全国大会講演論文集 2024年3月1日

本発表では, DF/IHS法を用いたタスクスケジューリング問題の最適解求解を高速化するため,下界値の計算を簡単化する手法を提案する.DF/IHS法は分枝限定法に基づいて最も短いスケジュール長となるタスクの組合せを求める.DF/IHS法の下界値計算はすべての未割当てタスクの実行に必要な時間を予測するため, 未割当てタスクが同じ部分問題に対して同じ計算を実行する場合があり, 計算に無駄が生じる.そこで, 未割当てタスクが同じ部分問題をDF/IHSの木構造に基づいて検出し, 下界値の計算を簡単化することで, 探索の高速化が期待できる.
残差ベクトル更新方式の切り替えによるPipelined Chronopoulos CG法の収束性向上

沢, 圭佑, 中村, あすか, 前川, 仁孝

第86回全国大会講演論文集 2024年3月1日

本発表では，Pipelied Chronopoulos CG法の収束性の向上のため，残差ベクトル更新方法を動的に切り替える手法を提案する．Pipelined Chronopoulos CG法は，通常のCG法を計算順序の入れ替えにより，通信回数の削減や一対多の通信を隠蔽する手法である．この手法は，計算順序の入れ替えにより丸め誤差の影響を通常のCG法よりも強く受け，収束性が悪い．このため，Pipelied Chronopoulos CG法により，大雑把な解析を行い，特定のタイミングから残差ベクトルの更新方法を切り替えることで，丸め誤差の影響を緩和する．

もっとみる

一覧へ戻る

中村 あすか

基本情報

研究キーワード

研究分野

論文

講演・口頭発表等

中村あすか