Graduate School of Engineering

荒井 幸代

Arai Sachiyo

Basic Information

Affiliation
Professor, Graduate School of Engineering, Chiba University
Degree
Doctor of Engineering (Tokyo Institute of Technology)

Contact
sachiyofaculty.chiba-u.jp
J-GLOBAL ID
200901031363146377
researchmap Member ID
6000002280

External Links

Papers

 65
  • Saito Masaharu, Arai Sachiyo
    Journal of Advanced Computational Intelligence and Intelligent Informatics 28(2) 403-412 March 20, 2024
    In recent years, inverse reinforcement learning has attracted attention as a method for estimating the intention of actions using the trajectories of various action-taking agents, including human flow data. In the context of reinforcement learning, “intention” refers to a reward function. Conventional inverse reinforcement learning assumes that all trajectories are generated from policies learned under a single reward function. However, it is natural to assume that people in a human flow act according to multiple policies. In this study, we introduce an expectation-maximization algorithm to inverse reinforcement learning, and propose a method to estimate different reward functions from the trajectories of human flow. The effectiveness of the proposed method was evaluated through a computer experiment based on human flow data collected from subjects around airport gates.
  • Ikenaga Akiko, Arai Sachiyo
    Journal of Advanced Computational Intelligence and Intelligent Informatics 28(2) 393-402 March 20, 2024
    Sequential decision-making under multiple objective functions includes the problem of exhaustively searching for a Pareto-optimal policy and the problem of selecting a policy from the resulting set of Pareto-optimal policies based on the decision-maker’s preferences. This paper focuses on the latter problem. In order to select a policy that reflects the decision-maker’s preferences, it is necessary to order these policies, which is problematic because the decision-maker’s preferences are generally tacit knowledge. Furthermore, it is difficult to order them quantitatively. For this reason, conventional methods have mainly been used to elicit preferences through dialogue with decision-makers and through one-to-one comparisons. In contrast, this paper proposes a method based on inverse reinforcement learning to estimate the weight of each objective from the decision-making sequence. The estimated weights can be used to quantitatively evaluate the Pareto-optimal policies from the viewpoint of the decision-maker’s preferences. We applied the proposed method to a multi-objective reinforcement learning benchmark problem and verified its effectiveness as a method for eliciting the weights of each objective function.
  • Dan Zhou, Jiqing Du, Sachiyo Arai
    Inf. Sci. 657 119932-119932 February 2024
  • 田村秋考, 荒井幸代
    IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌 C) 144(2) 2024
  • Dan Zhou, Jiqing Du, Sachiyo Arai
    Swarm Evol. Comput. 81 101349-101349 August 2023

MISC

 120

Books and Other Publications

 11

Lectures and Oral Presentations

 201

Research Projects (Joint Research and Competitive Funding)

 12

Industrial Property Rights

 1

Social Contribution Activities

 6