Researcher Achievements

荒井 幸代

アライ サチヨ  (Arai Sachiyo)

Basic Information

Affiliation
Professor, Graduate School of Engineering, Chiba University
Degree
Doctor of Engineering (Tokyo Institute of Technology)

Contact
sachiyo@faculty.chiba-u.jp
J-GLOBAL ID
200901031363146377
researchmap Member ID
6000002280

External Links

Papers

 65
  • Saito Masaharu, Arai Sachiyo
    Journal of Advanced Computational Intelligence and Intelligent Informatics 28(2) 403-412, March 20, 2024
    In recent years, inverse reinforcement learning has attracted attention as a method for estimating the intention behind actions from the trajectories of various acting agents, including human flow data. In the context of reinforcement learning, "intention" refers to a reward function. Conventional inverse reinforcement learning assumes that all trajectories are generated from policies learned under a single reward function. However, it is natural to assume that people in a human flow act according to multiple policies. In this study, we introduce an expectation-maximization algorithm into inverse reinforcement learning and propose a method to estimate different reward functions from the trajectories of human flow. The effectiveness of the proposed method was evaluated through a computer experiment based on human flow data collected from subjects around airport gates.
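    A minimal, self-contained sketch of the EM-plus-IRL idea described in the abstract above, assuming a linear reward over trajectory feature counts and a MaxEnt-style likelihood whose partition function is approximated over the observed trajectory set. All data are synthetic; the paper's actual algorithm and experiments differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 40 trajectories, each summarized by its feature-count
# vector Phi(tau) = sum_t phi(s_t), generated by two latent "intentions".
true_w = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, 0.0]])
Phi = np.vstack([rng.normal(m, 0.5, size=(20, 3)) for m in true_w])

K, D = 2, Phi.shape[1]
w = rng.normal(size=(K, D))   # reward weights, one row per latent intention
pi = np.full(K, 1.0 / K)      # mixing proportions

def log_lik(w_k):
    """MaxEnt-style log-likelihood of every trajectory under weights w_k,
    with the partition function approximated over the observed set."""
    s = Phi @ w_k
    return s - (np.log(np.exp(s - s.max()).sum()) + s.max())

for _ in range(200):
    # E-step: responsibility of each intention for each trajectory.
    ll = np.stack([log_lik(w[k]) for k in range(K)], axis=1)
    post = np.log(pi) + ll
    post -= post.max(axis=1, keepdims=True)
    gamma = np.exp(post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # M-step: re-estimate mixing proportions and reward weights.
    pi = gamma.mean(axis=0)
    for k in range(K):
        p = np.exp(log_lik(w[k]))             # MaxEnt trajectory distribution
        grad = gamma[:, k] @ (Phi - p @ Phi)  # gradient of weighted log-lik.
        w[k] += 0.05 * grad

print("estimated reward weights (one row per intention):")
print(np.round(w, 2))
```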
  • Ikenaga Akiko, Arai Sachiyo
    Journal of Advanced Computational Intelligence and Intelligent Informatics 28(2) 393-402, March 20, 2024
    Sequential decision-making under multiple objective functions includes the problem of exhaustively searching for Pareto-optimal policies and the problem of selecting a policy from the resulting set of Pareto-optimal policies based on the decision maker's preferences. This paper focuses on the latter problem. In order to select a policy that reflects the decision maker's preferences, it is necessary to order these policies, which is problematic because the decision maker's preferences are generally tacit knowledge. Furthermore, it is difficult to order them quantitatively. For this reason, conventional methods have mainly elicited preferences through dialogue with decision makers and through one-to-one comparisons. In contrast, this paper proposes a method based on inverse reinforcement learning to estimate the weight of each objective from the decision-making sequence. The estimated weights can be used to quantitatively evaluate the Pareto-optimal policies from the viewpoint of the decision maker's preferences. We applied the proposed method to a multi-objective reinforcement learning benchmark problem and verified its effectiveness as a method for eliciting the weights of each objective function.
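    The paper estimates the objective weights by inverse reinforcement learning from the decision-making sequence. As a stand-in, the sketch below uses a simpler softmax (Luce) choice model over hypothetical value vectors of Pareto-optimal policies, to illustrate how estimated weights induce an ordering of the Pareto set; every number here is made up.

```python
import numpy as np

# Value vectors of six Pareto-optimal policies over two objectives
# (hypothetical numbers, e.g. negated travel time vs. comfort).
V = np.array([[1.0, 9.0], [3.0, 8.0], [5.0, 6.5],
              [7.0, 5.0], [8.5, 3.0], [9.5, 1.0]])
chosen = 3   # the decision maker is observed to pick policy 3

# Softmax (Luce) choice model: P(choose k) is proportional to
# exp(beta * w . V_k). Fit w on the simplex by gradient ascent.
w, beta = np.array([0.5, 0.5]), 3.0
for _ in range(500):
    logits = beta * V @ w
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad = beta * (V[chosen] - p @ V)     # d log P(chosen) / d w
    w = np.clip(w + 0.01 * grad, 1e-6, None)
    w /= w.sum()                          # crude projection onto the simplex

order = np.argsort(-(V @ w))
print("estimated weights:", np.round(w, 3))
print("Pareto-optimal policies ranked by estimated preference:", order)
```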
  • Dan Zhou, Jiqing Du, Sachiyo Arai
    Inf. Sci. 657 119932-119932, February 2024
  • 田村秋考, 荒井幸代
    IEEJ Transactions on Electronics, Information and Systems 144(2), 2024
  • Dan Zhou, Jiqing Du, Sachiyo Arai
    Swarm Evol. Comput. 81 101349-101349, August 2023
  • Daiko Kishikawa, Sachiyo Arai
    SICE Journal of Control, Measurement, and System Integration 16(1) 140-151, April 2023
  • Takumi Saiki, Sachiyo Arai
    IEEE Access 11 75875-75883, 2023
  • Naoya Takayama, Sachiyo Arai
    IEEE Access 11 58532-58538, 2023
  • Dan Zhou, Jiqing Du, Sachiyo Arai
    IEEE Access 11 43128-43139, 2023
  • Naoya Takayama, Sachiyo Arai
    Artificial Life and Robotics 27(3) 594-602, August 2022. Peer-reviewed, corresponding author
  • 浪越圭一, 荒井幸代
    Transactions of the Japanese Society for Artificial Intelligence 36(5) AG21-B_1, September 1, 2021. Peer-reviewed, corresponding author
  • Daiko Kishikawa, Sachiyo Arai
    Artificial Life and Robotics 26(3) 338-346, August 2021. Peer-reviewed, corresponding author
  • Daiko Kishikawa, Sachiyo Arai
    2021 10th International Congress on Advanced Applied Informatics (IIAI-AAI), July 2021
  • Yasuhiro Yoshida, Sachiyo Arai, Hiroyasu Kobayashi, Keiichiro Kondo
    Electrical Engineering in Japan 214(2), January 2021. Peer-reviewed, corresponding author
    The effective utilization of regenerative power generated by trains has attracted the attention of engineers due to its promising potential for energy conservation in electrified railways. Charge control by wayside batteries is an effective way to utilize this regenerative power, and wayside battery systems are required to save energy while using the minimum storage capacity of the energy storage devices. However, because current control policies are rule-based, built on human empirical knowledge, it is difficult to design the rules appropriately in consideration of the battery's state of charge. Therefore, in this paper, we introduce reinforcement learning with an actor-critic algorithm to acquire an effective control policy, which had previously been difficult to derive as rules from experts' knowledge. The proposed algorithm, which can autonomously learn the control policy, stabilizes the balance of power supply and demand. Through several computational simulations, we demonstrate that the proposed method exhibits superior performance compared to existing ones.
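    A toy sketch of the actor-critic idea on a made-up surplus/demand environment (a discretized state of charge and three charge actions). The railway simulation model, state variables, and reward used in the paper are more detailed; everything below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy wayside-battery world (illustrative only): at each step a train either
# regenerates power (surplus) or accelerates (demand); the controller picks
# discharge / idle / charge. Reward favors absorbing surplus and covering
# demand without hitting the capacity limits of the battery.
N_SOC = 11                    # discretized state of charge: 0 .. 10
ACTIONS = [-1, 0, 1]          # discharge, idle, charge

def step(soc, act, surplus):
    if surplus and act == 1 and soc < N_SOC - 1:
        return soc + 1, 1.0   # regenerative power absorbed
    if not surplus and act == -1 and soc > 0:
        return soc - 1, 1.0   # traction demand covered from the battery
    return soc, -0.1          # wasted surplus, unmet demand, or idling

theta = np.zeros((N_SOC, 2, 3))   # actor: action preferences per state
v = np.zeros((N_SOC, 2))          # critic: state values
alpha, beta, gamma = 0.1, 0.05, 0.95

soc, surplus = N_SOC // 2, int(rng.random() < 0.5)
for t in range(20000):
    prefs = theta[soc, surplus]
    pi = np.exp(prefs - prefs.max())
    pi /= pi.sum()
    a = rng.choice(3, p=pi)
    soc2, r = step(soc, ACTIONS[a], surplus)
    surplus2 = int(rng.random() < 0.5)
    td = r + gamma * v[soc2, surplus2] - v[soc, surplus]  # TD error
    v[soc, surplus] += alpha * td                         # critic update
    grad = -pi
    grad[a] += 1.0                                        # d log pi / d prefs
    theta[soc, surplus] += beta * td * grad               # actor update
    soc, surplus = soc2, surplus2

print("greedy action when surplus arrives, by state of charge:",
      [ACTIONS[int(np.argmax(theta[s, 1]))] for s in range(N_SOC)])
```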
  • 吉田賢央, 荒井幸代
    IEICE Transactions on Information and Systems (Japanese Edition) J103-D(11) 788-799, November 1, 2020. Peer-reviewed, corresponding author
  • Yasuhiro Yoshida, Sachiyo Arai, Hiroyasu Kobayashi, Keiichiro Kondo
    IEEJ Transactions on Industry Applications 140(11) 807-816, November 1, 2020. Peer-reviewed, corresponding author
  • 中田勇介, 荒井幸代
    Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence (Web), 33rd, 1949-1950, 2020
  • 中田勇介, 荒井幸代
    Transactions of the Japanese Society for Artificial Intelligence 13(1) G-J73_1-10, January 2020. Peer-reviewed, corresponding author
  • 中田勇介, 荒井幸代
    Transactions of the Japanese Society for Artificial Intelligence 34(6) B-J23_1-11, October 2019. Peer-reviewed
  • Toya Kamatani, Yusuke Nakata, Sachiyo Arai
    IEEE International Conference on Agents (ICA) 65-68, 2019
  • Kousuke Nishi, Sachiyo Arai
    IEEE International Conference on Agents (ICA) 61-64, 2019
  • Daiko Kishikawa, Sachiyo Arai
    2019 IEEE International Conference on Agents (ICA) 38-43, 2019
  • 荒井幸代, 北里勇樹
    Transactions of the Institute of Electrical Engineers of Japan 138(6) 720-727, 2018. Peer-reviewed
  • Yasuhiro Yoshida, Sachiyo Arai
    2018 IEEE International Conference on Agents (ICA) 69-74, 2018. Peer-reviewed
  • Akiko Ikenaga, Sachiyo Arai
    2018 IEEE International Conference on Agents (ICA) 117-118, 2018. Peer-reviewed
  • Yusuke Nakata, Yuki Kitazato, Sachiyo Arai
    2018 IEEE International Conference on Agents (ICA) 105-108, 2018. Peer-reviewed
  • Shota Ishikawa, Sachiyo Arai
    Proceedings of the Tenth International Workshop on Agents in Traffic and Transportation (ATT 2018), co-located with the Federated Artificial Intelligence Meeting (FAIM 2018, including the ECAI/IJCAI, AAMAS, and ICML 2018 conferences), Stockholm, Sweden, 63-69, 2018. Peer-reviewed
  • Yuki Kitazato, Sachiyo Arai
    Proceedings of the 10th International Conference on Agents and Artificial Intelligence (ICAART 2018), Volume 2, Funchal, Madeira, Portugal, January 16-18, 2018, 276-283. Peer-reviewed, corresponding author
  • Keiichi Namikoshi, Sachiyo Arai
    Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2018), Kyoto, Japan, July 15-19, 2018, 1310-1317. Peer-reviewed
  • 荒井 幸代, 浪越 圭一
    Transactions of the Japanese Society for Artificial Intelligence 32(5) AG16-8, 2017. Peer-reviewed
  • 齋竹良介, 荒井幸代
    Journal of Social Safety Science 28(11) 101-107, March 2016. Peer-reviewed
  • 荒井 幸代, 石川 翔太
    Transactions of the Japanese Society for Artificial Intelligence 31(2) D-F32_1-8, February 2016. Peer-reviewed
    In this paper, we introduce an intelligent vehicle into traffic flow in which a phantom traffic jam occurs, in order to ensure traffic-flow stability. The intelligent vehicle shares information on the speed and gap of the leading vehicle. Furthermore, the intelligent vehicle can foresee changes in the leading vehicles through the shared information and can start accelerating faster than human-driven vehicles can. We propose an intelligent-vehicle model, a generalized Nagel-Schreckenberg model that can arbitrarily set the number of leading vehicles to share information with and the maximum distance of inter-vehicle communication. We found that phantom traffic jams are suppressed by an intelligent vehicle that can share information with two or more vehicles in front and over a range of at least 30 meters.
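    An illustrative cellular-automaton sketch in the spirit of the generalized Nagel-Schreckenberg model described above. The anticipation rule, the share of intelligent vehicles, and all parameters are assumptions for demonstration, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(2)

L, N, VMAX, P_SLOW = 100, 30, 5, 0.3   # ring-road cells, cars, max speed
K_SHARE = 2                            # leaders an intelligent vehicle sees

pos = np.sort(rng.choice(L, N, replace=False))
vel = np.zeros(N, dtype=int)
smart = rng.random(N) < 0.3            # 30% intelligent vehicles (assumed)

def gap(i, j):
    return (pos[j] - pos[i] - 1) % L   # empty cells between car i and car j

for t in range(200):
    new_vel = vel.copy()
    for i in range(N):
        j = (i + 1) % N                # immediate leader on the ring
        g = gap(i, j)
        if smart[i]:
            # Conservative anticipation from shared information: the leader
            # will advance at least min(its speed, room ahead of it) - 1.
            room = min(gap((i + k) % N, (i + k + 1) % N)
                       for k in range(1, K_SHARE + 1))
            g += max(min(room, vel[j]) - 1, 0)
        v = min(vel[i] + 1, VMAX, g)   # accelerate but never collide
        if not smart[i] and rng.random() < P_SLOW:
            v = max(v - 1, 0)          # random slowdown (human drivers only)
        new_vel[i] = v
    vel = new_vel
    pos = (pos + vel) % L

print("mean speed after 200 steps:", round(float(vel.mean()), 2))
```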
  • Ryosuke Saitake, Sachiyo Arai
    2016 IEEE International Conference on Agents (IEEE ICA 2016) 110-111, 2016. Peer-reviewed
    Multi-Objective Reinforcement Learning (MORL) can be divided into two approaches according to the number of acquired policies. One approach learns a single policy that makes the agent reach a single, arbitrary Pareto-optimal solution; the other learns multiple policies that correspond to each Pareto-optimal solution. The latter approach finds the multiple policies simultaneously; however, it incurs significant computational cost. In many real-world cases, learning a single solution is sufficient in the multi-objective context. In this paper, we focus on the former approach, where a suitable weight for each objective must be defined. To estimate the weight of each objective as a parameter, we utilize Q-values on the expert's trajectory, which indicates the optimal sequence of actions. This approach is an analogy drawn from apprenticeship learning via inverse reinforcement learning. We evaluate the proposed method using a well-known MORL benchmark problem, i.e., the Deep Sea Treasure environment.
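    A sketch of recovering objective weights from per-objective Q-values along an expert trajectory, as the abstract outlines. The Q-tables below are hypothetical stand-ins for values learned beforehand, and the coarse grid search replaces the paper's estimation procedure.

```python
import numpy as np

# Hypothetical per-objective Q-tables Q[o][s, a] for two objectives
# ("treasure value" vs. "time penalty"), four states, two actions;
# stand-ins for tables learned beforehand by multi-objective Q-learning.
Q = np.array([
    [[1.0, 3.0], [2.0, 5.0], [0.5, 4.0], [1.0, 1.2]],          # objective 0
    [[-1.0, -4.0], [-1.0, -5.0], [-0.5, -3.0], [-1.0, -2.0]],  # objective 1
])
expert = [(0, 1), (1, 1), (2, 1), (3, 0)]   # expert's (state, action) pairs

# Search for the weight w (w0 = w, w1 = 1 - w) under which the expert's
# actions are greedy with respect to the scalarized Q at the visited states.
best_w, best_hits = 0.0, -1
for w in np.linspace(0.0, 1.0, 101):
    scal = w * Q[0] + (1 - w) * Q[1]        # scalarized Q(s, a)
    hits = sum(a == int(np.argmax(scal[s])) for s, a in expert)
    if hits > best_hits:
        best_w, best_hits = w, hits

print(f"estimated weights: ({best_w:.2f}, {1 - best_w:.2f}), "
      f"consistent at {best_hits}/{len(expert)} expert steps")
```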
  • Shota Ishikawa, Sachiyo Arai
    2016 IEEE International Conference on Agents (ICA) 90-93, 2016. Peer-reviewed
  • Sachiyo Arai, Haichi Xu
    PRICAI 2016: Trends in Artificial Intelligence 9810 16-29, 2016. Peer-reviewed
  • Shota Ishikawa, Sachiyo Arai
    2015 Winter Simulation Conference (WSC) 300-311, 2015. Peer-reviewed
    In this paper, we introduce an intelligent vehicle into traffic flow in which a phantom traffic jam occurs, in order to ensure traffic-flow stability. The intelligent vehicle shares information on the speed and gap of the leading vehicle. Furthermore, the intelligent vehicle can foresee changes in the leading vehicles through the shared information and can start accelerating faster than human-driven vehicles can. We propose an intelligent-vehicle model, a generalized Nagel-Schreckenberg model that allows sharing information with leading vehicles. The generalized model can arbitrarily set the number of leading vehicles to share information with, and we found that phantom traffic jams are resolved by an intelligent vehicle that shares information with two or more vehicles in front.
  • Haichi Xu, Sachiyo Arai
    IEEJ Transactions on Electronics, Information and Systems 134(9) 1310-1317, 2014. Peer-reviewed
    In this paper, we propose a method to diminish the state-space explosion problem in a multiagent reinforcement learning context, where each agent needs to observe the other agents' states and previous actions at each step of its learning process. The numbers of states and actions grow exponentially with the number of agents, leading to an enormous amount of computation and very slow learning. In our method, an agent considers the other agents' statuses only when they interfere with one another in reaching their goals. Our idea is that each agent starts with a state space that does not include information about the others; the agents then automatically expand and refine their state spaces when they detect interference. We adopt the information-theoretic measure of entropy to detect the interference situations in which an agent should take the other agents into account. We demonstrate the advantage of our method, which preserves global convergence while learning in a time-efficient manner.
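    A sketch of the entropy-based interference test described above, on fabricated outcome counts: near-deterministic outcomes keep the compact local state, while high-entropy outcomes would trigger expanding the state with information about other agents. The threshold and scenario are assumptions; the paper's criterion and expansion mechanism are more involved.

```python
import numpy as np
from collections import defaultdict

def entropy_bits(counts):
    p = np.asarray(list(counts), dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(3)

# Empirical outcome counts per (local state, action). Where no other agent
# interferes, the outcome of an action is near-deterministic (low entropy);
# where another agent gets in the way, outcomes look random (high entropy).
outcomes = defaultdict(lambda: defaultdict(int))
for _ in range(1000):
    outcomes[("corridor", "right")][rng.choice(["moved", "blocked"])] += 1
    outcomes[("open_area", "right")]["moved"] += 1

THRESHOLD = 0.5   # bits; an assumed cutoff for flagging interference
for sa, cnt in outcomes.items():
    h = entropy_bits(cnt.values())
    verdict = "expand state with others' info" if h > THRESHOLD else "keep local state"
    print(f"{sa}: H = {h:.2f} bits -> {verdict}")
```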
  • Sachiyo Arai, Kanako Suzuki
    Journal of Information Processing 22(2) 299-306, 2014. Peer-reviewed
    This study is intended to encourage appropriate social norms among multiple agents. Effective norms, such as those emerging from sustained individual interactions over time, can make agents act cooperatively to optimize their performance. We introduce a "social learning" model in which agents interact with one another under the framework of a coordination game. Because coordination games have two equilibria, social norms are necessary to make agents converge to a unique equilibrium. In this paper, we present the emergence of an appropriate social norm through inverse reinforcement learning, an approach for extracting a reward function from observations of optimal behavior. First, we let a mediator agent estimate the reward function by inverse reinforcement learning from observations of a master's behavior. Secondly, we introduce agents who act according to the estimated reward function into a multiagent world in which most agents, called citizens, do not know how to act. Finally, we evaluate the effectiveness of introducing inverse reinforcement learning.
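    A compact sketch of this pipeline under strong simplifications: the mediator's inverse reinforcement learning step is replaced by a trivial frequency-based surrogate that turns a master's demonstrations into a normative reward bonus, and Q-learning citizens endowed with that bonus converge to the demonstrated equilibrium of a two-action coordination game. Payoffs and population size are made up.

```python
import numpy as np

rng = np.random.default_rng(4)

# Coordination game: both (A, A) and (B, B) are equilibria.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])   # rows: my action, cols: partner's action

demos = [0] * 50   # the "master" demonstrates the desired norm: always A

# Mediator (IRL surrogate): turn demonstration frequencies into a reward
# bonus that makes the demonstrated action preferable.
bonus = np.bincount(demos, minlength=2) / len(demos)   # -> [1.0, 0.0]

# Citizens Q-learn on the game payoff plus the estimated normative bonus.
Q = rng.normal(0.0, 0.01, size=(20, 2))   # 20 citizens, 2 actions
for t in range(5000):
    i, j = rng.choice(20, size=2, replace=False)
    ai = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(Q[i]))
    aj = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(Q[j]))
    Q[i, ai] += 0.1 * (PAYOFF[ai, aj] + bonus[ai] - Q[i, ai])
    Q[j, aj] += 0.1 * (PAYOFF[aj, ai] + bonus[aj] - Q[j, aj])

adopters = int((np.argmax(Q, axis=1) == 0).sum())
print(f"citizens following the demonstrated norm A: {adopters} / 20")
```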
  • 許 海遅, 荒井 幸代
    IEEJ Transactions on Electronics, Information and Systems 133(9) 10-1716, 2013. Peer-reviewed
  • Sachiyo Arai, Tatsuya Masubuchi
    International Journal of Advancements in Computing Technology 4(22) 257-268, December 2012. Peer-reviewed
  • 内田 英明, 藤井 秀樹, 吉村 忍, 荒井 幸代
    Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence, JSAI2012, 3F2OS101, 2012
    For decision-makers in road traffic policy, predicting the transient traffic immediately after a change to the road environment is just as important as predicting the steady state reached after knowledge of the environment has fully spread and converged. The authors have attempted to reproduce transient traffic conditions by applying Q-routing, a learning-based route search algorithm. In this talk, we report how route choice behavior evolves when Q-routing is applied to a real-world problem.
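    A generic Q-routing sketch on a made-up four-node road network, showing the update the abstract refers to: Q_x(y) estimates the remaining travel time to the destination when leaving node x via neighbor y, and is updated from the experienced link time plus the best estimate at y. The network and travel times are illustrative; the paper applies Q-routing to a real road network.

```python
import numpy as np

rng = np.random.default_rng(5)

# Made-up four-node road network; node 3 is the destination. Q[x][y] is
# the estimated remaining travel time when leaving node x via neighbor y.
edges = {0: [1, 2], 1: [2, 3], 2: [3]}
mean_time = {(0, 1): 2.0, (0, 2): 5.0, (1, 2): 1.0, (1, 3): 6.0, (2, 3): 2.0}

Q = {x: {y: 10.0 for y in nbrs} for x, nbrs in edges.items()}
alpha, eps = 0.1, 0.2

for episode in range(3000):
    x = 0
    while x != 3:
        nbrs = list(Q[x])
        y = rng.choice(nbrs) if rng.random() < eps else min(nbrs, key=Q[x].get)
        t = rng.exponential(mean_time[(x, y)])        # experienced link time
        best_next = 0.0 if y == 3 else min(Q[y].values())
        Q[x][y] += alpha * (t + best_next - Q[x][y])  # Q-routing update
        x = y

print("estimated times from node 0 via each neighbor:",
      {y: round(q, 1) for y, q in Q[0].items()})
```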
  • 角井 勇哉, 荒井 幸代
    Transactions of the Operations Research Society of Japan 54 84-108, 2011
    Baseball has become a business that moves enormous sums of money, such as contract fees and advertising revenue. Against this background, constructing the lineup that maximizes the expected number of runs is a major challenge for a baseball team. The expected run value can be computed with a run-expectancy model that treats a baseball offense as a Markov chain. However, computing the expected run value of every possible lineup composed from a set of n players requires O(n^9) computation. This paper therefore proposes a method for quantifying the role required of each batting-order position, and a method that uses the obtained role requirements to formulate the optimal lineup construction problem as a matching problem. We discuss the proposed method in comparison with existing methods from two perspectives: 1. the validity of the quantification of the role required of each batting-order position, and 2. its evaluation as a lineup construction method.
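    A Monte Carlo stand-in for the run-expectancy computation described above, with hypothetical per-player outcome probabilities and a deliberately simplified offense model (only outs, one-base advances, and home runs). The paper computes the expectation exactly via a Markov chain; this sketch just makes concrete why evaluating every lineup drawn from n players is O(n^9).

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical per-player outcome probabilities: [out, walk/single, home run].
players = np.array([
    [0.70, 0.25, 0.05], [0.65, 0.33, 0.02], [0.68, 0.27, 0.05],
    [0.60, 0.30, 0.10], [0.72, 0.26, 0.02], [0.70, 0.28, 0.02],
    [0.74, 0.24, 0.02], [0.75, 0.24, 0.01], [0.76, 0.23, 0.01],
])

def expected_runs(order, n_games=2000, innings=9):
    """Monte Carlo estimate of runs per game for one batting order under a
    simplified offense model (outs, one-base advances, home runs only)."""
    total, batter = 0.0, 0
    for _ in range(n_games):
        for _ in range(innings):
            outs, bases = 0, [False, False, False]
            while outs < 3:
                p_out, p_single, p_hr = players[order[batter % 9]]
                batter += 1
                u = rng.random()
                if u < p_out:
                    outs += 1
                elif u < p_out + p_single:   # every runner advances one base
                    total += 1.0 if bases[2] else 0.0
                    bases = [True, bases[0], bases[1]]
                else:                        # home run clears the bases
                    total += 1.0 + sum(bases)
                    bases = [False, False, False]
    return total / n_games

print("expected runs for the 1..9 order:", round(expected_runs(list(range(9))), 2))
```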
  • Sachiyo Arai, Yoshihisa Ishigaki
    JACIII 13(6) 649-657, 2009. Peer-reviewed
    Although a large number of reinforcement learning algorithms have been proposed for the generation of cooperative behaviors, the question of how to evaluate mutual benefit or loss among them is still open. As far as we know, an emerged behavior is regarded as cooperative when the embedded agents have finally achieved their global goal, regardless of whether mutual interference has had any effect during the course of each agent's learning process. Thus, we cannot detect harmful interactions on the way to a fully converged policy. In this paper, we propose a measure based on information theory for evaluating the degree of interaction during the learning process from the viewpoint of information sharing. In order to discuss the adverse effects of concurrent learning, we apply our proposed measure to a situation in which conflicts exist among the agents, and we show the usefulness of our measure.
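    The paper defines its own information-theoretic degree-of-interaction measure over the learning process. The sketch below shows only the generic empirical mutual-information computation that such a measure builds on, with fabricated action logs for a coupled and an independent agent pair.

```python
import numpy as np

def mutual_information_bits(xs, ys, nx, ny):
    """Empirical mutual information (in bits) between two discrete logs."""
    joint = np.zeros((nx, ny))
    for x, y in zip(xs, ys):
        joint[x, y] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(7)
a1 = rng.integers(0, 4, 5000)                      # agent 1's action log
a2_coupled = (a1 + rng.integers(0, 2, 5000)) % 4   # agent 2 reacts to agent 1
a2_indep = rng.integers(0, 4, 5000)                # agent 2 ignores agent 1

print("I(a1; a2), coupled pair    :",
      round(mutual_information_bits(a1, a2_coupled, 4, 4), 3))
print("I(a1; a2), independent pair:",
      round(mutual_information_bits(a1, a2_indep, 4, 4), 3))
```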
  • Sachiyo Arai, Yoshihisa Ishigaki, Hironori Hirata
    Intelligent Agents and Multi-Agent Systems, Proceedings 5357 34-41, 2008. Peer-reviewed
    Although a large number of algorithms have been proposed for generating cooperative behaviors, the question of how to evaluate mutual benefit among them is still open. This study provides a measure of the degree of cooperation among reinforcement learning agents. By means of our proposed measure, which is based on information theory, the degree of interaction among agents can be evaluated from the viewpoint of information sharing. Here, we show the usefulness of this measure through experiments on the pursuit game, evaluating the degree of cooperation among hunters and prey.
  • 荒井 幸代
    Springer-Verlag, Lecture Notes in Computer Science 4088, 279-292, 2006. Peer-reviewed
  • Nobuyuki Tanaka, Sachiyo Arai
    Agent Computing and Multi-Agent Systems 4088 279-292, 2006. Peer-reviewed
    In this paper, we discuss guidelines for the reward design problem, which defines when and what amount of reward should be given to the agents, within the context of a reinforcement learning approach. We take keepaway soccer as a standard task in the multiagent domain that requires skilled teamwork. The difficulties of designing rewards for good teamwork stem from the following features: i) since keepaway is a continuing task with no explicit goal, it is hard to tell when reward should be given to the agents; ii) since it is a multiagent cooperative task, it is hard to share the reward fairly according to each agent's contribution. Through some experiments, we show that reward design has a major effect on the agents' behavior, and we introduce a reward function that makes the agents perform keepaway successfully.
  • N Asgharbeygi, N Nejati, P Langley, S Arai
    Inductive Logic Programming, Proceedings 3625 20-37, 2005. Peer-reviewed
    Reasoning plays a central role in intelligent systems that operate in complex situations that involve time constraints. In this paper, we present the Adaptive Logic Interpreter, a reasoning system that acquires a controlled inference strategy adapted to the scenario at hand, using a variation on relational reinforcement learning. Employing this inference mechanism in a reactive agent architecture lets the agent focus its reasoning on the most rewarding parts of its knowledge base and hence perform better under time and computational resource constraints. We present experiments that demonstrate the benefits of this approach to reasoning in reactive agents, then discuss related work and directions for future research.

MISC

 120

Books and Other Publications

 11

Lectures and Oral Presentations

 201

Research Projects (Joint Research, Competitive Funding, etc.)

 12

Industrial Property Rights

 1

Social Contribution Activities

 6