Research Achievements

Sachiyo Arai (荒井 幸代)

Basic Information

Affiliation
Professor, Graduate School of Engineering, Chiba University
Degree
Doctor of Engineering (Tokyo Institute of Technology)

Contact
sachiyo@faculty.chiba-u.jp
J-GLOBAL ID
200901031363146377
researchmap Member ID
6000002280

Papers

 76
  • Haichi Xu, Sachiyo Arai
    IEEJ Transactions on Electronics, Information and Systems 134(9) 1310-1317 2014  Refereed
    In this paper, we propose a method to mitigate the state-space explosion problem in multiagent reinforcement learning, where each agent needs to observe the other agents' states and previous actions at each step of its learning process. The numbers of states and actions grow exponentially with the number of agents, leading to an enormous amount of computation and very slow learning. In our method, an agent considers the other agents' statuses only when they interfere with its progress toward its goal. Our idea is that each agent starts with a state space that does not include information about the others, then automatically expands and refines it when interference is detected. We adopt the information-theoretic measure of entropy to detect the interference situations in which an agent should take the other agents into account; a minimal sketch of such an entropy test appears after this list. We demonstrate that our method achieves globally convergent behavior in a time-efficient manner.
  • Sachiyo Arai, Kanako Suzuki
    Journal of Information Processing 22(2) 299-306 2014  Refereed
    This study is intended to encourage appropriate social norms among multiple agents. Effective norms, such as those emerging from sustained individual interactions over time, can make agents act cooperatively to optimize their performance. We introduce a "social learning" model in which agents interact with one another under the framework of the coordination game. Because coordination games have two equilibria (made concrete in the payoff sketch after this list), social norms are necessary to make agents converge to a unique equilibrium. In this paper, we present the emergence of a right social norm by inverse reinforcement learning, an approach for extracting a reward function from observations of optimal behavior. First, we let a mediator agent estimate the reward function by inverse reinforcement learning from observations of a master's behavior. Second, we introduce agents who act according to the estimated reward function into the multiagent world, in which most agents, called citizens, do not know how to act. Finally, we evaluate the effectiveness of introducing inverse reinforcement learning.
  • Haichi Xu, Sachiyo Arai
    IEEJ Transactions on Electronics, Information and Systems 133(9) 10-1716 2013  Refereed
  • Sachiyo Arai, Tatsuya Masubuchi
    International Journal of Advancements in Computing Technology 4(22) 257-268 December 2012  Refereed
  • Hideaki Uchida, Hideki Fujii, Shinobu Yoshimura, Sachiyo Arai
    Proceedings of the Annual Conference of JSAI, JSAI2012, 3F2OS101, 2012
    For decision makers on road-traffic policy, predicting the transient traffic immediately after a change to the road environment is as important as predicting the steady state reached after knowledge of the environment has fully spread and settled. The authors have been attempting to reproduce transient traffic conditions by applying Q-routing, a learning-based route-search algorithm; a minimal sketch of the Q-routing update appears after this list. This talk reports how route-choice behavior shifts when Q-routing is applied to a real-world problem.
  • 角井 勇哉, Sachiyo Arai
    Transactions of the Operations Research Society of Japan 54 84-108 2011
    Baseball has become a business that moves enormous sums of money, such as contract fees and advertising fees. Against this background, constructing the lineup that maximizes the expected number of runs is a major challenge for a baseball team. The expected run value can be computed with a model that treats a baseball offense as a Markov chain; a minimal sketch of such a chain appears after this list. However, computing the expected run value of every possible lineup drawn from a set of n players requires O(n^9) computation. This paper therefore proposes a method for quantifying the functional requirements of each batting-order position and, using the obtained requirements, a formulation of the optimal-lineup problem as a matching problem. We evaluate the proposed method against existing methods in two stages: (1) the validity of the quantification of batting-order requirements, and (2) its performance as a lineup-construction method.
  • Sachiyo Arai, Yoshihisa Ishigaki
    JACIII 13(6) 649-657 2009  Refereed
    Although a large number of reinforcement learning algorithms have been proposed for the generation of cooperative behaviors, the question of how to evaluate mutual benefit or loss among them is still open. As far as we know, an emerged behavior is regarded as cooperative when the embedded agents have finally achieved their global goal, regardless of whether mutual interference has had any effect during the course of each agent's learning process. Thus, we cannot detect harmful interactions on the way to a fully converged policy. In this paper, we propose a measure based on information theory for evaluating the degree of interaction during the learning process from the viewpoint of information sharing; one natural instantiation of such a measure is sketched after this list. To discuss the harmful effects of concurrent learning, we apply our proposed measure to a situation in which conflicts exist among the agents, and we show the measure's usefulness.
  • Sachiyo Arai, Yoshihisa Ishigaki, Hironori Hirata
    INTELLIGENT AGENTS AND MULTI-AGENT SYSTEMS, PROCEEDINGS 5357 34-41 2008  Refereed
    Although a large number of algorithms have been proposed for generating cooperative behaviors, the question of how to evaluate mutual benefit among them is still open. This study provides a measure of the degree of cooperation among reinforcement learning agents. By means of our proposed measure, which is based on information theory, the degree of interaction among agents can be evaluated from the viewpoint of information sharing. We show the usefulness of this measure through experiments on the "pursuit game" and evaluate the degree of cooperation among hunters and prey.
  • Nobuyuki Tanaka, Sachiyo Arai
    AGENT COMPUTING AND MULTI-AGENT SYSTEMS 4088 279-292 2006  Refereed
    In this paper, we discuss guidelines for the reward design problem, which defines when and what amount of reward should be given to agents, within the context of the reinforcement learning approach. We take keepaway soccer as a standard multiagent task that requires skilled teamwork. Designing rewards for good teamwork is difficult for the following reasons: (i) since keepaway is a continuing task with no explicit goal, it is hard to tell when reward should be given to the agents, and (ii) since it is a cooperative multiagent task, it is hard to share the reward fairly according to each agent's contribution. (Two contrasting reward designs are sketched after this list.) Through experiments, we show that reward design has a major effect on the agents' behavior, and we introduce a reward function that makes the agents perform keepaway successfully.
  • N Asgharbeygi, N Nejati, P Langley, S Arai
    INDUCTIVE LOGIC PROGRAMMING, PROCEEDINGS 3625 20-37 2005  Refereed
    Reasoning plays a central role in intelligent systems that operate in complex situations that involve time constraints. In this paper, we present the Adaptive Logic Interpreter, a reasoning system that acquires a controlled inference strategy adapted to the scenario at hand, using a variation on relational reinforcement learning. Employing this inference mechanism in a reactive agent architecture lets the agent focus its reasoning on the most rewarding parts of its knowledge base and hence perform better under time and computational resource constraints. We present experiments that demonstrate the benefits of this approach to reasoning in reactive agents, then discuss related work and directions for future research.
  • S Arai, T Ishida
    INTERNATIONAL CONFERENCE ON INFORMATICS RESEARCH FOR DEVELOPMENT OF KNOWLEDGE SOCIETY INFRASTRUCTURE, PROCEEDINGS 132-139 2004  Refereed
    The Semantic Web is a challenging framework for making Web information machine-readable and understandable, but by itself it does not seem sufficient to satisfy humans' requirements for collecting and utilizing information automatically. Agent technology is a promising approach to bridging the gap between humans and machines. Agents are autonomous, intelligent entities that may travel among agents and humans; they take requirements from humans or other agents and offer appropriate solutions by consulting among themselves. The main difference between agent development and ordinary software development lies in coordination, cooperation, and learning, issues that are crucial for utilizing Web information. In this paper, we give an overview and research challenges concerning the combination of machine learning and agent technologies with the Semantic Web, from the perspective of interaction as well as interoperability among agents and humans.
  • S Arai, Y Murakami, Y Sugimoto, T Ishida
    INTELLIGENT AGENTS AND MULTI-AGENT SYSTEMS 2891 98-109 2003  Refereed
    Current research on web services has come to center on the flexible composition of existing services. Under the initiative of industry, a flexible composition framework has been developed on a workflow model in which the flow of processes and the bindings among services must be known beforehand. In short, this framework realizes flexible composition only within available services that are not widely opened. This paper focuses on two limitations of current web service composition. One is that it is hard to represent multi-agent scenarios consisting of several concurrent processes, because the framework is based on a workflow model. The other is that, once composed, a web service cannot be reused or transferred to other requesters, because there is no function for attaching semantics to the composite web service. To overcome these limitations, we have developed the scenario description language Q, which enables a web service composition that reflects a multi-agent context as a scenario rather than a workflow. Furthermore, we developed a system that translates a multi-agent scenario into DAML-S and registers the translated DAML-S as a new web service. We also discuss the applicability of our system to designing applications for C-Commerce and Digital Cities.
  • Sachiyo Arai
    Proceedings of the Genetic and Evolutionary Computation Conference 815-822 July 2001  Refereed
  • S Arai, K Sycara
    FROM ANIMALS TO ANIMATS 6 507-516 2000  Refereed
    The point we want to make in this paper is that profit-sharing, a reinforcement learning approach, is well suited to realizing adaptive behaviors in a multi-agent environment. We discuss the effectiveness of profit-sharing theoretically and empirically in a pursuit game with multiple preys and multiple hunters. In our setting, the hunters must adaptively coordinate with one another to capture all the preys, without shared information, a predefined organization, or any prior knowledge of their environment. The pursuit game itself is very simple but can be extended to real problems. Our approach contrasts with reinforcement learning methods based on dynamic programming, such as the temporal-difference method and Q-learning, in that profit-sharing guarantees convergence to an effective policy even in domains that do not obey the Markov property, provided the task is episodic and credit is assigned in an appropriate manner; a minimal sketch of this credit assignment appears after this list. Profit-sharing also differs from the Q(λ) and Sarsa(λ) methods in that it does not need an eligibility trace to manage delayed reward. Though our monolithic implementation here may seem impractical in the real world, the validity of the algorithm in a multiagent reinforcement learning context must be discussed before structured frameworks are introduced into the monolithic method to extend its application. The contribution of this paper is to introduce profit-sharing as an effective algorithm in the multiagent domain and to report its advantages and limitations without hierarchies.
  • Sachiyo Arai, Kazuteru Miyazaki
    Proceedings of the 6th International Conference on Information Systems Analysis and Synthesis 178-183 2000  Refereed
  • Sachiyo Arai, Katia P. Sycara, Terry R. Payne
    PRICAI 2000 125-135 2000  Refereed
  • S Arai, K Sycara, TR Payne
    FOURTH INTERNATIONAL CONFERENCE ON MULTIAGENT SYSTEMS, PROCEEDINGS 359-360 2000  Refereed
  • Sachiyo Arai, Katia P. Sycara
    Proceedings of the Fourth International Conference on Autonomous Agents, AGENTS 2000, Barcelona, Catalonia, Spain, June 3-7, 2000 104-105 2000  Refereed
  • Sachiyo Arai, Kazuteru Miyazaki, Shigenobu Kobayashi
    The Fourth International Symposium on Autonomous Decentralized Systems, ISADS 1999, Tokyo, Japan, March 20-23, 1999 310-319 1999  Refereed
  • S Arai, K Miyazaki, S Kobayashi
    INTELLIGENT AUTONOMOUS SYSTEMS 5 335-342 1998  Refereed
    This paper deals with planning the actions of cranes in the coil yard of a steel manufacturer. Each crane is operated independently, but the cranes share a rail and must observe single-track operation with respect to one another, and complete information about the coil yard is not always available to each crane operator. An operator often does not need the whole information, yet complicated interactions exist among the cranes. There are two main problems in this setting: allocating generated tasks to particular cranes, and controlling the cranes' execution to avoid collisions. We focus on the latter and aim to acquire cooperative collision-avoidance rules that would be very difficult for any expert to design. Instead of hand-coding these rules, we apply profit-sharing, a kind of reinforcement learning method, in our multi-agent model, and show that cranes operated by the learned rules perform better than cranes modeled as reactive planners with hand-coded rules.
  • Sachiyo Arai, Kazuteru Miyazaki, Shigenobu Kobayashi
    Proceedings of the 6th European Workshop on Learning Robots 111-120 1997  Refereed
  • Sachiyo Arai
    Proceedings of the 15th International Joint Conference on Artificial Intelligence 7-7 1997  Refereed
  • Sachiyo Arai
    Proceedings of the First International Conference on Multiagent Systems, June 12-14, 1995, San Francisco, California, USA 436 1995  Refereed
  • Sachiyo Arai, Kazuteru Miyazaki, Shigenobu Kobayashi
    Proceedings of the 1st Pacific Rim International Conference on Artificial Intelligence 77-82 1990  Refereed
  • Sachiyo Arai, Kazuteru Miyazaki, Shigenobu Kobayashi
    Proceedings of the 28th SICE Annual Conference 2 1255-1258 1989  Refereed
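
Illustrative sketches

The entropy-based interference test in the Xu & Arai (2014) entry above can be illustrated with a small example. This is a hedged sketch, not the paper's exact procedure: it assumes an agent logs the outcomes observed at each of its local states and flags a state for refinement (expansion with the other agents' information) when the empirical outcome distribution has high Shannon entropy; the threshold is an invented parameter.

from collections import Counter
from math import log2

def entropy_bits(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def should_refine(outcomes_at_state, threshold=0.5):
    """Flag a local state as interference-prone: mixed outcomes from the same
    state suggest unobserved agents are affecting the result, so the state
    should be expanded with information about the other agents."""
    return entropy_bits(outcomes_at_state) > threshold

# The same local state sometimes leads to the goal and sometimes to a
# collision: high entropy, so refine it with the other agents' states.
print(should_refine(["goal", "collide", "goal", "collide"]))  # True  (H = 1.0)
print(should_refine(["goal", "goal", "goal", "goal"]))        # False (H = 0.0)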
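
The dual equilibria of the coordination game in the Arai & Suzuki (2014) entry can be made concrete with a toy payoff table. The payoff values below are invented for illustration; the point is only that both matching action profiles are self-enforcing, so a norm is needed to select between them.

# A 2x2 coordination game with two pure Nash equilibria, (A, A) and (B, B).
# Payoffs are (row player, column player); the numbers are illustrative.
PAYOFF = {
    ("A", "A"): (2, 2),
    ("B", "B"): (1, 1),
    ("A", "B"): (0, 0),
    ("B", "A"): (0, 0),
}

def best_response(opponent_action):
    """Row player's best reply to the column player's action."""
    return max(("A", "B"), key=lambda a: PAYOFF[(a, opponent_action)][0])

# Matching the opponent is always best, so either convention is stable --
# which is exactly why agents need a social norm to converge on one of them.
print(best_response("A"), best_response("B"))  # -> A B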
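
Q-routing, applied in the Uchida et al. (JSAI2012) entry, is Boyan and Littman's learning rule for route search: each node keeps, per destination and neighbor, an estimate of the remaining delivery time and, after every hop, moves that estimate toward the hop cost plus the chosen neighbor's own best estimate. A minimal sketch under assumed values: the topology, learning rate, and unit hop cost are illustrative, not taken from the talk.

# Q[node][dest][neighbor]: estimated delivery time from `node` to `dest`
# when forwarding via `neighbor`.
ALPHA = 0.5  # learning rate (illustrative)

def init_tables(adjacency):
    nodes = list(adjacency)
    return {n: {d: {m: 1.0 for m in adjacency[n]} for d in nodes if d != n}
            for n in nodes}

def route(Q, src, dest, hop_cost=1.0):
    """Forward one packet greedily from src to dest, updating Q along the way."""
    node, path = src, [src]
    while node != dest:
        # Greedy next hop: the neighbor with the lowest estimated delivery time.
        neighbor = min(Q[node][dest], key=Q[node][dest].get)
        # The receiver's own best remaining estimate (zero once at dest).
        remaining = 0.0 if neighbor == dest else min(Q[neighbor][dest].values())
        # Core Q-routing update: nudge the estimate toward hop cost + remaining.
        Q[node][dest][neighbor] += ALPHA * (
            hop_cost + remaining - Q[node][dest][neighbor])
        node = neighbor
        path.append(node)
    return path

adjacency = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
Q = init_tables(adjacency)
for _ in range(50):
    route(Q, "A", "D")
print(route(Q, "A", "D"))  # settles on a two-hop path, e.g. ['A', 'B', 'D']

Wrong detours keep inflating their estimates on every use, which is what lets the greedy rule escape transient routing loops without any global knowledge.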
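
The run-expectancy model in the lineup entry treats a half-inning of offense as an absorbing Markov chain over the 24 base-out states (8 base configurations x 3 out counts, with 3 outs absorbing); evaluating every ordered lineup of 9 players drawn from n this way is what yields the O(n^9) cost the paper avoids. The sketch below uses a deliberately crude, player-independent event model: the event set, probabilities, and advancement rules are invented for illustration, whereas the paper's model uses per-batter transition probabilities, which is what makes the batting order matter.

from itertools import product

# Illustrative, player-independent event probabilities (sum to 1).
P_OUT, P_WALK, P_SINGLE, P_HR = 0.68, 0.10, 0.17, 0.05

def apply_event(bases, event):
    """Return (new_bases, runs) for one plate appearance; bases = (b1, b2, b3)."""
    b1, b2, b3 = bases
    if event == "walk":    # runners advance only when forced
        runs = int(b1 and b2 and b3)
        return (1, 1 if b1 else b2, 1 if (b1 and b2) else b3), runs
    if event == "single":  # crude rule: every runner advances exactly one base
        return (1, b1, b2), b3
    if event == "hr":
        return (0, 0, 0), 1 + b1 + b2 + b3
    raise ValueError(event)

def expected_runs(tol=1e-10):
    """Value-iterate E[runs until the inning ends] over the 24 base-out states."""
    states = [(o, b) for o in range(3) for b in product((0, 1), repeat=3)]
    E = dict.fromkeys(states, 0.0)
    value = lambda o, b: 0.0 if o == 3 else E[(o, b)]   # 3 outs is absorbing
    while True:
        delta = 0.0
        for o, b in states:
            v = P_OUT * value(o + 1, b)                 # out: bases unchanged
            for p, ev in ((P_WALK, "walk"), (P_SINGLE, "single"), (P_HR, "hr")):
                nb, runs = apply_event(b, ev)
                v += p * (runs + value(o, nb))
            delta = max(delta, abs(v - E[(o, b)]))
            E[(o, b)] = v
        if delta < tol:
            return E

print(round(expected_runs()[(0, (0, 0, 0))], 3))  # E[runs], bases empty, 0 out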
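
The Arai & Ishigaki entries (JACIII 2009 and the 2008 proceedings) propose an information-theoretic measure of the degree of interaction among learning agents. Their exact formulation is not reproduced here; as one natural instantiation of "information sharing", the sketch below computes the mutual information between two agents' paired state or action samples, which is zero when the agents behave independently and grows as they become coupled.

from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X; Y) in bits, from paired samples of two agents' states or actions."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Two hunters moving in lock-step share one bit per step; independent movement
# shares none -- a proxy for their degree of coupling during learning.
print(mutual_information(list("LLRR"), list("LLRR")))  # 1.0
print(mutual_information(list("LLRR"), list("LRLR")))  # 0.0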
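
The reward-design entry (Tanaka & Arai, 2006) turns on when a continuing task like keepaway should pay out. The two contrasting candidate designs below are assumptions for illustration, not the paper's reward functions.

# Keepaway has no goal state, so "when to reward" is itself a design choice.
def reward_dense(ball_kept_this_step: bool) -> float:
    """Pay every time step of possession: a dense, frequent signal."""
    return 1.0 if ball_kept_this_step else 0.0

def reward_sparse(possession_lost: bool) -> float:
    """Penalize only the terminal failure event: a sparse, delayed signal."""
    return -1.0 if possession_lost else 0.0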
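
Profit-sharing, central to the Arai & Sycara (2000) entry and the crane entry (1998), reinforces every rule fired during an episode once the episodic reward arrives, instead of bootstrapping step by step like temporal-difference methods. A minimal sketch, assuming a geometrically decaying credit toward the start of the episode; the toy corridor task, decay rate, and epsilon-greedy exploration are illustrative choices, not the papers' setup.

import random
from collections import defaultdict

DECAY = 0.3  # credit decay per step back from the reward (illustrative)

def reinforce(weights, trace, reward):
    """Profit-sharing credit assignment: walk the episode trace backward from
    the reward, adding a geometrically shrinking credit to each fired rule."""
    credit = reward
    for state, action in reversed(trace):
        weights[(state, action)] += credit
        credit *= DECAY

def select_action(weights, state, actions, eps=0.1):
    if random.random() < eps:                      # occasional exploration
        return random.choice(actions)
    best = max(weights[(state, a)] for a in actions)
    return random.choice([a for a in actions if weights[(state, a)] == best])

# Toy corridor: states 0..4; reaching state 4 within 50 steps pays reward 1.
weights = defaultdict(float)
ACTIONS = (-1, +1)
for _ in range(300):
    state, trace = 0, []
    for _ in range(50):
        action = select_action(weights, state, ACTIONS)
        trace.append((state, action))
        state = min(max(state + action, 0), 4)
        if state == 4:
            reinforce(weights, trace, 1.0)
            break

# After learning, the +1 action dominates in every interior state.
print({s: round(weights[(s, +1)] - weights[(s, -1)], 2) for s in range(4)})

A geometrically decreasing credit with a sufficiently small ratio is a standard choice in the profit-sharing literature for keeping ineffective rules from being reinforced more than effective ones.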

MISC

 121

Books and Other Publications

 9

Presentations

 201

Research Projects (Joint Research, Competitive Funding, etc.)

 12

Industrial Property Rights

 1

Social Contribution Activities

 6