Journal of Advanced Computational Intelligence and Intelligent Informatics 28(2) 403-412, March 20, 2024
In recent years, inverse reinforcement learning has attracted attention as a method for estimating the intention behind actions from the trajectories of various acting agents, including human flow data. In the context of reinforcement learning, "intention" refers to a reward function. Conventional inverse reinforcement learning assumes that all trajectories are generated from policies learned under a single reward function. However, it is natural to assume that people in a human flow act according to multiple policies. In this study, we introduce an expectation-maximization algorithm into inverse reinforcement learning and propose a method to estimate multiple reward functions from human flow trajectories. The effectiveness of the proposed method was evaluated through a computer experiment based on human flow data collected from subjects around airport gates.
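The sketch below illustrates the general idea of combining expectation-maximization with IRL, not the paper's actual algorithm: trajectories are soft-assigned to clusters (E-step) and each cluster's linear reward weights are updated with a maximum-entropy-style gradient (M-step). The feature map, the trajectory-level likelihood surrogate, and the synthetic data are all illustrative assumptions.

```python
# Minimal EM-over-IRL sketch: cluster trajectories while fitting one linear
# reward per cluster. Everything here (features, data, likelihood model) is a
# stand-in, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_features, n_clusters = 20, 4, 2
phi = rng.normal(size=(n_states, n_features))        # state features (assumed)

def traj_features(traj):
    """Sum of state features along one trajectory."""
    return phi[traj].sum(axis=0)

# Synthetic trajectories: lists of state indices (stand-in for human-flow data).
trajs = [rng.integers(0, n_states, size=15) for _ in range(60)]
F = np.array([traj_features(t) for t in trajs])       # (n_trajs, n_features)

theta = rng.normal(size=(n_clusters, n_features))     # per-cluster reward weights
pi = np.full(n_clusters, 1.0 / n_clusters)            # cluster mixing weights

def log_lik(F, theta_k):
    # Trajectory-level MaxEnt surrogate: P(tau) ∝ exp(theta_k . F(tau)),
    # normalised over the observed trajectory set.
    scores = F @ theta_k
    m = scores.max()
    return scores - (m + np.log(np.exp(scores - m).sum()))

for _ in range(50):
    # E-step: responsibility of each cluster for each trajectory.
    log_r = np.stack([np.log(pi[k]) + log_lik(F, theta[k])
                      for k in range(n_clusters)], axis=1)
    log_r -= log_r.max(axis=1, keepdims=True)
    resp = np.exp(log_r)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: update mixing weights and take a gradient step on each theta_k.
    pi = resp.mean(axis=0)
    for k in range(n_clusters):
        w = resp[:, k]
        s = F @ theta[k]
        p = np.exp(s - s.max()); p /= p.sum()
        grad = (w[:, None] * F).sum(axis=0) / w.sum() - (p[:, None] * F).sum(axis=0)
        theta[k] += 0.1 * grad

print("estimated mixing weights:", pi)
```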
Journal of Advanced Computational Intelligence and Intelligent Informatics 28(2) 393-402, March 20, 2024
Sequential decision-making under multiple objective functions involves two problems: exhaustively searching for Pareto-optimal policies, and selecting a policy from the resulting set of Pareto-optimal policies based on the decision-maker's preferences. This paper focuses on the latter problem. Selecting a policy that reflects the decision-maker's preferences requires ordering these policies, which is problematic because the preferences are generally tacit knowledge and are difficult to quantify. For this reason, conventional methods have mainly elicited preferences through dialogue with decision-makers and through pairwise comparisons. In contrast, this paper proposes a method based on inverse reinforcement learning to estimate the weight of each objective from a decision-making sequence. The estimated weights can be used to quantitatively evaluate the Pareto-optimal policies from the viewpoint of the decision-maker's preferences. We applied the proposed method to a multi-objective reinforcement learning benchmark problem and verified its effectiveness as a method for eliciting the weight of each objective function.
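As a rough illustration of this kind of preference elicitation, the sketch below fits a weight vector over objectives so that an observed choice becomes the most preferred among a set of candidate Pareto-optimal policies, using a softmax choice model and gradient ascent. The candidate policies, their per-objective returns, and the choice model are assumptions for illustration, not the paper's exact algorithm.

```python
# Estimate a decision maker's objective weights from one observed policy choice,
# IRL-style, under a softmax choice model. All data here are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n_policies, n_objectives = 8, 3
# Per-objective returns of each Pareto-optimal candidate policy (synthetic).
V = rng.uniform(0.0, 1.0, size=(n_policies, n_objectives))
demo_idx = 2                  # index of the policy the decision maker actually followed

w = np.full(n_objectives, 1.0 / n_objectives)   # weights kept on the simplex
lr = 0.5
for _ in range(200):
    scores = V @ w
    p = np.exp(scores - scores.max()); p /= p.sum()
    # Gradient of log P(demo | w) under the softmax choice model.
    grad = V[demo_idx] - p @ V
    w = w + lr * grad
    w = np.clip(w, 1e-6, None); w /= w.sum()     # clip and renormalise onto the simplex

ranking = np.argsort(-(V @ w))
print("estimated weights:", np.round(w, 3))
print("policies ranked by estimated preference:", ranking)
```

The estimated weights can then be used to rank the remaining Pareto-optimal policies, which is the quantitative evaluation the abstract refers to.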
Proceedings of the Annual Conference of JSAI JSAI2021 3N3IS2e05, 2021
Recently, inverse reinforcement learning (IRL), which estimates the reward from an expert's trajectories, has been attracting attention for imitating complex behaviors and estimating intentions. This study proposes a novel deep inverse reinforcement learning method that combines LogReg-IRL, an IRL method based on linearly solvable Markov decision processes, and ALOCC, an adversarial one-class classification method. The proposed method can quickly learn rewards and state values without executing reinforcement learning or requiring comparison trajectories. Through computer experiments on the BipedalWalker task, we show that the proposed method obtains a more expert-like gait than LogReg-IRL.
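The sketch below shows only the density-ratio idea underlying LogReg-IRL: in a linearly solvable MDP, the log-ratio between expert and baseline state distributions recovers a reward-plus-value quantity, and it can be estimated with logistic regression. The Gaussian data and quadratic feature map are illustrative assumptions, and the paper's ALOCC-based variant (which removes the need for baseline trajectories) is not shown.

```python
# Density-ratio estimation with logistic regression, as in the LogReg-IRL idea.
# Synthetic 1-D states; not the paper's setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Expert visits concentrate near 1.0; the baseline distribution is diffuse.
expert_states = rng.normal(1.0, 0.3, size=(2000, 1))
baseline_states = rng.normal(0.0, 1.0, size=(2000, 1))

X = np.vstack([expert_states, baseline_states])
y = np.concatenate([np.ones(len(expert_states)), np.zeros(len(baseline_states))])

# Quadratic features so the log-odds can express a Gaussian log-density ratio.
feats = np.hstack([X, X ** 2])
clf = LogisticRegression().fit(feats, y)

def log_density_ratio(s):
    f = np.hstack([s, s ** 2])
    # With balanced classes, the logit approximates log P_expert(s) / P_baseline(s).
    return clf.decision_function(f)

probe = np.array([[-1.0], [0.0], [1.0]])
print("estimated log-ratio (reward-like score):", np.round(log_density_ratio(probe), 2))
```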
Proceedings of the Annual Conference of JSAI JSAI2021 4N2IS3b01, 2021
In recent years, the application field of drones has rapidly expanded, and they are attracting attention as a means of gathering information and providing relief in disaster areas. Currently, drones are mainly controlled manually or by programmed control with pre-defined actions. Manual control limits the flight range to the transmitter's communication range and requires constant monitoring of the drone in flight, making it difficult to operate the drone when access to the disaster area is difficult or visibility is poor. In addition, because programmed control fixes the drone's behavior in advance, it may be difficult to maintain stable control during emergency surveys when a disaster changes the flight environment. The purpose of this study is to verify the applicability of model-based control, which relies on a mathematical model of the environment, and model-free control, which obtains optimal control rules from interaction with the environment, to drone controllers, with the aim of realizing autonomous flight in environments that are difficult to handle with existing control methods. Specifically, we compared the performance of model predictive control (model-based) and reinforcement learning (model-free) through computer experiments, and verified that the characteristics of the two control systems differ depending on the nature of the disturbance.
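For intuition on the model-based side of such a comparison, the sketch below runs random-shooting model predictive control on a 1-D double integrator (a toy stand-in for drone altitude control) subject to an unmodelled wind disturbance. The dynamics, horizon, and disturbance are illustrative assumptions, not the paper's experimental setup; the model-free counterpart would be an RL policy trained on the same task.

```python
# Random-shooting MPC on a double integrator with an additive wind disturbance.
import numpy as np

rng = np.random.default_rng(3)
dt, horizon, n_samples = 0.1, 10, 256
target = 1.0                                    # desired altitude

def step(state, u, wind=0.0):
    """Double-integrator dynamics: state = (position, velocity)."""
    pos, vel = state
    vel = vel + (u + wind) * dt
    pos = pos + vel * dt
    return np.array([pos, vel])

def mpc_action(state):
    """Return the first action of the best sampled control sequence (nominal model)."""
    seqs = rng.uniform(-2.0, 2.0, size=(n_samples, horizon))
    costs = np.zeros(n_samples)
    for i, seq in enumerate(seqs):
        s = state.copy()
        for u in seq:
            s = step(s, u)                      # planner assumes no wind
            costs[i] += (s[0] - target) ** 2 + 0.01 * u ** 2
    return seqs[np.argmin(costs), 0]

state = np.array([0.0, 0.0])
for t in range(100):
    wind = 0.5 * np.sin(0.1 * t)                # unmodelled disturbance
    state = step(state, mpc_action(state), wind)
print("final altitude:", round(state[0], 3))
```

How far the closed-loop behavior degrades as the disturbance grows or changes character is exactly the kind of property the study compares against a model-free controller.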
Journal of Information Processing 22(2), 2014 (DOI: 10.2197/ipsjjip.22.299)
This study is intended to encourage appropriate social norms among multiple agents. Effective norms, such as those emerging from sustained individual interactions over time, can make agents act cooperatively to optimize their performance. We introduce a "social learning" model in which agents mutually interact under the framework of a coordination game. Because coordination games have dual equilibria, social norms are necessary to make agents converge to a unique equilibrium. In this paper, we present the emergence of an appropriate social norm through inverse reinforcement learning, an approach for extracting a reward function from observations of optimal behavior. First, a mediator agent estimates the reward function by inverse reinforcement learning from observations of a master's behavior. Second, we introduce agents that act according to the estimated reward function into a multiagent world in which most agents, called citizens, have no established way to act. Finally, we evaluate the effectiveness of introducing inverse reinforcement learning.
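The sketch below gives a feel for the setting: a two-action coordination game with two pure equilibria, a population of Q-learning "citizens", and a few agents fixed to the action a mediator would extract from the master's behaviour via IRL. The payoff matrix, learning rule, and population sizes are illustrative assumptions rather than the paper's model.

```python
# Coordination game with dual equilibria: norm-carrying agents nudge Q-learning
# citizens toward a single equilibrium. Illustrative parameters throughout.
import numpy as np

rng = np.random.default_rng(4)
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])                  # both (0,0) and (1,1) are equilibria
n_citizens, n_norm_agents, n_rounds = 50, 5, 3000
alpha, eps = 0.1, 0.1
norm_action = 1                                   # action inferred from the master

Q = np.zeros((n_citizens, 2))                     # per-citizen action values

def choose(q):
    return rng.integers(2) if rng.random() < eps else int(np.argmax(q))

for _ in range(n_rounds):
    i = rng.integers(n_citizens)
    # Partner is either another citizen or a norm-carrying agent.
    if rng.random() < n_norm_agents / (n_citizens + n_norm_agents):
        a_i, a_j = choose(Q[i]), norm_action
    else:
        j = rng.integers(n_citizens)
        a_i, a_j = choose(Q[i]), choose(Q[j])
        Q[j, a_j] += alpha * (PAYOFF[a_j, a_i] - Q[j, a_j])
    Q[i, a_i] += alpha * (PAYOFF[a_i, a_j] - Q[i, a_i])

share = np.mean(np.argmax(Q, axis=1) == norm_action)
print(f"citizens converged to the norm action: {share:.0%}")
```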