
BMe Research Grant 

In urban transportation systems, the existing traffic infrastructure and control systems cannot meet the growing traffic demand, resulting in frequent traffic congestion. Due to limited funds and resources, the cost-effective implementation of intelligent self-optimizing systems and low-cost controllers in a real transport system is preferred to infrastructure reconstruction. The most common traffic light control strategy in the real world is the constant strategy, i.e., the time intervals of the traffic lights are fixed and periodic, resulting in inefficient traffic flow control. Therefore, a self-adaptive traffic signal control system is developed that offers a more efficient solution for traffic management and generates flexible strategies to control traffic flow by collecting and analyzing real-time data from the traffic environment.
The research work takes place in a laboratory room provided by the Department of Control Engineering and Information Technology (IIT) of BUTE. Our department aims to keep embracing and contributing to the latest advances in its domains of competence, including the newest technologies related to the application of GPGPUs and artificial intelligence in visualization and image processing, the development of GRID and cloud-based services, Industry 4.0 technologies, and collaborative robotics, to name a few.
The integration of artificial intelligence theories into the control of traffic lights is quite popular in current self-adaptive traffic light control systems. The traffic signal timing plan or the traffic management system can be rescheduled by fuzzy logic rules based only on local information [1]. Genetic algorithms, Ant Colony Optimization, and Particle Swarm Optimization are widely used to mimic biological social behavior [2]. The problem of traffic optimization can be transformed into a problem of game theory, where inbound links are considered players and the control of the traffic lights is the decision of these players [3].
The machine learning-based self-adaptive traffic signal control system has self-learning capabilities and high-efficiency computing capabilities in traffic control problems. It learns the traffic knowledge from the empirical information in the traffic environment to make optimal decisions to control traffic. Reinforcement learning (RL) methods are suitable for implementation in such a complex environment of traffic transportation systems [4]. Earlier work presented a simple case for isolated traffic signal control, which involves the application of RL to demonstrate efficiency [5]. Multi-agent reinforcement learning (MARL) is an extension of RL, where each traffic light can be constructed as an agent and light control is considered the actions taken by the agents. Multiple agents learn the policies to determine the optimal course of action to maximize rewards for themselves in each state [6].
Our goal is to model a single traffic intersection and find the best multi-path plan for traffic flow by combining game theory and RL in a multi-agent system. By analyzing dynamic and simulated traffic data, the system adaptively controls traffic flow at all stages, automatically managing the operation of the traffic lights rather than adjusting their timing plans. Thus, to improve the traffic situation, we proposed a semi-cooperative Nash Q-learning that can learn about and evaluate the advantages and disadvantages of the change in the state of the environment after the agents have taken the selected joint action. It updates the Q-function based on the Nash equilibrium of the current Q-values and selects a Nash equilibrium solution with cooperative behaviour due to the informational non-uniqueness in the learning process. However, the distribution of traffic between traffic sections is generally unbalanced, which sometimes leads to inefficient control. To fill this gap, we evaluate and examine an extended version, a semi-cooperative Stackelberg Q-learning, which replaces the Nash equilibrium solution with the Stackelberg equilibrium in updating the Q-function. More specifically, the Stackelberg equilibrium has two hierarchical levels of equilibrium solutions. The agent with the longest queue is the leader and the other three agents are followers, and the leader has the highest priority in the decision-making process.
Model formulation
Fig. 1 shows a common structure of an intersection with four inbound links (i.e., ...) where vehicles arrive from outside the queues and four outbound links (i.e., ...) through which the vehicles leave the intersection. In this case, the red and green states of each traffic light can be considered the controllable signal, coded as red: 0 and green: 1.
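As a rough illustration of this model, the intersection can be sketched as a small simulation object with one queue and one binary light per inbound link. The queue dynamics, arrival probability, and discharge rate below are illustrative assumptions, not the parameters used in the study:

```python
import random

class Intersection:
    """Toy model of the four-link intersection: one queue and one
    binary traffic light (red: 0, green: 1) per inbound link."""

    def __init__(self, n_links=4):
        self.queues = [0] * n_links   # vehicles waiting on each inbound link
        self.lights = [0] * n_links   # current light state per link

    def step(self, actions, arrival_prob=0.5, service=2):
        """Apply a joint light action, admit random arrivals from outside
        the queues, and discharge vehicles on green links."""
        self.lights = list(actions)
        for i in range(len(self.queues)):
            # a vehicle may arrive from outside the queue (assumed Bernoulli)
            self.queues[i] += int(random.random() < arrival_prob)
            if self.lights[i] == 1:   # green: up to `service` vehicles leave
                self.queues[i] = max(0, self.queues[i] - service)
        return tuple(self.queues)
```

The tuple of queue lengths returned by `step` plays the role of the state observed by the agents, and the joint action is the vector of light codes.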
Game theoretical framework
This system can be modelled as a non-cooperative non-zero-sum game: the inbound traffic links are treated as players, and the status of the signal light (green or red) is considered the decision. A Nash equilibrium (the prisoner's dilemma being the classic example) is usually feasible for achieving a rational balance for all players. Each player strives for an outcome that provides him with the lowest possible cost, and no player can improve his performance by unilaterally changing his decision, which can be expressed as in Eq. (1):

J_i(a_i*, a_-i*) ≤ J_i(a_i, a_-i*),  ∀ a_i,  i = 1, …, n,   (1)

where J_i is the cost of player i, a_i* is his equilibrium decision, and a_-i* denotes the equilibrium decisions of all the other players.
Compared with the Nash equilibrium, the Stackelberg equilibrium has hierarchical levels among the players. The leader imposes his strategy on the followers, and the followers then make their optimal joint decisions based on the decision made by the leader.
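The two solution concepts can be contrasted on a small finite cost game by brute-force search. This is a minimal two-player sketch with assumed cost matrices (the paper's game has four players, one per inbound link):

```python
import itertools

def nash_equilibria(cost1, cost2):
    """Pure-strategy Nash equilibria of a two-player cost game:
    no player can lower their own cost by unilaterally deviating."""
    n, m = len(cost1), len(cost1[0])
    eq = []
    for i, j in itertools.product(range(n), range(m)):
        best1 = all(cost1[i][j] <= cost1[k][j] for k in range(n))
        best2 = all(cost2[i][j] <= cost2[i][l] for l in range(m))
        if best1 and best2:
            eq.append((i, j))
    return eq

def stackelberg(cost1, cost2):
    """Leader (player 1) commits to a row first; the follower then
    best-responds; the leader picks the row minimizing his own cost."""
    n, m = len(cost1), len(cost1[0])
    best = None
    for i in range(n):
        j = min(range(m), key=lambda l: cost2[i][l])  # follower's best response
        if best is None or cost1[i][j] < cost1[best[0]][best[1]]:
            best = (i, j)
    return best
```

For a prisoner's-dilemma-style cost pair such as `cost1 = [[1, 3], [0, 2]]`, `cost2 = [[1, 0], [3, 2]]`, both functions pick the mutual-defection outcome `(1, 1)`; the two concepts diverge on games where commitment gives the leader an advantage.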
Fig. 1. General structure of the intersection
Q-learning
In RL, agents (players) learn a policy mapping each state s to an action a, and the Q-values are updated according to Eq. (2):

Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ],   (2)

where α is the learning rate, which determines the extent to which newly acquired information overwrites old information, and the discount factor γ determines the importance of future rewards.
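The single-agent update of Eq. (2) fits in a few lines of tabular code; the state encoding (a tuple of queue lengths) and the reward here are illustrative assumptions:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)        # unseen (state, action) pairs default to 0
actions = [0, 1]              # red / green for a single light
# example step: reward is the (assumed) negative total queue length
q_update(Q, s=(2, 0), a=1, r=-2.0, s_next=(1, 0), actions=actions)
```

With all Q-values initially zero, this single step moves Q((2, 0), 1) to α·r = −0.2, illustrating how negative queue-based rewards push the table away from congesting actions.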
Semi-cooperative Nash/Stackelberg Q-learning
Semi-cooperative Nash/Stackelberg Q-learning is an algorithm that combines reinforcement learning and game theory in a multi-agent system, where the maximum operator is replaced by an equilibrium solution to update the Q-values. The Q-values are updated according to Eq. (3):

Q^i(s, a^1, …, a^n) ← (1 − α) Q^i(s, a^1, …, a^n) + α [ r^i + γ EqQ^i(s') ],   (3)

where Q^i(s, a^1, …, a^n) is the Q-value of the i-th agent when all agents jointly take the actions (a^1, …, a^n) in state s, and EqQ^i(s') is agent i's payoff at the Nash (or Stackelberg) equilibrium of the stage game formed by the agents' current Q-values in the next state s'.
In the experiment, the parameter settings of semi-cooperative Nash/Stackelberg Q-learning are examined, including the learning rate α and the discount factor γ. The constant strategy (i.e., the time intervals of the green and red lights are fixed and periodic) commonly used in practice and the semi-cooperative Nash/Stackelberg Q-learning methods are implemented and compared.
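The per-agent update of Eq. (3) can be sketched as follows. Computing the Nash or Stackelberg solution of the stage game is omitted here; the equilibrium value is passed in as a precomputed number, and all names and values are illustrative assumptions:

```python
from collections import defaultdict

def multiagent_q_update(Q_i, s, joint_a, r_i, eq_value_next,
                        alpha=0.1, gamma=0.9):
    """Update agent i's table under Eq. (3):
    Q^i(s, a^1..a^n) <- (1-alpha)*Q^i(s, a^1..a^n)
                        + alpha*(r^i + gamma * EqQ^i(s')).
    eq_value_next stands in for the Nash/Stackelberg equilibrium payoff
    of agent i in the stage game at the next state."""
    key = (s, joint_a)
    Q_i[key] = (1 - alpha) * Q_i[key] + alpha * (r_i + gamma * eq_value_next)
    return Q_i[key]

# one update for agent i: negative reward (long queues) and an assumed
# equilibrium value of -1.0 already computed for the next state
Q = defaultdict(float)
multiagent_q_update(Q, s=0, joint_a=(1, 0, 1, 0), r_i=-3.0,
                    eq_value_next=-1.0)
```

The only structural difference from single-agent Q-learning is that the table is indexed by the joint action of all agents and the bootstrap term uses an equilibrium payoff instead of a per-agent maximum.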
Fig. 2. Q-values of all agents with the joint action that corresponds to the current state

Fig. 3. Q-values of all agents with the joint action that corresponds to the current state
Fig. 2 shows the Q-values of all the agents corresponding to the different joint actions and the current state, illustrated over one episode (i.e., 50,000 iterations).
Fig. 4. Comparison of vehicles waiting at the traffic light "Queues" in 20 time slices (300 seconds)
Fig. 5. Comparison of vehicles passing the intersection in 20 time slices (300 seconds)
Figs. 4 and 5 show the changing trend of the vehicles waiting at the traffic light ("Queues") and the vehicles passing through the intersection for semi-cooperative Nash/Stackelberg Q-learning and the constant strategy. In Fig. 4, the queue length increases over time due to the vehicles arriving from outside the queues, and in Fig. 5 the number of passing vehicles tends to be periodic. The total numbers of vehicles passing through in 20 time slices for semi-cooperative Nash Q-learning, semi-cooperative Stackelberg Q-learning, and the constant strategy are 2425, 2655, and 2343, respectively.
Compared with the constant strategy, the semi-cooperative Nash and Stackelberg Q-learning achieved improvements of approximately 3.50% and 13.32%, respectively, in terms of queues and vehicles in transit. The semi-cooperative Stackelberg Q-learning algorithm may thus perform better than the semi-cooperative Nash Q-learning algorithm. Both algorithms are flexible and adaptive enough to be implemented in reality to control traffic lights even when the communication of information between agents is limited.
Future work will model a global network of intersections. Additionally, more conditions and constraints will be added to make the simulation more realistic.
List of corresponding own publications.
[S1] J. Guo, I. Harmati. Optimization of Traffic Signal Control Based on Game Theoretical Framework. Proceedings of the Workshop on the Advances of Information Technology, Jan. 24, 2019, Budapest, Hungary, pp. 105–110.
[S2] J. Guo, I. Harmati. Traffic Signal Control Based on Game Theoretical Framework, The 22nd International Conference on Process Control (PC19), IEEE, June 11–14, 2019, Strbske Pleso, Slovakia, pp. 286–291.
[S3] J. Guo, I. Harmati. Optimization of traffic signal control based on game theoretical framework, The 24th International Conference on Methods and Models in Automation and Robotics (MMAR), IEEE, August 26–29, 2019, Międzyzdroje, Poland, pp. 354–359.
[S4] J. Guo, I. Harmati. Optimization of traffic signal control with different game theoretical strategies, The 23rd International Conference on System Theory, Control and Computing (ICSTCC), IEEE, October 9–11, 2019, Sinaia, Romania, pp. 750–755.
[S5] J. Guo, I. Harmati. Reinforcement Learning for Traffic Signal Control in Decision Combination. Proceedings of the Workshop on the Advances of Information Technology, Jan. 30, 2020, Budapest, Hungary, pp. 13–20.
[S6] J. Guo, I. Harmati. Comparison of Game Theoretical Strategy and Reinforcement Learning in Traffic Light Control. Periodica Polytechnica Transportation Engineering, 2020, 48(4), pp. 313–319.
[S7] J. Guo, I. Harmati. Evaluating multiagent Qlearning between Nash and Stackelberg equilibrium for traffic routes plan in a single intersection, Control Engineering Practice, 2020, 102, p. 104525.
[S8] H. He, J. Guo, K. Molnar. Triboelectric respiration monitoring sensor and a face mask comprising such as a triboelectric respiration monitoring sensor. Patent application, Filed, 2021, Hungary, P2100102.
[S9] J. Guo, I. Harmati. Traffic lane-changing modeling and scheduling with game theoretical strategy, The 25th International Conference on Methods and Models in Automation and Robotics (MMAR), IEEE, August 23–26, 2021, Międzyzdroje, Poland, pp. 197–202.
[S10] S. Taik, J. Guo, B. Kiss, and I. Harmati. Demand Response of Multiple Households with Coordinated Distributed Energy Resources, The 25th International Conference on Methods and Models in Automation and Robotics (MMAR), IEEE, August 23–26, 2021, Międzyzdroje, Poland, pp. 203–208.
[S11] H. He, J. Guo, B. Illés, A. Géczy, B. Istók, V. Hliva, D. Török, J.G. Kovács, I. Harmati and K. Molnár. Monitoring multi-respiratory indices via a smart nanofibrous mask filter based on a triboelectric nanogenerator, Nano Energy, 2021, 89, p. 106418.
[S12] J. Guo, I. Harmati. Lane-changing decision modelling in congested traffic with a game theory-based decomposition algorithm, Engineering Applications of Artificial Intelligence, 2022, 107, p. 104530.
List of references.
[1] Chiu S. Adaptive traffic signal control using fuzzy logic. In Proceedings of the Intelligent Vehicles '92 Symposium, 1992 Jun 29 (pp. 98–107). IEEE.
[2] Wang Y, Yang X, Liang H, Liu Y. A review of the self-adaptive traffic signal control system based on future traffic environment. Journal of Advanced Transportation. 2018 Jun 27;2018.
[3] Guo J, Harmati I. Optimization of Traffic Signal Control with Different Game Theoretical Strategies. In 2019 23rd International Conference on System Theory, Control and Computing (ICSTCC) 2019 Oct 9 (pp. 750–755). IEEE.
[4] Sutton RS, Barto AG. Introduction to Reinforcement Learning. Cambridge: MIT Press; 1998 Mar 1.
[5] Abdulhai B, Pringle R, Karakoulas GJ. Reinforcement learning for true adaptive traffic signal control. Journal of Transportation Engineering. 2003 May;129(3):278–85.
[6] Junchen J, Xiaoliang M. A learning-based adaptive signal control system with function approximation. IFAC-PapersOnLine. 2016 Jan 1;49(3):5–10.