
Guiding from A* to MARL, starting with the MAPF problem (Part 1)


Mastering Artificial Intelligence from basic principles to the more advanced MARL, starting with the problem of Multi-Agent Path Finding (MAPF) in Part 1.


In this series of articles, we embark on a journey from the fundamental problem of pathfinding to the complex realm of Multi-Agent Path Finding (MAPF). We start by delving into the classical pathfinding problem, where the focus is on finding the shortest path between two points for a single agent, using the well-known A* algorithm.
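Before moving to multiple agents, it helps to see single-agent A* concretely. The following is a minimal sketch of A* on a 4-connected grid with a Manhattan-distance heuristic; the function name `astar` and the grid representation are illustrative choices, not from the original article.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; grid[r][c] == 1 means blocked.
    Returns the list of cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible on a grid with unit-cost moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f, g, cell)
    came_from = {}
    best_g = {start: 0}
    while open_heap:
        f, g, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came_from:  # walk back through parent links
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if g > best_g.get(cur, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0):
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None
```

Because the heuristic is admissible, the first time the goal is popped from the heap its path is guaranteed shortest.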

As we progress, we gradually drop assumptions, leading us to the challenge of MAPF: finding collision-free shortest paths for all agents to their goals. This problem arises in many real-world applications, such as autonomous vehicles, warehouse robotics, and video games.

Two notable solvers for MAPF are Increasing Cost Tree Search (ICTS) and Conflict Based Search (CBS). ICTS is a two-level solver. The high-level solver searches the Increasing Cost Tree (ICT), where each node contains a vector of cost limits, one per agent path. The low-level solver checks whether a collision-free joint solution exists when each agent is planned separately under the cost limit set by the high-level solver. If the low-level check fails, the high-level solver expands the ICT node, generating children in which a single agent's cost limit is increased by one.
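The high-level ICT search can be sketched as a breadth-first enumeration of cost vectors. In this minimal sketch, `optimal_costs` is assumed to hold each agent's individual optimal path cost (which single-agent A* would provide); the generator name `ict_nodes` is an illustrative choice.

```python
from collections import deque

def ict_nodes(optimal_costs):
    """Yield ICT cost vectors in breadth-first order.
    The root is the vector of individual optimal costs; each child
    increases exactly one agent's cost limit by one."""
    root = tuple(optimal_costs)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node  # a real ICTS would stop when the low-level check succeeds
        for i in range(len(node)):
            child = node[:i] + (node[i] + 1,) + node[i + 1:]
            if child not in seen:
                seen.add(child)
                queue.append(child)
```

Searching the tree breadth-first guarantees that the first cost vector for which a valid joint solution exists has minimal total cost.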

CBS, on the other hand, is another two-level solver that searches in the conflict space. It associates agents with constraints, where a constraint forbids a specific agent from occupying a specific location at a specific time step. The low-level solver in CBS finds a shortest path for a single agent that satisfies its set of constraints, while the high-level solver searches the constraint tree (CT): when two agents' paths conflict, it splits the CT node into two children, each adding a constraint that forbids one of the agents from the conflicting location at that time step.
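The core of the CBS high-level loop is detecting the first conflict between agent paths. Below is a minimal sketch of that check, covering vertex conflicts (two agents in one cell) and edge conflicts (two agents swapping cells); the function name `first_conflict` and the convention that agents wait at their goal are illustrative assumptions.

```python
def first_conflict(paths):
    """Return the first vertex or edge conflict between any two agent
    paths (each a list of cells indexed by time step), or None.
    Agents are assumed to wait at their goal after finishing."""
    horizon = max(len(p) for p in paths)

    def at(p, t):
        return p[min(t, len(p) - 1)]  # stay at goal once the path ends

    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                if at(paths[i], t) == at(paths[j], t):
                    return ("vertex", i, j, at(paths[i], t), t)
                if (t > 0 and at(paths[i], t) == at(paths[j], t - 1)
                        and at(paths[j], t) == at(paths[i], t - 1)):
                    # agents i and j swap cells between t-1 and t
                    return ("edge", i, j,
                            (at(paths[i], t - 1), at(paths[i], t)), t)
    return None
```

A CBS high-level node would call this on its agents' current paths; a returned conflict drives the split into two child nodes, and a `None` result means the node holds a valid solution.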

The exponential search space of MAPF poses a significant challenge: single-agent A* runs in time polynomial in the size of the graph, but for MAPF with k agents, the joint branching factor is b^k, making a trivial generalization of A* to MAPF infeasible. To tackle this issue, various algorithms have been proposed, such as Independence Detection (ID), Operator Decomposition (OD), and Enhanced Partial Expansion A* (EPEA*).
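A quick back-of-the-envelope calculation shows how fast the joint branching factor blows up. Assuming b = 5 single-agent actions on a 4-connected grid (four moves plus wait), which is an illustrative choice:

```python
# Joint-state branching factor for k agents planned together,
# assuming b = 5 single-agent actions (4 moves + wait) on a grid.
b = 5
for k in (1, 2, 5, 10):
    print(f"k={k:2d}: b**k = {b ** k}")
```

Even at k = 10 the joint branching factor is nearly ten million, which is why ID, OD, and EPEA* all aim to avoid expanding the full joint action set.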

In addition to these traditional methods, research on Reinforcement Learning (RL) and Multi-Agent RL (MARL) algorithms has advanced significantly over the past decade, driven in part by the rise of deep learning. The foundations of RL can be thought of as a planning problem reformulated as a learning system, with roots in AI planning theory.

Throughout this series, we will focus on the "multi-agent systems path" from A* to MARL, covering optimal multi-agent pathfinding, classical planning, planning under uncertainty, planning with partial observability, and RL. We will also discuss various MAPF algorithms, their properties, and the factors to consider when choosing the most suitable algorithm for a MAPF problem.

Stay tuned as we continue to explore the fascinating world of multi-agent pathfinding!
