Dynamic 3D Load Optimization Using AI and Heuristic Integration in Smart Logistics

3D Bin Packing, Reinforcement Learning, Hybrid Optimization, Proximal Policy Optimization (PPO), Logistics and Load Efficiency, Process Innovation

Efficient 3D bin packing remains a significant challenge in logistics, supply chain management, and warehouse automation, where the objective is to maximize space utilization and maintain load stability while minimizing computational time. Traditional heuristics such as First Fit and Best Fit have long been used for their simplicity and speed, but they often fall short of optimal results in dynamic and complex packing environments. To address these limitations, recent work has explored metaheuristics such as Genetic Algorithms (GAs) and, more recently, Reinforcement Learning (RL), particularly Proximal Policy Optimization (PPO), to improve decision-making under constraints. This study proposes a hybrid bin packing solution that combines the strengths of PPO-based reinforcement learning with traditional heuristic strategies to intelligently select item placements in a simulated 3D packing environment. The system was tested using four container sizes and a standardized set of boxes with volume and weight constraints. Four algorithms (First Fit, Best Fit, Genetic Algorithm, and the proposed Hybrid PPO model) were evaluated on consistent metrics: packing time, placement success rate, space utilization, total weight, access efficiency, and stability score. The experimental results show that while First Fit achieves the fastest packing time (13.269 s), it delivers lower placement success (48.4%) and access efficiency (0.60). The Genetic Algorithm achieves high placement rates (52.4–100%) and strong packing performance, but at a significantly higher computational cost (92.124 s). The Hybrid PPO algorithm demonstrates the most balanced performance, achieving a 100% placement success rate in the smallest container and over 72.4% in the largest, while maintaining a reasonable packing time (35.712 s), high access efficiency (up to 0.95), and superior stability scores (up to 0.80).
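To make the First Fit baseline concrete, the placement loop can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation: the `Container` class, the `first_fit` function, and the volume/weight-only feasibility check are assumptions, and geometric positioning inside the container is abstracted away.

```python
from dataclasses import dataclass

@dataclass
class Container:
    volume_cap: float      # usable volume (e.g. cm^3)
    weight_cap: float      # maximum load weight (e.g. kg)
    used_volume: float = 0.0
    used_weight: float = 0.0

    def fits(self, vol: float, wt: float) -> bool:
        # Feasibility check reduced to aggregate volume and weight limits.
        return (self.used_volume + vol <= self.volume_cap
                and self.used_weight + wt <= self.weight_cap)

    def place(self, vol: float, wt: float) -> None:
        self.used_volume += vol
        self.used_weight += wt

def first_fit(items, containers):
    """Assign each (volume, weight) item to the first feasible container.

    Returns the number of successfully placed items; items that fit
    nowhere are counted as placement failures.
    """
    placed = 0
    for vol, wt in items:
        for c in containers:
            if c.fits(vol, wt):
                c.place(vol, wt)
                placed += 1
                break
    return placed
```

Best Fit differs only in the inner loop: instead of stopping at the first feasible container, it selects the feasible container with the least remaining capacity.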
The Hybrid PPO model outperforms traditional methods and standalone GAs by combining intelligent learning with domain-specific heuristics. This positions the hybrid approach as a promising and scalable solution for real-world logistics environments demanding both efficiency and adaptability in 3D load optimization.
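The division of labor in such a hybrid can be sketched as follows: heuristics enumerate feasible candidate placements, and a learned policy picks among them. This is a hedged illustration, not the paper's architecture; `hybrid_place`, `feasible`, and `policy_score` are invented names, and the hand-written tightest-fit scorer merely stands in for the trained PPO policy that would rank the heuristic-generated candidates.

```python
def feasible(item, bin_state):
    """Check whether (volume, weight) item fits the bin's remaining capacity."""
    vol, wt = item
    rem_vol, rem_wt = bin_state
    return vol <= rem_vol and wt <= rem_wt

def policy_score(item, bin_state):
    # Placeholder for the PPO policy's learned value estimate:
    # here it simply prefers the tightest volume fit (Best-Fit-like).
    return -(bin_state[0] - item[0])

def hybrid_place(items, bins):
    """bins: list of [remaining_volume, remaining_weight], mutated in place.

    Heuristic step: filter bins down to feasible candidates.
    Policy step: pick the candidate the scorer ranks highest.
    Returns the number of successfully placed items.
    """
    placed = 0
    for item in items:
        candidates = [b for b in bins if feasible(item, b)]
        if not candidates:
            continue  # placement failure for this item
        best = max(candidates, key=lambda b: policy_score(item, b))
        best[0] -= item[0]
        best[1] -= item[1]
        placed += 1
    return placed
```

Constraining the action space to heuristic-feasible candidates is what keeps the learned component tractable: the policy never has to learn basic feasibility, only the ranking among valid placements.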