With the rise of Industry 4.0, manufacturing is shifting towards customization and flexibility, presenting new challenges in meeting rapidly evolving market and customer needs. To address these challenges, this paper proposes a novel reinforcement learning (RL) approach to the flexible job shop scheduling problem (FJSP). The method employs an actor-critic architecture that combines value-based and policy-based approaches: the actor generates deterministic policies, while the critic evaluates them and guides the actor towards an optimal policy. To construct the Markov decision process, a comprehensive feature set is used to represent the system state, and eight action sets are designed, inspired by traditional scheduling rules. The reward formulation indirectly measures the effectiveness of actions, promoting strategies that minimize job completion times and improve adherence to scheduling constraints. The proposed framework is evaluated through simulations on standard FJSP benchmarks and compared against several well-known heuristic scheduling rules, related RL algorithms, and intelligent algorithms. The results indicate that the proposed method consistently outperforms traditional approaches and exhibits strong adaptability and efficiency, particularly on large-scale datasets.
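
To illustrate the actor-critic formulation summarized above, the following is a minimal sketch of a one-step advantage actor-critic update over an eight-action dispatching-rule space; it is not the paper's implementation, and the feature dimension, network sizes, and hyperparameters are assumptions made purely for illustration.

```python
# Hypothetical actor-critic sketch for a dispatching-rule action space.
# STATE_DIM, the network sizes, and gamma are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM = 16   # assumed size of the scheduling-state feature vector
N_ACTIONS = 8    # one discrete action per composite dispatching rule

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU())
        self.actor = nn.Linear(64, N_ACTIONS)   # policy head (action logits)
        self.critic = nn.Linear(64, 1)          # value head (state value)

    def forward(self, state):
        h = self.shared(state)
        return self.actor(h), self.critic(h)

def update(model, optimizer, state, action, reward, next_state, done, gamma=0.99):
    """One-step advantage actor-critic update from a single transition."""
    logits, value = model(state)
    with torch.no_grad():
        _, next_value = model(next_state)
        target = reward + gamma * next_value * (1.0 - done)   # TD target
    advantage = (target - value).detach()
    log_prob = F.log_softmax(logits, dim=-1)[action]
    actor_loss = -log_prob * advantage        # policy gradient with baseline
    critic_loss = F.mse_loss(value, target)   # value regression toward TD target
    loss = actor_loss + critic_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```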