Power plant start-up scheduling aims to minimize start-up time while limiting maximum turbine-rotor stresses. A shorter start-up time not only reduces fuel and electricity consumption during the start-up process but also improves the plant's ability to adapt to changes in electricity demand. In addition, on-line start-up scheduling would increase the flexibility of power plant operation. The start-up scheduling problem can be formulated as a constrained combinatorial optimization problem. This problem, however, has many local optima in a large, high-dimensional search space. We have shown that the optimal schedule lies near the boundary of the feasible region. To achieve an efficient and robust search model, we have proposed an enforcement operator that focuses the search along this boundary, together with other local search strategies, such as a reuse function and tabu search, used in combination with Genetic Algorithms (GA). To increase search efficiency further and to meet on-line performance requirements, we have also proposed integrating GA with reinforcement learning. During the search process, GA guides reinforcement learning toward promising areas and, in return, reinforcement learning generates good schedules in the early stages of the search. We have confirmed that, after learning representative optimal schedules, the search performance virtually satisfies the goal of this research: finding optimal or near-optimal schedules within 30 seconds. To apply reinforcement learning in industrial settings, the design of the reward strategy is crucial. Based on our analysis, we have shown that (a) positive rewards succeed with both low- and high-dimensional reinforcement-learning output, and (b) negative rewards succeed only with low-dimensional output. In this paper, we present the proposed model together with the analysis and test results.
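The core idea of searching along the feasibility boundary can be illustrated with a minimal sketch: a GA whose enforcement operator clips infeasible ramp rates back onto the stress limit rather than discarding the candidate. The stage count, stress model, and all constants below are illustrative toy values, not taken from the plant model in the paper, and the RL seeding component is omitted.

```python
import random

STAGES = 5             # toy number of start-up stages (illustrative)
MAX_RATE = 10.0        # hypothetical upper bound on a stage's ramp rate
STRESS_LIMIT = 7.0     # max allowable rotor stress (toy units)
STRESS_PER_RATE = 1.0  # toy linear stress-vs-ramp-rate model

def start_up_time(schedule):
    # faster ramping in every stage shortens the start-up
    return sum(100.0 / r for r in schedule)

def enforce(schedule):
    # enforcement operator: pull infeasible genes back onto the
    # feasibility boundary instead of rejecting the whole candidate,
    # so the search concentrates near the constraint surface
    return [min(r, STRESS_LIMIT / STRESS_PER_RATE) for r in schedule]

def fitness(schedule):
    return -start_up_time(schedule)   # shorter start-up = higher fitness

def ga_search(pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(1.0, MAX_RATE) for _ in range(STAGES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop = [enforce(s) for s in pop]
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, STAGES)       # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:               # mutation
                i = rng.randrange(STAGES)
                child[i] = rng.uniform(1.0, MAX_RATE)
            children.append(child)
        pop = parents + children
    return max((enforce(s) for s in pop), key=fitness)
```

In a run of this sketch, the surviving ramp rates drift toward the stress limit, mirroring the observation that optimal schedules lie near the feasibility boundary; in the full model, reinforcement learning would additionally seed the population with good schedules early in the search.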