A method and an apparatus for baseball strategy planning based on reinforcement learning are provided. The method includes the following steps: collecting history data of multiple innings from past games of a team; defining multiple game states, multiple offensive and defensive actions, and multiple rewards corresponding to multiple offensive and defensive results, based on the offensive and defensive processes that occur during a game, and using these definitions to establish a Q table; updating the Q table according to multiple combinations of game state, offensive and defensive action, and offensive and defensive result recorded in the history data; and, according to a current game state, sorting the Q values recorded in the updated Q table for all offensive and defensive actions executable under the current game state, and recommending, according to the sorting result, the offensive and defensive action suitable for execution under the current game state.
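
The steps summarized above can be illustrated with a minimal sketch. It is not the claimed implementation: the state encoding, action names, reward values, learning rate `ALPHA`, discount factor `GAMMA`, and the helper names `update_q_table` and `recommend` are all assumptions made for illustration, and the update rule shown is the standard tabular Q-learning update, which the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical reward per offensive/defensive result (values are illustrative).
REWARDS = {"run_scored": 1.0, "runner_advanced": 0.3, "out": -0.5}

ALPHA = 0.5  # learning rate (assumed)
GAMMA = 0.9  # discount factor (assumed)


def update_q_table(q_table, actions_by_state, history):
    """Apply one standard Q-learning update per recorded play."""
    for state, action, result, next_state in history:
        reward = REWARDS[result]
        # A next_state of None (e.g. end of the recorded sequence) has no actions.
        next_actions = actions_by_state.get(next_state, [])
        best_next = max(
            (q_table[(next_state, a)] for a in next_actions), default=0.0
        )
        old = q_table[(state, action)]
        q_table[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
    return q_table


def recommend(q_table, actions_by_state, state):
    """Sort the actions executable in `state` by Q value, highest first."""
    actions = actions_by_state.get(state, [])
    return sorted(actions, key=lambda a: q_table[(state, a)], reverse=True)


# Hypothetical game states encoded as (outs, runners on base).
actions_by_state = {
    ("0_out", "runner_on_1st"): ["bunt", "steal", "swing"],
    ("1_out", "runner_on_2nd"): ["swing", "steal"],
}

# Hypothetical history records: (state, action, result, next_state).
history = [
    (("0_out", "runner_on_1st"), "bunt", "runner_advanced", ("1_out", "runner_on_2nd")),
    (("1_out", "runner_on_2nd"), "swing", "run_scored", None),
    (("0_out", "runner_on_1st"), "steal", "out", None),
]

q_table = update_q_table(defaultdict(float), actions_by_state, history)
ranking = recommend(q_table, actions_by_state, ("0_out", "runner_on_1st"))
print(ranking)  # ['bunt', 'swing', 'steal']
```

In this toy run the first entry of `ranking` plays the role of the recommended action. A real system would need far more history data and a considered choice of state encoding and reward values.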