A method for renewable energy bidding is provided. In the method, an actor network is created that takes an actor state as an input and produces an electricity sale quotation of an energy supplier as an output, the actor state including a supply amount of each energy supplier and a total demand amount of multiple energy demanders. A critic network is created that takes a critic state as an input and produces a value function as an output, the critic state including the actor state, a next actor state, and a reward obtained by adopting the electricity sale quotation. Parameters of the critic network are updated through stochastic gradient descent to minimize a temporal difference error. Parameters of the actor network are updated through stochastic gradient ascent to maximize the reward accumulated by the energy supplier. The updated parameters of the actor network are transferred to a new energy supplier, and the aforesaid updating steps are repeated to maximize the reward accumulated by the new energy supplier.
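The updating steps described above can be illustrated with a minimal sketch of a one-step temporal difference actor-critic update, followed by a parameter transfer. This is not the claimed implementation: it assumes linear function approximators in place of neural networks, a critic that evaluates a state value on the actor state and the next actor state, and a Gaussian quotation policy. The function names, the discount factor, the learning rates, and the numeric values are all hypothetical.

```python
import numpy as np

GAMMA = 0.9  # discount factor (assumed; not stated in the source)


def td_error(v, state, next_state, reward, gamma=GAMMA):
    """One-step temporal difference error for a linear critic V(s) = v . s."""
    return reward + gamma * (v @ next_state) - v @ state


def critic_step(v, state, next_state, reward, lr=0.1, gamma=GAMMA):
    """Semi-gradient descent step on 0.5 * delta**2 with respect to v."""
    delta = td_error(v, state, next_state, reward, gamma)
    return v + lr * delta * state, delta


def actor_step(w, state, quote, delta, sigma=1.0, lr=0.01):
    """Gradient ascent step for a Gaussian policy whose mean quotation is w . s."""
    grad_log_pi = (quote - w @ state) / sigma**2 * state
    return w + lr * delta * grad_log_pi


def transfer(w):
    """Copy trained actor parameters to initialise a new supplier's actor."""
    return w.copy()


# Hypothetical actor state: [supply amount, total demand amount].
state = np.array([1.0, 2.0])
next_state = np.array([1.0, 1.0])
quote = 0.5    # electricity sale quotation that was adopted (illustrative)
reward = 1.0   # reward obtained by adopting that quotation (illustrative)

v = np.zeros(2)  # critic parameters
w = np.zeros(2)  # actor parameters

v_new, delta = critic_step(v, state, next_state, reward)
w_new = actor_step(w, state, quote, delta)

# The new supplier starts from the trained actor parameters and would then
# repeat critic_step and actor_step on its own states and rewards.
w_new_supplier = transfer(w_new)
```

In this sketch the temporal difference error serves both as the quantity the critic minimizes and as the signal that scales the actor's ascent step, and the transfer is a plain copy so that later updates for the new supplier do not modify the original supplier's parameters.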