This paper considers a multi-period, multi-item newsvendor problem under budget constraints, in which a decision-maker orders items with the aim of minimizing the total inventory cost, comprising inventory holding cost and backlog cost. The order quantities are subject to two types of budget constraint: a periodic budget and a flexible budget. The problem is formulated as an action-constrained Markov Decision Process (MDP). To cope with the dimensionality and ambiguity of the problem, we employ a Q-learning method to solve the MDP model. In particular, we modify the conventional Q-learning procedure to handle the constrained action space by applying penalties for constraint violations, or incentives for constraint satisfaction, to the Q-values. The penalties and incentives are obtained by solving a quadratic optimization problem embedded in the learning procedure. A numerical analysis compares the performance of the proposed Q-learning method with benchmarks such as the Economic Order Quantity (EOQ) model, Q-learning without the budget constraint, and a heuristic method. The experimental results show that the proposed Q-learning method lowers the total inventory cost while increasing the chance of satisfying the budget constraint.
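
To make the idea of penalizing constraint violations concrete, the following is a minimal sketch of tabular Q-learning for a single-item, periodic-budget variant of the problem. It is not the procedure proposed in this paper: the penalty is a fixed constant rather than the solution of a quadratic optimization problem, only violations are penalized (no incentives), and the demand distribution, cost coefficients, budget, and the function names `step_cost`, `penalized_q_learning`, and `greedy_order` are all assumptions introduced for illustration.

```python
import random


def step_cost(inventory, order, demand, hold=1.0, backlog=4.0):
    """Return (next_inventory, cost); negative inventory represents backlog."""
    nxt = inventory + order - demand
    cost = hold * max(nxt, 0) + backlog * max(-nxt, 0)
    return nxt, cost


def penalized_q_learning(episodes=2000, periods=5, max_order=5,
                         unit_price=2.0, budget=6.0, penalty=50.0,
                         alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning on costs with a fixed penalty for over-budget orders.

    All parameter values are hypothetical; the fixed `penalty` stands in for
    the optimized penalties described in the text.
    """
    rng = random.Random(seed)
    actions = list(range(max_order + 1))
    q = {}  # (period, inventory, order) -> estimated cost-to-go

    for _ in range(episodes):
        inv = 0
        for t in range(periods):
            # Epsilon-greedy selection; lower Q means lower expected cost.
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = min(actions, key=lambda x: q.get((t, inv, x), 0.0))

            demand = rng.randint(0, 4)  # assumed uniform demand on {0, ..., 4}
            nxt, cost = step_cost(inv, a, demand)
            nxt = max(-10, min(10, nxt))  # clip to keep the state space small

            # Penalize orders that exceed the periodic budget.
            if unit_price * a > budget:
                cost += penalty

            if t == periods - 1:
                future = 0.0
            else:
                future = min(q.get((t + 1, nxt, x), 0.0) for x in actions)

            old = q.get((t, inv, a), 0.0)
            q[(t, inv, a)] = old + alpha * (cost + gamma * future - old)
            inv = nxt
    return q


def greedy_order(q, period, inventory, max_order=5):
    """Order quantity with the lowest learned cost-to-go in a given state."""
    return min(range(max_order + 1),
               key=lambda a: q.get((period, inventory, a), 0.0))
```

Calling `q = penalized_q_learning()` and then `greedy_order(q, 0, 0)` returns the learned first-period order for an empty inventory. Because over-budget orders carry the added penalty in their Q-values, the greedy policy is steered toward orders that respect the budget without the action set itself being restricted.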