Depression is a prevalent and growing problem in many people's lives. Because it lacks physical symptoms, depression is difficult to diagnose, and failing to detect it in time can have tragic consequences. In this work, we studied the feasibility of predicting depression severity from passively collected smartphone sensor data. We also explored how the feature extraction window size influences classification accuracy and found that longer windows lead to higher prediction accuracy: the highest accuracy, 77%, was achieved with the largest tested window size of two weeks. Finally, we analyzed the associations between the extracted sensor features and the reported PHQ-9 item scores separately for three groups (non-depressed, depressed, and severely depressed) and explored the similarities and differences among them.