The enormous benefits of AI-powered chatbots in English classrooms have prompted Korean researchers to conduct numerous studies on topics ranging from chatbot development to empirical classroom application. It has been suggested, however, that chatbot accuracy may be hindered by limitations inherent in the underlying software and/or by users' expressions. Despite this, Korean schools have used AI chatbots to evaluate students' vocabulary, speaking ability, and reading skills. It is therefore imperative to investigate the accuracy of AI chatbots used for English teaching, from speech recognition to data extraction. In light of these issues and challenges, the present study, building on preliminary research, examined the errors of AI chatbots utilized in English classrooms. The study design comprised three major dimensions: three sandhi categories subdivided into seven subtypes, the accents of users (Korean EFL learners versus native American English speakers), and two chatbots built on different APIs, each representing one country. Various statistical tests were conducted to quantitatively analyze the collected data and draw implications from them. Results showed that expressions containing contraction, especially "will" contraction, had the greatest negative impact on the AI chatbots' comprehensibility in terms of both automatic speech recognition (ASR) and intent matching. Moreover, native speakers outperformed Korean EFL students in both speech recognition and intent matching across all sandhi types. However, when the efficiency of the NLU module was compared between native speakers and Korean students, it was found to function more effectively for the students. Finally, AI chatbots powered by the Google Dialogflow API were predicted to perform better than Naver CLOVA-driven chatbots when processing expressions containing sandhi phenomena in an English classroom.
Based on these results, the present study proposes ideas for designing and employing chatbots so as to reduce error rates and attain greater efficacy in TEFL.