English as a Second Language (ESL) learning is widely implemented in the global education system because English is essential for the workplace, diplomatic communication, and contractual matters. Effective teaching and learning of ESL from an early age gives students a competitive advantage. Multimedia such as Augmented Reality (AR) has been deployed in ESL for this purpose. This study used a multimedia learning environment, AR English Vocabulary Acquisition (ARenVA), and tested its learning effectiveness among 37 Malaysian primary school students from the 4th, 5th, and 6th grades. Explanatory research was conducted with three instruments, namely visual text and AR 2D (ARvt); spoken text and AR 2D (ARst); and visual text, spoken text and AR 2D (ARvtst). Interviews, observation, a pre-test/post-test questionnaire and an English test were used to identify students' perceived motivation and English performance. Perceived motivation was assessed in relation to Keller's ARCS model of Attention, Relevance, Confidence and Satisfaction. SPSS was used to analyze the quantitative data, while NVivo and content analysis were used to analyze the qualitative data from the semi-structured interviews. Findings indicated that ARenVA's ARvtst treatment mode significantly improved vocabulary acquisition in ESL learning by 30.76% and perceived motivation by 11.50%, and was 32 times more effective in motivating students than the ARvt treatment mode. The novelty of this study lies in identifying the effects of AR English learning on motivation and English vocabulary acquisition among 4th, 5th and 6th grade Malaysian students, with the benefit expressed in numerical percentages. Nevertheless, the instructional design of AR English learning determines the success of the learning results. The paper reviews the successful, systematic implementation of the different ARenVA treatment modes.