The development of real-time affect detection models often depends on annotated data for supervised learning, typically obtained by employing human experts to label the student data. One open question in labeling affective data for affect detection is whether the labelers (i.e., human experts) need to be socio-culturally similar to the students being labeled, as this impacts the cost and feasibility of obtaining labels. In this study, we investigate the following research questions: First, for affective state labeling, how does the socio-cultural background of the human expert labelers, relative to that of the subjects (i.e., students), impact the degree of consensus and the distribution of affective states obtained? Second, how do differences in labeler background impact the performance of affect detection models trained on these labels? To address these questions, we employed experts from Turkey and the United States to label the same data, collected through authentic classroom pilots with students in Turkey. We analyzed within-country and cross-country inter-rater agreement, finding that the experts from Turkey achieved moderately better inter-rater agreement than the experts from the U.S., and that the two groups did not agree with each other. In addition, we observed differences between the distributions of affective states provided by the U.S. and Turkish experts, and between the performances of the resulting affect detectors. These results suggest that there are indeed implications of using human experts who do not belong to the same population as the research subjects, both for the labels obtained and for the detectors trained on them.
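
The following is a minimal sketch of how within-country and cross-country inter-rater agreement comparisons of this kind might be computed with pairwise Cohen's kappa. It is an illustrative assumption rather than the study's actual analysis pipeline; the rater names, affective-state labels, and data values are hypothetical placeholders.

```python
# Illustrative sketch (not the study's pipeline): comparing within-country
# and cross-country inter-rater agreement with pairwise Cohen's kappa.
# Rater names, labels, and values below are hypothetical placeholders.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical affective-state labels assigned by each expert to the same
# sequence of student observation clips.
labels_by_rater = {
    "TR_expert_1": ["engaged", "bored", "confused", "engaged", "frustrated"],
    "TR_expert_2": ["engaged", "bored", "engaged", "engaged", "frustrated"],
    "US_expert_1": ["engaged", "confused", "confused", "bored", "engaged"],
    "US_expert_2": ["engaged", "confused", "bored", "bored", "engaged"],
}

def mean_pairwise_kappa(raters):
    """Average Cohen's kappa over all pairs drawn from the given raters."""
    scores = [
        cohen_kappa_score(labels_by_rater[a], labels_by_rater[b])
        for a, b in combinations(raters, 2)
    ]
    return sum(scores) / len(scores)

tr_raters = [r for r in labels_by_rater if r.startswith("TR")]
us_raters = [r for r in labels_by_rater if r.startswith("US")]

print("Within-Turkey agreement:", mean_pairwise_kappa(tr_raters))
print("Within-U.S. agreement:  ", mean_pairwise_kappa(us_raters))

# Cross-country agreement: average kappa over all Turkey-U.S. rater pairs.
cross_scores = [
    cohen_kappa_score(labels_by_rater[a], labels_by_rater[b])
    for a in tr_raters for b in us_raters
]
print("Cross-country agreement:", sum(cross_scores) / len(cross_scores))
```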