Tourists frequently share their travel experiences on social media platforms. Among these platforms, Flickr is a valuable data source for studying tourist behavior, owing to its open API and transparent image rights. This study proposes a methodology for analyzing tourist destinations through user-generated images on Flickr. A caption is generated for each image with the BLIP model, providing textual context that extends beyond the raw visual content. BERTopic is then applied to extract latent topics from these captions, and the images are clustered according to those topics, offering insights into the diversity of tourist experiences and interests. In a case study of Fushimi Inari Shrine, several hundred images were processed and clustered to infer the main themes of tourist interest. The results indicate that the methodology is effective in identifying key attractions and themes from the clustered images. However, images centered on people often dominated the clustering results, potentially overshadowing other significant themes. By converting visual data into textual descriptions and categorizing them, the approach gives stakeholders in the tourism industry a richer and more nuanced perspective on what captivates visitors, which can inform future marketing and development efforts.
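The caption-then-cluster pipeline described above can be sketched roughly as follows. This is a minimal illustration and not the study's actual implementation: the BLIP checkpoint name, the `min_topic_size` value, the caption length limit, and the helper names `caption_images`, `assign_topics`, `group_by_topic`, and `run_pipeline` are all assumptions, and the images are assumed to have already been downloaded from Flickr into a local folder.

```python
from collections import defaultdict
from pathlib import Path


def caption_images(image_paths, checkpoint="Salesforce/blip-image-captioning-base"):
    """Generate one caption per image with BLIP (the checkpoint is an assumption)."""
    # Heavy dependencies are imported lazily so the pure helpers work without them.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint)

    captions = []
    for path in image_paths:
        image = Image.open(path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=30)
        captions.append(processor.decode(output_ids[0], skip_special_tokens=True))
    return captions


def assign_topics(captions, min_topic_size=10):
    """Fit BERTopic on the captions; min_topic_size is an illustrative value."""
    from bertopic import BERTopic

    topic_model = BERTopic(min_topic_size=min_topic_size)
    topics, _ = topic_model.fit_transform(captions)
    return topic_model, topics


def group_by_topic(image_paths, topics):
    """Group image paths by topic id (BERTopic uses -1 for outliers)."""
    groups = defaultdict(list)
    for path, topic in zip(image_paths, topics):
        groups[topic].append(path)
    return dict(groups)


def run_pipeline(image_dir):
    """Caption every JPEG in image_dir, fit topics, and return the image clusters."""
    paths = sorted(Path(image_dir).glob("*.jpg"))
    captions = caption_images(paths)
    topic_model, topics = assign_topics(captions)
    print(topic_model.get_topic_info())  # topic sizes and representative words
    return group_by_topic(paths, topics)
```

Calling `run_pipeline("flickr_images")`, where `flickr_images` is a hypothetical local directory, would return a dictionary mapping each topic id to the images whose captions fell into that topic. Inspecting the largest groups is one way to check whether captions describing people dominate the clustering, as observed in the case study.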