Automatic defect detection and classification from images is becoming increasingly important for bridge deterioration prediction and maintenance decision making. Most existing defect detection efforts have developed their own datasets for training machine-learning algorithms for detection and classification. However, these datasets suffer from two main limitations. First, most are relatively small, which is insufficient for building a well-trained, accurate image classifier. Second, most lack the needed variety in scenes, angles, and backgrounds, which limits their adaptability to different application contexts and environments. To address these limitations, this paper proposes a semantic image retrieval and clustering method that collects a large set of relevant images with varied scenes, angles, and backgrounds from the Web and clusters these images to support domain-specific bridge component and defect detection. The proposed method includes three primary steps: (1) query formation and image retrieval, (2) image representation, and (3) image clustering. First, a set of domain-specific words was extracted from bridge inspection documents and used as queries for retrieving a large number of images from the Web. Second, a transfer learning technique was used to transfer the knowledge captured in a model pre-trained for general image classification to the bridge component and defect-related image clustering task: a deep convolutional neural network (CNN) with pre-trained weights was used to extract visual features for image representation. Third, a clustering technique was used to group the images based on the extracted features. The performance of the proposed method was evaluated using the silhouette coefficient, and the evaluation results show that the proposed method is promising.
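To make the evaluation step concrete, the sketch below shows how a silhouette coefficient is computed over clustered feature vectors. This is a minimal, illustrative implementation, not the authors' code: it assumes the feature vectors have already been extracted by a pre-trained CNN and cluster labels have already been assigned, and it uses small 2-D toy vectors in place of real CNN features.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points.

    points: list of feature vectors (toy stand-ins for CNN features)
    labels: cluster assignment for each point
    """
    # Group points by their assigned cluster.
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)

    scores = []
    for p, l in zip(points, labels):
        # a: mean distance to the other points in the same cluster.
        own = [q for q in clusters[l] if q is not p]
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: smallest mean distance to any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for m, members in clusters.items() if m != l
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated toy clusters yield a score close to 1.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(silhouette(pts, [0, 0, 0, 1, 1, 1]))  # ≈ 0.92
```

A score near 1 indicates compact, well-separated clusters, near 0 indicates overlapping clusters, and negative values indicate likely misassignments, which is why the metric is a natural fit for judging unsupervised image clustering quality.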