Cyber-physical systems (CPS) are a promising concept for a wide range of applications. As the Internet of Things (IoT) has matured as a research area, it has raised new challenges in obtaining valuable data through environmental monitoring. Existing work on audio classification in CPS classifies the audio directly, without a dedicated feature-extraction stage. This study classifies audio within a CPS using a deep learning method, CNN-LSTM, combined with two-way feature extraction. The primary objective of the system, which is built on a convolutional neural network (CNN) with long short-term memory (LSTM), is to analyze the vocalization patterns of two different species of anurans. CNNs combined with mel-spectrograms have been shown to be suitable for classifying ambient sounds. First, the data is augmented and preprocessed. Next, mel-spectrogram features are extracted through two-way feature extraction: Principal Component Analysis (PCA) is applied for dimensionality reduction, followed by transfer learning for audio feature extraction. Finally, classification is performed with the CNN-LSTM model. The method achieves high classification accuracy and could potentially be used to categorize other biological sound sources and to analyze biodiversity indices in natural environments. The study highlights that this CNN-LSTM approach enables cost-effective and resource-efficient monitoring of large natural regions. The CPS also allows updated CNN-LSTM models to be distributed flexibly and dynamically to remote IoT nodes.
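The feature-extraction stage described above (mel-spectrogram features followed by PCA) can be illustrated with a minimal NumPy sketch. It is not the study's implementation: the synthetic test tone, the HTK-style mel formula, the frame parameters (`n_fft`, `hop`, `n_mels`), the choice of 8 principal components, and all function names are assumptions made for illustration. The transfer-learning and CNN-LSTM stages are omitted.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed; the source does not state which variant is used).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers equally spaced on the mel scale from 0 to sr/2.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, and take the power spectrum per frame.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[t * hop: t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto the mel filterbank and compress with a log.
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)

def pca_reduce(features, n_components):
    # PCA via SVD of the mean-centered feature matrix (rows = frames).
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Synthetic stand-in for a recording: a 1.2 kHz tone plus noise (illustrative only).
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 1200 * t) + 0.1 * rng.standard_normal(sr)

log_mel = log_mel_spectrogram(signal, sr)      # shape: (frames, n_mels)
reduced = pca_reduce(log_mel, n_components=8)  # shape: (frames, 8)
print(log_mel.shape, reduced.shape)
```

In a full pipeline, the reduced per-frame features would be passed on to the later stages; how the study combines PCA output with the transfer-learned features is not specified here, so that step is left out.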