Clinical depression, the third most common cause of disability worldwide, is a serious global health concern characterized by melancholy, loneliness, and low self-esteem. About 10% of adults in the US alone suffer from this mental disorder, which is difficult to quantify because of its subjective nature. Traditional diagnostic techniques such as surveys and interviews suffer from this subjectivity, while biological markers, though more objective, carry a risk of misdiagnosis. This paper investigates the potential of speech patterns to serve as objective markers for depression, highlighting the distinctive acoustic characteristics of depressed people's speech, such as pauses, low energy, and monotonicity. It describes how research on Speech Depression Recognition (SDR) is moving toward deep learning models such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs). The paper also discusses the challenges facing SDR research, including the need for large, reliable datasets and the shortcomings of existing databases: limited scenario diversity, imprecise labeling, and privacy restrictions. It concludes by emphasizing the importance of understanding the physiological effects of depression on speech, improving data collection, fostering interdisciplinary collaboration, investigating different forms of depression, and integrating multimodal data to enable more accurate and effective analysis of depression.
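Acoustic cues such as low energy and frequent pauses can be quantified directly from the waveform. As a minimal illustration (the frame sizes and the silence threshold below are hypothetical choices for demonstration, not values prescribed by the SDR literature), the following sketch estimates mean short-time energy and a pause ratio using NumPy:

```python
import numpy as np

def energy_and_pause_ratio(signal, sr, frame_ms=25, hop_ms=10, silence_db=-35):
    """Return mean frame RMS energy and the fraction of low-energy (pause) frames.

    frame_ms, hop_ms, and silence_db are illustrative defaults.
    """
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # Short-time RMS energy, one value per frame
    rms = np.array([
        np.sqrt(np.mean(signal[i:i + frame] ** 2))
        for i in range(0, len(signal) - frame + 1, hop)
    ])
    # Energy in dB relative to the loudest frame; frames below the
    # threshold are counted as pauses
    db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    return float(rms.mean()), float(np.mean(db < silence_db))

# Synthetic example: 1 s of tone followed by 1 s of near-silence,
# so roughly half of the frames should register as pauses
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
speech = np.sin(2 * np.pi * 220 * t)
silence = 0.001 * np.random.default_rng(0).standard_normal(sr)
energy, pauses = energy_and_pause_ratio(np.concatenate([speech, silence]), sr)
```

Features like these are typically computed per frame and then fed to the LSTM or CNN models discussed in the paper, rather than used as standalone diagnostics.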