To meet the rapidly growing data traffic demand of a massive number of users and to strengthen the integration of wide-area and near-field communication, we investigate collaborative content caching and transmission in heterogeneous networks (HetNets) that support device-to-device (D2D) communication. We propose a cache-enabled HetNet model in which user devices (UDs), small-cell base stations (SBSs), and macro-cell base stations (MBSs) all participate in caching and serving content. To improve quality of service (QoS) and network service efficiency, we define the transmission cost of a service in terms of service delay and energy consumption. To account for the randomness of the network and the uncertainty of signals, we use stochastic geometry to derive the probability of successful transmission for each level of caching node and, on this basis, an expression for the average transmission cost. The caching strategy is then optimized by minimizing the transmission cost of services, and a sub-optimal solution is obtained with the standard gradient method. Finally, the proposed algorithm is compared with two deterministic caching strategies based on content popularity and with a probabilistic strategy that maximizes the successful offloading rate. The results show that the proposed algorithm has significant advantages in reducing transmission cost.
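The optimization step summarized above can be illustrated with a minimal single-tier sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Zipf popularity model, the success-probability form s(q) = q / (q + a), the scalar costs `cost_hit` and `cost_miss` (standing in for a combined delay-and-energy cost when the cached copy is or is not delivered successfully), and all function names. It uses projected gradient descent so that the caching probabilities stay in [0, 1] and respect the cache capacity. This toy cost is convex, unlike the multi-tier problem described in the abstract.

```python
def zipf_popularity(n, gamma):
    """Request probabilities for n contents under an assumed Zipf law."""
    weights = [1.0 / (k ** gamma) for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def average_cost(q, popularity, a, cost_hit, cost_miss):
    """Average cost under the assumed success probability s(q) = q / (q + a)."""
    total = 0.0
    for qi, pi in zip(q, popularity):
        s = qi / (qi + a)
        total += pi * (s * cost_hit + (1.0 - s) * cost_miss)
    return total


def cost_gradient(q, popularity, a, cost_hit, cost_miss):
    """Partial derivatives of average_cost with respect to each q_i."""
    return [pi * (cost_hit - cost_miss) * a / (qi + a) ** 2
            for qi, pi in zip(q, popularity)]


def project_to_capacity(q, capacity, iters=60):
    """Project q onto {0 <= q_i <= 1, sum(q) = capacity} by bisection on a shift."""
    lo, hi = min(q) - 1.0, max(q)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        total = sum(min(1.0, max(0.0, x - mid)) for x in q)
        if total > capacity:
            lo = mid  # shift too small: clipped values still exceed capacity
        else:
            hi = mid
    shift = (lo + hi) / 2.0
    return [min(1.0, max(0.0, x - shift)) for x in q]


def optimize_caching(popularity, capacity, a, cost_hit, cost_miss,
                     step=0.1, iters=500):
    """Projected gradient descent starting from uniform caching probabilities."""
    n = len(popularity)
    q = [capacity / n] * n
    for _ in range(iters):
        grad = cost_gradient(q, popularity, a, cost_hit, cost_miss)
        q = project_to_capacity(
            [qi - step * g for qi, g in zip(q, grad)], capacity)
    return q


if __name__ == "__main__":
    pop = zipf_popularity(20, 0.8)
    q_opt = optimize_caching(pop, capacity=5, a=0.5,
                             cost_hit=1.0, cost_miss=5.0)
    print("caching probabilities:", [round(x, 3) for x in q_opt])
    print("average cost:", average_cost(q_opt, pop, 0.5, 1.0, 5.0))
```

In this sketch, more popular contents end up with higher caching probabilities. The step size of 0.1 is chosen for these particular illustrative parameters and would need retuning for other values.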