Temporal link prediction is fundamental for analyzing and forecasting the behavior of real-world evolving complex systems. Recent advances in graph learning on temporal network snapshots offer a promising approach to predicting evolving topology. However, previous methods encode only the temporal-structural information of the entire network, so crucial evolutionary characteristics are overshadowed by the large volume of invariant structural information. In this paper, we focus on the evolving topology and propose an auxiliary learning framework that captures not only the overall network evolution patterns but also the time-varying regularity of the evolved edges. Specifically, a graph transformer infers the temporal network, with a temporal cross-attention mechanism refining the dynamic graph representation. In parallel, a dynamic difference transformer infers the evolved edges as an auxiliary task, and its output is aggregated with the graph representation to produce the final prediction. Extensive experiments on eight real-world temporal networks from diverse scenarios show that the proposed auxiliary learning framework outperforms the baselines, demonstrating its superiority in extracting evolution patterns.
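To make the auxiliary-learning idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes node embeddings per snapshot are already produced by some graph transformer, uses a standard multi-head cross-attention layer as a stand-in for the temporal cross-attention mechanism, and a small MLP over consecutive-snapshot differences as a stand-in for the dynamic difference transformer. All module and variable names are hypothetical.

```python
# Illustrative sketch of the auxiliary-learning scheme: a main branch refines the
# current snapshot's node embeddings via cross-attention over their history, an
# auxiliary branch encodes the node-wise difference between consecutive snapshots,
# and the two representations are aggregated to score candidate edges.
import torch
import torch.nn as nn


class AuxiliaryTemporalLinkPredictor(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # Main branch: current snapshot attends to its historical embeddings.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Auxiliary branch: encodes node-wise changes between snapshots
        # (a simplified stand-in for the dynamic difference transformer).
        self.diff_encoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Aggregation of the two branches and an edge scorer.
        self.fuse = nn.Linear(2 * dim, dim)
        self.edge_scorer = nn.Bilinear(dim, dim, 1)

    def forward(self, snapshots: torch.Tensor) -> torch.Tensor:
        """snapshots: (T, N, dim) node embeddings per snapshot, assumed precomputed."""
        history, current = snapshots[:-1], snapshots[-1]            # (T-1, N, dim), (N, dim)
        # Temporal cross-attention: each node queries its own embedding history.
        query = current.unsqueeze(1)                                # (N, 1, dim)
        keys = history.permute(1, 0, 2)                             # (N, T-1, dim)
        refined, _ = self.cross_attn(query, keys, keys)             # (N, 1, dim)
        refined = refined.squeeze(1)                                # (N, dim)
        # Auxiliary branch: encode the most recent node-wise change.
        diff = self.diff_encoder(snapshots[-1] - snapshots[-2])     # (N, dim)
        # Aggregate both representations and score every node pair.
        h = self.fuse(torch.cat([refined, diff], dim=-1))           # (N, dim)
        n = h.size(0)
        src = h.unsqueeze(1).expand(n, n, -1).reshape(n * n, -1)
        dst = h.unsqueeze(0).expand(n, n, -1).reshape(n * n, -1)
        return torch.sigmoid(self.edge_scorer(src, dst)).view(n, n)  # predicted adjacency


# Usage: predict the next snapshot's edge probabilities from a short history.
T, N, dim = 5, 32, 64
emb = torch.randn(T, N, dim)          # placeholder snapshot embeddings
model = AuxiliaryTemporalLinkPredictor(dim)
pred_adj = model(emb)                 # (N, N) edge probabilities
```

In this sketch the auxiliary difference branch is trained jointly with the main branch and fused by a simple linear layer; the paper's framework treats the evolved-edge inference as a separate auxiliary task whose output is aggregated with the graph representation, so the fusion and loss design here are only a schematic approximation.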