Multi-object tracking is a challenging video task that requires both locating objects in each frame and associating them across frames; it is usually addressed with the tracking-by-detection paradigm. Supervised multi-object tracking methods have made substantial progress recently. However, the high cost of annotating bounding boxes and track IDs limits the robustness and generalization ability of these models. In this paper, we learn a novel multi-object tracker using only unlabeled videos by designing a self-supervisory learning signal for an association model. Specifically, inspired by the cycle-consistency used in video correspondence learning, we propose to track objects forwards and then backwards: each detection in the first frame should be matched with itself after the forward-backward tracking. We use this cycle-consistency as the self-supervisory learning signal for our proposed multi-object tracker. Experiments on the MOT17 dataset show that our model is effective in extracting discriminative association features, and that our tracker achieves competitive performance compared with other trackers using the same pre-generated detections, including UNS20 [1], Tracktor++ [2], FAMNet [8], and CenterTrack [31].
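To make the forward-backward idea concrete, the following is a minimal sketch of one way a cycle-consistency signal could be computed between two frames. It is an illustration under stated assumptions, not the loss used in this paper: the function name `cycle_consistency_loss`, the cosine-similarity softmax assignment, and the `temperature` parameter are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cycle_consistency_loss(feat_a, feat_b, temperature=0.1):
    """Illustrative forward-backward loss (assumed form, not the paper's).

    feat_a: (N, D) association features of detections in the first frame.
    feat_b: (M, D) association features of detections in the second frame.
    """
    # Cosine similarity between every pair of detections.
    feat_a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    feat_b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = feat_a @ feat_b.T                            # (N, M)

    # Soft assignments: forward (a -> b) and backward (b -> a).
    forward = softmax(sim / temperature, axis=1)       # (N, M)
    backward = softmax(sim.T / temperature, axis=1)    # (M, N)

    # Probability that detection i in the first frame returns to
    # detection j in the first frame after the round trip.
    cycle = forward @ backward                         # (N, N)

    # Each detection should return to itself, so the diagonal is
    # pushed towards 1 (cross-entropy against the identity matrix).
    loss = -np.log(np.diag(cycle) + 1e-12).mean()
    return loss, cycle
```

With discriminative features the round-trip matrix is close to the identity and the loss is near zero; with indistinguishable features every round trip is equally likely and the loss approaches log N. No track ID labels are needed in either case, which is the property the self-supervisory signal relies on.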