Accent variation is a challenging issue for both traditional hybrid and current end-to-end (E2E) automatic speech recognition (ASR) systems. Building a high-quality, accent-invariant ASR system is important for most real-world applications. In this study, we propose a Conformer-based architecture with accent-discriminative encoders that leverages the accent attributes of the input speech to improve an accent-invariant E2E ASR system. The encoders in this architecture comprise one universal encoder and two dominant accent-specific encoders. These encoders are first pre-trained and then jointly adapted with a single attention-based decoder in an end-to-end manner. Furthermore, different weighting methods and a multi-encoder-decoder architecture are also investigated and compared. Our experiments are performed on the public Common Voice corpus with five different English accents. The results show that the proposed architecture outperforms a strong baseline on both in-domain and out-of-domain accented ASR tasks, with a relative word error rate reduction of 2.9–3.8%. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
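As an illustration of how the outputs of one universal and two accent-specific encoders might be merged before a single decoder, the following is a minimal sketch and not the authors' implementation. It assumes utterance-level softmax weights derived from hypothetical accent scores; the abstract does not specify which weighting methods were compared, and the names `softmax`, `combine_encoder_outputs`, and `accent_scores` are introduced here purely for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_encoder_outputs(encoder_outputs, accent_scores):
    """Weighted sum of K encoder outputs (assumed weighting scheme).

    encoder_outputs: list of K sequences, each a list of T frames,
                     each frame a list of D floats (same T and D for all).
    accent_scores:   list of K hypothetical scores, one per encoder,
                     e.g. from an accent classifier (an assumption).
    Returns a single T x D sequence to feed to the shared decoder.
    """
    weights = softmax(accent_scores)
    num_frames = len(encoder_outputs[0])
    dim = len(encoder_outputs[0][0])
    combined = [[0.0] * dim for _ in range(num_frames)]
    for weight, output in zip(weights, encoder_outputs):
        for t in range(num_frames):
            for d in range(dim):
                combined[t][d] += weight * output[t][d]
    return combined


# Toy example: one frame, two dimensions, three encoders.
universal = [[1.0, 2.0]]
accent_a = [[3.0, 4.0]]
accent_b = [[5.0, 6.0]]
# Equal scores give equal weights, so the result is the plain average.
mixed = combine_encoder_outputs([universal, accent_a, accent_b], [0.0, 0.0, 0.0])
```

With strongly unequal scores the combination approaches the output of the highest-scoring encoder, which is one way an accent-specific encoder could dominate for speech in its own accent while the universal encoder still contributes for unseen accents.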