Transportation systems are highly dynamic, complex, and interdisciplinary in nature, so transitioning them towards sustainability is difficult for two reasons. First, covering every aspect of such systems simultaneously is complicated; second, this complexity becomes a barrier to communication among stakeholders with different areas of expertise. Indices (or composite indicators) for sustainable transportation (ST) are useful in overcoming these issues. Constructing an index for ST involves several steps, of which the selection of a suitable indicator set and framework forms the foundation. Moreover, choosing the most appropriate methods of normalization, weighting, and aggregation is also challenging. The main aim of this review paper is to bring forth the various methodologies that have been adopted across different regions of the world (at different scales) for the development of ST indices. An effort has been made to explore the challenges (and solutions thereof) regarding the selection of suitable ST indicators and frameworks, and to determine the spatio-temporal suitability of methods of normalization, weighting, and aggregation. To this end, literature was first collected and then systematically selected, reviewed, and analyzed. Various indicator and framework typologies, followed by the challenges involved in their selection, were discussed along with transportation-specific examples. Next, a set of 24 studies whose primary focus was the development of an ST index was analyzed further, highlighting the diversity in the choice of methods of normalization, weighting, and aggregation. The relative pros and cons of 7 normalization methods, 5 weighting methods, and 7 aggregation methods were tabulated to discuss their suitability.
As a result of the systematic review and analysis of the literature, 18 specific criteria, broadly classified as representational, practical, and contextual, were identified; we argue that these will aid transportation professionals in selecting a "suitable indicator set." To develop a framework for ST, seven essential attributes were identified. Moreover, it was found that answering an integrated set of questions ("why," "what," and "how" to choose and use ST indicators) is crucial in developing robust ST frameworks. The choice of methods of normalization, weighting, and aggregation with respect to time, scale, and space was found to be critical, as different choices lead to varying results. It was also found that most of the studies did not declare or assess the assumptions linked with these methods, which makes their choice of methods vague and questionable.
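
To illustrate why the choice of aggregation method can lead to varying results, the following is a minimal sketch and not a method taken from any of the reviewed studies. The city names, indicator names (transit_access, clean_fleet_share), and values are hypothetical. It assumes that higher values are better for both indicators, that equal weights are used, and that normalization is done by dividing each value by the indicator's maximum. The same normalized data are then aggregated with a weighted arithmetic mean and with a weighted geometric mean.

```python
import math

# Hypothetical raw indicator values; higher is assumed to be better.
raw = {
    "City A": {"transit_access": 100.0, "clean_fleet_share": 10.0},
    "City B": {"transit_access": 58.0, "clean_fleet_share": 29.0},
    "City C": {"transit_access": 30.0, "clean_fleet_share": 50.0},
}

# Equal weighting (an assumption made only for this illustration).
weights = {"transit_access": 0.5, "clean_fleet_share": 0.5}


def normalize_to_max(data):
    """Divide each value by the largest value observed for that indicator."""
    maxima = {k: max(city[k] for city in data.values()) for k in weights}
    return {
        name: {k: values[k] / maxima[k] for k in weights}
        for name, values in data.items()
    }


def linear_index(scores):
    """Weighted arithmetic mean: low scores can be offset by high ones."""
    return sum(weights[k] * scores[k] for k in weights)


def geometric_index(scores):
    """Weighted geometric mean: unbalanced performance is penalized."""
    return math.prod(scores[k] ** weights[k] for k in weights)


def ranking(data, aggregate):
    """Return city names ordered from the highest to the lowest index value."""
    normalized = normalize_to_max(data)
    return sorted(
        normalized, key=lambda name: aggregate(normalized[name]), reverse=True
    )


linear_rank = ranking(raw, linear_index)
geometric_rank = ranking(raw, geometric_index)
print("Linear aggregation:   ", linear_rank)
print("Geometric aggregation:", geometric_rank)
```

In this hypothetical case, linear aggregation ranks City C first and City B last, whereas geometric aggregation ranks City B first because its performance is balanced across the two indicators. The reversal follows only from the assumption about how far one indicator may compensate for another, which is the kind of assumption that an index study would need to declare and assess.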