Over the past several decades, the Single Hidden Layer Feedforward Neural Network (SLFN) has drawn a great deal of attention in the fields of machine learning, data mining, and pattern recognition, owing to its ability to learn from input samples and its universal approximation capability for complex nonlinear mappings. Although SLFNs have been investigated extensively from both theoretical and application perspectives, it remains challenging to automatically determine a suitable network architecture for a specific task such that the resulting model achieves sound performance in both learning and generalization. The Extreme Learning Machine (ELM) is a powerful learning scheme for generalized SLFNs with fast learning speed and has been widely used for both regression and classification. The hidden node parameters of ELM need not be tuned during training; they are simply assigned random values, and the output weights are then determined analytically by solving a linear equation system using the generalized inverse method. However, the suitable number of hidden nodes in ELM is usually determined in advance by trial and error, which can be tedious in some applications and does not guarantee that the selected network size will be close to optimal or will generalize well. Therefore, the main objective of this paper is to choose a parsimonious structure for ELM that also provides good generalization capacity. By formulating the learning problem as subset model selection, we present an adaptive orthogonal search method for the architectural design of ELM (referred to as AOS-ELM) for regression problems. In AOS-ELM, hidden nodes can be recruited or deleted dynamically according to their significance to network performance, so that the network architecture is self-configuring. More precisely, we first randomly generate a large number of hidden nodes with the preliminary ELM to serve as a candidate reservoir. Then, at each step, the hidden node whose output vector has the highest correlation with the target output is selected from the candidates and added to the existing network by orthogonal forward selection. After a new hidden node is added to the set of selected variables, orthogonal backward elimination is performed to check whether any of the previously selected hidden nodes can be deleted without appreciably increasing the squared error. The procedure stops when no further additions or deletions satisfy the criteria. Finally, an enhanced backward refinement is applied to correct mistakes made in earlier steps, so that as many redundant hidden nodes as possible are removed from the model and the network complexity is further reduced. In summary, the proposed method takes into account the intrinsic connections and interactions among the hidden nodes and therefore has the potential to find parsimonious network solutions that fit the data. We demonstrate the effectiveness and superiority of the proposed method through experiments on several benchmark regression problems as well as two different color constancy tasks.
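As context for the ELM training scheme summarized above, the following is a minimal sketch of how a preliminary ELM assigns random hidden-node parameters and solves for the output weights with the generalized (Moore-Penrose) inverse. It assumes a sigmoid activation and NumPy; the function and parameter names (`elm_fit`, `elm_predict`, `n_hidden`) are illustrative and not taken from the paper.

```python
import numpy as np

def elm_fit(X, T, n_hidden, seed=None):
    """Basic ELM training: random hidden-node parameters, analytic output weights.

    X : (N, d) input samples; T : (N, m) target outputs.
    Returns (W, b, beta) such that predictions are sigmoid(X @ W + b) @ beta.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Hidden-node parameters are assigned random values and never tuned.
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden-layer output matrix
    # Output weights are obtained analytically via the Moore-Penrose inverse of H.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

The columns of H produced in this way play the role of the candidate reservoir from which hidden nodes are subsequently selected.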
Simulation results show that our method not only attains similar or higher learning accuracy than the preliminary ELM and other well-known constructive and pruning ELMs while using a small number of hidden nodes, but also achieves better or comparable illuminant estimates on most of the test error metrics in comparison with several state-of-the-art color constancy algorithms. © 2021, Science Press. All rights reserved.
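To make the selection procedure more concrete, below is a simplified, illustrative sketch of forward selection with interleaved backward elimination over a pool of candidate hidden-node outputs. For brevity it refits the output weights by ordinary least squares at each step instead of maintaining the orthogonal decomposition used by AOS-ELM, and it omits the final enhanced backward refinement stage; the names (`aos_select`, `tol`, `max_nodes`) are assumptions, not the paper's notation.

```python
import numpy as np

def aos_select(H, t, tol=1e-4, max_nodes=50):
    """Greedy subset selection over candidate hidden-node output columns.

    H : (N, L) outputs of a large pool of random candidate hidden nodes.
    t : (N,) target output vector.
    Returns the indices of the selected hidden nodes.
    """
    def sse(idx):
        # Sum of squared errors of a least-squares fit on the chosen columns.
        beta, *_ = np.linalg.lstsq(H[:, idx], t, rcond=None)
        r = t - H[:, idx] @ beta
        return float(r @ r)

    selected = []
    current_sse = float(t @ t)
    for _ in range(4 * max_nodes):                 # cap the number of sweeps
        if len(selected) >= max_nodes:
            break
        # Forward step: add the candidate that reduces the error the most
        # (equivalently, the one most correlated with the current residual).
        best_j, best_sse = None, current_sse
        for j in range(H.shape[1]):
            if j in selected:
                continue
            s = sse(selected + [j])
            if s < best_sse - tol:
                best_j, best_sse = j, s
        if best_j is None:
            break                                  # no useful addition remains
        selected.append(best_j)
        current_sse = best_sse
        # Backward step: delete any previously selected node whose removal
        # does not appreciably increase the squared error.
        for k in list(selected[:-1]):
            rest = [i for i in selected if i != k]
            if sse(rest) <= current_sse + tol:
                selected.remove(k)
                current_sse = sse(selected)
    return selected
```

In practice, H would be the hidden-layer output matrix of the preliminary ELM evaluated on the training data (for example, as computed in the earlier sketch), and the returned indices identify the parsimonious subset of hidden nodes retained in the final network.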