Rainfall-runoff modeling is of great importance in the hydrological sciences. Many models have been developed for runoff modeling, and they fall into three main categories: physically based, conceptual, and empirical models. Alongside process-based models, data-driven models are among the most widely used in runoff modeling. Various studies have assessed the performance of different models and the effects of the input dataset, data length, and signal processing method on modeling performance. However, each of these studies examined only one of these factors in isolation and did not assess how the factors jointly affect the accuracy of runoff forecasting. Therefore, assessing the importance of each factor and determining the optimal combination that yields the highest accuracy remain challenging. The main aim of this study was to determine the relative importance and the optimal combination of these factors in daily runoff modeling. To achieve this goal, the Taguchi method was used. First, five levels were defined for each of the four factors: five input data combinations; five data-driven models, i.e., Adaptive Neuro-Fuzzy Inference System (ANFIS), Support Vector Regression (SVR), Group Method of Data Handling (GMDH), Random Forest (RF), and Partial Least Squares Regression (PLS); four signal processing methods, i.e., normalization, wavelet, ensemble empirical mode decomposition (EEMD), and singular spectrum analysis (SSA), plus a no-preprocessing condition; and five data lengths, i.e., 2, 5, 10, 15, and 20 years. Accordingly, the L-25 Taguchi orthogonal array was selected, and the 25 required tests were carried out in three different basins to obtain more generalizable results.
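To illustrate why 25 tests suffice for four five-level factors (instead of the 5^4 = 625 runs of a full factorial), the sketch below builds an L-25 (5^6) orthogonal array with the standard linear construction over arithmetic modulo 5 and maps four of its columns to the factor levels listed above. The abstract does not state which array columns were assigned to which factor, so the column assignment here is an assumption, and the input-combination labels `C1` to `C5` are hypothetical placeholders because the five combinations are not specified. Function names (`l25_array`, `is_orthogonal`, `design`) are illustrative only.

```python
from itertools import combinations, product

P = 5  # number of levels; 5 is prime, so integers mod 5 form a field

def l25_array():
    """Build an L25(5^6) orthogonal array by the linear construction over GF(5).

    Row (a, b) gets the columns a, b, a+b, a+2b, a+3b, a+4b (all mod 5).
    """
    rows = []
    for a, b in product(range(P), repeat=2):
        rows.append([a, b] + [(a + k * b) % P for k in range(1, P)])
    return rows

def is_orthogonal(rows):
    """True if every pair of columns contains each of the 25 level pairs exactly once."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        # 25 rows and 25 distinct pairs means each pair occurs exactly once
        if len({(r[i], r[j]) for r in rows}) != P * P:
            return False
    return True

# Hypothetical assignment of the first four array columns to the study's factors.
FACTORS = {
    "input": ["C1", "C2", "C3", "C4", "C5"],  # placeholder labels for the input combinations
    "model": ["ANFIS", "SVR", "GMDH", "RF", "PLS"],
    "preprocessing": ["none", "normalization", "wavelet", "EEMD", "SSA"],
    "length_years": [2, 5, 10, 15, 20],
}

def design():
    """Translate the coded array (levels 0-4) into 25 concrete test configurations."""
    names = list(FACTORS)
    return [
        {name: FACTORS[name][row[col]] for col, name in enumerate(names)}
        for row in l25_array()
    ]

if __name__ == "__main__":
    for run in design():
        print(run)
```

Because each pair of columns is balanced, every level of every factor appears in exactly five of the 25 tests, which is what allows the main effect of each factor to be estimated separately from such a small design.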
The results were then used in a Taguchi analysis to determine the optimal combination of factor levels and the importance of each factor for accurate runoff prediction. The hybrid wavelet-GMDH model, with the complete dataset as input and a 20-year data length, provided the highest accuracy. In terms of their effect on runoff prediction accuracy, the factors ranked in the following order of importance: input dataset, data length, preprocessing, and model type. Among the models, GMDH and SVR performed best, and among the signal processing methods, wavelet and EEMD had the greatest effect on the performance of the data-driven models.
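A common way to carry out such a Taguchi analysis is to convert each test's responses into a signal-to-noise (S/N) ratio, average the S/N ratio over the tests at each level of each factor, rank factors by the range (delta) of those level means, and choose the level with the highest mean S/N as optimal. The sketch below assumes an error metric such as RMSE (so the smaller-the-better S/N ratio applies), treats the three basins as replicates of each test, and uses range analysis rather than ANOVA; the abstract confirms none of these choices. The toy design and RMSE values are synthetic, made up purely to exercise the code, and are not results from the study.

```python
import math

def sn_smaller_is_better(values):
    """Taguchi smaller-the-better S/N ratio: -10 * log10(mean(y^2))."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))

def main_effects(design_rows, sn):
    """Mean S/N ratio for each level of each factor.

    design_rows: list of dicts mapping factor -> level, one dict per test
    sn: list of S/N ratios in the same order as design_rows
    """
    effects = {}
    for factor in design_rows[0]:
        sums, counts = {}, {}
        for run, s in zip(design_rows, sn):
            level = run[factor]
            sums[level] = sums.get(level, 0.0) + s
            counts[level] = counts.get(level, 0) + 1
        effects[factor] = {lvl: sums[lvl] / counts[lvl] for lvl in sums}
    return effects

def rank_factors(effects):
    """Rank factors by delta = max level mean - min level mean (larger = more important)."""
    delta = {f: max(m.values()) - min(m.values()) for f, m in effects.items()}
    return sorted(delta, key=delta.get, reverse=True), delta

def optimal_levels(effects):
    """The S/N ratio is always maximized, so pick the level with the highest mean S/N."""
    return {f: max(m, key=m.get) for f, m in effects.items()}

# Synthetic two-factor example (illustrative only, not the study's data).
toy_design = [
    {"model": "GMDH", "length_years": 2},
    {"model": "GMDH", "length_years": 20},
    {"model": "SVR", "length_years": 2},
    {"model": "SVR", "length_years": 20},
]
# Made-up RMSE values for three basins per test.
toy_rmse = [[4.0, 5.0, 4.5], [2.0, 2.5, 2.2], [4.2, 5.1, 4.4], [2.1, 2.6, 2.3]]

sn = [sn_smaller_is_better(r) for r in toy_rmse]
effects = main_effects(toy_design, sn)
order, delta = rank_factors(effects)
print(order, optimal_levels(effects))
```

For a goodness-of-fit metric such as the Nash-Sutcliffe efficiency, the larger-the-better form, -10 * log10(mean(1 / y^2)), would replace `sn_smaller_is_better`; the ranking and level-selection steps stay the same.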