Energy demand modelling has been widely applied in contexts including power plant generation, building energy simulation and demand-side management. However, it remains an ongoing research topic in terms of the choice of modelling method, feature engineering for data-driven methods, the application context and the type of data used. In the residential sector, survey-based and meter-based approaches are distinguished by the type of input data they use, i.e. activity records from time use surveys and energy consumption from meters, respectively. These two paradigms are not necessarily easy to combine, which raises the questions of when one may be preferred over the other and whether combining them is worthwhile despite the significant data requirements. Other details also strongly affect the data structure and performance of an energy demand model, including the choice of influential factors, the historical time window of the selected factors, the split between training and test data, and the choice of machine learning (ML) algorithm. There is a lack of comparative research on these issues to guide researchers and practitioners in developing energy demand modelling capability. This study analyses three groups of test scenarios in a multi-household residential context based in the UK. Six ML algorithms (LightGBM, random forest, ANN, SVM, KNN and LSTM) were compared using eight sets of influential features, four historical time window widths and two train-test splits. A methodology was designed to capture the temporal impact of activities on energy demand and to represent the overlap and interaction of activities. The results show that combining meter-based and survey-based energy demand models performs better in terms of modelling accuracy and robustness against sudden load variation.
In particular, integrating energy tariffs, household and individual attributes, appliance usage and general activity features can improve the energy demand model. Among the ML algorithms, LightGBM and ANN perform better than the others, while LSTM and SVM may not be suitable in this multi-household, short-monitoring context.
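
The role of the historical time window and the train-test split can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the toy readings, the single binary "cooking" activity flag, the window width of 2 and the 75/25 split are all assumptions chosen for illustration. Each sample pairs the previous `window` meter readings and activity flags with the load at the current step, and the split is made chronologically so that test samples always follow training samples in time.

```python
from typing import List, Sequence, Tuple


def make_window_features(
    load: Sequence[float],
    activity: Sequence[int],
    window: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build one sample per time step t >= window.

    Features: the previous `window` load readings followed by the previous
    `window` activity flags. Target: the load at step t.
    """
    if len(load) != len(activity):
        raise ValueError("load and activity must have the same length")
    if window < 1 or window >= len(load):
        raise ValueError("window must be in [1, len(load) - 1]")
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(window, len(load)):
        past_load = list(load[t - window:t])
        past_activity = [float(a) for a in activity[t - window:t]]
        features.append(past_load + past_activity)
        targets.append(load[t])
    return features, targets


def chronological_split(features, targets, train_fraction: float):
    """Split without shuffling: the earliest samples form the training set."""
    cut = int(len(features) * train_fraction)
    return features[:cut], targets[:cut], features[cut:], targets[cut:]


# Hypothetical meter readings (kWh) and a hypothetical 0/1 "cooking" flag
# taken from a time use diary; neither comes from the study's data.
load = [0.2, 0.3, 0.9, 1.1, 0.4, 0.3, 0.8, 1.0, 0.5, 0.2]
cooking = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]

X, y = make_window_features(load, cooking, window=2)
X_train, y_train, X_test, y_test = chronological_split(X, y, train_fraction=0.75)
```

Widening `window` adds more history to each sample but leaves fewer samples overall, which is one reason the window width and the split ratio interact in a short monitoring period.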