Product Length Predictions with Machine Learning: An Integrated Approach Using Extreme Gradient Boosting

Authors
Thakur A. [1]
Kumar A. [1]
Mishra S.K. [1]
Behera S.K. [2]
Sethi J. [3]
Sahu S.S. [4]
Swain S.K. [1]
Affiliations
[1] Department of Electrical and Electronics Engineering, Birla Institute of Technology Mesra, Ranchi
[2] Department of Electronics and Telecommunication Engineering, DRIEMS Autonomous Engineering College, Tangi, Odisha, Cuttack
[3] Department of Electronics and Instrumentation Engineering, Odisha University of Technology and Research, Techno Campus, Ghatikia, Odisha, Bhubaneswar
[4] Department of Electronics and Communication Engineering, Birla Institute of Technology Mesra, Jharkhand, Ranchi
Keywords
Interquartile Range (IQR); Natural Language Processing (NLP); Predictive Modeling; Term Frequency-Inverse Document Frequency (TF-IDF); XGBoost
DOI
10.1007/s42979-024-02999-8
Abstract
This study introduces a novel machine learning approach for predicting product lengths from heterogeneous data (numeric, textual, and categorical), extracting information from each data type to improve prediction accuracy. The approach combines text vectorization, gradient boosting, and categorical feature encoding, specifically Term Frequency-Inverse Document Frequency (TF-IDF), eXtreme Gradient Boosting (XGBoost), and target encoding. The method begins with thorough data preparation, in which outliers are removed and missing values are filled in. Text from the product titles, descriptions, and bullet points in the dataset is then converted into numerical form with TF-IDF, which captures the weighted frequency of words as feature vectors that the learning algorithm can use. Training employs RandomizedSearchCV to optimize the XGBoost model's hyperparameters, with the TF-IDF vectors and target-encoded product type IDs as inputs. This allows the model to handle variability and uncertainty in product length prediction. These techniques make the method adaptable and enable accurate prediction of product length in e-commerce, which can support inventory management across diverse products. The predictions can also help optimize supply chain operations, improve demand forecasting across a variety of products, and aid strategic planning for procurement and stock levels. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2024.
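The preparation and encoding steps named in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the field names `title`, `product_type_id`, and `length`, the toy rows, and the 1.5 multiplier on the IQR are all assumptions, the TF-IDF shown is the unsmoothed textbook form (library vectorizers typically add smoothing and normalization), and the target encoding is the plain per-category mean with no smoothing. The gradient boosting and hyperparameter search stages are omitted; the features produced here would be the inputs to such a regressor.

```python
import math
import statistics
from collections import Counter, defaultdict


def iqr_bounds(values, k=1.5):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, key="length", k=1.5):
    """Keep only rows whose `key` value lies inside the IQR fences."""
    low, high = iqr_bounds([r[key] for r in rows], k)
    return [r for r in rows if low <= r[key] <= high]


def target_encode(rows, cat_key="product_type_id", target_key="length"):
    """Map each category to the mean target value observed for it."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r[cat_key]] += r[target_key]
        counts[r[cat_key]] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * ln(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


# Hypothetical toy records; the last one has an implausible length.
rows = [
    {"title": "steel ruler", "product_type_id": 1, "length": 10.0},
    {"title": "steel tape", "product_type_id": 1, "length": 12.0},
    {"title": "wooden ruler", "product_type_id": 2, "length": 11.0},
    {"title": "folding ruler", "product_type_id": 2, "length": 13.0},
    {"title": "steel tape measure", "product_type_id": 1, "length": 12.0},
    {"title": "garden hose", "product_type_id": 2, "length": 500.0},
]

clean = remove_outliers(rows)                    # drops the 500.0 record
encoding = target_encode(clean)                  # mean length per product type
vectors = tfidf([r["title"] for r in clean])     # sparse TF-IDF dicts
```

In practice the target encoding should be fitted on the training split only, since computing category means over rows that are later used for evaluation leaks the target into the features.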