E-resources
Peer reviewed Open access
  • A novel feature engineered-...
    Hussain, Saddam; Mustafa, Mohd. Wazir; Jumani, Touqeer A.; Baloch, Shadi Khan; Alotaibi, Hammad; Khan, Ilyas; Khan, Afrasyab

    Energy Reports, November 2021 (2021-11-01), Volume 7
    Journal Article

    This paper presents a novel supervised machine learning (ML)-based electricity theft detection approach that combines a feature-engineered CatBoost classifier with the SMOTETomek algorithm. In contrast to previous literature, where missing observations are either ignored or imputed with average values, this work uses the k-nearest neighbor (kNN) technique for missing-data imputation, which yields an accurate and realistic estimate of the missing values. To mitigate bias toward the majority class, the proposed model uses the SMOTETomek algorithm, which counteracts this effect by balancing over-sampling and under-sampling. In a later stage of the proposed non-technical loss (NTL) detection framework, the Feature Extraction based on Scalable Hypothesis tests (FRESH) algorithm extracts and selects the most relevant features from the dataset. The model is then trained with the CatBoost algorithm to classify consumers into two categories: genuine and theft. Finally, the tree-SHAP algorithm is used to interpret the model's individual predictions. To validate the proposed approach, its performance is compared with that of gradient boosting algorithms such as XGBoost and LightGBM, ensemble bagging and boosting models, and other conventional ML models, using five widely used performance metrics: precision, accuracy, F1 score, Cohen's kappa, and the Matthews correlation coefficient (MCC). The proposed technique achieved an accuracy of 93% and a detection rate of 92%, significantly higher than all the competing algorithms considered, on the same dataset with identical hyperparameters.
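    The kNN imputation step can be illustrated with a minimal pure-Python sketch. It assumes each row is one consumer, each column is one consumption reading, and missing readings are stored as None. The function name knn_impute, the default k=2, and the distance rule (Euclidean over the columns both rows observe, rescaled by the fraction of shared columns) are illustrative choices, not details taken from the paper; scikit-learn's KNNImputer is a production alternative.

```python
import math


def knn_impute(rows, k=2):
    """Hypothetical helper: fill None entries with the mean of the k nearest
    rows that have that column observed. Returns a new list; input is untouched."""
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Distance from row i to every other row, over columns observed in both.
        candidates = []
        for m, other in enumerate(rows):
            if m == i:
                continue
            shared = [j for j in range(n_cols)
                      if row[j] is not None and other[j] is not None]
            if not shared:
                continue
            sq = sum((row[j] - other[j]) ** 2 for j in shared)
            # Rescale so rows with fewer shared columns are not unfairly "close".
            candidates.append((math.sqrt(sq * n_cols / len(shared)), m))
        candidates.sort()
        for j in missing:
            # Only neighbours that actually observed column j can donate a value.
            donors = [rows[m][j] for _, m in candidates
                      if rows[m][j] is not None][:k]
            if donors:
                filled[i][j] = sum(donors) / len(donors)
    return filled


readings = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],  # one missing reading
    [10.0, 20.0, 30.0],
    [1.2, 2.4, 3.2],
]
print(knn_impute(readings))
```

    Here the missing value is filled from the two consumers with similar usage (2.0 and 2.4, giving 2.2) rather than from the column average, which the outlying third consumer would pull upward.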
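    The balancing idea behind SMOTETomek can be sketched in pure Python: SMOTE over-samples the minority (theft) class by interpolating between a minority sample and one of its k nearest minority neighbours, and Tomek links (opposite-class pairs that are each other's nearest neighbour) are then removed to clean the class boundary. This is an illustrative sketch, not the paper's configuration: the helper names, k, the seed, and the choice to drop both members of each link are assumptions. The imbalanced-learn library provides a production SMOTETomek in imblearn.combine.

```python
import math
import random


def smote(X_min, n_new, k, rng):
    """Create n_new synthetic samples on segments between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        x = X_min[i]
        neighbours = sorted((p for j, p in enumerate(X_min) if j != i),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def tomek_links(X, y):
    """Index pairs (i, j) of opposite-class mutual nearest neighbours."""
    nn = [min((j for j in range(len(X)) if j != i),
              key=lambda j: math.dist(a, X[j]))
          for i, a in enumerate(X)]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def smote_tomek(X, y, minority=1, k=3, seed=0):
    """Hypothetical helper: oversample to parity, then drop Tomek-link pairs.
    Assumes at least two minority samples."""
    rng = random.Random(seed)
    X_min = [x for x, label in zip(X, y) if label == minority]
    n_new = max(0, sum(1 for label in y if label != minority) - len(X_min))
    X_res = [list(x) for x in X] + smote(X_min, n_new, k, rng)
    y_res = list(y) + [minority] * n_new
    drop = {idx for pair in tomek_links(X_res, y_res) for idx in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 0], [0, 2], [2, 2], [2, 1],
     [5, 5], [6, 5], [5, 6]]
y = [0] * 8 + [1] * 3
X_bal, y_bal = smote_tomek(X, y, minority=1, k=2)
print(y_bal.count(0), y_bal.count(1))
```

    Because each Tomek link pairs one sample from each class and the links are disjoint, removing both members keeps the two classes at equal size after SMOTE has equalised them.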
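    The FRESH stage extracts many candidate features per consumer, tests each one for association with the class label, and keeps only those that survive a false-discovery-rate correction. The pure-Python sketch below shows that flow under stated assumptions: the four-feature set is hypothetical (the tsfresh library, which implements FRESH, computes far more), the per-feature test is a two-sided Mann-Whitney U test with a normal approximation, and selection uses the Benjamini-Yekutieli procedure at an assumed FDR of 0.05.

```python
import math
import statistics


def extract_features(series):
    # Hypothetical, deliberately tiny feature set for one consumer's readings.
    return {"mean": statistics.fmean(series), "std": statistics.pstdev(series),
            "max": max(series), "min": min(series)}


def _ranks(values):
    """1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided p-value, normal approximation, no tie correction in the variance."""
    n1, n2 = len(x), len(y)
    ranks = _ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_yekutieli(pvalues, fdr=0.05):
    """Indices of hypotheses rejected under the Benjamini-Yekutieli procedure."""
    m = len(pvalues)
    c_m = sum(1 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / (m * c_m):
            cutoff = rank
    return {order[r] for r in range(cutoff)}


def fresh_select(series_list, labels, fdr=0.05):
    """Hypothetical helper: names of features relevant to the theft label (1)."""
    feats = [extract_features(s) for s in series_list]
    names = list(feats[0])
    pvals = []
    for name in names:
        theft = [f[name] for f, lab in zip(feats, labels) if lab == 1]
        genuine = [f[name] for f, lab in zip(feats, labels) if lab == 0]
        pvals.append(mann_whitney_p(theft, genuine))
    return [names[i] for i in sorted(benjamini_yekutieli(pvals, fdr))]


genuine = [[20 + g, 20 + g, 22 + g, 22 + g] for g in range(10)]
theft = [[5 + t, 5 + t, 7 + t, 7 + t] for t in range(10)]
print(fresh_select(genuine + theft, [0] * 10 + [1] * 10))
```

    In this toy data every consumer's readings have the same spread, so "std" carries no information about the label and is rejected, while the level-based features are kept.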
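    All five evaluation metrics named in the abstract can be derived from the binary confusion matrix. The sketch below computes them in pure Python, with "theft" as the positive class. It assumes "detection rate" means recall (true positive rate), which is the usual reading in NTL work but is not defined in the abstract; the function name binary_metrics is hypothetical. scikit-learn's sklearn.metrics module offers equivalents such as cohen_kappa_score and matthews_corrcoef.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: confusion-matrix metrics for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # assumed "detection rate"
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Cohen's kappa: observed agreement corrected for chance agreement p_e,
    # where p_e comes from the predicted and actual class proportions.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa, "mcc": mcc}


# 4 theft and 6 genuine consumers: 3 thieves caught, 1 missed, 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

    On this small example accuracy is 0.8 while kappa and MCC are both 7/12 (about 0.58), which shows why chance-corrected metrics are reported alongside accuracy on imbalanced theft data.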