Multi-Target Energy Disaggregation using Convolutional Neural Networks

Non-Intrusive Load Monitoring (NILM) has become popular for smart meters in recent years due to its low-cost installation and maintenance. However, it requires efficient and robust machine learning models to disaggregate the energy of individual electrical appliances from the mains. This study investigated NILM in the form of direct point-to-point multiple and single target regression models using convolutional neural networks. Two model architectures were compared using five different metrics on two benchmark datasets (ENERTALK and REDD). The experimental results showed that the multi-target disaggregation setting is more complex than single-target disaggregation. For the multi-target setting on the ENERTALK dataset, the highest individual F1-score is 95.37% and the overall average F1-score is 75.00%. Better results were obtained for the multi-target setting on the REDD dataset, with a higher overall average F1-score of 83.32%. Additionally, the robustness and knowledge transfer capability of the models through cross-appliance and cross-domain disaggregation was demonstrated by training for a specific appliance on specific data and testing on a different appliance, house and dataset. The proposed models can also disaggregate simultaneously operating appliances with high F1-scores.

Keywords: Energy disaggregation; smart meters; load monitoring; ENERTALK dataset; multi-target disaggregation; multi-target regression; NILM knowledge transfer


I. INTRODUCTION
Using energy in an efficient manner has become one of the highest concerns for both utility and end-users nowadays, as the world is facing challenging problems including depletion of natural resources and emissions of environmentally hazardous gases. To be able to use electrical energy efficiently, first, both utility and consumers should know the amount of energy consumption of individual appliances. For this purpose, cost effective Non-Intrusive Load Monitoring (NILM) is considered a plausible alternative. NILM is a method that can deduce energy consumption of individual appliances from aggregated smart meter power data recorded at a single source. This process is based on software techniques and requires effective and efficient techniques for successful disaggregation. It can help understand consumer behavior and energy consumption of appliances, and hence provide feedback on how to save energy. According to [1], monitoring energy consumption can save up to 12% of electrical energy with positive impacts on natural resources and reduction of hazardous gas emission. Besides getting detailed insights of the energy usage, NILM is useful for better demand forecasting and tracing behavioral patterns of dwellers [2]. NILM can be realized using three major steps as shown in Fig. 1: (1) Data acquisition which is the collection of data by installing hardware such as smart meters, (2) Feature extraction which is the derivation of features from the collected data, and (3) Learning and inferencing which is the deployment of models such as training machine learning to make prediction. NILM can be exploited for disaggregation using either classification or regression techniques. In classification methods, the detection or identification procedure is more sophisticated when unique signatures or fingerprints are needed to formulate for classification of appliances [3], [4]. 
In regression, the appliance's on/off state or classification is obtained by leveraging the disaggregation results based on the on-threshold value of an appliance [5].
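The thresholding step described above can be sketched in a few lines. This is a minimal illustration, not the procedure of [5]: the function name, the sample values and the 15 W on-threshold are hypothetical.

```python
def on_off_states(predicted_power, on_threshold):
    """Return 1 where the estimated power reaches the on-threshold, else 0."""
    return [1 if p >= on_threshold else 0 for p in predicted_power]


# Hypothetical regression output (watts) for one appliance.
estimated = [0.0, 3.2, 48.5, 51.0, 12.9, 0.4]

# The 15 W threshold is an assumed value; in practice it is chosen per appliance.
states = on_off_states(estimated, on_threshold=15.0)
print(states)
```

The on/off sequence is derived after regression, so the same disaggregation output can serve both energy estimation and state detection.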
A number of disaggregation studies have applied various machine learning techniques, including Decision Trees (DT), K-Nearest Neighbors (KNN), Neural Networks (NN), Convolutional Neural Networks (CNN), Long Short Term Memory (LSTM), Denoising Auto Encoder (deAE), Sequence-to-sequence (seq2seq) and Sequence-to-point (seq2point) learning, Subtask Gated Network (SGN) and many others [6]-[13]. Sequence-to-sequence and sequence-to-point are recent NILM disaggregation paradigms that predict energy consumption directly, with more promising results than the Hidden Markov Model (HMM) and its variants. In these techniques, however, the choice of receptive field (sliding window) is crucial because different types of appliances have different activation cycles. Besides, seq2seq may produce multiple predictions for a single time point, which is redundant and requires computing their mean to obtain the final prediction. Considering these points, this work formulates disaggregation as a direct point-to-point regression problem, motivated by the expectation that it retains the granularity of consumption information, which helps generalization and knowledge transfer in the disaggregation domain, as demonstrated in the experiment section. It also removes the need to retain contextual information, thereby reducing the computational and memory burden.
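To make the redundancy of seq2seq concrete, the following sketch merges the outputs of overlapping sliding windows by averaging every prediction that falls on the same time point. The function name and the sample windows are hypothetical, and the stride is assumed not to exceed the window length so that every time point is covered.

```python
def average_overlapping_windows(window_predictions, stride=1):
    """Merge per-window output sequences into one prediction per time point."""
    window_len = len(window_predictions[0])
    total_len = (len(window_predictions) - 1) * stride + window_len
    sums = [0.0] * total_len
    counts = [0] * total_len
    for w, window in enumerate(window_predictions):
        start = w * stride
        for offset, value in enumerate(window):
            sums[start + offset] += value
            counts[start + offset] += 1
    # Each time point receives the mean of all windows that covered it.
    return [s / c for s, c in zip(sums, counts)]


# Three hypothetical windows of length 3, shifted by one time step each.
windows = [[10.0, 20.0, 30.0], [22.0, 28.0, 40.0], [32.0, 44.0, 50.0]]
merged = average_overlapping_windows(windows)
print(merged)
```

A point-to-point model emits exactly one value per time step, so this merging pass is not needed.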
A disaggregation model can be trained either as a single-target [5], [14] or multi-target [15] regression problem, or as a single-label [2], [16]-[18] or multi-label [3], [7], [19]-[22] classification problem. Single-label, multi-label and multi-class classification have been explored in many works in the literature. To the best of our knowledge, multi-target regression models for disaggregation are still under-studied. As the principal application of NILM is to separate individual consumption from the aggregate reading, in a real-time scenario direct disaggregation results are preferable to disaggregation results obtained after classification. Moreover, multi-target disaggregation models are more suitable because they can significantly reduce training time and resource requirements. Therefore, it is important to formulate the NILM problem as a point-to-point multi-target regression model and compare its performance with single-target regression models, especially for recent datasets such as ENERTALK. It is also emphasized that a multiple-output classification or regression model poses challenges for disaggregation of multiple appliances of the same type [23], especially when the devices are multi-state or non-linear devices [24]. The works in [4], [25], [26] attempted simultaneous detection of loads using different algorithms and features. However, their scope was limited to classification of the load, and there is still a need to explore this problem using regression models.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 10, 2020
Keeping the above mentioned issues as main focus, this paper has the following contributions. Energy disaggregation is formulated as the direct point-to-point multi-target disaggregation learning that will help retain granular consumption meta information to foster generalization and knowledge transfer. The proposed multi-target disaggregation approach has highest individual and average F1-score of 95.37% and 75% respectively for ENERTALK dataset. For REDD dataset, the highest individual F1-score is 95.06% and overall average is 83.32%. For generalization, robustness and knowledge transferability of multiple and single target point-to-point disaggregation models, the highest F1-scores are 96.40% and 93.50% for single and multiple target, respectively. Furthermore, this study provides a base for NILM researchers to compare their results, especially for ENERTALK dataset which is a relatively recent dataset with large number of houses.
The remainder of this paper is organized as follows. Section II reviews related work and Section III describes the problem formulation and methodology. Section IV presents the details about experiments, results and discussions on ENERTALK and REDD. Finally, the paper summary and conclusions are presented in Section V.

II. RELATED WORKS
Shin and his team [13] proposed Subtask Gated Network (SGN) for non-intrusive load monitoring in 2019. Their model used a classification subnetwork and a regression subnetwork and then the output of classification is gated with that of regression. A concise study of appliance features used for NILM is presented in [27] using Random Forest (RF) and Recursive Feature Elimination (RFE). Mengistu et al. [28] described an online cloud-based NILM system using HMM and Mean-Shift Clustering (MSC) algorithm in an unsupervised fashion. Unlike previous works, the study in [29] investigated the data reduction strategy in aggregated power signals by applying non-uniform subsampling (NUS).
A graph-based semi-supervised multi-label classification model based on the active/inactive (on/off) state of appliances was proposed in [19] using three graph-based algorithms, i.e. Local and Global Consistency (LGC), Gaussian Fields and Harmonic Functions (GFHF), and Manifold Regularization (MR). Another multi-label classification of appliances using RAndom k-labELsets (RAkEL) with Decision Tree (DT) was explored in [7]. However, it was observed that the low-power consumption appliances were not correctly identified. Kim and Lee [3] investigated multi-label classification using audio signal processing techniques: Spectrogram, Mel-Frequency Cepstral Coefficient (MFCC) and Mel-spectrogram. The spectrogram-based feature gave the most promising results. Inspired by the success of CNN, the authors in [16] proposed a novel appliance identification method using CNN for feature extraction and Adaptive Linear Programming Boosting (ALPBoost) for classification. An accuracy of 95.4% for single appliance identification and 91.8% for multiple appliance identification was achieved. In the literature, voltage-current (V-I) trajectories were found to be powerful for formulating appliance signatures. In [21] and [22], appliances were classified using V-I trajectories converted to grey-scale and color-coded images, respectively. Though the models in [21] were able to detect a large number of appliances, the washing machine, fan, fridge and air conditioner were identified with lower scores. Reference [22] used a transfer learning methodology based on AlexNet. The authors in [17] used wavelet coefficients to identify four appliances using Decision Tree (DT) and Nearest Neighbor (NN) classifiers in a semi-supervised learning setting.
Jiang et al. [5] investigated the on/off detection of appliances and energy disaggregation using CNN, RNN and WaveNet with fast sequence-to-point learning. Kaselimi et al. [30] exploited a multi-channel CNN-based architecture to include multiple input features in sequence-to-sequence (seq2seq) learning. However, the energies of multi-state devices were not correctly estimated. Schirmer and his team [14], as in Kaselimi's work, proposed a two-state disaggregation model using DNN and temporal contextual information (TCI). The performance for single-state and multi-state devices without power peaks and for non-linear devices is relatively low. How different combinations of statistical and electrical features influence the disaggregation of various device types was studied in [6]. The authors in [9] studied energy disaggregation using CNN, LSTM and CNN+LSTM on the REDD dataset. In [31], a causal 1-D convolutional neural network based power disaggregation system (Wave-nilm) was proposed and tested on the AMPds2 dataset. It was noticed that using reactive power (Q) as an input feature increases performance. In [32], a Markov model was used to relate an activity chain to each occupant of a household; the energy usage per appliance was then calculated, and these power usages were grouped under certain appliance categories. Appliance recognition, and thereby energy disaggregation, using a high-frequency spectrogram feature (Short Time Fourier Transform (STFT)) was conducted in [33]. The authors in [15] used a composite deep LSTM (CD-LSTM) to study disaggregation in a multi-target setting. This is the only work that used multi-target disaggregation. However, their work was based on the sequence-to-sequence paradigm, which has the drawbacks mentioned in the introduction section.

III. PROBLEM FORMULATION AND METHODOLOGY
The purpose of NILM is to deduce the individual appliance-level power consumption from the total signal recorded through a smart meter. Conversely, the total aggregated energy of a residence can be estimated from the consumption of all individual appliances. This can be mathematically expressed as [24]:

$$p_t = \sum_{i} p_t^{(i)} + e_t \tag{1}$$

where $p_t$ is the aggregated power consumption at time $t$, $p_t^{(i)}$ is the individual appliance power consumption at time $t$, and $e_t$ is the line loss or measurement error at time $t$. Based on Eq. 1, the energy disaggregation problem is defined as the direct point-to-point estimation of $\hat{p}_t^{(i)}$, the estimated energy of the $i$-th appliance at time $t$, given the ground truth consumption $p_t^{(i)}$. In exact terms, this estimation can be achieved by formulating the disaggregation either as a classification or as a regression problem.
In this work, as described in the introduction section, NILM is formulated as a regression problem, which can have significant advantages over classification. Classification requires the deployment of various algorithms to identify the various working states of household devices. This becomes more challenging when devices with different levels of energy demand operate simultaneously, requiring high-sampling-rate data to create unique signatures that differentiate one device from another. Moreover, keeping track of on/off timestamps and on/off durations, and calculating the average load consumption during specific 'on' periods, makes the algorithms more computationally demanding. In sequence-to-sequence and sequence-to-point learning, training the model on input sequences long enough to cover an activation cycle requires relatively large memory because contextual information must be tracked. This can be prohibitive in a low-resource environment. A regression model, on the other hand, does not require these operations, and the per-appliance disaggregation value is obtained directly from the regression output layer. Additionally, a regression model provides detailed information about energy usage at every time point, which a classification model cannot.

A. Multi-Target Disaggregation Problem
Let $X$ and $Y$ be the two sets of input and target variables, respectively; each set with features derived from aggregate consumption and individual appliance ground truth consumption at specific time points (steps). The training dataset is defined as

$$D = \{(\mathbf{x}_t, \mathbf{y}_t)\}_{t=1}^{T}$$

where $T$ is the total number of time steps. Each instance of $D$ consists of an input vector of $p$ independent variables (predictors) and an output vector of $q$ target variables at time $t$, which are respectively defined as

$$\mathbf{x}_t = \left(x_t^{(1)}, \ldots, x_t^{(p)}\right), \qquad \mathbf{y}_t = \left(y_t^{(1)}, \ldots, y_t^{(q)}\right)$$

with predictor index $j = 1, 2, \ldots, p$, target index $i = 1, 2, \ldots, q$ and $t = 1, 2, \ldots, T$. The task is to learn a multi-target disaggregation model $M$ from instances of $D$ such that a function $f$ maps the vector $\mathbf{x}$ consisting of $p$ aggregate feature values to a vector $\mathbf{y}$ consisting of $q$ consumption values.
The trained model $M$ can be used to predict the power consumption of all included appliances, denoted by $\hat{\mathbf{y}}_t = \left(\hat{y}_t^{(1)}, \ldots, \hat{y}_t^{(q)}\right)$. For the training loss, the Mean-Squared-Error (MSE) of single appliance disaggregation can be adopted for multiple devices as follows:

$$\mathrm{MSE} = \sum_{i=1}^{q} \frac{1}{T} \sum_{t=1}^{T} \left(y_t^{(i)} - \hat{y}_t^{(i)}\right)^2 \tag{4}$$

Restricting Eq. (4) to a single target ($i = 1$) transforms the multi-target disaggregation problem into a single-target disaggregation problem.
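As an illustration of the training loss, the sketch below computes the MSE of each appliance over all time steps and sums the results across appliances. Summing rather than averaging across appliances is an assumption made here, chosen to match the "total MSE for all appliances" wording used in the experiments; the function name and sample values are hypothetical.

```python
def multi_target_mse(y_true, y_pred):
    """Sum over appliances of the per-appliance mean squared error.

    y_true, y_pred: lists of T rows, each row holding q appliance values.
    """
    n_steps = len(y_true)
    n_targets = len(y_true[0])
    total = 0.0
    for i in range(n_targets):
        squared = sum((y_true[t][i] - y_pred[t][i]) ** 2 for t in range(n_steps))
        total += squared / n_steps
    return total


# Two time steps, two appliances (hypothetical watt values).
y_true = [[100.0, 0.0], [120.0, 50.0]]
y_pred = [[90.0, 0.0], [120.0, 40.0]]
loss = multi_target_mse(y_true, y_pred)

# With a single target column the same function is the ordinary MSE.
single_loss = multi_target_mse([[100.0], [120.0]], [[90.0], [120.0]])
print(loss, single_loss)
```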

B. CNN Multi-Target Regression Model
CNN was originally applied to image processing and computer vision, where the input and each layer are multi-dimensional [34]. In this paper, however, NILM is formulated using a 1-D CNN because load disaggregation data is a uni-dimensional time series that tracks the energy consumption of each appliance at a specific point. Accordingly, the aggregate consumption features at each time stamp are convolved with a kernel that captures the relation between the features, so that part of the aggregate consumption can be mapped to the target appliance consumption value, and the weights are optimized accordingly. Inspired by [35], two CNN architectures are designed as follows. The first, referred to as CNN model-I (CNN1M), is built using two blocks of convolutional layers; the first block is followed by a max-pooling layer and the second block by a global max-pooling layer. The first convolutional block consists of three convolutional layers, each with 64 filters of size 2, and the second block also has three convolutional layers, each with 128 filters of size 2. The global max-pooling layer acts as a bottleneck layer from which each appliance branches as a separate output, with a dense layer of 512 neurons and a final output layer with one neuron. A deeper model, called CNN model-II, has one more convolutional block of three layers, each with 32 filters of size 2; all other parameters are the same as in CNN model-I. For CNN model-II, two variants were used: one for single target (CNN2S) and the other for multi-target (CNN2M), where S stands for Single and M for Multiple. For both architectures, the 'stride' and 'learning rate' are set to 1 and 0.001, respectively. Also, 'ReLU' is used as the activation for all hidden layers and 'linear' as the activation for all output layers. The architectures are shown in Fig. 2 and Fig. 3.
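A rough way to see the effect of sharing the convolutional trunk across appliances is to count trainable parameters for CNN model-I. The sketch below assumes a single input channel and biases in every layer, neither of which is stated in the text; pooling layers contribute no parameters. All function names are hypothetical.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus biases of one 1-D convolutional layer."""
    return kernel_size * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


def cnn_model_params(block_filters, n_appliances, kernel_size=2,
                     in_channels=1, dense_units=512):
    """Count parameters: shared conv blocks plus one output head per appliance."""
    total = 0
    channels = in_channels  # assumed: one input channel
    for filters in block_filters:
        for _ in range(3):  # three convolutional layers per block
            total += conv1d_params(kernel_size, channels, filters)
            channels = filters
    # Global max-pooling leaves one value per filter of the last block.
    head = dense_params(channels, dense_units) + dense_params(dense_units, 1)
    return total + n_appliances * head


# CNN model-I: blocks of 64 and 128 filters, kernel size 2.
cnn1_single = cnn_model_params([64, 128], n_appliances=1)
cnn1_multi = cnn_model_params([64, 128], n_appliances=6)
print(cnn1_single, cnn1_multi)
```

Under these assumptions, one multi-target model for six appliances needs about half the parameters of six separate single-target models, because the convolutional blocks are counted once.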

IV. EXPERIMENTAL WORK
For all experiments, a machine with an NVIDIA GeForce GTX 950 GPU with 4GB of dedicated graphics memory and 16GB of RAM was used. The implementation scripts were written in Python 3.7.3, and the CNN models were built with Keras running on a TensorFlow back-end. For all experiments, data were normalized with the L2-norm as implemented in scikit-learn's preprocessing.normalize(). A sketch of the overall workflow is given in Fig. 4.
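For readers unfamiliar with this normalization, the following dependency-free sketch mirrors what scikit-learn's `preprocessing.normalize()` does with its default arguments (`norm='l2'`, `axis=1`): each sample (row) is scaled to unit Euclidean length. That the defaults were used is an assumption; the function name and sample values are hypothetical.

```python
import math


def l2_normalize_rows(rows):
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        if norm == 0.0:
            normalized.append(list(row))
        else:
            normalized.append([v / norm for v in row])
    return normalized


# Two hypothetical samples with two features each.
features = [[3.0, 4.0], [0.0, 0.0]]
scaled = l2_normalize_rows(features)
print(scaled)
```

Row-wise scaling keeps the ratios between the features of one time point but discards their absolute magnitude.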

A. Energy Disaggregation based Metrics
Five performance measures have been reported to evaluate the proposed models. In the following formulae, $y_t^{(i)}$ and $\hat{y}_t^{(i)}$ denote the true and estimated (disaggregated) energy of appliance $i$ at time $t$, respectively.
2) Normalized Disaggregation Error (NDE): It is the square root of the ratio between the sum of squared differences of estimated (disaggregated) energy and true energy, and the sum of squares of the actual energy. It is a slight variation of the metric used in [36]. Its mathematical formula is

$$\mathrm{NDE}^{(i)} = \sqrt{\frac{\sum_{t} \left(\hat{y}_t^{(i)} - y_t^{(i)}\right)^2}{\sum_{t} \left(y_t^{(i)}\right)^2}}$$

3) Normalized Error in Assigned Power (NEAP): The sum of the absolute differences between the disaggregated power and the true energy consumption of appliance $i$ at each time point $t$, divided by the total power consumption of the appliance:

$$\mathrm{NEAP}^{(i)} = \frac{\sum_{t} \left|\hat{y}_t^{(i)} - y_t^{(i)}\right|}{\sum_{t} y_t^{(i)}}$$

In [37], the authors have shown that error values greater than 1 are less representative of the disaggregation performance and less explainable.
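4) Energy-based Precision ($P^{(E)}$): With $y_t^{(i)}$ and $\hat{y}_t^{(i)}$ the true and estimated energy of appliance $i$ at time $t$, it can be mathematically expressed as [38]:

$$P^{(E)}_i = \frac{\sum_{t} \min\left(\hat{y}_t^{(i)}, y_t^{(i)}\right)}{\sum_{t} \hat{y}_t^{(i)}} \tag{11}$$

5) Energy-based Recall ($R^{(E)}$): It can be mathematically expressed as [38]:

$$R^{(E)}_i = \frac{\sum_{t} \min\left(\hat{y}_t^{(i)}, y_t^{(i)}\right)}{\sum_{t} y_t^{(i)}} \tag{12}$$

F1-score: It is the harmonic mean of the precision and recall and can be mathematically expressed as:

$$F_1 = \frac{2 \, P^{(E)} R^{(E)}}{P^{(E)} + R^{(E)}} \tag{13}$$

where, for the overall score,

$$P^{(E)} = \frac{1}{m} \sum_{i=1}^{m} P^{(E)}_i, \qquad R^{(E)} = \frac{1}{m} \sum_{i=1}^{m} R^{(E)}_i$$

where $m$ is the number of appliances. Equation (13) can also be used to calculate the F1-score of an individual appliance using Equations (11) and (12).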
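The metrics can be computed per appliance as sketched below. NDE and NEAP follow the verbal definitions given above. The energy-based precision and recall use the overlap form common in the NILM literature (the sum of the point-wise minimum of estimate and truth, divided by the estimated or the true total), which is assumed here to match [38]. F1 is computed as the harmonic mean. All function names and sample values are hypothetical.

```python
import math


def nde(y_true, y_pred):
    """Normalized disaggregation error."""
    num = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    den = sum(t ** 2 for t in y_true)
    return math.sqrt(num / den)


def neap(y_true, y_pred):
    """Normalized error in assigned power."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / sum(y_true)


def energy_precision(y_true, y_pred):
    """Share of the estimated energy that overlaps the true energy (assumed form)."""
    return sum(min(t, p) for t, p in zip(y_true, y_pred)) / sum(y_pred)


def energy_recall(y_true, y_pred):
    """Share of the true energy that the estimate recovers (assumed form)."""
    return sum(min(t, p) for t, p in zip(y_true, y_pred)) / sum(y_true)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical true and estimated energy of one appliance over four time points.
y_true = [0.0, 100.0, 100.0, 0.0]
y_pred = [0.0, 80.0, 120.0, 20.0]

precision = energy_precision(y_true, y_pred)
recall = energy_recall(y_true, y_pred)
print(nde(y_true, y_pred), neap(y_true, y_pred), precision, recall,
      f1_score(precision, recall))
```

These functions divide by the true or estimated total, so they are undefined for an appliance that never consumes (or is never predicted to consume) energy in the evaluated window.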

B. Datasets and Data Preparation
Two public datasets were used to evaluate the proposed models in this work. The first is the ENERTALK dataset [39], which contains consumption data collected from 22 residences in Korea at a 15 Hz sampling rate for both individual and total (aggregate) consumption. One of the challenging and crucial aspects of energy disaggregation with this dataset is the alignment of the target appliance and aggregate meter readings. With this in mind, the daily consumption of all appliances, including the aggregate reading, was concatenated horizontally and vertically, and the data were resampled to one second. To handle missing data, one-step backward filling was used and the remaining missing values were subsequently removed. The other dataset is REDD [40], which contains data collected from six residential buildings in the USA. For this work, the low-frequency data of the six houses were used, where the mains consumption is sampled at 1 Hz and all appliances are sampled at 1/3 Hz. These different sampling rates were aligned by horizontal concatenation and downsampling to 3 seconds. Afterwards, L2-norm preprocessing was applied to the resulting dataframe of both datasets to normalize the feature values.
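The missing-data handling described above can be sketched without any dataframe library. The sketch assumes that "one-step backward filling" means a missing value is filled from the immediately following sample only, so gaps longer than one step are only partly filled and the rest is dropped. Missing readings are represented as `None`; the function names and sample values are hypothetical.

```python
def backfill_one_step(values):
    """Fill a missing value (None) from the next sample when that sample is present."""
    filled = list(values)
    for t in range(len(values) - 1):
        # Only original readings are used as fill sources, so a fill never chains.
        if values[t] is None and values[t + 1] is not None:
            filled[t] = values[t + 1]
    return filled


def drop_missing_rows(mains, appliance):
    """Keep only time points where both aligned readings are present."""
    return [(m, a) for m, a in zip(mains, appliance)
            if m is not None and a is not None]


# Hypothetical aligned one-second readings (watts).
mains = [210.0, None, 230.0, None, None, 250.0]
fridge = [90.0, 95.0, None, 100.0, 100.0, 110.0]

mains_filled = backfill_one_step(mains)
fridge_filled = backfill_one_step(fridge)
rows = drop_missing_rows(mains_filled, fridge_filled)
print(mains_filled)
print(rows)
```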
The active power (P) and reactive power (Q) are the two original features available in ENERTALK dataset. Based on these two features, some other features were extracted based on power triangle shown in Fig. 5. For REDD dataset, only active power for two mains are available and are used as features.
Active Power (P): $P = V I \cos\Phi$, where $V$ is voltage, $I$ is current and $\Phi$ is the phase angle.
Reactive Power (Q): $Q = V I \sin\Phi$
Apparent Power (S): $S = \sqrt{P^2 + Q^2}$
Power Factor (PF): $PF = P / S$, where $P$ and $S$ are as described earlier.
Besides the above features, six additional features are extracted for the ENERTALK dataset: (1) difference power between S and P ($DP_{sp}$), (2) difference power between P and Q ($DP_{pq}$), (3) difference power between S and Q ($DP_{sq}$), (4) average of P, Q and S ($P_{avg}$), (5) sine of the phase angle ($Sin_{Ph}$), and (6) tangent of the phase angle ($Tan_{Ph}$).
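The ten ENERTALK features can be derived from a single (P, Q) reading using the power triangle, as in the sketch below. The direction of each difference (for example S minus P rather than P minus S) and the zero fallbacks for divisions are assumptions, since the text gives only the feature names; the function and dictionary keys are hypothetical.

```python
import math


def power_features(p, q):
    """Derive the ten features from active power p and reactive power q."""
    s = math.hypot(p, q)  # apparent power from the power triangle
    return {
        "P": p,
        "Q": q,
        "S": s,
        "PF": p / s if s else 0.0,      # power factor, cos(phase angle)
        "DP_sp": s - p,                 # assumed sign convention
        "DP_pq": p - q,                 # assumed sign convention
        "DP_sq": s - q,                 # assumed sign convention
        "P_avg": (p + q + s) / 3.0,
        "Sin_Ph": q / s if s else 0.0,  # sine of the phase angle
        "Tan_Ph": q / p if p else 0.0,  # tangent of the phase angle
    }


# Hypothetical reading: 300 W active and 400 var reactive power.
features = power_features(300.0, 400.0)
print(features)
```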

C. Results for ENERTALK Dataset
For all the experiments with this dataset, 20% of the total data is kept for testing, whereas the remaining 80% is divided again in an 80:20 ratio for training and validation. A batch size of 512 and 20 epochs are used for training the models, with the total Mean Squared Error (MSE) over all appliances as the loss function. After all the data preparation steps were applied as explained earlier, 1983102 instances were available. This experiment used S and PF as features besides P and Q. From the scores shown in Table I, it is observed that the highest F1-score of
2) Experiments on House12 Data: With the data of House12, the CNN model-II was validated using multi-state, continuously varying (nonlinear) consumption and always-on devices. The data consist of 4 months of aggregate and appliance-level consumption for two WMs, TV, RC, KR and R. A total of ten features mentioned in Section IV-B from 8639267 data samples were used for multi-target and single-target regression. All the experimental results are shown in Table II. As seen from the table, the refrigerator has the highest F1-score, 95.37% in the multi-target and 99.34% in the single-target setting. Both actual and predicted energy consumption of the refrigerator for the multi-target setting are shown in Fig. 6. These results confirm that the CNN model-II is robust enough to disaggregate energy with a high F1-score. It is noteworthy that the model disaggregates the energy of two of the three multi-state devices with acceptable F1-scores, along with the other devices except WM2. For the multi-target setting, WM2 has the lowest F1-score of 43.47%, and for the single-target setting, WM1 has the lowest disaggregation score of 32.47%. Furthermore, in terms of precision and recall for the multi-target setting, the refrigerator has the highest scores of 97.59% and 93.24%, respectively. WM2 has the worst NDE and NEAP scores in the multi-target disaggregation setting.
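The nested 80:20 split described for the ENERTALK experiments amounts to roughly 64% training, 16% validation and 20% test data. The sketch below works out the set sizes for the 1983102 instances reported above. Rounding down with integer division is an assumption, as the text does not say how fractional sizes were handled; the function name is hypothetical.

```python
def split_sizes(n_samples, test_pct=20, val_pct=20):
    """Return (train, validation, test) sizes for a nested percentage split."""
    n_test = n_samples * test_pct // 100   # hold out the test share first
    n_rest = n_samples - n_test
    n_val = n_rest * val_pct // 100        # then split the remainder again
    n_train = n_rest - n_val
    return n_train, n_val, n_test


n_train, n_val, n_test = split_sizes(1983102)
print(n_train, n_val, n_test)
```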

D. Results for REDD Dataset
This experiment was executed on a total of 1099738 data samples resulting from the combination of House1, House2 and House3 data of the REDD dataset for four common kitchen appliances (three multi-state devices and one continuous consumption device), namely Microwave (MW), Refrigerator (R), Dishwasher (DW) and Washer Dryer (WD). The CNN model-I was trained with 20 epochs and a batch size of 512. The CNN model-II was modified according to the features and data used, i.e. a filter size of 1 and 16 epochs were used. Moreover, the MaxPooling layer after the first convolutional block was

E. Discussion and Analysis
1) ENERTALK House00: For the appliances of this house, the multi-target setting has an average F1-score of 56.23% and a highest individual F1-score of 82.13% for WP, which also scores highest in the single-output setting. In the single-output setting, WP has the highest F1-score of 94% and MW has the lowest F1-score of 73.03%. The performance of the other appliances is given in Table I.
2) ENERTALK House12: The main purpose of this experiment is to investigate how efficiently the CNN model-II can disaggregate energy when multiple appliances of mixed types operate simultaneously. The experiment considered three multi-state devices (two WMs and one RC), one nonlinear device (TV) and two continuous consumption devices (R and KR). For this, the multi-target setting was used and the results are shown in Table II. As seen from the table, the highest F1-score of 95.37% is for the refrigerator, and the average over all participating appliances is 75%. For the multi-state device (RC), the F1-score is as high as 76.58%. This experiment also suggests that CNN model-II can serve as a future prototype for energy disaggregation of multiple multi-state devices operating simultaneously. It should also be noted that our point-to-point multi-target CNN model can disaggregate the nonlinear device with an F1-score of 81.13% in the presence of multiple multi-state devices, which are also hard to disaggregate.

3) Disaggregation Performance Across ENERTALK Houses: For comparisons, please refer to Table I and Table II. House00 has a better overall average performance than House12 for the single-target model, whereas for the multi-target setting House12 performs better overall than House00. Comparing the appliances common to both houses (TV, KR, R, WM and RC) one by one in the single-target setting, TV and RC perform better in House00 than in House12, while WM1, KR and R perform better in House12 than in House00. In the multi-target setting, all appliances perform better in House12 than in House00, which is why the overall performance of House12 is better. Please note that the above analyses are based on F1-scores. To explore the differences, the data patterns of all appliances in all houses were analyzed. From the data statistics of the houses, it can generally be concluded that the usage pattern of an individual appliance and its duration of usage have an impact on performance. The usage pattern and duration directly affect the data distribution and the range of the data, which in turn strongly influence performance. A neural network maps inputs to outputs through certain mathematical operations, and the model is penalized for wrong mappings by adjusting its weights. This adjustment becomes difficult when the data are sparsely distributed, with most feature values at zero and some very large. The task becomes even more difficult when the model has to extract a small portion out of large values (blind-source separation), as is the case in energy disaggregation. This point applies particularly to appliances that are used less frequently and when disaggregation is done in an event-less fashion.
In addition, from the analysis of the average performance of the multi-target regression models, it is observed that the highest F1-score of 75% is achieved for House12 and the lowest F1-score of 56.23% for House00. The higher overall F1-score of House12 is noteworthy; it may be due to the fact that multi-target models produce better results when the targets are correlated with one another. The correlation between the targets depends on the data distribution, which, as mentioned above, depends on the usage pattern and duration. The study of correlation among the targets is beyond the scope of this work. In terms of NDE and NEAP as well, the multi-target model in House12 performs better than both the single-target and multi-target models in House00. Moreover, for the non-linear device (TV) in the two houses under the multiple-output regression model, the highest F1-score achieved is 81.13%, for House12.

4) REDD Dataset:
In combined data of House1, 2 and 3, the CNN1M has the highest individual F1-score of 95.06% with R and the overall average F1-score is 83.32%. In terms of individual scores CNN Model-I (CNN1M) has the best performance. This combined data experiment is further elaborated in IV-F.

F. Disaggregation Performance and Model Generalization
In general, according to our detailed observation and analysis, the disaggregation performance of the multi-target model is lower than that of the single-target model because, in the multi-target setting, the weights must adjust to multiple targets with different statistical properties using common features and parameter sharing. However, multi-target models require fewer training resources and less time than single-target models. The single-target model, on the other hand, can be customized for a specific appliance, but at the expense of extra training resources and time.
The generalization and robustness of the single-target and multi-target disaggregation models are tested using CNN Model-II and CNN Model-I for the ENERTALK and REDD datasets, respectively. To check the robustness of the multi-target regression model, the model trained for four appliances (MW, R, DW, WD) on the combined data of House1+2+3 in the REDD dataset was used to predict the energy of selected appliances of House 4, 5 and 6 of the same dataset. This is more challenging than the single-output setting because there are ghost (unknown) appliances and miscellaneous outlets in House4 and House6. The scores and estimations are shown in Table IV. In terms of cross-appliance disaggregation, the overall average F1-score is 73.99%. Cross-appliance disaggregation means using a model trained on one specific appliance to disaggregate other, different appliances. From Table VI, it is noticed that the model trained on KR of House21 data has disaggregated the energy of most of the appliances in House00, especially the nonlinear device TV, with an F1-score of 88.83%, and permanent consumer devices that have different operating states, such as KR and R, with F1-scores of 69.63% and 96.40%, respectively. In terms of recall, all seven devices score above 92%. This emphasizes that direct point-to-point disaggregation models are powerful in learning granular information about consumption. Fig. 8 shows the disaggregation analysis of four selected appliances from House00 of ENERTALK. The scores are reported for three different scenarios: (1) the model trained and tested on House00 (CNN2M), (2) the model trained on REDD combined data (CNN1M) and tested on House00 (ENERTALK), and (3) the model trained on KR (CNN2S) of House21 (ENERTALK) and tested on House00 (ENERTALK). According to the scores in this figure, all appliances except MW have higher disaggregation F1-scores in scenarios 2 and 3 than in scenario 1.
From the previous analysis of the generalization and robustness of single-target and multi-target point-to-point disaggregation, it can be emphasized that cross-appliance and cross-domain disaggregation perform very well.

V. CONCLUSION
This work investigated the NILM problem in terms of training disaggregation algorithms in multiple and single target regression settings, i.e. energy disaggregation of simultaneously operating multiple devices of the same type. We used the ENERTALK dataset, which is the latest publicly available energy dataset, containing records of 22 Korean houses. We also used the REDD dataset, which is the first released energy dataset. We deployed state-of-the-art deep learning algorithms based on CNN and evaluated the disaggregation performance using five energy-based performance metrics. In NILM, aligning the consumption of the target appliances with the aggregate consumption is particularly challenging. To achieve this, for ENERTALK we used vertical and horizontal concatenation of appliances and dates, with mean resampling of the data to one second, and for REDD we used horizontal concatenation. For the experiments, the dataframe was divided into training, validation and testing sets. To better reflect the generalization capability of the models, the actual prediction is made on data that was not seen during training. In ENERTALK, CNN model-II has the best disaggregation performance, with an F1-score of 95.37% for an individual appliance and 75.00% for the overall average across the dataset in the multi-target disaggregation model. Moreover, in ENERTALK, the proposed models can disaggregate the energy of non-linear devices with an F1-score of 81.13% in the multi-target model. The generalization and robustness of the models were also validated by training the model for one appliance and testing it on different appliances in different houses, which shows the knowledge transferability in NILM using the proposed direct point-to-point disaggregation models. As an overall conclusion, the proposed point-to-point regression models have demonstrated computational efficiency and disaggregation effectiveness.
As future work, other techniques such as energy disaggregation using wavelets can be explored.