Machine Learning based Electromigration-aware Scheduler for Multi-core Processors

Jagadeesh Kumar P, Mini M G
Department of Electronics Engineering
Government Model Engineering College
Thirkkakara, Cochin 21
India

Abstract—The rising performance demands in modern technology devices see the need to pack more functionality per area and are made possible with the advent of technology scaling. The extremely down-scaled, high-density processors used in such technology devices functioning at high frequencies and greater temperatures expedite various aging effects which impact the reliable lifetime of computing systems. Electromigration is considered to be an important intrinsic aging effect that reduces the useful lifetime of modern microprocessors. The objective of this work is to use machine learning methods to develop an electromigration-aware scheduler for assigning workloads to cores based on reliability and performance requirements. Aging estimation of the processor cores is performed based on the proposed computationally efficient and accurate regression-based thermal prediction models. According to experimental findings, the suggested technique can significantly extend the lifetime of multi-core architectures while allowing performance to degrade gracefully. The maximum error in the prediction of the lifetime of the cores using the proposed methodology is estimated to be 2.85%.

Keywords—Electromigration aware scheduler; useful lifetime; multi-core processor reliability; machine learning model

I. INTRODUCTION

The computing requirements of modern embedded systems in application areas such as automotive, storage, networking, and 5G necessitate the use of high-performance processors. To deliver the higher functionality per unit area that such applications demand, manufacturers build densely integrated multi-core processor chips that operate at higher speeds [1]. The resulting increase in power density and operating temperature of the processor cores can accelerate aging phenomena, including electromigration, and lower the processor's quality and reliability [2]. Managing task execution at runtime to minimize such aging effects in processor cores is an active research topic [2], [3], [4]. A rise of 10 to 15°C in the operating temperature can halve the processor lifespan [5]. The lifetime of multi-core processors can be improved by estimating the aging effects of tasks that are ready for execution. Workload assignment to cores and frequency adjustments can then be made at runtime to minimize core aging. Such strategies need to be accurate and computationally fast for real-time implementation.

Transistor aging is considered an important phenomenon that affects the reliability and lifetime of modern CMOS integrated circuits [6]. Time-Dependent Dielectric Breakdown (TDDB), Hot Carrier Injection (HCI), and Negative Bias Temperature Instability (NBTI) are the key aging effects that challenge the reliable operation of modern integrated circuits. The intrinsic effects of Electromigration (EM) and Stress Migration (SM), which are due to interconnect aging, are also significant in present CMOS technology devices [7]. New methods for investigating and predicting degradation effects are important for enhancing the lifetime reliability of modern CMOS technology devices.

The suggested method to increase the lifetime reliability of multi-core processors is presented in two parts. In the first part, regression-based machine learning models are proposed to predict the steady-state temperature of the processor cores for the incoming jobs. A fine-grained approach is followed in this work for predicting the thermal properties of the cores' processing components, since it is more suitable for understanding the localized characteristics of the cores. In the second part, a scheme for mitigating the aging effects of processor cores through a runtime frequency control policy is presented, which takes the temperature estimates of the proposed machine learning models as input. The operating speeds of the cores are chosen by taking into account both the aging effects and the performance requirements. Our experimental results show that the proposed aging-aware scheduler, guided by the developed machine learning models, can predict the core's lifetime with a maximum error of 2.85%. The average error in temperature estimation is assessed for each of the proposed models, and its maximum value is observed to be 0.492°C.

The rest of the paper is organized as follows. Significant prior work on thermal prediction and on managing the thermal challenges of processor cores is discussed in Section II. Linear regression and polynomial regression schemes for estimating the core temperature, and the concept of the proposed aging-aware scheduler for enhancing processor core lifetime reliability, are presented in Section III. An analysis of the results obtained for the proposed schemes is presented in Section IV. Finally, Section V concludes the work and outlines possibilities for further research in the field.

II. LITERATURE REVIEW

The increasing performance demands of current technology devices have prompted various studies to concentrate on optimal job scheduling methods in multi-core systems operating under real-time constraints. A significant body of published work focuses on techniques for a better performance-power trade-off in multi-core processors [8], [9], [10]. The effect of different coding styles on the balance between performance and energy consumption of the processing cores is presented in [11]. A core allocation technique that lowers the energy usage of mobile devices by engaging the LITTLE cores to the maximum extent while ensuring device performance is proposed in [12]. High-performance computing systems built on multi-core processors can reduce the execution time of complex programs and thereby improve application performance. Parallelization schemes [13] can be employed to convert the serial execution of programs into a hybrid parallel mode that takes advantage of the processing capability of multi-core processors. Reliability-aware scheduling techniques [14] can be used to reduce soft errors in heterogeneous chip-multiprocessors. The performance and power efficiency of heterogeneous multi-core CPUs can be enhanced with smart workload schedulers [15].

An aging-aware scheduler needs an accurate thermal estimate of the various logical components to make run-time decisions when a workload is executed on a processor core. Several research works have looked into the feasibility of using model-based methodologies to determine the CPU core temperature characteristics [16], [17], [18]. An architectural-level thermal behavioral modeling technique known as Thersmid [19] builds temperature models from the observed temperature and power statistics. Multiple scheduling schemes based on an efficient and simple thermal model are used in [20] to manage the operations of homogeneous processor platforms. The Thermal Estimation Accelerator (TEA) [21], a processing-element-level scheme that monitors temperature at runtime using hardware accelerators, can serve as a benchmark for Dynamic Thermal Management (DTM) methods.

For the development of thermal models, it is required to estimate thermal profiles of the benchmark tasks executing on selected cores at defined operating frequencies. HotSpot [22] is a popular tool used in the temperature estimation of processor cores and is helpful in architectural studies. Various logical components in the processor architecture are represented as its equivalent thermal resistor and capacitor values along with the thermal package information [23]. Information about the floor design and power estimations for the logical components are supplied to HotSpot to determine the temperature profile. The power estimation of the logical components can be done with the tool McPAT (Multi-core Power, Area, and Timing) [24]. Based on high-level data, such as the frequency of operation of the core, McPAT can predict the architectural level power usage of the processor core containing caches and memory controllers. A McPAT-monolithic framework is presented in [25] for the architecture modeling of 3-D hybrid monolithic multi-core systems. The work in [26] proposes a micro-architectural framework to estimate the performance and energy consumption of cores in a multi-core processor. A detailed validation of McPAT’s power models done with the help of a toolchain used in industrial practice is presented in [27]. In this study, McPAT is utilized to calculate the dynamic power of the core’s logical subsystems. McPAT requires the operation statistics of the applications and micro-architectural characteristics as its inputs to calculate the power consumption of every system component. The operation statistics of the tasks can be evaluated using the gem5 simulator [28]. CPU models with different types of memory configurations and cache organizations can be defined for the analysis. The current generation of widely used commercial Instruction Set Architectures (ISAs) including ARM and x86 are supported by gem5. 
A significant number of these simulators are utilized in the relevant fields of research since they are useful in analyzing the performance and power consumption of various processor models and can be used to validate various design options. A study of the basics of several multiprocessor simulation methodologies and a summary of the correctness of six architecture simulators including gem5 are presented in [29]. Gem5-X, a framework for system-level simulation based on gem5 [30] may be employed to assess the potential benefits of the architectural extensions for many image processing-related applications.

Computer architecture simulators can be used to closely examine the execution properties of applications that run in a processing core. Regression-based models with high computational efficiency and accuracy may well be constructed by relating the thermal figures estimated with HotSpot to the application characteristics estimated using gem5. Representative applications in open-source benchmark MiBench [31] can be used to develop and validate thermal models of typical workloads running in embedded processors. MiBench presents a collection of 35 embedded programs organized in six categories, each of which focuses on a specific segment of the embedded market. Workloads belonging to the application areas: network, security, telecommunication, and consumer from MiBench suite are used in this work. The characteristics of the tasks jpeg encode/decode, 32-bit Cyclic Redundancy Check (CRC), Dijkstra, and Secure Hash Algorithm (SHA), representing the mentioned application areas, running on x86 architecture-based processing cores are analyzed using gem5 and are used to train the regression models. The MiBench suite’s consumer device benchmarks are designed to simulate the consumer device applications found in products including Personal Digital Assistants (PDAs), scanners, and digital cameras. This category largely focuses on image processing, and one of the representative image compression and decompression technologies is JPEG encoding/decoding. The telecommunications category of applications stands close to consumer applications because of the increased demand for consumer devices with wireless communication capability. Cyclic redundancy checks (CRC) are often used in data transmission for the detection of errors. A 32-bit cyclic redundancy check is performed on a sound file from the adaptive differential pulse code modulation benchmark as part of the CRC32 test.

Devices such as switches and routers have embedded applications that fall under the network category. Finding the shortest path through a graph is a representative networking task. The Dijkstra benchmark determines the shortest route between each pair of nodes of a graph by repeatedly applying Dijkstra’s algorithm. In applications related to e-commerce, data security is a key factor. Security applications frequently employ a variety of hashing, encryption, and decryption algorithms. The Secure Hash Algorithm (SHA) is frequently used to create digital signatures and transfer cryptographic keys in a secure manner. The SHA benchmark in MiBench generates a 160-bit message digest for a given input.

Estimating the temperature and power at runtime using the tools HotSpot and McPAT is computationally expensive and limits their implementation in real-time schedulers. A regression-based model can be developed based on the thermal and power profiles of the workloads estimated using HotSpot and McPAT. Such trained and created regression-based models can quickly and accurately estimate the temperature of the logical components, and they are suitable for the successful application of aging-aware scheduling methods. A method for constructing a compact machine learning-based thermal prediction system appropriate for fast decision-making is presented in [32]. A temperature control strategy based on machine learning to determine the appropriate core frequency and encoder configuration for High-Efficiency Video Coding (HEVC) is proposed in [33]. A machine learning and simulation-based approach can be employed [34] to estimate the temperature map of a chip using the power consumption, utilization of the core, and recorded sensor temperatures. As discussed, a significant amount of the related research works published in recent years propose different approaches for determining the critical parameters associated with the workloads using software tools. Many of these works emphasize the increasing need for runtime techniques to regulate the processor temperature by changing its operating behavior. Such recent research works propose many techniques for better performance-power trade-offs. But, in our understanding, the use of scheduling techniques to reduce aging with the help of computationally efficient runtime temperature estimation methods for high-performance applications running on multi-core processors is a largely unexplored topic.

III. PROPOSED WORK

The first part of the proposed work deals with thermal profile modeling. Here, a regression-based model is employed to predict the steady-state temperature of the processing elements in a multi-core architecture. The temperature estimation model is driven by the properties of the workload. In the second part, an aging-aware scheduler is presented, whose scheduling decisions are based on the thermal estimates of the regression model proposed in the first part. The scheduler uses the thermal estimates and the performance needs of the workloads to set the operating speed of the cores so that the effect of processor aging is minimized.

A. Development of Regression-based Models

Regression analysis is a powerful multivariate statistical technique for inferring a functional relationship in a population [35]. In this work, regression analysis is used to relate the workload characteristics to the thermal effects on the processing elements. Workloads in embedded applications suit the suggested scheme because their execution characteristics, such as instruction execution behavior, memory operations, and the kind of information processed, are typically highly predictable. The execution patterns of the MiBench applications running on a hexadeca-core (16-core) homogeneous multi-core architecture are analyzed using the gem5 simulator. gem5 can be set up to operate in either Syscall Emulation (SE) mode or Full System (FS) mode. In SE mode, gem5 emulates the system calls made by applications. In FS mode, gem5 creates a bare-metal context for executing an operating system. In this work, gem5 is configured in SE mode to analyze the execution patterns of the benchmark applications. The cores of the multi-core processor use the x86 architecture. The cache hierarchy has two levels, with a private level 1 cache and a shared level 2 cache. A subset $j$ of the workload parameters that directly affect the thermal profiles of the various logical components of the core is employed in this study, i.e., $j \subseteq W$, where $W$ represents the complete set of workload characteristics analyzed using gem5.

The above-mentioned workloads from the MiBench suite, which cover several embedded application areas, are used for model development and analysis. The selected applications are analyzed using the gem5 simulation tool and the characteristics are determined. The power usage of the various functional parts of the CPU architecture is calculated using McPAT. The thermal model of HotSpot is driven by the estimated power traces, and the chip and package characteristics. The configuration file specifies the parameters of the processor core for the HotSpot tool, which are shown in Table I.

Linear Regression (LR) and Polynomial Regression (PR) are the two regression models used in this work for estimating the thermal values of the functional elements of the CPU. In the Linear Regression (LR) model, the steady-state temperature of a functional element is represented as a weighted sum of the selected workload characteristics. In the LR model, the predicted temperature is represented as in (1).

\[
y(w, x) = w_1 x_1 + \cdots + w_p x_p + b \tag{1}
\]

<table>
<thead>
<tr>
<th colspan="2">TABLE I. HOTSPOT CONFIGURATION PARAMETERS</th>
</tr>
<tr>
<th>HotSpot Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Thickness of the chip (in meters)</td>
<td>0.00015</td>
</tr>
<tr>
<td>Specific heat of Silicon (in J/(m³-K))</td>
<td>1.75 × 10<sup>6</sup></td>
</tr>
<tr>
<td>Thermal conductivity of Silicon (in W/(m-K))</td>
<td>100.0</td>
</tr>
<tr>
<td>Resistance (Convection) (in K/W)</td>
<td>0.1</td>
</tr>
<tr>
<td>Capacitance (Convection) in J/K (Heat sink)</td>
<td>140.4</td>
</tr>
<tr>
<td>Thickness (Heatsink) (in meters)</td>
<td>0.0069</td>
</tr>
<tr>
<td>Heatsink side (in meters)</td>
<td>0.06</td>
</tr>
<tr>
<td>Thermal conductivity of Heatsink (in W/(m-K))</td>
<td>400</td>
</tr>
<tr>
<td>Specific heat (Heatsink) (in J/(m³-K))</td>
<td>3.55 × 10<sup>6</sup></td>
</tr>
<tr>
<td>Side (Heat spreader) (in meters)</td>
<td>0.03</td>
</tr>
<tr>
<td>Thickness (Heat spreader) (in meters)</td>
<td>0.001</td>
</tr>
<tr>
<td>Thermal conductivity (Heat spreader) in W/(m-K)</td>
<td>400</td>
</tr>
</tbody>
</table>

where $X = (x_1, x_2, \ldots, x_p)$ represents the features used to train the models, $w_1, w_2, \ldots, w_p$ are the coefficients, and $b$ represents the bias. Linear regression fits a linear model with weights $W = (w_1, w_2, \ldots, w_p)$ that minimizes the sum of squared errors between the predicted and the measured values of the data. The loss function, which indicates the adequacy of the fit, is given by (2).

$$L(\hat{y}, t) = \frac{1}{2}(\hat{y} - t)^2 \tag{2}$$

where $\hat{y}$, $t$, and $(\hat{y} - t)$ represent the prediction, target, and residual values respectively. The coefficients $w_1, w_2, \ldots, w_p$, and $b$ are selected in a manner to reduce the loss function as represented in (3) and (4).

$$E(w_1, w_2, \ldots, w_p, b) = \frac{1}{N} \sum_{i=1}^{N} L(\hat{y}^i, t^i) \tag{3}$$

$$= \frac{1}{2N} \sum_{i=1}^{N} \Big(\sum_{j=1}^{p} w_j x_j^i + b - t^i\Big)^2 \tag{4}$$

In this study, the Python-based Scikit-learn machine learning package [36] is used, which is regarded as an effective and reliable tool for predictive data analysis.
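
As an illustration of the fitting step in (1) to (4), the following is a minimal sketch using Scikit-learn's `LinearRegression`. The feature matrix, the generating weights, and the noise level are synthetic placeholders chosen for this example, not the workload data or coefficients of this work.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (ASSUMPTION): each row is one workload instance and
# each column a hypothetical, normalized workload feature.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))

# Assumed generating weights and bias, used only to produce targets in deg C.
true_w = np.array([12.0, 5.0, 8.0])
true_b = 55.0
t = X @ true_w + true_b + rng.normal(0.0, 0.05, size=200)

# Ordinary least squares minimizes the mean squared residual of (3)-(4).
model = LinearRegression().fit(X, t)
y_hat = model.predict(X)

# E = 1/(2N) * sum((y_hat - t)^2), the form given in (4).
E = 0.5 * np.mean((y_hat - t) ** 2)
print("weights:", model.coef_, "bias:", model.intercept_, "loss:", E)
```

The fitted `coef_` and `intercept_` play the roles of $w_1, \ldots, w_p$ and $b$ in (1).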

To develop the Polynomial Regression (PR) models of steady-state temperature, polynomial regression is performed on the data set of workload characteristics to fit a polynomial equation to it. This extends linear regression by building polynomial features from the input features. For instance, with second-order polynomial features, the model fits a paraboloid to the data rather than a plane, giving the model represented in (5):

$$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2 \tag{5}$$

The above model can be treated as a linear model by creating the set of features given in (6).

$$z = [x_1, x_2, x_1 x_2, x_1^2, x_2^2] \tag{6}$$

This renaming of the data allows for the formulation of the problem as in (7).

$$\hat{y}(w, x) = w_0 + w_1 z_1 + w_2 z_2 + w_3 z_3 + w_4 z_4 + w_5 z_5 \tag{7}$$

The derived polynomial regression belongs to the same class of linear models as those considered above and can be solved using the same methods. By considering linear fits in a higher-dimensional space constructed with these basis functions, the model is flexible enough to fit a broader range of data. To transform an input data matrix into a new data matrix of a given degree, the polynomial features transformer (PolynomialFeatures) in the Scikit-learn Python machine learning toolkit is used. The features of X are thereby converted from $[x_1, x_2]$ to $[x_1, x_2, x_1 x_2, x_1^2, x_2^2]$ and can be used with any linear model.
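
The feature expansion of (5) to (7) can be sketched as follows. The two input features and the quadratic generating function are assumptions made for illustration only. Note that Scikit-learn orders the degree-2 columns as $x_1, x_2, x_1^2, x_1 x_2, x_2^2$, which differs from the ordering in (6) but spans the same feature set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic two-feature data (ASSUMPTION, not the workload data of this work).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(300, 2))
x1, x2 = X[:, 0], X[:, 1]

# Assumed quadratic relation, used only to generate noise-free targets.
t = 50.0 + 6.0 * x1 + 4.0 * x2 + 3.0 * x1 * x2 + 2.0 * x1**2 + 1.0 * x2**2

# Expand [x1, x2] into second-order features; the bias column is left to
# LinearRegression's intercept, so include_bias=False.
poly = PolynomialFeatures(degree=2, include_bias=False)
Z = poly.fit_transform(X)  # columns: x1, x2, x1^2, x1*x2, x2^2

# The model is linear in Z, so the same least-squares solver applies.
pr_model = LinearRegression().fit(Z, t)
print("intercept:", pr_model.intercept_, "coefficients:", pr_model.coef_)
```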

Linear and polynomial regression are used to model the temperature profiles of the processing units of the core. Fig. 1 represents the architecture of the processor core considered in this work. The models thus developed are used to predict the steady-state temperature of the various processing elements in the architecture. The regression models presented in this work are most appropriate for multi-core systems executing embedded tasks since their task characteristics are often highly predictable. The proposed model-based prediction logic is computationally efficient and is more suitable for real-time thermal estimation. The estimated data can be used by an aging-aware scheduler to determine the best course of action for controlling the temperature below threshold levels while maintaining performance goals and extending the useful lifetime of processor cores.

B. The Aging-aware Scheduler Design

With the advancements in integrated circuit design and fabrication technology, more transistors can now fit into a square millimeter of a silicon wafer. When clocked at higher speeds to address the execution constraints, such densely integrated processors have increased power and heat dissipation, which has a negative impact on lifetime dependability. The major challenge is to develop algorithms that can forecast device-level degradation behavior based on the application features. This work presents a strategy for extending the useful lifetime of multi-core processors. A fine-grained approach is followed for estimating the aging effects of the various processing components of the multi-core processor. The data available for the recent industrial-grade embedded processors manufactured by Texas Instruments [37] is taken as the reference. Referring to [37], the operational lifespan of the semiconductor core is taken as ten years, when the junction temperature $T_J$ is 105°C. The crucial factor affecting silicon lifespan is the junction temperature $T_J$ when the circuit is functioning within the limits of voltage and frequency stated in the data sheet.

Due to continual operation at high temperatures, wear-out processes begin to develop in semiconductor products during their useful life. The wear-out processes commonly considered in the design of integrated circuits include Gate Oxide Integrity (GOI) [38], Electromigration (EM) [39], [40], and Time-Dependent Dielectric Breakdown (TDDB) [41]. Additionally, the lifespan of present semiconductor devices is affected by processes such as Negative Bias Temperature Instability (NBTI) [42] and Channel Hot Carriers (CHC) [43]. Among these, electromigration is a major aging effect in present integrated circuits, and the primary factor that influences it is the junction temperature $T_J$. The junction temperature is thus the critical factor affecting silicon lifespan under electrical bias when the chip is operating within the prescribed data sheet conditions, and the lifetime can be represented using an Acceleration Factor (AF). The Arrhenius equation, which links the chemical reaction rate to temperature, can be used to analyze the damage that occurs in electronic devices over time at various working temperatures. The acceleration factor (AF) [37] can be represented as in (8).

\[
AF = \exp\left(\frac{E_a}{K} \left(\frac{1}{T_{\text{use}}} - \frac{1}{T_{\text{stress}}} \right)\right) \tag{8}
\]

where AF represents the Acceleration Factor, \(E_a\) is the activation energy in eV, \(K\) is Boltzmann’s constant \((8.617 \times 10^{-5} \text{ eV/K})\), \(T_{\text{use}}\) is the use temperature in Kelvin, and \(T_{\text{stress}}\) is the stress temperature in Kelvin.
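
A minimal sketch of (8) is given below. The activation energy of 0.7 eV is an assumption made for this example, since the text does not state the value used. The function names and the helper that scales the ten-year reference lifetime at 105°C are likewise illustrative.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def acceleration_factor(t_use_c, t_stress_c, ea_ev):
    """Arrhenius acceleration factor of (8); temperatures are given in deg C."""
    t_use = t_use_c + 273.15        # convert to Kelvin
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))

def projected_lifetime_years(t_use_c, ref_temp_c=105.0, ref_life_years=10.0,
                             ea_ev=0.7):
    """Scale the reference lifetime by AF.

    ea_ev=0.7 is an ASSUMED activation energy, not a value from the text.
    """
    return ref_life_years * acceleration_factor(t_use_c, ref_temp_c, ea_ev)

for temp_c in (85.0, 95.0, 105.0, 115.0):
    print(temp_c, round(projected_lifetime_years(temp_c), 2))
```

At the reference temperature the factor is 1, so the projected lifetime equals the reference lifetime. Cooler operation gives AF greater than 1, and hotter operation gives AF less than 1.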

This work proposes a methodology for improving the useful lifetime of the processor cores by considering electromigration as the primary failure mechanism. The aging-aware scheduler estimates the temperature of the processing elements of the core when a job is ready, by using the workload characteristics as input to the developed models. Our goal is to allocate the tasks to cores based on the aforementioned inputs to maximize chip lifetime while meeting the performance requirement bounds. For the reliability-aware scheduler design, two types of frequency adaptations are proposed in this work, where the processor clock can either be controlled in discrete values or in a continuous manner. Based on the workload characteristics, the performance requirement, and the end system reliability requirement, the aging-aware scheduler selects a core from the pool of feasible cores and decides its frequency of operation. The architecture of the proposed aging-aware scheduler is represented in Fig. 2.

Fig. 2. The Aging-aware Scheduler.

The proposed aging-aware scheduler supports two types of clocking schemes for the processor core: i) cores whose operating frequencies can be selected from a discrete set of values, and ii) cores whose frequency of operation can be varied continuously within a defined range. For the discrete frequency selection scheme, the scheduler selects the maximum operating frequency from the set of possible values such that the lifetime reliability requirement is met. In the continuous frequency selection scheme, the scheduler has fine control of the operating frequency. In this case, the scheduler uses interpolation functions to estimate the closest frequency of operation that meets the lifetime reliability requirement with graceful performance degradation. For the specified workload characteristics, the data set includes known frequencies and the related temperature values of the logical components of the core.

Linear Interpolation (LI) and Spline Interpolation (SI) are the two interpolation schemes evaluated in this work. For linear interpolation, the interp1d class from the "scipy.interpolate" package is used to build a function from the fixed-frequency data sets. SciPy [44] is a free and open-source Python library used for scientific and technical computing. In linear interpolation, new data points are generated within the range of a finite set of known data points using linear polynomials. In applications where smoothing is necessary, a piecewise-defined spline function is employed instead of a single interpolating polynomial because it gives good results with low-degree polynomial pieces and avoids Runge's phenomenon, which arises with high-degree polynomial interpolation.
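
The continuous frequency selection can be sketched as follows, assuming a temperature limit is already known from the lifetime requirement. The temperature values at the four frequencies are hypothetical, and the spline variant uses SciPy's `CubicSpline`, which is one possible choice and not necessarily the spline used in this work.

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

# Known operating points: frequencies in MHz and HYPOTHETICAL hottest-unit
# temperatures in deg C (placeholders, not results from this work).
freq_mhz = np.array([1400.0, 2400.0, 3400.0, 4000.0])
temp_c = np.array([62.0, 78.0, 97.0, 110.0])

temp_linear = interp1d(freq_mhz, temp_c, kind="linear")  # LI
temp_spline = CubicSpline(freq_mhz, temp_c)              # SI

def max_frequency_below(temp_model, temp_limit_c, n_points=2601):
    """Highest frequency on a 1 MHz grid whose predicted temperature is within the limit."""
    grid = np.linspace(freq_mhz[0], freq_mhz[-1], n_points)
    feasible = grid[temp_model(grid) <= temp_limit_c]
    return float(feasible.max()) if feasible.size else None

f_li = max_frequency_below(temp_linear, 105.0)
f_si = max_frequency_below(temp_spline, 105.0)
print("linear:", f_li, "MHz; spline:", f_si, "MHz")
```

Searching a dense grid avoids inverting the interpolant and still works if the fitted curve is not strictly monotonic.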

Algorithm I illustrates the aging-aware scheduler's pseudocode. Based on the workload characteristics, the scheduler forms a feasible set of cores \(\{C_1, \ldots, C_m\}\) from the set of available cores during the scheduling interval. The temperature patterns of the logical components are predicted using the prediction models developed in the first part of this work, and using these parameters, the aging factor AF of each core \(C_i\) related to the lifetime reliability is determined. The frequency of operation of the core is then determined based on the workload's performance requirements, the lifetime reliability requirement of the cores, and the type of frequency control supported by the architecture, i.e., either discrete or fine frequency control.

Algorithm I. Aging Aware Core Selection

Inputs: workload characteristics, performance constraints, lifetime reliability requirement
1. while (true) {
2.   for each schedule window \( T_s \) do {
3.     for each task \( T_i \in \{T_1, T_2, \ldots, T_n\} \) in the process queue \( Q \) do {
         a. analyze the characteristics of \( T_i \) and form the feasible set of cores \( \{C_1, \ldots, C_m\} \).
         b. estimate the temperature characteristics of the processing elements \( \{L_1, \ldots, L_p\} \) of each feasible core \( C_i \).
         c. estimate the aging factor AF of each core \( C_i \) related to the lifetime reliability.
         d. select a core based on the lifetime reliability and performance requirements.
         e. determine the core’s operating frequency \( f \):
            i. if discrete frequency control: \( f = f_i \), where \( f_i \in \{f_1, \ldots, f_k\} \)
            ii. else: \( f = f_i \), where \( f_{\min} \leq f_i \leq f_{\max} \)
4.     }//end for each task
5.   }//end for each schedule window
6. }//end while

Outputs: i) mapping of the tasks \( \{T_1, \ldots, T_n\} \) to cores \( \{C_1, \ldots, C_m\} \) if \( n \leq m \); stall the remaining \( (n - m) \) tasks if \( n > m \); ii) frequency of operation of the selected cores.
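
Steps b to e of Algorithm I for a single task under discrete frequency control can be sketched as follows. The temperature predictor is a hypothetical stand-in for the trained regression models, the per-core thermal offsets and the 0.7 eV activation energy are assumptions, and the tie-breaking rule (highest frequency first, then highest AF) is one possible policy rather than the policy of this work.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K
EA_EV = 0.7                # ASSUMED activation energy (not given in the text)
REF_TEMP_C = 105.0         # junction temperature at which AF = 1

def acceleration_factor(temp_c):
    """Arrhenius acceleration factor relative to the reference temperature."""
    t_use = temp_c + 273.15
    t_ref = REF_TEMP_C + 273.15
    return math.exp((EA_EV / K_BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_ref))

def predict_hottest_unit_temp_c(task_load, freq_mhz, core_offset_c):
    """HYPOTHETICAL stand-in for the regression models (step b)."""
    return 45.0 + 0.016 * freq_mhz * task_load + core_offset_c

def select_core_and_frequency(task_load, feasible_cores, freq_levels_mhz,
                              required_af, min_freq_mhz):
    """Return (core_id, freq_mhz, af), or None if no core meets both bounds."""
    best = None
    for core_id, offset_c in feasible_cores.items():
        for freq in sorted(freq_levels_mhz, reverse=True):
            if freq < min_freq_mhz:
                break  # performance requirement can no longer be met
            temp_c = predict_hottest_unit_temp_c(task_load, freq, offset_c)
            af = acceleration_factor(temp_c)  # step c
            if af >= required_af:
                # Steps d-e: prefer the higher frequency, then the higher AF.
                if best is None or (freq, af) > (best[1], best[2]):
                    best = (core_id, freq, af)
                break  # highest feasible level for this core found
    return best

cores = {0: 0.0, 1: 4.0, 2: 8.0}     # core id -> assumed thermal offset (deg C)
levels = [4000, 3400, 2400, 1400]    # discrete frequency set in MHz
choice_af1 = select_core_and_frequency(1.0, cores, levels, 1.0, 1400)
choice_af15 = select_core_and_frequency(1.0, cores, levels, 1.5, 1400)
print(choice_af1, choice_af15)
```

Raising the required AF forces a lower frequency level, which is the graceful performance degradation described in the text.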

The regression models for predictive modeling need to be updated if the error in prediction is more than a threshold value because of the change in the data. The accuracy of the prediction logic is verified periodically with an updating interval. The updating interval, a customizable parameter, is kept substantially greater than the scheduling interval to reduce the computational overhead for the assessment of the actual temperature levels. The model updating process is represented in Algorithm II.

Algorithm II. Model Update

Inputs: workload characteristics, core id, maximum allowable error in prediction (threshold)
1. while (true) {
2.   for each updating interval \( T_u \) do {
3.     for each core \( C_i \) do {
4.       compute the actual temperature of the processing elements (with on-chip sensors and/or software tools).
5.     }//end for each core
6.     if (prediction error > threshold) {
7.       update the prediction models.
8.     }//end if
9.   }//end for each updating interval
10. }//end while

Output: updated prediction models.
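
One updating interval of Algorithm II can be sketched as follows. The error metric (maximum absolute error), the threshold of 1°C, and the choice to refit on the most recent measurements only are assumptions for illustration, and the measured temperatures are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def max_abs_error(model, X, t_measured):
    """Largest absolute gap between predicted and measured temperatures."""
    return float(np.max(np.abs(model.predict(X) - t_measured)))

def update_model_if_needed(model, X_recent, t_measured, threshold_c):
    """Refit when the prediction error exceeds the threshold.

    ASSUMPTION: the replacement model is fitted on the recent window only.
    Returns (model, was_updated).
    """
    if max_abs_error(model, X_recent, t_measured) <= threshold_c:
        return model, False
    return LinearRegression().fit(X_recent, t_measured), True

# Synthetic training data: one workload feature, temperature in deg C.
X_train = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
t_train = 10.0 * X_train[:, 0] + 50.0
model = LinearRegression().fit(X_train, t_train)

# Recent measurements, first unchanged and then with a 3 deg C drift.
X_recent = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
t_same = 10.0 * X_recent[:, 0] + 50.0
t_drift = t_same + 3.0

_, updated_same = update_model_if_needed(model, X_recent, t_same, 1.0)
new_model, updated_drift = update_model_if_needed(model, X_recent, t_drift, 1.0)
print(updated_same, updated_drift)
```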

IV. RESULTS AND DISCUSSION

The experiments of this research work were conducted using the application benchmarks belonging to consumer, telecommunications, network, and security categories taken from the well-known MiBench suite as represented in Table II.

Multiple instances, created by altering the data set handled by the tasks, are used to model the temperature characteristics of the logical components. As a result, \( n \) versions of a task \( w \) are created, and their execution is evaluated on \( m \) cores of the multi-core processor. Using new instances of the workloads derived from MiBench, the per-logical-unit temperature is estimated to evaluate the developed models. The integer ALU, integer register file, floating-point unit, floating-point register file, Data Translation Lookaside Buffer (DTLB), Instruction Translation Lookaside Buffer (ITLB), and load/store queue are characterized as the key power-consuming processing elements of the cores. Fig. 3 illustrates the validation of the developed models for the task CRC. The Steady State Temperature (SST) of the logical elements with significant power consumption is evaluated using the developed LR and PR models and is compared with the values estimated using the tool HotSpot. The operating frequency of the core is set to 3400 MHz. The difference in estimation between the two methods is expressed as a percentage error and is shown in Fig. 4. With a maximum prediction error of 0.008 percent for the linear regression models and 0.826 percent for the polynomial regression models, the suggested regression-based models exhibit good consistency with HotSpot.
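
The percentage error between a model estimate and the HotSpot reference can be computed as in the sketch below. The temperature values are hypothetical placeholders, and the temperature unit is an assumption, since the percentage depends on whether values are expressed in °C or Kelvin.

```python
import numpy as np

def percentage_error(reference, predicted):
    """Absolute error of each prediction as a percentage of the reference."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.abs(predicted - reference) / np.abs(reference)

# HYPOTHETICAL steady-state temperatures in deg C for three processing elements.
hotspot_sst = np.array([80.0, 75.0, 90.0])
model_sst = np.array([80.4, 74.7, 90.0])

errors = percentage_error(hotspot_sst, model_sst)
max_error = errors.max()
print("per-element error (%):", errors, "max (%):", max_error)
```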

### Table II. Representative Benchmarks

<table>
<thead>
<tr>
<th>Category</th>
<th>MiBench Benchmark</th>
</tr>
</thead>
<tbody>
<tr>
<td>Consumer</td>
<td>JPEG encoding/decoding (cjpeg/djpeg)</td>
</tr>
<tr>
<td>Telecom</td>
<td>Cyclic Redundancy Checks (CRC)</td>
</tr>
<tr>
<td>Network</td>
<td>Dijkstra</td>
</tr>
<tr>
<td>Security</td>
<td>Secure Hash Algorithm (SHA)</td>
</tr>
</tbody>
</table>

Fig. 3. SST of the Processing Elements Estimated using HotSpot, Linear Regression, and Polynomial Regression Models.

Fig. 4. Differences in the Estimation of SST of the Processing Elements are represented as a Percentage Error.

Fig. 5 illustrates the validation of the thermal models for the integer ALU logical component. The tasks used in the analysis are executed on cores operating at a clock frequency of 3400 MHz. The percentage error in the temperature estimates is shown in Fig. 6. Simulation results show that the proposed models are comparable to HotSpot in estimating the thermal profile of the logical components of the processor core.

Fig. 5. Steady State Temperature of Integer ALU Estimated using HotSpot, Linear Regression, and Polynomial Regression Models.

The proposed aging-aware scheduler uses the predicted SST of the logical components to estimate the degradation in the lifetime of the processor core during the scheduling of workloads. The lifetime of the critical components is represented using the Acceleration Factor (AF), with the junction temperature (Tj) of silicon taken as the primary variable affecting the lifetime of the cores. Fig. 7 shows the validation of the AFs of the principal power-consuming processing elements of the core for the task djpeg, where the AFs are computed from the SST values estimated using the HotSpot, LR, and PR models. Fig. 8 shows the validation of the AF estimation of int_ALU for the different tasks. The processor core is set to operate at a frequency of 3400 MHz.
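The temperature dependence of lifetime can be illustrated with a generic Arrhenius-type term, as commonly used for electromigration. The sketch below is an assumption-laden illustration rather than the paper's lifetime equation: the activation energy `EA_EV` is an assumed value, the function names are hypothetical, and only the ten-year, 105°C reference point is taken from this section. How the returned ratio maps onto the AF used here depends on the paper's own definition.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K
EA_EV = 0.9                # ASSUMED activation energy, not taken from the paper
T_REF_C = 105.0            # reference junction temperature stated in this section
REF_LIFETIME_YEARS = 10.0  # design lifetime at the reference temperature

def lifetime_ratio(temp_c, t_ref_c=T_REF_C, ea_ev=EA_EV):
    """Lifetime at temp_c divided by lifetime at t_ref_c (Arrhenius term only)."""
    t = temp_c + 273.15
    t_ref = t_ref_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t - 1.0 / t_ref))

def core_lifetime_years(unit_temps_c):
    """Core lifetime, limited by the hottest logical unit.

    unit_temps_c maps a logical unit name to its steady-state temperature in Celsius.
    """
    return REF_LIFETIME_YEARS * lifetime_ratio(max(unit_temps_c.values()))
```

With the assumed activation energy, a core running 10°C above the reference retains roughly half of its reference lifetime, which is in line with the rule of thumb quoted in the introduction.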

The proposed aging-aware scheduler estimates the temperature profiles of the logical components of the feasible cores based on the characteristics of the task in the service queue and decides the appropriate operating frequency of the core. The lifetime reliability requirement of the core and the workload's performance requirements are used to determine the core operating frequency. In this work, the performance of the workloads running on a core clocked at 4000 MHz is taken as the reference, and the performance of the cores executing workloads under different operating conditions is expressed relative to this reference. Fig. 9 shows the lifetime improvement of the cores and the corresponding relative performance degradation of the tasks when a discrete frequency control scheme is implemented. In this case, the lifetime reliability requirement is taken as ten years, corresponding to an AF of 1. The core operating frequency is selected from the set {4 GHz, 3.4 GHz, 2.4 GHz, 1.4 GHz} based on the lifetime reliability requirement. Fig. 10 shows the corresponding values for AF = 1.5. To estimate the AF of the core, the scheduler uses the thermal profile of the logical component with the highest temperature. This work assumes that the CPU core is designed to function for ten years when the junction temperature is 105°C.
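The discrete frequency control scheme can be sketched as follows. The selection rule shown here, picking the highest frequency whose predicted peak temperature stays within the limit, is an assumed reading of the description above. The predictor callable is a hypothetical stand-in for the LR or PR models.

```python
T_LIMIT_C = 105.0                     # junction temperature for a ten-year lifetime
FREQS_MHZ = (4000, 3400, 2400, 1400)  # discrete frequency set from the text

def pick_frequency(predict_peak_temp_c, freqs_mhz=FREQS_MHZ, limit_c=T_LIMIT_C):
    """Return the highest frequency whose predicted peak temperature meets the limit.

    predict_peak_temp_c -- callable mapping a frequency in MHz to the predicted
                           temperature of the hottest logical unit (hypothetical)
    Falls back to the lowest frequency when no setting meets the limit.
    """
    for f in sorted(freqs_mhz, reverse=True):
        if predict_peak_temp_c(f) <= limit_c:
            return f
    return min(freqs_mhz)
```

For example, with a toy predictor `lambda f: 60 + 0.0125 * f`, the 4000 MHz setting is predicted at 110°C and is rejected, so the sketch selects 3400 MHz.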

When the reliability-aware scheduler operates in the continuous frequency mode, the core operating frequency can take any value within the defined range of 1400 MHz to 4000 MHz. Linear Interpolation (LI) and Spline Interpolation (SI) are the two interpolation schemes employed for estimating the frequency closest to that required to meet the reliability and performance requirements. The frequency values for interpolation are determined based on the estimated temperature of the cores. Thermal estimation is performed using HotSpot, the standard method, and using the proposed LR and PR methods. The frequencies estimated by the scheduler for these temperature values are shown in Table III. The frequencies are determined so as to meet the lifetime reliability requirement of ten years (corresponding to AF = 1). A lifetime reliability of ten years corresponds to a threshold value of the silicon junction temperature, Tj, of 105°C.
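The linear interpolation variant of the continuous frequency mode can be sketched as below. The function name and sample format are hypothetical, and the sketch assumes that the predicted temperature rises strictly with frequency. Spline interpolation would replace the piecewise-linear segment with a smooth curve and is not shown.

```python
def interpolate_frequency(samples, target_temp_c):
    """Estimate the frequency at which the predicted temperature reaches the target.

    samples -- list of (frequency_mhz, predicted_temp_c) pairs, with temperature
               assumed to increase strictly with frequency
    The result is clamped to the sampled frequency range.
    """
    pts = sorted(samples)
    if target_temp_c <= pts[0][1]:
        return float(pts[0][0])    # even the lowest frequency reaches the target
    if target_temp_c >= pts[-1][1]:
        return float(pts[-1][0])   # the highest frequency stays within the target
    for (f0, t0), (f1, t1) in zip(pts, pts[1:]):
        if t0 <= target_temp_c <= t1:
            frac = (target_temp_c - t0) / (t1 - t0)
            return f0 + frac * (f1 - f0)
    raise ValueError("samples are not monotonic in temperature")
```

Sampling the thermal model at the four discrete frequencies and interpolating between them avoids rerunning the model for every candidate frequency in the 1400 MHz to 4000 MHz range.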

**Fig. 6.** Differences in the Estimation of Steady-state Temperature of Integer ALU.

**Fig. 7.** Acceleration Factor of the Processing Elements Estimated using HotSpot, Linear Regression, and Polynomial Regression Models.

**Fig. 8.** Acceleration Factor of the Tasks Estimated using HotSpot, Linear Regression, and Polynomial Regression Models.

**Fig. 9.** Useful Lifetime Improvement and Relative Performance of Tasks (Discrete Frequency Control with AF = 1).

TABLE III. CORE OPERATING FREQUENCIES FOR AF=1

<table>
<thead>
<tr>
<th>Case</th>
<th>Case 1</th>
<th>Case 2</th>
<th>Case 3</th>
<th>Case 4</th>
<th>Case 5</th>
<th>Case 6</th>
</tr>
</thead>
<tbody>
<tr>
<td>Temp Estimation scheme</td>
<td>Hot-Spot</td>
<td>Hot-Spot</td>
<td>LR</td>
<td>LR</td>
<td>PR</td>
<td>PR</td>
</tr>
<tr>
<td>Freq estimation scheme</td>
<td>LI</td>
<td>SI</td>
<td>LI</td>
<td>SI</td>
<td>LI</td>
<td>SI</td>
</tr>
</tbody>
</table>

The proposed scheme is validated for a lifetime reliability requirement of ten years. In this case, the aging-aware scheduler adjusts the operating frequency to limit the core temperature to 105°C. The actual temperature of the cores at the operating frequencies listed in Table III is computed using HotSpot and is shown in Table IV. The proposed algorithm adjusts the core frequencies such that the operating temperature of the cores is very close to the required value of 105°C. The average prediction error of the proposed scheme in estimating the core temperature is compared with the results reported in recent publications in Table V.

The lifetime of the cores when operating at the temperatures shown in Table IV is computed using (8). The theoretical lifetime corresponding to AF = 1 is ten years. Table VI shows the lifetime of the cores corresponding to the steady-state temperature values listed in Table IV. In this case, the algorithm drives the operating frequency of the cores so as to meet the lifetime requirement of ten years. The lifetime of the cores when operating at the frequencies estimated by the algorithm deviates from the required lifetime by at most 2.85%.
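To give a sense of scale for the lifetime deviation, the sketch below computes the deviation from the required lifetime and converts it into an equivalent temperature overshoot. It relies on a generic Arrhenius temperature term with an assumed activation energy `EA_EV`. It is not a reproduction of (8), and the function names are hypothetical.

```python
import math

K_EV = 8.617333262e-5     # Boltzmann constant in eV/K
EA_EV = 0.9               # ASSUMED activation energy, not taken from the paper
T_REF_K = 105.0 + 273.15  # reference junction temperature in kelvin
REQUIRED_YEARS = 10.0     # lifetime reliability requirement

def deviation_pct(lifetime_years, required_years=REQUIRED_YEARS):
    """Deviation of the achieved lifetime from the required lifetime, in percent."""
    return abs(lifetime_years - required_years) / required_years * 100.0

def temp_offset_for_deviation(dev_pct, ea_ev=EA_EV):
    """Temperature rise above the reference that shortens lifetime by dev_pct percent.

    Inverts ratio = exp((Ea/k) * (1/T - 1/T_ref)) for T.
    """
    ratio = 1.0 - dev_pct / 100.0
    t_k = 1.0 / (1.0 / T_REF_K + (K_EV / ea_ev) * math.log(ratio))
    return t_k - T_REF_K
```

Under these assumptions, a 2.85% lifetime shortfall corresponds to a temperature overshoot of roughly 0.4°C, which illustrates how strongly the lifetime estimate depends on the accuracy of the thermal prediction.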

Using the proposed methodology, embedded application developers can perform a fast design space exploration between the lifetime reliability of the processor cores and the performance requirement of the tasks. Fig. 11 depicts the trade-off between AF and the performance of the benchmark applications when linear regression is employed for temperature estimation along with linear interpolation for frequency estimation. At higher values of AF, processing cores have better lifetime reliability, but at the expense of application execution performance. Fig. 12 shows the corresponding trade-off when polynomial regression is employed with continuous frequency assignment.

The concept suggested in this paper, where the aging-aware scheduler selects a task from a predetermined list of tasks to assign to a core, is sufficient to extend the lifespan of multi-core systems running embedded workloads. A task might run at multiple scheduling points with a varying level of computational load, because the complexity of the jobs executed on the cores may change over time. These varying computational costs can be accounted for by building the regression model from the thermal profiles of logical units executing tasks of different computational cost at different execution times.

V. CONCLUSION AND FUTURE WORK

The aging-aware scheduler proposed in this work uses the developed computationally efficient models to estimate the steady-state temperature of the processing elements in a multi-core processor architecture. Temperature values estimated with the models are used to predict electromigration-induced aging. The scheduler applies an aging-aware application mapping strategy to enhance the lifetime reliability of the cores. It estimates the operating frequency of the processing cores required to satisfy the lifetime reliability constraints with a graceful decline in performance, in contrast to a non-aging-aware scheduler, where the workloads are distributed to the cores based on performance needs alone. Simulation results show that the suggested approach can increase the operational lifespan of multi-core processor systems.

The framework proposed in this work is extensible and configurable. It is configurable in that on-chip thermal sensor data can be used, along with the temperature data computed using software tools, to estimate the temperature and aging effects of the logical components. In the future, the framework may be extended to account for aging caused by Hot Carrier Injection (HCI), Positive-Bias Temperature Instability (PBTI), and Negative-Bias Temperature Instability (NBTI), in addition to electromigration.

REFERENCES


