An Efficient Application Specific Memory Storage and ASIP Behavior Optimization in Embedded System

Low-power embedded systems require an effective memory design that improves system performance through memory implementation techniques. An application-specific data allocation design pattern determines the memory storage area, and internal cell design techniques determine data transition speeds. The embedded cache design is implemented with simulators and scheduling approaches that reduce cache misses and increase cache hits. Cache hit optimization, delay reduction, and latency prediction techniques are effective for ASIP design. The design task is essentially one of specifying the tradeoff among design metrics such as performance, power, size, cost, and flexibility. ASIP behavior and memory storage area are optimized for low-power embedded systems, and cycle time is improved with effective scheduling techniques, which raise system performance at low power.


I. INTRODUCTION
Embedded systems are subject to specific constraints: real-time design metrics measure application features such as cost, size, power, and performance. A reactive, real-time system must respond to its environment and compute application results in real time without delay [Fig. 1]. Embedded systems are currently designed on a single silicon chip and target critical applications such as killer applications (smartphones), smart cards, video games, mobile internet, handheld embedded systems, and gigabit-per-second (Gbps) devices and LAN systems. Embedded design technologies that enhance productivity have focused on software and hardware design mechanisms.
In the HLS design mechanism, Xilinx simulator software is used to verify the functionality and timing of the custom peripheral design architecture [18,20]. A functional unit implemented by ASIP design may then either be integrated on a chip or implemented as a peripheral device. A profiler is used in pre-allocation memory design and measures the pre-allocation-based execution delay time. Memory implementation techniques have recently attracted strong research interest in ASIPs. An ASIP is a heterogeneous platform composed of a programmable processor core and customized hardware [1,2,3]. An ASIC architecture is not flexible enough for application-specific design. A DSP processor is flexible and fully programmable, but it cannot achieve high performance with low power consumption and is not suitable for developing various complex applications.
A VLIW processor requires compiler support, and the VLIW architecture is characterized by instructions that each specify several independent operations. By comparison, RISC instructions typically specify one operation and CISC instructions typically specify several operations. With sufficient registers, a VLIW machine can place the results of speculatively executed instructions in temporary registers. The level of sophistication required of a VLIW compiler is significantly higher.
The heterogeneous vector width method exposes heterogeneous vector widths for VLIW ASIPs [10,13]. Effective automation has been analyzed for VLIW ASIPs. A lower bound on latency is effective for VLIW ASIPs, and the latency bound mechanism accounts for data transfer delays [9]. With these approaches, a window data flow graph and the lower-bound design mechanism reduce the delay penalties caused by operation serialization or data transfer.
An effective emulation tool chain has been designed for ASIP architectures [5]. The FPGA-based emulator is an alternative to pure software cycle-accurate simulation, and this tool chain reduces design exploration time [13]. A fast and accurate processor simulator has been used for high-performance ASIP simulation [4], and an integrated tool chain design has also been evaluated for ASIP systems [5]. An ASIP architecture has also been designed for a discrete Fourier transform (DFT), discrete cosine transform (DCT), and finite impulse response (FIR) filter engine [14].
Memory data storage and the optimal operational delay frequency are analyzed according to the computational conditions of the application. The embedded process analysis is presented in the next section. Sections 3 and 4 present application-specific data storage and show how data storage is optimized in the memory system. The last section presents application-specific data storage in the ASIP system and improves system performance with techniques such as delay reduction, latency prediction, and operation scheduling.
www.ijacsa.thesai.org

II. APPLICATION SPECIFIC EMBEDDED PROCESS ANALYSIS
The basic embedded system process consists of three mechanisms: application compilation/synthesis and implementation, IP-based integration, and test and verification with a specific simulator. With these mechanisms we implement application-based embedded system designs for low-power embedded devices. In the HLS design mechanism, the memory designer uses a high-level language and converts behavioral specifications into register-transfer (RT) specifications, while behavior on general-purpose processors is converted to assembly code. The memory designer also refines the register-transfer-level specification of a single-purpose processor into a logic specification, and finally produces machine code for general-purpose processors and a gate-level netlist. In the first phase, compilation/synthesis, the designer specifies the desired functionality in an abstract manner. A compiler translates the source language into its target machine language without the option of generating intermediate code.
Each new machine requires a full native compiler [Fig. 2]. The software compiler converts a sequential program to assembly code, which is essentially register-transfer code, and a system synthesis tool converts an abstract system specification into a set of sequential programs on general-purpose and single-purpose processors. A logic synthesis tool converts Boolean expressions into a connection of logic gates (called a netlist). A register-transfer (RT) level synthesis tool converts finite-state machines and register transfers into a data path of RT components and a controller of Boolean equations. A behavioral synthesis tool converts a sequential program into finite-state machines and register transfers.
The second phase, libraries/IP-based implementation, uses a library at each abstraction level. The logic-level library consists of layouts for gates and cells. The RT-level library may consist of layouts for RTL components such as registers, multiplexers, decoders, and functional units. A behavioral-level library may consist of embedded components such as compression components, bus interfaces, display controllers, and even general-purpose processors. IP integration is used to implement the memory or various peripheral devices and to integrate each device according to the application requirements. Finally, a system-level library might consist of complete systems that solve particular problems, such as an interconnection of processors and memory with accompanying operating systems and programs to implement an interface.
Finally, in the test/verification phase we check whether the functionality of the design is correct, proceeding from low abstraction levels to high abstraction levels. Simulation is the most common way of testing for correct functionality. At the logic level, gate-level simulators provide output signal timing waveforms for given input signal waveforms. At the RT level, hardware description language (HDL) simulators execute RTL descriptions and provide output waveforms for the given input waveforms. At the behavioral level, HDL simulators simulate sequential programs, and co-simulators connect HDL and processor simulators to enable hardware/software co-verification at the system level. A model simulator simulates the initial system specification using an abstract computation model, independently of any processor technology, and verifies the correctness and completeness of the specification [Fig. 3].
When the data elements fill whole cache lines, only $n/E$ cache misses occur for a fixed index value, and the entire operation incurs $n^2/E$ misses. If the cache is big enough that all $n^2/E$ cache lines holding the columns of Y can reside together in the cache, then no further cache misses occur [Fig. 14]. The column index technique repositions the data arrangement in memory, which reduces cache misses and data clustering and makes the operations easier to serialize. The total number of misses is $2n^2/E$, half for X and half for Y. A single processor computes $n^2/E$ elements of Z, performing $n^p/E$ operations, where the operation complexity $p$ changes according to the application computation.
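The miss counts discussed above can be checked with a small simulation. The following is only a minimal sketch, not the simulator used in this work: it assumes that E denotes the number of array elements per cache line, that the cache is fully associative with LRU replacement, and that an n-by-n array is stored in row-major order. All function names are hypothetical.

```python
from collections import OrderedDict


def count_misses(addresses, line_elems, num_lines):
    """Count misses of a fully associative LRU cache (an assumed cache model).

    addresses  -- sequence of element indices being accessed
    line_elems -- elements per cache line (taken here to be E)
    num_lines  -- number of lines the cache can hold
    """
    cache = OrderedDict()  # line number -> True, ordered from oldest to newest use
    misses = 0
    for addr in addresses:
        line = addr // line_elems
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def row_major(n):
    """Element indices of an n-by-n row-major array, visited row by row."""
    return [i * n + j for i in range(n) for j in range(n)]


def column_major(n):
    """Element indices of the same array, visited column by column."""
    return [i * n + j for j in range(n) for i in range(n)]


if __name__ == "__main__":
    n, E = 16, 4
    print("row order, small cache:   ", count_misses(row_major(n), E, 8))
    print("column order, small cache:", count_misses(column_major(n), E, 8))
    print("column order, n^2/E lines:", count_misses(column_major(n), E, n * n // E))
```

Under these assumptions, a row-by-row traversal incurs $n^2/E$ misses, a column-by-column traversal with a cache smaller than one column's worth of lines misses on every access, and a cache of $n^2/E$ lines brings the column traversal back to $n^2/E$ misses.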

B. Scheduling approaches for reduction of memory space
Scheduling techniques are required to order the memory operations, and operation scheduling effectively determines the memory area cost. The scheduling algorithm attempts to parallelize operations to meet the timing constraints, and serializes operations to meet the resource constraints [17]. Scheduling problems are formulated under different requirements: time-constrained, resource-constrained, and feasibility-constrained [19,20]. Memory operation scheduling is implemented under three conditions: FCFOP (First Come First Operational), LCLOP (Last Come Last Operational), and operation based on the optimal operational degree [Fig. 19, Fig. 20 and Fig. 21]. Scheduling approaches are applied according to the time, resource, and feasibility conditions. Given a maximum number of time steps, time-constrained scheduling finds the cheapest schedule that satisfies the constraints. Given a limit on resources, resource-constrained scheduling finds the fastest schedule that satisfies the constraints [Fig. 22]. Feasibility-constrained scheduling decides whether a schedule that satisfies the constraints exists at all. Memory optimization techniques and the performance area are determined by standard applications. Application-specific memory simulation is analyzed with various simulators, such as trace-driven simulators, Cheetah, cache simulators, and ARM DS-5. An advantage of SRAM-based programming technology is that the designer can reuse the chip during prototyping and the system can be manufactured using in-system programming.
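The resource-constrained case described above can be illustrated with a simple list scheduler. This is a minimal sketch under stated assumptions, not the scheduler used in this work: every operation is assumed to take one time step, all functional units are assumed identical, and ties are broken alphabetically. The operation names and the function `list_schedule` are hypothetical.

```python
def list_schedule(preds, num_units):
    """Assign each unit-time operation a time step under a resource limit.

    preds     -- dict mapping each operation to the operations it depends on
                 (assumed to form an acyclic graph)
    num_units -- how many operations may execute in the same time step
    Returns a dict mapping operation -> time step, counting from 1.
    """
    if num_units < 1:
        raise ValueError("at least one functional unit is required")
    step_of = {}
    step = 0
    while len(step_of) < len(preds):
        step += 1
        # An operation is ready once all its predecessors ran in earlier steps.
        ready = sorted(
            op for op in preds
            if op not in step_of
            and all(p in step_of and step_of[p] < step for p in preds[op])
        )
        if not ready:
            raise ValueError("no operation is ready: dependency cycle or unknown operation")
        # Serialize: only num_units of the ready operations fit in this step.
        for op in ready[:num_units]:
            step_of[op] = step
    return step_of


if __name__ == "__main__":
    # Hypothetical memory-operation graph: four loads feed two adds and a store.
    graph = {
        "a": [], "b": [], "c": [], "d": [],
        "e": ["a", "b"],
        "f": ["c", "d"],
        "g": ["e", "f"],
    }
    for units in (4, 2, 1):
        schedule = list_schedule(graph, units)
        print(units, "unit(s):", max(schedule.values()), "steps", schedule)
```

With more units the scheduler parallelizes the independent operations and the schedule shortens; with fewer units it serializes them and the schedule lengthens, which is the time/resource tradeoff the text describes.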
In co-design technology, the effective memory performance area is analyzed with various simulators. ASIP co-design combines hardware and software implementation to achieve effective performance in terms of cycle count, low power consumption, latency, and execution time [Fig. 23]. Source code profiling helps the designer understand the application and guides the ASIP design methodology.

A. Application specific profiling and compilation Overview
A profiler is used to analyze the target source programs by collecting information on their execution at a given data granularity [10]. The profiler supports pre-allocation in the memory architecture and measures the execution time of the application. A memory profiler uses dynamic profiling techniques to generate memory traces [10]. Load/store information is computed for each memory object for the ASIP design mechanism. The micro-profiling approach fills the gap between source-level and instruction-level profilers and provides both speed and accuracy for ASIP design [11]. LANCE [15] is mainly intended to facilitate C compiler design for embedded processors, so as to eliminate the need for time-consuming assembly programming [Fig. 24]. Figure 24 shows the basic framework of the LANCE profiler, which requires a profiling library, the source code, and an instrumented binary file for profiling. Embedded processors for which LANCE-based C compilers have been successfully built include both RISC and DSP designs. Edge profiling and path profiling methods are implemented, and their profiles combined, within the Low Level Virtual Machine [16] compiler infrastructure [Fig. 25]. A codelet extractor and replayer implements code isolation. A codelet is designed to be compiled, run, and measured independently of the original application. ISA design requires a fine-grained profiling mechanism based on the C compiler.
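The per-object load/store information mentioned above can be pictured with a short sketch. It is illustrative only and does not reproduce the profilers cited in this section: the trace format (pairs of an operation and a memory object name), the object names, and the idea of ranking objects by access count as pre-allocation candidates are all assumptions made for the example.

```python
from collections import defaultdict


def summarize_trace(trace):
    """Count loads and stores per memory object.

    trace -- iterable of (op, obj) pairs, where op is "load" or "store"
             (an assumed trace format, not that of any particular profiler)
    """
    counts = defaultdict(lambda: {"load": 0, "store": 0})
    for op, obj in trace:
        if op not in ("load", "store"):
            raise ValueError(f"unknown memory operation: {op!r}")
        counts[obj][op] += 1
    return dict(counts)


def rank_by_accesses(summary):
    """Order objects from most to least accessed; ties are broken by name."""
    return sorted(
        summary,
        key=lambda obj: (-(summary[obj]["load"] + summary[obj]["store"]), obj),
    )


if __name__ == "__main__":
    # Hypothetical trace of a small filter kernel.
    trace = [
        ("load", "coef"), ("load", "buf"), ("store", "buf"),
        ("load", "coef"), ("load", "coef"), ("store", "out"),
    ]
    summary = summarize_trace(trace)
    print(summary)
    print(rank_by_accesses(summary))
```

A designer could use such a ranking to decide which objects deserve the fastest memory, although the actual allocation policy depends on the application and is not specified here.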

B. Application specific Latency prediction
Recently, high-level synthesis has used efficient latency prediction techniques that improve application-specific system performance; latency prediction is also used for clock cycle reduction and operation serialization. The number of time units (clock cycles) between two initiations is the latency between them. A latency of k means that the two initiations are separated by k clock cycles. Any attempt by two or more initiations to use the same stage at the same time causes a collision, and collisions must be avoided by scheduling the sequence of initiations. In the state diagram for function X, only five outgoing transitions are possible from the initial state (101010101110), corresponding to the five permissible latencies 10, 8, 6, 4 and 1 in the initial collision vector. Similarly, from state (10101011), one reaches the same state after three, five, or seven shifts.
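The collision-vector bookkeeping described here, and the greedy-cycle search that this section goes on to use, can be sketched in a few lines. The sketch is hedged in several ways: it uses a small 7-bit collision vector with forbidden latencies 2, 4, 5 and 7, chosen purely for illustration and not the 12-bit vector discussed in the text; it stores a state as an integer in which bit k-1 is set when latency k is forbidden; and it reports the smallest average latency among greedy cycles, which this section treats as the MAL. All function names are hypothetical.

```python
def collision_vector(forbidden, m):
    """Build the initial collision vector; bit (k-1) is set if latency k is forbidden."""
    vec = 0
    for k in forbidden:
        if not 1 <= k <= m:
            raise ValueError(f"latency {k} is outside 1..{m}")
        vec |= 1 << (k - 1)
    return vec


def permissible_latencies(state, m):
    """Latencies in 1..m whose bit is clear in the given state."""
    return [k for k in range(1, m + 1) if not (state >> (k - 1)) & 1]


def next_state(state, k, initial):
    """Shift the state right by k positions and OR in the initial collision vector."""
    return (state >> k) | initial


def greedy_cycle(start, initial, m):
    """Follow minimum-latency edges from start until a state repeats; return that cycle."""
    first_seen = {}  # state -> position in the latency list where it was first left
    latencies = []
    state = start
    while state not in first_seen:
        first_seen[state] = len(latencies)
        allowed = permissible_latencies(state, m)
        k = allowed[0] if allowed else m + 1  # m+1 or more always returns to the initial state
        latencies.append(k)
        state = next_state(state, k, initial)
    return latencies[first_seen[state]:]


def reachable_states(initial, m):
    """All states reachable from the initial state in the state diagram."""
    found, todo = {initial}, [initial]
    while todo:
        state = todo.pop()
        for k in permissible_latencies(state, m) + [m + 1]:
            successor = next_state(state, k, initial)
            if successor not in found:
                found.add(successor)
                todo.append(successor)
    return found


def best_greedy_average(initial, m):
    """Smallest average latency over the greedy cycles of all reachable states."""
    averages = []
    for state in reachable_states(initial, m):
        cycle = greedy_cycle(state, initial, m)
        averages.append(sum(cycle) / len(cycle))
    return min(averages)


if __name__ == "__main__":
    m = 7
    initial = collision_vector({2, 4, 5, 7}, m)
    print(format(initial, "07b"), permissible_latencies(initial, m))
    print(greedy_cycle(initial, initial, m), best_greedy_average(initial, m))
```

For this illustrative vector, the greedy cycle from the initial state alternates a short latency with one longer than the vector length, while another reachable state supports a constant cycle with a lower average, which is why the search considers every reachable state rather than only the initial one.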
When the latency is n+1 or greater, all transitions are redirected back to the initial state. Collisions can be avoided by following greedy cycles. From the state diagram we can determine the optimal latency cycles, which result in the MAL (minimal average latency). The cycles (6), (8), (6,8), and (10,12) are simple cycles. The cycle (6,12,10,12) is a complex cycle because it travels through the state (101010101110) twice. Similarly, (4,6,4,6,8,6) is not simple because it repeats a state. A greedy cycle is one whose edges are all made with the minimum latencies from their respective starting states. The greedy cycle (1,12) has an average latency of 6.5, which is lower than the average latency of 11 of the simple cycle (10,12) [Fig. 26]. A greedy cycle with a constant latency equal to the MAL evaluates function X without causing a collision. Collision-free scheduling is thus reduced to finding greedy cycles from the set of simple cycles. The greedy cycle yielding the MAL is the suitable choice for performance improvement. A latency sequence is a sequence of permissible (non-forbidden) latencies between successive task initiations. A latency cycle is a latency sequence that repeats the same subsequence indefinitely. Repeating such cycles reduces collisions between initiations, and the average latency is used to select among them. A constant cycle is a latency cycle that contains only one latency value. The average latency of a constant cycle is simply the latency itself.
The target machine [RISC, CISC, and VLIW] can deploy more sophisticated instructions, which can perform specific operations much more efficiently.
If the target code can accommodate those instructions directly, this not only improves the quality of the code but also yields more efficient results [Fig. 27]. The fixed-point-based optimal latency frequency is optimized with mechanisms such as delay point and optimal operational frequency prediction. Operation serialization means completing the application computation with the least waste of time or of hardware resources. An optimal condition is required to serialize the computational operations, so the resource-reducible or operation-optimal condition determines the latency design for the ASIP system. ASIPs now give embedded systems the benefit of flexibility together with excellent performance at low power consumption, and ASIPs also improve functionality and address design complexity with retargetable compiler technology. In real-time embedded systems, the designer implements the processor and memory architecture according to the application-specific operational probability. An ASIP lets the target machine deploy more sophisticated instructions that perform specific operations much more efficiently for low-power embedded systems. Compilers and profiling mechanisms are also analyzed for ASIPs, and a memory area reduction technique is implemented that improves application execution performance. An effective cycle time, delay, and scheduling prediction mechanism is used for memory implementation. An efficient latency prediction technique is designed for operation serialization with the help of a profiler, and application-specific computational complexity is analyzed according to the profiled execution delay time, for use in various high-performance embedded devices.

Fig. 1. Application Specific Requirements Based Embedded System Design

Fig. 2. Design process used in embedded system

Fig. 4. Application specific operations frequency analysis with design complexity

Fig. 12. Block filter mechanism which increases data probability

A. Delay reduction design optimization
The critical path delay is the combinational logic delay plus the setup time of the logic circuits plus the clock-to-output delay. The critical path is analyzed with various node-based implementations. Complex memory structures have various critical sections. The cell delays of the various critical paths are analyzed, and the combinational path delay is implemented with a column-based cell architecture. Application-based column design reduces cycles; it is used for data allocation, where it implements data shifting and reduces memory misses [Fig. 16]. We have analyzed the variation in access time between lower and higher frequency orders in the memory implementation mechanism [Fig. 13, Fig. 14 and Fig. 15]. The access time pattern based on the probability degree determines the critical section area. A larger critical section area is more likely to have a longer access time [Fig. 16]. Approaches such as scheduling, allocation, and binding determine the access time, and a low probability frequency design reduces the critical section area [Fig. 17 and Fig. 18]. The node-based critical section is implemented with higher-order and lower-order paths from the access time point of view, and the shortest path implemented by the column design has the lowest access time, which has a global impact on system performance.
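The critical path relation stated above (combinational logic delay plus setup time plus clock-to-output delay) can be made concrete with a small sketch. It is a hedged illustration rather than the analysis flow used in this work: the combinational logic is assumed to be an acyclic network of gates with fixed integer delays in picoseconds, and the gate names, delay values, and the function `critical_path_delay` are hypothetical.

```python
from functools import lru_cache


def critical_path_delay(gate_delay, fanin, t_setup, t_clk_to_q):
    """Register-to-register critical path delay of an acyclic gate network.

    gate_delay -- dict mapping gate name -> propagation delay (ps)
    fanin      -- dict mapping gate name -> tuple of gates driving it;
                  gates missing from the dict are driven directly by registers
    t_setup    -- setup time of the capturing register (ps)
    t_clk_to_q -- clock-to-output delay of the launching register (ps)
    """

    @lru_cache(maxsize=None)
    def arrival(gate):
        # Latest arrival at the gate's inputs, plus the gate's own delay.
        latest_input = max((arrival(p) for p in fanin.get(gate, ())), default=0)
        return latest_input + gate_delay[gate]

    longest_logic = max(arrival(g) for g in gate_delay)
    return t_clk_to_q + longest_logic + t_setup


if __name__ == "__main__":
    # Hypothetical four-gate network: g1 and g2 feed g3, which feeds g4.
    delays = {"g1": 120, "g2": 80, "g3": 150, "g4": 60}
    drivers = {"g3": ("g1", "g2"), "g4": ("g3",)}
    total = critical_path_delay(delays, drivers, t_setup=40, t_clk_to_q=50)
    print("critical path delay:", total, "ps")
```

In this example the longest logic path runs through g1, g3 and g4; shortening any gate on that path reduces the minimum clock period, whereas shortening g2 does not.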

Fig. 23. Application specific Memory Integration and Performance Area

Fig. 27. Memory area implemented with resource-reducible flow mechanism and operational optimal condition

VI. CONCLUSION