Compressed Sensing of Multi-Channel EEG Signals: Quantitative and Qualitative Evaluation with Speller Paradigm

This paper investigates compressed sensing of electroencephalogram (EEG) signals based on EEG-specific dictionaries. Several types of projection matrices are compared: matrices with random i.i.d. elements sampled from the Gaussian or Bernoulli distributions, and matrices optimized by appropriate algorithms for the particular dictionary used in reconstruction. The results are discussed in terms of both the reconstruction error and the classification rates in the spelling paradigm. Keywords: Compressed sensing; EEG; Brain computer interface; P300; Speller Paradigm


I. INTRODUCTION
In recent years, compressed sensing (CS) has attracted considerable attention in areas such as applied mathematics, computer science, and electrical engineering by showing that, under certain conditions, it is possible to go beyond the traditional limits of sampling theory. CS builds upon the fundamental fact that many signals can be represented using only a few nonzero coefficients in a suitable basis or dictionary. Nonlinear optimization can then be used to recover such signals from very few measurements [1]. Compressed sensing is an example of the practical use of new mathematical results. The difficulty in applying such results lies in how these concepts are understood, more or less intuitively, so as to facilitate the fusion between theory and applications.
The literature of recent years shows an impressive number of papers in the CS field, covering both 1D and 2D medical signals. Among 1D signals, those most frequently used in CS applications are the electrocardiogram (ECG) and the electroencephalogram (EEG), since they are also the most widely used in the medical world. EEG signals often need to be recorded over long periods of time (e.g., during the night) or on a large number of channels.
A brain computer interface (BCI) is a communication system that does not depend on the brain's normal output pathways of peripheral nerves and muscles. A BCI, i.e., a system based on communication by means of electroencephalographic (EEG) signals, is capable of connecting the human brain directly with the computer. Using the EEG signal as a communication vector between human and machine is one of the new challenges in signal theory. The purpose of the BCI is to translate human intentions, represented as suitable signals, into control signals for an output device, e.g., a computer or a neuroprosthesis. In the last two decades, many studies have evaluated the possibility of using signals recorded from the scalp (or from the brain) for a new technology that does not require muscle control [2] [3].
A BCI that uses the EEG signal measures human brain activity and detects and discriminates specific features of it. Recent advances in BCI research have widened the range of application domains.
In this paper, we propose a compression method for EEG signals based on CS using universal EEG-specific mega-dictionaries. To validate the proposed method, we used the EEG recordings of the spelling task from BCI Competition III (2005), Dataset II. The reconstructed signals were rated with both quantitative and qualitative evaluations. As the qualitative evaluation, we used the classification rate of the attended character, based on P300 detection in the spelling paradigm, obtained on the reconstructed EEG signals with the winning scripts (Alain Rakotomamonjy [4]). As the quantitative evaluation, we used distortion measures between the reconstructed and original signals, namely the PRD (percentage root-mean-square difference) and the PRDN (normalized PRD).

II. BRAIN COMPUTER INTERFACE - P300 SPELLER PARADIGM
The P300 speller paradigm uses P300 waves, which are event-related potentials produced during the decision-making process. The P300 has two subcomponents (as shown in Fig. 1 a): the novelty P3 (also named P3a) and the classic P300 (renamed P3b). P3a is a positive wave with peak latency between 250 and 280 ms; its maximum amplitude is recorded at the frontal/central electrodes. P3b also has positive amplitude, with a peak around 300 ms; higher values are usually recorded over the parietal areas of the brain. Depending on the task, the peak latency can range from 250 ms to 500 ms or more.
www.ijacsa.thesai.org
One of the first examples of a BCI is the algorithm proposed by Farwell and Donchin [5], which relies on the unconscious decision-making processes expressed via the P300 to drive a computer. In the P300 speller paradigm described in [5], the subject watches a 6x6 matrix containing all letters and digits (as shown in Fig. 1 b) and focuses attention on the characters of a given word. The protocol contains several stages: Step 1: the matrix is presented to the subject for 2.5 seconds; Step 2: all rows and all columns are intensified in random order, each for 100 ms.
The procedure consists of repeating Step 2 15 times (15 epochs) for each character, followed by a pause of 2.5 seconds (Step 1). For each character there are therefore 6x2x15 = 180 intensifications (6 rows and 6 columns per epoch, over 15 epochs): 2x15 = 30 of them contain the target character (once when its column is highlighted and once when its row is highlighted, in each of the 15 epochs), and the remaining 150 do not.
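The intensification count above can be checked with a short simulation. This is a minimal sketch under stated assumptions: each epoch flashes every row and every column exactly once in a random order, and the function names (`intensification_schedule`, `count_target_flashes`) are hypothetical; timing (100 ms flashes, 2.5 s pauses) is not modeled.

```python
import random

ROWS, COLS = 6, 6
N_EPOCHS = 15

def intensification_schedule(n_epochs=N_EPOCHS, seed=0):
    """Return the flash sequence for one character.

    Each flash is ("row", i) or ("col", j); every epoch intensifies
    all 6 rows and all 6 columns once, in random order.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_epochs):
        epoch = [("row", i) for i in range(ROWS)] + [("col", j) for j in range(COLS)]
        rng.shuffle(epoch)
        schedule.extend(epoch)
    return schedule

def count_target_flashes(schedule, target_row, target_col):
    """Count the flashes whose row or column contains the target character."""
    return sum(1 for kind, idx in schedule
               if (kind == "row" and idx == target_row)
               or (kind == "col" and idx == target_col))

schedule = intensification_schedule()
print(len(schedule))                         # 180
print(count_target_flashes(schedule, 2, 4))  # 30
```

Only 30 of the 180 flashes carry a P300 response to the target, which is why the responses are accumulated over several epochs before a decision is made.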
For BCI Competition III, the dataset was recorded from two different subjects in five sessions each; the signals were band-pass filtered in the range 0.1-60 Hz and digitized at 240 Hz. Each session is composed of runs, and in each run the subject is asked to spell a word. For a given acquisition session, all EEG signals of a 64-channel scalp montage were collected continuously. The training set contains 85 characters and the test set 100 characters for each of the two subjects. A more detailed description of the dataset can be found in the BCI competition paper [6].
The competition winners, Alain Rakotomamonjy and Vincent Guigue, proposed a method that copes with the variability of the EEG signals through an ensemble-of-classifiers approach [4]. Each classifier is a linear Support Vector Machine trained on a small part of the available data, for which a channel selection procedure has been performed. They achieved a classification rate of 95.5% with 15 sequences and 73.5% with 5 sequences [4].
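The decision step that turns classifier outputs into a spelled character is not detailed here. The sketch below assumes the usual P300 speller rule: the per-flash scores are summed over all epochs for each row and each column, and the character at the intersection of the highest-scoring row and column is selected. The matrix layout, the flash coding (0-5 rows, 6-11 columns), and the synthetic scores are assumptions for illustration; this is not the winners' script.

```python
import numpy as np

# Assumed 6x6 layout of the speller matrix (A-Z, 1-9, "_"), row-major.
MATRIX = np.array(list("ABCDEFGHIJKLMNOPQRSTUVWXYZ123456789_")).reshape(6, 6)

def select_character(flash_codes, scores):
    """Pick the attended character from per-flash classifier scores.

    flash_codes: flash identifiers, 0-5 for rows and 6-11 for columns.
    scores: one classifier output per flash (e.g. averaged ensemble outputs).
    """
    totals = np.zeros(12)
    np.add.at(totals, flash_codes, scores)   # accumulate scores over all epochs
    row = int(np.argmax(totals[:6]))         # most P300-like row
    col = int(np.argmax(totals[6:]))         # most P300-like column
    return MATRIX[row, col]

# Usage with synthetic scores: 15 epochs, target at row 2, column 4.
rng = np.random.default_rng(0)
codes = np.tile(np.arange(12), 15)
scores = rng.normal(0.0, 0.1, codes.size) + np.isin(codes, [2, 6 + 4])
print(select_character(codes, scores))  # Q
```

Summing over epochs is what makes fewer sequences harder: with 5 instead of 15 epochs the accumulated target score stands out less from the noise, consistent with the lower rate reported for 5 sequences.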

III. COMPRESSED SENSING
In case of a vector n x   which can be represented using only few elements from the basis defined by the columns of the matrix nxn B   , the x vector can also be written as: where  is the sparse decomposition of x in B.
The compressed sensing approach takes only a set of $m$ measurements of $x$, obtained by projecting $x$ onto $m$ random vectors, with $m \ll n$. Considering these vectors as the rows of a matrix $P \in \mathbb{R}^{m \times n}$, the acquisition operation is described by the equation

$$y = Px.$$

The matrix $A = PB$ is also called the effective dictionary; with this notation, CS takes its well-known form

$$y = A\alpha.$$

This equation shows how the sparse vector $\alpha$ is acquired by means of the matrix $A$. The name "compressed sensing" indicates that the number of projection vectors $m$ is much smaller than the signal dimension $n$.
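The acquisition model can be illustrated numerically. This minimal sketch assumes a random orthonormal basis standing in for $B$, a Gaussian projection matrix $P$, and illustrative sizes ($n = 240$, $m = 24$, $k = 5$); none of these values is prescribed by the method. It checks that measuring $x$ with $P$ is the same as measuring $\alpha$ with the effective dictionary $A = PB$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 240, 24, 5          # signal length, measurements, sparsity (assumed values)

# Hypothetical orthonormal basis B (columns are basis vectors).
B = np.linalg.qr(rng.standard_normal((n, n)))[0]

# k-sparse coefficient vector alpha and the signal x = B alpha.
alpha = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
alpha[support] = rng.standard_normal(k)
x = B @ alpha

P = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian projection matrix
y = P @ x                                      # y = P x: m values instead of n
A = P @ B                                      # effective dictionary A = P B
print(y.shape, np.allclose(y, A @ alpha))      # (24,) True
```

The acquisition side only needs the matrix-vector product `P @ x`; all knowledge of $B$ is deferred to reconstruction, which is the low-complexity property exploited later in the Method section.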
The key problem of CS is the recovery of the sparse $n$-dimensional vector $\alpha$ from the $m \ll n$ projections contained in $y$. The system of equations $y = A\alpha$ is underdetermined, but under the sparsity hypothesis the number $k$ of nonzero elements of $\alpha$ is small. Once $\alpha$ is known, the original signal is obtained as $x = B\alpha$. CS thus exploits the property that the signal is sparse in a certain basis. A fundamental result published in [7], [8] states that if $\alpha$ has enough zero entries and the matrix $A$ fulfills certain conditions, then $\alpha$ is the sparsest solution of the acquisition system of equations. Namely, $\alpha$ can be obtained as the solution of the following constrained optimization problem:

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_{p} \quad \text{subject to} \quad y = A\alpha,$$

where the most common values of $p$ are 0 and 1. The first case, where the $\ell_0$ norm is used, is an NP-hard problem [9]: no polynomial-time algorithm is known for it, and exhaustive search over all possible supports is intractable for data of practical size. The second case, which uses the $\ell_1$ norm, is known as Basis Pursuit (BP) [10]. BP is a convex optimization problem that can be reformulated as a linear programming problem, for which many efficient algorithms are available.
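The linear-programming reformulation of Basis Pursuit mentioned above can be sketched with SciPy. Writing $\alpha = u - v$ with $u, v \ge 0$ turns $\min \|\alpha\|_1$ subject to $A\alpha = y$ into an LP over $(u, v)$. This is a generic BP solver for illustration, not necessarily the reconstruction algorithm used in this work; the problem sizes and the random Gaussian $A$ are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||alpha||_1 s.t. A alpha = y as an LP with alpha = u - v, u, v >= 0."""
    m, K = A.shape
    c = np.ones(2 * K)                   # sum(u) + sum(v) equals ||alpha||_1 at the optimum
    A_eq = np.hstack([A, -A])            # A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:K], res.x[K:]
    return u - v

# Usage: recover a 3-sparse vector of length 60 from 30 Gaussian measurements.
rng = np.random.default_rng(1)
m, K, k = 30, 60, 3
A = rng.standard_normal((m, K)) / np.sqrt(m)
alpha = np.zeros(K)
alpha[rng.choice(K, size=k, replace=False)] = rng.standard_normal(k)
y = A @ alpha
alpha_hat = basis_pursuit(A, y)
print(np.max(np.abs(alpha_hat - alpha)))  # small reconstruction error
```

Splitting $\alpha$ into nonnegative parts doubles the number of variables but makes the objective linear, which is what lets standard LP solvers handle the $\ell_1$ problem.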

IV. METHOD
The key element in the success of signal compression based on compressed sensing is the right choice of the dictionary used for reconstruction. In general, ECG and EEG biomedical signals are not very sparse in standard dictionaries. Owing to their specific structure, ECG signals benefit greatly when the waveforms from which the dictionary atoms are selected are aligned with respect to the R wave or the QRS complex. For EEG signals, such alignment is difficult or even impossible, since they do not usually contain repetitive elements in the time domain. Thus, EEG signals might also be analyzed in the frequency domain.
In the BCI spelling experiment, a change in the waveform, the P300, is observed about 300 ms after the stimulus. This temporal behavior is clearly revealed by averaging several EEG segments using the natural alignment given by the moment the stimulus is applied. However, an alignment based on the stimulus onset, which takes into account that the P300 appears about 300 ms later, does not allow real-time compressed acquisition: it requires preprocessing at both the acquisition and the decompression of the signal. Introducing preprocessing at acquisition has a major drawback: it eliminates the main advantage of compressed sensing, namely the very low computational complexity of the acquisition stage.
Starting from the above considerations, we tested the possibility of building a universal mega-dictionary consisting of EEG segments from all 64 channels. Three atoms, consisting of EEG segments from the corresponding channel, were selected for each channel, giving a dictionary of 3x64 = 192 atoms. Since each atom has 240 samples, the size of the dictionary is 192x240. The dictionary was built from the training signals of the spelling paradigm, and the method was tested on compressively sensed EEG test signals [11].
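The dictionary layout can be sketched as follows. The text fixes the counts (3 atoms per channel, 64 channels, 240 samples per atom, i.e. 1 s at the 240 Hz sampling rate) but not how the segments are chosen, so the random segment positions and the unit-norm scaling here are assumptions, and `build_mega_dictionary` is a hypothetical helper. Atoms are stored as columns, matching the convention of Section III; this is the transpose of the 192x240 layout quoted above.

```python
import numpy as np

FS = 240                # sampling rate of Dataset II (Hz)
ATOM_LEN = 240          # 240 samples, i.e. 1 s at 240 Hz
ATOMS_PER_CHANNEL = 3

def build_mega_dictionary(train, rng):
    """Build a mega-dictionary from multichannel training EEG.

    train: array of shape (n_samples, n_channels).
    Returns B of shape (ATOM_LEN, ATOMS_PER_CHANNEL * n_channels), atoms as columns.
    Segment positions are drawn at random here; this is an assumption.
    """
    n_samples, n_channels = train.shape
    atoms = []
    for ch in range(n_channels):
        starts = rng.choice(n_samples - ATOM_LEN, size=ATOMS_PER_CHANNEL, replace=False)
        for s in starts:
            seg = train[s:s + ATOM_LEN, ch]
            atoms.append(seg / np.linalg.norm(seg))   # unit-norm atoms (assumption)
    return np.column_stack(atoms)

rng = np.random.default_rng(0)
train = rng.standard_normal((10 * FS, 64))   # placeholder for real training EEG
B = build_mega_dictionary(train, rng)
print(B.shape)  # (240, 192)
```

Because every channel contributes its own atoms, the same dictionary can serve all 64 channels, which is what makes it "universal" for the montage.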
Three types of acquisition matrix were tested:
- a Bernoulli matrix with elements -1, 0 and 1;
- a random matrix (with i.i.d. Gaussian elements);
- a matrix optimized for the dictionary [12] (the product of a random matrix and the transposed dictionary).
Whereas [11] analyzed the dictionaries used in the reconstruction stage, this work analyzes the projection matrix used in the compression stage.
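The three projection matrices can be generated as below. This is a sketch under assumptions: the probabilities of the ternary {-1, 0, +1} entries are not given in the text (2/3 zeros is a common choice for sparse random projections), the Gaussian scaling by $1/\sqrt{m}$ is conventional, and the "optimized" matrix is implemented literally as described, $P = R B^{T}$ with a random $R$, which may be simpler than the full procedure of [12]. The value $m = 24$ corresponds to a 10:1 reduction of a 240-sample window.

```python
import numpy as np

def ternary_bernoulli_matrix(m, n, rng, p_zero=2/3):
    # Elements in {-1, 0, +1}; the probabilities are an assumption (not given in the paper).
    p_sign = (1.0 - p_zero) / 2.0
    return rng.choice([-1.0, 0.0, 1.0], size=(m, n), p=[p_sign, p_zero, p_sign])

def gaussian_matrix(m, n, rng):
    # i.i.d. N(0, 1/m) elements.
    return rng.standard_normal((m, n)) / np.sqrt(m)

def dictionary_optimized_matrix(m, B, rng):
    # P = R B^T: a random m x K matrix R times the transposed dictionary B (n x K).
    K = B.shape[1]
    R = rng.standard_normal((m, K))
    return R @ B.T

rng = np.random.default_rng(0)
n, K, m = 240, 192, 24                 # m = n / 10 for a 10:1 reduction (assumed)
B = rng.standard_normal((n, K))        # placeholder for the EEG mega-dictionary
for P in (ternary_bernoulli_matrix(m, n, rng),
          gaussian_matrix(m, n, rng),
          dictionary_optimized_matrix(m, B, rng)):
    print(P.shape)                     # (24, 240) each time
```

With $P = RB^{T}$ each measurement is a random combination of correlations between the signal and the dictionary atoms, so the effective dictionary $A = PB = RB^{T}B$ is tied to the reconstruction dictionary rather than being dictionary-agnostic like the other two.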

V. EXPERIMENTAL RESULTS AND DISCUSSIONS
For the evaluation of the analyzed methods, we used Dataset II of BCI Competition III (2005), P300 Spelling.
For compression evaluation, we used the compression ratio (CR), defined as the ratio between the number of bits needed to represent the original signal and the number of bits needed to represent the compressed signal. For the qualitative evaluation, based on the classification rate in the spelling paradigm, we used the scripts of the winners, A. Rakotomamonjy and V. Guigue [4] (the version that classifies using all 64 EEG channels).
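The CR definition above can be turned into the number of measurements per window. This sketch assumes that each measurement is stored with the same number of bits as an original sample (the bit depths are not given in the text), in which case CR reduces to $n/m$; the 16-bit value and the helper names are hypothetical.

```python
def compression_ratio(n_samples, bits_per_sample, n_measurements, bits_per_measurement):
    """CR = bits for the original window / bits for its compressed measurements."""
    return (n_samples * bits_per_sample) / (n_measurements * bits_per_measurement)

def measurements_for_cr(n_samples, cr, bits_ratio=1.0):
    """Measurements m giving a target CR.

    bits_ratio = bits_per_measurement / bits_per_sample (assumed 1 by default).
    """
    return round(n_samples / (cr * bits_ratio))

n = 240                                   # one 1 s window at 240 Hz
print(measurements_for_cr(n, 10))         # 24
print(measurements_for_cr(n, 5))          # 48
print(compression_ratio(n, 16, 24, 16))   # 10.0
```

If the measurements needed more bits than the raw samples (e.g. to cover the larger dynamic range of the projections), `bits_ratio` would exceed 1 and fewer measurements could be kept for the same CR.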
The dictionary was built from the training EEG signals of subjects A and B, and the proposed method was tested on the test EEG signals of subject B.
Next, we present the results of decompressing the EEG signals of subject B, both as a measure of the distortion between the original and decompressed EEG and as the classification rate in the spelling paradigm.
Table I presents the classification results in the spelling paradigm for the original data, obtained with the software from [4]. Using all channels, the average classification rate is 89.37%, and most individual channels reach a classification rate of 93%. Tables II-IV present the classification results for all 64 channels for compression ratios of 10:1 and 5:1 for subject B, with the Bernoulli matrix, the random matrix, and the matrix optimized for the dictionary, respectively. The results obtained with the Bernoulli matrix are comparable to those achieved with the random matrix. With the matrix optimized for the dictionary, however, the results improve considerably and are comparable to those obtained with the original signals. Considering the classification rate in the spelling paradigm alone, the rates obtained for CR = 10:1 and 5:1 are even slightly higher than for the original signals.

VI. CONCLUSIONS
This paper presents a comparative analysis of the results obtained with several types of projection matrices (matrices with random i.i.d. elements sampled from the Gaussian or Bernoulli distributions, and matrices optimized for the particular dictionary used in reconstruction by means of appropriate algorithms) and a mega-dictionary for compressed sensing of EEG signals. The proposed method was evaluated on the dataset from BCI Competition III (2005), P300 Spelling. The EEG signal reconstruction was assessed with the PRDN in parallel with the classification rate in the spelling paradigm, computed with the scripts of the competition winners (the version that classifies using all 64 channels). The best results were obtained with the matrices optimized for the particular dictionary used in reconstruction.
With the mega-dictionary, the best classification results in the spelling paradigm are obtained for CR = 5:1 and 10:1, with classification rates of 90% and 92%, respectively (versus 89.37% for the original signals). In terms of error, the PRDN was 29.77 for the 5:1 compression and 42.32 for the 10:1 case.
The results demonstrate that the proposed method, combining the mega-dictionary with a projection matrix optimized for the dictionary, provides greatly improved results compared with the standard random (Gaussian or Bernoulli) matrices.

Fig. 1. P300 wave and the classical P300 spelling paradigm described by Farwell and Donchin (1988).

To validate the compression, we evaluated the distortion between the original and the reconstructed signals by means of the PRD and the PRDN (normalized percentage root-mean-square difference):

$$\mathrm{PRD} = 100\sqrt{\frac{\sum_{i=1}^{N}\left(x[i]-\tilde{x}[i]\right)^{2}}{\sum_{i=1}^{N} x[i]^{2}}}, \qquad \mathrm{PRDN} = 100\sqrt{\frac{\sum_{i=1}^{N}\left(x[i]-\tilde{x}[i]\right)^{2}}{\sum_{i=1}^{N}\left(x[i]-\bar{x}\right)^{2}}},$$

where $x[i]$ and $\tilde{x}[i]$ are the samples of the original and the reconstructed signals, respectively, $\bar{x}$ is the mean value of the original signal, and $N$ is the length of the window over which the PRDN is calculated.
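The two distortion measures translate directly into code. This is a minimal sketch of the standard PRD and PRDN formulas over one window of length $N$; the function names are hypothetical. PRDN subtracts the mean of the original signal in the denominator, so, unlike PRD, it is not deflated by a DC offset in the recording.

```python
import numpy as np

def prd(x, x_rec):
    """Percentage root-mean-square difference over one window."""
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    return 100.0 * np.sqrt(np.sum((x - x_rec) ** 2) / np.sum(x ** 2))

def prdn(x, x_rec):
    """Normalized PRD: the denominator uses the mean-removed original signal."""
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    return 100.0 * np.sqrt(np.sum((x - x_rec) ** 2) / np.sum((x - x.mean()) ** 2))

x = [1.0, 2.0, 3.0, 4.0]
x_rec = [1.0, 2.0, 3.0, 5.0]
print(round(prd(x, x_rec), 2))   # 18.26
print(round(prdn(x, x_rec), 2))  # 44.72
```

In the example the error energy is 1, the signal energy is 30, and the mean-removed energy is 5, so PRDN is much larger than PRD for the same reconstruction.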

TABLE I.
CLASSIFICATION PERFORMANCE (%) IN P300 SPELLING FOR ORIGINAL DATA (SUBJECT B) AND SOFTWARE FROM [4]

TABLE IV.
CLASSIFICATION PERFORMANCE (%) IN P300 SPELLING FOR RECONSTRUCTED EEG SIGNALS WITH SOFTWARE FROM [4] FOR CR = 10:1 AND CR = 5:1, SUBJECT B, MATRIX OPTIMIZED FOR THE DICTIONARY

TABLE VI.
THE TOPOGRAPHY OF PRDN FOR EEG COMPRESSED SENSING (TOP: CR = 5:1; BOTTOM: CR = 12:1)