Comparison of Inter- and Intra-Subject Variability of P300 Spelling Dictionary in EEG Compressed Sensing

In this paper, we propose a new compression method for electroencephalographic (EEG) signals based on the concept of compressed sensing (CS) for the P300 detection spelling paradigm. The method uses a universal mega-dictionary which has been found not to be patient-specific. To validate the proposed method, EEG recordings from the BCI Competition III Challenge 2005, Dataset II (P300 spelling), have been used. To evaluate the reconstructed signal, both quantitative and qualitative measures were used. For qualitative evaluation, we used the classification rate of the observed character, based on P300 detection in the spelling paradigm applied to the reconstructed EEG signals, using the scripts of the competition winners (Alain Rakotomamonjy and Vincent Guigue). For quantitative evaluation, distortion measures between the reconstructed and original signals were used.

Keywords: Biomedical signal processing; Brain-computer interfaces; Compressed sensing; Classification algorithms; Electroencephalography


I. INTRODUCTION
In recent years, the CS method has attracted considerable attention in areas such as applied mathematics, computer science, and electrical engineering. Basically, the method exploits the fact that, under certain conditions, many signals can be represented using only a few non-zero coefficients in a suitable basis, and nonlinear optimization can be used to recover such signals from very few measurements [1]. Compressed sensing is a classic example of the practical use of new mathematical concepts. The difficulty in applying such concepts lies in how the results are perceived, in a more or less intuitive manner, so as to facilitate the fusion between theory and applications. The literature of recent years includes a large number of papers in the CS field [2,3], covering both 1D and 2D medical signals [4,5,6]. Among the 1D signals currently used in CS applications are the electrocardiogram (ECG) and the electroencephalogram (EEG), since they are commonly used in the medical world. In the case of EEG signals, recordings are very often needed over long periods of time (e.g., overnight) or for a large number of channels.
Paralyzed persons (e.g., with amyotrophic lateral sclerosis, cerebral stroke or severe polyneuropathy) or persons with other motor disabilities need alternative methods for communication and control. Using the EEG signal as a communication vector between human and machine is one of the new challenges in signal theory. The main element of such a communication system is known as a "Brain Computer Interface" (BCI). The purpose of a BCI is to translate human intentions, represented as suitable signals, into control signals for an output device, e.g., a computer or a neuroprosthesis. A BCI must not depend on the normal output pathways of peripheral nerves and muscles. In the last two decades, many studies have been carried out to evaluate whether signals recorded from the scalp (or from within the brain) can be used for a new technology that does not involve muscle control.
A BCI that uses the EEG signal is capable of measuring human brain activity and of detecting and discriminating certain specific features of it. Recent advances in BCI research have widened the range of possible application fields. Intelligent devices that are capable of compensating for some drawbacks associated with the lack of information from the EEG signals are also useful to persons with milder disabilities.
The definition of BCI largely accepted by the research community, given in [7], states that a BCI is a communication system in which the messages or commands that an individual sends to the outside world do not pass through the brain's normal output pathways, i.e., those implemented by the peripheral nerves and muscles.
The first mention of communication by means of a BCI was made by Vidal in 1973 [8]. Nowadays there are many research teams involved in BCI research. Different approaches have been pursued and results achieved, but they are not always precise and often involve complicated hardware [8]. Since the development of a BCI combines a great variety of disciplines (e.g., medicine, biology, physics, bioengineering, electronics, computer science, mathematics), the aspects involved are numerous and diverse.
The BCI framework used in this paper is based on P300 Event Related Potentials (ERP), which are natural responses of the brain to some specific external stimuli.

II. COMPRESSED SENSING
Shannon's sampling theorem is, for many signal classes, too severe a limitation. It can be overcome by using the theory of "compressed sensing" (also called compressive sensing, compressive sampling or sparse sampling), developed in the past few years by researchers such as D. Donoho [10,11], E. Candès [12], M. Elad, etc. Compressed sensing (CS) is a new and revolutionary method which has attracted the attention of many researchers and is considered to have a high potential, with multiple implications and applications in all fields of exact sciences. Basically, CS is a technique for finding sparse solutions to underdetermined linear systems. In the signal processing domain, CS is the process of acquiring and reconstructing a signal that is supposed to be sparse or compressible.
The advantage of compressed sensing is that the acquisition stage is very fast, has very low complexity, and is done in real time, yielding a compressed EEG signal. The difficult part is the EEG reconstruction, where two aspects are crucial: computational complexity (currently there are many mathematical algorithms which can be chosen depending on the needed accuracy, time and available resources) and knowledge of a dictionary in which the initial EEG signal has a satisfactory sparsity.
CS studies the possibility of reconstructing a signal $x$ from a few linear projections, also called measurements, given the a priori information that the signal is sparse or compressible in some known basis $\Psi$. The vectors onto which $x$ is projected are arranged as the rows of an $n \times N$ projection matrix $\Phi$, $n < N$, where $N$ is the size of $x$ and $n$ is the number of measurements. Denoting the measurement vector as $y$, the acquisition process can be described as

$$y = \Phi x = \Phi \Psi \gamma \qquad (1)$$

The system of equations (1) is obviously underdetermined.
Under certain assumptions on $\Phi$ and $\Psi$, however, the original expansion vector $\gamma$ can be reconstructed as the unique solution to the optimization problem

$$\hat{\gamma} = \arg\min_{\gamma} \|\gamma\|_0 \quad \text{subject to} \quad y = \Phi \Psi \gamma \qquad (2)$$

and the signal is then reconstructed with

$$\hat{x} = \Psi \hat{\gamma} \qquad (3)$$

Note that (2) amounts to finding the sparsest decomposition of the measurement vector $y$ in the dictionary $\Phi \Psi$. Unfortunately, (2) is combinatorial and unstable when considering noise or approximately sparse signals. Two directions have emerged to circumvent these problems: (i) pursuit and thresholding algorithms seek a suboptimal solution of (2), and (ii) the Basis Pursuit algorithm [12] relaxes the $\ell_0$ minimization to $\ell_1$, solving the convex optimization problem

$$\hat{\gamma} = \arg\min_{\gamma} \|\gamma\|_1 \quad \text{subject to} \quad y = \Phi \Psi \gamma \qquad (4)$$

instead of the original.
Minimizing the $\ell_0$ norm is an NP-hard problem [13], for which no polynomial-time algorithm is known. Such problems are practically impossible to solve for the usual dimensions of data.
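To make the acquisition and reconstruction steps concrete, the following is a minimal sketch in plain Python; it is not the reconstruction algorithm used in this paper. It assumes a toy random dictionary (`Psi`) and a Gaussian projection matrix (`Phi`), both hypothetical and chosen only for illustration, and it uses plain matching pursuit, one of the greedy pursuit algorithms of direction (i), to recover a 1-sparse expansion vector from n < N measurements.

```python
import math
import random

random.seed(0)  # fixed seed so the toy example is repeatable

N, K, n = 16, 20, 8  # signal length, number of atoms, measurements (toy sizes, n < N)


def gaussian_matrix(rows, cols):
    """Random matrix (list of rows) with i.i.d. standard normal entries."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    B_cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


def matvec(A, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matching_pursuit(A, y, max_iter=10, tol=1e-9):
    """Greedy sparse coding: find gamma with few non-zeros such that A*gamma ~ y."""
    columns = list(zip(*A))
    norms = [math.sqrt(sum(c * c for c in col)) for col in columns]
    gamma = [0.0] * len(columns)
    residual = list(y)
    for _ in range(max_iter):
        if math.sqrt(sum(r * r for r in residual)) <= tol:
            break
        # Correlation of the residual with every unit-normalized atom.
        corr = [sum(c * r for c, r in zip(col, residual)) / nrm
                for col, nrm in zip(columns, norms)]
        k = max(range(len(columns)), key=lambda j: abs(corr[j]))
        step = corr[k] / norms[k]  # coefficient of the unnormalized atom
        gamma[k] += step
        residual = [r - step * c for r, c in zip(residual, columns[k])]
    return gamma


Psi = gaussian_matrix(N, K)  # hypothetical dictionary, atoms as columns
Phi = gaussian_matrix(n, N)  # hypothetical projection matrix
gamma = [0.0] * K
gamma[5] = 2.0               # 1-sparse expansion vector

x = matvec(Psi, gamma)       # signal: x = Psi * gamma
y = matvec(Phi, x)           # acquisition: y = Phi * x
A = matmul(Phi, Psi)         # effective dictionary Phi * Psi

gamma_hat = matching_pursuit(A, y)
x_hat = matvec(Psi, gamma_hat)  # reconstruction: x_hat = Psi * gamma_hat
```

For a 1-sparse vector the first greedy step already selects the correct atom, so the recovery is exact up to rounding. For signals that are only approximately sparse, matching pursuit generally returns an approximation, which is the sense in which pursuit algorithms are suboptimal.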

III. BRAIN COMPUTER INTERFACE -P300 SPELLER PARADIGM
The P300 is an event-related potential which occurs about 300 ms after a rare and relevant event.
The P300 has two subcomponents (as shown in Fig. 1a): the novelty P3 (also named P3a) and the classic P300 (renamed P3b). P3a is a wave with positive amplitude and a peak latency between 250 and 280 ms; its maximum amplitudes are recorded at the frontal/central electrodes. P3b also has positive amplitude, with a peak around 300 ms; higher values are usually recorded over the parietal areas of the brain. Depending on the task, the peak latency can lie between 250 and 500 ms. Most of the paradigms that use P300 evoked potentials are derived from the one proposed by Farwell and Donchin in [14].
The P300 speller is based on the so-called oddball paradigm, which states that rare expected stimuli produce a positive deflection in the EEG after about 300 ms. The speller consists of a 6 × 6 matrix of characters, as shown in Figure 1. This matrix is presented on a computer screen, and its rows and columns are flashed in random order. The user is instructed to select a character by focusing on it. The flashing of the row or column that contains the attended character evokes a P300 response in the EEG, whereas the flashing of the other rows and columns does not. Therefore, the computer can determine the desired row and column after averaging several responses, and the desired character is selected at their intersection.
For BCI Competition III, the dataset was recorded from two different subjects in five sessions each. For each character, the set of row and column intensifications is repeated 15 times (15 epochs), followed by a pause of 2.5 seconds. For each given character, there are therefore 6x2x15 = 180 intensifications: 2x15 = 30 contain the target character (once when its column is highlighted and once when its row is highlighted, repeated for 15 epochs) and the remaining 150 do not. The signals were band-pass filtered in the range 0.1-60 Hz and sampled at 240 Hz. Each session is composed of runs and, in each run, the subject is asked to spell a word. For a given acquisition session, the EEG signals from all 64 scalp channels were collected continuously. For each of the two subjects, the training set contains 85 characters and the test set 100 characters. A more detailed description of the dataset can be found in the BCI competition paper [15].
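The intensification counts quoted above can be checked with a few lines of arithmetic. The sketch below only restates the figures given in the text; the variable names are illustrative, and the totals per training and test set are derived values rather than numbers reported by the dataset authors.

```python
ROWS, COLS = 6, 6           # speller matrix
EPOCHS = 15                 # repetitions per character
TRAIN_CHARS, TEST_CHARS = 85, 100

flashes_per_epoch = ROWS + COLS                     # 12 rows and columns flashed once each
flashes_per_char = flashes_per_epoch * EPOCHS       # 180 intensifications
targets_per_char = 2 * EPOCHS                       # 30 (target row + target column)
non_targets_per_char = flashes_per_char - targets_per_char  # 150

train_signals = TRAIN_CHARS * flashes_per_char      # 15300 post-stimulus signals
train_targets = TRAIN_CHARS * targets_per_char      # 2550 of them contain a P300
test_signals = TEST_CHARS * flashes_per_char        # 18000

print(flashes_per_char, targets_per_char, non_targets_per_char)
print(train_signals, train_targets, test_signals)
```

The 30:150 split shows that the binary P300 detection problem is unbalanced, with one target response for every five non-target responses.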
The classification problem can be formulated as follows: given the 64-channel signals collected after the intensification of a row or column, we want to predict whether or not the signal includes a P300. This first part of the problem is thus a binary classification problem. Based on the classification of each post-stimulus signal, the goal is to correctly predict the desired character using as few sequences as possible. The second part of the problem is therefore a 36-class classification problem, as it seeks to recognize a symbol from the 6 × 6 matrix shown in Figure 1 [9].
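The step from per-flash binary scores to a character can be sketched as follows. This is an illustrative sketch, not the winners' code: the character layout in `MATRIX` is assumed to be the usual P300 speller layout, the function name `decode_character` is hypothetical, and the score values are made up. Scores are summed over sequences, which ranks rows and columns in the same order as averaging.

```python
# Assumed 6 x 6 speller layout (one string per row).
MATRIX = ["ABCDEF", "GHIJKL", "MNOPQR", "STUVWX", "YZ1234", "56789_"]


def decode_character(row_scores, col_scores):
    """Pick the character at the row and column with the highest summed score.

    row_scores / col_scores: one list of 6 classifier scores per sequence,
    where a higher score means "more likely to contain a P300".
    """
    n_rows, n_cols = len(MATRIX), len(MATRIX[0])
    row_total = [sum(seq[i] for seq in row_scores) for i in range(n_rows)]
    col_total = [sum(seq[j] for seq in col_scores) for j in range(n_cols)]
    best_row = max(range(n_rows), key=lambda i: row_total[i])
    best_col = max(range(n_cols), key=lambda j: col_total[j])
    return MATRIX[best_row][best_col]


# Made-up scores for two sequences; the attended character is "P" (row 2, column 3).
row_scores = [[0.1, -0.3, 0.9, 0.0, -0.2, 0.1],
              [-0.1, 0.2, 0.4, 0.3, 0.0, -0.5]]
col_scores = [[0.2, 0.1, -0.4, 0.7, 0.0, 0.3],
              [0.0, -0.2, 0.1, 0.5, 0.6, -0.1]]

both = decode_character(row_scores, col_scores)            # uses both sequences
second_only = decode_character(row_scores[1:], col_scores[1:])  # noisy single sequence
```

In this toy example the second sequence alone picks the wrong column, while combining both sequences recovers the attended character, which is why accuracy is reported as a function of the number of sequences.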
The competition winners, Alain Rakotomamonjy and Vincent Guigue, proposed a method that copes with the variability of the EEG signals through an ensemble of classifiers [9]. Each classifier is a linear support vector machine (SVM) trained on a small part of the available data, for which a channel selection procedure has been performed. They achieved a classification rate of 95.5% for 15 sequences and 73.5% for 5 sequences [9]. In the preprocessing stage, for each channel, all data samples from 0 to 667 ms after the beginning of an intensification were extracted. Afterwards, each extracted signal was filtered with an 8th-order band-pass Chebyshev Type I filter with cut-off frequencies of 0.1 and 10 Hz and decimated according to the high cut-off frequency. At this point, an extracted signal from a single channel is composed of 14 samples. For the ensemble, the 85 characters of the training set are divided into 17 groups of 5 characters, and one linear-kernel SVM is trained per group. Each SVM training involves a model selection procedure for setting its regularization parameter C [9].
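The sample counts in this preprocessing chain can be reproduced with simple arithmetic. The sketch below leaves out the Chebyshev filter itself and only tracks sizes. The decimation factor of 12 is an assumption: it is the factor that brings the 240 Hz rate down to 20 Hz (Nyquist frequency equal to the 10 Hz cut-off) and that yields the 14 samples per channel stated above. The concatenated feature length assumes all 64 channels are kept.

```python
FS_HZ = 240          # sampling rate of the dataset
WINDOW_S = 0.667     # post-stimulus window
N_CHANNELS = 64
DECIMATION = 12      # assumption: 240 Hz / 12 = 20 Hz, i.e. Nyquist at the 10 Hz cut-off

window_samples = round(WINDOW_S * FS_HZ)                   # samples extracted per channel
kept_indices = list(range(0, window_samples, DECIMATION))  # samples kept after decimation
samples_per_channel = len(kept_indices)
feature_length = samples_per_channel * N_CHANNELS          # if all channels are concatenated

# Partition of the 85 training characters into 17 groups of 5, one classifier per group.
train_chars = list(range(85))
groups = [train_chars[i:i + 5] for i in range(0, len(train_chars), 5)]
signals_per_group = 5 * 180   # 180 intensifications per character

print(window_samples, samples_per_channel, feature_length)
print(len(groups), signals_per_group)
```

Each SVM of the ensemble therefore sees only a small slice of the training data, which keeps the individual training problems small.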

IV. METHOD
In general, biomedical signals do not have good sparsity in standard dictionaries such as wavelet, DCT, DST, etc. [16]. This is why, for EEG and ECG signals, it is in most cases preferable to build signal-specific dictionaries, which take into consideration the statistics of the signal or its repetitive elements. For example, the ECG signal has a pseudo-cyclicity of the QRS complex and of the P and T waves, which can be exploited. The EEG signal is a much more complex signal that has no visible repeated elements. It is mainly composed of alpha, beta, theta, and delta waves, which are significant in clinical interpretation but are visible only in the frequency domain.

A. The dictionary
Taking into account the absence of visible, repeated elements in EEG signals and the results obtained previously in [17,18,19,20], one option for building the dictionary for EEG signals is to use the EEG signal itself. In the case of the spelling paradigm, the dictionary is built from the data of the training set.
We tested the possibility of building a universal mega-dictionary consisting of EEG segments from all 64 channels. For each channel, three atoms were selected, each an EEG segment from the corresponding channel, so that a dictionary of 3x64 = 192 atoms was obtained. The size of the dictionary is 192x240, since each atom has a length of 240 samples. The training signals of the spelling paradigm were used to construct this dictionary.
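The dictionary layout described above can be sketched as follows. The text does not say how the three segments per channel are chosen, so the sketch simply takes the first three consecutive, non-overlapping segments; that choice, the function name `build_dictionary` and the synthetic training data are all assumptions made for illustration.

```python
import random

random.seed(1)

N_CHANNELS = 64
ATOMS_PER_CHANNEL = 3
ATOM_LEN = 240  # samples per atom (one second at 240 Hz)


def build_dictionary(training, atoms_per_channel=ATOMS_PER_CHANNEL, atom_len=ATOM_LEN):
    """Stack EEG segments as atoms, one atom per row.

    training: list with one sample list per channel.
    Assumption: atoms are consecutive, non-overlapping segments taken from the
    start of each channel; the selection rule is not given in the text.
    """
    dictionary = []
    for channel_signal in training:
        if len(channel_signal) < atoms_per_channel * atom_len:
            raise ValueError("training signal too short for the requested atoms")
        for k in range(atoms_per_channel):
            start = k * atom_len
            dictionary.append(list(channel_signal[start:start + atom_len]))
    return dictionary


# Synthetic stand-in for the training recordings (1000 samples per channel).
training = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(N_CHANNELS)]
dictionary = build_dictionary(training)

print(len(dictionary), len(dictionary[0]))  # 192 240
```

Row `3 * c + k` of the result holds the k-th atom of channel `c`, so atoms from every channel are available when reconstructing a segment from any single channel.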
The method was tested on EEG test signals acquired by compressed sensing. It was also tested for the inter-subject variability of the dictionary, namely, a dictionary built with signals from the training set of one subject was tested with signals from the test set of the other subject. The spelling database has only two subjects, which leads to the following possible combinations for validating the proposed method: TrainA-TestA, TrainB-TestA, TrainA-TestB and TrainB-TestB, where TrainX-TestY denotes a dictionary built from the training set of subject X and used to reconstruct the test signals of subject Y.

B. The acquisition matrix
For the acquisition matrix, namely the projection matrix, three types of matrices can be used: a random matrix, a Bernoulli-type matrix (with values of -1, 0 and 1 in equal ratios) or an optimized matrix that takes into account the dictionary used for reconstruction. Based on the previous results from [17,19,20], we have used the optimized matrix in this work. In short, for a given dictionary, multiplying the projection matrix by the transposed dictionary yields a projection matrix optimized for that dictionary. This optimization procedure is detailed in [19].
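The exact procedure is given in [19]; the sketch below only illustrates one shape-consistent reading of the sentence above and should not be taken as the authors' implementation. It assumes that the dictionary is stored with one atom per row (192 x 240, so that this array is the transpose of a dictionary with atoms as columns), that the starting projection matrix `R` is a Gaussian random matrix of size n x 192, and that n = 24 measurements are taken per 240-sample segment. The dictionary here is random placeholder data.

```python
import random

random.seed(2)

K, N, n = 192, 240, 24  # atoms, atom length, measurements (n = 24 is an assumption)


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    B_cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


# Placeholder dictionary, one atom per row (K x N); real atoms would be EEG segments.
D = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]

# Random starting projection matrix (n x K); hypothetical choice.
R = [[random.gauss(0.0, 1.0) for _ in range(K)] for _ in range(n)]

# Dictionary-adapted projection matrix (n x N): each row is a random
# combination of atoms, so the measurements correlate with the dictionary.
P = matmul(R, D)

# Acquisition of one 240-sample segment x: y = P * x gives n measurements.
x = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [sum(p * s for p, s in zip(row, x)) for row in P]

print(len(P), len(P[0]), len(y))  # 24 240 24
```

Under this reading, every measurement is the inner product of the segment with a random mixture of dictionary atoms, which is what ties the acquisition stage to the dictionary used for reconstruction.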

V. EXPERIMENTAL RESULTS AND DISCUSSIONS
For the evaluation of the proposed method, we used Dataset II of the BCI Competition III 2005 (P300 spelling); the dataset was recorded from two different subjects, with a training set of 85 characters and a test set of 100 characters for each subject.

For compression evaluation, we used the compression rate (CR), defined as the ratio between the number of bits needed to represent the original signal and the number needed to represent the compressed signal. To validate the compression, we also evaluated the distortion between the original signal $x(n)$ and the reconstructed signal $\tilde{x}(n)$ by means of the PRDN, the normalized percentage root-mean-square difference:

$$\mathrm{PRDN} = 100 \sqrt{\frac{\sum_{n=1}^{N} \left( x(n) - \tilde{x}(n) \right)^2}{\sum_{n=1}^{N} \left( x(n) - \bar{x} \right)^2}} \qquad (6)$$

For qualitative evaluation of the method, based on the classification rate in the spelling paradigm, we used the scripts of the winners, A. Rakotomamonjy and V. Guigue [9] (the scripts implement classification based on all 64 EEG channels).
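The two quantitative measures can be sketched in a few lines of Python. The PRDN is taken here in its usual form, the root-mean-square error normalized by the mean-removed energy of the original signal and expressed as a percentage. The function names, the toy signals and the 16-bit sample width in the CR example are illustrative assumptions.

```python
import math


def prdn(original, reconstructed):
    """Normalized percentage root-mean-square difference between two signals."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    mean = sum(original) / len(original)
    error_energy = sum((x - xr) ** 2 for x, xr in zip(original, reconstructed))
    signal_energy = sum((x - mean) ** 2 for x in original)
    if signal_energy == 0.0:
        raise ValueError("original signal is constant; PRDN is undefined")
    return 100.0 * math.sqrt(error_energy / signal_energy)


def compression_rate(bits_original, bits_compressed):
    """Ratio of the bits needed for the original and the compressed signal."""
    return bits_original / bits_compressed


original = [1.0, 2.0, 3.0, 4.0, 5.0]
close = [1.0, 2.0, 3.0, 4.0, 6.0]   # one sample off by 1
flat = [3.0] * 5                    # reconstruction equal to the mean

prdn_close = prdn(original, close)  # 100 * sqrt(1 / 10), about 31.6
prdn_flat = prdn(original, flat)    # 100.0: no better than predicting the mean

# 240 samples compressed to 24 measurements at an assumed 16 bits each.
cr = compression_rate(240 * 16, 24 * 16)  # 10.0
```

A PRDN of 100 corresponds to a reconstruction that carries no more information than the signal mean, which gives a scale for reading the PRDN values reported for the EEG channels.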
Table 1 presents the classification results in the spelling paradigm obtained with the original data and the software from [9]. It can be observed that, after reconstruction using a dictionary built with signals from the training stage of subject B, the classification rates obtained in the spelling paradigm are better than those for the original signal. This holds for both subject B (92.4% versus 89.37%) and subject A (89.15% versus 87.10%). When a dictionary built with training signals from subject A is used, the results are very close to, but slightly below, the performance for the original signals.

Figure 2 presents the PRDN errors per EEG channel for the two subjects using the two dictionaries. The errors are consistent across channels: some channels are reconstructed with larger errors independently of the dictionary used. A possible explanation is that those channels have different statistics and a higher variability than the other channels.

Figure 3 shows an original EEG segment (red), its reconstruction based on a dictionary built from the subject's own training set (blue), and the reconstruction based on a dictionary built from the training set of the other subject (black). It can be observed that the shape of the EEG signal is preserved, but there are some variations.
Taking into account the classification results from the spelling paradigm, we may state that these variations of the reconstructed signal do not influence the classification in this paradigm.

This paper presents a comparative analysis of the results obtained using a mega-dictionary, built from segments of the training EEG signals, for compressed sensing of EEG signals in the spelling paradigm.
For the evaluation of the proposed method, the dataset from the BCI Competition III 2005 (P300 spelling) was used. To evaluate the EEG signal reconstruction, the PRDN was used in parallel with the classification rate in the spelling paradigm, assessed using the scripts of the competition winners (the version of the classification that uses all 64 channels).
The main result is the verification of the hypothesis that the mega-dictionary is not patient-specific. Testing this hypothesis involved constructing a dictionary from the training set of one subject and using it to reconstruct the test signals of the other subject. Even though the database had only two subjects, the recorded EEG signals were long enough. Thus, both the use of the dictionary for the same subject and its use for the other subject were tested. Even though the quantitative measure of the reconstruction error of the EEG signals, expressed by the PRDN, was around 45, the classification rates in the spelling paradigm were found to be very close to the values obtained for the original signal, or even above them.
These results suggest that, for the classification rate in the spelling paradigm, it is very important to preserve the shape of the EEG signal, while small reconstruction errors do not matter significantly.
The advantage of compressed sensing is that the acquisition stage is very fast, has very low complexity, is done in real time and yields a compressed EEG signal. The difficult part is the EEG reconstruction, because of two aspects: (i) the computational complexity, although there are currently different mathematical algorithms from which a suitable one can be chosen depending on the needed accuracy, time and available resources; and (ii) the knowledge of a dictionary in which the initial EEG signal has a satisfactory sparsity.
The obtained results, in particular the classification rate in the spelling paradigm, demonstrate that the built dictionary ensures the reconstruction of the EEG signal with good results, regardless of the training EEG signal used for the dictionary construction.

Fig. 1. P300 wave and the classical P300 spelling paradigm described by Farwell and Donchin (1988).

In (6), $x(n)$ and $\tilde{x}(n)$ are the samples of the original and the reconstructed signals, respectively, $\bar{x}$ is the mean value of the original signal, and $N$ is the length of the window over which the PRDN is calculated.

Fig. 2. Mean PRDN vs. channel for subject A and subject B, using dictionaries constructed from the training sets of subject A and subject B.

Fig. 3. Example of an original signal (subject A, red) and reconstructed signals (TrainA-TestA, blue; TrainB-TestA, black).

Fig. 4. The topography of PRDN for EEG compressed sensing for subject A and CR = 10:1 (TrainA-TestA, top; TrainB-TestA, bottom).

Figures 4 and 5 show the PRDN topography for subjects A and B, respectively. This topography shows that the frontal/central electrode sites, an area specific to the P3a wave, present a smaller PRDN than the other electrodes. The next area in terms of PRDN error is the parietal area, specific to the P3b wave, and the largest errors are in the temporal zone. The temporal area has little significance for P300 generation.