Adaptive Error Detection Method for P300-Based Spelling Using Riemannian Geometry

Brain-computer interface (BCI) systems have become a valuable research area, and machine learning (ML) and AI techniques have brought significant change to traditional medical diagnostic systems. The electroencephalogram (EEG) measures the electrical activity of the brain, which results from ionic currents in neurons. A BCI system uses these EEG signals to assist humans in different ways. The P300 signal is one of the most important and most widely studied EEG phenomena in the BCI domain. For instance, the P300 signal can be used in a BCI to translate the subject's intention, expressed only through brain waves, into actual commands, which can then be used to control electromechanical devices and artificial body parts. A major challenge is the low signal-to-noise ratio (SNR) of the P300: concurrent heterogeneous brain activity and artifacts make it difficult for doctors to understand the subject's intentions. To address this challenge, this research proposes a system called Adaptive Error Detection Method for P300-Based Spelling Using Riemannian Geometry. The system comprises three main steps. In the first step, the raw signal is cleaned by preprocessing. In the second step, the most relevant features are extracted using xDAWN spatial filtering together with covariance matrices, which handle the high-dimensional data. In the final step, an elastic net classifier is applied after mapping the features from the Riemannian manifold to Euclidean space by tangent space mapping. The results obtained by the proposed method are comparable in accuracy to state-of-the-art methods while reducing computation time about sixfold, and the method performs better under inter-session and inter-subject variability.
Keywords—Brain Computer Interface; EEG; P300; Riemannian geometry; xDAWN; Covariances; Tangent Space; Elastic net


I. INTRODUCTION
In recent years, machine learning (ML) techniques have been widely used to solve medical classification problems involving liver, heart, neurological and other disorders. In particular, interpreting human intention from P300 signals in the electroencephalogram (EEG) has become a significant problem, since a brain-computer interface (BCI) establishes a point of communication between doctors and patients who are completely paralyzed. For example, in locked-in syndrome the patient is awake and fully aware but cannot make any expression to communicate with doctors. Patients with spinal cord injury, or with amyotrophic lateral sclerosis (ALS), a neurodegenerative disease in which loss of voluntary muscle control produces paralysis, likewise become unable to communicate. In these conditions a BCI can help in controlling external devices or serve as a reliable tool for communication. The electrical activity of neurons in the brain can be recorded using EEG. These EEG data are of great use: neurologists can use them to diagnose neurological disorders, and BCI experts can use them as a means of communication. BCI has become an assistive tool that can control wheelchairs, video games, entertainment systems, spellers and other assistive technologies [1].
In the circumstances described above, a BCI can be used to establish a reliable communication channel directly from the patient's brain signals to the computer [2]. However, the low signal-to-noise ratio (SNR) of the P300 is one of the major challenges, because concurrent heterogeneous brain activity and artifacts make it difficult for doctors to understand the patient's intentions. To address this challenge, this paper proposes a system called Adaptive Error Detection Method for P300-Based Spelling Using Riemannian Geometry. The system comprises three main steps. In the first step, the raw signal is cleaned by preprocessing. In the second step, the most relevant features are extracted using xDAWN spatial filtering together with covariance matrices, which handle the high-dimensional data. In the final step, an elastic net classifier is applied after mapping the features from the Riemannian manifold to Euclidean space by tangent space mapping.
The P300 speller is one of the most widely explored non-invasive BCI scenarios. The P300 response arises in an oddball scenario, in which low-probability desired (target) events are mixed with high-probability undesired (non-target) events. The P300 is an event-related potential (ERP): a response recorded during decision making as the subject reacts to different kinds of stimuli. The P300 speller [3] presents a 6×6 matrix of alphanumeric characters, as shown in Figure 1. Different kinds of stimuli, such as auditory or visual, can be explored; however, the P300 speller uses visual stimuli only [4]. In the P300 speller scenario, a cap containing electrodes is placed on the subject's scalp. These electrodes record the brain waves produced in response to the visual stimulus of a letter displayed on the screen. The raw EEG signal is inherently very noisy and consequently has a very low SNR, which is due to the many ongoing electrical activities of neurons in the brain. It is therefore very hard to separate the true P300 response from the recorded EEG signal. In this scenario, the goal of the classifier is to predict, based on training data, whether or not the P300 speller has recognized the target letter correctly according to the subject's intention.
In this paper, we propose an ML pipeline for classifying EEG data from the P300 speller, with a robust and reliable feature extraction method and classification algorithm, that minimizes the number of commands misclassified against the user's actual intention and provides a reliable means of communication. For this objective, a BCI system must possess two key properties: 1) a classification technique for the P300 speller that generalizes better across subjects and across sessions while retaining optimal accuracy; 2) fast convergence to optimal parameters, requiring minimal calibration and training data.
This paper is organized in five sections. Section one introduces the paper, section two describes the related work, section three defines the materials and methods, section four presents the results, and section five gives the conclusion and discussion.

A. Related work
Classification of P300 speller data is an active research area of machine learning (ML), and many approaches have been proposed to solve it. Our approach treats P300 classification as a predictive modelling problem in the machine learning domain. Some related works are presented below.
A comparison of classification algorithms for P300 data is presented in [6], covering stepwise linear discriminant analysis (SWLDA), support vector machines (SVMs) and linear discriminant analysis (LDA). The results suggest that SWLDA performs better than the other classifiers. SWLDA works like simple LDA, except that its discriminant function starts with no features, and features are added one at a time according to their statistical significance (p-value). Only 16 electrodes were used in that study, whereas this paper proposes a scalable system that can incorporate as many as 56 electrodes.
A system was proposed in [7] to classify ERP data from a P300 experiment with optimal accuracy. It uses spatial as well as spatio-temporal features with shrinkage LDA as the classifier, and introduces an analytical method for estimating the LDA regularization parameter. A comparison with SWLDA [6] and simple LDA suggests that this model performs better than the other two. The authors used 55 channels but only 7 time sample points after the occurrence of an event. An epoch window of 7 time sample points is very small, so important temporal information for the whole trial is likely to be missed. This paper proposes a scalable system that can incorporate as many as 260 time points, maintaining full spatial as well as temporal information.
A comparison of feature selection and classification methods on five datasets from the BCI paradigm is presented in [8]. For model building, spectral, spatial and spatio-temporal filtering were applied for feature selection, and four methods were used for classification: LDA, SWLDA, regularized LDA (rLDA) and regularized logistic regression (rLR). The results suggest that regularized classifiers perform better. However, this approach requires a lot of manual parameter tuning, and its results do not generalize across sessions and across users.
A system based on Riemannian geometry for BCI classification was first introduced by Congedo et al. [9]. They used data from all three BCI modalities, motor imagery (MI), steady-state visually evoked potential (SSVEP) and ERP, recorded from 22 electrodes. They proposed the minimum distance to mean (MDM) classifier, which works on the principle of Riemannian geometry. The classifier calculates the mean covariance matrix of each class in the training data as the class representative, and labels a test trial according to the distance of its covariance matrix from these class means. For comparison they used common spatial patterns [10] followed by LDA [11]. The results suggest that the MDM classifier works better, requires only a small amount of training data, and also generalizes better across subjects and across sessions. The main limitation of this approach is that robust and efficient classification algorithms such as neural networks and SVMs work in Euclidean space, and so cannot be applied directly to covariance matrices, which belong to a Riemannian manifold.
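The MDM rule described above can be illustrated with a minimal NumPy sketch. It is not the implementation of [9]: the function names are hypothetical, and the class mean is computed with the closed-form log-Euclidean mean as a stand-in for the iterative Riemannian mean, which is an assumption made to keep the example short.

```python
import numpy as np

def _eig_fun(C, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(C)
    return (V * fun(w)) @ V.T

def riemann_distance(A, B):
    """Affine-invariant Riemannian distance between two SPD matrices."""
    A_isqrt = _eig_fun(A, lambda w: 1.0 / np.sqrt(w))
    w = np.linalg.eigvalsh(A_isqrt @ B @ A_isqrt)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def log_euclid_mean(covs):
    """Log-Euclidean mean (assumed stand-in for the Riemannian mean)."""
    mean_log = np.mean([_eig_fun(C, np.log) for C in covs], axis=0)
    return _eig_fun(mean_log, np.exp)

def mdm_fit(covs, labels):
    """Return one mean covariance matrix per class."""
    labels = np.asarray(labels)
    return {c: log_euclid_mean([C for C, l in zip(covs, labels) if l == c])
            for c in np.unique(labels)}

def mdm_predict(class_means, C):
    """Assign the label of the closest class mean."""
    return min(class_means, key=lambda c: riemann_distance(class_means[c], C))

# Toy data: class 0 is near the identity, class 1 has extra power on channel 0.
covs = ([np.eye(3) * s for s in (0.9, 1.0, 1.1)]
        + [np.diag([4.0, 1.0, 1.0]) * s for s in (0.9, 1.0, 1.1)])
labels = [0, 0, 0, 1, 1, 1]
means = mdm_fit(covs, labels)
pred = mdm_predict(means, np.diag([3.5, 1.0, 1.0]))
```

The distance used here is invariant to invertible linear transformations of the channels, which is one reason such classifiers are reported to transfer well across sessions and subjects.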
A feature selection technique based on spatial filtering, known as xDAWN, was introduced in [12]. The main motivation was to discriminate and enhance the P300 evoked response. The technique assumes that the P300 response is evoked only by the infrequent target stimuli, is synchronous with them, and spans only a small spatial subspace of the recorded signal. The algorithm is based on QR factorization of matrices [13]. For comparison, data from 3 subjects recorded with 29 electrodes were used. xDAWN spatial filtering was compared with other feature extraction techniques, ICA and PCA, with classification by Bayesian linear discriminant analysis (BLDA). The results suggest that xDAWN brings a significant improvement in accuracy over ICA and PCA. However, as the number of xDAWN components is increased, performance decreases slightly and computational complexity increases; this is the main limitation of the algorithm.
As discussed earlier, well-known classification techniques work in Euclidean space and hence cannot be applied directly on Riemannian manifolds. To overcome this limitation, [14], [15] introduced tangent space (TS) mapping, which bridges Euclidean space and Riemannian manifolds by projecting covariance matrices belonging to a Riemannian manifold onto Euclidean vectors. With this mapping, classical and efficient classifiers such as LDA and SVM can be used on covariance matrices instead of MDM [9]. The results suggest that a significant improvement can be achieved without spatial filtering of the electrodes, as they are exploited in their native space. Although both papers [14], [15] considered only motor imagery (MI), the technique can be extended to classification of the P300 signal.
All of the approaches above perform well, but some techniques work well on one data set while performing poorly on another. In [5], [9], [16], systems were proposed that use Riemannian covariance features along with additional meta features, for instance word length and feedback details, to enhance prediction performance. These techniques have brought significant improvement in classification, but have also increased the computational complexity of the classifier. Hence, P300 classification needs a novel approach that minimizes computational complexity while selecting features that give the best trade-off between time and accuracy.

II. MATERIAL AND METHODS
The proposed approach is a machine learning pipeline that deals with the classification problem of the P300 speller: deciding whether an event is erroneous or not. The methodology, shown graphically in Figure 2, is divided into four interconnected steps, where each step takes several inputs and passes its output on to the next. In the first step, raw data are cleaned using simple preprocessing techniques. In the second step, BCI-related features are extracted from the cleaned data using xDAWN spatial filters along with covariance matrices and tangent space mapping. In the third step, a classification model is constructed using elastic net, and in the fourth step the classification performance is measured. These steps are discussed in detail below.

A. Preprocessing
In the preprocessing step, the electrooculogram (EOG) channel, which carries information about the noise caused by blinking of the subject's eyes, is removed. A 5th-order Butterworth filter is employed to band-pass the remaining EEG channels between 1 and 40 Hz. The Butterworth filter is a maximally flat magnitude filter, that is, its frequency response is as flat as possible in the passband. For each trial, the epoch window is set to 1.3 seconds after the occurrence of the feedback event. After preprocessing of the raw signals, features are extracted using the ML pipeline defined below.
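The filtering and epoching step can be sketched with SciPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the sampling rate of 200 Hz is inferred from 260 samples per 1.3-second epoch, zero-phase filtering with `sosfiltfilt` is our choice (it applies the filter forward and backward, which doubles the effective order), and the function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 200.0  # Hz; assumed from 260 samples per 1.3 s epoch

def bandpass(eeg, fs=FS, low=1.0, high=40.0, order=5):
    """Band-pass each channel (rows) with a Butterworth filter."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering is an assumption; the paper does not specify it.
    return sosfiltfilt(sos, eeg, axis=-1)

def epoch(eeg, onsets, fs=FS, duration=1.3):
    """Cut fixed-length windows starting at each feedback onset (in samples)."""
    n = int(round(duration * fs))
    return np.stack([eeg[:, s:s + n] for s in onsets])

# Synthetic check: a 10 Hz tone should pass, a 60 Hz tone should be attenuated.
t = np.arange(4000) / FS
sig10 = np.sin(2 * np.pi * 10 * t)
sig60 = np.sin(2 * np.pi * 60 * t)
filtered = bandpass(np.vstack([sig10, sig60]))
epochs = epoch(filtered, onsets=[0, 500, 1000])
```

In a real recording the EOG channel would be dropped from `eeg` before calling `bandpass`, and `onsets` would come from the feedback event markers.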

B. Feature extraction
1) xDAWN:
As discussed earlier, brain waves recorded through EEG are noisy: the recorded signal contains the true P300 signal as well as other ongoing background activity of the brain and artifacts such as muscular activity and eye blinks. The acquired EEG data are therefore very noisy, and this low SNR makes the classification task challenging. Different techniques have been investigated in the literature to improve the EEG signal, spatial filtering being one of them. For instance, independent component analysis (ICA) was used in [17] to enhance the SNR. A limitation of such techniques is that they are not designed specifically for BCI. To overcome this limitation, an unsupervised spatial filtering technique known as xDAWN was introduced in [12]. The technique works on the assumption that the P300-related signal is infrequent and occupies a small synchronous subspace of the recorded EEG signal. The synchronous responses are estimated for each channel, and spatial filters are then calculated from these responses so that the true P300 signal stands out from other artifacts and noise.
Let $X \in \mathbb{R}^{N_t \times N_s}$ denote the recorded raw EEG signal, whose $(i, j)$th entry $x_j(i)$ is the sample from the $j$th channel at time index $i$. Here $N_s$ is the number of channels and $N_t$ is the number of time samples. Let $a_j(t)$ denote the event-related potential (ERP) signal at the $j$th electrode at time index $t$, and let $A \in \mathbb{R}^{N_e \times N_s}$ denote the ERP signal matrix, whose $(i, j)$th entry is $a_j(i)$. Here $N_e$ is the number of time samples of the ERP, which is 260 for the single-trial duration of 1.3 seconds. The recorded EEG signal can be formulated as the combination of the ERP $A$ and noise $N$:
$$X = DA + N \qquad (1)$$
where $D \in \mathbb{R}^{N_t \times N_e}$ is a Toeplitz matrix whose first column marks the stimulus onsets. The solution of (1) is given by QR factorization [13].
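To make the linear model $X = DA + N$ concrete, the sketch below builds the Toeplitz matrix $D$ from stimulus onsets, estimates $A$ by least squares, and derives spatial filters. It is a simplified illustration, not the algorithm of [12]: the filters are obtained from a generalized eigenvalue problem solved by whitening, rather than by the QR route cited in the text, and all function names and the synthetic data are assumptions.

```python
import numpy as np

def toeplitz_design(n_times, onsets, n_erp):
    """Build D (n_times x n_erp): column k is the onset indicator delayed by k samples."""
    D = np.zeros((n_times, n_erp))
    for t0 in onsets:
        k = np.arange(min(n_erp, n_times - t0))
        D[t0 + k, k] += 1.0
    return D

def xdawn_filters(X, D, n_filters=2):
    """Estimate the ERP A by least squares and return filters maximizing evoked power."""
    A_hat, *_ = np.linalg.lstsq(D, X, rcond=None)
    S_evoked = A_hat.T @ D.T @ D @ A_hat      # covariance of the evoked part D A
    S_total = X.T @ X                         # covariance of the recording
    w, V = np.linalg.eigh(S_total)
    W = V / np.sqrt(w)                        # whitening: W.T @ S_total @ W = I
    vals, U = np.linalg.eigh(W.T @ S_evoked @ W)
    order = np.argsort(vals)[::-1]
    return A_hat, W @ U[:, order[:n_filters]]

# Synthetic recording: an ERP of 20 samples appears on channel 0 after each onset.
rng = np.random.default_rng(0)
n_times, n_ch, n_erp = 2000, 4, 20
onsets = np.arange(50, n_times, 100)
erp = 5.0 * np.sin(np.linspace(0.0, np.pi, n_erp))
D = toeplitz_design(n_times, onsets, n_erp)
A_true = np.zeros((n_erp, n_ch))
A_true[:, 0] = erp
X = D @ A_true + rng.standard_normal((n_times, n_ch))
A_hat, filters = xdawn_filters(X, D)
```

Projecting the recording with `X @ filters` then gives a few virtual channels in which the evoked response dominates.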
2) Covariances: As discussed above, the signal-to-noise ratio (SNR) can be enhanced by spatial filtering for better design of BCI systems, but spatial filtering requires a substantial amount of training data. In a BCI scenario this means more repeated trials from the subject, which makes spatial filtering alone an infeasible choice for online use. For BCI systems, covariance matrices were introduced as features by [15], and [16] improved them further by including temporal information. This has been possible due to recent theoretical advances in the field of information geometry [18]. Information geometry is a field of mathematics that treats probability distributions as points of a Riemannian manifold (a manifold is locally homeomorphic to Euclidean space near each point). It is now widely applied in various fields, for example processing of radar signals [19], diffusion tensors [20] and digital image processing [21]. Processing the covariance matrices in their native manifold has added benefit. A prototype ERP response matrix is calculated for each class by averaging all epochs belonging to that class. Let $P_1 \in \mathbb{R}^{N_s \times N_e}$ be the prototype response for class 1:
$$P_1 = \frac{1}{|I|} \sum_{i \in I} x_i$$
where $I$ is the set of indices of the trials belonging to class 1. For each trial $x_i$, the modified trial $\tilde{x}_i$ is given by combining it with the prototype:
$$\tilde{x}_i = \begin{bmatrix} P_1 \\ x_i \end{bmatrix}$$
The final covariance matrices are computed using the sample covariance matrix (SCM) estimator:
$$C_i = \frac{1}{N_e - 1}\, \tilde{x}_i \tilde{x}_i^{T}$$
The resulting covariance matrix of each epoch is combined with the xDAWN spatial filters and passed to the next step. Only the minimum distance to mean (MDM) classifier can be applied directly to these covariance matrices [9], [16], because they belong to the Riemannian manifold of symmetric positive definite (SPD) matrices [22].
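The covariance construction can be sketched in NumPy as below. This is an illustration under assumptions: the prototype is the plain average of the class-1 epochs, each trial is stacked under that prototype before the sample covariance is taken (the construction we understand [16] to use), and the function names and random data are hypothetical.

```python
import numpy as np

def prototype(epochs, labels, target=1):
    """Average all epochs of the target class (channels x time)."""
    return epochs[np.asarray(labels) == target].mean(axis=0)

def erp_covariances(epochs, P1):
    """Sample covariance of each trial stacked under the class prototype."""
    n_trials, n_ch, n_t = epochs.shape
    n_aug = P1.shape[0] + n_ch
    covs = np.empty((n_trials, n_aug, n_aug))
    for i, x in enumerate(epochs):
        x_aug = np.vstack([P1, x])            # modified trial [P1; x_i]
        covs[i] = x_aug @ x_aug.T / (n_t - 1)
    return covs

# Random stand-in data: 10 trials, 3 channels, 50 time samples.
rng = np.random.default_rng(1)
epochs = rng.standard_normal((10, 3, 50))
labels = np.array([0, 1] * 5)
P1 = prototype(epochs, labels)
covs = erp_covariances(epochs, P1)
```

The off-diagonal blocks of each matrix hold the cross-covariance between the trial and the prototype, which is where the temporal (ERP-shape) information enters; the top-left block is the same for every trial.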
3) Tangent Space: As discussed earlier [9], [16], classical classification algorithms cannot be applied to covariance matrices without modification, because the matrices lie on a Riemannian manifold. Although [15] used a support vector machine (SVM) with a modified kernel, this is not an obvious choice. Tangent space mapping makes it easy to use robust and efficient classifiers such as SVM and elastic net by mapping the covariance matrices from the Riemannian manifold into Euclidean space. The process of tangent space mapping is shown geometrically in Figure 3. The logarithmic map is used to calculate the tangent vectors $S_i$ of the tangent space $T_C\mathcal{M}$ at a point $C$ of the manifold $\mathcal{M}$ (here $C$ is itself a covariance matrix). The tangent space $T_C\mathcal{M}$ at that point is a Euclidean space that is locally homeomorphic to the manifold, and the tangent vector of a covariance matrix $C_i$ can be calculated using:
$$S_i = \mathrm{Log}_C(C_i) = C^{1/2}\, \mathrm{logm}\!\left(C^{-1/2} C_i\, C^{-1/2}\right) C^{1/2}$$
Fig. 3. Tangent space $T_C\mathcal{M}$ of the manifold $\mathcal{M}$ at point $C$. The tangent vectors $S_i$ are estimated using the logarithmic map $\mathrm{Log}_C(\cdot)$. The exponential map $\mathrm{Exp}_C(\cdot)$ is used for the inverse mapping [15].

C. Model building
Different classifier models were tried, and elastic net performed better than the others. The elastic net learning algorithm is widely used for prediction in classification problems. It is a regularized linear regression algorithm that addresses the shortcomings of both lasso and ridge regression by combining the $\ell_1$ and $\ell_2$ penalties [23], and it handles numerical data well. After feature extraction and tangent space mapping, elastic net is trained on the training set of $n = 5440$ trials with $p = 2211$ predictors. The class labels of the training data are represented by $Y = (y_1, y_2, y_3, \ldots, y_n)^T$, and $X = [x_1 | \cdots | x_p]$ is the model matrix, where $x_j = (x_{1j}, \ldots, x_{nj})^T$. Let $\lambda_1$ and $\lambda_2$ be fixed non-negative constants; the elastic net estimate is then defined by the minimization:
$$\hat{\beta} = \arg\min_{\beta} \left\{ \lVert Y - X\beta \rVert^2 + \lambda_2 \lVert \beta \rVert^2 + \lambda_1 \lVert \beta \rVert_1 \right\}$$
The hyperparameters of the elastic net were tuned by cross-validation. Prediction is done using the conventional linear regression equation:
$$\hat{y} = X\hat{\beta}$$
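The elastic net objective, $\lVert Y - X\beta \rVert^2 + \lambda_2 \lVert \beta \rVert^2 + \lambda_1 \lVert \beta \rVert_1$, can be minimized with a few lines of proximal gradient descent, sketched below in NumPy. This is only an illustration of the objective: the solver, the function names and the toy data are assumptions, no intercept is fitted, and the solver actually used in the experiments is not specified in the text.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net(X, y, lam1, lam2, n_iter=2000):
    """Minimize ||y - X b||^2 + lam2 ||b||^2 + lam1 ||b||_1 by proximal gradient."""
    # Lipschitz constant of the gradient of the smooth (squared) part.
    L = 2.0 * (np.linalg.norm(X, 2) ** 2 + lam2)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - y) + 2.0 * lam2 * beta
        beta = soft_threshold(beta - grad / L, lam1 / L)
    return beta

# Toy regression problem with two informative predictors out of five.
rng = np.random.default_rng(3)
X = rng.standard_normal((50, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(50)

beta_ridge_like = elastic_net(X, y, lam1=0.0, lam2=1.0)
beta_all_zero = elastic_net(X, y, lam1=2.0 * np.max(np.abs(X.T @ y)) + 1.0, lam2=1.0)
scores = X @ beta_ridge_like          # linear prediction, used as a ranking score
```

With $\lambda_1 = 0$ the solution reduces to ridge regression, and a sufficiently large $\lambda_1$ drives every coefficient to exactly zero, which are the two limiting behaviours the combined penalty interpolates between.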

D. Model evaluation
For performance evaluation, the area under the receiver operating characteristic (ROC) curve (AUC) is used, as it is already widely used for binary classification tasks [24]. The AUC is calculated using:
$$\mathrm{AUC} = \frac{1}{N^{+} N^{-}} \sum_{i=1}^{N^{+}} \sum_{j=1}^{N^{-}} \mathbf{1}(x_i > y_j)$$
where $N^{+}$ denotes the number of instances belonging to the positive class, $N^{-}$ denotes the number of instances belonging to the negative class, and $\mathbf{1}(\cdot)$ is the indicator function. Here $x_1, \ldots, x_{N^{+}}$ are the probability scores predicted by the model for the $N^{+}$ positive instances and $y_1, \ldots, y_{N^{-}}$ are the probability scores predicted for the $N^{-}$ negative instances. The values of $x_i$ and $y_j$ are normalized to the interval (0, 1).
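The AUC can be read as the fraction of (positive, negative) pairs that the model ranks correctly. A minimal NumPy sketch of this pairwise form is given below; the function name is hypothetical, and ties are counted as incorrect here, whereas some implementations count them as one half.

```python
import numpy as np

def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs with the positive scored higher."""
    x = np.asarray(pos_scores, dtype=float)[:, None]
    y = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean(x > y))

auc_perfect = pairwise_auc([0.9, 0.8], [0.1, 0.2])
auc_mixed = pairwise_auc([0.8, 0.3], [0.5, 0.1])
```

In the second call three of the four pairs are ordered correctly (only 0.3 versus 0.5 is not), so the value is 0.75.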
The data set used for model building and evaluation comes from the experiment described in [25]. In this experiment, brain activity was recorded with a total of 56 passive EEG electrode channels. Data were collected from 26 subjects (16 training subjects and 10 test subjects) during five spelling sessions. All subjects participated in all five sessions: 12 five-letter words were spelt in each of the first four sessions and 20 five-letter words in the last session. For each letter, the trial lasted 1.3 seconds. Over the five sessions this gives a total of 5440 training trials and 3400 test trials.
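The trial counts follow directly from the session layout, as the short calculation below shows. The variable names are ours, and the sampling rate is inferred from the 260 time points per 1.3-second epoch mentioned earlier rather than stated in this section.

```python
# Letters (trials) spelt by one subject over the five sessions.
words_per_session = [12, 12, 12, 12, 20]
letters_per_word = 5
trials_per_subject = sum(words_per_session) * letters_per_word

train_trials = 16 * trials_per_subject   # 16 training subjects
test_trials = 10 * trials_per_subject    # 10 test subjects

# Inferred sampling rate: 260 samples in a 1.3 s epoch.
sampling_rate_hz = 260 / 1.3
```

Each subject contributes 340 trials, which reproduces the 5440 training and 3400 test trials reported above.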

III. RESULTS
Figure 4 presents the result of the fully automated process after applying the proposed ML pipeline: a ROC curve with an AUC (area under the curve) of 0.79. Performance for different users across different sessions is shown in Table I in terms of AUC. Figure 5 compares the AUC obtained by the proposed method with the state-of-the-art, award-winning method by Alexandre [5]; the two methods obtain almost equivalent AUC. This is also supported by Figure 7, where samples of size 200, 500, 800 and 1200 were drawn from the test data and the AUC of both models was compared. A second comparison considers time as the variable. Figure 6 compares the time taken by the proposed approach with that of Alexandre's approach [5] and shows a large difference: on average the proposed approach takes 37 seconds while Alexandre's approach takes 224 seconds, roughly a sixfold reduction. This result is also supported by Figure 8, which compares the time taken by both approaches for samples of size 200, 500, 800 and 1200 from the test data.

IV. CONCLUSION
Due to the low signal-to-noise ratio and high-dimensional nature of EEG data, classification of the P300 signal is a challenging problem. Various feature-extraction-based approaches to P300 classification have been reported in the literature; most of them use spatial filters [11] or exploit Riemannian geometry [9], but classification of P300 data also requires substantial preprocessing. This paper proposes a machine learning pipeline from preprocessing to construction of the classification model.
This study can be extended in different ways. Only linear classifiers were used in this paper; more complex classifiers such as neural networks could be used to improve accuracy. Furthermore, we used all electrodes for classification, whereas one would like to select only the most relevant electrodes.

TABLE I. OVERALL PERFORMANCE OF PROPOSED METHODOLOGY