A Review and Classification of Widely Used Offline Brain Datasets

Brain-Computer Interfaces (BCIs) are a natural extension of Human-Computer Interaction (HCI) technologies. BCIs are especially useful for people suffering from diseases such as Amyotrophic Lateral Sclerosis (ALS), which cause motor disabilities. To evaluate the effectiveness of BCIs across different paradigms, the need for benchmark BCI datasets is increasing rapidly. Although such datasets exist, to the best of our knowledge no comparative study of them is available. In this paper, we provide a comprehensive overview of various BCI datasets. We briefly describe the characteristics of these datasets and devise a classification scheme for them. The comparative study lists the feature extractors and classifiers used with each dataset. Moreover, potential use cases for each dataset are provided.
Keywords: BCI; dataset; brain-computer interface; amyotrophic lateral sclerosis; classification


I. INTRODUCTION
The human brain controls all internal and external body functions. It is responsible for activities such as learning, creativity, memory, and many others [1]. The functions and structure of the human brain have always fascinated researchers [2], [3]. A clear understanding of the human brain helps in disease diagnosis, and similar mechanisms can be used to develop intelligent machines [4], [5]. To reverse engineer the brain, many sensory computational models have been developed [6], [7]. These models cover various aspects of the human brain, and the data collected with them are available for further exploration. Such data have applications ranging from medical diagnostics to autonomous robot navigation [8], [9].
The sensory models used for brain recordings are broadly divided into two categories: invasive and non-invasive [10]. Invasive methods are used for the medical diagnosis of conditions such as seizures, sclerosis, tumors, epilepsy, and spinal cord trauma. These methods require surgery to place electrodes in the brain's gray matter. Invasive methods record brain activity from the cortical surface or from within the brain and include techniques such as Electrocorticography (ECoG) and Intracranial Electroencephalography (iEEG). Non-invasive methods do not require surgery or the insertion of an instrument into the patient's body. These methods include Electroencephalography (EEG), Magnetoencephalography (MEG), Functional Magnetic Resonance Imaging (fMRI), Positron Emission Tomography (PET), Infrared (IR) Imaging, and others.
EEG signals are characterized by their frequency in Hertz (cycles per second) and are divided into frequency bands of slow, moderate, and fast waves [11], as shown in Table I. To record brain activity, a complete paradigm needs to be designed. The paradigm involves presenting cues to the subject, and the electromagnetic activity evoked by these cues is recorded using a headset. The cues may include audio or visual information, which makes it possible to collect brain activity for particular classes; these recordings are later used for testing and evaluation.
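As a rough illustration of how the band division is used in practice, the sketch below sums the spectral power of a short single-channel EEG segment within each band. The band edges are conventional values and an assumption on our part; they may not match Table I exactly. The function `band_powers` is a hypothetical helper, and the direct DFT is used only to keep the example dependency-free; real pipelines would typically use an FFT-based estimator such as Welch's method.

```python
import math

# Conventional EEG band edges in Hz (assumed; adjust to match Table I).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 100.0),
}


def band_powers(signal, fs, bands=BANDS):
    """Sum the spectral power of `signal` (sampled at `fs` Hz) within each band.

    Uses a direct O(n^2) DFT, which is acceptable for short illustrative segments.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC offset
    powers = {name: 0.0 for name in bands}
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        freq = k * fs / n
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        p = (re * re + im * im) / n
        for name, (lo, hi) in bands.items():
            if lo <= freq < hi:
                powers[name] += p
    return powers


if __name__ == "__main__":
    fs = 256  # Hz, one of the sampling rates used by the datasets in Section II
    t = [i / fs for i in range(fs)]  # one second of synthetic data
    # Strong 10 Hz (alpha) component plus a weaker 20 Hz (beta) component.
    eeg = [2.0 * math.sin(2 * math.pi * 10 * ti) + 0.5 * math.sin(2 * math.pi * 20 * ti) for ti in t]
    powers = band_powers(eeg, fs)
    print(max(powers, key=powers.get))
```

For this synthetic segment the alpha band dominates, because the 10 Hz component has four times the amplitude of the 20 Hz one.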
The brain datasets were developed with various computational models using both invasive and non-invasive techniques. In this paper, we present a short description of each dataset and summarize their characteristics in tabular form. We further classify the datasets according to our proposed classification scheme.
These datasets are stored in different formats. They can be used for conducting new experiments and validating hypotheses. Using existing datasets saves researchers time and effort, and the availability of these datasets, together with the work already done on them, accelerates further development in brain-computer interfacing. The datasets are particularly helpful for new researchers in this domain who have limited resources. The paper is organized as follows. In Section 2 we review the datasets. The dataset classification scheme is presented in Section 3. In Section 4 we summarize the classification of the datasets and discuss their important features.
www.ijacsa.thesai.org

II. DESCRIPTION OF DATASETS
In this section, we describe the characteristics of each dataset. The datasets were collected using different devices, sampling rates, filters, and classes. Many of them were recorded with devices having 64 or more electrodes. We also report how many participants (healthy subjects or patients) took part in each experiment. Each dataset is briefly described below.

1) Motor Imagery, Uncued Classifier Application:
The dataset on motor imagery with an uncued classifier application [12] was recorded from seven healthy subjects. Data were collected using 64 EEG channels (0.05-200 Hz) at a sampling rate of 1000 Hz. The task was motor imagery without subject feedback. The Motor Imagery dataset from the Institute for Knowledge Discovery [13] contains data from 9 healthy subjects. It was recorded with 22 Ag/AgCl EEG electrodes and 3 EOG channels at a sampling frequency of 250 Hz, and the data were band-pass filtered between 0.5 Hz and 100 Hz.

2) Hand movement direction in MEG:
The Hand Movement Direction in MEG dataset was recorded from 2 right-handed healthy subjects [14], who performed wrist movements in four different directions. It was recorded using 10 MEG channels at a sampling rate of 625 Hz, band-pass filtered between 0.5 Hz and 100 Hz, and provided at a sampling frequency of 400 Hz.

3) Finger movements in ECoG:
The Finger Movements in ECoG dataset was recorded from epileptic patients at Harborview Hospital in Seattle [15]. The data were recorded using 48 to 64 ECoG channels at a sampling rate of 1000 Hz and band-pass filtered between 0.15 Hz and 200 Hz. The dataset contains ECoG data recorded during individual flexions of each of the five fingers, with the movements acquired using a data glove.

4) Error-related potentials (ERPs) during continuous feedback:
The Error-Related Potentials (ERP) During Continuous Feedback dataset [16] was recorded from 10 healthy subjects aged 24 to 25 years. 28 electrodes were used to record EEG and 3 electrodes to record EOG. The data have a sampling rate of 512 Hz and were notch filtered at 50 Hz and band-pass filtered between 0.5 Hz and 60 Hz.
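The 50 Hz notch filter mentioned above suppresses power-line interference. A minimal sketch of such a filter, assuming a second-order IIR notch in the common audio-EQ-cookbook form with a quality factor of 30 (both choices are ours, not the dataset authors'), applied causally at a 512 Hz sampling rate; the function names are hypothetical:

```python
import math


def notch_coefficients(f0, fs, q=30.0):
    """Biquad notch at f0 Hz (audio-EQ-cookbook form), normalized so a[0] == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    cos_w0 = math.cos(w0)
    b = [1 / a0, -2 * cos_w0 / a0, 1 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def biquad_filter(b, a, x):
    """Filter samples x with one biquad section (direct form I, causal)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    fs = 512  # Hz
    b, a = notch_coefficients(50.0, fs)
    t = [i / fs for i in range(2 * fs)]  # two seconds
    mains = [math.sin(2 * math.pi * 50 * ti) for ti in t]
    alpha_wave = [math.sin(2 * math.pi * 10 * ti) for ti in t]
    # Compare RMS over the second half, after the filter's start-up transient.
    print(rms(biquad_filter(b, a, mains)[fs:]) / rms(mains[fs:]))
    print(rms(biquad_filter(b, a, alpha_wave)[fs:]) / rms(alpha_wave[fs:]))
```

The first ratio is close to zero and the second close to one: the notch removes the 50 Hz tone while leaving 10 Hz activity essentially untouched. The recordings described above are already filtered, so this only illustrates what the step does. In SciPy, `scipy.signal.iirnotch` followed by `scipy.signal.filtfilt` gives a zero-phase version of the same idea.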

5) Two-finger game-play with deliberately failing controller:
Another dataset, Two-Finger Gameplay with Deliberately Failing Controller, comprises data from 12 subjects playing a Pac-Man game [17]. EEG and other biomedical signals were recorded with a BioSemi ActiveTwo EEG system at a sampling rate of 512 Hz. 32 Ag/AgCl active electrodes were used to record the EEG signals, 4 EOG electrodes were used to measure ocular and muscle artifacts, and 4 EMG electrodes were placed over the muscles used to press with the index finger.
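Several of the datasets above record EOG alongside EEG so that ocular artifacts can be identified and removed. One simple, commonly used approach, sketched here under our own assumptions (a single EOG channel, a constant propagation factor, and no brain activity in the EOG channel), estimates by least squares how much of the EOG signal appears in an EEG channel and subtracts it. The function name `regress_out_eog` is hypothetical, and this is not the procedure used by any particular dataset.

```python
import math


def regress_out_eog(eeg, eog):
    """Subtract the least-squares projection of one EOG channel from one EEG channel.

    Returns the corrected EEG samples and the estimated propagation factor b.
    """
    n = len(eeg)
    mean_eeg = sum(eeg) / n
    mean_eog = sum(eog) / n
    cov = sum((e - mean_eeg) * (o - mean_eog) for e, o in zip(eeg, eog))
    var = sum((o - mean_eog) ** 2 for o in eog)
    b = cov / var  # slope of EEG regressed on EOG
    corrected = [e - b * (o - mean_eog) for e, o in zip(eeg, eog)]
    return corrected, b


if __name__ == "__main__":
    fs = 512
    t = [i / fs for i in range(2 * fs)]
    brain = [math.sin(2 * math.pi * 10 * ti) for ti in t]    # 10 Hz activity
    eye = [50 * math.sin(2 * math.pi * 1 * ti) for ti in t]  # slow, large eye movements
    eeg = [s + 0.2 * o for s, o in zip(brain, eye)]          # contaminated EEG channel
    cleaned, b = regress_out_eog(eeg, eye)
    print(round(b, 3))
```

In real recordings the EOG electrodes also pick up some brain activity, so regression can remove a little neural signal as well; component-based methods such as ICA are a common alternative.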

6) Covert and overt ERP-based BCI:
The Covert and Overt ERP-based BCI dataset [18] contains recordings of P300 evoked potentials. It was recorded with BCI2000 under two paradigms, overt attention and covert attention, using the P300 Speller [19] and the GeoSpell interface [20], respectively. The EEG signals were recorded with 16 Ag/AgCl electrodes using a g.USBamp amplifier, digitized at 256 Hz, and band-pass filtered between 0.1 Hz and 20 Hz. The data were recorded from 10 healthy female subjects with a mean age of 26.8 ± 5.6 years.
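P300 recordings such as these are usually analyzed by cutting epochs time-locked to each stimulus and averaging them, so that activity not locked to the stimulus tends to cancel while the P300 (a positive deflection roughly 300 ms after the target) remains. A minimal sketch, assuming a single channel, stimulus onsets given as sample indices, and a -100 ms to 600 ms window with pre-stimulus baseline correction (all our choices); `average_epochs` and `add_p300` are hypothetical helpers, the latter only generating synthetic test data:

```python
import random


def average_epochs(signal, onsets, fs, tmin=-0.1, tmax=0.6):
    """Average baseline-corrected epochs of `signal` around stimulus `onsets`.

    tmin must be negative so that a pre-stimulus baseline exists.
    """
    start = round(tmin * fs)  # negative offset in samples
    stop = round(tmax * fs)
    if start >= 0:
        raise ValueError("tmin must be negative to provide a baseline")
    epochs = []
    for onset in onsets:
        lo, hi = onset + start, onset + stop
        if lo < 0 or hi > len(signal):
            continue  # skip epochs that run past the recording
        epoch = signal[lo:hi]
        baseline = sum(epoch[:-start]) / -start  # mean of the pre-stimulus part
        epochs.append([v - baseline for v in epoch])
    if not epochs:
        raise ValueError("no complete epochs")
    return [sum(column) / len(epochs) for column in zip(*epochs)]


def add_p300(signal, onset, fs, amplitude=5.0, latency=0.3, half_width=10):
    """Add a triangular P300-like bump (synthetic, for illustration only)."""
    peak = onset + round(latency * fs)
    for i in range(-half_width, half_width + 1):
        signal[peak + i] += amplitude * (1 - abs(i) / half_width)


if __name__ == "__main__":
    fs = 256  # Hz
    onsets = [300 + 256 * k for k in range(40)]
    rng = random.Random(0)
    eeg = [rng.gauss(0.0, 2.0) for _ in range(onsets[-1] + 2 * fs)]
    for onset in onsets:
        add_p300(eeg, onset, fs)
    erp = average_epochs(eeg, onsets, fs)
    peak = max(range(len(erp)), key=erp.__getitem__)
    print(f"peak latency ~ {(peak + round(-0.1 * fs)) / fs:.3f} s")
```

Although the noise in each simulated epoch is comparable to the bump, the average over 40 epochs peaks close to 0.3 s.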

7) Neuroprosthetic control of an EEG/EOG BNCI:
The dataset on Neuroprosthetic Control of an EEG/EOG BNCI used 26 5 EEG channels recorded with an active-electrode EEG system. The EEG signals were sampled at 200 Hz and band-pass filtered between 0.4 Hz and 70 Hz. One EOG channel was also recorded at 200 Hz. The dataset was collected from a patient with a high spinal cord injury and paralyzed upper limbs.

8) Individual imagery:
Another dataset, Individual Imagery [21], contains EEG data recorded from 9 patients suffering from spinal cord injury or stroke, who were instructed to relax and avoid eye movements. EEG signals were recorded from 30 channels using a g.tec GAMMAsys system with g.LADYbird active electrodes and 2 g.USBamp biosignal amplifiers. The signals were band-pass filtered between 0.5 Hz and 100 Hz, notch filtered at 50 Hz, and sampled at 256 Hz.

9) ECOG-based BCI based on cognitive control:
The ECoG-Based BCI Based on Cognitive Control dataset [22] investigates the use of the cognitive control network for BCI purposes. The authors used fMRI to localize the cognitive control network non-invasively and recorded data from an epilepsy patient who was temporarily implanted with subdural grid electrodes over the left and right frontal cortex. The subject performed a two-target task over several runs using high-frequency power in the 55-95 Hz band.

10) Emergency braking during simulated driving:
The Emergency Braking During Simulated Driving dataset [23] was collected from 18 subjects using 59 EEG electrodes and 2 bipolar EOG channels, with Ag/AgCl electrodes mounted on a cap, at a sampling rate of 200 Hz. BrainAmp hardware was used to amplify and digitize the EEG and EMG signals. The TORCS driving simulator provided the technical and behavioral markers.

11) Mental arithmetic:
The Mental Arithmetic dataset [24] was recorded from 8 subjects (3 male and 5 female) with a mean age of 26 years and a standard deviation of 2.8 years. A multi-channel continuous-wave fNIRS system was used to record brain oxygenation. Its 16 photodetectors and 17 light emitters were arranged in a 3 x 11 grid, giving a total of 52 fNIRS channels sampled at 10 Hz. A pronounced hemodynamic response [24] was observed during the mental arithmetic tasks.
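The optode figures above are internally consistent. Assuming that emitters and detectors alternate in a checkerboard pattern and that each horizontally or vertically adjacent emitter-detector pair forms one measurement channel (an assumption; the layout is not described here), a 3 x 11 grid yields 17 emitters, 16 detectors, and 3 x 10 + 2 x 11 = 52 channels. The hypothetical helper below does this count:

```python
def checkerboard_probe(rows, cols):
    """Count emitters, detectors and channels for an alternating optode grid.

    Assumes emitters sit where (row + col) is even, detectors elsewhere, and
    that every horizontally or vertically adjacent emitter-detector pair is one channel.
    """
    emitters = sum(1 for r in range(rows) for c in range(cols) if (r + c) % 2 == 0)
    detectors = rows * cols - emitters
    horizontal_pairs = rows * (cols - 1)
    vertical_pairs = (rows - 1) * cols
    return emitters, detectors, horizontal_pairs + vertical_pairs


if __name__ == "__main__":
    print(checkerboard_probe(3, 11))  # (17, 16, 52)
```

In a checkerboard every adjacent pair joins an emitter to a detector, which is why the channel count reduces to counting grid edges.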

12) Auditory oddball during hypnosis:
The Auditory Oddball During Hypnosis dataset was recorded from 2 healthy right-handed subjects, one male and one female, using 27 active EEG electrodes and 4 EOG channels to record eye movements, at a sampling frequency of 512 Hz. The data were band-pass filtered between 0.01 Hz and 100 Hz and notch filtered at 50 Hz.

13) SCP Training in stroke:
The SCP Training in Stroke dataset was recorded from 2 chronic stroke patients using a single electrode at Cz and a Nexus-10 MKII DC amplifier with a sampling frequency of 256 Hz. Eye movements were recorded with 2 bipolar EOG electrodes.

15) Motor imagery, small training sets:
The Motor Imagery dataset [26] focuses on applying machine learning approaches to BCI. The EEG data were recorded from 5 healthy subjects using BrainAmp amplifiers and 118 of the 128 channels of an Ag/AgCl electrode cap; the data were band-pass filtered between 0.05 Hz and 200 Hz and digitized at 1000 Hz. For analysis purposes, a second version of the data down-sampled to 100 Hz is also provided.
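Reducing a 1000 Hz recording to a lower rate such as 100 Hz involves more than keeping every tenth sample: activity above the new Nyquist frequency (50 Hz) must first be removed, or it aliases into the retained band. A dependency-free sketch, assuming a 101-tap Hamming-windowed sinc low-pass with a 40 Hz cutoff (our choices; we do not know how the dataset providers computed their down-sampled version). `lowpass_taps` and `downsample` are hypothetical names:

```python
import math


def lowpass_taps(cutoff, fs, num_taps=101):
    """Hamming-windowed sinc low-pass FIR taps, normalized to unity gain at DC."""
    fc = cutoff / fs  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        m = i - mid
        h = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        h *= 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))  # Hamming window
        taps.append(h)
    total = sum(taps)
    return [h / total for h in taps]


def downsample(signal, fs, factor, num_taps=101):
    """Low-pass filter below the new Nyquist frequency, then keep every factor-th sample."""
    new_fs = fs / factor
    taps = lowpass_taps(0.4 * new_fs, fs, num_taps)  # cutoff at 80% of the new Nyquist
    half = num_taps // 2
    out = []
    for n in range(0, len(signal), factor):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = n + half - k  # centred window, so no phase delay
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        out.append(acc)
    return out, new_fs


if __name__ == "__main__":
    fs = 1000
    t = [i / fs for i in range(2 * fs)]
    slow = [math.sin(2 * math.pi * 10 * ti) for ti in t]   # should survive
    fast = [math.sin(2 * math.pi * 130 * ti) for ti in t]  # would alias to 30 Hz
    for name, x in (("10 Hz", slow), ("130 Hz", fast)):
        y, new_fs = downsample(x, fs, 10)
        core = y[10:-10]  # ignore edge samples where the filter runs off the data
        print(name, new_fs, round(math.sqrt(sum(v * v for v in core) / len(core)), 4))
```

Without the filter, the 130 Hz tone would reappear at 30 Hz with full amplitude. With NumPy and SciPy available, `scipy.signal.decimate` performs the same filter-then-subsample operation.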

16) Monitoring error-related potentials:
The Monitoring Error-Related Potentials dataset [27] contains Error-Related Potentials (ERPs) recorded via EEG while subjects monitored the performance of an external device that they did not control. The EEG data were recorded from 6 subjects with a mean age of 27.83 ± 2.23 years using a 64-electrode BioSemi ActiveTwo system at full DC, with a sampling rate of 512 Hz.

17) Emotion recognition using EEG signals:
Another dataset is Emotion Recognition Using EEG Signals [28], collected from 15 subjects (7 males and 8 females) with a mean age of 23.27 years and a standard deviation of 2.37 years. An ESI NeuroScan system with 62 Ag/AgCl electrode channels was used at a sampling frequency of 1000 Hz, and a band-pass filter between 0 Hz and 75 Hz was applied.

18) Visual search within natural images:
The Visual Search within Natural Images dataset [29] is a short demo of 5 experimental trials. Brain Products amplifiers with 25 recording channels were used to record the EEG signals, which were band-pass filtered offline between 1 Hz and 40 Hz. Eye movements were recorded binocularly with an SMI iView X eye tracker, and the data were recorded at a sampling rate of 500 Hz.

19) EEG Eye State (Planning & Relax):
The EEG Eye State (Planning and Relax) dataset [30] contains EEG recordings used to classify two mental states: planning of motor imagery actions and the relaxed state. The dataset was recorded from a 25-year-old healthy right-handed subject with a Medelec Profile Digital EEG machine using 8 Ag/AgCl EEG electrodes at a sampling rate of 256 Hz. The data were filtered with a high-frequency cut-off of 50 Hz and a low-frequency cut-off of 1.6 Hz, and notch filtered at 50 Hz.

20) Indications of nonlinear deterministic and finite dimensional structures in time series of brain electrical activity:
The dataset for Indications of Nonlinear Deterministic and Finite-Dimensional Structures in Time Series of Brain Electrical Activity: Dependence on Recording Region and Brain State [31] was provided to study the dynamic properties of EEG signals recorded from different brain regions and in different physiological and pathological brain states. The data were recorded from 5 healthy subjects using 128 electrodes at a sampling rate of 173.61 Hz. The data were band-pass filtered between 0.53 Hz and 40 Hz, with the low-pass cut-off at 40 Hz.

21) Self-regulation of SCPs:
The Self-Regulation of SCPs dataset [32] was presented to study cortical positivity and cortical negativity using slow cortical potentials (SCPs). It was recorded from an artificially respirated Amyotrophic Lateral Sclerosis (ALS) patient using 6 channels and a PsyLab EEG amplifier with a range of ±1000 µV at a sampling rate of 256 Hz.

22) Self-paced:
The Self-Paced dataset [33] was presented for predicting the laterality of upcoming finger movements (left or right hand) 130 ms before the key press. The data were recorded from a healthy subject without feedback, using 28 EEG channels at a sampling rate of 1000 Hz, with a NeuroScan amplifier and an Ag/AgCl electrode cap from ECI. The dataset was band-pass filtered between 0.05 Hz and 200 Hz and down-sampled to 100 Hz.

23) EEG Motor Movement/Imagery Dataset:
The EEG Motor Movement/Imagery Dataset [15] was provided by the developers of the BCI2000 instrumentation system [34]. The data were collected from 109 subjects using 64 EEG channels, each sampled at 160 samples per second, and are provided in European Data Format (EDF+).

24) The DREAMS Subject database:
The DREAMS Subject dataset [35] was recorded during the DREAMS project to analyze, test, and train classification algorithms for automatic sleep staging. The data were collected from 20 subjects (14 females, 4 males) aged 20 to 65 years using whole-night polysomnography (PSG) with a sampling frequency of 200 Hz. The PSG recordings were annotated in sleep stages according to the Rechtschaffen and Kales (R&K) criteria [36] and the standard of the American Academy of Sleep Medicine (AASM). The data were acquired in the sleep laboratory of a Belgian hospital. Each recording includes at least 1 EOG channel, 3 EEG channels, and 1 EMG channel.

25) The DREAMS REMs Database:
The DREAMS REMs database [35] concerns Rapid Eye Movement (REM) [37] sleep, the period of sleep during which vivid dreams occur. This dataset was recorded for the DREAMS project to analyze, train, and test classification algorithms for automatic detection. It consists of 9 excerpts, each 30 minutes long. The data were acquired in the sleep laboratory of a Belgian hospital from 5 healthy subjects, both male and female, aged 20 to 46 years, using a 32-channel polygraph (Brainnet™ system of MEDATEC, Brussels, Belgium). The recordings include 2 EOG channels, 3 EEG channels, and 1 EMG channel with a sampling frequency of 200 Hz.

26) 4 Class EEG Data:
The Multi-Class Motor Imagery EEG dataset [38] covers cued motor imagery with four classes: left hand, right hand, foot, and tongue. The data were recorded from three subjects with a 60-channel Neuroscan EEG amplifier, using the left mastoid as reference and the right mastoid as ground. The EEG data were sampled at 250 Hz and filtered between 1 Hz and 50 Hz with the notch filter on. There were 60 trials per class, and the data across all trials were concatenated.
This dataset was stored as General Data Format (GDF) files [39].

27) Motor imagery in ECoG recordings, session-to-session transfer:
The Motor Imagery in ECoG Recordings, Session-to-Session Transfer dataset [40] was provided to address the fact that BCI classifiers usually perform worse, without retraining, on data acquired from the same subject on different days or in different sessions. It was recorded using an 8 x 8 (64-channel) platinum ECoG electrode grid placed over the right motor cortex and partly covering the surrounding cortical areas, sampled at 1000 Hz and filtered between 0.016 Hz and 300 Hz. The data cover cued motor imagery of the left pinky and the tongue from 1 subject.

28) P300 speller paradigm:
The P300 Speller Paradigm dataset [19] was presented with the goal of estimating which letter of a 6 x 6 matrix the subject is attending to, from the brain responses to successive intensifications of the matrix rows and columns. It was collected from 2 healthy subjects using the BCI2000 system with 64 EEG channels, digitized at a sampling rate of 240 Hz and band-pass filtered between 0.1 Hz and 60 Hz.
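A common way to turn row and column intensifications into a character decision, sketched here under our own assumptions, is to score every flash with a trained classifier (for example one of those listed in Table III), add up the scores per row and per column over all repetitions, and pick the character at the intersection of the best row and best column. The layout below is a typical 6 x 6 arrangement of letters, digits, and an underscore, and the stimulus codes (0-5 for columns, 6-11 for rows) are a hypothetical convention; the actual dataset documentation defines both.

```python
# Hypothetical 6 x 6 layout; the actual matrix of a given dataset may differ.
MATRIX = [
    "ABCDEF",
    "GHIJKL",
    "MNOPQR",
    "STUVWX",
    "YZ1234",
    "56789_",
]


def decode_character(flashes):
    """Return the character at the best-scoring row and column.

    `flashes` holds (code, score) pairs: codes 0-5 are columns, 6-11 are rows
    (a hypothetical convention), and `score` is a classifier's P300 evidence
    for that single intensification.
    """
    totals = [0.0] * 12
    for code, score in flashes:
        totals[code] += score  # accumulate evidence over repetitions
    col = max(range(6), key=lambda c: totals[c])
    row = max(range(6), key=lambda r: totals[6 + r])
    return MATRIX[row][col]


if __name__ == "__main__":
    # Simulated scores: flashes of the target's row and column score higher.
    target_row, target_col = 2, 3
    flashes = []
    for rep in range(15):
        for code in range(12):
            hit = code == target_col or code == 6 + target_row
            flashes.append((code, (1.0 if hit else 0.0) + 0.1 * ((rep + code) % 3)))
    print(decode_character(flashes))  # P
```

Summing over repetitions is what lets a speller tolerate single-trial classification errors: one misclassified flash rarely outweighs the accumulated evidence.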

29) ERP-based Brain-Computer Interface recordings:
The ERP-based Brain-Computer Interface dataset [41], [42] was provided to identify the factors affecting the performance of BCIs based on event-related potentials and to improve the usability and transfer rate of these interfaces. The data were recorded from 12 subjects using a BioSemi ActiveTwo EEG system with 64 EEG electrodes at a sampling frequency of 2048 Hz. The data were stored in European Data Format (EDF+) with signals and annotations.

30) EEG brain wave for confusion:
The EEG dataset for confusion [43], [44] contains EEG signals recorded from 10 college students who each watched 10 Massive Open Online Course (MOOC) video clips of two types: confusing and non-confusing. The signals were recorded with a single-channel wireless MindSet placed over the frontal lobe. Each recording contains 100 data points, one every 0.5 seconds, and the high-frequency signal values are reported for each 0.5-second interval.

31) Motion VEP Speller:
The Motion VEP Speller dataset [45] was provided to evaluate the usability of gaze-independent communication. The data were recorded from 16 healthy subjects (10 males and 6 females) aged 21 to 30 years, with a mean age of 23.8 years. EEG signals were recorded with a Brain Products (Munich, Germany) actiCap active electrode system with 64 electrodes and a BrainAmp EEG amplifier, sampled at 1000 Hz, and band-pass filtered between 0.016 Hz and 250 Hz. An IntelliGaze IG-30 (Alea Technologies) eye tracker with a sampling rate of 50 Hz was used to monitor eye movements.

32) Center Speller:
The Center Speller dataset [46] was provided with the goal of developing a visual speller, named Center Speller, that does not require eye movements. The data were recorded online from 13 healthy subjects (8 males and 5 females) aged 16 to 45 years, with a mean age of 27 years. The EEG data were recorded with a Brain Products (Munich, Germany) actiCAP using 64 electrodes at a sampling rate of 1000 Hz and band-pass filtered between 0.016 Hz and 250 Hz.

33) SSVEP-based BCI with LED:
The SSVEP-Based BCI with LED dataset [39] was recorded from 5 subjects, both male and female, aged 22 to 30 years, using 8 EEG channels and a g.Mobilab+ device with a sampling frequency of 256 Hz.
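SSVEP-based systems identify which flickering stimulus the user attends to from the frequency content of the EEG. The stimulus frequencies of this dataset are not given above, so the values in the sketch are hypothetical. It scores each candidate frequency by its power plus the power at its second harmonic, using the Goertzel algorithm on a single channel sampled at 256 Hz; canonical correlation analysis (CCA) over several channels is a common, more robust alternative. The function names are our own.

```python
import math


def goertzel_power(signal, fs, freq):
    """Squared DFT magnitude of `signal` at the bin nearest to `freq` Hz (Goertzel)."""
    n = len(signal)
    k = round(freq * n / fs)  # nearest DFT bin index
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in signal:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_ssvep(signal, fs, stimulus_freqs, harmonics=2):
    """Return the candidate frequency whose fundamental and harmonics carry most power."""
    def score(f):
        return sum(goertzel_power(signal, fs, h * f) for h in range(1, harmonics + 1))
    return max(stimulus_freqs, key=score)


if __name__ == "__main__":
    fs = 256
    candidates = [6.0, 7.5, 9.0, 10.0]  # hypothetical LED frequencies in Hz
    t = [i / fs for i in range(2 * fs)]  # two-second analysis window
    eeg = [math.sin(2 * math.pi * 7.5 * ti) + 0.5 * math.sin(2 * math.pi * 15 * ti)
           + 0.3 * math.sin(2 * math.pi * 50 * ti) for ti in t]
    print(detect_ssvep(eeg, fs, candidates))
```

A two-second window gives 0.5 Hz frequency resolution at this sampling rate, so candidate frequencies closer together than that could not be separated by this simple detector.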

34) Global datasets for autism disorder:
The Global Datasets for Autism Disorder [47] were collected from 18 boys aged 10 to 16 years, 10 healthy controls and 8 with autism. The EEG signals were recorded with a BCI2000-based system using active electrodes and a digital EEG amplifier. The recording system consisted of a g.tec EEG cap with 16 Ag/AgCl electrodes, a g.tec GAMMAbox, a g.tec USBamp, and BCI2000, with a sampling frequency of 256 Hz. The data were band-pass filtered between 0.1 Hz and 60 Hz and notch filtered at 60 Hz.

35) Event-related potential datasets based on a three-stimulus paradigm:
The Event-Related Potential Datasets Based on a Three-Stimulus Paradigm [48] were provided to introduce a three-stimulus paradigm for the P300 component and to make three-stimulus EEG/ERP datasets freely available to researchers. Data were collected from 25 healthy subjects; the data of 5 subjects were discarded because of excessive eye blinking, leaving event-related potential (ERP) recordings from 20 subjects (9 males and 11 females, aged 20 to 26 years, 19 of them right-handed). The data were recorded using BrainVision Recorder 1.2 and stored in BrainVision format. The data were sampled at 100 Hz and low-pass filtered with a cut-off frequency of 250 Hz.

III. CLASSIFICATION
In this section, we present the classification background and the proposed classification scheme under which various datasets can be classified.

A. Classification Background
The datasets can be classified based on human cognitive behavior or on functional atlases of the brain. Such a classification helps us understand which parts of the brain are activated and which brain processes are generated in response to a particular cognitive action. We present a classification scheme for datasets that covers various aspects of human behavior, cognitive states, and abilities.
Behavioral neuroscience [49] is the study of physiology, genetics, behavioral evolution, and developmental mechanisms in animals and humans under the principles of the biological sciences. It mainly deals with brain functions and components, neural activity, neurotransmitters, hormonal changes, and behavioral evolution, and their effects on behavior. It is also termed biological psychology or psychobiology [50].
Studies in behavioral neuroscience are conducted on both animals and humans to better understand human pathology and mental processes. Owing to technological advances and the development of non-invasive methods, behavioral neuroscience now also draws on linguistics, philosophy, and psychology [51]. The datasets can be broadly divided into the following categories:

1) Sensation and Perception:
Humans are traditionally considered to have five basic senses, as proposed by Aristotle [52]. Sensation is the body's way of detecting external or internal stimulation. Particular brain regions generate, receive, and interpret specific signals based on sensation. The various senses are as follows [53]:
• Sight: The sense of seeing. There are two distinct types of receptors related to sight: cones, for color, and rods, for brightness [54].
• Taste: The sense that distinguishes sweet, salty, sour, bitter, and umami (umami receptors detect the amino acid glutamate, a taste found in meat and some artificial flavorings) [55].
• Touch: This has been found to be distinct from the pressure, temperature, pain, and even itch sensors [56].
• Pressure: The sense of sustained pressure on the skin.
• Itch: This is a sensory system distinct from the other touch-related senses.
• Thermoception: The ability to sense heat and cold. There are different types of thermoreceptors for detecting heat and cold, and thermoreceptors in the brain monitor internal body temperature.
• Sound: Detecting vibrations along a medium, such as air or water, that is in contact with the eardrums [57].
• Smell: This sense relies on sensors that respond to chemical reactions, and it combines with taste to produce flavors.
• Proprioception: The ability to tell where your body parts are relative to other body parts. Police officers test this sense when they stop someone they suspect of driving drunk; the "close your eyes and touch your nose" test checks it. The sense is used constantly in small ways, for example when you scratch an itch on your foot without once looking to see where your hand is relative to your foot [56].
• Tension Sensors: These are found in places such as the muscles and allow the brain to monitor muscle tension.
• Nociception: In a word, pain. Pain was once thought to result simply from overloading other senses, such as touch, but it has instead been found to be a unique sensory system of its own. There are three distinct types of pain receptors: cutaneous (skin), somatic (bones and joints), and visceral (body organs) [58].
• Equilibrioception: The sense that allows you to keep your balance and sense body movement in terms of acceleration and directional changes. This sense also allows the perception of gravity. Its sensory system is located in the inner ears and is called the vestibular labyrinthine system.
• Stretch Receptors: These are found in places such as the lungs, bladder, stomach, and gastrointestinal tract. A type of stretch receptor that senses the dilation of blood vessels is also often involved in headaches.
• Chemoreceptors: These trigger an area of the medulla in the brain that is involved in detecting blood-borne hormones and drugs. This area is also involved in the vomiting reflex.
• Thirst: This system allows the body to monitor its hydration level so that it knows when to prompt you to drink.
• Hunger: This system allows the body to detect when you need to eat [59].
• Magnetoception: The ability to detect magnetic fields, which is principally useful for providing a sense of direction from the Earth's magnetic field. Humans do not have a strong magnetoception; however, experiments have demonstrated that we do tend to have some sense of magnetic fields. It is theorized that this is related to deposits of ferric iron in our noses [60].
• Chronoception: How the passage of time is perceived and experienced. Humans have a startlingly accurate sense of time, particularly when younger. Long-term timekeeping appears to be governed by the suprachiasmatic nuclei (responsible for the circadian rhythm), while short-term timekeeping is handled by other cell systems [61].
• Electroreception: Electroreception (or electroception) is the ability to detect electric fields.
• Hygroreception: The ability to detect changes in the moisture content of the environment.
• Equilibrioception: Balance, equilibrioception, or vestibular sense is the sense that allows an organism to sense body movement, direction, and acceleration, and to attain and maintain postural equilibrium and balance.

B. Proposed Datasets Classification Scheme
By analyzing the different components of the brain and the functions they perform, we classify the datasets according to our classification scheme, as shown in Table I. The scheme shows that a wide range of mental tasks needs to be considered in BCI research, although recording many of these brain activities poses practical problems. The Dataset column in Table II shows the category of each dataset.

IV. DISCUSSION
The brain contains about 100 billion neurons, and each neuron constantly sends and receives signals through complex mechanisms. During any activity, neurons form thousands of connections, so the signals picked up by EEG electrodes are mixtures that need to be disentangled. Our thoughts, movements, actions, learning, and decisions are the result of complex electrochemical processes in the brain. BCI datasets target a limited number of mental tasks because mapping brain signals to a person's actual intentions is very complex. Brain signals corresponding to certain activities, such as sneezing, are quite difficult to capture and require specific environmental parameters and settings to obtain good results. In addition, recording devices are relatively expensive, and the complexity of brain structure hinders recording, analyzing, and mapping brain signals to particular human activities. These may be the reasons why the datasets available online are limited to simple activities. A comparison of the sampling frequencies shows that most of the datasets lie in the range of 200 Hz to 1000 Hz, and only one dataset has a high sampling rate of 2048 Hz. The bar chart in Fig. 2 shows the number of datasets recorded at each sampling frequency.
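The sampling-rate comparison can be reproduced from the figures stated in Section II. The sketch below is our own transcription of those rates; the labels "1a" and "1b" for the two datasets described under item 1) are ours, the acquisition rate is used where two rates are quoted, and items without a comparable rate are left out. The values are binned around the 200 Hz to 1000 Hz range discussed above.

```python
from collections import Counter

# Sampling rates (Hz) transcribed from Section II, keyed by item number.
# "1a"/"1b" are our labels for the two datasets under item 1). Where two rates
# are given, the acquisition rate is used (625 Hz for item 2, 1000 Hz for
# items 15 and 22); items 9 and 30 state no comparable rate and are omitted.
RATES = {
    "1a": 1000, "1b": 250, "2": 625, "3": 1000, "4": 512, "5": 512,
    "6": 256, "7": 200, "8": 256, "10": 200, "11": 10, "12": 512,
    "13": 256, "15": 1000, "16": 512, "17": 1000, "18": 500, "19": 256,
    "20": 173.61, "21": 256, "22": 1000, "23": 160, "24": 200, "25": 200,
    "26": 250, "27": 1000, "28": 240, "29": 2048, "31": 1000, "32": 1000,
    "33": 256, "34": 256, "35": 100,
}


def tally(rates, low=200, high=1000):
    """Count datasets below, inside, and above the [low, high] Hz range."""
    bins = Counter()
    for fs in rates.values():
        if fs < low:
            bins["below"] += 1
        elif fs <= high:
            bins["inside"] += 1
        else:
            bins["above"] += 1
    return bins


if __name__ == "__main__":
    print(tally(RATES))
    print(Counter(RATES.values()).most_common(3))
```

With these values, 28 of the 33 datasets fall between 200 Hz and 1000 Hz inclusive, four lie below (10 Hz, 100 Hz, 160 Hz, and 173.61 Hz), and one (2048 Hz) lies above; 1000 Hz is the most frequent single rate.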
Looking at the datasets from another perspective, Fig. 1 shows that 47% of the datasets are related to motor skills, either imagined or physical; 21% are related to attention and concentration, 14% to memory and language, 12% to goal-oriented behavior, 4% to biological rhythms, and only 2% to perception and concentration. This comparison suggests that brain activities corresponding to human senses and to biological rhythms such as dreams and sleep are difficult to capture, since they require special equipment and environmental and experimental setups that are either costly or hard to achieve. The datasets were collected by many institutes and research groups that are continuously working to achieve accurate results.
Table III lists the papers that use each dataset, together with the feature extractors and classifiers used in that research. It also gives references to the research papers in which the datasets were originally described, as well as to papers that used or cited them.
Because the datasets contain recordings of a subject's specific activity over a limited period of time, and these recordings contain noise, obtaining reliable output is difficult. Some research groups and institutions used hybrid acquisition systems that combine EEG with EOG, ECoG, MEG, and fNIRS to obtain more precise results. EEG recording devices are non-invasive, low-cost, and simple to use, which is likely why most datasets were collected using EEG, as shown in Fig. 3.
The datasets are recorded with different devices, mapped to some mathematical form, and stored in different formats, including .edf (European Data Format), .gdf (General Data Format), .mat, .txt, .bdf (BioSemi Data Format), .dat, .vhdr, .vmrk, .set, .avg, .eeg, and .cnt. Because of this wide range of formats, processing the datasets is a complex task, and the available brain signal processing software supports only a limited number of them. There is a lack of standard formats and structures for recording datasets.
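Of the formats listed above, EDF (and its EDF+ extension) is fully documented: a 256-byte ASCII header, followed by 256 bytes of header per signal, precedes the data records. The standard-library sketch below reads only that header to recover each channel's label and sampling rate. `read_edf_header` is a hypothetical helper; it does not decode the samples and does not handle the BioSemi .bdf variant. For the other formats, libraries such as MNE-Python provide readers (for example `mne.io.read_raw_edf`, `read_raw_bdf`, `read_raw_gdf`, `read_raw_brainvision`, `read_raw_eeglab`, and `read_raw_cnt`).

```python
# Per-signal header fields and their widths in bytes, in file order (EDF specification).
SIGNAL_FIELDS = [
    ("label", 16), ("transducer", 80), ("physical_dimension", 8),
    ("physical_min", 8), ("physical_max", 8), ("digital_min", 8),
    ("digital_max", 8), ("prefiltering", 80), ("samples_per_record", 8),
    ("reserved", 32),
]


def _text(raw):
    return raw.decode("latin-1").strip()


def read_edf_header(path):
    """Parse the fixed and per-signal parts of an EDF/EDF+ header."""
    with open(path, "rb") as f:
        fixed = f.read(256)
        n_signals = int(_text(fixed[252:256]))
        per_signal = f.read(256 * n_signals)
    header = {
        "version": _text(fixed[0:8]),
        "start_date": _text(fixed[168:176]),  # dd.mm.yy
        "start_time": _text(fixed[176:184]),  # hh.mm.ss
        "header_bytes": int(_text(fixed[184:192])),
        "reserved": _text(fixed[192:236]),    # "EDF+C" or "EDF+D" for EDF+ files
        "n_records": int(_text(fixed[236:244])),
        "record_duration": float(_text(fixed[244:252])),  # seconds
        "n_signals": n_signals,
    }
    # Fields are stored field by field: all labels, then all transducers, and so on.
    signals = [{} for _ in range(n_signals)]
    offset = 0
    for name, width in SIGNAL_FIELDS:
        for sig in signals:
            sig[name] = _text(per_signal[offset:offset + width])
            offset += width
    duration = header["record_duration"]
    for sig in signals:
        samples = int(sig["samples_per_record"])
        sig["sampling_rate"] = samples / duration if duration > 0 else None
    header["signals"] = signals
    return header


if __name__ == "__main__":
    import sys

    info = read_edf_header(sys.argv[1])
    for sig in info["signals"]:
        print(sig["label"], sig["sampling_rate"], sig["physical_dimension"])
```

Because every channel carries its own samples-per-record value, EDF files can mix channels recorded at different rates, which is one reason generic tools must read the per-signal header rather than assume a single rate.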
The datasets presented here have been used by many researchers over time. Understanding and recognizing human intentions from brain signals is an important step that requires complex data analysis and processing. Various software packages are available for analyzing brain activity, but each supports a limited set of techniques. Standards and tools are therefore needed to make research in brain-computer interfacing easier. Software tools can help compare different data processing methods, determine the hyper-parameters required by particular algorithms, and check the compatibility of certain concepts. BCI datasets are mostly goal-oriented, and researchers working on a specific BCI application often prefer to generate their own datasets, which are mostly not publicly available.

V. CONCLUSION
We have explored and discussed various BCI research datasets, most of which are based on EEG. A comparison of the sampling frequencies of the datasets shows that most were collected at rates between about 200 Hz and 1000 Hz. A classification scheme with six categories was proposed for categorizing the datasets. A comparison of the datasets by category shows that most of them relate to brain activity during motor skills. SVM and LDA were the classifiers most often used to process and classify these datasets.
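Since LDA is one of the two classifiers most often applied to these datasets, a minimal two-class Fisher LDA for two-dimensional feature vectors is sketched below, for instance log band power from two channels (a hypothetical feature choice). It assumes equal class priors and a shared covariance matrix and places the threshold midway between the projected class means. `fit_lda` and `predict` are hypothetical names; in practice scikit-learn's `LinearDiscriminantAnalysis` would normally be used.

```python
def fit_lda(class0, class1):
    """Two-class Fisher LDA for 2-D features; returns weights w and bias b.

    Predict class 1 when w . x + b > 0 (equal priors, shared covariance).
    """
    def mean(rows):
        return [sum(r[0] for r in rows) / len(rows), sum(r[1] for r in rows) / len(rows)]

    m0, m1 = mean(class0), mean(class1)
    # Pooled within-class covariance [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for rows, m in ((class0, m0), (class1, m1)):
        for x, y in rows:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(class0) + len(class1) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("singular covariance; more varied samples are needed")
    # w = S^-1 (m1 - m0), using the closed-form inverse of a 2 x 2 matrix.
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    w = [(syy * d0 - sxy * d1) / det, (sxx * d1 - sxy * d0) / det]
    # Threshold halfway between the projected class means.
    b = -(w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1])) / 2
    return w, b


def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


if __name__ == "__main__":
    # Hypothetical 2-D features per trial (e.g. log band power of two channels).
    rest = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
    task = [(3, 3), (4, 3), (3, 4), (2, 3), (3, 2)]
    w, b = fit_lda(rest, task)
    print(w, b, predict(w, b, (2.5, 2.5)))
```

LDA's appeal for BCI work is that it has no hyper-parameters to tune and is robust with the small training sets typical of these datasets; with many features, the covariance estimate usually needs shrinkage.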


Sensation and Perception; Motivated Behavior; Control of Movement; Learning and Memory; Sleep and Biological Rhythms; Emotion; Language; Reasoning and Decision Making; Consciousness

Fig. 1. Comparison of different datasets based on the number of channels used for EEG, EMG, ECoG, MEG, EOG, and fNIRS.
has 32 integrated electrodes (DC-256 Hz) with a sampling rate of 512 Hz and was used to collect data from 3 normal subjects during 4 non-feedback sessions. The dataset is provided in two forms: raw EEG signals and precomputed features. The raw EEG signals have a sampling rate of 512 Hz, whereas for the precomputed features the raw EEG signals were spatially filtered with a Surface Laplacian. The tasks were imagined repetitive self-paced right-hand movements and generating words beginning with the same random letter.