Novel Methods for Resolving False Positives during the Detection of Fraudulent Activities on Stock Market Financial Discussion Boards

Financial discussion boards (FDBs) have been widely used for a variety of financial knowledge exchange activities through the posting of comments. Popular public FDBs are prone to being used as a medium to spread false financial information because of their large audiences. Although online forums are usually integrated with anti-spam tools, such as Akismet, moderation of posted content relies heavily on manual work. Unfortunately, the daily volume of comments received on popular FDBs prevents human moderators from closely watching and moderating possibly fraudulent content, and moderators are not usually assigned such a task in any case. In the absence of useful tools, it is extremely time consuming and expensive to read each comment manually and determine whether it is potentially fraudulent. This paper presents novel forward and backward analysis methodologies implemented in an Information Extraction (IE) prototype system named FDBs Miner (FDBM). The methodologies aim to detect potentially illegal Pump and Dump comments on FDBs by integrating per-minute share prices into the detection process. This can potentially reduce false positives during detection, as it categorises the potentially illegal comments into different risk levels for investigation purposes. The proposed system extracts companies' ticker symbols (i.e. the unique symbol that represents and identifies each listed company on the stock market), comments and share prices from FDBs based in the UK. The forward analysis methodology flags potential Pump and Dump comments using a predefined keyword template and labels the flagged comments with price hike thresholds. Subsequently, the backward analysis methodology employs a moving average technique to determine price abnormalities and analyses the flagged comments backwards. The first detection stage in forward analysis flagged 9.82% of the comments as potentially illegal. It is unrealistic and unaffordable for human moderators or financial surveillance authorities to read these comments on a daily basis. Hence, integrating share prices to perform backward analysis categorises the flagged comments into different risk levels. This helps relevant authorities to prioritise and investigate the higher-risk flagged comments, which could indicate a real Pump and Dump crime happening on FDBs when the system is used in real time.
Keywords: Financial discussion boards; financial crimes; pump and dump; text mining; information extraction


I. INTRODUCTION
The internet has become the number one source for information. Unsurprisingly, this includes financial advice and investor sentiment. There are many online forums where like-minded people can hold conversations in the form of posted messages. Financial Discussion Boards (FDBs), also known as Financial Message Boards or Financial Forums, allow investors to exchange knowledge, information, experience and opinions about investment opportunities. There are a few popular share price based FDBs in the UK which specifically allow investors to discuss share prices. These FDBs include London South East, Interactive Investor (III) and ADVFN.
Normally, forum content is moderated by human moderators when it is discovered or reported for breaching forum rules on matters such as racism, sexism, hatred, foul language, third party advertisements and so on. Although online forums seem to be a useful source of information, not all information shared on the forums is accurate or truthful. Even anti-spam plugins such as Akismet can only prevent spammers from registering or posting generic spam messages. Few, if any, measures are taken by forum moderators or financial surveillance authorities to monitor and detect potential crimes on FDBs, such as comments indicative of Pump and Dump (P&D).
P&D can happen if an organised group of false investors decides to attack a share by buying and selling it within a scheduled time frame while giving the market false statements about the share throughout the process. Textual comments such as “This is the right time let's start pumping this share” can reveal hidden, potentially illegal P&D activity on these FDBs. Novice investors can easily be deceived and make huge losses during the “dump” while the fraudsters take huge profits. Without a tool, manual monitoring and detection of potentially illegal activities on popular and active FDBs can cost significant time and money, which is impracticable in the long run.
There has been research on share price based FDBs associated with P&D financial crimes [1]-[6]. Research from recent years has highlighted that comments on FDBs were found to be manipulative and positively related to market returns, volatility and trading volumes [7]-[11]. However, very few attempts [5], [6] have been made to build tools for monitoring and detecting potential financial crimes on share price based FDBs. Furthermore, other than the initial work presented in [12], none of the existing research takes share prices into account when designing a financial surveillance tool for detecting potentially illegal FDB comments.
FDBs contain semantically understandable artefacts (i.e. FDB artefacts that can be processed by computers) such as stock ticker symbols, dates, times, prices, comment author usernames and comments. Information Extraction (IE) is defined as the process of automatically extracting information into a structured data format from an unstructured or semi-structured data source [13]. Therefore, IE techniques are used in this research to extract and analyse these data. IE has been used in other areas such as accounting [14] and search engines [15]. However, other than the initial work described in [6] and [16], there is very little usage of IE techniques in research related to financial crimes on FDBs.
Two novel methodologies, i.e. forward analysis and backward analysis, are introduced in this paper and implemented in a prototype system named FDBs Miner (FDBM). The methodologies are used to detect potential P&D crimes on FDBs by flagging potentially illegal comments and reducing false positives (i.e. results of an evaluation process or test that mistakenly indicate a finding) during the detection process. FDBM could significantly support financial surveillance authorities in regulating FDBs by enabling real-time monitoring and alerting based on fraud risk levels.
In the forward analysis methodology, all the potentially illegal comments are first highlighted and flagged. This is done by analysing the comments against the predefined P&D IE keyword template. Next, the method matches and appends to each flagged comment the price figure that shares the same or closest date and time for the same ticker symbol. Subsequently, the forward analyser takes each flagged comment's price as a base price and examines ± 2 days' worth of prices to check whether there is any price hike of 5%, 10% or 15% above the base price. Finally, it appends the price hike threshold labels to these flagged comments. By doing so, a relevant authority can pick the comments belonging to any threshold, depending on severity, for investigation. Although the forward analysis in this research drastically reduces the number of comments that relevant authorities need to read, the number of categorised flagged comments could still be too large to read on a daily basis. Thus, a backward analysis methodology is designed to overcome this issue.
In the backward analysis methodology, a simple moving average method is used to calculate and highlight the price hikes. Any price hikes that hit certain price hike thresholds are matched backwards to the flagged comments found in the forward analysis stage. These matches are made so that the already flagged comments can be further classified to reduce false positives, allowing investigators to examine the highest-risk flagged comments before everything else.
Section II describes some examples of FDB-related financial crimes and reviews the background and usage of Information Extraction (IE) and Text Mining. Section III presents an architecture overview of the FDBs Miner (FDBM) prototype system and an overview of the FDB dataset (FDB-DS). This is followed by Sections IV and V, which introduce the two novel methodologies (i.e. forward analysis and backward analysis) respectively and discuss the findings. Lastly, Section VI concludes the research and proposes some future work.

II. BACKGROUND
This section first provides a few related and significant examples of financial crimes on share price based FDBs, followed by a literature review of IE and text mining, the techniques used in this research for locating meaningful information and for collecting and forming datasets respectively. Lastly, literature related to Pump and Dump (P&D) and FDBs is reviewed.

A. Financial Crimes on Share Price based FDBs
Generally, many P&D financial crimes have been actively investigated and dealt with by the Securities and Exchange Commission (SEC) for many years. However, P&D crimes on FDBs are only loosely monitored by FDB moderators and relevant authorities. There were several well-known FDB-related P&D financial crimes in the early years, which are highlighted as follows:
• 15-year-old Jonathan Lebed was the first minor to be involved in a stock market fraud, in 2000 [3]. Lebed earned a total revenue of US$800,000 by pumping share prices through the Yahoo! Finance Message Board over half a year and was charged by the SEC [3], [4].
• In 2000, two people were charged for pumping the price of a share by 10,000% by posting on the Raging Bull message board and then dumping millions of shares, making a profit of at least US$5 million [3].
• In addition, in 2009, eight participants were charged by the SEC for being involved in penny stock (i.e. stocks priced at less than a dollar) manipulation throughout 2006 and 2007. These wrongdoers met each other through a popular penny stock message board.
Based on the above FDB-related P&D financial crimes, there is a clear need to create methods and tools for detecting potentially illegal FDB comments in real time, instead of investigating the crimes after they have been committed, which is probably too late as the harm has been done.

B. Information Extraction and Text Mining
This research makes use of Information Extraction (IE) and Text Mining. IE is defined [17] as the process of automatically extracting information into a structured data format from an unstructured or semi-structured data source. It was suggested [18] that there is a need for systems that extract information automatically from text data. IE is not Information Retrieval (IR) [19]. The difference is that IE extracts information that fits predefined templates or databases and then presents the information to users, whereas IR finds data and presents the information to users. IE systems are knowledge-intensive, as they extract only snippets of information that fit predefined templates (fixed format) representing useful and relevant information about the domain, and then display these to the user.
IE is divided into two fundamental classes, i.e. the Knowledge Engineering (KE) approach and the automatic training approach. The KE approach is also called the rule-based approach since it requires rules to be developed by human experts. The rule-based approach is often ignored in the research community, but it is mostly favoured in the commercial market, even by large vendors such as IBM (for text analysis systems) and Microsoft (enterprise search platform) [20]. Rule-based IE systems are easy to maintain and comprehend, and their errors are easy to trace and fix. On the other hand, although the automatic training approach, also known as the machine learning approach, requires less manual effort, it requires pre-labelled data and retraining for adaptation [20]. This paper focuses on IE implementation since it is designed to support the financial market surveillance authorities.
Text mining was described [21] as the process of extracting useful information from unformatted textual data or natural language text into a form of meaningful knowledge for processing. According to [22], internet users have been seeking and sharing opinions and information on the Internet more easily than ever, and this raises concerns about the credibility of the information sources. This means the likelihood of encountering deceptive information is also significant. Similarly, on popular share price based FDBs that receive a significant number of comments each day, novice investors who seek investment advice could also be deceived easily. A text mining based study was also conducted [23] on a Twitter dataset and its ability to predict stock prices. In addition, stock price trends have been successfully forecast from press releases using text mining techniques [24].
In this paper, text mining is used alongside a rule-based IE technique to extract and analyse FDB artefacts such as comments, prices and stock ticker symbols.

C. Pump and Dump and Share Price Based FDBs
Traditionally, Pump and Dump (P&D) happened mostly through word of mouth. With the advent of the Internet, it has become common for fraudsters to commit such crimes through various channels such as emails, discussion boards and social media.
The use of spam emails is one of the older tactics. Regulators such as the Securities and Exchange Commission (SEC) have been actively taking action against P&D spam campaign fraudsters. Email spam filters are also constantly being improved by Internet services such as Google and Symantec. In the research conducted in [25], a total of 1,299 suspicious stock recommendation emails were obtained, involving 221 stocks recommended in 252 advertising campaigns. An event study and a sentiment analysis were conducted on whether P&D involving the internet is still an issue in today's world. Unsurprisingly, the research showed empirically that the internet still plays a major role in facilitating this type of financial crime. Due to the limitations of spam emails, newer channels such as social media and discussion boards were adopted, mainly because these channels allow more freedom of speech.
Other researchers [7]-[11] have found a relation between FDB comments and market performance. FDB comments can be manipulative and affect share prices.
In [5], the authors introduced a novel classification technique for training a classifier to automate moderation tasks on online discussion sites (ODSs). A partially labelled corpus is used for training, and the technique is then used to attempt to moderate inappropriate content on ODSs. The authors implemented and tested the technique on a corpus of comments posted on a popular Australian FDB named HotCopper. The results indicated that the classification technique is helpful and can be used to decrease the number of comments that need to be moderated by human moderators. However, this system is not yet a fully automated moderation system due to the use of a partially labelled corpus. According to the authors, the misclassification errors remain too significant. Besides, the research takes only comments into account, and no prices are involved in the classification of comments.
A system named Financial Discussions Detection System (FDDS), an initial work leading to this research, was proposed by the authors in [6] to flag potentially illegal comments made on FDBs. The system allows users to create and modify predefined templates (i.e. lists of potentially illegal keywords that commenters may use, or frequently use, on FDBs), download comments from FDBs and match the downloaded comments against the potentially illegal keywords created in the earlier steps. Looking only at the comments during the detection process appears to be insufficient in terms of accuracy. Thus, this paper introduces novel methodologies in an attempt to reduce false positives by integrating share prices into the detection process.
The authors in [11] examined whether the messages posted on the largest stock message board in Australia, HotCopper, have an impact on the Australian Stock Exchange (ASX) market. The results show that FDB messages have an impact on small capitalisation stocks but do not affect large stocks.
In [26], the authors introduced a software prototype (FMS-DSS) to support decision making in financial market surveillance. FMS-DSS consists of three components, i.e. data, models and user interface. The system collects both unstructured and structured data on the selected listed companies. The models take into account attributes such as market segment, market capitalisation, trading volume, age of company and so on. Subsequently, attribute scales ranging from very low to very high were defined by members of the regulatory authority. The scales were then used for aggregation to determine whether suspicious activity is happening.
The research presented in this paper attempts to address what was missing in existing research. Share prices are taken into account when flagging potentially illegal comments, accompanied by two key novel built-in methodologies (namely, forward analysis and backward analysis) for resolving false positives during the comment flagging process.

III. ARCHITECTURE OVERVIEW
This section presents the FDBM architecture, which consists of several key components. These key components are the data crawler, data transformer, FDB dataset (FDB-DS), IE keyword template, forward analyser and backward analyser (Fig. 1). Fundamentally, FDBM collects data, transforms unstructured data into a structured data format and analyses the data using both the forward and backward analysers. The forward analyser and backward analyser components implement the novel methodologies introduced in this paper, which attempt to resolve false positives during the detection of potentially illegal comments.

A. Overview
Fig. 1 provides an overview of the FDBM architecture of the prototype system. Each component in the architecture diagram is described as follows:
1) Data Crawler: The data crawler is responsible for automatically collecting unstructured data from the three FDBs (i.e. LSE, III and ADVFN) at different time intervals for a period of 12 weeks (from 23rd September 2014 to 22nd December 2014). These unstructured and semi-structured data consist of 941 ticker symbols that were listed on the London Stock Exchange (LSE) in the FTSE100 and FTSE AIM All-Share indices, 1-minute bar price figures for all 941 companies and all the available FDB comments belonging to the 941 companies. The FTSE100 index consists of the hundred companies with the highest market capitalisation listed on the LSE, whereas FTSE AIM All-Share consists of all the UK and non-UK companies listed on the Alternative Investment Market (AIM). With potential future work in mind, director deals data and broker ratings data were also collected. Table I in Section B summarises the collected data.
2) Data Transformer: Once data collection is completed by the data crawler, the data transformer extracts and converts the collected unstructured data, in various formats such as HTML, CSV and XML, into structured data.
3) FDB Dataset (FDB-DS): After the collected data has been processed by the data transformer, the structured data, such as price figures, comments, comment author usernames, and the dates and times of comments and prices, are stored in the FDB-DS accordingly. For example, the ticker symbols are parsed into the `ticker` table, price data into the `price` table and comment data into the `comment` table. The FDB-DS is also responsible for storing additional data produced from research analysis.
4) IE Templates: The Pump and Dump IE keyword template has been created and saved locally in the prototype system in a text (TXT) file format. It can be easily modified whenever needed. The IE keyword template consists of a series of keywords and phrases that were thoroughly researched [2], [27]-[29] and validated by experts in the relevant field. The IE keyword template is used by the forward and backward analysers for the comment flagging process. Section C shows a sample list of the keywords and phrases.
5) Forward Analyser: The forward analyser matches the Pump and Dump IE keyword template against the comments in order to flag potentially illegal FDB comments, then matches prices to the flagged comments and calculates and labels price thresholds. The novel methodology used in this component is further discussed in Section IV.
6) Backward Analyser: The backward analyser calculates and labels price hikes using a price moving average technique, i.e. the simple moving average (SMA). The SMA is calculated by adding the prices for a specific time period and dividing the sum by the number of prices in that period. This calculation is applied to a total of 29 million price figures belonging to the 941 companies. Subsequently, price hike SMA alerts are matched back to the comments initially flagged in the forward analysis process. This methodology is further elaborated in Section V.
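In symbols, and assuming a notation that does not appear in the original (p_t for the price at bar t and n for the number of bars in the chosen period), the SMA calculation described for the backward analyser can be written as:

```latex
\mathrm{SMA}_n(t) = \frac{1}{n} \sum_{i=0}^{n-1} p_{t-i}
```

For example, with n = 3 and prices 100, 102 and 104, the SMA is (100 + 102 + 104) / 3 = 102.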

B. Dataset Acquisition
Table I provides an overview of the FDB dataset (FDB-DS) in this research. These data were collected between 23rd September 2014 and 22nd December 2014.
As mentioned in Section III-A, these 941 ticker symbols were collected from two of the LSE's indices, i.e. 100 ticker symbols from FTSE100 and 841 ticker symbols from FTSE AIM All-Share. The comments belonging to all these ticker symbols that were made within the 12 weeks were collected from both LSE and III. The prices are 12 weeks' worth of 1-minute bar share prices for all 941 ticker symbols. Director deals and broker ratings related to all the ticker symbols were also collected for potential future work. Fig. 2 depicts the FDB-DS structure.
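As an illustration of the three tables named in Section III-A, the following is a minimal sketch of how FDB-DS could be laid out. The use of SQLite and every column name are assumptions made for illustration; the text names only the `ticker`, `price` and `comment` tables, and the actual structure is the one depicted in Fig. 2.

```python
import sqlite3

# Hypothetical schema: only the table names come from the paper;
# the columns and the choice of SQLite are illustrative assumptions.
SCHEMA = """
CREATE TABLE ticker (
    symbol     TEXT PRIMARY KEY,
    index_name TEXT              -- e.g. 'FTSE100' or 'FTSE AIM All-Share'
);
CREATE TABLE price (
    symbol TEXT REFERENCES ticker(symbol),
    ts     TEXT,                 -- timestamp of the 1-minute bar
    price  REAL
);
CREATE TABLE comment (
    id     INTEGER PRIMARY KEY,
    symbol TEXT REFERENCES ticker(symbol),
    author TEXT,
    ts     TEXT,                 -- date and time the comment was posted
    body   TEXT
);
"""


def create_fdb_ds(path=":memory:"):
    """Create an empty FDB-DS style database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_fdb_ds()
table_names = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
```

Keeping the ticker symbol and a timestamp on both `price` and `comment` is what would allow a comment to be matched to its nearest price bar in the later analysis steps.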

C. IE Template
The Pump & Dump (P&D) IE keyword template is populated with keywords obtained from the P&D comments demonstrated in existing research [6], [27]-[29]. The following is a sample list of the keywords and phrases that were used in this work:

IV. FORWARD ANALYSIS METHODOLOGY
This section introduces the novel forward analysis methodology. The aim of this methodology is to flag and filter potentially illegal P&D comments using the P&D keyword template, with share prices integrated into the analysis process. This categorises the flagged comments into different risk levels and allows relevant authorities to investigate the flagged comments more realistically in terms of time and effort.
The forward analysis methodology in this section tests the following hypotheses:
H0a: Pump and Dump activity from FDBs can be filtered using template based IE and their correlation with price movements.
H1a: Pump and Dump activity from FDBs cannot be filtered using template based IE and their correlation with price movements.
As shown in the architecture diagram in Fig. 1, the forward analysis component contains several functions. These functions (i.e. comments flagging, price matching, threshold calculation and threshold labelling) are part of the forward analysis methodology and are discussed below.

A. Methodology
The following describes the steps taken in this methodology to flag potentially illegal comments:
1) Comments Flagging:
a) Firstly, the forward analyser matches all the available keywords and phrases from the Pump and Dump IE keyword template against all the 507,970 comments stored in the FDB dataset (FDB-DS).
b) The flagged comments, which are deemed potentially illegal, are imported into FDB-DS as a new database table named `flaggedcomment`.
2) Price and Comments Matching:
a) Once `flaggedcomment` has been populated, the forward analyser appends the price to each flagged comment by matching the ticker symbol and the exact or nearest date and time. This step ensures that a “base price” is set for each flagged comment. The “base price” is used for threshold labelling in the next step. Because the 12 weeks' worth of price data belonging to 941 companies is extremely large, the process of setting a “base price” takes up to a week to complete.
3) Comments Threshold Labelling:
a) After the “base price” has been set for each flagged comment in the previous step, the forward analyser labels each flagged comment with thresholds. Due to the large data set, the threshold labelling process takes up to five days to complete all threshold calculations. To determine whether the prices around a flagged comment exceed any thresholds (i.e. various levels of spikes in prices), the forward analyser compares all the ± 2 days' per-minute prices against the “base price” of each flagged comment.
b) When there is a trigger, a flagged comment is labelled accordingly. The threshold labelling rules are as follows:
• Flagged comments that have no price figure (due to empty price figures collected from ADVFN) are labelled as “N” (Null).
• If any of the ± 2 days' prices compared against the “base price” indicates a 5% price hike, the comment is labelled as “Y” (Yellow).
• If any of the ± 2 days' prices compared against the “base price” indicates a 10% price hike, the comment is labelled as “A” (Amber).
• If any of the ± 2 days' prices compared against the “base price” indicates a 15% price hike, the comment is labelled as “R” (Red).
• Flagged comments that do not trigger any thresholds are labelled as “C”.
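The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FDBM implementation: the keywords are placeholders rather than the paper's template, matching is a case-insensitive substring test, a hike is taken as the highest price within ± 2 days relative to the base price, and a comment receives only the highest threshold it reaches. All function names are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Placeholder keywords for illustration only; not the paper's template.
KEYWORDS = ["start pumping", "pump this share"]


def is_flagged(comment_text, keywords=KEYWORDS):
    """Return True when any template keyword occurs in the comment."""
    text = comment_text.lower()
    return any(keyword in text for keyword in keywords)


def nearest_price(bars, when):
    """Return the price whose timestamp is closest to `when`.

    `bars` is a list of (datetime, price) tuples for one ticker symbol,
    sorted by time. Returns None when no prices are available.
    """
    if not bars:
        return None
    times = [t for t, _ in bars]
    i = bisect_left(times, when)
    # Only the bars just before and just after `when` can be nearest.
    candidates = bars[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda bar: abs(bar[0] - when))[1]


def threshold_label(bars, when, window=timedelta(days=2)):
    """Label one flagged comment as N, R, A, Y or C."""
    base = nearest_price(bars, when)
    if base is None or base <= 0:
        return "N"  # no usable price figure
    in_window = [p for t, p in bars if abs(t - when) <= window]
    peak = max(in_window, default=base)
    hike = (peak - base) / base
    if hike >= 0.15:
        return "R"
    if hike >= 0.10:
        return "A"
    if hike >= 0.05:
        return "Y"
    return "C"


t0 = datetime(2014, 10, 1, 9, 0)
example_bars = [(t0, 100.0), (t0 + timedelta(hours=5), 116.0)]
example_label = threshold_label(example_bars, t0)
```

Using a binary search for the base price avoids scanning every bar of a ticker for each flagged comment, which matters when the price table holds millions of 1-minute bars.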

B. Forward Analysis Methodology Results
By matching the keywords and phrases from the P&D IE keyword template against all the 507,970 comments, a total of 49,858 comments were flagged as potentially illegal (as shown in Table II). These flagged comments make up 9.82% of the total comments. Of the 49,858 flagged comments, 3,613 (7.25%) triggered the “R” 15% price hike threshold, 2,555 (5.12%) triggered the “A” 10% price hike threshold and 5,197 (10.42%) triggered the “Y” 5% price hike threshold. The 37,895 (76.01%) flagged comments labelled as “C” did not trigger any price thresholds. The total number of flagged comments that triggered the thresholds is summarised in Table III and visualised in Fig. 3. The results show that it is possible to filter comments that may be indicative of Pump and Dump activities by using template based IE and the correlation with price movements.
For 12 weeks' worth of share price data for 941 companies, the forward analyser took approximately seven days to calculate all the price thresholds and label the flagged comments. The time taken by this process depends heavily on the computing power of the machine and the efficiency of the programming in FDBM. In this research, the server used has a quad core CPU (2.50GHz Intel(R) Xeon(R) CPU E5-2680 v3). The forward analysis takes a long time because of the massive amount of data processed together in this research. In a real world scenario, this methodology can significantly help relevant authorities to narrow down and focus on the potentially illegal comments with higher risks. Therefore, the hypothesis for this section is met.
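The reported percentages can be reproduced directly from the counts given above. The short check below uses only figures stated in the text; the variable names are illustrative. It also shows that the four labels account for 49,260 of the 49,858 flagged comments, leaving 598, which are presumably the “N” comments without price figures, although the text does not state that count.

```python
TOTAL_COMMENTS = 507_970
FLAGGED = 49_858
LABEL_COUNTS = {"R": 3_613, "A": 2_555, "Y": 5_197, "C": 37_895}


def share(part, whole):
    """Percentage of `whole` represented by `part`, to two decimal places."""
    return round(100 * part / whole, 2)


flagged_share = share(FLAGGED, TOTAL_COMMENTS)
label_shares = {label: share(n, FLAGGED) for label, n in LABEL_COUNTS.items()}
# Flagged comments not covered by the R, A, Y and C counts.
remaining = FLAGGED - sum(LABEL_COUNTS.values())
```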

V. BACKWARD ANALYSIS METHODOLOGY
As an enhancement to the forward analysis process, the novel backward analysis process tests whether the simple moving average (SMA) technique can be used to reduce false positives in the comment flagging process by highlighting abnormalities in share prices and classifying the flagged comments backwards.
The backward analysis methodology in this section tests the following hypotheses:
H0b: Backward analysis can be performed by matching abnormal stock prices with the flagged comments to further classify flagged comments to reduce false positives.
H1b: Backward analysis cannot be performed by matching abnormal stock prices with the flagged comments to further classify flagged comments to reduce false positives.
The moving average is one of the technical analysis methods often used by financial analysts to predict future price patterns and to learn stocks' behaviour and trends by studying historical price data. The most basic moving average technique used by financial analysts is the SMA. Some research has even used such moving average techniques to predict the rate of traffic congestion and road accidents [30]. However, there appears to have been no previous attempt to integrate moving average techniques into the detection of potential FDB crimes.
The backward analysis uses the SMA to test whether it can help to detect flagged comments while reducing false positives. The SMA technique is applied to the share prices before the backward analysis is performed. The moving average technique is used in backward analysis because it can calculate and highlight whether a price figure exceeds a certain threshold. The following section discusses the methodology used to perform backward analysis.

A. Methodology
The following describes the steps taken to produce results for analysis:
3) Alert Matching:
a) Next, the backward analyser appends the price alerts back to the `flaggedcomment` table by matching the ticker symbol and the exact or nearest date and time between both `price` and `flaggedcomment` tables.
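The SMA-based alerting that precedes alert matching can be sketched as follows. This is an illustration under assumptions rather than the FDBM implementation: an alert is raised when a price exceeds the SMA of the preceding `window` prices by at least the given threshold, and the function name and parameters are hypothetical. The exact alert rule used by the backward analyser is not reproduced in this section.

```python
from collections import deque


def sma_alerts(prices, window, threshold):
    """Return the indices of prices that exceed the preceding SMA.

    A price at index i raises an alert when it is at least `threshold`
    (e.g. 0.05 for 5%) above the simple moving average of the `window`
    prices immediately before it. This alert rule is an assumption.
    """
    recent = deque(maxlen=window)
    running_sum = 0.0
    alerts = []
    for i, price in enumerate(prices):
        if len(recent) == window:
            sma = running_sum / window
            if sma > 0 and (price - sma) / sma >= threshold:
                alerts.append(i)
            # The oldest price is dropped when the new one is appended.
            running_sum -= recent[0]
        recent.append(price)
        running_sum += price
    return alerts


example_prices = [100.0, 100.0, 100.0, 120.0, 100.0]
example_alerts = sma_alerts(example_prices, window=3, threshold=0.15)
```

Keeping a running sum means each new price bar updates the average in constant time, which is relevant when the calculation is applied to tens of millions of price figures. Each alert index can then be mapped to its timestamp and matched to flagged comments by ticker symbol and nearest date and time, as step 3 describes.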

B. Backwards Analysis Methodology Results
Table V shows the total number of flagged comments that matched the 5% threshold in both forward and backward analysis for the 1 day, 3 days and 5 days time periods. Of the 49,858 flagged comments, 228 from the 1 day time period experiment are labelled with Y (5% threshold from forward analysis) and also with the 5% threshold from backward analysis. The corresponding figures are 306 flagged comments for the 3 days time period and 274 flagged comments for the 5 days time period.
Table VI shows the total number of flagged comments that matched the 10% threshold in both forward and backward analysis for the same three time periods. Of the 49,858 flagged comments, 40 from the 1 day time period experiment are labelled with A (10% threshold from forward analysis) and also with the 10% threshold from backward analysis. The corresponding figures are 49 flagged comments for the 3 days time period and 64 flagged comments for the 5 days time period.
Table VII shows the total number of flagged comments that matched the 15% threshold in both forward and backward analysis for the same three time periods. Of the 49,858 flagged comments, 199 from the 1 day time period experiment are labelled with R (15% threshold from forward analysis) and also with the 15% threshold from backward analysis. The corresponding figures are 408 flagged comments for the 3 days time period and 500 flagged comments for the 5 days time period.
The results in Tables V, VI and VII show that it is possible to perform backward analysis by matching abnormal stock prices backwards to the flagged comments to resolve false positives.
Take the ticker symbol “BOX” as an example: 50 comments belonging to this stock were flagged with the “R (15%)” threshold in the forward analysis process. Subsequently, some of these comments were flagged with the SMA 15% threshold alert in the backward analysis process. This indicates a very high chance that potentially illegal activities took place within ± 2 days of the comments being made. A closer look at these flagged comments can confirm a high potential for P&D crime. One comment suggests that a P&D has indeed happened, in which the price was pumped up and then dumped. Another comment shows that there was still an attempt to pump up the price after the P&D event. The author “ne14t” has a series of BOX comments suggesting that he/she could possibly be involved in a P&D crime. As an enhancement to the forward analysis methodology, the backward analysis aims to resolve false positives and reduce the manpower and time needed to read through the initially flagged comments. The time taken by both the forward and backward analysis processes in this research is long; however, this is only due to the significant amount of data being processed and analysed altogether. If the prototype system and both methodologies are applied in real time in real-world scenarios, they can significantly reduce the time, effort and cost of monitoring and detecting P&D crimes on FDBs. Therefore, it is concluded that the hypothesis is met.
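The backward step described above, in which an SMA threshold alert is matched back to comments made within ± 2 days, could look roughly like the sketch below. It is written under stated assumptions rather than taken from FDBM: the trailing window length, the definition of an alert as the price exceeding the SMA by at least the threshold percentage, and the names `sma_alerts` and `comments_near_alerts` are all hypothetical. The toy series uses daily prices for brevity, whereas the paper works with per-minute share prices.

```python
from datetime import datetime, timedelta


def sma_alerts(prices, window, threshold_pct):
    """Return timestamps where the price exceeds the trailing SMA by threshold_pct.

    `prices` is a time-ordered list of (timestamp, price) pairs. The SMA for
    index i is computed over the `window` prices preceding i (an assumption).
    """
    alerts = []
    for i in range(window, len(prices)):
        sma = sum(p for _, p in prices[i - window:i]) / window
        ts, price = prices[i]
        if sma > 0 and (price - sma) / sma * 100 >= threshold_pct:
            alerts.append(ts)
    return alerts


def comments_near_alerts(comment_times, alerts, days=2):
    """Keep comments posted within +/- `days` of any price alert."""
    span = timedelta(days=days)
    return [t for t in comment_times if any(abs(t - a) <= span for a in alerts)]


# Toy daily prices, invented for illustration: flat at 100, then a jump to 120.
start = datetime(2015, 1, 1)
prices = [(start + timedelta(days=i), p)
          for i, p in enumerate([100, 100, 100, 100, 100, 120])]
alerts = sma_alerts(prices, window=3, threshold_pct=15)

# One comment a day before the jump, one four days before it.
comment_times = [datetime(2015, 1, 5), datetime(2015, 1, 2)]
matched = comments_near_alerts(comment_times, alerts)
```

Here only the comment posted one day before the price jump falls inside the ± 2 day window, so it alone is retained for investigation.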

VI. CONCLUSION AND FUTURE WORK
This paper has introduced two novel methodologies for detecting potentially illegal activities on share price based FDBs by looking not only at the comments but also at the per-minute share prices. IE techniques were used to collect FDB artefacts such as ticker symbols, comments and prices, which made it possible to conduct the forward analysis in this research. A total of 49,858 comments were flagged when matched against the P&D IE keyword template. On average, this is 4,154 flagged comments per week or 593 flagged comments per day. More importantly, these comments belong to only 941 listed companies, not the entire stock market in the UK. Furthermore, according to the results, a large portion of these flagged comments belong to companies listed under the FTSE AIM All-Share index, which contains many smaller companies because it has a more flexible regulatory system that allows smaller companies to enter the LSE. In order to perform a more realistic investigation into such financial crime on all FDBs and for all listed companies in the UK on a daily basis, the forward and backward analysis methodologies integrate share prices into the analysis process. This makes it possible for the relevant authorities to prioritise the investigation of flagged comments that carry higher risks. The methodologies implemented in FDBM can significantly reduce the time and effort needed by the relevant authorities to investigate P&D crime on FDBs in real time. As suggested by [29], regulators need to monitor share price based FDBs closely because they are becoming increasingly popular; the authors of [29] also find a strong positive relationship between the stock prices of smaller companies and investors' sentiments on FDBs.
This research currently has three limitations. First, there is no predefined IE keyword template for other financial crimes that can happen on FDBs, namely Insider Information. Second, the prototype system has not yet taken other artefact data, such as broker ratings and director deals, into account during the forward and backward analysis. Third, the prototype system has so far relied on the XML file format to obtain comment artefact data from FDBs; it should therefore be extended to obtain comments through the HTML file format, so that it can crawl comment data from FDBs that do not provide comments in XML.


Once in a lifetime  Pump the price  Keep ramping  Buy now  Good future  Invested so heavily  It will fly  Sell now  This is the chance  Price will go up  Buy as quickly as possible  Get out while you can.
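Assuming the phrases listed above are entries from the P&D IE keyword template, the flagging step of the forward analysis can be illustrated with a simple phrase matcher. This is a hedged sketch only: the use of case-insensitive regular expressions and the names `KEYWORDS`, `matched_keywords` and `is_flagged` are choices made for this example, and the paper's template may apply different matching rules.

```python
import re

# A few phrases taken from the list above; the full template is not reproduced here.
KEYWORDS = ["pump the price", "keep ramping", "buy now", "it will fly",
            "sell now", "get out while you can"]

# One case-insensitive pattern with word boundaries around each phrase.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def matched_keywords(comment):
    """Return the template phrases found in a comment, lower-cased and sorted."""
    return sorted({m.group(0).lower() for m in PATTERN.finditer(comment)})


def is_flagged(comment):
    """A comment is flagged when it contains at least one template phrase."""
    return bool(matched_keywords(comment))
```

For example, `matched_keywords("Buy NOW, it will fly!")` returns both matching phrases, while a neutral comment such as "Results due next week" is not flagged.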

TABLE VI. TOTAL NUMBER OF FLAGGED COMMENTS THAT MATCHED 10% THRESHOLD FROM BOTH FORWARD AND BACKWARD ANALYSIS

TABLE VII. TOTAL NUMBER OF FLAGGED COMMENTS THAT MATCHED 15% THRESHOLD FROM BOTH FORWARD AND BACKWARD ANALYSIS