A Transformer Seq2Seq Model with Fast Fourier Transform Layers for Rephrasing and Simplifying Complex Arabic Text

Abstract: Text simplification is a fundamental unsolved problem for Natural Language Understanding (NLU) models and is widely regarded as a hard task. It aims to simplify texts with complex linguistic structures and improve their readability, not only for human readers but also to boost the performance of many natural language processing (NLP) applications. To tackle this task for low-resource Arabic NLP, this paper presents a text split-and-rephrase strategy for simplifying complex texts, which depends principally on a sequence-to-sequence Transformer-based architecture (which we call TSimAr). For evaluation, we created a new benchmarking corpus for Arabic text simplification (called ATSC) containing 500 articles together with their corresponding simplifications. Our automatic and manual analyses show that TSimAr outperforms all the publicly accessible state-of-the-art text-to-text generation models for the Arabic language, achieving the best scores on the SARI, BLEU, and METEOR metrics of about 0.73, 0.65, and 0.68, respectively.


I. INTRODUCTION
Texts with complex linguistic structures often make it difficult to interpret and understand the intended meanings, particularly the meanings between the lines. This difficulty is encountered not only by human readers but also by intelligent applications that require text comprehension at some point.
Consequently, text simplification methodologies have come to help various readers (mainly readers with low-literacy skills [1], such as children or non-native readers) as well as to boost the performance of many natural language processing (NLP) applications (e.g., automated text parsing [2], summarizations [3], and translations).
Given a linguistically complex text as input, automated text simplification (ATS) generates candidate texts as output that are uncomplicated in structure and easy to understand without losing the intended meaning [4]. Broadly, the two steps common to many ATS approaches are 1) splitting complex texts into simple sentences [5], [6] and 2) text rephrasing [7], [8], [9], [10], [11] using more straightforward common words (known in some approaches as lexical paraphrasing). In this paper, we focus on these two steps and introduce a novel text split-and-rephrase solution for Arabic ATS.
Unlike the highly supported Indo-European languages (such as English), the recent ATS literature [4], [12], [13], [14] indicates that only a few works are dedicated to supporting the Arabic language. To our knowledge, a few text splitting (e.g. [5]) and/or text-to-text rephrasing (e.g. [15], [16], [17], [18], [19]) techniques exist in the Arabic NLP literature for tasks not related to simplification. Thus, in this paper, we seek to combine these techniques to support Arabic ATS in general. To illustrate the originality of our proposal, the next section reviews these techniques in some detail and explores the existing non-Arabic split-and-rephrase models [7], [20], [10], [9]. Until then, we briefly outline the originality and main contributions of this paper as follows:
• We introduce a text split-and-rephrase strategy for simplifying complex Arabic texts, which depends principally on a sequence-to-sequence Transformer-based architecture. In the splitting part, we integrate our suggested solution with a punctuation detector for text segmentation (PDTS) built on top of a pre-trained multilingual masked-language model (mBERT). This PDTS attempts to generate the shortest set of simple independent-clause sentences from a given lengthy complex text. In the rephrasing part, we propose a modified attention-free Transformer model, based on a fast Fourier Transform (FNet-based), which rephrases the concatenated simple sentences into a more readable version. The significant (original) work introduced in this paper focuses on the latter part.
• We create a new Arabic corpus for benchmarking text simplification approaches and make it publicly available 1 . Besides, we make the details of our experimental evaluations and implementations (i.e., including codes and scripts) publicly accessible 2 to interested researchers for replicating our experiments.
The remainder of this paper unfolds as follows. First, we present the closely related work and the state-of-the-art NLP pre-trained models, including a brief review of gaps in the literature concerning the Arabic text simplification problem. Next, we introduce our text split-and-rephrase approach and discuss the subsequent experimental analysis and results. Finally, we conclude the paper and outline potential avenues for future investigation.

II. REVIEW OF RELATED LITERATURE
This section provides a synopsis of Text Simplification (TS) methods, broadly categorized into extractive and abstractive approaches. In principle, text simplification approaches (illustrated in Figure 1) are abstractly analogous to text summarization approaches, as they both strive to simplify texts. However, the difference between them is fundamental. Text summarization attempts to shorten text without losing the key meanings, while simplification attempts to improve text readability by reducing linguistic complexity under the constraint of preserving the key meanings. Before delving into TS approaches, we position the novelty of our contribution in the context of related literature by stating the following:
• to the best of the authors' knowledge, no split-and-rephrase model has yet been suggested for simplifying complex Arabic texts; and
• for text-to-text rephrasing, we explore, for the first time, the effect of the modified attention-free Transformer model [21] (i.e., one that depends on a fast Fourier Transform) for the Arabic language.

A. Extractive Approaches
The core idea of the extractive approach is to extract the main sentences of the text through text summarization. In other words, the simplification here is performed using summarization. However, summarization does not necessarily lead to simplification. Therefore, this approach is not recommended. One of the examples of text simplification through summarization is the TF-IDF [14]. Preprocessing is a normal requirement of this approach. The preprocessing includes converting text into lower-case, removing punctuation, special characters, and stop words, and stemming to return complex words to their language base.

B. Abstractive Approaches
The extractive approach is not real simplification; it is merely summarization. In the abstractive approach, by contrast, the output is a genuinely simplified text. This approach has two main categories of simplification: Lexical Simplification (LS) and Text Generation (TG).

1) Lexical simplification:
Simplification in the LS category is performed by replacing complex/hard words with simple/easy words; hence the name lexical simplification. LS algorithms work at the sentence level. The structure of sentences is not changed, and the grammar is not simplified; only word replacement is involved. As such, this type of simplification is not effective enough. The hard words may be replaced with easy ones, but the sentence structure and grammar may still be hard to understand. Examples of lexical simplification are the following.
• Rule-based LS
• Parallel corpora extracted-rules LS
• Word embedding LS
• Pre-trained language models LS
Rule-based LS [22], [23] depends on a linguistic database, such as WordNet, to get the simplest synonym of a word, which can be chosen based on its frequency or its length. In parallel corpora extracted-rules LS [24], [25], [26], the rules are extracted automatically from a parallel aligned corpus. Word embedding LS [27], [28], [29] has the advantage that there is no need for lexical resources. The appearance of pre-trained models has greatly benefited all NLP tasks, including text simplification. Some systems use pre-trained models like BERT to find and generate easy words for complex ones. The class of pre-trained language model LS systems [30], [31] has proved its effectiveness compared to other techniques.
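The rule-based idea described above (pick the simplest synonym by frequency or length) can be sketched in a few lines. This is only an illustration under stated assumptions: the synonym table and the frequency counts below are toy values invented for the example, and the tie-breaking rule (prefer the shorter word) is one possible choice, not the method of [22] or [23].

```python
def simplest_synonym(word, synonyms, frequency):
    """Pick the most frequent candidate; ties go to the shorter word (assumed rule)."""
    candidates = [word] + list(synonyms.get(word, []))
    return max(candidates, key=lambda w: (frequency.get(w, 0), -len(w)))


def simplify_sentence(sentence, synonyms, frequency):
    # Word-by-word replacement only: sentence structure and grammar stay untouched,
    # which is the limitation of lexical simplification noted in the text.
    return " ".join(simplest_synonym(w, synonyms, frequency) for w in sentence.split())


# Toy resources (hypothetical values, for illustration only).
SYNONYMS = {"commence": ["begin", "start"], "utilize": ["use"]}
FREQUENCY = {"commence": 3, "begin": 40, "start": 55, "utilize": 5, "use": 90}

print(simplify_sentence("we commence to utilize tools", SYNONYMS, FREQUENCY))
# we start to use tools
```

Words without an entry in the synonym table are left unchanged, so the output has exactly as many words as the input.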
2) Text generation: The second category of the abstractive approach is Text Generation (TG). In TG, a new simplified text is generated. The new text may have a different structure or a different number of sentences. TG includes sentence splitting, text addition, and deletion. Only TG can be considered true simplification; the previous methods are good attempts but not real simplification, because real simplification means digesting the original text and generating a new simplified one with new simple words, structures, and grammar. Recent text generation-based TS is data-driven, taking advantage of the complex structures in data. The text generation techniques can be classified as follows:
• Syntactic simplification
• Machine translation
• Deep learning techniques
In the next paragraphs, we explain each approach.
a) Syntactic simplification: In syntactic simplification, the hard/complex words are replaced by easy/simple ones, and the grammatically complex sentences are identified and rewritten as simple sentences. The process of simplification includes splitting long sentences, changing passive sentences to active ones, and resolving ambiguities. Examples of syntactic simplification research are given in [32], [33].
In [34], the authors proposed a model called TriS. Their approach breaks down a long sentence into several shorter ones. A sentence is called simple when it is written in subject-verb-object (SVO) order. For the evaluation, a dataset of 854 sentences taken from the New York Times and Wikipedia was simplified manually. The authors assess 100 unseen sentences and compare their results to Heilman and Smith's rule-based method [35]. Their approach obtains higher ROUGE [36] and Flesch-Kincaid Grade Level scores. Another method is grammar induction, where simplification is treated as a tree-to-tree conversion problem, with tree1 for the source and tree2 for the target. The process includes extracting tree transformation rules from a corpus and then learning how to select the adequate rule(s) to apply when simplifying unseen sentences. In [37], the authors modeled syntactic and lexical simplification using tree transduction rules. Their proposal was evaluated on the Simple Wikipedia corpus and showed good results. The authors highlighted the need for a mechanism to eliminate useless transformation rules.
b) Machine translation: Machine translation (MT) is the process of translating a text written in language A (source) to language B (target), where the two languages are different. Due to the massive amount of data available nowadays, MT has achieved many success stories. MT is applied successfully to TS by considering language B (target) to be a simplified version of language A (source), where both represent the same language. The simplification problem can thus be seen as monolingual text-to-text generation, or monolingual translation. Some research applied phrase-based statistical MT (SMT) to TS. The task is simpler than ordinary MT because the source and target languages are the same; only the target is simpler than the source. Examples of SMT used for TS include [38], [39], [40], [41], [42].
c) Deep learning techniques: In the era of big data, powerful computers, and GPUs, deep learning (DL) took the lead in AI, especially data-driven AI. Deep learning proved to be effective when used with SMT, where an RNN Encoder-Decoder is used in MT [43]. This motivated researchers to employ DL in TS using the monolingual translation approach. In [44], the researchers successfully used RNN-based Neural Machine Translation (NMT) for TS; other authors in [45] used an LSTM Encoder-Decoder in the simplification process. The authors in [46] developed a model called R-PBMT, a Phrase-Based Machine Translation model augmented with a re-ranking heuristic based on dissimilarity. The model is trained and tested using the PWKP dataset; they compare their work with three models (word-substitution baseline models) that replace a word in a sentence with synonyms retrieved from WordNet. In another study [47], the authors performed four rewriting operations (replacing, splitting, reordering, and deletion); their work also depends on DL. In [48], the authors proposed a system based on quasi-synchronous grammar. Results showed the general superiority of their model under both human evaluation and automatic evaluation using the BLEU and Flesch-Kincaid grade level metrics. There exist many other DL-based approaches for TS. These approaches include graph-based approaches [49], reinforcement learning-based TS [50], NMT [51], combining semantic structure and NMT [52], phrase-based unsupervised TS [53], unsupervised neural TS [54], and split-and-rephrasing techniques [8], [9], [7], [10], [20]. In this paper, we conceptually consider the latter technique (i.e., the split-and-rephrasing technique) for the Arabic language for the first time. In addition, Table I summarizes the recent existing approaches, which indicates the originality of our proposal.

III. METHODOLOGY
Figure 2 presents an overview of our proposed TSimAr in five steps with illustrative input-output examples. In principle, we integrate a punctuation detector for text segmentation (PDTS [5], see step 3) with a modified attention-free Transformer architecture for rephrasing and simplifying complex Arabic text (step 4). The former (i.e., PDTS) attempts to split a given input text into the shortest set of simple independent-clause sentences. The latter (i.e., the focus of this paper) aims to rephrase the concatenated simple sentences to generate a more readable version. For example, given a textual document $X$ containing complex sentences, TSimAr attempts to break it down into uncomplicated, rephrased sentences $Y$, such that $Y \leftarrow \mathrm{TSimAr}(X)$ and $Y = y_1, y_2, \cdots, y_n$, where $n$ is the number of generated simple sentences.
As with most NLP applications, TSimAr starts with straightforward text preprocessing (see step 2), which includes noise and diacritic removal and soft normalization. This preprocessing step cleans the input texts without breaking sentence structures and, more importantly, largely preserves the overall meaning. For implementing this step, we consider two Python-based toolkits: NLTK 3 and CAMeL 4 . Once the preprocessing step is performed, TSimAr segments and then rephrases the cleaned input text, as shown in steps 3 and 4 of Figure 2. In the following subsections, we focus on these two steps.
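To make the preprocessing step more concrete, the following is a minimal sketch using only Python's standard `re` module rather than NLTK or CAMeL. The exact cleaning rules are not spelled out in the text, so the choices below (stripping Arabic diacritics and the tatweel character, and unifying the hamza-carrying alef variants as a form of soft normalization) are assumptions made for illustration.

```python
import re

# Arabic diacritics (tanwin, short vowels, shadda, sukun) plus the dagger alef.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character, treated here as noise


def preprocess(text):
    """Hypothetical cleaning routine: removes diacritics and noise, keeps word order."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    # Soft normalization (assumed): map alef with madda/hamza to bare alef.
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)
    # Collapse repeated whitespace without touching punctuation or sentence structure.
    return re.sub(r"\s+", " ", text).strip()


# A fully vocalized word ("Muhammad") loses its diacritics but keeps its letters.
print(preprocess("\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"))
```

Because only combining marks and a few letter variants are touched, punctuation and clause boundaries survive, which matters for the segmentation step that follows.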

A. Text Segmentation
We build our proposed TSimAr on top of PDTS [5] (i.e., an Arabic text splitting tool that employs a pre-trained multilingual BERT [55] model for detecting missing punctuation) to segment input texts into a set of potentially independent clauses. In more detail, PDTS queries mBERT (footnote 5) $|X|$ times to predict proper punctuation marks between words, which can then be used as text split delimiters. Here, $pu^m_i$ represents the valid mBERT output for the white-space at index $i$; $t^m_i$ is the actual mBERT output; $pun$ and $\theta$ are model parameters set by the user to filter $t^m_i$; and $X'_i$ is the input $X$ with the [MASK] token inserted at index $i$. PDTS then validates the predicted set of punctuation marks $pu^m_i$, $\forall i \in \{1, 2, \cdots, |X|\}$, using four generic linguistic rules in a greedy-like strategy. In this paper, we have set $pun$ to only the main splitting punctuation marks: full stop, comma, semicolon, and colon.
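The masking-and-filtering loop described above can be sketched as follows. This is a simplified illustration, not the PDTS implementation: `predict_masked` is a hypothetical stand-in for an mBERT fill-mask query, the interpretation of $\theta$ as a minimum probability is an assumption, and the four linguistic validation rules of PDTS are not modeled.

```python
def detect_split_points(words, predict_masked, pun, theta):
    """Return {gap index: punctuation} for gaps whose prediction passes the filter.

    predict_masked(tokens, mask_index) is a hypothetical callable that returns
    (token, probability) for the masked slot, standing in for an mBERT query.
    """
    accepted = {}
    for i in range(1, len(words)):           # one query per white-space gap
        masked = words[:i] + ["[MASK]"] + words[i:]
        token, prob = predict_masked(masked, i)
        if token in pun and prob >= theta:   # keep only allowed, confident punctuation
            accepted[i] = token
    return accepted


def split_text(words, accepted):
    """Cut the word sequence at every accepted gap index."""
    segments, start = [], 0
    for i in sorted(accepted):
        segments.append(" ".join(words[start:i]))
        start = i
    segments.append(" ".join(words[start:]))
    return segments


def toy_predictor(tokens, mask_index):
    # Toy model: confident about a full stop right before the word "and".
    return (".", 0.9) if tokens[mask_index + 1] == "and" else ("the", 0.5)


words = "happiness is a lofty aspiration and everyone loves it".split()
points = detect_split_points(words, toy_predictor, pun={".", ",", ";", ":"}, theta=0.6)
print(split_text(words, points))
# ['happiness is a lofty aspiration', 'and everyone loves it']
```

Passing the predictor as a parameter keeps the sketch runnable without downloading a model; in practice the callable would wrap a masked-language-model query.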

B. Rephrase Generation
Motivated by the Transformer-based encoder-decoder architecture [56], which has achieved outstanding improvements in complicated NLP tasks, we consider one of its optimized sequence-to-sequence models. In particular, we utilize FNet, an efficient variant that substitutes the complex self-attention layers with linear Fourier Transform-based layers, introduced originally in [21]. FNet is lighter and much faster than the standard Transformer model with its complicated attention layers, and it closely matches the performance of the standard Transformer model. The right part of Figure 2 describes the core part of the standard Transformer architecture as modified by adding fast Fourier Transform layers. Broadly speaking, Transformer blocks are stacked $N\times$, where each block consists of two residual gateless layers that add an additional weight matrix with a skip connection (Eq. (3), in which $\epsilon$ is the regularization parameter). In the standard Transformer architecture, multi-head attention (i.e., the concatenation of a number of self-attention layers) learns the structural and morphological correlations between different input tokens impressively (Eq. (4), defined over query, key, and value vectors, with $\sigma$ the softmax activation function). Nevertheless, it is memory intensive and has quadratic time complexity with respect to the length of the input sequence [56]. Thus, to avoid such scalability issues, we employ fast Fourier Transform layers as an alternative to the attention sublayers (Eq. (5), with indices $i, j = 0, \cdots, N - 1$).
Footnote 5: https://github.com/google-research/bert/blob/master/multilingual.md
Recent experiments [21] demonstrated that the Fourier Transform-based model (so-called FNet) can significantly reduce training time and space complexity while providing performance that is closely comparable to that of the standard Transformer-based encoder-decoder model. The architecture of our implemented FNet sequence-to-sequence model for rephrasing Arabic texts is presented in Figure 3.
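As a rough illustration of the token-mixing idea in FNet [21], the sketch below applies a discrete Fourier transform along the hidden dimension and then along the sequence dimension, and keeps only the real part. It is a pure-Python toy under stated assumptions: the function names are hypothetical, the naive transform is written for readability (a fast Fourier transform computes the same values in $O(N \log N)$), and the residual connections, normalization, and feed-forward sublayers of the full block are omitted. It is not the implementation evaluated in this paper.

```python
import cmath


def dft(vector):
    """Naive discrete Fourier transform: X_k = sum_n x_n * exp(-2*pi*i*n*k/N)."""
    n_points = len(vector)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * n * k / n_points) for n, x in enumerate(vector))
        for k in range(n_points)
    ]


def fourier_mixing(x):
    """x: list of token vectors (seq_len x hidden). Returns the real part of the 2D DFT."""
    along_hidden = [dft(row) for row in x]                     # transform each token vector
    columns = [dft(list(col)) for col in zip(*along_hidden)]   # then transform across tokens
    mixed = [list(row) for row in zip(*columns)]               # back to seq_len x hidden
    return [[value.real for value in row] for row in mixed]


# Two tokens with two hidden features each (toy input).
mixed = fourier_mixing([[1.0, 2.0], [3.0, 4.0]])
print([[round(v, 6) for v in row] for row in mixed])
```

The sublayer has no trainable parameters: every output position is a fixed linear combination of all input positions, which is how it mixes information across tokens without computing attention weights.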

IV. EXPERIMENTS AND RESULTS
We conduct experiments to assess the performance of TSimAr and compare its rephrasing part with the existing Arabic pre-trained text-to-text generation models. We introduce ATSC (a new Arabic corpus for text simplification) and use it in our evaluation protocols. As a quality benchmark of generated simplifications, we report various automatic text matching metrics, including SARI (the primary metric for text simplification and rephrasing), besides presenting the findings from the conducted manual (human-based) assessment.
Fig. 2. Overview of the proposed TSimAr. Steps shown include (2) pre-processing (normalizing), applying linguistic rules to identify potential independent clauses, and text rephrasing using Seq2Seq with Fourier Transform layers.
Input (a complex sentence): Happiness is the highest aspiration of human beings and it's their strongest desire and each one of us loves it yearns for it and strives to obtain it to always seek it for ourselves.
Output (a set of simple rephrased sentences):
• Happiness is a lofty aspiration for human beings.
• Everyone loves happiness and yearns for it.
• Everyone works hard to find happiness.
A. Corpus and Experimental Setup
a) Arabic text simplification corpus (ATSC): To the best of our knowledge, there is no specific Arabic corpus for text simplification. Thus, we create a small benchmark corpus containing 500 pairs of a small-to-large complex text (source) and a gold-standard simplified (i.e., split and rephrased) text. The gold-standard reference simplifications are written and carefully reviewed by human experts. In a little more detail, our corpus ATSC has been constructed from selected Arabic articles that contain appropriate text to simplify. We collected these articles from different public sources (i.e., Wikipedia, newspapers, and news agencies), which cover various domains, including history, geography, health, education, and technology. To construct the simplified versions of the collected articles (i.e., to form the gold-standard human-based references), we have applied two simplification methods: syntactic simplification (i.e., just an extractive text summarization method that drops/selects the key sub-sentences without generating new words) and linguistic simplification (i.e., almost similar to the abstractive summarization method, which attempts to replace complicated words with conceivably simpler synonyms), depending on their contexts and overall meaning. Table II shows some general statistical descriptions of our ATSC.
b) Baselines: We compare the text rephrasing part of our TSimAr (i.e., the FNet model) against the state-of-the-art pre-trained Arabic monolingual (Arabic-T5-small [16], Arabic-T5 [15], UBC-AraT5 [17]) and general multilingual (MT5-base [19], mBART-large-50 [18]) models for text generation tasks. These text-to-text generation models are architecturally extended from T5 encoder-decoder Transformer blocks [57], except mBART, which is a multilingual sequence-to-sequence model generally used for translation tasks.
c) Automatic metrics: Given a source text $st$ and a gold-reference $gr$ (i.e., a typical simplification version written by human experts), we evaluate the quality of the produced simplification $y$ (i.e., $y \leftarrow \mathrm{TSimAr}(st)$) using a variety of automatic metrics as follows:
• SARI [58] (System output Against References and against the Input sentence) is a standard evaluation metric for text simplification, which compares the generated candidate simplification $y$ against both (1) the source input $st$ and (2) the gold-reference $gr$. It uses precision and F1 scores of n-grams ($n \in \{1, 2, 3, 4\}$) to measure the goodness of the tokens added, deleted, and preserved by the simplifier model (i.e., TSimAr).
• BLEU [59] (Bilingual Evaluation Understudy) is a popular evaluation metric for text quality, commonly used in machine translation tasks. It compares $y$ against $gr$ only and approximates recall and precision using the best match (n-gram) length and modified n-gram precision, respectively.
• METEOR [60] (Metric for Evaluation of Translation with Explicit Ordering) is similar to BLEU but replaces the best match (n-gram) length and modified n-gram precision with a weighted F-score that depends on unigram mapping.
• TER [61] (Translation Edit Rate) estimates the number of edits required (e.g., adding, deleting, or shifting a word token) to make $y$ match $gr$.
• ROUGE [36] (Recall-Oriented Understudy for Gisting Evaluation) gives different ROUGE-n metrics, where n is the length of the n-grams whose overlap between $y$ and $gr$ is counted. It uses the standard statistical metrics (precision, recall, and F-measure) for its measurements. In our experiments, we consider ROUGE-1 (unigram overlap), ROUGE-2 (bigram overlap), and ROUGE-L (the longest common subsequence between $y$ and $gr$).
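Of the metrics listed above, ROUGE is the simplest to compute by hand, so it serves as a small worked example. The sketch below assumes whitespace tokenization and a single reference; it is a simplified illustration, not the scoring tool used in our experiments, and the function names are hypothetical.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Precision, recall, and F1 over overlapping n-grams of whitespace tokens."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())          # clipped n-gram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def lcs_length(a, b):
    """Length of the longest common subsequence, the quantity behind ROUGE-L."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if x == y else max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


p, r, f = rouge_n("everyone loves happiness", "everyone loves happiness and yearns for it")
print(p, round(r, 3), round(f, 3))
# 1.0 0.429 0.6
```

In this toy case every candidate unigram appears in the reference (precision 1.0), but the candidate covers only three of the seven reference tokens, which lowers recall and therefore the F-measure.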
For SARI, BLEU, METEOR, and ROUGE, higher scores indicate better quality as correlated with human judgments. In contrast, a lower TER (i.e., a lower edit rate) indicates better performance.
d) Implementation details: To train and configure the text rephrasing part of our TSimAr (presented in Figure 3), we applied a 50-20-30 random split to our ATSC corpus to create the train, dev, and test sets, respectively. We used the Adam optimization algorithm for training with a learning rate of 0.001. The training loop lasts 5k epochs with a batch size of 64 and a maximum sequence length of 256. Moreover, the text-to-text generation models considered in this paper as baselines are publicly available at Hugging Face 6 , under the model (card) names: 'google/mt5-base', 'facebook/mbart-large-50', 'flax-community/arabic-t5-small', 'UBC-NLP/AraT5-base-title-generation', and 'malmarjeh/t5-arabic-text-summarization'. We have constructed these models using the PyTorch 7 framework, besides utilizing some NLP toolkits for text preprocessing, including NLTK 8 and CAMeL 9 . All experiments have been conducted on a gaming PC equipped with an Intel i9 CPU, 64 GB of RAM, and a single NVIDIA GeForce RTX 3070 GPU.
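A 50-20-30 random split of a 500-pair corpus can be sketched as below. The seed, the function name, and the placeholder pairs are assumptions made for the example; the text does not state how the split was seeded or implemented.

```python
import random


def split_corpus(pairs, ratios=(0.5, 0.2, 0.3), seed=0):
    """Shuffle a copy of the pairs and cut it into train/dev/test by the given ratios."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)   # fixed seed (assumed) for reproducibility
    n_train = round(len(shuffled) * ratios[0])
    n_dev = round(len(shuffled) * ratios[1])
    # The test set takes whatever remains, so no pair is dropped by rounding.
    return shuffled[:n_train], shuffled[n_train:n_train + n_dev], shuffled[n_train + n_dev:]


# Placeholder (complex, simple) pairs standing in for the 500 ATSC pairs.
pairs = [(f"complex_{i}", f"simple_{i}") for i in range(500)]
train_set, dev_set, test_set = split_corpus(pairs)
print(len(train_set), len(dev_set), len(test_set))
# 250 100 150
```

With 500 pairs, these ratios leave 250 pairs for training, 100 for development, and 150 for testing.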

B. Performance Evaluation
In Table III, we show the performance of our TSimAr with FNet against different text rephrasing models (i.e., depending on text-to-text generation models) using the validation portion of ATSC. Performance results are also visualized in Figure 4. In addition, we break down the performance details and simplification quality for one input instance in Table IV. As can be observed, TSimAr outperforms all the existing state-of-the-art text-to-text generation models for the Arabic language. It achieves the best score on most standard metrics (particularly SARI) and the second-best score on TER and ROUGE-1. The last column on the right of Table III shows the execution time in seconds, visualized in Figure 5. Here, our TSimAr gives the expected poor-to-ordinary time performance, as its FNet architecture is quite heavy (consisting of more than 11M trainable parameters).
Table IV gives insight into the text simplifications produced by the competitors' models.
TABLE III. AUTOMATIC EVALUATION RESULTS. THE BEST PERFORMANCE FOUND IS INDICATED BY THE ASTERISK*. Columns: SARI, BLEU, TER, METEOR, ROUGE.
Input: Dealing with increasingly complex health needs calls for a multisectoral approach in which health promotion and prevention policies are combined, with community-based solutions and people-centred health services. Primary health care also includes the essential elements needed to improve health security and stave off health threats such as epidemics and antimicrobial resistance, through measures such as community participation and education, rational prescription, and a core set of essential public health functions, including surveillance.
Simplification: To deal with complex health needs we must take a multi-sectoral approach to health promotion. People-centred health services must be provided. The essential elements of primary health care are community participation, rational prescriptions, and basic functions of public health.
To clarify, we observe that mBART [18] often produces outputs almost identical to the input without simplifying or rephrasing and, as a result, misleadingly achieves a SARI score above 0.4. In contrast, UBC-AraT5 [17] simplifies the input text with much better quality, yet it achieves only around 0.27 SARI. Accordingly, it was essential to strengthen the evaluation of our proposed TSimAr with manual insight by eliciting human judgments.

C. Manual Evaluation
To get an additional in-depth evaluation of our TSimAr, we conducted a qualitative analysis by eliciting human judgments on 36 sampled text documents selected randomly from the ATSC validation set. We invited two expert consultants in Arabic linguistics (not authors of this paper) to evaluate these documents (each expert is given 18 documents) on the following three standards using a five-point Likert scale (1-5):
• Adequacy (preservation of the source meaning),
• Contextual soundness (quality of rephrased and simplified texts), and
• Grammaticality (to what extent the generated text is free from grammatical errors).
Experts are asked to compare the simplifications generated by TSimAr (i.e., depending on Arabic-T5-small, Arabic-T5, MT5-base, mBART, UBC-AraT5, and our FNet) against the gold-standard references (i.e., text simplification versions written by human experts). A glance at Table VI shows that the results of our manual evaluation are almost compatible with the automatic evaluation results (shown in Table III) for only the first and the third standards (i.e., Adequacy and Grammaticality). Nevertheless, the Contextual soundness standard reveals the quality differences more precisely, which also confirms that our TSimAr with FNet (indicated by ‡) can produce highly competitive performance (see 4.58, the best average rating, obtained by TSimAr).
TABLE VI. HUMAN EVALUATION RESULTS FOR THE THREE CRITERIA: ADEQUACY, CONTEXTUAL SOUNDNESS, AND GRAMMATICALITY. BASE PRE-TRAINED TEXT-TO-TEXT GENERATION MODELS WITH * ARE SIGNIFICANTLY DIFFERENT FROM TSIMAR'S BASE MODEL †, DEPENDING ON A TWO-TAILED INDEPENDENT T-TEST.
Furthermore, the manual experimental results indicated that UBC-AraT5 is a feasible pre-trained text rephrasing model to adopt (i.e., an alternative model to FNet), as it achieves the second-highest average score of about 4.42. This is also reflected statistically in its insignificant p-value (i.e., the .78 obtained with UBC-AraT5). In contrast, the worst performances observed were with MT5-base and mBART, which unexpectedly gave zero simplification quality. Here, the performance of mBART contradicts its insignificant p-value (i.e., .367), as this heavy model obtains illusory high scores on the Adequacy and Grammaticality standards as a consequence of generating texts identical to the input texts with no simplification.
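For readers unfamiliar with the significance test mentioned above, the statistic behind an independent two-sample t-test can be sketched with the standard library alone. The text does not state whether the pooled-variance (Student) or unequal-variance (Welch) variant was used, so the pooled form below is an assumption, and the two rating lists are hypothetical numbers, not the ratings collected in our study.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Student's t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_stat, n_a + n_b - 2


# Hypothetical Likert ratings for two systems (illustration only).
ratings_a = [1, 2, 3, 4, 5]
ratings_b = [2, 3, 4, 5, 6]
t_stat, dof = independent_t(ratings_a, ratings_b)
print(round(t_stat, 6), dof)
# -1.0 8
```

Turning the statistic into a two-tailed p-value requires the cumulative distribution of Student's t, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind`) computes both in one call.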

D. Discussion and Potential Threats to Validity
In this section, we discuss the potential threats to the empirical validity of the proposed TSimAr. The main threats may include the creation of our corpus (ATSC) for evaluation as well as the benchmarking against the state-of-the-art text rephrasing models. As mentioned earlier in this section, there is no specific Arabic corpus for text simplification available to date. Therefore, we had to make an effort to (1) collect a professionally written corpus from online sources (mainly from newspaper articles) and (2) have it simplified precisely (i.e., split and rephrased) by linguistic experts in Arabic, as elaborated for ATSC. One may argue that ATSC is relatively small (containing only 500 pairs of texts) and, more importantly, that it may be insufficient to train a heavy model like FNet. These concerns are valid to a reasonable extent. However, recent studies demonstrated that training a language understanding model on a larger corpus/dataset might not necessarily improve its performance [62], [63]. Besides, our intention here is not to use ATSC to train a language understanding model but rather to use it as a benchmarking corpus for testing the generalization of a pre-trained text-to-text generation model. We make our ATSC available for researchers to exploit in this regard.
Concerning the chosen text rephrasing baseline models, we attempted to counter this concern by using all publicly available Arabic monolingual sequence-to-sequence models (we have encountered only Arabic-T5, Arabic-T5-small, and UBC-AraT5) as well as the state-of-the-art multilingual models (i.e., MT5 and mBART). For a fair comparison between these pre-trained models, we confirmed that their large vocabularies contain all the 6737 distinct words extracted from ATSC.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 14, No. 2, 2023, www.ijacsa.thesai.org

V. CONCLUSION
Towards breaking down a given complex Arabic text into a simple and meaning-preserving version, we have presented a text split-and-rephrase solution (so-called TSimAr), which depends principally on a sequence-to-sequence Transformer-based architecture. For the splitting, we have integrated TSimAr with a punctuation detector for text segmentation (PDTS) built on top of a pre-trained multilingual masked-language model (mBERT). This PDTS attempts to generate the shortest set of simple independent-clause sentences from a given lengthy complex text. For the rephrasing phase, we have proposed an attention-free Transformer model, based on a fast Fourier Transform (FNet-based), which rephrases the concatenated simple sentences into a more readable version.
In addition, we have created a new corpus (ATSC) to train and evaluate the rephrasing part of our TSimAr. Automated and manual analyses demonstrated that, with the support of PDTS, our TSimAr outperforms all the existing state-of-the-art text-to-text generation models for the Arabic language, as it achieved the best score on the SARI, BLEU, and METEOR metrics. Nevertheless, a minor limitation of TSimAr lies in its execution time compared with competitors' lighter models, such as Arabic-T5-small. Hence, for generality, we envision extending this ongoing work in two directions:
• (1) evaluating TSimAr on a comprehensive benchmarking dataset that we plan to create, and
• (2) optimizing our FNet architecture to enhance its execution performance.
For the latter direction, we will investigate the feasibility of applying a knowledge distillation technique to compress our FNet into a smaller version to help us reduce its space complexity while achieving higher inference speed and accuracy.