Urbanization Change Analysis based on SVM and RF Machine Learning Algorithms

To support sustainable development, this study measures the yearly rate of land change from classified Land Cover maps, whose data are regarded as an influential factor in environmental management and urbanization. The measured change rate helps city management define new policies and implement the most suitable ones to preserve natural resources. Machine learning algorithms are used to produce Land Cover maps on the cloud-based Google Earth Engine (GEE) platform from LANDSAT8 satellite imagery. The Random Forest (RF) and Support Vector Machine (SVM) algorithms are used for classification. The investigation found that the SVM classifier achieved better overall accuracy and a higher Kappa coefficient than the RF classifier when both were trained on the same samples.

Keywords—Random Forest (RF); Support Vector Machine (SVM); GEE; classification; machine learning classifier; multitemporal change analysis; urban change analysis; LANDSAT8; Kappa coefficient


I. INTRODUCTION
Urbanization is a global phenomenon, with different nations experiencing different rates and patterns of rural-urban migration. The world is becoming increasingly urbanized, with a growing share of people having moved to large cities during the past decades. Almost everyone is involved in this process in one way or another. Until mid-century, urban growth was mostly limited to developed nations, but it has now spread to developing nations, practically all of which are experiencing urbanization. Ongoing research suggests that the association between urbanization and development is not automatic. Urbanization is usually seen as closely related to economic growth, particularly in advanced countries. It is estimated that urban areas generate over 80% of global GDP. Large-scale urbanization is changing traditional structures and prompting a significant upheaval in society. The growth of urbanization in Pakistan mirrors the desires and ambitions of its new urbanites. It is a recurring process that a nation experiences as it advances from a rural to a modern society. People migrate from economically constrained areas to places that offer better opportunities [1].
Pakistan has an average level of urbanization within the Asia-Pacific region, in terms of both the growth of urbanization and urban development. The country has an enormous housing shortage of around ten million units, and the gap is increasing. Urban population growth has not been matched by growth in housing units or by equitable access to land, which has caused housing shortages and the spread of slums. Given Pakistan's vulnerability to earthquakes and other natural hazards, the government should set up a practical framework for building regulations and their enforcement to avoid the potential adverse effects of vertical housing schemes. The current approach to managing urban housing has faced multiple challenges. Among the South Asian nations, Pakistan has the highest share of its population living in urban areas. The urbanized percentage of Pakistan's population grew by 4.1 percent in the five years between 2005 and 2010 and is expected to grow further in the coming years [2]. According to the UN, a fraction of the population will be living in urban areas by the year 2030. Altogether, cities in Pakistan contribute fifty-five percent of the GDP. Furthermore, Pakistan generates 95% of its government revenue from 10 major urban areas [2].
Multan district lies in the southern part of the Punjab province, Pakistan. Multan is also known as the City of Saints and is located on the banks of the Chenab River at 30.1575° N latitude and 71.5249° E longitude. It covers a total area of around 3,721 sq. km and is the seventh-largest city in Pakistan. Urbanization is the transformation of any land into urban land, and globally the trend of urbanization is observed to be increasing rapidly. Measuring this change requires a top-down view of the Earth's surface, such as satellite images. Numerous sensors capture tremendous volumes of high-resolution satellite images containing diverse information that is updated daily. Remote sensing is the process of acquiring information about the Earth by scanning it with the assistance of satellites. It is performed to obtain precise and clear images of the Earth. These high-resolution remote sensing images are free of cost, which gives researchers and analysts an excellent opportunity to achieve better analysis results in a specific region of interest (Lopez, R. D., & Frohn, R. C. (2017)). If these images are utilized appropriately, they are highly beneficial for investigating the surface of the Earth [1]. By comparison, remote sensing imagery differs from standard imagery, mainly because satellite imagery contains additional spectral information that is not visible to the unaided eye [3]. On high-resolution satellite images, Land Use Land Cover (LULC) change analysis is one of the major applications, because it reveals the yearly trend of urbanization change, which helps policymakers make the correct choices and understand the impacts of these changes on people and their environments.
Land Use and Land Cover (LULC) analysis has a strong influence on the future of the land, and it depends on the classification of each feature class [4]. Land Cover Change Analysis monitors the loss or gain of any land type over a time range and measures the extent of change over that period. Yu et al. (2016) noted that land use has frequently affected land cover: the surface of the land changes in appearance when observed at two different points in a time series [5]. Furthermore, Huang et al. (2017) mentioned that Google Earth Engine (GEE) is a reliable and powerful platform created by Google, Carnegie Mellon University, and the US Geological Survey. It is freely accessible and provides an advanced cloud computing environment for remote sensing data processing and geospatial analysis, with no need for fast CPUs or GPUs for computationally heavy work, only a fast internet connection [6].
The paper is organized into sections. The problem statement describes the land loss due to urbanization. The literature review then covers machine learning approaches and, later in the same section, satellite image collection and analysis, highlighting the work of different researchers. The methodology section describes the data collection methods and the SVM and RF algorithms used to obtain the results of the study, which describe the transformation of land between 2017 and 2018 and its impacts. The last section concludes the paper by summarizing the land transformation changes.

II. RESEARCH PROBLEM
Continuous urbanization may raise several concerns for the management of the city. To support the sustainable progress of the city, Land Cover Change analysis is needed. This investigation determines the influence of water bodies and vegetation land on the advance of urbanization with the help of the SVM and RF machine learning algorithms over the multi-temporal range of 2017 to 2018. The research addresses the lack of land cover change analysis for measuring the change of urbanization. Its motivation is to measure the changes in water bodies and vegetation using the two classifiers (SVM and RF) on GEE, a recent, powerful, and reliable cloud-based environment, and thereby to support the design of new policies to accommodate development.

III. LITERATURE REVIEW
Agriculture is the practice of producing and harvesting crops in a methodical manner. Because of the ever-increasing demand for food, there is a need to improve crop yield and avoid crop loss in every possible way. To do so, the scientific community seeks the optimal use of resources for crop cultivation, and remote sensing is used for this purpose because of its many advantages and features. One of the most vital applications of remote sensing is crop classification, which differentiates between crop varieties.
Varma et al. (2017) reported that satellite imaging is used for effective investigation of the temporal changes that can affect crop yield in specific areas. With satellite imaging methods, the growth of crops from sowing to harvest can be monitored very efficiently. Geo-referenced and ortho-rectified satellite images are used to identify land loss problems and the affected areas in different regions of the world; seasonal changes and crop variation can also be monitored with these techniques. Moreover, based on this information, activities such as deciding the type of crop and its acreage, determining the growth stages of the crop, and delineating the extent of the crop can be planned in advance. These methods are then used in decision-making policies to increase crop yield and avoid land loss.

Senthilnath et al. (2016) described another satellite image collection method, multi-spectral satellite imaging, which can aid the identification and classification of crops because it captures changes in reflectance as a function of the specific crop type. Crop classification is used in auditing land use, and it also facilitates the soil and water study of a particular area. Because of the variability in crop cultivation within a geographical area, classification is very challenging. Crops are classified on the basis of spatial bands, spectral bands, or a combination of both. Clustering groups a dataset in such a fashion that the data points in the same group are closely similar. Its purpose is to reduce the intra-cluster distances and maximize the inter-cluster distances. The result takes the form of optimal cluster centres. Clustering is performed on satellite data in several ways; for example, some methods partition the data using spatial patterns. Partition clustering divides the data into a fixed number of clusters (chosen a priori) using similarity indexes. K-means is one of the most commonly used clustering methods.
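As an illustration of partition clustering with a fixed number of clusters, the following minimal Python sketch runs the standard K-means (Lloyd's) iteration on a handful of pixels. The two-band reflectance values, the initial centres, and the function name `kmeans` are hypothetical and chosen only for this example; they are not taken from the reviewed studies.

```python
import math

def kmeans(points, centers, max_iterations=20):
    """Lloyd's algorithm: alternate an assignment step and a centre-update step."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centre.
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(point, centers[i]))
            clusters[nearest].append(point)
        # Update step: move each centre to the mean of its members
        # (a centre with no members is left where it is).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for old, members in zip(centers, clusters)
        ]
        if updated == centers:  # converged: no centre moved
            break
        centers = updated
    return centers, clusters

# Hypothetical two-band reflectance values for six pixels (illustrative only).
pixels = [(0.10, 0.40), (0.12, 0.42), (0.11, 0.38),
          (0.30, 0.10), (0.32, 0.12), (0.28, 0.09)]
initial_centers = [(0.0, 0.5), (0.5, 0.0)]  # number of clusters fixed a priori (k = 2)

final_centers, clusters = kmeans(pixels, initial_centers)
print(final_centers)
```

Each update step moves a centre to the mean of its members, which is what reduces the intra-cluster distances; the number of clusters stays fixed at the value chosen beforehand.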
Machine learning is one of the most reliable classification methods. In machine learning, the classifier determines the category to which a new sample belongs based on labelled training data. This makes the technique particularly suitable for classifying remote sensing images, because it is impossible to have detailed knowledge of the entire study area. Satellite images are broadly utilized for land cover change investigation, and managers and policy designers usually need a reliable technique to obtain the result of a land cover change by reviewing and recognizing changes of land types (Karpatne et al., 2016).
The research output portrays the different land features that are gained and lost on the surface of the Earth. GEE provides precise and dependable research data cost-effectively and takes less time than other tools that can be used for land cover change analysis. Mueller et al. (2017) reported that Landsat satellite imagery is valuable for various purposes, for example, measuring climate change and environment-related disasters and managing natural resources. Land cover change analysis falls under natural resources management, with maps produced by a machine learning classifier, a threshold method, or a hybrid approach. Landsat data are most commonly used in applications oriented toward natural phenomena.
Borra et al. (2019) reported that satellite image classification groups the pixel values of an image into feature classes, and that various strategies are available [7] to create the map. These maps can capture various features or attributes to suit the intended task. The strategies include supervised and unsupervised schemes, and estimating the accuracy of remote sensing information is an essential requirement in classification. The Support Vector Machine classifier is one of the most mainstream and widely acknowledged classifiers in the domain of remote sensing, primarily because of its highly accurate classification results. It is a binary classifier based on the concept that the training samples closest to the boundaries of a class differentiate the class better than other training samples [8]. The SVM classifier therefore focuses on finding the optimal hyper-plane that separates the training samples into classes; the training samples that lie close to the class boundaries, at the smallest distance to the hyper-plane, are taken as support vectors and are used for the actual training [12]. Maxwell, A. E. (2018) noted that the selection of the kernel plays a significant role in the classification results. The RBF kernel has a user-defined parameter that controls the influence of a training sample on the decision boundary; if its value is too high, there is a chance of over-fitting, so it is necessary to choose a balanced value [9]. Tehrany et al. (2015) implemented Support Vector Machine (SVM) algorithms in different fields of study, such as flood susceptibility assessment and landslide susceptibility investigation.
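The effect of the RBF kernel's user-defined parameter can be illustrated with a short Python sketch. It assumes the common formulation K(x, y) = exp(-gamma * ||x - y||^2), where the name `gamma` and the sample pixel values are illustrative assumptions rather than details given in the reviewed work.

```python
import math

def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Two hypothetical pixels described by three band values each (illustrative only).
pixel_a = (0.10, 0.40, 0.25)
pixel_b = (0.30, 0.10, 0.20)

# Kernel similarity between the same two pixels for increasing gamma.
gammas = (0.1, 1.0, 10.0, 100.0)
similarities = [rbf_kernel(pixel_a, pixel_b, g) for g in gammas]
for g, s in zip(gammas, similarities):
    print(f"gamma={g:>6}: K = {s:.6f}")
```

As gamma grows, the similarity between distinct pixels falls toward zero, so each training sample influences only its immediate neighbourhood. The decision boundary can then wrap tightly around individual samples, which is the over-fitting risk described above.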
Genetic algorithms are acknowledged as among the most advanced and widespread heuristic search models in the field of artificial intelligence, with applications in urban planning, ecological studies, climate modelling, and remote sensing. Kruber et al. (2019) also evaluated the performance of the RF and SVM machine learning algorithms, compared their results with each other, and mentioned that RF and SVM serve as mapping algorithms for groundwater outbreaks and can provide very effective results.

IV. AREA OF STUDY
Multan district lies in the southern part of the Punjab province, Pakistan. Multan is also known as the City of Saints and is located on the banks of the Chenab River at 30.1575° N latitude and 71.5249° E longitude. It covers a total area of around 3,721 sq. km and is the seventh-largest city in Pakistan. Fig. 1 indicates the location of the Multan District within the boundaries of Pakistan and also shows the study area selected for the research.

V. DATA PREPROCESSING
The satellite imagery used in the research is Landsat 8, Collection 1, Tier 1; the ID of this dataset is LANDSAT/LC08/C01/T1. The quality of this dataset is high, and it is considered well suited to multi-temporal analysis. The dataset has 12 bands in total, of which only 6 were selected for this research: B2, B3, B4, B5, B6, and B7. The area bound was set according to the above-mentioned coordinates within the district of Multan. After filtering by the study area, the collection contains 43 images for 2017 and 45 images for 2018, 88 images in total. From these, a single image is derived for each year, one for 2017 and one for 2018: the images are converted to TOA reflectance, a simple cloud score is calculated, and finally the median is applied to the least cloudy values, as shown in Fig. 2.

VI. METHODOLOGY
The methodology adopted to obtain the yearly trend of urbanization is shown in Fig. 3. Image collections are taken from LANDSAT8: after filtering by the study area boundary, the 2017 collection contains 43 satellite images and the 2018 collection contains 45. A composite algorithm is then applied, which is very useful for combining a collection of images into a single image. The composite algorithm computes the median of the Landsat image collection, so that the resulting single image is cloud-free. It is applied to both the 2017 and 2018 image collections. Training samples are selected using the geometry tools of GEE, such as polygons and points, and two different classifiers, SVM and RF, are used to produce the classified maps, as shown in Fig. 3. The accuracy and Kappa coefficient of both classifiers are then estimated, and the area of each land type in square kilometers is computed for 2017 and 2018. Finally, the change of other land types into urban land over time is analyzed using the two classifiers.
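The median compositing step can be sketched in plain Python as follows. This is only a toy illustration of a per-pixel median over a stack of images, not the GEE implementation used in the study; the 2 x 2 single-band scenes and the function name `median_composite` are hypothetical.

```python
from statistics import median

def median_composite(images):
    """Per-pixel median across a stack of equally sized single-band images."""
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [median(image[r][c] for image in images) for c in range(cols)]
        for r in range(rows)
    ]

# Three hypothetical 2 x 2 reflectance scenes of the same area (illustrative only).
# The second scene has one very bright pixel, standing in for a cloud.
scenes = [
    [[0.10, 0.20], [0.30, 0.40]],
    [[0.12, 0.90], [0.28, 0.42]],
    [[0.11, 0.22], [0.31, 0.41]],
]

composite = median_composite(scenes)
print(composite)
```

Because the median ignores isolated extreme values, a pixel that is cloudy in only a minority of scenes does not carry through to the composite, which is why the composite of a whole year's collection tends to be cloud-free.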

VII. RESULTS AND ANALYSIS
The classified images for 2017 and 2018 are created by applying the machine learning classifiers (SVM and RF) to capture the variation in urbanization. Both the SVM and Random Forest classifiers are applied to the same training data. The results produced by the SVM classifier are more accurate than those of the Random Forest classifier. The SVM classifier produces its results with few erroneous pixels and low misclassification of urban land.

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 5, 2020, www.ijacsa.thesai.org

a) Type of Land for 2017
Each type of land is extracted from the classified results within the study area using the machine learning (SVM and RF) classifiers. The water area for 2017 obtained from the SVM and RF classifiers is shown in Fig. 8 and Fig. 9, respectively. Blue indicates the presence of water within the study area, while black indicates any other land type.
The vegetation (green) area for 2017 is presented in Fig. 10 and Fig. 11. Green indicates the presence of vegetation, while black indicates non-vegetation area within the study area. The urban area for 2017 obtained from the SVM classifier is shown in Fig. 12, and that obtained from the Random Forest classifier is shown in Fig. 13. In both, yellow indicates the presence of urban land, while black indicates any other land type.

b) Type of Land for 2018
For 2018, each type of land is likewise extracted from the classified results within the study area using the machine learning (SVM and RF) classifiers. The water area for 2018 obtained from the SVM and RF classifiers is shown in Fig. 14 and Fig. 15, respectively. Blue indicates the presence of water bodies within the study area, while black indicates any other land type. The urban area for 2018 obtained from the SVM classifier is shown in Fig. 16, and that obtained from the Random Forest classifier is shown in Fig. 17. In both, yellow indicates the presence of urban land, while black indicates any other land type. The urban area for 2018 obtained from the SVM classifier is shown in Fig. 18, and that obtained from the Random Forest classifier is shown in Fig. 19.

1) Measure Water Bodies Losses:
To calculate the losses of water bodies over the multi-temporal range from 2017 to 2018, the water body values of both years are first obtained separately from the results of both classifiers. The area of water bodies within the study area according to the SVM classification is shown in Fig. 8 for 2017 and in Fig. 14 for 2018. From both, the area of loss in water bodies is extracted, as shown in Fig. 20. The loss of water bodies based on the RF classifier's results over the same 2017 to 2018 period is shown in Fig. 21.
The area covered by water bodies obtained using the SVM classifier is 17.733197689859043 sq. km for 2017 and 10.974930358946 sq. km for 2018, within a study area of 756.5236827301217 sq. km. The calculated difference between the two shows that the water body area decreased by 6.75826733091299 sq. km according to the classified results of SVM and RF, as shown in Fig. 20 and Fig. 21. The urban land area increased by 96.33055709482205 sq. km using RF and using SVM, as shown in Fig. 30 and Fig. 31.
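The water body arithmetic above can be reproduced with a few lines of Python. The three area values are the SVM figures reported in the text; the percentage change and the shares of the study area are derived here for illustration and are not stated in the paper. The function name `area_change` is hypothetical.

```python
def area_change(area_start_km2, area_end_km2):
    """Return the absolute change (sq. km) and the change relative to the start (%)."""
    change = area_end_km2 - area_start_km2
    percent = 100.0 * change / area_start_km2
    return change, percent

# SVM-derived water body areas and study area, as reported in the text (sq. km).
WATER_2017_KM2 = 17.733197689859043
WATER_2018_KM2 = 10.974930358946
STUDY_AREA_KM2 = 756.5236827301217

change_km2, change_pct = area_change(WATER_2017_KM2, WATER_2018_KM2)

# Share of the study area covered by water in each year (derived, not from the paper).
share_2017 = 100.0 * WATER_2017_KM2 / STUDY_AREA_KM2
share_2018 = 100.0 * WATER_2018_KM2 / STUDY_AREA_KM2

print(f"change: {change_km2:.4f} sq. km ({change_pct:.2f}%)")
print(f"share of study area: {share_2017:.2f}% -> {share_2018:.2f}%")
```

A negative change denotes a loss. Expressing the loss relative to the 2017 area, or as a share of the study area, makes results comparable across land types of very different sizes.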

a) Vegetation Land into Urban Land
From the classified results of both algorithms, the vegetation land is extracted and the area of vegetation land transformed into urban land is obtained, as shown in Fig. 32 for the SVM algorithm and Fig. 33 for the RF algorithm.
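A land type transition of this kind can be sketched as a pixel-by-pixel comparison of the two classified maps. The Python example below is a minimal illustration: the class codes, the 3 x 3 toy maps, and the function name `transition_area` are hypothetical, and the 30 m pixel size is an assumption based on the nominal resolution of the Landsat 8 bands used (B2 to B7).

```python
WATER, VEGETATION, URBAN = 0, 1, 2        # hypothetical class codes
PIXEL_AREA_KM2 = (30 * 30) / 1_000_000    # assumes 30 m x 30 m pixels

def transition_area(map_start, map_end, from_class, to_class):
    """Area (sq. km) of pixels labelled from_class in map_start and to_class in map_end."""
    count = sum(
        1
        for row_start, row_end in zip(map_start, map_end)
        for label_start, label_end in zip(row_start, row_end)
        if label_start == from_class and label_end == to_class
    )
    return count * PIXEL_AREA_KM2

# Toy classified maps for two years (illustrative only).
map_2017 = [
    [VEGETATION, VEGETATION, URBAN],
    [VEGETATION, WATER,      URBAN],
    [WATER,      VEGETATION, VEGETATION],
]
map_2018 = [
    [URBAN, VEGETATION, URBAN],
    [URBAN, WATER,      URBAN],
    [URBAN, VEGETATION, URBAN],
]

veg_to_urban = transition_area(map_2017, map_2018, VEGETATION, URBAN)
water_to_urban = transition_area(map_2017, map_2018, WATER, URBAN)
print(veg_to_urban, water_to_urban)
```

The same function applies to any pair of classes, so one routine covers every transition of interest once both years have been classified with the same class codes.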

b) Water Bodies into Urban Land
From the classified results of both algorithms, the water bodies are extracted and the area of water bodies transformed into urban land is obtained, as shown in Fig. 34 for the SVM algorithm and Fig. 35 for the RF algorithm.

VIII. ACCURACY AND KAPPA COEFFICIENT ANALYSIS
1) Accuracy Assessment:
Accuracy is assessed using the confusion matrix (error matrix). This method is the best approach for expressing classification accuracy in supervised learning, and the confusion matrix provides accuracy information for every class in the classification [11].
Overall Accuracy = (TCP) / (TNP)
Using the above-mentioned equation, the accuracy of the classifier is obtained; the values of TCP and TNP can be calculated from the confusion matrix.

a) SVM Classifier Accuracy
To measure the accuracy of the model, the values of the confusion matrix (error matrix) presented in Table I are used. TCP is the sum of the diagonal values of the confusion matrix of the SVM classifier.

So, TCP = 8 + 129 + 134 = 271
TNP is the sum of all values of the confusion matrix.

So, TNP = 8 + 4 + 17 + 0 + 129 + 6 + 0 + 0 + 134 = 298
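The two sums can be checked with a short Python sketch. The 3 x 3 layout below assumes that the nine values in the TNP sum are listed row by row, which is consistent with the diagonal values 8, 129, and 134 used for TCP; the ratio TCP / TNP is the usual definition of overall accuracy. The function name `overall_accuracy` is hypothetical.

```python
# SVM confusion matrix values from the text, assumed to be listed row by row.
svm_confusion = [
    [8,   4,  17],
    [0, 129,   6],
    [0,   0, 134],
]

def overall_accuracy(matrix):
    """Return (TCP, TNP, TCP / TNP) for a square confusion matrix."""
    tcp = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal: correct pixels
    tnp = sum(sum(row) for row in matrix)                # all pixels
    return tcp, tnp, tcp / tnp

tcp, tnp, accuracy = overall_accuracy(svm_confusion)
print(f"TCP = {tcp}, TNP = {tnp}, overall accuracy = {accuracy:.4f}")
```

Transposing the matrix would not change TCP, TNP, or the resulting ratio, so the row-by-row assumption does not affect the outcome.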
The calculated values of TCP and TNP are put into the above-mentioned equation.

2) Kappa Coefficient:
The Kappa coefficient is used to account for those instances that may have been correctly classified by chance [10]. Its value lies between -1 and +1. A negative value means low agreement between the classified image and the reference image, a value of 0 means no correlation at all between the classified image and the reference image, and a higher value indicates higher classification accuracy [13].
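A minimal Python sketch of this chance-corrected measure is given here, using the count form of Cohen's kappa: observed agreement minus expected agreement, divided by the total number of pixels minus expected agreement. The matrix holds the SVM values listed in the accuracy calculation, assumed to be arranged row by row; the function name `kappa_coefficient` is hypothetical, and the value it produces is computed from those numbers rather than quoted from the paper's tables.

```python
def kappa_coefficient(matrix):
    """Cohen's kappa from a square confusion matrix of pixel counts."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)                  # total number of pixels
    observed = sum(matrix[i][i] for i in range(n))           # diagonal sum
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    # Expected agreement: per class, row total times column total over the total.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total
    return (observed - expected) / (total - expected)

# SVM confusion matrix values from the text, assumed to be listed row by row.
svm_confusion = [
    [8,   4,  17],
    [0, 129,   6],
    [0,   0, 134],
]

svm_kappa = kappa_coefficient(svm_confusion)
print(f"kappa = {svm_kappa:.4f}")
```

Kappa is unchanged if the matrix is transposed, because the products of row and column totals stay the same, so the result does not depend on whether rows hold the reference or the classified labels.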
Kappa Coefficient = (OLA - ELA) / (TNP - ELA)
OLA is the Observed Level of Agreement, which is the sum of the diagonal values of the confusion matrix. ELA is the Expected Level of Agreement, which is calculated by multiplying the row total of each class by its column total, dividing the product by TNP, repeating this for every class, and summing the results. TNP is the Total Number of Pixels of the confusion matrix. The Kappa coefficient for the SVM classifier is calculated from its confusion matrix and presented in Table II.

IX. CONCLUSION
Urbanization has affected settlement plans and land use around the suburbs, increasing both population and built-up area. These changes have in turn altered weather patterns, contributing to global warming and the energy crisis. This study was therefore conducted to examine land loss and how it contributes to the energy crisis and greenhouse damage. Analyzing multi-temporal land cover change is very helpful for higher management and environmental departments in maintaining sustainability in the development of urbanization: once the current rate of change is understood, decisions and policies can be made accordingly. It also supports forecasting possible future changes in urban land growth and in losses of water bodies and vegetation, so that environmental problems in the near future can be avoided. The same training sample points and polygons were used for both classifiers. According to the comparison of overall accuracy and Kappa coefficient, SVM performed better than RF, though both perform very well when the training sample is small. In addition, the SVM classified maps are closer to the ground truth than the RF generated maps, and the RF map contained misclassified water pixels.
We used the SVM and RF classifiers on the GEE platform, with training sample points and polygons of the same size, to assess the variation in urbanization. According to the overall accuracy and Kappa coefficient analysis, SVM performed better than RF, while both perform best when the training sample is small. This yearly change trend will help higher management and environmental departments maintain sustainability in the development of urbanization by understanding the current situation, making better decisions, and predicting future changes in land growth and loss patterns, including natural or unnatural environmental problems. The classification results of the SVM algorithm are closer to the ground truth than the RF classification results, the major difference being that the Random Forest classification results contained misclassified water pixels.

X. FUTURE DIRECTIONS
1) In future work, the results obtained from the supervised classifiers can be compared with the results of unsupervised classifiers.
2) Digital image classification of multi-spectral imagery for prominent surface features, particularly with reference to population growth, within the precincts of Multan district can be done in the future.
3) Using satellite imagery, the impact of the growth or loss of water bodies on vegetation can be measured for the same study area.