Modeling House Price Prediction using Regression Analysis and Particle Swarm Optimization. Case Study: Malang, East Java, Indonesia

House prices increase every year, so there is a need for a system that predicts future house prices. House price prediction can help a developer determine the selling price of a house and can help a customer choose the right time to purchase one. Three factors influence the price of a house: physical condition, concept, and location. This research aims to predict house prices in Malang city, based on NJOP values, using regression analysis and particle swarm optimization (PSO). PSO is used to select the influential variables, and regression analysis is used to determine the optimal coefficients for prediction. The results show that the combination of regression and PSO is suitable for this task, with a minimum prediction error of IDR 14.186.

Keywords: House prediction; regression analysis; particle swarm optimization


I. INTRODUCTION
Investment is a business activity that attracts many people in this era of globalization. Several assets are commonly used for investment, for example gold, stocks, and property. In particular, property investment has increased significantly since 2011, in both demand and sales [1]. One reason for the rising demand for property is the large population of Indonesia. The Indonesian Central Bureau of Statistics states that 50% of the population of East Java is classified as young, with an age of approximately 30 years [2]. This census result indicates that the younger generation will need or buy a house in the future. Based on preliminary research, two house price standards apply in house sale transactions: the price set by the developer (market selling price) and the price based on the Value of Selling Tax Object (NJOP). According to Lim et al., the fundamental problem for a developer is determining the selling price of a house [3]. In determining the price of a house, the developer must calculate carefully and choose an appropriate method, because property prices rise continuously and almost never fall in either the short or the long term [4].
There are several approaches that can be used to determine the price of a house; one of them is prediction analysis.
The first approach is quantitative prediction, which utilizes time-series data [5]. The time-series approach looks for the relationship between current prices and prevailing prices. The second approach uses linear regression based on hedonic pricing [6], [7]. Previous research by Gharehchopogh et al. [7] using a linear regression approach obtained an error of 0.929 relative to the actual price. In linear regression, the coefficients are generally determined using the least squares method, but it takes a long time to get the best formula.
Particle swarm optimization (PSO) is proposed to find coefficients that give optimal results [8]. Previous studies such as Marini and Walczak [9], [10] show that PSO gets better results than other hybrid methods. PSO has several advantages; in a small search space, PSO searches for solutions more effectively [11]. Although the global search of PSO is less than optimal [12], for the problem of optimizing the variable values of a regression equation PSO can find a maximum solution [12], [13].
This research aims to create a house price prediction model using regression and PSO to obtain optimal prediction results. PSO is used to select the influential variables in house price prediction, and regression is used to determine the optimal coefficients. In this study, the researchers also want to assess the performance of the developed model on time-series data. House price predictions are expected to help people who plan to buy a house learn the future price range so that they can plan their finances well. In addition, house price predictions help property investors follow the trend of housing prices in a given location. This research focuses on Malang City, because Malang is one of the tourism and urban cities in East Java.

A. House Price Affecting Factors
There are several factors that affect house prices. Rahadi et al. [14] divide these factors into three main groups: physical condition, concept, and location. Physical conditions are properties of a house that can be observed by the human senses, including the size of the house, the number of bedrooms, the availability of a kitchen and garage, the availability of a garden, the area of land and buildings, and the age of the house [15]. The concept is an idea offered by developers to attract potential buyers, for example a minimalist home, a healthy and green environment, or an elite environment.
www.ijacsa.thesai.org
Location is an important factor in shaping the price of a house, because the location determines the prevailing land price [16]. In addition, the location determines the ease of access to public facilities such as schools, campuses, hospitals, and health centers, as well as family recreation facilities such as malls and culinary tours, or even a beautiful view [17], [18]. The factors affecting house prices are summarized in Table 1.

B. Hedonic Pricing
Hedonic pricing is a price prediction model based on hedonic price theory, which assumes that the value of a property is the sum of the values of all its attributes [20]. In practice, hedonic pricing can be implemented using a regression model. Equation (1) shows the regression model for determining a price.
Here y is the predicted price and x_1, x_2, ..., x_i are the attributes of a house, while a, b, ..., n are the regression coefficients of each variable in the determination of house prices.
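As a sketch only, the symbol definitions above are consistent with a linear hedonic model of the following form. The absence of an intercept term is an assumption, since the text does not mention one.

```latex
y = a\,x_1 + b\,x_2 + \dots + n\,x_i
```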

III. DATA SET
In this research, we use house price data based on NJOP from the Land and Building Tax (PBB) payment structure. Due to limited access to the data, this study uses time-series data for 9 houses scattered across the Malang City area, covering 2014-2017. The data are normalized by filling in values missing for a given year, under the assumption that land prices tend to change every 2 years while building prices tend to be stable.
The tabulated data for each house include: home id, address (street name), longitude-latitude, year, building area, land area, NJOP building price (IDR/m^2), NJOP land price (IDR/m^2), distance from the city center (km), number of campuses, number of restaurants, number of health facilities, number of playgrounds, number of schools, number of traditional markets or malls, number of worship places, and ease of access to public transportation. The city center in this study is defined as the location of the Malang City square. The distance to the city center is calculated using Google Maps, while ease of access to public transportation is evaluated within a radius of 400 meters. Nearby objects within a given radius are counted using a buffering technique accessed through the site http://obeattie.github.io/.

Based on Fig. 1, the process of regression analysis and particle swarm optimization methods is described in the following section:

A. Regression analysis
The prediction model used in this research is hedonic pricing, implemented as a regression model with the standard formula shown in (1). The dependent variable Y is the NJOP price, and the independent variables x_1 to x_14 are year, building area, land area, NJOP land price (IDR/m^2), NJOP building price (IDR/m^2), distance to the city center, number of campuses, number of restaurants, number of health facilities, number of amusement parks, number of educational facilities, number of traditional markets, number of worship places, and ease of access to public transportation, as shown in (2). The public transportation variable is either 0 or 1: 0 means that no public transport passes within 200 meters of the area, and 1 means that public transport passes through the area.
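To make the 14-variable model concrete, the following minimal sketch evaluates a linear prediction as a sum of coefficient-attribute products. The attribute names, the no-intercept form, and all numeric values are illustrative assumptions and are not taken from the NJOP data set or from the paper's fitted models.

```python
# Hypothetical attribute names for x1..x14, in the order listed in the text.
ATTRIBUTES = [
    "year", "building_area", "land_area", "njop_land_price",
    "njop_building_price", "distance_to_center_km", "n_campuses",
    "n_restaurants", "n_health_facilities", "n_amusement_parks",
    "n_educational_facilities", "n_traditional_markets",
    "n_worship_places", "public_transport",
]


def predict_price(coefficients, house):
    """Return sum(coefficient_k * x_k) over the 14 attributes (no intercept assumed)."""
    if len(coefficients) != len(ATTRIBUTES):
        raise ValueError("expected one coefficient per attribute")
    if house["public_transport"] not in (0, 1):
        raise ValueError("public_transport must be 0 or 1")
    return sum(c * house[name] for c, name in zip(coefficients, ATTRIBUTES))


# Made-up values purely for illustration; not real NJOP records.
example_house = {name: 1.0 for name in ATTRIBUTES}
example_house["public_transport"] = 1
example_coefficients = [2.0] * len(ATTRIBUTES)
example_price = predict_price(example_coefficients, example_house)
```

In this form, a coefficient of zero removes the corresponding attribute from the prediction, which is one simple way to think about variable selection.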

B. Particle Swarm Optimization (PSO)
PSO is a stochastic optimization method that represents candidate solutions as particles [21]. A number of particles are generated randomly, where each particle has, in each dimension, a position x_i and a velocity v_i. Each particle measures its fitness value as shown in (3).
Here f(x) is the fitness value of each particle, which indicates the prediction error. Each particle explores the solution search space to find the optimal result. The displacement from one position to another is strongly influenced by the velocity of each particle, so obtaining the best position requires a dynamic velocity formulation, given in (4) [22].
Here v_i is the velocity of the particle in dimension i (up to n), t denotes the iteration, and w is the inertia weight, whose value is obtained dynamically using (5) [23]. p_i is the best position found so far by each particle, while pg_i is the best position found so far by the whole swarm. c_1 and c_2 are the cognitive and social constants, respectively, which in this study are 2.5 and 0.5. r_1 and r_2 are 0.5 and 2.5. Once the velocity is obtained, the position is updated using (6).
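The velocity and position updates can be sketched as a single step for one particle. This follows the widely used inertia-weight form of PSO, which is assumed here to match (4) and (6). The function name is hypothetical, and r1 and r2 are passed in explicitly so the caller decides how they are chosen (many PSO formulations draw them uniformly from [0, 1] at every update).

```python
def update_particle(x, v, p_best, g_best, w, r1, r2, c1=2.5, c2=0.5):
    """One PSO step for a single particle; all vectors are lists of floats.

    Assumed update rule (common inertia-weight PSO):
        v_new = w*v + c1*r1*(p_best - x) + c2*r2*(g_best - x)
        x_new = x + v_new
    """
    new_v, new_x = [], []
    for j in range(len(x)):
        vj = (w * v[j]
              + c1 * r1 * (p_best[j] - x[j])   # cognitive pull toward own best
              + c2 * r2 * (g_best[j] - x[j]))  # social pull toward swarm best
        new_v.append(vj)
        new_x.append(x[j] + vj)                # position update
    return new_x, new_v
```

The defaults c1=2.5 and c2=0.5 mirror the constants stated in the text; all other values must be supplied by the caller.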
In PSO, particles that move too fast can cause the method to miss the optimum solution. This problem can be handled by speed control, also called velocity clamping [9]. The speed control mechanism bounds the velocity of each particle using the conditions in (7).
Here v_j^max is generated using (8), and v_j^min is the negative of v_j^max.
The calculation cycle of updating the velocity v_i and the position x_i is repeated until the maximum number of iterations is reached. When the iterations are over, the best particle is returned as the optimum solution.
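The full cycle with velocity clamping can be sketched as follows. Several details are assumptions rather than the paper's exact formulas: the inertia weight decreases linearly from w_start to w_end, the speed limit is taken as k * (x_max - x_min) / 2 with a hypothetical k, r1 and r2 are drawn uniformly from [0, 1) at each update, and the fitness function is a toy error used only to exercise the loop.

```python
import random


def clamp(v, v_max):
    """Velocity clamping: keep each component within [-v_max, v_max]."""
    return [max(-v_max, min(v_max, vj)) for vj in v]


def pso(fitness, dim, n_particles=30, n_iter=100, x_min=0.0, x_max=100.0,
        w_start=0.9, w_end=0.4, c1=2.5, c2=0.5, k=0.6, seed=0):
    """Minimize `fitness` with a basic PSO; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    v_max = k * (x_max - x_min) / 2.0  # assumed form of the speed limit
    xs = [[rng.uniform(x_min, x_max) for _ in range(dim)]
          for _ in range(n_particles)]
    # Initial velocities drawn from the same range as positions, then clamped.
    vs = [clamp([rng.uniform(x_min, x_max) for _ in range(dim)], v_max)
          for _ in range(n_particles)]
    p_best = [x[:] for x in xs]
    p_fit = [fitness(x) for x in xs]
    g = min(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    for t in range(n_iter):
        # Assumed schedule: linearly decreasing inertia weight.
        w = w_start - (w_start - w_end) * t / max(1, n_iter - 1)
        for i in range(n_particles):
            v = [w * vs[i][j]
                 + c1 * rng.random() * (p_best[i][j] - xs[i][j])
                 + c2 * rng.random() * (g_best[j] - xs[i][j])
                 for j in range(dim)]
            vs[i] = clamp(v, v_max)
            xs[i] = [xs[i][j] + vs[i][j] for j in range(dim)]
            f = fitness(xs[i])
            if f < p_fit[i]:                 # update personal best
                p_best[i], p_fit[i] = xs[i][:], f
                if f < g_fit:                # update swarm best
                    g_best, g_fit = xs[i][:], f
    return g_best, g_fit


def toy_error(x):
    """Toy fitness (not the paper's): squared distance from the point (50, ..., 50)."""
    return sum((xj - 50.0) ** 2 for xj in x)


best_position, best_error = pso(toy_error, dim=3, n_iter=50, seed=1)
```

Because the swarm best is only replaced by strictly better positions, the returned error can never be worse than that of the best initial particle.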

C. Testing Methods
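The model developed in this research is tested using three error measures: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). In the formulas below, y_i is the actual price, \hat{y}_i is the predicted price, and n is the number of predictions. MAPE averages the absolute error of each prediction expressed as a percentage of the actual value, so it indicates the relative size of the prediction error. MAPE is described in (9).

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{9}$$

MAE calculates the average absolute error over all predictions. MAE is useful when the error should be expressed in the units of the predicted quantity. MAE can be calculated using (10).

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{10}$$

RMSE measures prediction performance from the squared prediction error of each data point. The RMSE formula is given in (11).

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{11}$$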
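The three error measures can be sketched in a few lines using their standard definitions. The function names and the sample numbers are illustrative assumptions and are not values from the experiments.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up prices for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
errors = (mape(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

MAE and RMSE are in the same unit as the prices (IDR in this study), while MAPE is unit-free, which makes it easier to compare across areas with different price levels.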

V. EXPERIMENT AND RESULT
The experiments examine the parameters of particle swarm optimization through a particle test, an iteration test, and an inertia weight combination test.
The PSO algorithm generates the initial population and velocities in the range [0, 100]. Ranges from -1000 to 1000 were tested, and the range 0-100 was found to provide the highest-fitness solutions. The particle test and iteration test for each model use multiples of 100, with a maximum of 3000 particles, because testing more than 3000 particles requires longer computation time. Each test is run 5 times, and the fitness value is taken as the average of the results. The last test is a combination of inertia weights, performed to examine the displacement velocity of each particle; the inertia weight is tested in the range [0.1, 0.9]. The results of each parameter test are shown in Table 2.

VI. CONCLUSION
In this paper, several tests have been performed using linear regression and particle swarm optimization to predict house prices. Based on the NJOP data of 9 houses, the system builds 7 house price prediction models, each representing one area. The areas modeled are Kelurahan Karang Besuki, Tunggulwulung, Lowokwaru, Puncak Trikora, Sumbersari, Dinoyo, and Manggar. Based on the results of the particle test, iteration test, and inertia weight test, it can be concluded that M-1, which represents the Karang Besuki area, obtains the best parameters for optimal prediction. The best parameter values are 1800 particles, 700 iterations, and inertia weights of 0.4 and 0.8, which give a minimum prediction error (RMSE) of IDR 14.186. For the other models, the prediction errors are still large. Future research will use different methods suited to time-series data, and more data, to obtain smaller prediction errors.
[Table fragment: ... facilities; restaurant; public transportation; scenery]

TABLE II
The experimental results show that the fitness value is based on the data being tested. Furthermore, this research would benefit from more data. After the results of parameter testing are known, error values are calculated based on RMSE, MAE, and MAPE. A comparison of the test values is shown in Table 3.

TABLE III. RESULT OF TESTING METHOD