Sentiment Analysis and Classification of Photos for 2-Generation Conversation in China

Appropriate photos can help the Chinese empty-nest elderly and young volunteers find common topics and thus promote communication. However, there has been little research on such photos in China. In this paper, 40 online photos were used in 160 conversation sessions between Chinese elderly and young participants, and the photos were then analyzed and classified. Sentiment analysis of the Chinese conversation transcripts was used to estimate each speaker's attitude towards the photos. For each photo, our data set consisted of the average sentiment-analysis result, the number of words uttered by the speakers, the pulse of the elderly, and the stress level of the youth. Principal Component Analysis (PCA) was carried out as a data preprocessing step to improve classification accuracy, and we selected four Principal Components (PCs) that account for 85.20% of the total variance in the data. Next, we normalized the scores of these four PCs for Hierarchical Clustering Analysis (HCA) of the photos and obtained four clusters with different features. The results showed that photos in cluster2 were optimal only for the youth; photos in cluster3 made only the elderly participants speak more; and photos in cluster1 and cluster4 were suitable for neither the elderly nor the young people. This paper is the first to classify photos for 2-generation conversation in China and describe their features. Although we did not find any photos suitable for both the elderly and the youth, this empirical study takes a step forward in the investigation of photos for 2-generation conversation in China.
Keywords—Photo; 2-generation conversation; sentiment analysis; Principal Component Analysis (PCA); Hierarchical Clustering Analysis (HCA); China


A. Cognitive Impairment in the Empty-Nest Elderly in China
As most of China's first generation of only children have reached the age of marriage and childbearing, the "421" family pattern (four grandparents, two parents, and one child) has begun to become mainstream [1]. This has led to an increase in empty-nesters who do not live with their married children. They live either with their spouses (empty-nest-couple) or alone (empty-nest-single). Empty-nesters accounted for 47.53% of the Chinese elderly in 2016, and 60% of them had mental health problems [2]. Recent research has shown that empty-nest-related psychological distress is associated with cognitive impairment in the elderly. Ensuring good social ties and minimizing psychological distress may help delay or prevent the progression of cognitive impairment in the empty-nest elderly [3].

B. Related Works
A large number of studies have shown that daily conversation with the elderly can promote their social communication and help maintain their cognitive function. Although sustaining such conversation is difficult, using photos as topics can act as a "switch" that activates the conversation [4]. In China, current research on voice interaction with companion robots aims to improve the cognitive level of the elderly [5].
However, most of the elderly in China speak poor Mandarin, and their expressions are highly colloquial. This poses a major challenge to speech recognition technology. Besides, language itself is ambiguous and diverse, so man-machine dialogue involves great uncertainty [6]. Moreover, elderly care requires emotional devotion, which a robot cannot replace. The relevant government departments should respond actively, for example by vigorously improving community home care services and by organizing and promoting volunteer service activities that help the elderly both physically and psychologically.
In summary, photos are currently the most effective medium for helping the elderly and young volunteers communicate, but there has been little research on photos for 2-generation conversation in China.

C. Research Objectives
The goal of this research is to create a "2-generation conversation support system" in China that helps the elderly and young volunteers quickly and effectively find, and switch to, the photos they would like to talk about on the web. However, few studies in China have revealed the features of photos appropriate for 2-generation conversation. The purpose of this study is therefore to investigate the effects of photos on 2-generation conversation in China and to classify them.

D. Materials and Methods
For the experiment, we set up 160 two-participant sessions covering 40 photos.

1) Sentiment Analysis:
There are many methods for investigating the effects of photos on conversation, such as questionnaire surveys, physiological monitoring, and emotion recognition. Sentiment analysis, an emotion recognition method, automatically classifies textual data, such as comments and opinions that users publish with emotional content, and calculates the emotional intensity of each text to estimate the user's emotions. Dictionary-based Chinese sentiment analysis extracts the domain keywords from the corpus text to be analyzed and marks the emotional polarity and intensity of these topic words [7]. General-purpose Chinese sentiment dictionaries are now relatively mature and are available through services such as the artificial intelligence open platform of Baidu, Inc. (https://ai.baidu.com). We used "Baidu AI" to carry out the sentiment analysis.
2) Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA): For each photo, this study collected the average sentiment-analysis result, the number of words uttered by the speakers, the pulse of the elderly, and the stress level of the youth. PCA and HCA are the most widely used tools for exploring similarities and hidden patterns among samples when the relationships in the data and the grouping are still unclear. Moreover, combining these two objective multivariate statistical methods is an effective way to preprocess the data [8]. PCA is a method for dimensionality reduction [9]. We applied PCA to the data set, obtained six principal components (PCs), and selected the four PCs that explain 85.2% of the variance in the data for hierarchical clustering analysis (HCA) of the photos. Ward's method is the only agglomerative clustering method based on a classical sum-of-squares criterion: at each binary fusion it produces the groups that minimize within-group dispersion, looking for clusters in multivariate Euclidean space [10]. With this method, we obtained four photo clusters.
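To make the dictionary-based idea concrete, the sketch below scores an already word-segmented sentence by summing the signed intensities of the lexicon words it contains. It is a toy illustration only: `TOY_LEXICON` and `lexicon_sentiment` are hypothetical, the five-word lexicon is not Baidu's dictionary, and the 0/1/2 polarity convention (negative/neutral/positive) is an assumption modelled on the polarity value of 2 for "positive" used later in the text.

```python
# Hypothetical toy lexicon (word -> signed intensity); NOT Baidu's dictionary.
TOY_LEXICON = {
    "好": 1.0,     # good
    "喜欢": 1.5,   # like
    "珍贵": 1.2,   # precious, rare
    "难": -1.0,    # difficult
    "讨厌": -1.5,  # dislike
}


def lexicon_sentiment(tokens, lexicon=TOY_LEXICON):
    """Score a pre-segmented sentence by summing the intensities of the
    lexicon words it contains.

    Returns (polarity, intensity): polarity 2 = positive, 1 = neutral,
    0 = negative (assumed convention), intensity = |summed score|.
    """
    score = sum(lexicon[t] for t in tokens if t in lexicon)  # matched keywords only
    if score > 0:
        polarity = 2
    elif score < 0:
        polarity = 0
    else:
        polarity = 1
    return polarity, abs(score)


# Tokens are assumed to come from a separate Chinese word segmenter.
print(lexicon_sentiment(["这", "是", "熊猫", "珍贵", "动物"]))  # (2, 1.2)
```

Real services add negation handling, degree adverbs, and much larger lexicons; the sketch only shows the keyword-matching and polarity-marking step.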
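The PCA-then-Ward pipeline described above can be sketched with NumPy and SciPy. This is a swapped-in Python equivalent, not the authors' code (their HCA was run in RStudio); the function name `pca_then_ward` is hypothetical, and the 40 x 6 input is random stand-in data rather than the study's measurements. Each factor is standardized, projected onto the top four PCs, the PC scores are rescaled to mean 0 and variance 1, and the photos are clustered with Ward linkage and cut into four groups.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def pca_then_ward(X, n_pcs=4, n_clusters=4):
    """Standardize factors, keep the top n_pcs principal components,
    normalize their scores, then cluster rows with Ward linkage."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # z-score each factor
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))  # correlation matrix
    order = np.argsort(eigvals)[::-1]                      # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :n_pcs]                        # PC scores per photo
    # Rescale every PC to unit variance so each PC weighs equally in distances.
    scores = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    tree = linkage(scores, method="ward")                  # Ward's minimum-variance linkage
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return eigvals, scores, labels


# Random stand-in for the 40 photos x 6 factors table (not the study's data).
rng = np.random.default_rng(0)
eigvals, scores, labels = pca_then_ward(rng.normal(size=(40, 6)))
print(eigvals.round(3), np.bincount(labels)[1:])
```

`fcluster(..., criterion="maxclust")` plays the role of cutting the dendrogram so that at most `n_clusters` groups remain.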

E. Results and Conclusion
This paper is the first to classify photos for 2-generation conversation in China, and it obtains four clusters with different features. Because the number of experimental photos is small and the participant types are incomplete, it is not yet appropriate to organize the clusters into a complete classification system. Nevertheless, the analysis shows that combining PCA with HCA is a feasible approach to photo classification: it gives good results and lets researchers quickly find the differences between photos, thus improving classification quality. As the work progresses, we expect that a classification system for photos can be built on the method of this paper.

A. Participants
The participants were four elderly citizens of Nanjing over 65 years of age without dementia (two male, two female) and four female students from Nanjing Medical University. They did not know each other before the experiment.
Table I presents the participants and the number of conversation sessions. We set up 160 sessions, each with 2 participants and 1 photo, which ensured that each elderly participant had the opportunity to talk with each young participant about each photo.

B. Photos and Apparatus
Forty photos retrieved from Baidu Images (http://image.baidu.com) were used for the 2-generation conversation. Following related work in Japan, the photos show familiar everyday subjects, such as photo7, "lotus", shown in Fig. 1.
We used a MacBook Air and a projector to display the photos, and an iPad mini 4 for video recording. The experimental sessions were recorded so that the conversations could be converted to text (Fig. 2).
A fingertip oximeter (Guangdong medical device registration number: 20152210273), which is easy for elderly users to handle, was used to measure the pulse of the elderly participants (Fig. 3). For the young participants, we used the Stress-Check-Sheet as a subjective stress survey. It is a 7-point rating scale from 1 (lowest) to 7 (highest stress level) (Fig. 4).

C. Design
The research design aims to investigate the effects of photos on 2-generation conversation. We therefore set up the experiment as a 2-person conversation while viewing a photo. Table II presents the six evaluation factors for each photo: the positive probability of the elderly and of the youth, the number of words uttered by the elderly and by the youth, the pulse of the elderly, and the stress of the youth. The data set consists of the average value of each factor for each photo. PCA was carried out as a data preprocessing step before HCA to improve the classification accuracy of the photos.
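Building the per-photo data set amounts to grouping session-level measurements by photo and averaging each factor. The sketch below assumes one record per session; the field names in `FACTORS`, the function `per_photo_means`, and the demo values are all hypothetical and only illustrate the shape of the data (160 session rows in, one row of six averages per photo out).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical field names for the six evaluation factors (Table II).
FACTORS = ("pos_elder", "pos_youth", "words_elder",
           "words_youth", "pulse_elder", "stress_youth")


def per_photo_means(sessions):
    """Average each factor over all sessions of the same photo.

    `sessions` is an iterable of dicts holding a "photo" id and one value
    per factor. Returns {photo_id: tuple of factor means in FACTORS order}.
    """
    by_photo = defaultdict(list)
    for s in sessions:
        by_photo[s["photo"]].append(s)
    return {photo: tuple(mean(r[f] for r in rows) for f in FACTORS)
            for photo, rows in sorted(by_photo.items())}


# Illustrative values only, not measurements from the study.
demo = [
    {"photo": 7, "pos_elder": 0.5, "pos_youth": 0.75, "words_elder": 40,
     "words_youth": 60, "pulse_elder": 72, "stress_youth": 2},
    {"photo": 7, "pos_elder": 0.75, "pos_youth": 0.25, "words_elder": 50,
     "words_youth": 30, "pulse_elder": 76, "stress_youth": 4},
    {"photo": 8, "pos_elder": 0.5, "pos_youth": 0.5, "words_elder": 20,
     "words_youth": 80, "pulse_elder": 70, "stress_youth": 3},
]
print(per_photo_means(demo))
```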

D. Procedure
The experimental procedure was as follows:
• Participants were instructed to watch the photo and talk freely about it with their partners.
• Each photo was automatically played for 1 minute.
• During the conversation, the pulse of the elderly was recorded with a fingertip oximeter.
• At the end of each session, the young participant rated their stress level on the Stress-Check-Sheet (Fig. 4).
• After the experiment, the conversation video recordings were exported for transcription, sentiment analysis, and word counting.
• PCA was applied to the data set.
• HCA was applied to the normalized principal component scores using Ward's method.
E. Analysis
1) Sentiment Analysis: Sentiment analysis of Chinese text can use the Baidu AI Open Platform to automatically classify opinionated text into three sentiment polarities: positive, negative, and neutral.
We analyzed each session sentence by sentence using code written in Python 2.7.10. For example, the sentence "This is a panda, a rare animal." is classified as positive: its sentiment polarity is 2 (positive), and its positive probability (0.607345) is greater than its negative probability (0.392655). There are only three sentiment polarity classes, and the negative probability equals 1 minus the positive probability. Therefore, to compare the photos accurately, we report only the average positive probability for each photo.
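The per-photo score described above (split each transcript into sentences, classify each sentence, average the positive probabilities) can be sketched as follows. This is a Python 3 sketch, not the authors' Python 2.7.10 code. The `classify` argument is a hypothetical wrapper around the sentiment service; the `"positive_prob"` field name is an assumption about its response, and `fake_classify` is a stand-in that reuses the panda-sentence probability quoted in the text so the example runs offline.

```python
import re


def split_sentences(text):
    """Split a transcript after Chinese or Western sentence-final punctuation."""
    return [s.strip() for s in re.split(r"(?<=[。！？!?])", text) if s.strip()]


def photo_positive_probability(transcripts, classify):
    """Average positive probability over every sentence spoken about one photo.

    `classify` maps a sentence to a dict with a "positive_prob" field
    (hypothetical wrapper; the field name is an assumption).
    """
    probs = [classify(sentence)["positive_prob"]
             for text in transcripts
             for sentence in split_sentences(text)]
    # negative_prob = 1 - positive_prob, so averaging positive_prob is enough
    return sum(probs) / len(probs) if probs else float("nan")


def fake_classify(sentence):
    """Offline stand-in: 0.607345 for the panda sentence, 0.5 otherwise."""
    p = 0.607345 if "熊猫" in sentence else 0.5
    return {"positive_prob": p, "negative_prob": 1 - p}


print(photo_positive_probability(["这是熊猫，一种珍稀动物。很可爱！"], fake_classify))
```

Passing the classifier in as a function keeps the averaging logic testable without network access or API credentials.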
2) Principal Component Analysis (PCA): The results of PCA of the data set are shown in Table III:
• Standard Deviation: The standard deviation of each PC is the square root of its eigenvalue; because the data were standardized, the eigenvalues (the squared standard deviations) are the variances of the PCs. The average standard deviation is 0.965, and PC1-PC4 exceed it. An eigenvalue < 1 would mean that the component explains less variance than a single standardized variable, so we discard PC5-PC6.
• Proportion of Variance: This is the proportion of the total variance in the data that each component accounts for. PC1 accounts for 28.07% of the total variance, PC2 for 20.78%, and PC3 for 19.36%. These three PCs are essential.
• Cumulative Proportion: This is the accumulated proportion of explained variance; i.e., PC1-PC4 together account for 85.20% of the total variance in the data, so PC4 alone accounts for 16.99%.
The factor loadings are shown in Fig. 5:
• PC1 is strongly correlated with the number of words uttered by the elders and by the youth: the less the elders speak, the more the youth speak.
• PC2 is strongly correlated with the positive probability of the elderly and the number of words uttered by the youth: the more positive the elders become, the less the youth speak.
• PC3 is strongly correlated with the stress level of the youth and the pulse of the elders: the lower the stress of the youth, the lower the pulse of the elders.
• PC4 is strongly correlated with the positive probability of the youth and the pulse of the elders: the lower the positive probability of the youth, the lower the pulse of the elders.
In summary, we label the PCs as follows:
• PC1 is "number of words uttered by participants".
• PC2 is "positivity of elders" for "number of words uttered by youth".
• PC3 is "stress of youth" for "pulse of elders".
• PC4 is "positivity of youth" for "pulse of elders".
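The rows of a PCA summary table like Table III can be reproduced as below: the standard deviation of each PC is the square root of an eigenvalue of the correlation matrix, the proportion of variance is each eigenvalue divided by their sum, and the Kaiser rule keeps eigenvalues above 1. This is a NumPy sketch with a hypothetical `pca_summary` helper run on random stand-in data. It also shows why an average of 0.965 must refer to the standard deviations: for standardized data the eigenvalues always average exactly 1, so the average of their square roots is at most 1.

```python
import numpy as np


def pca_summary(X):
    """R-style PCA summary for standardized data.

    Returns (sdev, proportion, cumulative, kaiser_keep): sdev is the square
    root of each correlation-matrix eigenvalue (descending), proportion and
    cumulative are shares of total variance, and kaiser_keep flags
    components whose eigenvalue exceeds 1.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)             # standardize
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # descending
    eigvals = np.clip(eigvals, 0.0, None)                        # drop round-off negatives
    sdev = np.sqrt(eigvals)
    proportion = eigvals / eigvals.sum()
    cumulative = np.cumsum(proportion)
    kaiser_keep = eigvals > 1.0
    return sdev, proportion, cumulative, kaiser_keep


rng = np.random.default_rng(1)
sdev, prop, cum, keep = pca_summary(rng.normal(size=(40, 6)))
# Mean eigenvalue is 1 for standardized data; mean sdev is therefore below 1.
print((sdev ** 2).mean(), sdev.mean(), cum[-1], keep)
```

With six standardized factors the eigenvalues sum to 6, so a PC explaining about 16.99% of the variance (PC4 here) has an eigenvalue of roughly 6 x 0.1699 ≈ 1.02, just above the Kaiser threshold.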

F. Photo Clustering
PCA was carried out as a data preprocessing step to improve classification accuracy. The scores of the four PCs selected from the PCA results were normalized to a mean of 0 and a variance of 1, so that each principal component is weighted equally; this gives relatively more emphasis to PCs with a small contribution rate. In all cases, Ward's method was used for clustering.
HCA was then performed in RStudio 3.3.3. Fig. 6 shows the resulting dendrogram of the 40 photos: the vertical lines indicate which clusters are connected, and the length of the horizontal lines indicates the distance between the two connected clusters.
As Fig. 6 shows, photo2 and photo14 are the closest pair (distance < 1), so they are merged first: in the dendrogram, nodes photo2 and photo14 become the children of a new node (photo2, photo14), whose height is set to their distance. Next, the closest pair among the remaining clusters is selected. The distance between (photo2, photo14) and photo5 is now the smallest (1), so they are merged into the cluster ((photo2, photo14), photo5); in the dendrogram, nodes (photo2, photo14) and photo5 become the children of this new node, whose height is set to 1. The dendrogram is built in this way until only one cluster is left.
It can be seen intuitively that, to obtain a clustering result, we simply cut the dendrogram with a vertical line. For example, cutting this dendrogram at the red border yields 4 clusters.
HCA is an unsupervised learning method: it builds a classification model in the absence of labels. Its primary assumption is that similarities exist among the data and that these similarities are meaningful, so they can be used to explore the features of the data. Because photos for 2-generation conversation have rarely been studied in China, we used HCA to group the photos into four clusters, each of which should have its own properties. Next, we analyze each cluster in depth to obtain more detailed results.
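The merge-and-cut procedure just described can be traced with a naive from-scratch implementation. This sketch is illustrative only: `ward_merges`, `cut_tree`, and the five 1-D toy points are hypothetical. Heights follow the Lance-Williams update for Ward's method on Euclidean distances, the convention of SciPy's `linkage(method="ward")`; R's `hclust` offers both `ward.D` and `ward.D2`, and which variant produced Fig. 6 is not stated, so absolute heights may differ from the figure.

```python
import numpy as np


def ward_merges(points):
    """Naive Ward agglomeration (O(n^3), fine for 40 photos).

    Returns (cluster_a, cluster_b, height) tuples in merge order; clusters
    are frozensets of point indices.
    """
    P = np.asarray(points, dtype=float).reshape(len(points), -1)
    clusters = {i: frozenset([i]) for i in range(len(P))}
    dist = {(i, j): float(np.linalg.norm(P[i] - P[j]))
            for i in clusters for j in clusters if i < j}
    merges, next_id = [], len(P)
    while len(clusters) > 1:
        (a, b), h = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        na, nb = len(clusters[a]), len(clusters[b])
        merges.append((clusters[a], clusters[b], h))
        # Lance-Williams update: distance from each other cluster c to a ∪ b
        for c in clusters:
            if c in (a, b):
                continue
            nc = len(clusters[c])
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            d2 = ((na + nc) * d_ac**2 + (nb + nc) * d_bc**2 - nc * h**2) / (na + nb + nc)
            dist[(c, next_id)] = float(np.sqrt(max(d2, 0.0)))
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        clusters[next_id] = clusters.pop(a) | clusters.pop(b)
        next_id += 1
    return merges


def cut_tree(merges, n_points, k):
    """Replay the first n_points - k merges to obtain k flat clusters."""
    clusters = {frozenset([i]) for i in range(n_points)}
    for a, b, _ in merges[: n_points - k]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


pts = [[0.0], [0.5], [3.0], [3.4], [10.0]]  # hypothetical toy data
for a, b, h in ward_merges(pts):
    print(sorted(a), sorted(b), round(h, 3))
print(cut_tree(ward_merges(pts), len(pts), 3))  # [[0, 1], [2, 3], [4]]
```

Because Ward heights never decrease from one merge to the next, cutting after the first n - k merges is the same as drawing one cut line across the dendrogram.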

III. RESULTS
The PC scores of the photos in each cluster are plotted in principal component space for comparison. Fig. 7 shows the location of each class in the four-dimensional principal component space, with a confidence interval around a linear regression line. The features of the photos in each cluster can be interpreted as follows:
1) Cluster1: Photos in cluster1 are concentrated in the positive region of PC1 and the negative region of PC3, and are strongly associated with "number of words uttered by participants" and "stress of youth".
2) Cluster2: Photos in cluster2 are concentrated in the positive region of PC1 and the negative region of PC4, and are strongly associated with "number of words uttered by participants" and "positivity of youth".
3) Cluster3: Photos in cluster3 are concentrated in the negative region of PC1, and are strongly associated with "number of words uttered by participants".
4) Cluster4: Photos in cluster4 are concentrated in the positive region of PC4, and are strongly associated with "positivity of youth" for "pulse of elders".
5) Photo38 in cluster2 (Fig. 8): It has the highest PC1 score of all photos, i.e., the strongest association with "number of words uttered by participants".

IV. DISCUSSION
We have shown that PCA is very useful for reducing the dimensionality of the data. We selected four PCs that explain 85.2% of the variance in the data, and used HCA to group the photos into four clusters based on their normalized PC scores. Fig. 9 maps each cluster and photo back to the original factor data, and Table IV uses this to explain each photo cluster in more detail:
1) Cluster1: It is strongly associated with "number of words uttered by participants" and "stress of youth". In cluster1, every photo has a below-average "number of words uttered by the elderly" and an above-average "stress of youth".
2) Cluster2: It is strongly associated with "number of words uttered by participants" and "positivity of youth" for "pulse of elderly". In cluster2, almost all photos have an above-average "number of words uttered by youth" and an above-average "positivity of youth".
3) Cluster3: It is strongly associated with "number of words uttered by participants". In cluster3, almost all photos have an above-average "number of words uttered by elders", whereas all have a below-average "number of words uttered by youth".
4) Cluster4: It is strongly associated with "positivity of youth" for "pulse of elders". In cluster4, almost all photos have a below-average "positivity of youth" and a below-average "pulse of elderly".
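The "all" or "almost all ... above average" statements can be quantified by comparing each photo's raw factor value with the all-photo mean and counting, per cluster, the share of photos above it. This is a minimal sketch under that reading of "average"; `share_above_mean` is a hypothetical helper, and the two-factor toy table is invented (it mimics the cluster1 pattern of fewer elder words and higher youth stress).

```python
import numpy as np


def share_above_mean(X, labels, factor_names):
    """For each cluster, the fraction of its photos whose raw factor value
    lies above the all-photo mean of that factor."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall = X.mean(axis=0)  # mean of each factor over all photos
    return {
        int(k): dict(zip(factor_names, (X[labels == k] > overall).mean(axis=0).tolist()))
        for k in np.unique(labels)
    }


# Invented values: columns are words uttered by elders and stress of youth.
toy_X = [[10, 6], [20, 7], [40, 2], [50, 3]]
toy_labels = [1, 1, 2, 2]
print(share_above_mean(toy_X, toy_labels, ("words_elder", "stress_youth")))
```

A share of 1.0 corresponds to "all photos above average" and 0.0 to "all below average"; values close to either end correspond to "almost all".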

V. CONCLUSION
In this study, we described an emotion recognition method based on sentiment analysis of Chinese conversation texts, which obtains the positive probability of speakers towards the photos during conversation. The per-photo averages of the sentiment analysis results, number of words, pulse, and stress were analyzed by PCA to obtain four PCs. HCA of the normalized scores of these PCs yielded four photo clusters with different features, from which we draw the following conclusions:
• Photos in cluster2 are optimal only for the youth: they make the youth speak more and feel positive. Photo38 made the youth speak the most and, conversely, the elders speak the least of all photos.
• Photos in cluster3 make the elders speak more but, conversely, make the youth speak little.
• Photos in cluster1 make the elders speak little and put the youth under stress.
• Photos in cluster4 do not make the youth feel positive and have no appeal to the elders.
The above analysis shows that combining PCA with HCA is a feasible approach to photo classification: it gives good results and lets researchers quickly find the differences in influencing factors between photos, thus improving classification quality. This paper is the first to analyze and classify photos for 2-generation conversation in China and to describe their features. However, in this study we did not find a photo cluster that is optimal for both the Chinese elderly and young people. Even so, we believe that our empirical study is a step forward in investigating the effectiveness of photos for 2-generation conversation, which can help the Chinese elderly improve their social communication and maintain their cognitive function. As the work deepens and the number of photo samples increases, we expect that a classification system for photos can be built on the method of this paper.

VI. FUTURE WORKS
We should continue our experiments with more photos to find photos optimal for both the Chinese elderly and young people.
Sentiment analysis considers only the conversation text and ignores information about the speaker. Such information would be very useful for classifying photos, mainly because different speakers may have different preferences for the same photo. For example, for photo5, "Yoga" (Fig. 10), the elderly women were concerned that "it is difficult for the elderly", while the elderly men were concerned with the "advantages of exercise". This study divided the participants only into elders and youth; their information could be expanded to include gender, education level, and birthplace. Next, we will consider how to introduce such information into our study to further improve the classification accuracy of the photos.

Fig. 2. Experimental setting. A pair of participants in the front row talk about the projected photo.

Fig. 3. Fingertip oximeter used to record the pulse of the elderly participants.

Fig. 4. Stress Check Sheet for the young participants. 1 is the lowest and 7 is the highest stress level.

Fig. 6. Cluster dendrogram: clustering of normalized principal component scores by Ward's method. The tree is cut into 4 clusters (red border).

Fig. 7. Location classes in four-dimensional principal component space with confidence interval around a linear regression line.
Fig. 8.

Fig. 9. Factors data for each cluster and photo.

TABLE I. PARTICIPANTS AND SESSIONS FOR EACH PHOTO

TABLE II. EVALUATION FACTORS FOR PHOTOS

TABLE III. THE VALUES OF PC1-PC6

TABLE IV. FEATURES OF EACH PHOTO CLUSTER