
2019 2nd International Conference on Informatics, Control and Automation (ICA 2019) ISBN: 978-1-60595-637-4

Prediction of Language Users Based on Multi-factor Analysis

Yuan SHEN¹, Yin LU²,³,* and Fang-xin WANG⁴

¹ Bell Honors School, Nanjing University of Posts and Telecommunications, China

² Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, China

³ Engineering Research Center of Health Service System Based on Ubiquitous Wireless Networks, Nanjing University of Posts and Telecommunications, Ministry of Education, China

⁴ School of Computer Science, Nanjing University of Posts and Telecommunications, China

*Corresponding author

Keywords: Language Users, Grey Markov Prediction, Analytic Hierarchy Process.

Abstract. Predicting the number of speakers of different languages over the next 50 years has far-reaching significance for social development and cultural exchange, and it also supports the popularization of common languages and the protection of endangered ones. To predict the number of speakers of each language, this paper establishes a Grey Markov Prediction Model and uses the Analytic Hierarchy Process to correct its results according to five factors: political and economic strength, population size, language audience, immigration rate, and the use of electronic communication and social media. The model predicts that after 50 years English, Hindi, and Chinese will be the three most used languages. The prediction and test results show that the corrected results are more accurate and closer to the actual development of society.

Introduction

With social progress and cultural integration, the "global village" is taking shape ever more clearly. To keep pace with the times, countries need to communicate with each other and exchange what they have. Predicting the number of speakers of different languages over the next 50 years can not only indicate the direction of future language learning and help popularize widely used languages, but also identify endangered languages early, so that measures can be taken in advance to protect this precious cultural heritage.

refers to people's basic opinions and propositions on a certain language, which is of great significance to the study of language selection tendency. Literature [8] finds that a guardian's use of Chinese to educate children is a key factor in children choosing to use Chinese, and that children's language habits are more susceptible to the influence of the environment. Literature [9] holds that language, as the main means of communication between countries, directly affects the growth of international trade and economic development. From the perspective of national development, literature [10] holds that individual multilingual ability has become an important skill for integrating into the world: the more languages citizens master, the stronger the national language ability will be.

Based on the Gray Markov Prediction Model [11], this paper puts forward a method that uses the Analytic Hierarchy Process to correct the predicted results. The method has low computational complexity and yields a more accurate prediction. Our work provides a reference for population prediction research in sociology and other fields.

The structure of this paper is as follows: the first part introduces the preliminary prediction model of the system, the second part proposes the modified model, the third part conducts simulation and analysis of the results and the fourth part summarizes the whole paper.

System Model

In order to obtain the optimal solution, this paper ignores the influence of force majeure and assumes that the number of speakers of each language changes relatively independently. Because little data on the world's language use is available, and to ensure the accuracy of the results, this paper applies the Gray Markov Prediction Model to the existing data to analyze and predict the number of speakers of each language in the next 50 years. The specific steps of the Gray Markov Prediction Model [11] are as follows:

Step 1: Based on the variation law of the gray time series, the original data sequence $x^{(0)}(n)$ is established from the existing data on the number of speakers of each language, where $n$ indexes the data of the $n$th year.

Step 2: Generate a new data sequence $x^{(1)}$ as the accumulated generating sequence of the original data $x^{(0)}$, namely $x^{(1)} = \{x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\}$, where

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i).$$

Step 3: The differential equation

$$\frac{\mathrm{d}x^{(1)}}{\mathrm{d}t} + a x^{(1)} = u$$

is established for the once-accumulated sequence, where $a$ and $u$ are undetermined parameters solved by the least squares method. With the initial condition $\hat{x}^{(1)}(1) = x^{(0)}(1)$, the time response formula is obtained as

$$\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{u}{a}\right) e^{-ak} + \frac{u}{a}, \qquad \hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k), \tag{1}$$

where the hat marks fitted values.

Step 4: Calculate the relative error between the time response equation (1) and the original data sequence, and divide the error sequence into $m$ states, denoted $Q_1, Q_2, \ldots, Q_m$.

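Step 5: If the probability of changing from state $Q_i$ to state $Q_j$ in $k$ steps is denoted $p_{ij}^{(k)}$, then the state transition probability is

$$p_{ij}^{(k)} = \frac{g_{ij}^{(k)}}{G_i},$$

where $g_{ij}^{(k)}$ is the number of $k$-step changes from state $Q_i$ to state $Q_j$, and $G_i$ is the number of occurrences of state $Q_i$. Therefore, the Markov transition probability matrix is

$$P^{(k)} = \begin{pmatrix} p_{11}^{(k)} & p_{12}^{(k)} & \cdots & p_{1m}^{(k)} \\ p_{21}^{(k)} & p_{22}^{(k)} & \cdots & p_{2m}^{(k)} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1}^{(k)} & p_{m2}^{(k)} & \cdots & p_{mm}^{(k)} \end{pmatrix}. \tag{2}$$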

Step 6: The predicted value at step $k$ of the prediction model is obtained by summing the midpoints of the states weighted by the corresponding transition probabilities:

$$x^{(1)}(k) = \frac{1}{2} \sum_{i=1}^{m} \left(Q_{1i} + Q_{2i}\right) p_{ij}^{(k)}. \tag{3}$$

Here, $Q_{1i}$ and $Q_{2i}$ are respectively the upper and lower limits of state $Q_i$. According to equation (3), the predicted number of future users of a language is obtained as

$$y(t) = \frac{x^{(0)}(t)}{1 - x^{(1)}(k)}, \quad t = 1, 2, \ldots, n. \tag{4}$$
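The six steps above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names, the background-value construction used in the least-squares fit, the error-state boundaries, and the sample series are all assumptions made for the example. It also assumes that the relative error is defined as (actual - fitted) / actual, and it reads the sum in equation (3) as running over the destination states of the current state.

```python
import math


def gm11_fit(x0):
    """Estimate (a, u) of dx1/dt + a*x1 = u by least squares (Steps 2-3)."""
    # Accumulated generating sequence: x1(k) = x0(1) + ... + x0(k).
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Assumed discretisation: x0(k) = -a * z(k) + u, with z(k) the mean of
    # two consecutive accumulated values (the usual GM(1,1) background value).
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (n * szy - sz * sy) / (n * szz - sz * sz)
    return -slope, (sy - slope * sz) / n  # a, u


def gm11_value(x0, a, u, k):
    """Fitted or forecast value of the original series at 1-based index k."""
    def x1_hat(j):
        return (x0[0] - u / a) * math.exp(-a * (j - 1)) + u / a
    return x0[0] if k == 1 else x1_hat(k) - x1_hat(k - 1)


def state_of(err, bounds):
    """Index of the interval [bounds[i], bounds[i+1]) holding err (clamped)."""
    for i in range(len(bounds) - 1):
        if err < bounds[i + 1]:
            return i
    return len(bounds) - 2


def transition_matrix(states, m):
    """One-step probabilities p_ij = g_ij / G_i (Step 5).

    G_i is counted over the occurrences of state i that have a successor.
    """
    counts = [[0] * m for _ in range(m)]
    for s, t in zip(states, states[1:]):
        counts[s][t] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def markov_correct(x_hat, state, P, bounds):
    """Correct a grey forecast with the expected relative error (Step 6)."""
    mids = [0.5 * (bounds[j] + bounds[j + 1]) for j in range(len(bounds) - 1)]
    expected_err = sum(mid * p for mid, p in zip(mids, P[state]))
    # Assumed convention: error = (actual - fitted) / actual.
    return x_hat / (1.0 - expected_err)


# Hypothetical speaker counts (millions), one value per period.
x0 = [100.0, 104.0, 109.0, 113.0, 118.0, 124.0]
a, u = gm11_fit(x0)
fitted = [gm11_value(x0, a, u, k) for k in range(1, len(x0) + 1)]
errors = [(act - fit) / act for act, fit in zip(x0, fitted)]
bounds = [-0.02, 0.0, 0.02]  # two illustrative error states
states = [state_of(e, bounds) for e in errors]
P = transition_matrix(states, len(bounds) - 1)
forecast = gm11_value(x0, a, u, len(x0) + 1)
print(round(markov_correct(forecast, states[-1], P, bounds), 2))
```

With so short a series the transition matrix is estimated from very few transitions, so the correction here only illustrates the mechanics. A real application would need a longer history and state boundaries chosen from the observed error range.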

Modified Model

The above model is a representative prediction model, but it ignores the influence of many external factors such as politics and economy. Therefore, on the basis of the Gray Markov Model, this paper uses the Analytic Hierarchy Process to modify the results.

As time goes by, the global total population follows the logistic model of retarded population growth [14]; the trend is shown in Fig. 1. The future population is therefore a quantifiable indicator. Assuming that the maximum number of people that the earth can support is $R$ and the coefficient of population growth is $r$, the total population of the earth can be expressed as

$$N(t) = \frac{R}{1 + C e^{-r(t - t_0)}}, \tag{5}$$

where $C$ is the change coefficient and $N(t)$ is the total population of the earth in year $t$.

Figure 1. Schematic Diagram of Logistic Population Growth Model.
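Equation (5) can be evaluated directly. The sketch below is only an illustration: the parameter values are hypothetical and are not taken from the paper, and $t_0$ is assumed to be a reference year at which the population is known, so that $C$ can be fixed from $N(t_0)$.

```python
import math


def logistic_population(t, R, C, r, t0):
    """Equation (5): N(t) = R / (1 + C * exp(-r * (t - t0)))."""
    return R / (1.0 + C * math.exp(-r * (t - t0)))


# Hypothetical parameters, for illustration only (populations in millions).
R = 11000.0        # assumed carrying capacity
N0 = 7700.0        # assumed population in the reference year t0
t0 = 2019          # assumed reference year
r = 0.03           # assumed growth coefficient
C = R / N0 - 1.0   # chosen so that N(t0) equals N0

for year in (2019, 2044, 2069):
    print(year, round(logistic_population(year, R, C, r, t0)))
```

Fixing $C$ from a known population is one way to calibrate the curve; $R$ and $r$ would normally be fitted to historical data.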

A review of the data and literature shows that the factors with a great impact on the number of language users include the political and economic strength of each region [12][13], the population [14], the language audience [15], the immigration rate [16], and the coverage of electronic communication and social media. Based on the Analytic Hierarchy Process, we score these five factors according to their influence on the number of speakers. Official or internationally used languages easily gain advantages in popularization and teaching [6], while languages with lower political status are at a relative disadvantage. Economic strength also has a substantial impact on language status. According to the rankings of GDP and comprehensive national strength, we put political and economic strength in first place and give it the largest weight, 9 points. Over a given period, the future number of speakers of a language is closely related to the current population: countries with a large population base will also have more native speakers in the future. Therefore, we set the weight of the population factor to 7 points. The spread of education in a country determines the audience of different languages; within a few years, the most popular language will be spoken by most people in the country, so we give this factor a weight of 5 points. A change in the immigration rate inevitably affects the use of the local language, but this environmental influence is weak, so its weight is set to 3 points. With the development of electronic communication and social media, people have more and more opportunities to communicate with the outside world, but they cannot fundamentally master a language this way. Therefore, this factor has the lowest weight, 1 point.

Considering the above factors, we can obtain the following weight matrix:

$$A = \begin{pmatrix} 1 & 3 & 5 & 7 & 9 \\ 1/3 & 1 & 4 & 6 & 8 \\ 1/5 & 1/4 & 1 & 5 & 7 \\ 1/7 & 1/6 & 1/5 & 1 & 3 \\ 1/9 & 1/8 & 1/7 & 1/3 & 1 \end{pmatrix}. \tag{6}$$

By using the function $B_{ij} = A_{ij} / \sum_{i} A_{ij}$ to normalize each column of the matrix, the feature vector of the matrix $A$ is obtained:

$$w_1 = (2.398, 1.401, 0.743, 0.299, 0.159)^{T}. \tag{7}$$

By calculation, the maximum eigenvalue of matrix $A$ is $\lambda = 5.43$, the consistency index is $CI = 0.11$, and the consistency ratio is $CR = 0.098 < 0.1$. The matrix therefore passes the consistency test, and the results can be considered reasonable.
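The weighting and the consistency check can be sketched as follows. This is a hedged illustration using the common column-normalization approximation; the text does not state exactly how $w_1$ and $\lambda$ were computed, so the values produced here need not coincide with the reported ones. The function names are hypothetical, and the random index 1.12 is Saaty's tabulated value for a 5 x 5 matrix.

```python
# Pairwise comparison matrix from equation (6).
A = [
    [1, 3, 5, 7, 9],
    [1 / 3, 1, 4, 6, 8],
    [1 / 5, 1 / 4, 1, 5, 7],
    [1 / 7, 1 / 6, 1 / 5, 1, 3],
    [1 / 9, 1 / 8, 1 / 7, 1 / 3, 1],
]

RI_5 = 1.12  # Saaty's random index for n = 5


def ahp_row_sums(A):
    """Normalize each column to sum to 1, then sum each row."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    B = [[A[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return [sum(row) for row in B]


def lambda_max(A, w):
    """Approximate the largest eigenvalue as the mean of (A w)_i / w_i."""
    n = len(A)
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    return sum(Aw[i] / w[i] for i in range(n)) / n


w1 = ahp_row_sums(A)                    # entries sum to n = 5
lam = lambda_max(A, w1)
ci = (lam - len(A)) / (len(A) - 1)      # consistency index
cr = ci / RI_5                          # consistency ratio
print([round(v, 3) for v in w1], round(lam, 3), round(ci, 3), round(cr, 3))
```

Because the consistency ratio of this matrix lies close to the 0.1 threshold, it is worth recomputing it with whichever eigenvalue method is actually adopted.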

Based on this, the total matrix $w_{26\times 5}$ of the five influencing factors is compiled for 26 countries. Multiplying the total matrix by the feature vector gives the proportion of each language in 50 years, $w = w_{26\times 5}\, w_1$. The number of speakers of each language is then calculated from the total population of the world. Finally, the AHP result and the Gray Markov result are each given a weight of 0.5, and their weighted sum is the revised result, representing the number of users of each language in the final prediction.
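The combination step can be illustrated with a short sketch. All numbers below are hypothetical, the function names are invented for the example, and rescaling the AHP scores so that the shares sum to 1 is an assumption, since the text does not say how the proportions are normalized.

```python
def ahp_shares(scores, w1):
    """Score each row (country or language) against the five factor weights
    and rescale so that the shares sum to 1 (assumed normalization)."""
    raw = [sum(s * w for s, w in zip(row, w1)) for row in scores]
    total = sum(raw)
    return [v / total for v in raw]


def combine(markov_pred, shares, world_pop, alpha=0.5):
    """Weighted sum of the AHP-based estimate and the Gray Markov forecast."""
    ahp_pred = [s * world_pop for s in shares]
    return [alpha * a + (1.0 - alpha) * m for a, m in zip(ahp_pred, markov_pred)]


# Hypothetical inputs: three languages, five factor scores each.
w1 = [2.398, 1.401, 0.743, 0.299, 0.159]   # feature vector from equation (7)
scores = [
    [0.9, 0.6, 0.8, 0.5, 0.9],
    [0.7, 0.9, 0.6, 0.2, 0.6],
    [0.4, 0.3, 0.3, 0.3, 0.4],
]
markov_pred = [1500.0, 1400.0, 500.0]      # millions, illustrative only
world_pop = 3600.0                         # millions covered, illustrative only

shares = ahp_shares(scores, w1)
print([round(v) for v in combine(markov_pred, shares, world_pop)])
```

The parameter `alpha` is set to 0.5 to mirror the equal weighting described above; other values would shift the result toward one of the two estimates.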

Simulation and Analysis

In this paper, the Gray Markov Prediction and AHP modified results are simulated and the modified model is tested. The simulation and test results are shown in Fig. 2 and Fig. 3.

Figure 2. Predicted Results of the Number of Speakers of Each Language.

Starting from the traditional prediction model, we corrected its results with AHP, predicted the number of speakers of various languages, and compared the predictions with the current numbers of speakers. The results are shown in Fig. 2. English has the largest number of users and the fastest growth rate, followed by Hindi and Chinese. English, as the most widely spoken official language, is expected to grow rapidly, while Hindi and Chinese will also have relatively large numbers of speakers in the future, owing to the large population bases of India and China and the absence of a birth-control policy in India. In contrast, the number of Arabic speakers decreases slightly, which may be related to slow economic development and population decline in the region.


To check the accuracy of the predictions, we use the same method to predict the current situation from earlier data. Comparing the forecasts with the actual values shows that, after the introduction of AHP, the predictions are more accurate than those of the common forecasting model alone, as shown in Fig. 3.

Figure 3. Comparison of Different Prediction Model’s Results.
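The text does not state which error measure underlies the comparison in Fig. 3. As one possibility, a back-test of this kind can be scored with the mean absolute percentage error; the sketch below uses that measure as an assumption, and every number in it is hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error between two equal-length sequences."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical back-test values (millions of speakers), illustration only.
actual = [1100.0, 600.0, 500.0]
grey_markov_only = [1000.0, 660.0, 450.0]
with_ahp = [1080.0, 620.0, 490.0]

print(round(mape(actual, grey_markov_only), 4), round(mape(actual, with_ahp), 4))
```

A lower value for the corrected model on held-out data would support the claim that the AHP correction improves accuracy.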

According to our forecast, some changes will have taken place after 50 years. Malay, French, and Hausa replace Japanese, Portuguese, and Arabic among the ten languages with the most speakers; the numbers of speakers of the seven other languages change slightly, but they remain in the top ten. For example, English replaces Chinese as the language spoken by the largest number of people in the world. Compared with other algorithms, the complexity of our algorithm is low, only O(nT), and the prediction speed is fast. The algorithm also takes real-world factors into account, which is more reasonable than using the prediction model alone.

Conclusion

This paper first establishes the Gray Markov Prediction Model to predict the number of speakers of different languages in 50 years, and then uses the Analytic Hierarchy Process (AHP) to modify the preliminary results, considering influential factors such as regional political and economic power, population, language audience, immigration rate, and the coverage of electronic communication and social media. The simulation and test results show that considering these external factors yields more accurate forecasts.

After 50 years, English, Hindi, and Chinese are predicted to be the three languages spoken by the largest number of people, and Malay, French, and Hausa replace Portuguese, Japanese, and Arabic among the top ten languages by number of speakers. These predictions offer guidance for the future direction of language education and can even identify endangered languages, reminding people to take protective measures, which is of great significance for social development. In future work, other important factors can also be taken into account to further improve the forecast.

References

[1] Z. Boliang, C. Ruiyi, Z. Yilin and X. Wendi, "Language trend prediction based on adaptive composite language network," 2018 33rd Youth Academic Annual Conference of Chinese Association of Automation (YAC), Nanjing, 2018, pp. 862-868.

[2] Y. Huang, H. Ying, M. Chen, Q. Yu, M. Jiang and H. Li, "Location Model Based on Population Distribution of Language," 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, 2018, pp. 2227-2230.

[4] Y. Gandica, "Population Preferences Through Wikipedia Edits," 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), Valencia, 2018, pp. 278-283.

[5] S. L. Pucci, "Bilingualism and the Lifespan: Young Adult Heritage Speakers of Spanish," 2018 5th NAFOSTED Conference on Information and Computer Science (NICS), Ho Chi Minh City, 2018, pp. 222-226.

[6] Xiaole Ma. The national significance and the characteristics of the times of language and culture communication [J]. Dongyue review, 2018, 39(02):133-141.

[7] Hua Liu. A survey of language attitudes towards Chinese language programming in southeast Asia [J]. Language application, 2018(02):11-19.

[8] Jinfeng Li. Language attitude, language environment and language use of rural preschool left-behind children [J]. Language application, 2017(01):23-31.

[9] Lifei Wang. Measurement and influence of language barriers in China's foreign trade: analysis of gravity model [J]. Foreign language teaching, 2018, 39(01):14-18.

[10] Manchun Dai. Individual multilingual competence from the perspective of national language competence building [J]. Language application, 2018(01):2-11.

[11] Huanzhen Chen, Grain Production Prediction in Qingdao City Based on Grey Markov Model[D], Qingdao, Shandong, Qingdao Polytechnic University School of Architecture, 266033, 2013.5.

[12] Jianakoplos N A, Bischoff C W, Mankiw N G. For Use with Macroeconomics, fifth edition [M]. America. Worth Publishers. 2003:55-57.

[13] Songfen Wang, Comparative study on comprehensive strength of major countries in the world [M]. Hu Nan. Hu Nan Press. 1996:98-101.

[14] Thomas Malthus. An Essay on the Principle of Population [M]. St. Paul's Church-Yard, London. 1798.

[15] Jay McInerney, The Good Life [M], Seattle, USA. Bloomsbury, 2006:102-105.

[16] Liu Miao, Huiyao Wang, International Migration of China [M]. China. Springer Group. 2017:35-37.

[17] Xin Zhang, Prediction of annual precipitation based on improved gray markov model [J]. Mathematical practice and understanding, 2011, 41(11):51-57.

[18] Debao Gao. Construction and application of population prediction optimization model in shenzhen [J]. Journal of heilongjiang bayi agricultural university, 2017, 29(01):140-143.

[19] Jiabao Zhu. A study on the selection of provincial population temporal prediction model [D]. Dongbei university of finance and economics, 2013.
