
Almighty Google knows almost EVERYTHING! (Big-data & Complex Networks)


Academic year: 2021


(1)

“Almighty” Google knows almost EVERYTHING!

(Big-data & Complex Networks)

Hawoong Jeong

(Dept. of Physics, KAIST, KOREA)

D. Kim, J. Yun, H. Soh (KAIST), S.H. Lee (Oxford), P.J. Kim (APCTP), Y.Y. Ahn (Indiana U)

Big data: The next Google

What will happen in the next 10 years? “Integration of the worlds of matter and information”

• ELECTRONIC PAPER
• HAPTICS
• VIDEO VISORS
• PRODUCTS WITH MEMORIES
• AUTONOMOUS ROBOTS
• GENETIC INFORMATION
• OPEN CONTENT MANAGEMENT
• THREE-DIMENSIONAL ENVIRONMENTS
• THE SEMANTIC WEB
• BETTER BROWSERS

(2)

It’s BIG

ONLY BIG?

• Definition of BIG DATA by Gartner Inc. = 3V

• Volume: really big in size

–Translation: Kor.Eng. << Kor.JpnEng.

• Velocity: fast & real-time analysis

–Twitter/Facebook: content is created & diffused fast

• Variety: various formats, no fixed format

(3)

Importance of unstructured data…

• Most newly generated data (>90%) is unstructured: video, music, messages, social media, geo-location, etc.
• Not simple text or numbers anymore!!!
• Needs new analysis techniques as well as new ways of storage [e.g. CCTV recording]

2007 Presidential Election

in Korea via Google

• Pearson corr. R~0.98796
• Dong-A Newspaper survey R~0.98598

[Figure: votes (millions) vs. Google hits (millions) for the candidates M. Lee, Jeong, H. Lee, Moon]

http://openlook.org/blog/2007/12/21/cb-1195/
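The Pearson coefficient quoted for votes vs. Google hits can be computed as in the minimal sketch below. The numbers in the example are hypothetical placeholders, not the 2007 election data, and the function name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical numbers for illustration only (NOT the real election data).
google_hits_millions = [1.0, 2.0, 3.0, 4.0]
votes_millions = [2.1, 3.9, 6.2, 7.8]
r = pearson_r(google_hits_millions, votes_millions)
print(f"R ~ {r:.5f}")
```

A value of R close to 1 means the candidates' vote counts rise almost linearly with their hit counts; with only a handful of candidates, such a high R should be read with caution.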

(4)

US presidential candidates (2008/4)

Obama McCain Hilary

Seoul Mayor (2011)

Na vs Park

(5)

46.6Million

As of 2011/10/25 23:15 One day before the election

54.3Million

Ask Google for help?!?

Final result? Na : Park = 46.2 : 53.4

As of 11/4 10:33PM

(6)


US 2012 Presidential Election


Result: 50 % vs 48%

(7)

This year in Korea? Park vs Moon

People don’t lie when they SEARCH!

Search interest in weight loss peaks every January!

(8)

Google claims that they can do better!

In flu season, the number of searches for flu-related keywords increases! (Obviously! You don’t??)

Q: How many flu patients are out there?

Why interested? If it increases rapidly, outbreak! The CDC (Centers for Disease Control and Prevention) collects the % of patients with flu-like symptoms from doctors in each state!

→ Collecting the reports & compiling the statistics takes two weeks!

Nature (2008)

Find the best set of keywords for flu!

• Comparing old (useless) CDC data from 2003~2007 with Google’s search history, pick 50 keywords

→ real-time prediction of the # of flu patients in 2008 (2 weeks faster than the CDC, with geo-location)
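The keyword-selection idea can be sketched roughly as follows: rank candidate keywords by how well their past search volume tracks the historical CDC series, keep the top k, and fit a straight line from the combined volume to the CDC numbers. This is a simplified illustration, not Google's actual method; all data are synthetic, and the names `select_keywords` and `fit_line` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

def select_keywords(search_history, cdc_history, k):
    """Return the k keywords whose search volume correlates best with CDC data."""
    ranked = sorted(search_history,
                    key=lambda kw: pearson_r(search_history[kw], cdc_history),
                    reverse=True)
    return ranked[:k]

def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
             sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Synthetic weekly data (assumption): % of flu-like visits and search volumes.
cdc_history = [1.0, 2.0, 4.0, 3.0, 1.5]
search_history = {
    "flu symptoms": [10, 21, 39, 31, 16],
    "fever":        [12, 18, 35, 33, 14],
    "weight loss":  [30, 10, 12, 11, 29],
}

best = select_keywords(search_history, cdc_history, k=2)
combined = [sum(search_history[kw][t] for kw in best)
            for t in range(len(cdc_history))]
slope, intercept = fit_line(combined, cdc_history)

# "Real-time" estimate from this week's (hypothetical) combined search volume.
this_week_volume = 60
estimate = slope * this_week_volume + intercept
```

Once the line is fitted on old data, the estimate needs only the current search volume, which is why it can run ahead of the two-week reporting delay.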

(9)

BUT, in 2013, Google Flu fails!!

Disruptions: Data Without Context Tells a Misleading Story

Why?? Media!! (The vaccine shortage in Jan. 2013 flooded the news.)

Chanel Louis Vuitton Gucci

Then what???

Don’t forget the Power of Networks!

(10)

in Google… full of data!!!

• Google N-gram Project!

[Figure labels: Civil War (1861-5); civil rights movement (1955-1968)]

J.-B. Michel et al., Science (2011)

Grammar Correction via big-data

(11)

We have to combine Data

together with networks!

SNS = Data + Networks

• Twitter between politicians

“How do they communicate…

Clustered…

(12)

Social Network with Google???

S. H. Lee, P.-J. Kim, Y.-Y. Ahn, H. Jeong "Googling social interactions: Web search engine based social network construction", PLoS ONE e11233 (2010)

(13)

ESHIA Winter Workshop Speakers

Basic idea: using a search engine to find something…

Laszlo Barabasi’s Google Hits

(14)

To make a network, we need “link” information between 2 persons… HOW?

Ask Google! Search for “Laszlo Barabasi” “Hawoong Jeong” together.

Basic idea: constructing weighted social networks by using web search engines

• Laszlo Barabasi and Hawoong Jeong: wBJ = 8530 (Google correlation)
• Alessandro Vespignani and Hawoong Jeong: wVJ = 2630
• Laszlo Barabasi and Alessandro Vespignani: wBV = 6130
• and so on …
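A minimal sketch of the construction, assuming the pairwise hit counts are already available: every pair of names becomes a link whose weight is the number of hits for the two names searched together. The three weights are the ones quoted on the slide; `lookup_hits` is a hypothetical stand-in for a real search-engine query, which is not shown here.

```python
from itertools import combinations

def build_weighted_network(names, cooccurrence_hits):
    """Return {frozenset({a, b}): weight} for every pair with a positive hit count."""
    edges = {}
    for a, b in combinations(names, 2):
        weight = cooccurrence_hits(a, b)
        if weight > 0:
            edges[frozenset((a, b))] = weight
    return edges

# Hit counts quoted on the slide; a real run would query a search engine instead.
SLIDE_HITS = {
    frozenset(("Laszlo Barabasi", "Hawoong Jeong")): 8530,
    frozenset(("Alessandro Vespignani", "Hawoong Jeong")): 2630,
    frozenset(("Laszlo Barabasi", "Alessandro Vespignani")): 6130,
}

def lookup_hits(a, b):
    """Hypothetical stand-in for 'number of pages mentioning both a and b'."""
    return SLIDE_HITS.get(frozenset((a, b)), 0)

names = ["Laszlo Barabasi", "Hawoong Jeong", "Alessandro Vespignani"]
network = build_weighted_network(names, lookup_hits)

# Node strength: sum of the weights of all links attached to a person.
strength = {name: sum(w for pair, w in network.items() if name in pair)
            for name in names}
```

Using `frozenset` keys keeps the links undirected, so the order in which two names are searched does not matter.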

(15)

ESHIA Winter Workshop Speakers


More familiar example:

Transportation Networks with agents

We suffer from traffic congestion every day… Why? Can we solve the problem? Phys. Rev. Lett. (2008)

(18)

What is important?

• Dynamics of the networks: the topology of the network itself often evolves in time
• Dynamics on the networks: agents are moving on the networks (e.g. a driver wants to find the shortest path: finding the OPTIMAL PATH)

What to optimize? Latency function (like time or cost)

Latency ∝ length × (1/width) × (# of travelers)

Global Optimum vs User Optimum

CSSPL

(19)

Network flow with congestion

Based on the model of Roughgarden & Tardos, 2000

[Figure: network from S (source) to T (target). The cost function on path i is its latency function, labelled with the width of path i, the length of path i, and the # of agents on path i.]

Given a network with many agents going from S (source) to T (target), what will be the optimized distribution of agents for best performance??

Optimizations in physics

• Euler-Lagrange differential equation
• Minimal free energy in thermodynamics
• Fitting experimental DATA with a formula
• Low-temperature behavior of disordered magnets
• …

Centralized control: minimizing global cost → Global Optimization

Decentralized control: each agent minimizes its own personal cost → User Optimization (Nash equilibrium)

(20)

The “Price of Anarchy”

Koutsoupias & Papadimitriou, 1999

Price of Anarchy (Roughgarden & Tardos, 2000):

1 ≤ Price of Anarchy = (total cost of User Optimum) / (total cost of Global Optimum)

• Global Optimum: centralized control, minimizing global cost
• User Optimum: decentralized control, each agent minimizes its own personal cost

“Price we have to pay for not being coordinated by a central agency”
“Price of being selfish”

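Price of Anarchy: Contrived Example

Pigou’s example: congestion-sensitive network

[Figure: two paths from S to T. As the cost formula implies, each agent on path a pays a fixed cost of 10, and each agent on path b pays xb, the number of agents on path b.]

10 agents want to go from S to T. What will be the min total cost, i.e. Global Optimum = ?

If xa = x, then xb = 10 − x, ∴ total cost = 10·x + (10 − x)·(10 − x) = x² − 10x + 100 = (x − 5)² + 75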

(21)

Price of Anarchy: Contrived Example

Global Optimum: xa = xb = 5, total cost = 5×10 + 5×5 = 75, i.e. 75/10 = 7.5$/person on average

BUT the upper agents get envious of people with lower costs!

What will be the User Optimum? (Nash Equilibrium: everyone happy)

(22)

Price of Anarchy: Contrived Example

• Move to the lower path, +1: xa = 5−1, xb = 5+1 (user cost = 5 + 1 < 10)
• again +1: xa = 4−1, xb = 6+1 (user cost = 6 + 1 < 10)
• again +1: xa = 3−1, xb = 7+1 (user cost = 7 + 1 < 10)
• again +1: xa = 2−1, xb = 8+1 (user cost = 8 + 1 < 10)

(24)

Price of Anarchy: Contrived Example

• again +1: xa = 1−1, xb = 9+1
• User Optimum = 10 × 10 = 100 (avg 10$ per traveler)
• Global Optimum = 5×10 + 5×5 = 75 (avg 7.5$ per traveler)
• Price of Anarchy = 100/75 = 4/3! (xa = 5 vs 0, xb = 5 vs 10)

There is a gap between global optimum & user optimum!
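The contrived example can be checked with a short sketch. It assumes, as the total-cost formula implies, that each agent on the upper path pays a fixed 10 and each agent on the lower path pays the number of agents on it. The tie-breaking rule (an agent also moves when the lower path would cost exactly 10) is an assumption chosen to reproduce the last step of the walk-through; the function names are illustrative.

```python
from fractions import Fraction

N_AGENTS = 10
UPPER_COST = 10  # fixed cost per agent on the upper path (path a)

def total_cost(x_upper):
    """Total cost when x_upper agents take the upper path, the rest the lower."""
    x_lower = N_AGENTS - x_upper
    return UPPER_COST * x_upper + x_lower * x_lower

def global_optimum():
    """Centralized control: try every split and keep the cheapest one."""
    best_x = min(range(N_AGENTS + 1), key=total_cost)
    return best_x, total_cost(best_x)

def user_optimum(x_upper=5):
    """Selfish agents: one upper agent at a time moves to the lower path
    as long as its cost there (x_lower + 1) is not above the upper cost."""
    while x_upper > 0 and (N_AGENTS - x_upper) + 1 <= UPPER_COST:
        x_upper -= 1
    return x_upper, total_cost(x_upper)

def price_of_anarchy():
    return Fraction(user_optimum()[1], global_optimum()[1])

print(global_optimum())    # split and total cost under central control
print(user_optimum())      # split and total cost when everyone is selfish
print(price_of_anarchy())  # ratio of the two total costs
```

With a strict inequality the dynamics would stop one step earlier, at xa = 1, where the last upper agent is indifferent; the slide's result corresponds to the tie being broken toward the lower path.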

(25)

More realistic/complex example

• Assumption: traffic reaches equilibrium
• Price of Anarchy in the real world
– the Boston Road Network
– (with real geometrical information like width, length, one-way etc.)
– Traffic from Central Square (S) to Copley Square (T)

Boston Road Map

(26)

Boston Road Network

[Figure: road network with Start and End marked (nodes 59, edges 108, regular-like)]

Latency function = ax + b (figure labels: length, width)

• Global optimum: mapping to the Min-cost Max-flow problem
• User optimum ~ approximate optimum: Metropolis algorithm and annealing method to find the optimum configurations

(27)

[Figure panels: User Optimum and Global Optimum, number of travelers = 1]

[Figure panels: User Optimum and Global Optimum, number of agents = 20]

(28)


Variation of POA with Agent #

Where to use??

• To write a paper …

(29)

Making the network more efficient without central government??

• Lower PoA ~ better(?) system (∵ even w/o central control, user optimum is closer to global optimum, better!)
• Let’s make a better network with lower PoA
– Simple thought (by stupid government): construct more roads with our tax money! → But beware of Braess’s paradox!!! (counter-intuitive consequences)

Braess’s Paradox

Again 10 travelers want to move from S to T.

[Figure: network from S to T with edge costs x, x, 10, 10 and a middle arc of cost 0 (cost-free express road)]

• User Optimum without middle arc = 150 = Global Optimum, PoA = 1
• User Optimum with middle arc = 200
• Price of Anarchy = 200/150 = 4/3
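The numbers 150 and 200 can be reproduced by brute force under an assumed layout: S→A and B→T cost x (the number of travelers on that edge), A→T and S→B cost a fixed 10, and the middle arc A→B costs 0. With integer travelers the game has several equilibria, so this sketch takes the costliest one, as the usual worst-case definition of the Price of Anarchy does. The node names A and B and all function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

N = 10  # travelers going from S to T

def path_costs(n_sat, n_sbt, n_sabt):
    """Per-traveler cost of paths S-A-T, S-B-T and S-A-B-T."""
    load_sa = n_sat + n_sabt  # travelers on edge S->A (cost x)
    load_bt = n_sbt + n_sabt  # travelers on edge B->T (cost x)
    return (load_sa + 10, 10 + load_bt, load_sa + 0 + load_bt)

def total_cost(config):
    return sum(n * c for n, c in zip(config, path_costs(*config)))

def is_nash(config, allowed):
    """True if no single traveler can strictly lower its cost by switching path."""
    costs = path_costs(*config)
    for i in allowed:
        if config[i] == 0:
            continue
        for j in allowed:
            if j == i:
                continue
            moved = list(config)
            moved[i] -= 1
            moved[j] += 1
            if path_costs(*moved)[j] < costs[i]:
                return False
    return True

def analyse(with_middle_arc):
    """Return (global optimum, costliest user optimum, price of anarchy)."""
    allowed = (0, 1, 2) if with_middle_arc else (0, 1)
    configs = [c for c in product(range(N + 1), repeat=3)
               if sum(c) == N and (with_middle_arc or c[2] == 0)]
    optimum = min(total_cost(c) for c in configs)
    worst = max(total_cost(c) for c in configs if is_nash(c, allowed))
    return optimum, worst, Fraction(worst, optimum)

print(analyse(with_middle_arc=False))
print(analyse(with_middle_arc=True))
```

Adding the free express road leaves the global optimum unchanged but makes the selfish outcome worse, which is the paradox.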

(30)

In real Boston Road Network?

[Figure: Boston road network with Start and End marked]

(31)

London

NEW YORK

Quantifying Trading Behavior in Financial Markets Using Google Trends

(32)

How to buy and sell?

• Data
– Dow Jones Industrial Average (DJIA) p(t)
• from 5 January 2004 until 22 February 2011
– a set of 98 search terms related to the stock market
• Google Trends from 2004 to 2011 (weekly)
• n(t): how many searches in week t
• Δn(t, Δt) = n(t) − N(t − 1, Δt), with N(t − 1, Δt) = (n(t − 1) + n(t − 2) + … + n(t − Δt))/Δt

If Δn(t − 1, Δt) > 0, sell at p(t) at t and buy at p(t+1) at t+1

If Δn(t − 1, Δt) < 0, buy at p(t) at t and sell at p(t+1) at t+1

(i.e. if search volume is ↑ (↓), stock price will be ↓ (↑))

Using the word “Debt”

Best performance 326% (Δt = 3 weeks)

(33)

If search volume increases (decreases) compared to the last 3 weeks’ avg:

• Sell (Buy) on Monday
• Buy (Sell) on the next Monday

i.e. stock price will go down (up)
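The trading rule can be written down as a small sketch. It assumes weekly lists `prices` (p(t)) and `searches` (n(t)) aligned by week, and it accumulates log returns, which is one common bookkeeping choice and an assumption here. The series in the example are hypothetical, not DJIA or Google Trends data.

```python
import math

def relative_change(n, t, dt):
    """Δn(t, Δt) = n(t) − N(t−1, Δt), where N is the mean of the dt previous weeks."""
    previous = n[t - dt:t]  # n(t−dt) ... n(t−1)
    return n[t] - sum(previous) / dt

def strategy_log_return(prices, searches, dt=3):
    """Cumulative log return of the search-volume strategy."""
    total = 0.0
    # t needs dt earlier weeks before t−1, and a following week t+1.
    for t in range(dt + 1, len(prices) - 1):
        change = relative_change(searches, t - 1, dt)
        if change > 0:
            # Search volume rose: sell at p(t), buy back at p(t+1).
            total += math.log(prices[t]) - math.log(prices[t + 1])
        elif change < 0:
            # Search volume fell: buy at p(t), sell at p(t+1).
            total += math.log(prices[t + 1]) - math.log(prices[t])
    return total

# Hypothetical weekly series for illustration only.
prices = [100, 101, 99, 102, 98, 97, 103, 104]
searches = [50, 52, 51, 49, 60, 70, 40, 45]
log_return = strategy_log_return(prices, searches, dt=3)
print(f"cumulative return: {math.exp(log_return) - 1:+.1%}")
```

A week contributes a gain whenever the price moves against the direction of the search-volume change, and a loss otherwise.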

(34)

[Figure panels: when search volume decreases; when search volume increases]

(35)

In Korea???

(by MoneyWeek mag.) Google search volume is too small, so the Korean portal Naver is used

Search vs KOSPI/KOSDAQ

Stock (0.67) Fund (0.77) Real estate (-0.64) region (-0.71)

(36)

Conclusion Summary

• Google knows many things and can also predict many things…
• We can also do many things using Google with big-data…
• Especially, combining big-data & networks can do better…
• Dynamics on complex networks is fun!
• Which direction are we going ???
–ASK GOOGLE itself!

Finally
