Study of the Application of Disagreement-based Collaborative Regression in Log Interpretation


2018 International Conference on Computational, Modeling, Simulation and Mathematical Statistics (CMSMS 2018)
ISBN: 978-1-60595-562-9


Yu-ze ZHENG1, Zhao-hui YE1 and Cong-hui ZHANG2

1 Department of Automation, Tsinghua University, Beijing 100084, China

2 China Oilfield Services Limited, Sanhe, Hebei 065201, China

Keywords: Semi-supervised learning, Disagreement-based, Collaborative regression, Log interpretation.

Abstract. In the field of log interpretation, it is easy to acquire a large amount of data, but obtaining label information is costly, so labeled samples are often insufficient. The secondary interpretation of porosity is a typical application task with few labeled samples and a large number of unlabeled samples. Manual interpretation has the disadvantages of strong subjectivity and low accuracy. A disagreement-based, co-training style semi-supervised regression algorithm is proposed as an alternative to manual interpretation. Two kNN regressors with disagreement are employed. Each of them labels unlabeled samples with a high confidence level for the other, in order to improve the regression estimates. The method was verified through experiments, which showed that the generalization performance of this semi-supervised model is better than that of the supervised models in such cases.

Introduction

In the process of petroleum exploration [1], special instruments based on acoustic waves, electricity, and radioactivity are used to measure various parameters of the stratum at different depths in the well, and the parameters are then analyzed. This is called log interpretation. Because reservoir resources are generally distributed in interconnected underground pores, cracks, or rocks, the prediction of porosity is important in log interpretation. The prediction of porosity includes primary interpretation and secondary interpretation. Traditionally, porosity is calculated from the acquired data in primary interpretation, based on a response equation. Secondary interpretation, however, needs to be carried out through core analysis. Core analysis is the process of measuring the actual porosity of rock samples taken at certain depths underground and then correcting the primary interpretation result with these samples. In actual production, the result of primary interpretation is often inaccurate, and core analysis is costly. Moreover, manual secondary interpretation is highly subjective and places high demands on technicians.

In fact, the analysis of a well can be carried out with the aid of existing historical data from other wells in the same area. Artificial intelligence technology can independently discover and learn rules from existing historical data and predict the output for new samples. Its way of processing data is completely different from traditional theory. Many scholars have applied artificial intelligence technology to log interpretation [2][3][4], but most of these are supervised methods. In the log interpretation industry, the cost of obtaining label information is usually high, so it is often hard to get many labeled samples. Therefore, the accuracy of these supervised learning methods is often not high.

Because core analysis can only be carried out at some depths, the secondary interpretation of porosity is a typical application in which the dataset consists of a small number of labeled data and a large number of unlabeled data. Experience shows that supervised methods generally tend to overfit when labeled samples are scarce, while semi-supervised methods can make use of unlabeled samples and perform better. At present, there are few studies on the application of semi-supervised learning in the log interpretation industry. In this paper, we propose to apply a semi-supervised method, disagreement-based collaborative regression, to the secondary interpretation of porosity. Experiments with actual production data from China Oilfield Services Limited show that collaborative regression is superior to the other methods in this application.

Semi-supervised Learning

With the development of modern information technology, it is usually easy to acquire a large number of unlabeled samples in many fields, but obtaining the label information has a cost [5]. The generalization performance of a supervised learning method is limited by the number of labeled samples, and if only unsupervised learning is adopted, the value of the labeled samples is wasted. A semi-supervised learning method can make use of both labeled and unlabeled samples [6].

At present, the most popular semi-supervised learning method is disagreement-based collaborative learning, which takes advantage of the differences between multiple classifiers or regressors to make use of unlabeled samples. It relies on few assumptions, is simple and effective, and has a wide application scope, so it is currently the mainstream approach in semi-supervised learning. Figure 1 is a schematic diagram of disagreement-based collaborative learning [7].

For the application of disagreement-based collaborative learning to regression problems, Zhou and Li [8] proposed that the authenticity of the sample labels estimated by a regressor can be determined by examining the confidence level of the samples, and that samples with a high confidence level are consistent with the trend of the regression. In this paper, a disagreement-based collaborative learning method is designed for the secondary interpretation of porosity, which is detailed in the next section.

Figure 1. Schematic diagram of the disagreement-based collaborative learning.

Secondary Interpretation of Porosity Based on Disagreement-based Collaborative Learning

Traditional supervised methods only utilize labeled samples. This algorithm mainly addresses the problem of how to make use of unlabeled samples to improve generalization performance when labeled samples are insufficient.

Let L = {(x1, y1), (x2, y2), ..., (xn, yn)} denote the labeled sample set, and U = {x1', x2', ..., xn'} denote the unlabeled sample set. Disagreement-based collaborative regression utilizes the sets L and U to train a regressor f: X → Y. The process of the algorithm is designed as follows.

Initialize the Data Set

Randomly pick N_L samples from the labeled dataset to form the set L used for training; the remaining data is retained as the test set. Then randomly pick N_U samples from the unlabeled dataset to form the set U.
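The initialization step can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `initialize_data`, the fixed `seed`, and the toy data are all assumptions introduced here.

```python
import random

def initialize_data(labeled_data, unlabeled_data, n_l, n_u, seed=0):
    """Split the labeled data into training set L and a test set, then draw U."""
    rng = random.Random(seed)            # fixed seed keeps the split reproducible
    shuffled = list(labeled_data)
    rng.shuffle(shuffled)
    train_l = shuffled[:n_l]             # set L used for training
    test_set = shuffled[n_l:]            # remaining labeled data kept for testing
    # Set U: n_u unlabeled samples drawn without replacement.
    pool_u = rng.sample(list(unlabeled_data), n_u)
    return train_l, test_set, pool_u

# Hypothetical toy data: 10 labeled (features, label) pairs, 50 unlabeled vectors.
labeled = [((i / 10.0,), float(i)) for i in range(10)]
unlabeled = [(i / 50.0,) for i in range(50)]
L, test, U = initialize_data(labeled, unlabeled, n_l=5, n_u=20)
```

Shuffling a copy leaves the caller's data untouched, and every labeled sample ends up in exactly one of L or the test set.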

Configure Regressor

In this paper, the kNN regressor, which is simple but effective, is used as the base learner. The configuration of the regressor includes determining the neighbor number k and the distance metric. For an unlabeled sample Xu, its k nearest neighbors in the labeled set under the defined distance metric are X1, X2, ..., Xk. Suppose their labels are Y1, Y2, ..., Yk; then the label of Xu is estimated to be:

$$Y_u = \frac{Y_1 + Y_2 + \cdots + Y_k}{k} \tag{1}$$
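A minimal sketch of the estimate in Eq. (1), assuming Euclidean distance and labeled samples stored as (features, label) pairs. The function name `knn_predict` and the toy values are illustrative only.

```python
import math

def knn_predict(labeled, x_u, k=3):
    """Estimate the label of x_u as the mean label of its k nearest labeled samples."""
    # Sort the (features, label) pairs by Euclidean distance to x_u, keep the first k.
    nearest = sorted(labeled, key=lambda s: math.dist(s[0], x_u))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical one-dimensional example.
labeled = [((0.0,), 1.0), ((0.1,), 2.0), ((0.2,), 3.0), ((0.9,), 10.0)]
y_u = knn_predict(labeled, (0.05,), k=3)   # neighbours are the first three samples
```

Here the three nearest labels are 1.0, 2.0 and 3.0, so the estimate is their mean; the distant sample with label 10.0 does not contribute.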

i s e l p m a s d e l e b a l y s i o n e m o s e r a e r e h t e s o p p u

S L ,asshown in Figure 2 ,C isa noisy sample . l

n o n e h

W yonekNNregressorisemployed ,supposeanunlabeledsampleX1 islabeledthenpu tinto

L .ForasampleX2whichi sverycloset oX1, i twli lsufferfromnoisemoreseriouslyt hanX1 .

Figure2 .Singleregressoraffectedbyno . ise o

w t f i t u

B regressors wtihcertaindifferencesareemployedandX1isl abeledbyanotherregressor ,

X2maysufferfromnoiseonlyonce.Soi ti swisert ouset woregressorst oreducet heeffec tofnoise.

For a sample Xu and two kNN regressors K1 and K2, let Ω1 = {X11, X12, ..., X1k} denote the set of k-nearest neighboring samples of Xu on K1, and Ω2 = {X21, X22, ..., X2k} denote the set on K2. If Ω1 and Ω2 are not the same, K1 and K2 differ on Xu. The disagreement level of the two regressors can be measured by this difference over the labeled set L.

Initialize L1 and L2 with L, where L1 and L2 denote the labeled sets of K1 and K2.

Collaborative Training

In classification problems, classifiers can provide an estimated probability for every class. Suppose that the probability of a sample X1 belonging to class A is 0.7 and to class B is 0.3, while the probability of a sample X2 belonging to class A is 0.9 and to class B is 0.1; then X2 can obviously be labeled with more confidence.

Unfortunately, in regression problems the possible predictions are infinite. The definition of authenticity is the key to this algorithm. Samples with a high confidence level should be consistent with the trend of the regression, so adding a sample with a high confidence level should decrease the mean squared error (MSE) of the regressor on the labeled sample set. Since repeatedly measuring the MSE on the whole labeled set carries a large computational load, the following method is adopted as an approximation.

For an unlabeled sample Xa in U, use K1 to predict its label Ya. Let Ω = {Xa1, Xa2, ..., Xak} denote the set of k-nearest neighboring samples of Xa, with labels Ya1, Ya2, ..., Yak. Let K1(X) denote the value estimated by K1 for X. Then the error of K1 on Ω is defined as:

$$E_a = \sum_{i=1}^{k} \left[ Y_{ai} - K_1(X_{ai}) \right]^2 \tag{2}$$

Let K1' denote the refined regressor which has utilized the information provided by (Xa, Ya). The error of K1' on Ω is:

$$E_a' = \sum_{i=1}^{k} \left[ Y_{ai} - K_1'(X_{ai}) \right]^2 \tag{3}$$

Then the confidence level of Xa can be defined as:

$$T_a = E_a - E_a' \tag{4}$$
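The confidence level of Eqs. (2) to (4) can be sketched as below. Two points are assumptions of this sketch, not statements from the paper: Euclidean distance is used, and a labeled sample is excluded from its own neighbor list when K1 or K1' is evaluated on it (the source does not say how this case is handled). The names `knn_predict` and `confidence` are hypothetical.

```python
import math

def knn_predict(labeled, x, k, skip=None):
    """Mean label of the k nearest labeled samples; `skip` leaves one index out."""
    pool = [s for i, s in enumerate(labeled) if i != skip]
    nearest = sorted(pool, key=lambda s: math.dist(s[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def confidence(labeled, x_a, k=3):
    """Return (T_a, Y_a) for one unlabeled sample x_a, following Eqs. (2)-(4)."""
    y_a = knn_predict(labeled, x_a, k)                  # pseudo-label from K1
    # Omega: indices of the k labeled samples nearest to x_a.
    omega = sorted(range(len(labeled)),
                   key=lambda i: math.dist(labeled[i][0], x_a))[:k]
    refined = labeled + [(x_a, y_a)]                    # training set of K1'
    e_a = 0.0        # Eq. (2): squared error of K1 on Omega
    e_a_ref = 0.0    # Eq. (3): squared error of K1' on Omega
    for i in omega:
        x_i, y_i = labeled[i]
        e_a += (y_i - knn_predict(labeled, x_i, k, skip=i)) ** 2
        e_a_ref += (y_i - knn_predict(refined, x_i, k, skip=i)) ** 2
    return e_a - e_a_ref, y_a                           # Eq. (4)

# Hypothetical toy data with a sparse region between x = 0.1 and x = 0.5.
labeled = [((0.0,), 0.0), ((0.1,), 1.0), ((0.5,), 5.0), ((0.6,), 6.0), ((1.0,), 10.0)]
t_a, y_a = confidence(labeled, (0.55,), k=2)
```

In this toy case the pseudo-labeled sample sits between two labeled samples whose other neighbors are far away, so adding it reduces their error and the confidence level comes out positive.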

The sample in U which maximizes Ta is the sample with the highest confidence. Let Xm denote this sample. K1 puts (Xm, Ym) into L2, and in the same iteration K2 puts its own sample with the highest confidence into L1.

The following is an analysis of the feasibility of this criterion.

First, assume that Xm is among the k-nearest neighbors of some samples in Ω only. In this case, maximizing Ta evidently also makes the MSE of the regressor on the whole labeled set decrease the most.

Second, assume that Xm is not among the k-nearest neighbors of any sample in Ω. In this case, according to (4), its confidence level is zero, which contradicts the choice of Xm as the sample that maximizes Ta.

Third, assume that Xm is among the k-nearest neighbors of some samples in Ω and of some other samples not in Ω. In this case it is hard to evaluate whether Xm makes the MSE of the regressor on the whole labeled set decrease the most. Nevertheless, experiments show that in most cases this method is effective.

Stopping Condition

Repeat the training process described under Collaborative Training until any of these conditions occurs:

1) The maximum number of iterations is reached.
2) The maximum amount of time is exceeded.
3) No sample in U makes (4) positive.

The final output for a new sample X is:

$$Y = \frac{K_1(X) + K_2(X)}{2} \tag{5}$$
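Putting the steps together, the training loop with its three stopping conditions and the averaged output of Eq. (5) might look like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: samples are excluded from their own neighbor lists, condition 3 is checked on the random pool drawn in each iteration, and all names (`co_train`, `orders`, `pool_size`, and so on) are hypothetical.

```python
import random
import time

def minkowski(a, b, p):
    """Minkowski distance of order p between two feature tuples."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

def predict(train, x, k, p, skip=None):
    """kNN estimate for x; `skip` optionally leaves one training index out."""
    pool = [s for i, s in enumerate(train) if i != skip]
    nearest = sorted(pool, key=lambda s: minkowski(s[0], x, p))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def confidence(train, x_a, k, p):
    """Return (T_a, Y_a): error reduction on the neighbourhood of x_a, and its pseudo-label."""
    y_a = predict(train, x_a, k, p)
    omega = sorted(range(len(train)),
                   key=lambda i: minkowski(train[i][0], x_a, p))[:k]
    refined = train + [(x_a, y_a)]
    t_a = 0.0
    for i in omega:
        x_i, y_i = train[i]
        t_a += (y_i - predict(train, x_i, k, p, skip=i)) ** 2
        t_a -= (y_i - predict(refined, x_i, k, p, skip=i)) ** 2
    return t_a, y_a

def co_train(labeled, unlabeled, k=3, orders=(2, 5), max_iter=100,
             max_seconds=600.0, pool_size=100, seed=0):
    rng = random.Random(seed)
    sets = [list(labeled), list(labeled)]          # L1 and L2 both start as L
    remaining = list(unlabeled)
    start = time.monotonic()
    for _ in range(max_iter):                      # condition 1: iteration limit
        if time.monotonic() - start > max_seconds or not remaining:
            break                                  # condition 2: time limit (or no data left)
        pool = rng.sample(remaining, min(pool_size, len(remaining)))
        picks = []
        for j in (0, 1):
            scored = [confidence(sets[j], x, k, orders[j]) + (x,) for x in pool]
            t_m, y_m, x_m = max(scored, key=lambda s: s[0])
            if t_m > 0:
                picks.append((1 - j, x_m, y_m))    # regressor j teaches the other one
        if not picks:                              # condition 3: no positive confidence
            break
        for target, x_m, y_m in picks:
            sets[target].append((x_m, y_m))
            if x_m in remaining:
                remaining.remove(x_m)

    def final(x):
        # Eq. (5): average of the two refined regressors.
        return (predict(sets[0], x, k, orders[0]) + predict(sets[1], x, k, orders[1])) / 2.0

    return final, sets[0], sets[1]

# Hypothetical toy run on the line y = 2x.
rng = random.Random(1)
labeled = [((x,), 2.0 * x) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
unlabeled = [(rng.random(),) for _ in range(20)]
model, l1, l2 = co_train(labeled, unlabeled, k=2, max_iter=5, pool_size=10)
y_new = model((0.5,))
```

Both regressors choose their samples from the state at the start of the iteration, so neither sees the other's pick until the next round. Each labeled set grows by at most one sample per iteration.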

Experiments

The experimental data are from the actual production data of China Oilfield Services Limited in the Bohai area. Two sets of data from different wells are used in the experiment. Each dataset is made up of 300 labeled samples and 4700 unlabeled samples. Each sample has 7 attributes: diameter, neutron, acoustic time, Gamma ray, deep resistivity, density, and photoelectric absorption index. All the input attributes have been normalized to [0.0, 1.0]. The output is porosity.

The k value of regressor K1 is set to 3, and Euclidean distance is used, where d is the number of attributes:

$$D(X_a, X_b) = \left( \sum_{l=1}^{d} \left| X_{a,l} - X_{b,l} \right|^2 \right)^{1/2} \tag{6}$$

The k value of regressor K2 is also set to 3, but Minkowski distance is used and the p value is set to 5:

$$D(X_a, X_b) = \left( \sum_{l=1}^{d} \left| X_{a,l} - X_{b,l} \right|^p \right)^{1/p} \tag{7}$$
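To illustrate why the two metrics give the regressors some disagreement, the sketch below computes the nearest-neighbor set of one query point under p = 2 and p = 5. The points are hypothetical and chosen so that the two sets differ; the function names are illustrative.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p; p = 2 gives the Euclidean distance of Eq. (6)."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

def neighbor_set(points, query, k, p):
    """Indices of the k points nearest to `query` under the order-p metric."""
    ranked = sorted(range(len(points)),
                    key=lambda i: minkowski(points[i], query, p))
    return set(ranked[:k])

# Hypothetical two-attribute samples, already normalized to [0, 1].
points = [(0.5, 0.5), (0.65, 0.0), (1.0, 1.0)]
query = (0.0, 0.0)
omega_1 = neighbor_set(points, query, k=1, p=2)   # K1: Euclidean distance
omega_2 = neighbor_set(points, query, k=1, p=5)   # K2: Minkowski distance, p = 5
```

With p = 2 the point (0.65, 0.0) is nearest (distance 0.65 against about 0.71). With p = 5 the largest coordinate difference dominates, so (0.5, 0.5) becomes nearest (about 0.57 against 0.65). The two neighbor sets therefore differ for this query.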

The maximum number of iterations T is set to 100, and the maximum amount of time is set to 600 seconds. 50% of the labeled samples are used for training and the remaining 50% are kept as the test set. In each iteration, the unlabeled set U contains 100 randomly picked samples.

A nonlinear regression algorithm, which is usually adopted in manual log interpretation, is tested for comparison. Moreover, a back-propagation neural network is also tested as a representative of supervised learning.

The experiment was repeated five hundred times for each dataset. The results are shown in Table 1 and Table 2. Average relative error and correlation coefficient are employed to measure generalization performance. The data in Table 1 and Table 2 are the average values over the five hundred runs.
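The paper does not spell out the formulas for the two measures. A minimal sketch, assuming that average relative error means the mean of |prediction - truth| / |truth| and that the correlation coefficient is Pearson's r, is given below. The porosity values are made up for illustration.

```python
import math

def average_relative_error(y_true, y_pred):
    """Mean of |prediction - truth| / |truth| (assumed definition)."""
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

def correlation_coefficient(y_true, y_pred):
    """Pearson correlation between true and predicted values (assumed definition)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical porosity values (percent) on a three-sample test set.
porosity_true = [10.0, 20.0, 30.0]
porosity_pred = [11.0, 18.0, 33.0]
are = average_relative_error(porosity_true, porosity_pred)
r = correlation_coefficient(porosity_true, porosity_pred)
```

Each prediction here is off by 10% of the true value, so the average relative error is 0.1. The relative error is undefined for a true value of zero, which this sketch does not guard against.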


Table 1. Experiment result on set 1.

Algorithm                         Average Relative Error   Correlation Coefficient
Collaborative Regression          12.7%                    0.8576
Nonlinear Regression              20.4%                    0.7922
Back-propagation neural network   16.3%                    0.8254

Table 2. Experiment result on set 2.

Algorithm                         Average Relative Error   Correlation Coefficient
Collaborative Regression          8.6%                     0.9137
Nonlinear Regression              18.4%                    0.8126
Back-propagation neural network   13.2%                    0.8679

Table 1 and Table 2 show that collaborative regression has a lower average relative error (12.7% versus 20.4% and 16.3% on set 1; 8.6% versus 18.4% and 13.2% on set 2) and a higher correlation coefficient than the other two algorithms, which indicates that it can exploit unlabeled samples to improve generalization performance. Secondary interpretation of porosity in log interpretation is a typical application task where labeled samples are insufficient, so the generalization performance of supervised learning methods is likely to be limited. Disagreement-based collaborative regression performs better in such cases.

Conclusion

Secondary interpretation of porosity in log interpretation is a typical application task where labeled samples are insufficient. The generalization performance of supervised models is related to the number of labeled samples, so their accuracy is often limited. This paper proposes to use disagreement-based collaborative regression to solve this problem. The algorithm employs two kNN regressors with disagreement, and in every iteration each regressor labels the unlabeled sample with the highest confidence level for the other regressor. This algorithm can exploit unlabeled samples to improve regression estimates. The experimental results show that on both sets disagreement-based collaborative regression is superior to the other algorithms.

This method does not require rich experience or high cost, so it can avoid the shortcomings of manual interpretation and serve as a substitute for it. In actual production, especially in newly developed areas where labeled data is fairly scarce, it has certain application value.

References

[1] B.T. Sun, C.C. Zhou, and J.W. Zhao. Identification and Evaluation of petroleum reservoir logging, 1st ed. Petroleum Industry Press: Beijing, China, 2014, pp. 1-15.

[2] A. Dashti, E. Sefidari. Physical properties modeling of reservoirs in Mansuri oil field, Zagros region, Iran. Petroleum Exploration and Development, vol. 43, pp. 559-563, April 2016.

[3] R.B. Han, et al. Selection of Model Variables for Pattern Recognition Methods and Its Application in Low Resistivity Pay Identification. Well Logging Technology, vol. 41, pp. 171-175, February 2017.

[4] M. Li, K.G. Chen, Z. Yang, J.H. Zhang, X. Liu. Complicated Lithology Identification in Heavy Oil Reservoir Based on Pattern Recognition. Well Logging Technology, vol. 41, pp. 453-457, April 2017.

[5] J.Y. Liang, J.W. Gao, Y. Chang. Research Progress of Semi-supervised Learning. Journal of Shanxi University, vol.

[6] X.J. Zhu. Semi-supervised Learning Literature Survey. Madison: University of Wisconsin, 2008.

[7] Z.H. Zhou. Disagreement-based Semi-supervised Learning. Acta Automatica Sinica, vol. 39, pp. 1871-1878, November 2013.

[8] Z.H. Zhou, M. Li. Semi-supervised regression with co-training. International Joint Conference on Artificial Intelligence