A general framework for Residual Weighted Learning

CHAPTER 3: RESIDUAL WEIGHTED LEARNING

3.2 Methodology

3.2.4 A general framework for Residual Weighted Learning

We have proposed a method to identify the optimal ITR for continuous outcomes. How- ever, in clinical practice, the endpoint outcome could also be binary, count, or rate. In this section, we provide a general framework to deal with all these types of outcomes.

The procedure is described as follows. First, estimate the conditional expected outcomes given clinical covariates X, Eb

R 2π(A,X) X

, by an appropriate regression model. It is equivalent to fitting a weighted regression model, in which each subject is weighted by ₂_π₍_A,1_X₎. Second, we calculate the estimated residual _bri by comparing the observed outcome and expected outcome estimated in the first step. If the observed outcome is bet- ter than the expected one, the estimated residual _bri is positive, otherwise it is negative. Specifically, if larger values ofR are more desirable, as with our assumption for continu-

ous outcomes,ri_b = ri −Eb R 2π(A,X) X =x_i

; if smaller values ofR are preferred,e.g. the number of adverse events, then _bri = Eb

R 2π(A,X) X =x_i

−ri. Third, identify the decision functionfb(x)by using the estimated

ri in RWL.

The underlying idea is simple. Generally, the outcomeRis a random variable depend- ing on X and A. We estimate common effects of X by ignoring the treatment assign- ment A, and then all information on the heterogeneous treatment effects is contained in the residuals. As demonstrated previously, the benefits from using residuals include sta- bilizing the variance and controlling the treatment matching factor, both of which have a positive impact on finite sample performance. In previous sections, we discussed RWL for continuous outcomes in detail. We will provide two additional examples to illustrate the general framework.

The first example is for binary outcomes, i.e. R ∈ {0,1}. Assume that R = 1 is desirable. We may fit a weighted main effects logistic regression model,

E(R|X) = exp(β0+X T_β) 1 + exp(β0+XTβ)

The estimatesβb₀ andβbcan be obtained numerically by statistical software, for example, the glm function in R. The residual_briisri− exp(βb0+XTβb)

1+exp(βb0+XTβb)

. Then we estimate the optimal ITR by RWL.

The second is for count/rate outcomes. We use the pulmonary exacerbation (PE) outcome of the cystic fibrosis data in Section 3.6 as an example. The outcome is the number of PEs during the study, and it is a rate variable. The observed data are Dn = {xi, ai, ri, ti}n

i=1, where ri is the number of PEs in the duration ti. We treat the count

ri as the outcome, and (xi, ti)as clinical covariates. We may fit a weighted main effects Poisson regression model,

Since fewer PEs are desirable, we compute_bri asexpβb₀+xT_i βb+ log(ti)

−ri, which is the opposite of the raw residual in generalized linear models.

3.3 Theoretical Properties

In this section, we establish theoretical results for RWL. To the end, we assume that a sample Dn = {xi, ai, ri}n

i=1, is independently drawn from a probability measure P on X × A × M, where X ⊂ _Rp _{is compact, and} _M _{= [−}_{M, M}_] _⊂

R. For any ITR d:X → A, the risk is defined as

R(d) = E R π(A,X)I(A 6=d(X)) .

The ITR that minimizes the risk is the Bayes rule d∗ = arg mindR(d), and the corre- sponding risk R∗ ₌ _R(_d∗₎ _{is the Bayes risk. Recall that the Bayes rule is} _d∗_{(x) =}

sign(E(R|X =x, A= 1)−E(R|X =x, A=−1)).

In RWL, we substitute the 0-1 loss _I(A 6= sign(f(X))) by the smoothed ramp loss, T(Af(X)). Accordingly, we define theT-risk as

RT,g(f) =E R−g(X) π(A,X) T(Af(X)) ,

and, similarly, the minimal T-risk as R∗_T,g = inffRT,g(f)andfT,g∗ = arg minfRT,g(f). In the theoretical analysis, we do not requiregto be a regression fit ofR. gcan be any arbi- trary measurable function. We may further assume that(R−g(X))/π(A,X)is bounded almost surely.

The performance of an ITR associated with a real-valued functionfis measured by the excess riskR(sign(f))− R∗_{. In terms of the value function, we note that the excess risk}

Let fDn,λn ∈ HK +{1}, i.e. fDn,λn = hDn,λn + bDn,λn where hDn,λn ∈ HK and

bDn,λn ∈R, be a global minimizer of the following optimization problem:

min f=h+b∈HK+{1} λn 2 ||h|| 2 K + 1 n n X i=1 ri−g(xi) π(ai,xi) Taif(xi)) .

Here we suppressgfrom the notation offDn,λn,hDn,λn andbDn,λn.

The purpose of the theoretical analysis is to estimate the excess riskR(sign(fDn,λn))−

R∗ _{as the sample size} _n _{tends to infinity. Convergence rates will be derived under the}

choice of the parameterλnand conditions on the distributionP.

In document Zhou_unc_0153D_15716.pdf (Page 52-55)