
7 Chapter 2. Predicting Failures of High Tech Innovations-in-Use: Application of

2.4 Research Design and Methodological Foundation

2.4.2 Variable Generation and Variable Description

2.4.2.2 Predictor Variables

The first step in our problem formulation is to specify the correct signal variable that firms can use to detect the failure of a high tech innovation-in-use and to attribute the failure to a source such as design, manufacturing or supply chain. To that end, consider the scheme of adverse event data distribution on a timeline shown in Figure 2.6. A simple and intuitive measure would be the adverse event rate given by expression [2.11]:

R = \frac{N(T)}{T} \qquad [2.11]

where N(T) is the number of adverse event reports received within the time period T.

Although simple and intuitive, expression [2.11] has three problems. First, it ignores reports outside the time period, even reports that fall very close to the period boundaries. Second, the effectiveness of R as a signal depends heavily on the choice of the time period T: increasing T reduces the variance of the detection process but increases its bias through aggregation, and decreasing T does the reverse. Third, within the time period T all reports receive the same weight, so the local distribution of reports is ignored in R.

Figure 2.6 Representative Timeline of Adverse Event Reporting

Similarly, the simple cumulative frequency count, as used in past studies such as Thirumalai and Sinha (2011), is not appropriate for predictive purposes. While a cumulative count may be appropriate for an explanatory model, it causes problems for a predictive model. Because the cumulative count at any time can only stay the same or increase, using it as a predictor implies that the likelihood of failure either remains constant or increases monotonically with time. From a prediction standpoint, this is not informative.
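A short Python sketch shows why the cumulative count carries little predictive information: evaluated on a regular grid of days, it is non-decreasing by construction. The report days and the helper name `cumulative_counts` are hypothetical.

```python
from bisect import bisect_right


def cumulative_counts(report_days, eval_days):
    """Cumulative number of reports received up to and including each evaluation day."""
    ordered = sorted(report_days)
    return [bisect_right(ordered, d) for d in eval_days]


reports = [3, 10, 12, 29, 30, 31, 58]  # hypothetical report days
series = cumulative_counts(reports, range(0, 120, 10))
print(series)  # [0, 2, 3, 5, 6, 6, 7, 7, 7, 7, 7, 7]
# Non-decreasing by construction, even long after reporting stops at day 58.
assert all(a <= b for a, b in zip(series, series[1:]))
```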

Instead, we can consider a histogram of blocked frequency counts, that is, counts of events within successive time intervals on the timeline. This measure is likely to be more informative for a predictive model because it retains information on the distribution of events, albeit at a blocked level. However, it is sensitive to the choice of the blocking interval. A narrow interval retains more information on the timing of events but increases the variability of the counts, which leads to consistency issues in the prediction model. A wide interval, on the other hand, reduces the information content of the adverse event measure. These issues can be addressed by a weighting scheme for the frequency count that accounts for the relative timing and temporal concentration of the adverse events. Here we use a kernel weighted frequency count (Kass-Hout and Zhang, 2010) of the adverse event data, given by equation [2.12], as the primary predictor.

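\hat{f}(t) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{t - t_i}{h}\right) \qquad [2.12]

where t is the day at which the signal is evaluated, t_i is the report date of the i-th of n adverse events, h is the bandwidth and K(\cdot) is the kernel function. Among the various choices of kernel function, such as the triangular, biweight, triweight, Epanechnikov and Gaussian kernels, we selected the Gaussian kernel, K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}, for its smoothness as well as its ease of interpretation. The first predictive variable for our model is therefore Kernel.Weighted.Adverse.Events.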
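A minimal Python sketch of the Gaussian kernel-weighted signal follows. It assumes the normalized kernel density form (1/(nh)) Σ K((t - t_i)/h), report times in days, and an illustrative bandwidth of h = 7 days. The function names, data and bandwidth are hypothetical, since the text does not state how the bandwidth was chosen.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weighted_events(report_days, t, bandwidth):
    """Kernel-weighted adverse event signal at day t: (1/(n*h)) * sum K((t - t_i)/h)."""
    n = len(report_days)
    if n == 0:
        return 0.0
    return sum(gaussian_kernel((t - ti) / bandwidth) for ti in report_days) / (n * bandwidth)


reports = [3, 10, 12, 29, 30, 31, 58]  # hypothetical report days
# Unlike a fixed window, the reports on days 30 and 31 still contribute to the
# signal evaluated at day 29, with weights that decay smoothly with distance.
for day in (0, 15, 29, 45, 60):
    print(day, round(kernel_weighted_events(reports, day, bandwidth=7.0), 4))
```

Because the Gaussian weights decay smoothly, a report just past an arbitrary boundary is never dropped outright, and nearby reports count more than distant ones, which addresses the three problems listed for [2.11].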

We also created lagged versions of Kernel.Weighted.Adverse.Events with 100-day, 200-day and 300-day lags, because in practice there is likely to be a delay between the flow of information to the firm and its ability to detect the failure. The idea of a predictive model is to create variables that are likely candidates for being significantly predictive and then to select the right variables from the data using an appropriate selection scheme.
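One way to generate the lagged variables is sketched below in Python. It assumes the kernel signal is first computed on a daily grid and then shifted, so that the 300-day variable at day t equals the signal at day t - 300, with days that have no earlier value left as None. The column names follow the Maude.Kernel.NNN pattern of Table 2.2. The helper functions, report days and 7-day bandwidth are hypothetical, and the sketch reuses the two-sided Gaussian estimate because the text does not specify otherwise.

```python
import math


def kernel_series(report_days, horizon, bandwidth):
    """Daily Gaussian kernel density estimate of adverse events for days 0..horizon-1."""
    if not report_days:
        return [0.0] * horizon
    norm = len(report_days) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti in report_days) / norm
        for t in range(horizon)
    ]


def lag_series(series, lag):
    """Shift a daily series by `lag` days; days without an earlier value become None."""
    kept = max(len(series) - lag, 0)
    return [None] * (len(series) - kept) + series[:kept]


reports = [3, 10, 12, 29, 30, 31, 58]  # hypothetical report days
base = kernel_series(reports, horizon=400, bandwidth=7.0)
features = {
    f"Maude.Kernel.{lag:03d}": lag_series(base, lag) for lag in (0, 100, 200, 300)
}
print(sorted(features))
```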

While the above measure is informative about the frequency of adverse events, it omits information on their severity. To capture severity, we used the severity data from the FDA’s patient database. The FDA classifies the severity of an adverse event by its impact on the patients involved. Although severity data were available in coded form in the database, a large part of these data was still recorded as free text. We therefore classified the events with a text keyword search, very similar to the text classification algorithm described earlier for device classification. Following the FDA’s coding scheme, we then coded the severity of adverse events on a 5-point ordinal scale: “no harm” = 0, “minor injury” = 1, “injury” = 2, “severe injury” = 3, and “death” = 4. Denoting the severity of the i-th adverse event by s_i, we created two severity weighted adverse event variables, namely a linear weighted kernel and an exponential weighted kernel:

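\hat{f}_{\text{lin}}(t) = \frac{1}{nh} \sum_{i=1}^{n} s_i \, K\left(\frac{t - t_i}{h}\right) \qquad [2.13]

\hat{f}_{\exp}(t) = \frac{1}{nh} \sum_{i=1}^{n} \exp(s_i) \, K\left(\frac{t - t_i}{h}\right) \qquad [2.14]

where s_i \in \{0, 1, 2, 3, 4\} is the severity score of the i-th adverse event, and K, h, t_i and n are as in [2.12].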
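The severity pipeline, from free-text narrative to a severity-weighted signal, can be sketched in Python as follows. The keyword dictionary is hypothetical, since the study's actual keyword lists are not reported. The sketch assumes the most severe outcome mentioned in a narrative determines s_i, and it uses the normalized Gaussian form of [2.13] and [2.14]. A plain keyword search like this one cannot handle negations such as "no injury", which a production classifier would need to address.

```python
import math
import re

# Hypothetical keyword dictionary for illustration only; the study's actual
# keyword lists are not reported in the text.
SEVERITY_KEYWORDS = {
    4: ["death", "died", "fatal"],
    3: ["severe injury", "life threatening"],
    2: ["injury", "injured", "burn"],
    1: ["minor injury", "discomfort"],
}


def classify_severity(narrative):
    """Map a free-text narrative to the 0-4 scale (0 = no harm, 4 = death)."""
    text = narrative.lower()
    levels = [0]  # default when no keyword matches: "no harm"
    # Try longer phrases first so "minor injury" is not also counted as "injury".
    phrases = sorted(
        ((kw, lvl) for lvl, kws in SEVERITY_KEYWORDS.items() for kw in kws),
        key=lambda p: len(p[0]),
        reverse=True,
    )
    for kw, lvl in phrases:
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text):
            levels.append(lvl)
            text = re.sub(pattern, " ", text)  # mask the matched phrase
    return max(levels)  # keep the most severe outcome mentioned


def severity_weighted_kernel(report_days, severities, t, bandwidth, scheme="linear"):
    """Severity-weighted Gaussian kernel estimate at day t.

    scheme="linear" weights each report by s_i, as in [2.13];
    scheme="exponential" weights each report by exp(s_i), as in [2.14].
    """
    n = len(report_days)
    if n == 0:
        return 0.0
    weight = {"linear": float, "exponential": math.exp}[scheme]
    total = sum(
        weight(s) * math.exp(-0.5 * ((t - ti) / bandwidth) ** 2)
        for ti, s in zip(report_days, severities)
    )
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))


narratives = ["Patient had minor injury", "Device failed, patient died", "No adverse effect"]
days = [10, 12, 40]  # hypothetical report days
scores = [classify_severity(x) for x in narratives]
print(scores)  # [1, 4, 0]
print(severity_weighted_kernel(days, scores, 11, 7.0, "linear"))
print(severity_weighted_kernel(days, scores, 11, 7.0, "exponential"))
```

One consequence of the two weighting schemes is visible in the sketch: under linear weights a "no harm" report (s_i = 0) contributes nothing to the signal, while under exponential weights it still contributes with weight exp(0) = 1, and the most severe reports dominate much more strongly.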

In addition, we created kernel weighted variables for the three categories of time-varying covariates identified earlier, namely design, supply chain and manufacturing. A complete description of the variables is given in Table 2.2.

Table 2.2 List and Description of Variables Generated for Predictive Model Building

Variables    Description

Primary Predictor: User Reported Adverse Events (MAUDE)

1  Maude.Kernel.000: Simple adverse event kernel density estimate
2  Maude.Kernel.300: Simple adverse event kernel density estimate with a 300-day lag
3  Exponential.Weighted.Kernel.000: Severity weighted (exponential weights) kernel density estimates of the adverse events
4  Exponential.Weighted.Kernel.300: Severity weighted (exponential weights) kernel density estimates of the adverse events with a 300-day lag

Time-Varying Covariates related to the precision of prediction

Design
5  Product.Dev.300: Number of design changes in a time-period (quarter) within a specific device category (kernel density estimate) of a firm with a 300-day lag
6  Firm.Product.Dev.100, Firm.Product.Dev.300: Number of design changes in a time-period (quarter) across all device categories of a firm with 100-day and 300-day lags
7  Firm.Innovation.100, Firm.Innovation.300: Number of new devices introduced by a firm within a time-period (quarter) across all device segments with 100-day and 300-day lags

Supply Chain
8  Supply.Chain.Change.000, Supply.Chain.Change.100/300: Number of changes in supplier components registered within a time-period (quarter) for a device code with no lag, a 100-day lag and a 300-day lag (density estimate)
9  Firm Global Sourcing Index: Proportion of firm product (device) changes that included shifting sourcing abroad (outside the US) for a time-period (quarter)

Manufacturing
10 Manufacturing.Change.000, Manufacturing.Change.100/300: Number of manufacturing process changes implemented within a time-period (quarter) for a device code with no lag, a 100-day lag and a 300-day lag (kernel density estimate)

Control Variables

11 Technology.Life.Cycle: Number of substantially equivalent priors of a device code
12 Firm.product.Scope: Entropy index of the firm’s number of models across device segments
13 Product.Competition: Number of competing models in a device segment
14 Industry Competition: Number of competing players within a usage class (industry segment)
15 Regulation Type: Categorical variable for the approval type of a device: PMA or 510(k)
16 Device Class: Categorical variable for the complexity class of a device: Class I, II or III
17 Usage Class: Categorical variable for device usage class (20 classes)
18 Implant: Binary variable (1 = the specific device is an implantable device; 0 = otherwise)
19 Product class: Product category control (732 different device codes in the study sample)
20 Manufacturer: Manufacturer control (105 manufacturers in the study sample)
21 Time: Age of the device in number of quarters
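Control variable 12 (Firm.product.Scope) is described only as an entropy index. A common form of this diversification index, assumed here rather than taken from the text, is E = Σ_j p_j ln(1/p_j), where p_j is the share of the firm's models in device segment j. A minimal Python sketch, with a hypothetical function name and hypothetical segment labels:

```python
import math
from collections import Counter


def entropy_index(model_segments):
    """Entropy of a firm's models across device segments: sum over j of p_j * ln(1/p_j)."""
    counts = Counter(model_segments)
    total = sum(counts.values())
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Hypothetical firms: one concentrated in a single segment, one spread evenly over two.
print(entropy_index(["cardio"] * 4))                  # 0.0
print(entropy_index(["cardio"] * 2 + ["ortho"] * 2))  # ln 2, about 0.6931
```

The index is 0 for a firm whose models all sit in one segment and grows toward ln(J) as the models spread evenly over J segments.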

In document Managing the Risks and Potential of High-tech Innovations-in-use: Predictive Analytic Modeling with Big Data and a Longitudinal Field Study (Page 41-45)