VALIDATION OF THE UNC PERCEIVED MESSAGE EFFECTIVENESS SCALE

Overview

Background. Interventionists commonly identify promising messages for health communication efforts based on audience members’ ratings of perceived message effectiveness (PME).

Purpose. We sought to validate a new PME measure that improved on existing scales by focusing on the behavior and respondent, being brief, and having strong psychometric properties.

Methods. Participants were a national convenience sample of 999 adults and national probability samples of 1,692 adults and 869 adolescents recruited in 2015. Smokers and non-smokers rated up to six brief messages about the chemicals in cigarette smoke on two PME scales. The first was the new 3-item UNC PME Scale that assessed effects perceptions. The second was an established 6-item PME scale that assessed message perceptions. We examined the UNC PME Scale’s psychometric properties and compared both scales using item factor analysis.

Results. The UNC PME Scale measured the same construct across multiple chemical messages (all factor loadings ≥ 0.86). It exhibited high reliability (> 0.85) over very low to moderate levels of PME (z = -2.5–0.2), a range that is useful for identifying more promising messages. Samples of adults and adolescents showed a similar pattern of results. As expected, the UNC PME Scale was strongly positively correlated with the message perceptions scale (r = 0.84). It also exhibited strong psychometric properties among participants regardless of education, reactance, sex, and smoking status.

1 This chapter previously appeared as an article in the Annals of Behavioral Medicine. The original citation is as follows: Baig SA, Noar SM, Gottfredson NC, Boynton MH, Ribisl KM, & Brewer NT. UNC Perceived Message Effectiveness: Validation of a brief scale. Annals of Behavioral Medicine. October 2018. doi:10.1093/abm/kay080

Discussion. The UNC PME Scale reliably and validly measured PME among adults and adolescents from diverse groups. This brief scale may be used to efficiently evaluate candidate anti-smoking messages and may be suitable for adaptation to other risky behaviors.

Keywords: health communication, message development, formative research, item response theory

Validation of the UNC Perceived Message Effectiveness Scale

Perceived message effectiveness (PME) is concerned with the perception that candidate messages will or will not achieve their objectives, and the use of PME as a tool for message selection has become increasingly common since 2000.105 Many researchers specifically use PME as an early indicator of a health message’s potential to change behavior. There is growing evidence for the predictive validity of PME, with a small number of longitudinal studies demonstrating that PME predicts changes in smoking behavior in the context of anti-smoking messages.19,40 A comparatively larger number of cross-sectional studies show that PME is associated with attitudes49 and behavioral intentions in a variety of health messaging contexts, including those seeking to promote colonoscopy,46 improve social support outcomes,70 and prevent sexually transmitted diseases.42

PME measures have traditionally used two different types of perceptions to inform their measurement.105 The first type is message perceptions, two examples being the extent to which a message is thought of as compelling or informative. Message perceptions are rooted in the view that important characteristics of a message, such as argument quality, function as a gateway to greater elaboration about the message.58,79,116 Greater elaboration facilitates, in turn, attitude change that may affect performance of a behavior. PME assessments that use message perceptions (i.e., attribute or ad-directed PME) are judgments about whether a message has characteristics that should enable further processing of that message.125 In addition, this type of PME is more general and may be a marker for the broad persuasive potential of a message.153

The second type of perceptions used in PME measures is effects perceptions. The extent to which the viewers of an anti-smoking message think that it makes smoking less appealing to them is an example of an effects perception. Another example is how much the recipients of a marijuana risk message think that it encourages them to avoid marijuana use. Effects perceptions are rooted in research at the intersection of economics, neuroscience, and psychology suggesting that people view a message for a brief period of time and use their initial affective responses to summarily process it.30,135,137 The resulting overall impression may motivate behavior change. PME assessments that use effects perceptions (i.e., impact or personalized PME) quantify the overall impression of a message and are judgments about a message’s potential to change important antecedents of behavior. Furthermore, this type of PME is often more narrowly concerned with predicting changes in behavior among the target audience. As a result, effects perception items use behavioral and personal referents that direct respondents to consider the effects of messages on their own attitudes, beliefs, thoughts, or behaviors.105

Although there is a clear conceptual distinction between message and effects perceptions, there is a notable tendency among researchers to use them interchangeably or combine them in a single measure of PME.50,105,152 Due to this tendency, some researchers have criticized the current measurement of PME as inadequate152 and questioned the meaningfulness and utility of PME judgments in message development altogether.112,114 Given that many health messages aim to change behavior and effects perceptions are conceptually proximal to behavior, PME scales with a clear effects orientation are a promising direction for the literature. Additional concerns about the measurement of PME are that existing scales are either lengthy or too generic and do not uniformly use behavioral or personal referents in their items. Both practices may increase measurement error in a scale by increasing the cognitive burden on respondents or reducing the clarity and precision of PME judgments. A high cognitive burden on respondents may also limit the number of messages that can be efficiently evaluated using PME judgments in a single study.

To address issues in the current measurement of PME, we developed the UNC Perceived Message Effectiveness Scale, conceptualizing PME as the extent to which a person believes that a health message will affect them in ways that are consistent with message objectives, particularly changing behavior.152 The UNC PME Scale has only three items that uniformly focus on behavior and the respondent. We sought to examine its psychometric properties in the context of brief messages about the chemicals in cigarette smoke that were designed to discourage smoking. Three main goals for the use of PME guided our study. First, the UNC PME Scale should measure the same construct across different messages so that researchers can meaningfully compare individual messages using PME ratings. Second, it should function similarly among diverse populations so that researchers can compare messages among subgroups of interest, such as those with health disparities. Finally, unlike many PME measures used in the literature,105,152 it should demonstrate construct validity.

Methods

Participants

We recruited a convenience sample of U.S. adults (ages ≥ 18; n = 1,034) using Amazon Mechanical Turk. In addition, the Carolina Survey Research Laboratory invited all 13–25-year-olds, all smokers, and a randomly selected subset of adult (ages ≥ 25) non-smokers who had previously completed a tobacco-related phone survey to participate in an online follow-up survey.17,23 Data collection was multimodal (desktop computers, mobile devices, and mail), with nonresponders being contacted up to three times through telephone reminder calls and priority mailings. The follow-up survey had an overall response rate of 73% (2637/3612). We treated the data from adults (18+; n = 1,758) and adolescents (13–17; n = 877) as separate probability samples. After eliminating participants with missing data on the UNC PME Scale, construct validators, or demographic characteristics, the three analytic samples had 999 (adult convenience), 1,692 (adult probability), and 869 (adolescent probability) participants.

Procedures

In a repeated-measures design, participants in the adult convenience sample rated two chemical messages. Those in the probability samples rated six chemical messages in one of five randomly assigned orders. The messages varied by chemical and associated contextual information (Table 3.1). Adult convenience participants received $3 while adult and adolescent probability participants received $45 for completing their respective surveys, which were much longer. The Institutional Review Board at the University of North Carolina approved the procedures for all three samples.

Measures

UNC PME Scale. In the adult convenience study, we developed twelve candidate items to assess various perceptions of chemical messages designed to discourage smoking. We worded the items such that they could be answered by both smokers and non-smokers. The five-point response scale for all items ranged from "strongly disagree" to "strongly agree" (coded as 1–5). Exploratory factor analysis using maximum likelihood estimation and promax rotation revealed a four-factor solution of effects perceptions, message perceptions, message reactance,66 and message credibility.

The factor for effects perceptions had four items that assessed respondents’ perceptions of discouragement, concern, unpleasantness, and appeal as related to the contents of the chemical messages. The discouragement item, "This message discourages me from wanting to smoke," came from our previous work on cigarette warnings and is theoretically derived from work on behavioral intentions.131,147 The concern item, "This message makes me concerned about the health effects of smoking," focused on the health consequences of smoking and is derived from work on affect and risk perception.57 The unpleasantness item, "This message makes smoking seem unpleasant to me," and the appeal item, "This message makes smoking seem less appealing to me,"110 were focused on reduced pleasure from smoking and are derived from work on smoking expectancies.18 Due to the overlap between these last two items (r = .84) and cognitive testing revealing greater clarity in the unpleasantness item, we dropped the appeal item, yielding the 3-item UNC PME Scale (α = .93). For clarity, we generally refer to the UNC PME Scale as our effects perceptions scale in the remainder of this paper to emphasize the conceptual difference between it and the message perceptions scale that we utilized.
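As an illustration of the internal-consistency coefficient reported above, the following is a minimal Python sketch of Cronbach's alpha for a small respondents-by-items matrix. It is not the analysis code used in the study (which relied on R); the function name `cronbach_alpha` and the ratings are hypothetical and serve only to show how the coefficient is formed from item and total-score variances.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 1-5 ratings on three items (discouragement, concern, unpleasantness).
ratings = [
    [5, 5, 4],
    [4, 4, 4],
    [5, 4, 5],
    [2, 3, 2],
    [3, 3, 3],
    [1, 2, 1],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances, which is why a 3-item scale can reach a high value when its items are highly correlated.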

Other measures. To support analyses of construct validity, we assessed message perceptions, message reactance, and message credibility for each chemical message. The 6-item message perceptions scale (assessed in adult convenience sample only) references the respondent in one item only and does not use behavioral referents at all.41 We measured message reactance, or resistance to the message, using the Brief Reactance to Health Warnings Scale.66 Finally, we measured message credibility (adult convenience sample only) using two items, "This message is believable to me," and "This message seems credible to me." The five-point response scale for all items ranged from "strongly disagree" to "strongly agree" (coded as 1–5). We predicted that our effects perceptions scale would correlate positively with the message perceptions scale and message credibility but would be negatively correlated with message reactance.

The survey also assessed smoking status and standard demographic variables. Adult smokers were those individuals who had smoked at least 100 cigarettes in their lifetime and currently smoked every day or some days,43 and adolescent ever smokers were those who had ever tried smoking cigarettes, even one or two puffs.3

Data analysis

Analyses used R (ver. 3.4.3)121 with three add-on packages: lavaan (ver. 0.5-23.1097)124 and mirt (ver. 1.27.1)32 for estimating psychometric models, and ggplot2 (ver. 2.2.1)148 for plotting related mathematical functions.

Psychometric properties. To parse variability in the items on our effects perceptions scale that is inherent to PME from variability that is specific to chemical messages, we used a two-tier item bifactor analytic (IFA) model with a general factor for PME spanning all chemical messages and orthogonal message-specific factors.26 We compared the loadings on and variance accounted for by the general factor with those for the message-specific factors to determine the extent to which the scale may function differently in the context of specific chemical messages.87 We also examined information curves from the IFA model to characterize scale and item reliability. The information score is a quantification of the variability that a measure captures about the construct of interest and varies across the possible range (standardized) of the construct. Higher information points to lower standard error of measurement and, thereby, greater reliability.
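The link between information, standard error, and reliability can be made concrete with a short Python sketch. It assumes a standardized latent trait (variance 1). The text does not state which conversion was used in the study, so two common conventions are shown, and the function names are hypothetical.

```python
import math

def standard_error(info):
    """Standard error of measurement at a trait level with the given information."""
    return 1.0 / math.sqrt(info)

def reliability_one_minus_se2(info):
    """Convention A (assumption): reliability = 1 - SE^2, with latent variance fixed to 1."""
    return 1.0 - 1.0 / info

def reliability_ratio(info):
    """Convention B (assumption): reliability = I / (I + 1)."""
    return info / (info + 1.0)

# Information needed to reach a reliability of 0.85 under each convention.
info_a = 1.0 / (1.0 - 0.85)   # solves 1 - 1/I = 0.85
info_b = 0.85 / (1.0 - 0.85)  # solves I / (I + 1) = 0.85

for info in (info_a, info_b, 10.0, 20.0):
    print(round(info, 2),
          round(standard_error(info), 3),
          round(reliability_one_minus_se2(info), 3),
          round(reliability_ratio(info), 3))
```

Under either convention, reliability rises monotonically with information, so a threshold on reliability corresponds to a threshold on the information curve at each trait level.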

To arrive at the preferred IFA model with acceptably low levels of measurement non-invariance across messages in the message-specific and general PME factors, we estimated a series of increasingly constrained IFA models and compared them using the likelihood ratio (LR) test for nested models. We confirmed model selection by examining global fit of the preferred model using the appropriate IFA χ² analog and the root mean square error of approximation (RMSEA), item fit with graded response parameterization using the S–χ² index, and person fit using the Zh index. The preferred IFA model incorporated strong invariance across chemical messages in the general and specific dimensions and had adequate global fit in the adult convenience (G² = 1839, df = 15606, p > .05), adult probability (M2 = 973, df = 153, p < .001), and adolescent probability (M2 = 277, df = 153, p < .001) samples. The RMSEA (range = 0–0.056) was small in the three samples. The IFA model did not exhibit systematic deviations in item fit in the adult convenience (range S–χ² = 35.7–46.5, range df = 28–33), adult probability (range S–χ² = 134–213, range df = 101–112), and adolescent probability (range S–χ² = 22.4–56.0, range df = 21–29) samples. The model also fit better than expected for a large majority of participants (range Zh > 0 = 77.8–82.9%) in each of the three samples.
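To show how an RMSEA value follows from a global fit statistic, the sketch below applies the common single-group formula, sqrt(max(0, (X² - df) / (df (N - 1)))), to the fit statistics and analytic sample sizes reported above. This is an illustrative Python calculation under that assumed formula, not the routine used in the study; the function name `rmsea` is hypothetical.

```python
import math

def rmsea(stat, df, n):
    """RMSEA from a chi-square-type fit statistic, its df, and sample size n.

    Assumes the common single-group form with N - 1 in the denominator.
    """
    return math.sqrt(max(0.0, (stat - df) / (df * (n - 1))))

# Fit statistics and analytic sample sizes as reported in the text.
samples = {
    "adult convenience": (1839, 15606, 999),
    "adult probability": (973, 153, 1692),
    "adolescent probability": (277, 153, 869),
}
for name, (stat, df, n) in samples.items():
    print(name, round(rmsea(stat, df, n), 3))
```

When the fit statistic is smaller than its degrees of freedom, as in the adult convenience sample, the formula truncates at zero, which is consistent with the lower end of the reported range.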

Differential item and test functioning. To determine whether individual items on our effects perceptions scale had similar psychometric properties among subgroups that differed by education (adults: ≤ some college, or > some college; adolescents: middle school, or high school), reactance (≤ "neither disagree or agree," or higher), sex, and smoking status (adults: smoker or not; adolescents: ever-smoker or not), we conducted differential item functioning (DIF) analyses. We treated each message as a potential instance of DIF and conducted separate analyses for each instance using multiple-group unidimensional graded response models. We used the LR/f ratio to select anchor items.150 Next, we estimated a series of more constrained models and used the LR test for nested models to identify items with any DIF. Additional LR tests revealed whether an instance of DIF was related primarily to the reliability or dimensionality of the involved item or both.
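The LR test for nested models compares a constrained model with a less constrained one: twice the difference in log-likelihoods is referred to a chi-square distribution with degrees of freedom equal to the number of parameters freed. A minimal Python sketch follows, using only the standard library; the log-likelihood values and the function names are hypothetical and are not taken from the study.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: erfc term plus a finite sum over half-integer gamma terms.
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return total

def lr_test(loglik_constrained, loglik_free, df_diff):
    """Likelihood ratio statistic and p-value for two nested models."""
    stat = 2.0 * (loglik_free - loglik_constrained)
    return stat, chi2_sf(stat, df_diff)

# Hypothetical log-likelihoods: freeing 3 parameters improves the log-likelihood by 5.
stat, p = lr_test(-1005.0, -1000.0, 3)
print(round(stat, 2), round(p, 4))
```

A small p-value indicates that the equality constraints worsen fit, which in a DIF analysis would flag the constrained item parameters as differing between groups.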

To assess whether any observed DIF caused our effects perceptions scale as a whole to function differently for subgroups in terms of reliability and dimensionality, we conducted differential test functioning (DTF) analyses using effect sizes. Specifically, we calculated Cohen’s d based on a final model with between-group constraints for DIF to characterize the magnitude of any instance of DTF as well as DIF.93 Effect sizes with absolute values of 0.2 or less amounted to negligible DIF or DTF while those greater than 0.2 warranted further investigation.93,138 DIF testing involves many comparisons, inflating the false-discovery rate. DIF on individual items may cancel out at the scale level if the direction of DIF varies across items within the scale or the magnitude is small. Conducting DTF analyses using effect sizes allowed us to avoid unnecessarily flagging items and pursue all possibilities of DIF without having to correct for inflated false-discovery rates.

Construct validity. We evaluated the construct validity of our effects perceptions scale by examining the average correlations across messages between our scale and the message perceptions scale, message credibility, and message reactance. To take advantage of our multitrait-multioccasion data,69 we used the correlated trait-correlated uniqueness (CTCU) model to estimate all factor correlations and variance components.84 We also compared the unexplained variance (uniqueness) in the items on our effects perceptions scale and the message perceptions scale to assess the measures’ relative susceptibility to measurement error. The CTCU model in the adult convenience sample had four correlated factors for both measures of PME, message reactance, and message credibility that spanned the two messages. The CTCU model in the probability samples had two correlated factors for our effects perceptions scale and message reactance that spanned the six messages. The models retained relevant constraints for measurement invariance from the IFA model for our effects perceptions scale and applied similar constraints for all construct validators. In the three samples, the CTCU model had adequate global fit (RMSEA = 0.074–0.10; CFI = 0.95–0.98).

Results

The mean ages of adult convenience and probability participants were 33.8 (SD = 11.0) and 43.1 (SD = 17.7), respectively; adolescents had a mean age of 15.0 (SD = 1.37). Fewer than half of adult convenience (46.7%) and probability (30.0%) participants had a bachelor’s or advanced degree (Table 3.2). In both adult samples, around one-third of participants (convenience = 31.1%, probability = 37.4%) were current smokers, and 10.2% of adolescents had ever tried smoking cigarettes.

Psychometric properties

Our effects perceptions scale measured the same construct in the context of six unique chemical messages. In the adult convenience sample, the three items strongly loaded on the general factor for PME (0.89–0.92) and weakly loaded on the message-specific factors (0.18–0.27; Table 3.3). The general factor for PME accounted for the vast majority of the variance in the items (82.6%). In comparison, the two message-specific factors together explained an additional 5.2% of the variance in the items. These patterns indicated that participants understood the scale similarly in the context of two chemical messages. The adult and adolescent probability samples also replicated these findings across the larger set of six chemical messages.
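The relation between loadings and variance accounted for can be sketched in Python. With standardized loadings and orthogonal factors, the share of item variance attributable to a factor is the mean of the squared loadings on that factor. The loadings below are hypothetical values chosen within the reported ranges (the actual values are in Table 3.3, which is not reproduced here), and the function name is an assumption.

```python
def variance_share(loadings):
    """Mean squared standardized loading: share of item variance due to a factor."""
    return sum(l * l for l in loadings) / len(loadings)

# Hypothetical loadings for 3 items x 2 messages, within the reported ranges.
general_loadings = [0.89, 0.91, 0.92, 0.90, 0.92, 0.91]   # general PME factor
specific_loadings = [0.18, 0.22, 0.27, 0.20, 0.25, 0.24]  # message-specific factors

general_share = variance_share(general_loadings)
specific_share = variance_share(specific_loadings)
print(round(100 * general_share, 1), round(100 * specific_share, 1))
```

Loadings near 0.9 imply that roughly four-fifths of item variance is shared across messages, whereas loadings near 0.2 imply only a few percent is message-specific, which matches the pattern described above.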

probability z = -2.3) to mean levels (convenience: z = 0.2; probability: z = -0.1) of PME with large amounts of information that corresponded to high reliability (≥ 0.85; Figure 3.1). In contrast, the scale reliably measured extremely low (z = -2.9) to somewhat low (z = -0.7) levels of PME among adolescents. This is because the majority of participants (≥ 55%) in the three samples responded to each of the three items with the highest option, "strongly agree" (coded as 5), irrespective of message, resulting in left-skewed response distributions. Thus, our effects perceptions scale did not provide information about individuals whose PME was likely higher than the five-point response scale allowed. This ceiling effect was more pronounced among adolescents and present in all items in the three samples. In the three samples, concern contributed the least amount of information (max = 5.1–7.3) to the scale. Discouragement contributed the most information (max = 7.1) in the adult convenience sample while unpleasantness did so in the adult (max = 10.3) and adolescent (max = 17.5) probability samples.

Differential item and test functioning

Among adults and adolescents who varied by education, reactance, sex, or smoking status, our effects perceptions scale exhibited similarly strong psychometric properties (Figure 3.2). Across all three samples, the items on our effects perceptions scale exhibited negligible to small DIF (absolute value Cohen’s d = 0.003–0.18) in 34 out of 168 potential instances of DIF (all p < .01) and larger DIF (absolute value d = 0.22–0.36) in eight instances (all p < .01). The 34 instances of negligible to small DIF were distributed over all three samples, all three items, all four grouping variables, and all six chemical messages. Similarly, the eight instances of larger DIF were distributed over all three samples and all chemical messages even though they involved the concern and discouragement items and smoking status only. These patterns suggested that the items on our effects perceptions scale generally maintained strong psychometric properties among adult and adolescent participants who varied by the aforementioned characteristics.

Most of the statistically significant instances of DIF may be attributable to high power to detect even negligible DIF. DTF analyses provided evidence in support of this possibility. Any
