CHAPTER 3 Methods
3.1 Dataset and rationale for the choice of the dataset
Accomplishing the specific aims and ensuring external validity at the national level require a nationally representative dataset that identifies primary prophylactic G-CSF administration, along with chemotherapy administration, neutropenia occurrence, level of healthcare service utilization, Medicare expenditures, administration of other therapies, and patient socio-demographic and clinical characteristics. Population data improve the identification of chemotherapy-related toxicities, such as neutropenia, by overcoming the under-reporting that occurs in clinical trials. The data also facilitate the estimation of actual healthcare utilization and Medicare expenditures at the population level. The population-based SEER data linked to Medicare claims meet these requirements.
This study uses SEER-Medicare data containing newly diagnosed breast cancer cases from 1994 to 2002, linked to Medicare claims through 2003. The linkage of the SEER and Medicare files is a collaborative effort among the National Cancer Institute (NCI), the SEER registries, and the Centers for Medicare & Medicaid Services (Warren, 2002).
The 17 geographic areas from which the SEER data are collected account for 25% of the US population. The data have been collected by NCI annually since 1973. Comparisons of SEER cancer mortality rates with those of the entire US population suggest that the SEER data are largely representative of the national population (Warren, 2002). The SEER population is very similar to the US population in terms of socio-demographic variables such as age and gender, but it has a higher proportion of non-whites and of urban and affluent individuals. The data are valid, of high quality, and complete in terms of cancer incidence and diagnosis reporting in the United States (Warren, 2002). The data contain information on the patient's demographic characteristics (age, race, gender, marital status, education, income, geographic location), date of diagnosis, tumor characteristics (stage, grade, histology, size, lymph node positivity, and hormone receptor status), presence of other malignancies, whether the cancer of interest is the first or a later malignancy, type of surgical treatment and radiation therapy recommended or provided within four months of diagnosis, follow-up of vital status, and cause of death. Thus, they provide sufficient information about variables known to influence primary prophylactic G-CSF administration and its clinical and cost effectiveness. In addition, because the date of diagnosis is available in the SEER data, it is easy to distinguish between prevalent and incident cases, which is not possible with claims data alone.
Medicare claims data are available for 97% of the US population 65 years and above, and include health service claims for care provided by physicians, inpatient hospital stays, hospital outpatient clinics, home health care agencies, skilled nursing facilities, hospice programs, and durable medical equipment suppliers. The inpatient (Part A) claims are available from 1986. The Part B physician service claims and outpatient service claims are available only from 1991. Because the National Claims History System made diagnosis codes mandatory on physician claims starting in 1991, diagnosis codes are present on all physician claims only from that year onward. Medicare claims can be used to construct co-morbidity indices for the patients and to identify service utilization and costs (Charlson, 1987; Romano, 1993; Deyo, 1992; Klabunde, 2000; Klabunde, 2007).
Linking the Medicare claims to the SEER data provides a unique database to study cancer control, prevention, treatment, healthcare service utilization, and Medicare expenditures for patients aged 65 years and older. The SEER-Medicare data are an efficient and cost-effective source of information on large, heterogeneous patient populations, unlike the geographically limited clinical trials (Potosky, 1993). These observational data include all women residing in a community setting, which reduces biases such as the volunteer bias of clinical trials and recall bias. Also, since SEER data have been collected from medical charts and pathology reports, they contain a wealth of information on cancer histology, type, stage, and extent of spread. Medicare data have the advantage of being longitudinal, and they identify tests and procedures more accurately; claims data have higher sensitivity for tests and procedures than chart audits (Nattinger, 2002). The two datasets complement each other: they combine the details of the initial diagnosis from chart review with a lifetime of utilization and cost data from claims.
Because the SEER-Medicare database is large, it provides an opportunity to study the occurrence of neutropenia in breast cancer patients receiving chemotherapy with higher statistical power. It has been established that chemotherapy use in women above 65 years of age, rates of hospitalization for chemotherapy-induced toxicity (including neutropenia), and administration of G-CSF can be identified using SEER-Medicare data (Du, 2002; Earle, 2001; Schrag, 2001; Du, 2001a; Du, 2001b). Chemotherapy and G-CSF are covered by Medicare and thus can be identified using the claims. The validity and reliability of Medicare claims for identifying chemotherapy administration have also been documented by previous studies (Warren, 2002; Du, 2005). Medicare claims have a high sensitivity for detecting the receipt of chemotherapy, around 88% for breast cancer. When a claim does carry a code for the specific chemotherapy drug delivered, that code agrees closely with the drug reported in comparative chart reviews for breast cancer.
One limitation of the data is that the specific chemotherapy drug is often not indicated in breast cancer claims. The sensitivities for identifying specific chemotherapy agents for breast cancer are 52.5% (methotrexate), 76.2% (cyclophosphamide), and 79.2% (5-FU); the sensitivity for other agents has not been verified (Warren, 2002). Because breast cancer chemotherapy involves numerous agents, the agent cannot readily be inferred when the claim does not name it. Also, the frequency of claims does not necessarily reflect the frequency or duration of chemotherapy, since some providers bill Medicare for multiple administrations on a single claim. However, this is a concern only if the goal is to identify the actual chemotherapy drug, which we do not aim to do in this study. Claims data are very sensitive in identifying chemotherapy administration itself.
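For readers less familiar with the measure, the sensitivity figures cited in this section can be read as the proportion of truly treated patients that the claims detect. The following is a minimal sketch of that calculation, assuming chart review is treated as the reference standard. The function name and the patient counts are hypothetical, chosen only so that the result matches the roughly 88% reported for breast cancer; they are not taken from the cited validation studies.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of reference-standard positives that the claims also flag."""
    total_positives = true_positives + false_negatives
    if total_positives == 0:
        raise ValueError("sensitivity is undefined with no reference positives")
    return true_positives / total_positives


# Hypothetical counts, for illustration only (not data from any cited study).
chart_confirmed_chemo = 200   # patients with chemotherapy documented in the chart
claims_detected_chemo = 176   # of those, patients who also have a chemotherapy claim

missed_by_claims = chart_confirmed_chemo - claims_detected_chemo
print(f"{sensitivity(claims_detected_chemo, missed_by_claims):.1%}")
```

Under this reading, a sensitivity of 52.5% for methotrexate would mean that only about half of the patients whose charts record methotrexate also have a claim naming that agent.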