Abstract—In recent years, consumers have become empowered to share personal experiences regarding prescription drugs via Web discussion groups. This paper describes our recent research on automatically identifying adverse reactions from patient-provided drug reviews on health-related web sites. We focus on the statin class of cholesterol-lowering drugs. We extract a complete set of side effect expressions from patient-submitted drug reviews, and construct a hierarchical ontology of side effects. We use log-likelihood ratio estimation to detect biases in word distributions when comparing reviews of statin drugs with age-matched reviews of a broad spectrum of other drugs. We find a highly significant correlation between statins and a wide range of disorders and conditions, including diabetes, amyotrophic lateral sclerosis (ALS), rhabdomyolysis, neuropathy, Parkinson’s disease, arthritis, memory loss, and heart failure. A review of the research literature on statin side effects corroborates many of our findings.
I. INTRODUCTION
The last few decades have witnessed a steady increase in drug prescriptions for the treatment of biometric markers rather than overt physiological symptoms. Today, people regularly take multiple drugs in order to normalize serum levels of biomarkers such as cholesterol or glucose, or to reduce blood pressure. All drugs have side effects, which are sometimes debilitating or even life-threatening. When a person taking multiple drugs experiences a new symptom, it is not always clear which, if any, of the drugs or drug combinations are responsible.
Increasingly, consumers are turning to the Web to seek information, and, increasingly, this information comes in the form of consumer-provided comments in discussion groups or chat rooms. User reviews of products and services have empowered consumers to obtain valuable data to guide their decision process. Recently, statistical and linguistic methods have been applied to large datasets of reviews to extract summary and/or rating information in various domains ([9] [22]).
Health care and prescription drugs are a growing topic of discussion online, which is not surprising given that almost half of all Americans take prescription drugs each month, at a cost of over $200 billion in 2008 alone ([5]). Though drugs are subject to clinical trials before reaching market, these trials are often too short, and may involve too few people, to give conclusive results. A large study recently conducted on the heart failure drug nesiritide invalidated the findings of the smaller study that had led to the drug’s approval ([11]). While regulatory agencies do attempt to monitor the safety of approved medical treatments, surveillance programs such as the U.S. Food and Drug Administration’s (FDA’s) Adverse Event Reporting System (AERS) are often difficult for patients to use.
In addition, the large language gap between medical documents and patient vocabulary can cause confusion and misunderstanding ([23]). We hope to take advantage of the vast amount of information available in patient anecdotes posted online to address the dual problems of insufficient clinical studies and mismatched terminology.
We envision a system that increases patient awareness of drug-related side effects by enabling consumers of prescription drugs to easily browse a large consolidated database of posts from health-related web sites. Beyond aggregating data from drug review and health discussion sites, we plan to support spoken queries, which would be answered via a set of succinctly summarized hits that best match the query, based on sophisticated statistical and linguistic techniques. The user could then click on any one of these displayed summaries to read the associated post.
This paper describes our preliminary efforts to detect associations between a drug class and its side effects. We use statistics and heuristic methods to build up a hierarchical ontology of side effects by aggregating patient-submitted drug reviews. We use log-likelihood ratios to extract summary information derived from biases in word and phrase distributions, and to quantify associations between drugs and symptoms. For the scope of this paper, we focus on statin drugs, which are among the most costly and commonly prescribed drugs in the United States. The methods described are applicable to all drug classes.
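To make the log-likelihood ratio comparison concrete, the following is a minimal sketch, not the implementation used in this work. It assumes the common G² form of the statistic computed over a 2x2 contingency table (term vs. all other tokens, statin reviews vs. background reviews). The function names and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def log_likelihood_ratio(k1, n1, k2, n2):
    """G^2 statistic for a term seen k1 times among n1 tokens in one corpus
    and k2 times among n2 tokens in another (assumes n1 > 0 and n2 > 0)."""
    # Observed 2x2 table: rows are corpora, columns are (term, not term).
    observed = [[k1, n1 - k1], [k2, n2 - k2]]
    row_totals = [n1, n2]
    col_totals = [k1 + k2, (n1 - k1) + (n2 - k2)]
    total = n1 + n2

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def signed_llr(k1, n1, k2, n2):
    """Attach a sign: positive when the term is relatively more frequent
    in the first corpus, negative when it is more frequent in the second."""
    g2 = log_likelihood_ratio(k1, n1, k2, n2)
    return g2 if k1 / n1 >= k2 / n2 else -g2


# Hypothetical counts: a symptom word appears 50 times in 1,000 tokens of
# statin reviews and 10 times in 1,000 tokens of background reviews.
score = signed_llr(50, 1000, 10, 1000)
print(f"signed G^2 = {score:.2f}")
```

Under the usual chi-square approximation with one degree of freedom, a G² value above roughly 3.84 corresponds to p < 0.05. The sign is added here only so that terms over-represented in statin reviews can be separated from those over-represented in the background set.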
In the remainder of this paper, we will first review the research literature reflecting known or suspected side effects associated with statin drugs. After explaining our data collection and side-effect ontology construction, we describe our methodology and verify that many of our extracted associations align with observations from the literature.
II. BRIEF LITERATURE REVIEW
A. Side Effects of Statin Drugs
Statins (hydroxymethylglutaryl coenzyme A reductase inhibitors) have become increasingly popular as very effective agents to normalize serum cholesterol levels. The most popular of these, atorvastatin, marketed under the trade name Lipitor, has been the highest-revenue branded pharmaceutical for the past 6 years. The official Lipitor web site lists mainly muscle pain and weakness and digestive problems as potential side effects. However, several practitioners and researchers have identified suspected side effects in other, more alarming areas, such as heart failure, cognition and memory problems, and even severe neurological diseases such as Parkinson’s disease and ALS (Lou Gehrig’s disease). Reference [21] provides compelling arguments for the diverse side effects of statins, attributing them mainly to cholesterol depletion in cell membranes.
It is widely acknowledged that statin drugs cause muscle pain, weakness and damage ([7] [12]), likely due in part to their interference with the synthesis of the potent antioxidant Coenzyme Q10 (CoQ10) ([10]). CoQ10 plays an essential role in mitochondrial function to produce energy. Congestive heart failure is a condition in which the heart can no longer pump enough blood to the rest of the body, essentially because it is too weak. Because the heart is a muscle, it is plausible that heart muscle weakness could arise from long-term statin usage. Indeed, atorvastatin has been shown to impair ventricular diastolic heart performance ([14]). Furthermore, CoQ10 supplementation has been shown to improve cardiac function ([13] [20]).
The research literature provides plausible biological explanations for a possible association between statin drugs and neuropathy ([15] [24]). A recent evidence-based article ([1]) found that statin drug users had a high incidence of neurological disorders, especially neuropathy, paresthesia and neuralgia, and appeared to be at higher risk of the debilitating neurological diseases ALS and Parkinson’s disease. The evidence was based on careful manual labeling of a set of self-reported accounts from 351 patients. A mechanism for such damage could involve interference with the ability of oligodendrocytes, specialized glial cells in the nervous system, to supply sufficient cholesterol to the myelin sheath surrounding nerve axons. Genetically engineered mice with defective oligodendrocytes exhibit visible pathologies in the myelin sheath which manifest as muscle twitches and tremors ([16]).
Cholesterol depletion in the brain would be expected to lead to pathologies in neuron signal transport, due not only to a defective myelin sheath but also to interference with signal transport across synapses ([17]). Cognitive impairment, memory loss, mental confusion, and depression were significantly present in Cable’s patient population ([1]). Wagstaff et al. ([19]) conducted a survey of cognitive dysfunction from AERS data, and found evidence of both short-term memory loss and amnesia associated with statin usage. Golomb et al. ([6]) conducted a study to evaluate evidence of statin-induced cognitive, mood or behavioral changes in patients. The authors concluded with a plea for studies that “more clearly establish the impact of hydrophilic and lipophilic statins on cognition, aggression, and serotonin.”
B. Relationship between Cholesterol and Health
ALS and heart failure are both conditions for which published literature suggests an increased risk associated with statin therapy ([1] [10]). Indeed, for both of these conditions, a survival benefit is associated with elevated cholesterol levels. A study on mortality in heart failure found a statistically significant inverse correlation between serum cholesterol and mortality. For 181 patients with heart disease and heart failure, half of those whose serum cholesterol was below 200 mg/dl were dead three years after diagnosis, whereas only 28% of the patients whose serum cholesterol was above 200 mg/dl had died. In another study on a group of 488 patients diagnosed with ALS, serum levels of triglycerides and fasting cholesterol were measured at the time of diagnosis ([2]). High values for both lipids were associated with improved survival, with a p-value <0.05.
A very recent study on the relationship between various measures of cholesterol status and health in the elderly produced some surprising results, strongly suggesting that elevated cholesterol is beneficial for this segment of the population ([18]). A study population initially over 75 years old was followed over a 17-year period beginning in 1990. In addition to serum cholesterol, a biometric associated with the ability to synthesize cholesterol (lathosterol) and a biometric associated with the ability to absorb cholesterol through the gut (sitosterol) were measured. For all three measures of cholesterol, low values were associated with a poorer prognosis for frailty, mental decline and early death. A reduced ability to synthesize cholesterol showed the strongest correlation with poor outcome. Individuals with high measures of all three biometrics enjoyed a 4.3-year extension in life span, compared to those for whom all measures were low.
III. SIDE-EFFECT DISCOVERY
A. Data Collection
To learn the underlying associations between side effects and drug usage from patient-provided reviews, we collected drug reviews from three drug discussion forums (“AskPatient.com,” “Medications.com” and “WebDB.com”) which allow users to post reviews on specific drugs and share their experiences. Table 1 gives the statistics on the review data collection. A total of 8,515 statin reviews were collected from the three data sources. We also collected 105K drug reviews from AskPatient.com on drugs used to treat a broad range of problems such as depression, acid reflux disease, high blood pressure, and diabetes. This set includes reviews for non-statin cholesterol-lowering drugs.
Continue Reading the Research Study Here: http://people.csail.mit.edu/seneff/IMMM.pdf