The study obtained patient data from a local hospital for analysis purposes only. The data covered patients with heart complications and recorded multiple features used to diagnose whether each patient was sick. Heart complications rank among the leading ailments in major hospitals and are among the leading causes of death. It is therefore critical that the issues underlying them be studied and well understood. Even outside health facilities, individuals are constantly advised to check the health of their hearts (Ward, Schiller, & Goodman, 2014). Many conditions affect the heart, and it is one of the body organs whose failure can kill abruptly. This calls for an early understanding of the condition and for addressing its associated symptoms. That is why the data is the subject of study here.
Phase 1: Data Discovery
With heart complications ranking among the top five diseases in the mortality index, it is imperative to study the condition. The purpose of this study is therefore to find a connection between age and indicators of heart complications such as a person's blood pressure, cholesterol level and maximum heart rate. Essentially, it examines whether there is a correlation between one's age and each of these measures. The study also examines the occurrence of such complications by gender. To fulfill this purpose, the study uses various data mining and extraction techniques to obtain the most relevant data.
Before turning to the hypotheses, it is important to state the specific questions of this study. The first is whether age has a relationship with conditions that lead to heart disease, such as high cholesterol, high blood pressure, and heart rate. The study explores age against each independent variable and checks the level of correlation.
The null hypothesis of this study is that there is no relationship between age and an individual's heart complications. The alternative hypothesis is that there is a relationship between age and heart health. A secondary question arises while investigating the primary hypothesis: for instance, the model produced could also be used to examine the relationship between gender and health status. The variables whose correlation with heart health will be studied include blood pressure, cholesterol level, and maximum heart rate. During data mining, more variables were available in the data provided; however, to keep the focus on a single hypothesis, the variables were narrowed down to those strictly necessary. This is discussed in detail in the next phase of the analysis.
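The hypothesis test described above can be sketched in R with `cor.test()`, which reports a correlation estimate and a p-value for each pairing of age with another variable. This is a minimal sketch under stated assumptions: the column names are those used in the Phase 4 script, the data frame is simulated rather than the hospital data, and the 0.05 significance level is an assumed convention.

```r
# Sketch: Pearson correlation test of age against each candidate variable.
# ASSUMPTION: column names follow the Phase 4 script; the data below is
# SYNTHETIC (simulated), not the hospital's patient records.
set.seed(1)
n <- 303
age <- round(runif(n, 30, 80))
data1 <- data.frame(
  age = age,
  blood.pressure = round(100 + 0.5 * age + rnorm(n, 0, 15)),
  cholesterol = round(200 + 0.8 * age + rnorm(n, 0, 45)),
  maximum.heart.rate = round(210 - 1.0 * age + rnorm(n, 0, 20))
)

predictors <- c("blood.pressure", "cholesterol", "maximum.heart.rate")
alpha <- 0.05  # significance level (an assumed, conventional choice)

# One row per variable: correlation estimate, p-value, and decision on H0
results <- do.call(rbind, lapply(predictors, function(v) {
  test <- cor.test(data1$age, data1[[v]], method = "pearson")
  data.frame(variable = v,
             r = unname(test$estimate),
             p.value = test$p.value,
             reject.H0 = test$p.value < alpha)
}))
print(results)
```

A variable for which `reject.H0` is `TRUE` is one where the sample gives evidence against "no relationship with age".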
Phase 2: Data Preparation
The analysis focuses on a single data set, from which two subsets were prepared. The first is built around the age of each diagnosed patient. The hospital provided data for 303 patients, from which all the data relevant to the study were extracted.
Age Data Set (Extract, Transform, and Load Process)
I organized the data in an Excel file. Each file held patient data, and the focus was age and the other quantitative data. For clarity, however, the initial file contained all patient information, including qualitative data. The data elements not essential to the study, such as fasting blood sugar, thal and class, were then eliminated.
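The extract, transform and load step above amounts to keeping some columns and dropping others. A minimal R sketch follows; the column names and the three placeholder rows are hypothetical, since the original file's headers are not listed.

```r
# Sketch of the transform step: drop fields the study does not need.
# ASSUMPTION: column names are hypothetical and the three rows are
# illustrative placeholders, not real patient records.
raw <- data.frame(
  age = c(50, 60, 70),
  blood.pressure = c(120, 130, 140),
  cholesterol = c(200, 220, 240),
  maximum.heart.rate = c(170, 160, 150),
  fasting.blood.sugar = c(0, 1, 0),
  thal = c(3, 3, 7),
  class = c(0, 1, 1)
)

# Fields named in the text as non-essential
unneeded <- c("fasting.blood.sugar", "thal", "class")
age.data <- raw[, setdiff(names(raw), unneeded)]
print(age.data)

# Load step (not run here): save the cleaned table for analysis
# write.csv(age.data, "age_data.csv", row.names = FALSE)
```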
Sex Data Set (Extract, Transform, and Load Process)
Since the patient's sex was also a significant consideration, the gender of each patient was recorded against age, with the data organized per patient. As in the first case, all unneeded fields were deleted, leaving only the columns used in the analysis.
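Once sex is recorded next to age, the gender question can be explored by summarizing each measure per group. In this sketch the 0/1 coding of sex, the column names, and the six rows are all hypothetical placeholders chosen for illustration.

```r
# Sketch: build the sex data set and compare group means.
# ASSUMPTION: sex coded 0 = female, 1 = male (hypothetical); values illustrative.
raw <- data.frame(
  age = c(45, 52, 61, 58, 67, 49),
  sex = c(1, 0, 1, 0, 1, 0),
  blood.pressure = c(120, 135, 140, 128, 150, 118),
  cholesterol = c(210, 240, 260, 230, 275, 205),
  thal = c(3, 3, 7, 3, 6, 3)
)

# Keep sex next to age and the measures; drop the rest
sex.data <- raw[, c("sex", "age", "blood.pressure", "cholesterol")]
sex.data$sex <- factor(sex.data$sex, levels = c(0, 1), labels = c("female", "male"))

# Mean of each measure by sex
by.sex <- aggregate(cbind(age, blood.pressure, cholesterol) ~ sex,
                    data = sex.data, FUN = mean)
print(by.sex)
```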
Phase 3: Model Planning
Classification: Multi-Level Classification
The data set used in the overall study contains four independent variables: blood pressure, cholesterol, maximum heart rate and peak heart rate. The study examines how these conditions relate to age and gender. These conditions lead to heart complications, and the strength of their relationship with age indicates how likely they are to affect one's heart health. The multilevel classification then ranks the variables by impact: highest, medium, and lowest, in that order.
Multilevel classifications are standard in healthcare facilities, where they are used to classify the level of various health determinants and patients' risk levels, among other things (Costa & Godinho Filho, 2016). They are important when building models, particularly when organizing and analyzing large data sets, because they let the statistician quickly detect critical patterns that support important deductions. Such quick observations are harder to obtain with some other data mining methods, which gives multilevel classification a critical advantage here. In this case, it would not be enough to report only each variable's level of correlation; the results are more useful if each correlation is classified into a group based on its strength. That will be a major consideration in the study.
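Grouping correlations by strength, as proposed above, can be sketched as a small R function. The cut-offs of 0.3 and 0.5 are an assumption (a common rule of thumb), not thresholds given in the study, and the example coefficients are invented for illustration.

```r
# Sketch: group correlation coefficients into low / medium / high strength.
# ASSUMPTION: the cut-offs 0.3 and 0.5 are a rule of thumb, not values
# stated in the study; adjust them to the chosen convention.
classify_strength <- function(r, breaks = c(0, 0.3, 0.5, 1)) {
  strength.labels <- c("low", "medium", "high")
  # cut() uses right-closed intervals: [0, 0.3], (0.3, 0.5], (0.5, 1]
  unname(as.character(cut(abs(r), breaks = breaks, labels = strength.labels,
                          include.lowest = TRUE)))
}

# Hypothetical coefficients for three unnamed variables (illustration only)
example.r <- c(a = 0.1, b = -0.45, c = 0.7)
strength <- classify_strength(example.r)
ranked <- names(sort(abs(example.r), decreasing = TRUE))  # strongest first
print(data.frame(variable = names(example.r), r = unname(example.r), strength = strength))
print(ranked)
```

Using the absolute value means a strong negative correlation is classed as strongly as a strong positive one; the sign is kept separately in `r`.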
Exploratory Data Analysis
To complement the multilevel classification, which will be resolved through a regression model and hypothesis testing, it is also worth conducting an exploratory data analysis to visualize common trends in the data. Exploratory data analysis summarizes the main characteristics of a data set, which other methods of analysis such as hypothesis testing may miss (Willekens, 2014). This early step lets the researcher make deductions and compare them with other methods: when the methods agree, confidence in the results increases, and when they differ, the researcher should use more methods to verify the data and make the results more accurate.
A scatter plot in R is a powerful exploratory data analysis tool that enables easy visualization and deduction (Husson, Josse, Le, & Mazet, 2014). A scatter plot of age, or gender, against any other variable under consideration shows where that variable has the most impact. For example, a scatter plot of age against blood pressure shows how blood pressure changes across ages and where the observations are most concentrated. A histogram is also a powerful tool that complements the other testing procedures. Both can easily be produced in R.
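The scatter plot and histogram described above can be produced with base R graphics. This sketch assumes the column names from the Phase 4 script, simulates the data, and writes the plots to a temporary PDF file so it also runs without a display; the `lowess()` trend line is an addition, not part of the original analysis.

```r
# Sketch: scatter plot of age vs. blood pressure, plus a histogram of age.
# ASSUMPTION: column names follow the Phase 4 script; data is SIMULATED.
set.seed(42)
n <- 303
data1 <- data.frame(age = round(runif(n, 30, 80)))
data1$blood.pressure <- round(100 + 0.5 * data1$age + rnorm(n, 0, 15))

out <- tempfile(fileext = ".pdf")
pdf(out)  # draw to a file instead of the screen

# Scatter plot: each point is one patient
plot(data1$age, data1$blood.pressure,
     xlab = "Age (years)", ylab = "Blood pressure",
     main = "Age vs. blood pressure")
# Smoothed trend line to show how blood pressure changes with age
lines(lowess(data1$age, data1$blood.pressure), lwd = 2)

# Histogram: where patient ages cluster
hist(data1$age, xlab = "Age (years)", main = "Distribution of patient age")

invisible(dev.off())
```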
Phase 4: Model Building
Classification Modeling
Using age as the dependent variable and the other three (blood pressure, cholesterol and maximum heart rate) as independent variables, R produces the following model for predicting age:
$$Y = 0.12A + 0.03B - 0.12C + 50.06$$
Here A denotes blood pressure, B cholesterol and C maximum heart rate, while Y stands for the patient's age. The coefficients give the direction of each variable's association with age; strictly, each one is the change in predicted age per unit change in that variable, so its size also depends on the variable's units. The first observation is the positive, albeit very small, association of age with blood pressure and cholesterol. That means that as age increases, high blood pressure becomes more likely, and the same is true for cholesterol. Maximum heart rate, by contrast, has a negative coefficient: as age increases, the patient's maximum heart rate declines.
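To make the equation concrete, the sketch below evaluates it for one hypothetical patient, using the coefficients reported above with C entering negatively as the text describes. The input values are invented for illustration and assume the same units as the original data.

```r
# Coefficients reported above; C (maximum heart rate) enters negatively.
# ASSUMPTION: the patient values below are hypothetical.
predict_age <- function(A, B, C) {
  0.12 * A + 0.03 * B - 0.12 * C + 50.06
}

# Hypothetical patient: blood pressure 130, cholesterol 250, max heart rate 150
y <- predict_age(A = 130, B = 250, C = 150)
# 0.12*130 + 0.03*250 - 0.12*150 + 50.06 = 15.6 + 7.5 - 18 + 50.06
print(y)  # 55.16

# Effect of raising blood pressure by 10 units, others held fixed
delta <- predict_age(140, 250, 150) - y  # 10 * 0.12 = 1.2
print(delta)
```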
In terms of classification, blood pressure and maximum heart rate are equal but opposite determinants of a patient's health, with coefficients of 0.12 and -0.12. Whereas higher age is associated with higher blood pressure, it is associated with a lower maximum heart rate. Cholesterol, with a coefficient of 0.03, is a low-impact determinant.
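Because raw coefficients depend on units (cholesterol readings are much larger numbers than heart rates, for example), one common way to compare determinants on an equal footing is to refit the model on standardized variables. This is an alternative technique, not the author's method; the sketch uses simulated data and assumes the Phase 4 column names.

```r
# Sketch: compare raw and standardized regression coefficients.
# ASSUMPTION: SIMULATED data with the Phase 4 column names.
set.seed(7)
n <- 303
age <- runif(n, 30, 80)
data1 <- data.frame(
  age = age,
  blood.pressure = 100 + 0.5 * age + rnorm(n, 0, 15),
  cholesterol = 200 + 0.8 * age + rnorm(n, 0, 45),
  maximum.heart.rate = 210 - 1.0 * age + rnorm(n, 0, 20)
)
f <- age ~ blood.pressure + cholesterol + maximum.heart.rate

# Raw coefficients: change in age per one unit of each variable
raw.fit <- lm(f, data = data1)

# Standardized coefficients: every column rescaled to mean 0, SD 1,
# so each coefficient is the change in age (in SDs) per SD of the variable
z <- as.data.frame(scale(data1))
std.fit <- lm(f, data = z)

comparison <- data.frame(raw = coef(raw.fit)[-1],
                         standardized = coef(std.fit)[-1])
print(round(comparison, 3))
# Rank variables by the absolute standardized coefficient
print(rownames(comparison)[order(abs(comparison$standardized), decreasing = TRUE)])
```

Standardizing never changes a coefficient's sign, only its scale, so the direction of each association is the same in both columns.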
Scripting
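> # Load the patient file (a file dialog opens) and inspect it
> data1 <- read.csv(file.choose(), header = TRUE)
> data1
> # Regress age on the three predictors; "~" separates response from predictors
> model <- lm(age ~ blood.pressure + cholesterol + maximum.heart.rate, data = data1)
> summary(model)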
Exploratory Analysis
The histogram of patient ages produced in RStudio shows that most patients were between 55 and 60 years old, a bin with a frequency of almost 70 patients. That is an advanced age, often considered the peak for many diseases, particularly heart complications. Because the data came from patients who sought heart-related care, the histogram suggests that heart complications are concentrated among older adults. The oldest patient was 80 years old, and only a negligible number were younger than 30.
Scripting
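> data1 <- read.csv(file.choose(), header = TRUE)
> data1
> # age is a column of data1, so reference it with $
> hist(data1$age, xlab = "Age (years)", main = "Histogram of patient age")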
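The reading of the histogram can also be checked numerically: `hist(..., plot = FALSE)` returns the bin counts without drawing. The ages here are simulated; with the real file, `data1$age` would take their place.

```r
# Sketch: find the most common 5-year age band numerically.
# ASSUMPTION: ages are SIMULATED for illustration.
set.seed(3)
age <- pmin(pmax(round(rnorm(303, mean = 55, sd = 9)), 29), 80)

# plot = FALSE returns the bin counts without drawing; R's bins are
# right-closed, so the band "55-60" means (55, 60]
h <- hist(age, breaks = seq(25, 80, by = 5), plot = FALSE)
modal <- which.max(h$counts)
cat(sprintf("Most common band: %d-%d years (%d patients)\n",
            as.integer(h$breaks[modal]), as.integer(h$breaks[modal + 1]),
            as.integer(h$counts[modal])))
```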
A scatter plot showed the same pattern, with values clustering in the 55 to 60 age bracket. More broadly, most patients seeking treatment were between 40 and 65 years old, the age range considered high risk for heart-related complications.
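The share of patients in the 40 to 65 bracket can be computed directly rather than read off the plot. The sketch below uses simulated ages as a stand-in for `data1$age`.

```r
# Sketch: count patients below, inside and above the 40-65 bracket.
# ASSUMPTION: ages are SIMULATED for illustration.
set.seed(3)
age <- pmin(pmax(round(rnorm(303, mean = 55, sd = 9)), 29), 80)

# Ages are whole numbers, so the interval (39, 65] is exactly 40..65
band <- cut(age, breaks = c(-Inf, 39, 65, Inf),
            labels = c("under 40", "40-65", "over 65"))
counts <- table(band)
share <- mean(age >= 40 & age <= 65)
print(counts)
cat(sprintf("Share aged 40-65: %.1f%%\n", 100 * share))
```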
Phase 5: Communicate Results
Age is a critical determinant of one's health status, if the analysis above is anything to go by. Older people in particular tend to have higher blood pressure and cholesterol levels. Unlike those two, however, maximum heart rate declines as people age.
In the model, the constant term of about 50 years is the age predicted when all three variables are zero. Since no patient has zero blood pressure, cholesterol or heart rate, this value is a mathematical baseline rather than an age at which the variables stop having an impact. The evidence obtained through the analysis links heart complications to a person's health status: high blood pressure and high cholesterol are key symptoms of heart disease, and the model finds that these tend to increase as one advances in age. On that basis, it can be concluded that age is related to the independent variables.
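The interpretation of the constant term can be checked in R: predicting with every predictor set to zero returns exactly the intercept. The sketch fits the same formula as the Phase 4 script to simulated data, so its coefficients will not match those reported above.

```r
# Sketch: the intercept equals the prediction when all predictors are zero.
# ASSUMPTION: SIMULATED data with the Phase 4 column names.
set.seed(11)
n <- 303
age <- runif(n, 30, 80)
data1 <- data.frame(
  age = age,
  blood.pressure = 100 + 0.5 * age + rnorm(n, 0, 15),
  cholesterol = 200 + 0.8 * age + rnorm(n, 0, 45),
  maximum.heart.rate = 210 - 1.0 * age + rnorm(n, 0, 20)
)
fit <- lm(age ~ blood.pressure + cholesterol + maximum.heart.rate, data = data1)

zero <- data.frame(blood.pressure = 0, cholesterol = 0, maximum.heart.rate = 0)
at.zero <- unname(predict(fit, newdata = zero))
intercept <- unname(coef(fit)[1])
print(c(prediction.at.zero = at.zero, intercept = intercept))

# The observed ranges sit far from zero, so "all predictors = 0"
# is an extrapolation outside the data
print(sapply(data1[-1], range))
```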
There are potentially many other variables that determine one's health, but in this brief analysis, blood pressure and cholesterol level are found to be determinants. A follow-up study could examine more variables to establish their effect on heart complications.
References
Costa, L. B. M., & Godinho Filho, M. (2016). Lean healthcare: review, classification, and analysis of literature. Production Planning & Control, 27(10), 823-836.
Husson, F., Josse, J., Le, S., & Mazet, J. (2014). FactoMineR package: Multivariate exploratory data analysis and data mining with R. Version 1.28. CRAN.
Ward, B. W., Schiller, J. S., & Goodman, R. A. (2014). Multiple chronic conditions among US adults: A 2012 update. Preventing Chronic Disease, 11.
Willekens, F. (2014). Exploratory Data Analysis. In Multistate Analysis of Life Histories with R (pp. 81-107). Springer International Publishing.