Problem Solving Example: Eliminating Errors in the Application DataSet

2021-07-22
3 pages
789 words
University/College: Wesleyan University
Type of paper: Problem solving
This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The Module 3 Application DataSet contains many errors. For instance, the Age-BP variable has six typographical errors (68 yrs, 42.1, 70 Years, 300 mos, 29 years 2 months, and 55y 6 m). Because none of the correct values contain decimal points, I changed 42.1 to 42 by clicking on the value, editing it, and saving. Because this variable should be numeric in SPSS, I double-clicked on 70 Years and changed it to 70. Using the same procedure, I changed 68 yrs, 29 years 2 months, and 55y 6 m to 68, 29, and 55, then saved the changes. Finally, 300 mos was probably meant to be 30 years, so I changed the value to 30 and saved the changes.
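
The Age-BP edits above were made by hand in SPSS. As a rough illustration only, the same rule (keep the whole number of years, drop units and decimals) could be sketched in Python. The function name `clean_age` is hypothetical, and the sketch deliberately leaves entries given in months for a manual decision, as was done in the essay.

```python
import re

def clean_age(raw):
    """Return the whole-year age from a raw entry, or None if it needs manual review."""
    text = str(raw).strip().lower()
    # Leading integer, an optional decimal part, then an optional unit word
    match = re.match(r"(\d+)(?:\.\d+)?\s*([a-z]*)", text)
    if match is None:
        return None
    years = int(match.group(1))
    unit = match.group(2)
    # Entries whose unit starts with "mo" (e.g. "300 mos") are ambiguous,
    # so they are left for a manual decision rather than converted here
    if unit.startswith("mo"):
        return None
    return years

raw_ages = ["68 yrs", "42.1", "70 Years", "300 mos", "29 years 2 months", "55y 6 m"]
cleaned_ages = [clean_age(value) for value in raw_ages]
```

Only the leading number is kept, so trailing month counts such as "2 months" are dropped, matching the manual edits.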

The second variable (sex) has eight coding errors (M, F, Male, female, man, woman, f, and m). In SPSS, sex is typically coded as 1 or 2 to indicate whether the participant is male or female. In this dataset, however, it is not clear whether 1 indicates female and 2 male or vice versa, and I had no access to the original participant records, so I deleted all eight incorrectly entered values for the sex variable.

Next, I examined the body mass variable and identified seven errors (obese, overweight, normal, overweight, OK, fat, and pathologic). They are errors because they are non-numeric. Because I had no access to the original paper record showing the correct body mass index values, I deleted all of them.

The age at death variable has eight errors (80+, almost 70, elderly, 90+, old, retired, 90+, and mid-60s). All of them were deleted because they are not numeric. The diabetic variable, by contrast, had no errors.

Next, the place variable has nine errors (rural, City, small, nowhere, town, >1million, medium, large, and large). They are errors because they are non-numeric. I used the following coding scheme to correct the data: rural = 1, suburban = 2, and urban = 3. Under this scheme, I changed rural to 1, City to 3, and small to 1. However, nowhere, town, >1million, medium, large, and large were deleted because they do not clearly indicate whether the place is rural, suburban, or urban.
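
The place recoding above amounts to a small lookup table. A minimal Python sketch follows; the names `PLACE_CODES`, `JUDGMENT_CALLS`, and `code_place` are hypothetical, and the mappings for City and small simply mirror the judgment calls described in the text.

```python
# Coding scheme from the text: rural = 1, suburban = 2, urban = 3
PLACE_CODES = {"rural": 1, "sub-urban": 2, "suburban": 2, "urban": 3}

# Judgment calls described in the text: City was coded 3 and small was coded 1
JUDGMENT_CALLS = {"city": 3, "small": 1}

def code_place(raw):
    """Return the numeric place code, or None when the entry cannot be classified."""
    key = str(raw).strip().lower()
    lookup = {**PLACE_CODES, **JUDGMENT_CALLS}
    return lookup.get(key)

raw_places = ["rural", "City", "small", "nowhere", "town", ">1million", "medium", "large", "large"]
coded_places = [code_place(value) for value in raw_places]
```

Entries that return None correspond to the values that were deleted.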

The BP variable has 16 errors (high, 165 100, ok, good, too high, 89 153, elevated, 139 79, 144, 141_87, 93, s180 d190, 146 85, low, 167-95, and super hi). These are errors because they are non-numeric, are not written as systolic over diastolic, or contain only one value. All non-numeric values were deleted, as were cases with only a single value (for example, systolic only). Cases with two values that were written incorrectly were corrected by placing a slash (/) between the systolic and diastolic values. After making these corrections, the corrected data view is as shown below:

Corrected Data View
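
The BP rule described above (keep entries that contain exactly two numbers and rewrite them with a slash, delete everything else) could be sketched in Python as follows. This is an illustration under that assumption, not the procedure actually used in SPSS, and `clean_bp` is a hypothetical name.

```python
import re

def clean_bp(raw):
    """Return 'first/second' when the entry holds exactly two numbers, else None."""
    numbers = re.findall(r"\d+", str(raw))
    # Non-numeric entries yield no numbers; systolic-only entries yield one
    if len(numbers) != 2:
        return None
    return f"{numbers[0]}/{numbers[1]}"

raw_bp = ["high", "165 100", "89 153", "144", "141_87", "s180 d190", "167-95", "super hi"]
cleaned_bp = [clean_bp(value) for value in raw_bp]
```

The sketch keeps the two numbers in the order they were entered and does not check whether the first is larger than the second.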

Assignment Part 2

An examination of the first variable, ID_Num, reveals three errors, in cases 5, 11, and 16. In case 11, the value is 1o0, probably a typographical error; it should have been 11, so I deleted it and entered 11. In case 5, the value 4 is a repetition and should have been 5, so I deleted it and entered 5. The same procedure was used to change 17 to 16 in case 16.

In SPSS, sex can be coded as 1 or 2 for male or female, or as 0 or 1. However, the sex variable here contains three categories (0, 1, and 2), which must be the result of data entry errors. Five errors (values of 2) were identified and deleted.
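
The check on the sex variable is a test against a set of valid codes. A minimal Python sketch, assuming 0 and 1 are the valid codes as the text implies; the names `VALID_SEX_CODES` and `clean_sex` are hypothetical.

```python
# Assumption taken from the text: 0 and 1 are valid, 2 is an entry error
VALID_SEX_CODES = {0, 1}

def clean_sex(values):
    """Replace any code outside the valid set with None (a deleted value)."""
    return [value if value in VALID_SEX_CODES else None for value in values]

example = clean_sex([0, 1, 2, 1, 2])
```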

In Systolic_BP, four errors were identified: s139, sy170, sys151, and -190. They were corrected by removing the letters and the negative sign. In Diastolic_B, six errors were identified: d81, di110, dia109, /85, 0, and -99. The first three are errors because they contain letters; they were corrected by deleting the letters and saving the changes. The fourth error (/85) was corrected by deleting the slash, and the fifth error (0) was corrected by deleting the entry. Lastly, for -99, the negative sign was removed and the changes saved.
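
The corrections above follow one rule: strip any letters, slashes, and minus signs, and delete a reading of 0. A minimal Python sketch of that rule is given below; `clean_reading` is a hypothetical name, and the sketch assumes the readings are whole numbers.

```python
import re

def clean_reading(raw):
    """Strip every non-digit character; return None for empty or zero readings."""
    digits = re.sub(r"\D", "", str(raw))
    # A reading of 0 is not a plausible blood pressure, so it is deleted
    if not digits or int(digits) == 0:
        return None
    return int(digits)

systolic = [clean_reading(v) for v in ["s139", "sy170", "sys151", "-190"]]
diastolic = [clean_reading(v) for v in ["d81", "di110", "dia109", "/85", "0", "-99"]]
```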

In the BMI variable, three errors were noticed (0.25, 223lbs, and 2). The values 0.25 and 2 are too low to be BMI values, so they were deleted, while 223lbs is an error because it contains letters. In the Death_age variable, one error (-84) was noticed. Age cannot be negative, so the correction was made by deleting the negative sign. In the diabetic variable, one error (-1) was noticed and corrected by deleting the negative sign. In the place variable, two errors (1.5 and 3.5) were identified and corrected by replacing 1.5 with 1 and 3.5 with 3. Lastly, in the Age_BP variable, two errors were identified (25y and -99). They were corrected by deleting the letter and the negative sign, respectively.

Corrected Data View

 

 
