Analysis of Descriptive Statistics - Paper Example

2021-08-27 11:14:50
University of California, Santa Barbara
Type of paper: Case study

As shown in Appendix 1, the mean income is 43.48 (in thousands of dollars) with a standard error of 2.058. The mean household size is 3.42, while the average amount charged is 3,964.06. The range of the income variable is 46, implying that the difference between the highest and lowest incomes is 46,000. The range for household size is 6, while that of the amount charged is 3,814. The standard deviation of income is 14.551, implying that the incomes of the 50 consumers typically deviated from the mean income by about 14.551. The standard deviation of household size is 1.739, while that of the amount charged is 933.494. Comparing the standard deviations of the three variables, the amount charged had the greatest absolute variability while household size had the least. Because the variables are measured in different units, this ranking holds only in absolute terms.
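Since the three variables are measured in different units, a unit-free measure such as the coefficient of variation (standard deviation divided by the mean) can supplement the comparison. The sketch below uses only the means and standard deviations quoted above. The dictionary and function names are illustrative and do not come from the appendix.

```python
# Summary statistics quoted in the text (Appendix 1): (mean, standard deviation).
# Income is assumed to be recorded in thousands of dollars.
summary = {
    "income": (43.48, 14.551),
    "household_size": (3.42, 1.739),
    "amount_charged": (3964.06, 933.494),
}


def coefficient_of_variation(mean, std_dev):
    """Return the standard deviation expressed as a fraction of the mean."""
    return std_dev / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in summary.items()}

# List the variables from most to least variable relative to their own mean.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: CV = {value:.3f}")
```

On this relative measure household size varies the most (about 0.51) and the amount charged the least (about 0.24), the reverse of the ranking by raw standard deviation.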

Simple Linear Regression

Regression equations express the dependent variable in terms of the predictor or independent variables. The equation can be used to estimate the value of the dependent variable given any value of the explanatory variable.

Regression of Amount Charged Against Income

As shown in Appendix 2, the regression equation is as follows:

Amount Charged, Y = 2,204.00 + 40.48 × Income (X)

The model shows a positive association between the amount charged by the credit card companies and the income of the consumer. The t-value of the income coefficient is 5.635 and its p-value is reported as 0.000. This implies that the relationship between the amount charged and the income of consumers is statistically significant.

The R Square of this model is 0.398. It means that variation in the incomes of consumers explained 39.8% of the variation in the amount charged by credit card companies. It further indicates that the model's explanatory power is low, since more than 50% of the variation in the amount charged is not explained by consumer incomes.

Regression of Amount Charged Against Household Size

This model, as shown in Appendix 3, is:

Amount Charged, Y = 2,581.941 + 404.128 × Household Size

It also shows that the amount charged and household size are positively related. The t-value of this coefficient is 7.924 and the significance value is 0.000. This indicates that the coefficient of household size is statistically significant. The coefficient of determination of this equation is 0.753. This indicates that 75.3% of the variation in the amount charged was explained by the variation in household size.

Comparing the Two Models

Both models indicate statistically significant relationships with the amount charged. The second model has greater explanatory power (R Square) than the first. Thus, household size is a better predictor of annual credit card charges than income. If household size is used, a greater percentage of the variation in annual credit card charges will be explained than when income is used as the independent variable.
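As a consistency check on the two simple regressions, the R Square of a one-predictor model can be recovered from the slope's t-value through the identity R² = t² / (t² + n − 2). The sketch below applies it to the reported t-values, assuming n = 50 (the number of consumers mentioned in the descriptive statistics). The function name is illustrative.

```python
def r_squared_from_t(t_value, n_obs):
    """R-squared implied by the slope t-statistic in a one-predictor regression."""
    df_resid = n_obs - 2  # residual degrees of freedom with one predictor
    return t_value ** 2 / (t_value ** 2 + df_resid)


N_OBS = 50  # assumed sample size, taken from the "50 consumers" in the text

r2_income = r_squared_from_t(5.635, N_OBS)
r2_household = r_squared_from_t(7.924, N_OBS)

print(f"Income model:         R^2 = {r2_income:.3f}")
print(f"Household-size model: R^2 = {r2_household:.3f}, R = {r2_household ** 0.5:.3f}")
```

Under this identity the income model reproduces the reported 0.398. The household-size t-value implies an R Square of roughly 0.567, whose square root is about 0.753, so the reported 0.753 may be the correlation coefficient R rather than R Square. Appendix 3 would settle this. In either case household size remains the stronger single predictor.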

Multiple Regression Equation

The multiple regression equation includes income and household sizes as independent variables. The model can be expressed as follows (see Appendix 3):

Annual credit card charge, Y = 1,304.905 + 33.133 × Income + 356.296 × Household Size

The standardized coefficients indicate that household size has a greater impact on the annual credit card charge than consumers' income.

The F statistic is 111.218 with a significance of 0.000. This shows that the model is statistically significant and that the relationship between annual credit card charges and income and household size is unlikely to be due to chance. The model's coefficient of determination, as shown by the R Square, is 0.826. It means that 82.6% of the variation in annual credit card charges was explained by income and household size. The explanatory power of the model is high, since less than 20% (17.4%) of the variation in annual credit card charges is not explained by the independent variables in the model. The multiple regression equation has the greatest explanatory power of the three models. Thus, it is the best equation for predicting the annual credit card charges.
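The reported F statistic and R Square can be cross-checked with the standard relation F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below assumes n = 50 observations and k = 2 predictors. The function names are illustrative, and the adjusted R Square is computed here rather than quoted from the appendix.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic implied by R-squared for a linear regression."""
    explained = r_squared / n_predictors
    unexplained = (1 - r_squared) / (n_obs - n_predictors - 1)
    return explained / unexplained


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """R-squared penalised for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


R2 = 0.826    # reported R Square of the multiple regression
N_OBS = 50    # assumed sample size
K = 2         # income and household size

implied_f = f_statistic(R2, N_OBS, K)
unexplained_share = 1 - R2
adj_r2 = adjusted_r_squared(R2, N_OBS, K)

print(f"Implied F statistic: {implied_f:.1f}")
print(f"Unexplained share:   {unexplained_share:.1%}")
print(f"Adjusted R Square:   {adj_r2:.3f}")
```

The implied F of about 111.6 is close to the reported 111.218. The small gap is consistent with R Square having been rounded to three decimals.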

Predicting Annual Credit Card Charge

Equation: Y = 1,304.905 + 33.133 × Income + 356.296 × Household Size

Household size = 3

Annual income = 40,000 (entered as 40, since income is measured in thousands)

Annual credit card charge = 1,304.905 + (33.133 × 40) + (356.296 × 3)

= 1,304.905 + 1,325.32 + 1,068.888

= $3,699
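The prediction can be reproduced by evaluating the multiple regression equation directly. The sketch below hard-codes the coefficients quoted in the text and assumes income is entered in thousands of dollars. The constant and function names are illustrative.

```python
INTERCEPT = 1304.905
B_INCOME = 33.133        # change in charge per $1,000 of annual income
B_HOUSEHOLD = 356.296    # change in charge per additional household member


def predict_annual_charge(income_thousands, household_size):
    """Evaluate the fitted multiple regression equation for one consumer."""
    return (INTERCEPT
            + B_INCOME * income_thousands
            + B_HOUSEHOLD * household_size)


income_term = B_INCOME * 40        # income of $40,000 entered as 40
household_term = B_HOUSEHOLD * 3   # household of three people
prediction = predict_annual_charge(40, 3)

print(f"Income term:      {income_term:,.2f}")
print(f"Household term:   {household_term:,.3f}")
print(f"Predicted charge: ${prediction:,.2f}")
```

The income term evaluates to 33.133 × 40 = 1,325.32, so the predicted annual charge for this consumer is about $3,699.11.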

Need for Other Independent Variables

The explanatory power of the model is just over 80%. There is a need to add more independent variables to improve the model's coefficient of determination. A perfect model is one in which all the variation in the dependent variable is explained by the independent variables. Other variables that could be added include interest rates, credit history and debt-to-income ratio.


Scatterplot Matrix

The above scatterplot matrix indicates that there are relationships between the six variables. The scatter plots indicate patterns implying that the variables are correlated.

Suitability of Multiple Regression

Multiple regression seems suitable for these data. Multiple regression is used when expressing a response variable in terms of more than one independent variable. The scatter plots indicate linear relationships between the independent and dependent variables.

Multiple Linear Regression Equation

The multiple linear regression equation, as shown in Appendix 5, is as follows:

Y = 82.255 + 0.321X1 + 0.150X2 + 28.119X3 + 10.968X4 + 0.161X5

where Y = Average insurance rate

X1 = Population density

X2 = Auto theft rate

X3 = Deaths per 100M miles

X4 = Average drive time

X5 = Hospital cost per day

The ANOVA table for the model indicates that the model's F statistic is 25.427 with a significance of 0.000. This means that the model is statistically significant: there is a statistically significant association between the average insurance rate and the five independent variables taken together.

Interpretation of the Coefficients

B1: Coefficient of Population Density

The coefficient of population density in the model is 0.321. This indicates that there is a positive relationship between average insurance rate and population density. The t-value of the coefficient is 4.884 and the significance value is 0.000. Thus, we can infer that the coefficient of population density is statistically significant. It is therefore a reliable measure of the change in insurance rate associated with a unit change in population density, holding the other variables constant.

B2: Coefficient of Auto Theft Rate

The value of this coefficient is 0.150, implying a positive association between average insurance rate and auto theft rate. The t-value of this coefficient is 1.774 and the p-value is 0.083. The coefficient is not statistically significant since the significance value is more than 5%. Thus, the coefficient is not a reliable indicator of the variation in average insurance rate associated with a unit change in auto theft rate.

B3: Coefficient of Deaths per 100M Miles

The coefficient's value is 28.119, implying that a unit increase in the number of deaths per 100M miles is associated with an increase of 28.119 in the average insurance rate. Thus, average insurance rate and deaths per 100M miles are positively related. The p-value of the t statistic for this coefficient is 0.415, indicating that it is not statistically significant.

B4: Coefficient of Average Drive Time

The coefficient is 10.968, showing that a unit increase in average drive time is associated with an increase of 10.968 in the average insurance rate. The significance value of the t statistic of the coefficient is 0.034. This shows that the coefficient is statistically significant; hence it is a reliable measure of the change in average insurance rate per unit change in average drive time.

B5: Coefficient of Hospital Cost per Day

This coefficient is 0.161 indicating a positive association between hospital cost per day and the response variable. Its p-value is slightly more than 5% indicating that it is not statistically significant.

The standardized (Beta) coefficients show that population density has the greatest impact on average insurance rate, while deaths per 100M miles has the least impact on the response variable.

Coefficient of Determination

The multiple linear regression equation's R Square is 0.743. This shows that 74.3% of the variation in the observed average insurance rates can be accounted for by the five independent variables in the regression equation. About 26% of the variation is accounted for by factors other than the five predictor variables.

Suitability of Independent Variables

Not all of the five independent variables should remain in the model, since some are not statistically significant. Only population density and average drive time are statistically significant at the 5% level. The rest are not and should be eliminated. Alternatively, increasing the sample size and the number of variables can improve the model's fit.
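The screening rule described above (keep a predictor only if its p-value is below 5%) can be sketched as follows. The p-values are those quoted in the coefficient discussion. The text gives no exact figure for hospital cost per day, only that it is slightly above 5%, so that variable is listed separately rather than assigned a number. All names are illustrative.

```python
ALPHA = 0.05  # significance level used in the text

# p-values as quoted in the coefficient discussion (0.000 is a rounded value).
p_values = {
    "population density": 0.000,
    "auto theft rate": 0.083,
    "deaths per 100M miles": 0.415,
    "average drive time": 0.034,
}

# The text reports this p-value only as "slightly more than 5%", so no exact
# value is assumed here; it is simply recorded as not significant.
reported_not_significant = ["hospital cost per day"]

keep = [name for name, p in p_values.items() if p < ALPHA]
drop = [name for name, p in p_values.items() if p >= ALPHA]
drop += reported_not_significant

print("Keep:", ", ".join(keep))
print("Drop:", ", ".join(drop))
```

A reduced model would need to be re-estimated with only the retained predictors, since coefficients and p-values generally change once variables are removed.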

Residual Plots
