Categorical Regression With IBM SPSS


If you have been following the blogs on this platform, you will know that regression analysis has already been covered in detail, in both theory and practice. This blog explains another type of regression analysis: categorical regression, that is, regression analysis on categorical data. In this approach, the categories of each variable are recoded as numbers, so the data can then be analysed as in an ordinary (continuous) regression analysis. It's that simple!

Let’s apply this…!!!

Introduction

Categorical regression describes the relationship between a response variable and a set of explanatory variables. Quantifying this relationship makes it possible to predict the response for any combination of predictor values. For instance, a business planning to sell a new carpet cleaner wants to investigate how five variables (packaging design, brand name, price, a Good Housekeeping seal, and a money-back guarantee) affect customer preference.

Data

The carpet-cleaner data set, described in the table below, is used to study the effect of these variables. It has three package designs (A*, B*, and C*), three brand names (K2R, Glory, and Bissell), three price levels, and two levels (no or yes) for each of the last two variables. The response variable, preference, is an overall indicator of preference for each profile. Using categorical regression, we will investigate the relationships between the five explanatory variables and preference.

| Variable name | Variable label | Variable values |
|---|---|---|
| package | Package design | A*, B*, C* |
| brand | Brand name | K2R, Glory, Bissell |
| price | Price | $1.19, $1.39, $1.59 |
| seal | Good Housekeeping seal | No, Yes |
| money | Money-back guarantee | No, Yes |

Table 1: Explanatory variables in the carpet-cleaner study

Ten customers ranked 22 profiles defined by these variables. The variable preference contains the average rank for each profile, and low ranks correspond to high preference. The data are in the carpet_data file.

Data Import

From the link above, import the data as a CSV file. In Step 2 of the Text Import Wizard, select the option labelled [1] in Figure 1 to tell SPSS that the data have a header row.


Figure 1: Indicating that the data have a header row

Then, as shown in Figure 2, uncheck Space as a delimiter so that only commas separate the fields, and keep clicking Next until the data are imported.


Figure 2: Deselecting Space as a CSV delimiter
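The two wizard choices above (the first row holds the variable names, and only the comma separates fields) are the same two settings you would give any CSV reader. A minimal sketch using Python's standard `csv` module; the function name is hypothetical and this is not how SPSS itself parses the file:

```python
import csv
import io


def read_csv_with_header(text):
    """Parse CSV text into a list of dicts keyed by the header row.

    Mirrors the two import-wizard choices: the first row holds the
    variable names, and only the comma separates fields, so a value
    that contains spaces stays in a single column.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=",")
    return [dict(row) for row in reader]
```

For example, `read_csv_with_header(open("carpet_data.csv", newline="").read())` would return one dictionary per profile, assuming the downloaded file is named `carpet_data.csv`.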

Analysis

Before analysing the imported data, make sure that, in Variable View, each variable is labelled and its values are coded as shown in Figure 3, using the values in Table 1.


Figure 3: Encoding values and labels for each variable

To run the linear regression on the coded data from the menus, click Analyze => Regression => Linear.


Figure 4: The Analyze => Regression => Linear menu option

As shown in Figure 5, move Preference into the Dependent box, and move Package design and the other four variables into the Independent(s) box. Click Plots, then select *ZRESID for Y and *ZPRED for X.


Figure 5: Setting up the standardised residual plot

Before running the analysis, click Save and select Standardized in the Residuals group, so that the standardised residuals are added to the data set for later plotting. Click Continue, then click OK in the Linear Regression dialogue to run the analysis.


Figure 6: Selecting standardised residuals
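What the Linear Regression procedure computes here can be sketched as ordinary least squares on the coded predictors. The pure-Python version below is an illustration rather than SPSS's own implementation: it solves the normal equations directly and returns the coefficients, R², adjusted R², the standard error of the estimate, and standardised predicted values and residuals analogous to *ZPRED and *ZRESID. The function names are hypothetical:

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_regression(X, y):
    """Ordinary least squares with a constant term.

    X is a list of rows of predictor codes, y the response values.
    Returns coefficients (constant first), R^2, adjusted R^2, the standard
    error of the estimate, and standardised predicted values and residuals.
    """
    n, p = len(X), len(X[0])
    design = [[1.0] + [float(v) for v in row] for row in X]  # constant column first
    k = p + 1
    # Normal equations: (D^T D) coef = D^T y
    dtd = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    dty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve(dtd, dty)

    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ss_res = sum(e * e for e in resid)
    y_bar = mean(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    df_resid = n - p - 1

    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid
    se_est = (ss_res / df_resid) ** 0.5  # "Std. Error of the Estimate"

    f_bar, f_sd = mean(fitted), stdev(fitted)
    zpred = [(f - f_bar) / f_sd for f in fitted]  # mean 0, standard deviation 1
    zresid = [e / se_est for e in resid]          # residual / std. error of estimate
    return {"coef": coef, "r2": r2, "adj_r2": adj_r2, "se_est": se_est,
            "zpred": zpred, "zresid": zresid}
```

With `X` holding the five coded predictors for each profile and `y` the preference values, the results should agree with Tables 2 and 3 up to rounding, provided the codes match those in the SPSS file.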

Model summary

Linear regression is the standard method for describing relationships between variables, and R² is the most common measure of how well a regression model fits the data. It gives the proportion of the variation in the response that is explained by the weighted combination of predictors; the closer R² is to 1, the better the fit. Here R² is 0.707, so the five predictors account for about 71% of the variation in the customer preference rankings.

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .841 (a) | .707 | .615 | 3.998 |

a. Predictors: (Constant), Money-back guarantee, Price, Good Housekeeping seal, Brand name, Package design

b. Dependent Variable: Preference

Table 2: Regression Model summary
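The other entries in Table 2 follow from R² itself: R is its square root, and adjusted R² corrects R² for the number of predictors, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). A short check, assuming n = 22 cases (one row per profile, as described in the Data section) and p = 5 predictors:

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n cases and p predictors (plus a constant)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


r2 = 0.707    # R Square from Table 2
n, p = 22, 5  # 22 profiles, five predictors

print(round(math.sqrt(r2), 3))          # 0.841, the multiple R
print(round(adjusted_r2(r2, n, p), 3))  # 0.615, the Adjusted R Square
```

Both values agree with Table 2, which is consistent with the model having been fitted to 22 cases.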

Coefficients interpretation

Table 3 shows both the unstandardised and the standardised coefficients. The sign of a coefficient tells us whether the predicted response rises or falls when that predictor increases, holding all other predictors constant.

Coefficients (a)

| Model 1 | B (unstandardised) | Std. Error (unstandardised) | Beta (standardised) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 22.529 | 5.177 | | 4.352 | .000 |
| Package design | -4.159 | 1.036 | -.560 | -4.015 | .001 |
| Brand name | .429 | 1.054 | .056 | .407 | .689 |
| Price | 2.703 | 1.009 | .366 | 2.681 | .016 |
| Good Housekeeping seal | -4.314 | 1.780 | -.330 | -2.423 | .028 |
| Money-back guarantee | -2.779 | 1.921 | -.197 | -1.447 | .167 |

a. Dependent Variable: Preference

Table 3: Unstandardised and standardised regression coefficients
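The unstandardised coefficients (B) in Table 3 define the fitted prediction equation, and each t value is the coefficient divided by its standard error. A short sketch under a hypothetical coding assumption (categories numbered 1, 2, 3 in Table 1 order, and 1 = No, 2 = Yes for the seal and the guarantee); the function names are illustrative only:

```python
# Unstandardised coefficients (B) and their standard errors from Table 3.
B = {"constant": 22.529, "package": -4.159, "brand": 0.429,
     "price": 2.703, "seal": -4.314, "money": -2.779}
SE = {"constant": 5.177, "package": 1.036, "brand": 1.054,
      "price": 1.009, "seal": 1.780, "money": 1.921}


def predict_preference(package, brand, price, seal, money):
    """Predicted preference rank for one profile, given its numeric codes.

    Assumes the hypothetical coding described above; the result is only
    meaningful if it matches the coding used when the model was fitted.
    """
    codes = {"package": package, "brand": brand, "price": price,
             "seal": seal, "money": money}
    return B["constant"] + sum(B[name] * value for name, value in codes.items())


def t_value(name):
    """t statistic for a coefficient: estimate divided by its standard error."""
    return B[name] / SE[name]


print(round(predict_preference(1, 1, 1, 1, 1), 3))  # 14.409
print(round(t_value("package"), 3))                 # -4.014
```

Because B and its standard error are shown rounded, a recomputed t value can differ from Table 3 in the last digit (-4.014 here against -4.015 in the table). Multiplying a category code by a single coefficient also assumes the codes are equally spaced, which is exactly what the residual plots below call into question.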

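For categorical predictors, the category coding determines what an increase in a predictor means.

For example, higher coded values of Package design, Good Housekeeping seal, or Money-back guarantee lead to a lower predicted preference ranking, because all three have negative coefficients. Since low ranks correspond to high preference, a lower predicted ranking means a more preferred profile. Standardised coefficients are read in standard-deviation units: a one standard deviation increase in Brand name results in a 0.056 standard deviation increase in predicted preference. Because the standard deviation of Preference is 6.44, the predicted preference ranking rises by 0.056 × 6.44 ≈ 0.361. Package design has the largest standardised coefficient in absolute value (-0.560), so changes in package design produce the largest changes in predicted preference.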
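A standardised coefficient is the unstandardised one rescaled by the predictor's and the response's standard deviations (Beta = B × s_x / s_y), so multiplying Beta by s_y gives the change in predicted preference, in rank units, for a one standard deviation increase in the predictor. A small sketch that applies this to every Beta in Table 3, using the 6.44 quoted above:

```python
SD_PREFERENCE = 6.44  # standard deviation of Preference, as quoted in the text

# Standardised coefficients (Beta) from Table 3.
BETA = {"package": -0.560, "brand": 0.056, "price": 0.366,
        "seal": -0.330, "money": -0.197}


def change_in_preference(beta, sd_y=SD_PREFERENCE):
    """Change in predicted Preference (rank units) for a one-SD increase in a predictor."""
    return beta * sd_y


for name, beta in BETA.items():
    print(f"{name:8s} {change_in_preference(beta):+.3f}")

# The predictor with the largest |Beta| moves the prediction the most.
strongest = max(BETA, key=lambda name: abs(BETA[name]))
print(strongest)  # package
```

The brand line reproduces the 0.361 worked out above, and the last line confirms that package design has the strongest effect.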

 

Regression Chart: Scatter plot

This chart plots the standardised residuals against the standardised predicted values. Its purpose is to check whether a linear model is appropriate for this analysis. How do we know?

If the linear model fits well, the residual plot shows no systematic pattern. If a pattern such as a U-shape appears, the linear model is probably not the best fit for the problem, and a non-linear model may perform better. Figure 7 shows a faint U-shaped pattern.


Figure 7: Standardised residuals against standardised predicted values

The pattern can be examined further by plotting the standardised residuals against package design. To create this scatterplot, select Graphs => Chart Builder.


Figure 8: Using a graph to confirm the U-shaped pattern

Choose Simple Scatter from the Scatter/Dot group. Put Standardized Residual on the y-axis and Package design on the x-axis, then click OK.


Figure 9: Standardised residuals plotted against package design

Figure 9 shows the U-shaped pattern more clearly, which suggests that a non-linear model would better describe the relationship between preference and the predictors.
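A rough numeric complement to the visual check, assuming you have exported the saved standardised residuals together with the package design codes: average the residuals within each category. With three categories, a middle mean that lies below both end means (or above both) is the simplest signature of a U (or inverted-U) shape. This is a heuristic of our own, not an SPSS procedure, and it does not replace looking at the plot:

```python
from collections import defaultdict
from statistics import mean


def mean_residual_by_category(codes, residuals):
    """Average standardised residual within each category code, sorted by code."""
    groups = defaultdict(list)
    for code, resid in zip(codes, residuals):
        groups[code].append(resid)
    return {code: mean(values) for code, values in sorted(groups.items())}


def middle_breaks_trend(means_by_code):
    """For a three-level predictor, True if the middle level's mean residual is
    below both end levels (U-shape) or above both (inverted U)."""
    if len(means_by_code) != 3:
        raise ValueError("expected exactly three categories")
    low, mid, high = (means_by_code[c] for c in sorted(means_by_code))
    return (mid < low and mid < high) or (mid > low and mid > high)
```

If the residuals were centred on zero in every category, the means would all be close to 0 and the function would give no useful signal; a clear U like the one in Figure 9 should show up as a negative middle mean between two positive ones.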

 

Conclusion

In this blog, categorical regression analysis has been explained and applied to a real-world problem. The first part explained categorical regression and how it differs from continuous regression. Applying it to the carpet-cleaner data showed that a linear model on the coded categories does not adequately describe the relationship between the target variable and the predictors.

The U-shape in the residual plots suggests that package design should be treated as a nominal variable. More generally, regression coefficients alone cannot fully capture the influence of a predictor or the relationships between predictors. It is therefore recommended to validate a model in this way, with residual plots, to make sure the most suitable model is used for a problem.
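Treating package design as nominal means dropping the assumption that its codes are equally spaced numbers. SPSS's Categorical Regression procedure (CATREG) handles this through optimal scaling; a simpler alternative, shown here as a swapped-in technique rather than the CATREG method, is indicator (dummy) coding, which gives each non-reference package its own coefficient in an ordinary linear regression. A minimal sketch, with hypothetical codes 1 = A*, 2 = B*, 3 = C* and A* chosen arbitrarily as the reference level:

```python
def dummy_code(codes, levels, reference):
    """Indicator (dummy) coding for a nominal variable.

    Returns one 0/1 column per non-reference level, so each level gets its
    own coefficient instead of assuming the codes are equally spaced.
    """
    if reference not in levels:
        raise ValueError("reference must be one of the levels")
    kept = [level for level in levels if level != reference]
    return [[1 if code == level else 0 for level in kept] for code in codes]


# Package design with hypothetical codes 1 = A*, 2 = B*, 3 = C*; A* is the reference.
rows = dummy_code([1, 2, 3, 2], levels=[1, 2, 3], reference=1)
print(rows)  # [[0, 0], [1, 0], [0, 1], [1, 0]]
```

The two indicator columns would replace the single package column in the regression, so the fitted effects of B* and C* are each measured relative to A*.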

 


AcademicianHelp

Your one-stop website for academic resources, tutoring, writing, editing, study abroad application, cv writing & proofreading needs.

Get Quote
TOP