Meal Replacement Product Attributes: A Discrete Choice Analysis

 

INTRODUCTION

With many entertainment options, relationship obligations, and job or school responsibilities, people today face a growing number of choices about how best to allocate their time. According to the Bureau of Labor Statistics’ 2015 American Time Use Survey, women spend about 50 minutes per day on food preparation and kitchen clean-up, while men spend around 25 minutes on the same activities [1]. This is 34% and 30% of women’s and men’s total household activity time, respectively. Because food preparation, consumption, and clean-up take up a large share of waking, non-work hours, they are a natural place to look for time savings in a given day.

Enter meal replacement products. Historically, meal replacement products were marketed as weight-loss products, e.g., Slim Fast. However, a new wave of meal replacement companies is positioning its products as complete sources of nutrition. Soylent, for example, is a meal replacement product popular among Silicon Valley “techies”, young professionals, and students who prefer to dedicate their time to work and extracurricular activities outside of the kitchen [2]. Given a potential increase in demand for meal replacement products, this project attempts to measure which factors affect a consumer’s choice among meal replacement products by evaluating their different attributes. The results may be beneficial to marketers of meal replacement products.

METHODS

Discrete choice analysis is a statistical method for discovering the relative weights a decision maker places on attributes when choosing from a set of possible products or services, for example, which meal replacement product to purchase. It is based on a decision maker’s responses to experimentally designed profiles of possible alternatives, where each alternative has a different combination of product attributes. (See the Example Choice Set below.)

In order to produce an optimized experimentally designed set of product profile alternatives, we utilized the SAS %ChoiceEff macro. (The script may be found in the Appendix.) There were a total of six choice sets.

In this generic discrete choice experiment, we simultaneously showed the respondents two alternatives, each a set of attribute levels, and asked them to choose one. Each alternative had a different combination of attributes (independent variables). This selection process was repeated six times with each respondent. We can then say that the decision maker’s choice (dependent variable) is driven by the attribute levels (independent variables). Therefore, we use logistic regression, fitted with the glm function in R, to discover the relative weights and statistical significance of the attributes.
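
For reference, the binary logit model that a logistic regression of this kind estimates can be written as below. The notation is introduced here for illustration and is not taken from the original write-up: y = 1 means the alternative was chosen, each x_k is a 0/1 indicator for the non-reference level of one of the five attributes, and each beta_k is the corresponding attribute weight.

```latex
\Pr(y = 1 \mid x)
  = \frac{\exp\!\left(\beta_0 + \sum_{k=1}^{5} \beta_k x_k\right)}
         {1 + \exp\!\left(\beta_0 + \sum_{k=1}^{5} \beta_k x_k\right)},
\qquad
\log\frac{\Pr(y = 1 \mid x)}{1 - \Pr(y = 1 \mid x)}
  = \beta_0 + \sum_{k=1}^{5} \beta_k x_k .
```

The second form is why the fitted coefficients are read as changes in log odds: switching one indicator from 0 to 1 shifts the log odds by the matching beta_k while the other attributes are held fixed.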

DATA

The data were gathered from students’ responses in Dr. Alan Safer’s undergraduate-level statistics courses at CSULB. The survey asked about gender, whether the student lived on campus or commuted, and preferences between meal replacement alternatives. Below is an example of one of the meal replacement comparisons; respondents were asked which choice they would prefer, given that they were interested in a meal replacement product. The alternatives differed in whether the product was vegan, whether it included soy, protein content, calories, and price.

Example Choice Set:

Attributes         Product #1      Product #2
Diet               Vegan           Not Vegan
Soy Content        Soy Present     No Soy Present
Protein Content    20 grams        37 grams
Calories           500             400
Price              $4.50           $3.50

Below we load the data. Respondents answered six choice sets (columns q1-q6). However, only the choice itself was recorded (first alternative marked as “0”, second marked as “1”), so the data contain no information about the attributes under study. Because the attribute levels were not entered with the responses, some heavy cleaning is necessary before the analysis can be conducted.

data <- read.csv("/home/nbuser/library/meal.csv", header = TRUE)
#install.packages("dplyr")
#install.packages("reshape")
library("dplyr")
library("reshape")
head(data)
nrow(data)
id gender livecampus mealreplacement q1 q2 q3 q4 q5 q6 which
1 1 0 0 1 1 0 1 1 0 0
2 0 1 1 1 0 1 1 1 0 NA
3 0 1 0 1 1 1 0 0 1 3
4 0 0 0 1 0 0 0 0 1 0
5 0 1 0 1 0 0 0 0 1 0
6 1 1 1 1 0 1 0 1 0 NA
142
Below, we use the “melt” function to make each chosen alternative into a row.
# Make the data long form.
data_questions <- melt(data[,c(1,5:10)], id=("id"))
head(sample_n(data_questions,10))
id variable value
212 70 q2 0
678 110 q5 1
544 118 q4 1
439 13 q4 1
515 89 q4 0
596 28 q5 0
Now, we append the attribute columns, depending on which alternative was chosen.
For example, in the data frame above, the “variable” column denotes the question number. Every question had two alternatives, recorded in the “value” column as 0 or 1. Each alternative had different attribute levels. The attribute levels are not in the data, so they need to be entered. The code below does this. If the row is question 1 (“q1”) and the alternative chosen is 0 (value column), the attribute level for diet is “vegan”. If the second alternative was chosen (value column = 1), the diet attribute level is “not vegan”. This was done for all attributes.
# Give the attributes for the questions.
data_questions_1 <- mutate(
  data_questions, 
  vegan = ifelse(variable == 'q1' & value == 0, 'vegan',
  ifelse(variable == 'q2' & value == 1, 'vegan',
  ifelse(variable == 'q3' & value == 1, 'vegan',
  ifelse(variable == 'q4' & value == 1, 'vegan',
  ifelse(variable == 'q5' & value == 1, 'vegan',
  ifelse(variable == 'q6' & value == 0, 'vegan',
  'not vegan')))))))

data_questions_2 <- mutate(
  data_questions_1, soy = ifelse(variable == 'q1' & 
  value == 0, 'soy',
  ifelse(variable == 'q2' & value == 0, 'soy',
  ifelse(variable == 'q3' & value == 0, 'soy',
  ifelse(variable == 'q4' & value == 1, 'soy',
  ifelse(variable == 'q5' & value == 1, 'soy',
  ifelse(variable == 'q6' & value == 0, 'soy',
  'no soy')))))))

data_questions_3 <- mutate(
  data_questions_2, protein = ifelse(variable == 'q1' & 
  value == 0, '20g',
  ifelse(variable == 'q2' & value == 1, '20g',
  ifelse(variable == 'q3' & value == 0, '20g',
  ifelse(variable == 'q4' & value == 1, '20g',
  ifelse(variable == 'q5' & value == 0, '20g',
  ifelse(variable == 'q6' & value == 1, '20g',
  '37g')))))))

data_questions_4 <- mutate(
  data_questions_3, calories = ifelse(variable == 'q1' & 
  value == 1, '400',
  ifelse(variable == 'q2' & value == 0, '400',
  ifelse(variable == 'q3' & value == 1, '400',
  ifelse(variable == 'q4' & value == 1, '400',
  ifelse(variable == 'q5' & value == 0, '400',
  ifelse(variable == 'q6' & value == 0, '400',
  '500')))))))

data_questions_5 <- mutate(
  data_questions_4, price = ifelse(variable == 'q1' & 
  value == 1, '$3.50',
  ifelse(variable == 'q2' & value == 1, '$3.50',
  ifelse(variable == 'q3' & value == 0, '$3.50',
  ifelse(variable == 'q4' & value == 1, '$3.50',
  ifelse(variable == 'q5' & value == 1, '$3.50',
  ifelse(variable == 'q6' & value == 0, '$3.50',
  '$4.50')))))))
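# Attach gender. data[,2] holds one value per respondent and is recycled
# across the six question blocks; this lines up because melt() stacks
# q1..q6 with respondents in their original order.
data_questions_6 <- cbind(data_questions_5, data[,2])
colnames(data_questions_6)[9] <- "gender"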
data_questions_6$gender <- factor(data_questions_6$gender,
                                levels = c(0,1),
                                labels = c("Male", "Female"))
head(data_questions_6)
id variable value vegan soy protein calories price gender
1 q1 1 not vegan no soy 37g 400 $3.50 Female
2 q1 1 not vegan no soy 37g 400 $3.50 Male
3 q1 1 not vegan no soy 37g 400 $3.50 Male
4 q1 1 not vegan no soy 37g 400 $3.50 Male
5 q1 1 not vegan no soy 37g 400 $3.50 Male
6 q1 1 not vegan no soy 37g 400 $3.50 Female
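
The nested ifelse() calls above effectively encode a small design table. As a minimal sketch, assuming the same 0/1 coding, that table can be written out once and looked up per row. The names `design`, `add_attributes`, `protein20`, `cal400`, and `price350` are hypothetical and do not appear in the original code; the 0/1 entries are transcribed from the ifelse() conditions.

```r
# Hypothetical design table: for each question, the alternative (0 or 1)
# that carries the listed level, transcribed from the ifelse() calls.
design <- data.frame(
  variable  = paste0("q", 1:6),
  vegan     = c(0, 1, 1, 1, 1, 0),  # alternative that is 'vegan'
  soy       = c(0, 0, 0, 1, 1, 0),  # alternative that contains 'soy'
  protein20 = c(0, 1, 0, 1, 0, 1),  # alternative with '20g' protein
  cal400    = c(1, 0, 1, 1, 0, 0),  # alternative with '400' calories
  price350  = c(1, 1, 0, 1, 1, 0),  # alternative priced '$3.50'
  stringsAsFactors = FALSE
)

# Attach attribute levels to long-form rows (columns: id, variable, value).
add_attributes <- function(long, design) {
  m <- match(as.character(long$variable), design$variable)
  long$vegan    <- ifelse(long$value == design$vegan[m],     "vegan", "not vegan")
  long$soy      <- ifelse(long$value == design$soy[m],       "soy",   "no soy")
  long$protein  <- ifelse(long$value == design$protein20[m], "20g",   "37g")
  long$calories <- ifelse(long$value == design$cal400[m],    "400",   "500")
  long$price    <- ifelse(long$value == design$price350[m],  "$3.50", "$4.50")
  long
}

# Toy input: respondent 1's recorded choices for q1 and q2.
toy <- data.frame(id = c(1, 1), variable = c("q1", "q2"), value = c(1, 1))
toy_attr <- add_attributes(toy, design)
print(toy_attr)
```

Keeping the design in one table makes it easier to compare against the printed questionnaire than thirty separate conditions, though the result should be the same as the mutate() chain.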

Above we can see the questions are now the rows, and we have the proper attribute levels in columns. Below, we now append a column to denote these were the alternatives chosen by the respondent.

#### Add column to indicate these are the chosen options.
questions <- nrow(data_questions_6)
chosen <- rep(1, questions)
data_questions_7 <- cbind(data_questions_6, chosen)
head(data_questions_7)
id variable value vegan soy protein calories price gender chosen
1 q1 1 not vegan no soy 37g 400 $3.50 Female 1
2 q1 1 not vegan no soy 37g 400 $3.50 Male 1
3 q1 1 not vegan no soy 37g 400 $3.50 Male 1
4 q1 1 not vegan no soy 37g 400 $3.50 Male 1
5 q1 1 not vegan no soy 37g 400 $3.50 Male 1
6 q1 1 not vegan no soy 37g 400 $3.50 Female 1

In the data frame above, the “chosen” column marks these rows as the alternatives the respondents chose. However, to conduct the analysis, we also need the alternatives which were not chosen. Every attribute has two levels, and within a choice set the two alternatives always take opposite levels on each attribute, so the unchosen alternative is the exact opposite of the chosen one. We therefore copy the chosen rows and flip every attribute level to create the unchosen rows.

# remove 'value' column.
drops <- c("value")
data_chosen <- data_questions_7[ , 
      !(names(data_questions_7) %in% drops)]
## Make a copy of the chosen options in order to 
## insert the un-chosen alternatives.
not_chosen <- data.frame(id=integer(),
                         variable=character(), 
                         vegan=character(),
                         soy=character(),
                         protein=character(),
                         calories=character(),
                         price=character(),
                         gender=character(),
                         chosen=numeric(),
                         stringsAsFactors=FALSE)
col_names <- names(data_questions_7[3:7])
for (i in 1:questions) {
  not_chosen[i,] <- data_chosen[i,]
}
#### Putting a place holder.
not_chosen[,3] <- ifelse(not_chosen[,3] == "vegan", "v",
                  ifelse(not_chosen[,3] == "not vegan", "nv", NA))
not_chosen[,4] <- ifelse(not_chosen[,4] == "soy", "s",
                  ifelse(not_chosen[,4] == "no soy", "ns", NA))
not_chosen[,5] <- ifelse(not_chosen[,5] == "20g", "20",
                  ifelse(not_chosen[,5] == "37g", "37", NA))
not_chosen[,6] <- ifelse(not_chosen[,6] == "500", "5",
                  ifelse(not_chosen[,6] == "400", "4", NA))
not_chosen[,7] <- ifelse(not_chosen[,7] == "$4.50", "4",
                  ifelse(not_chosen[,7] == "$3.50", "3", NA))
#### Convert to opposite value.
not_chosen[,3] <- ifelse(not_chosen[,3] == "v", "not vegan",
                  ifelse(not_chosen[,3] == "nv", "vegan", NA))
not_chosen[,4] <- ifelse(not_chosen[,4] == "s", "no soy",
                  ifelse(not_chosen[,4] == "ns", "soy", NA))
not_chosen[,5] <- ifelse(not_chosen[,5] == "20", "37g",
                  ifelse(not_chosen[,5] == "37", "20g", NA))
not_chosen[,6] <- ifelse(not_chosen[,6] == "5", "400",
                  ifelse(not_chosen[,6] == "4", "500", NA))
not_chosen[,7] <- ifelse(not_chosen[,7] == "4", "$3.50",
                  ifelse(not_chosen[,7] == "3", "$4.50", NA))
## Gender was copied over as its factor codes (1 = Male, 2 = Female),
## so map the codes back to labels.
not_chosen[,8] <- ifelse(not_chosen[,8] == "1", "Male",
                  ifelse(not_chosen[,8] == "2", "Female", NA))
## Label these as the options which were not chosen.
not_chosen[,9] <- rep(0, questions)
## Combine and write data.
data_combined <- rbind(data_chosen, not_chosen)
head(arrange(data_combined, id), n=12)
drop_2 <- c("variable","id")
combined_data <- data_combined[ , 
    !(names(data_combined) %in% drop_2)]
head(combined_data)
id variable vegan soy protein calories price gender chosen
1 q1 not vegan no soy 37g 400 $3.50 Female 1
1 q2 vegan no soy 20g 500 $3.50 Female 1
1 q3 not vegan soy 20g 500 $3.50 Female 1
1 q4 vegan soy 20g 400 $3.50 Female 1
1 q5 vegan soy 37g 500 $3.50 Female 1
1 q6 vegan soy 37g 400 $3.50 Female 1
1 1 vegan soy 20g 500 $4.50 Female 0
1 2 not vegan soy 37g 400 $4.50 Female 0
1 3 vegan no soy 37g 400 $4.50 Female 0
1 4 not vegan no soy 37g 500 $4.50 Female 0
1 5 not vegan no soy 20g 400 $4.50 Female 0
1 6 not vegan no soy 20g 500 $4.50 Female 0
vegan soy protein calories price gender chosen
not vegan no soy 37g 400 $3.50 Female 1
not vegan no soy 37g 400 $3.50 Male 1
not vegan no soy 37g 400 $3.50 Male 1
not vegan no soy 37g 400 $3.50 Male 1
not vegan no soy 37g 400 $3.50 Male 1
not vegan no soy 37g 400 $3.50 Female 1
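
The level flipping above can also be sketched with named lookup vectors instead of placeholder strings. This is an illustrative alternative, not the code used for the analysis; `opposite`, `make_not_chosen`, and the toy row are hypothetical names and data.

```r
# Each attribute has two levels, so the unchosen alternative takes the other one.
opposite <- list(
  vegan    = c("vegan" = "not vegan", "not vegan" = "vegan"),
  soy      = c("soy" = "no soy", "no soy" = "soy"),
  protein  = c("20g" = "37g", "37g" = "20g"),
  calories = c("400" = "500", "500" = "400"),
  price    = c("$3.50" = "$4.50", "$4.50" = "$3.50")
)

make_not_chosen <- function(chosen_rows, opposite) {
  out <- chosen_rows
  for (col in names(opposite)) {
    # Index the named vector by the chosen level to get the other level.
    out[[col]] <- unname(opposite[[col]][as.character(chosen_rows[[col]])])
  }
  out$chosen <- 0  # label the mirrored rows as not chosen
  out
}

# Toy input: one chosen row in the same layout as data_chosen.
toy_chosen <- data.frame(
  id = 1, variable = "q1",
  vegan = "not vegan", soy = "no soy", protein = "37g",
  calories = "400", price = "$3.50", gender = "Female", chosen = 1,
  stringsAsFactors = FALSE
)
toy_both <- rbind(toy_chosen, make_not_chosen(toy_chosen, opposite))
print(toy_both)
```

Because the function copies the input rows and only overwrites the attribute and chosen columns, id, question, and gender carry over unchanged.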

As seen above, we now have the data in the proper format for logistic regression: each alternative is a row, each attribute is a column, and the “chosen” column indicates whether the respondent chose that alternative.

Descriptive Stats

From the tables below we can see that the majority of respondents were female (102 of 142) and that most respondents did not live on campus (104 of the 138 who answered). Both genders had more respondents who had never consumed a meal replacement drink than respondents who had (84 versus 56 overall). For students who do live on campus, this may be attributed to access to buffet-style dining halls, which removes the time lost to cooking and cleaning. The bar graph shows that Ensure was the most consumed named brand of meal replacement (18 respondents, tied with “Other”), but the majority of students had never consumed a meal replacement product.

count(data, gender)
# Male = 0, Female = 1

count(filter(data, livecampus == 0 | livecampus == 1), livecampus)
count(filter(data, mealreplacement==0| mealreplacement==1), 
                mealreplacement)
count(filter(data, which <= 5), which)
# Ensure = 0, 
# Huel =1, 
# Soylent = 2, 
# Slim Fast = 3
# Herbalife = 4,
# Other = 5
library(ggplot2)
ggplot(data, aes(x=factor(which, exclude = "6"))) +
   geom_bar(stat="count", width=0.7, fill="steelblue") +
   theme_minimal() + xlab("Meal Replacement Product") +
   scale_x_discrete(breaks=0:6,
      labels=c("Ensure","Huel","Soylent",
                 "Slim Fast","Herbalife","Other", "None"))
gender n
Male 40
Female 102
Do you live on campus? n
Yes 34
No 104
Have you consumed a Meal Replacement product? n
Yes 56
No 84
If so, which? n
Ensure 18
Huel 7
Soylent 2
Slim-Fast 5
Herbalife 5
Other 18
[Figure: bar chart of the meal replacement products respondents reported consuming]

RESULTS

The results of the logistic regression for all respondents are shown in the model summary below. As we can see, Protein is a highly desired attribute. That is, the higher the protein content of a meal replacement product, the more likely it is to be chosen.

The results also indicate that Vegan and Soy products are less likely to be chosen, with the Soy attribute holding the lowest weight. The Price attribute has the highest relative weight: the higher the price, the lower the likelihood of the product being chosen. All attributes are statistically significant at the 5% level. This means the ideal combination of attribute levels is a Non-Vegan, Non-Soy, 400-calorie, 37g-protein drink for $3.50. In other words, the drink the sample preferred was the cheaper, lower-calorie, higher-protein one, which is a fairly obvious finding. The importance of the result is that price is the most influential factor when considering the drink, followed in decreasing order of weight by protein, the vegan option, calories, and the soy option. It is interesting that the calorie count carried less weight than the vegan option, since one might expect more people to be conscious of their caloric intake than of whether the product is vegan friendly. One possible explanation is that the difference in calories was only 100, so calories may have been given lower priority among the options available.

#str(combined_data)
factor_data <- lapply(combined_data[,1:5], as.factor)
#str(factor_data)
#contrasts(factor_data$vegan)
#contrasts(factor_data$soy)
#contrasts(factor_data$protein)
#contrasts(factor_data$calories)
#contrasts(factor_data$price)
model_data <- cbind(factor_data, combined_data[,6:7])
#str(model_data)
#### Full Model
model <- glm(chosen ~ vegan+soy+calories+protein+price,
             family = binomial(link='logit'), data = model_data)
summary(model)
Call:
glm(formula = chosen ~ vegan + soy + calories + protein + price, 
    family = binomial(link = "logit"), data = model_data)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.005  -1.130   0.000   1.130   2.005  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.9205     0.1659   5.550 2.86e-08 ***
veganvegan   -0.6020     0.1178  -5.111 3.20e-07 ***
soysoy       -0.2914     0.1179  -2.472   0.0134 *  
calories500  -0.5158     0.1096  -4.707 2.51e-06 ***
protein37g    0.9461     0.1095   8.640  < 2e-16 ***
price$4.50   -1.3779     0.1309 -10.523  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2362.2  on 1703  degrees of freedom
Residual deviance: 2114.0  on 1698  degrees of freedom
AIC: 2126

Number of Fisher Scoring iterations: 4

The estimates can be interpreted with a bit of calculation. Each one represents the change in the log odds of an alternative being chosen when the attribute takes the listed level rather than the reference level, holding the other attributes constant. Taking the exponent of an estimate gives the corresponding multiplicative change in the odds. For example, for more protein (37g instead of 20g) the odds ratio is exp(0.9461) = 2.576. This means the odds of an alternative being chosen increase by about 158% (in other words, the odds are roughly 2.6 times as high) when more protein is available, given that the other attributes are the same between the choices. For the rest of the attributes, the odds increase by 82.6% when the product is non-vegan, 33.8% when it is soy-free, about 67.5% when it has fewer calories, and almost 300% when it is cheaper. Although these are valid representations of the output, they should be interpreted with care (for example, they do not necessarily mean that a given person will choose a non-vegan drink over the vegan option just because it is non-vegan). Reading an association of this kind as a guarantee about any individual’s behavior is a common misinterpretation of regression output.
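
The arithmetic above can be reproduced directly from the reported estimates. This is a small sketch using the rounded coefficients from the full-model summary, so the last digit may differ slightly from values computed on the fitted model object; the vector names `estimates`, `odds_ratio`, `favoured`, and `pct_change` are introduced here for illustration.

```r
# Coefficients copied (rounded) from the full-model summary.
estimates <- c(veganvegan   = -0.6020,
               soysoy       = -0.2914,
               calories500  = -0.5158,
               protein37g   =  0.9461,
               "price$4.50" = -1.3779)

odds_ratio <- exp(estimates)        # odds multiplier for the listed level
favoured   <- exp(abs(estimates))   # odds multiplier for the preferred level
pct_change <- 100 * (favoured - 1)  # percent increase in odds for that level

print(round(cbind(odds_ratio, favoured, pct_change), 3))
```

A negative estimate means the reference level is the preferred one, which is why the text reports the odds for non-vegan, soy-free, lower-calorie, and cheaper products using the exponent of the absolute value.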

Now, we look at the attribute weights when partitioning the survey responses by gender, found below. The most important attribute for females is the price, followed by the protein and calorie content. Females generally want a low-priced, low-calorie, high-protein meal replacement product. The vegan and soy attributes are not statistically significant for females.

# Females
model_female <- glm(chosen ~ vegan+soy+calories+protein+price,
             family = binomial(link='logit'), 
                     data = filter(model_data, gender=="Female"))
summary(model_female)
Call:
glm(formula = chosen ~ vegan + soy + calories + protein + price, 
    family = binomial(link = "logit"), data = filter(model_data, 
        gender == "Female"))

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.889  -1.142   0.000   1.142   1.889  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.8392     0.1895   4.428 9.49e-06 ***
veganvegan   -0.1237     0.1360  -0.909    0.363    
soysoy       -0.1630     0.1360  -1.198    0.231    
calories500  -0.6319     0.1312  -4.815 1.47e-06 ***
protein37g    0.7611     0.1313   5.796 6.80e-09 ***
price$4.50   -1.5209     0.1475 -10.310  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1696.8  on 1223  degrees of freedom
Residual deviance: 1488.7  on 1218  degrees of freedom
AIC: 1500.7

Number of Fisher Scoring iterations: 4

Males, on the other hand, are much more sensitive to the diet attribute of the product. If a product is vegan, they are much less likely to choose it. This is an interesting result since a vegan product is simply one which is entirely plant-based, i.e., contains no animal products. The second most important attribute for males is the protein content. This may derive from a desire among males for relatively higher levels of muscular development. Generally, men prefer a meal replacement product that is composed of animal products (most likely cow milk), is high in protein, is low in cost, and has no soy. The calorie attribute is not statistically significant for males.

# Males
model_male <- glm(chosen ~ vegan+soy+calories+protein+price,
                    family = binomial(link='logit'), 
                      data = filter(model_data, gender=="Male"))
summary(model_male)
Call:
glm(formula = chosen ~ vegan + soy + calories + protein + price, 
    family = binomial(link = "logit"), data = filter(model_data, 
        gender == "Male"))

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.494  -1.061   0.000   1.061   2.494  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.3491     0.4126   3.270 0.001077 ** 
veganvegan   -2.0520     0.2775  -7.394 1.43e-13 ***
soysoy       -0.7751     0.2846  -2.723 0.006463 ** 
calories500  -0.2937     0.2348  -1.251 0.211043    
protein37g    1.7146     0.2472   6.935 4.05e-12 ***
price$4.50   -1.2921     0.3383  -3.819 0.000134 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 665.42  on 479  degrees of freedom
Residual deviance: 529.22  on 474  degrees of freedom
AIC: 541.22

Number of Fisher Scoring iterations: 5

CONCLUSION

Respondents to the survey are not generally consumers of meal replacement products. However, using discrete choice analysis, we were able to uncover certain attribute preferences which marketers of meal replacement products may find useful when targeting college students. We found the most important attribute for meal replacement products is, unsurprisingly, the price. Additionally, survey respondents, especially male respondents, are wary of vegan products. All respondents are sensitive to the amount of protein, while the calorie level is statistically significant for female respondents but not for male respondents. Further research should begin by conducting informal interviews with individuals in the target population in order to discover the important attributes of a meal replacement product, as the models in this study had low McFadden R-squared values.
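
The McFadden R-squared values mentioned above are not printed in the model summaries, but they can be recovered from the reported deviances. The sketch below relies on the fact that, for ungrouped 0/1 responses, the deviance equals minus twice the log-likelihood, so 1 - (residual deviance / null deviance) matches McFadden's 1 - (model log-likelihood / null log-likelihood). The helper name `mcfadden_r2` and the `fits` table are hypothetical; the deviances are copied from the three summaries.

```r
# McFadden pseudo R-squared from residual and null deviance
# (valid here because the response is ungrouped 0/1 data).
mcfadden_r2 <- function(residual_deviance, null_deviance) {
  1 - residual_deviance / null_deviance
}

# Deviances copied from the three model summaries.
fits <- data.frame(
  model     = c("all", "female", "male"),
  null_dev  = c(2362.2, 1696.8, 665.42),
  resid_dev = c(2114.0, 1488.7, 529.22)
)
fits$mcfadden <- mcfadden_r2(fits$resid_dev, fits$null_dev)

print(fits)
```

With a fitted model object, the same quantity should be available as `1 - model$deviance / model$null.deviance`.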

APPENDIX

A. Sources

B. Code

A. Sources:

  1. https://www.bls.gov/TUS/CHARTS/HOUSEHOLD.HTM
  2. https://www.salon.com/2017/05/28/what-soylent-tells-us-about-silicon-valley/

B. Code:

SAS code

%mktex(2 ** 5, n=32);
%mktlab(data=design, int=f1-f2)
proc print;
run;
%choiceff(data=final,
          model=class(x1-x5),
          nsets=6,
          maxiter=50,
          flags=2,
          beta=zero);
proc print; by set; id set; run;
