This project will analyse how effective the current channels are at acquiring new customers. To this end, the following data were collected:
The variables and their descriptions are shown below:
The first step is to categorise all of the variables within a market response framework.
The data were cleaned prior to being loaded into R, so we can proceed directly with a linear regression analysis.
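For reference, the specification estimated below can be written out as follows (the subscript n denotes a min-max-normalised variable, and the error term is the final symbol):

$$
\text{acquisition}_n = \beta_0 + \beta_1\,\text{buzz.event} + \beta_2\,\text{media.event} + \beta_3\,\text{holidays} + \beta_4\,\text{promotions}_n + \beta_5\,\text{lag\_traditional.advertising}_n + \beta_6\,\text{lag\_Twitter\_valence}_n + \beta_7\,\text{lag\_Twitter\_volume}_n + \beta_8\,\text{lag\_FB\_impressions}_n + \varepsilon
$$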
# Min-max normalisation: rescales a numeric vector to the range [0, 1]
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
library(dplyr)  # needed for %>% and mutate()

mrm1 <- mrm %>% mutate(
  acquisition_n = normalize(acquisition),
  promotions_n = normalize(promotions),
  lag_traditional.advertising_n = normalize(lag_traditional.advertising),
  lag_Twitter_valence_n = normalize(lag_Twitter_valence),
  lag_Twitter_volume_n = normalize(lag_Twitter_volume),
  lag_FB_impressions_n = normalize(lag_FB_impressions)
)
model <- lm(acquisition_n ~ buzz.event + media.event + holidays + promotions_n +
              lag_traditional.advertising_n + lag_Twitter_valence_n +
              lag_Twitter_volume_n + lag_FB_impressions_n,
            data = mrm1)
summary(model)
##
## Call:
## lm(formula = acquisition_n ~ buzz.event + media.event + holidays +
## promotions_n + lag_traditional.advertising_n + lag_Twitter_valence_n +
## lag_Twitter_volume_n + lag_FB_impressions_n, data = mrm1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32888 -0.10356 -0.01784 0.10310 0.51593
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.068409 0.054156 1.263 0.209219
## buzz.event 0.005015 0.101408 0.049 0.960645
## media.event 0.144434 0.041402 3.489 0.000702 ***
## holidays 0.060637 0.039139 1.549 0.124212
## promotions_n 0.275896 0.072476 3.807 0.000233 ***
## lag_traditional.advertising_n 0.296964 0.065331 4.546 1.43e-05 ***
## lag_Twitter_valence_n 0.303281 0.096155 3.154 0.002081 **
## lag_Twitter_volume_n -0.201797 0.070405 -2.866 0.004986 **
## lag_FB_impressions_n 0.084160 0.122632 0.686 0.493995
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1702 on 109 degrees of freedom
## Multiple R-squared: 0.4277, Adjusted R-squared: 0.3857
## F-statistic: 10.18 on 8 and 109 DF, p-value: 1.565e-10
The performance of the linear regression model will be evaluated in three ways:
Diagnostic plots:
par(mfrow = c(2, 2))
plot(model)
Results:
H0: The means of the explanatory variables are the same in the training and test sets
HA: The means of the explanatory variables differ between the training and test sets
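These hypotheses are assessed with the two-sample test statistic implemented in the ttest() function below, using a conservative choice of degrees of freedom:

$$
t = \frac{\bar{x}_{\text{test}} - \bar{x}_{\text{train}}}{\sqrt{\dfrac{s_{\text{test}}^2}{n_{\text{test}}} + \dfrac{s_{\text{train}}^2}{n_{\text{train}}}}}, \qquad df = \min(n_{\text{test}} - 1,\, n_{\text{train}} - 1)
$$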
# Train/test split (75% training, 25% test) for out-of-sample validation
split <- round(nrow(mrm1) * 0.75)
split
## [1] 88
nrow(mrm1) - split
## [1] 30
set.seed(1)
# Two-sample t-test comparing the mean of a variable in the training set
# against its mean in the test set; returns the two-sided p-value
ttest <- function(training, testing) {
  x1_mean <- mean(testing)
  x2_mean <- mean(training)
  s1 <- sd(testing)
  s2 <- sd(training)
  n1 <- length(testing)
  n2 <- length(training)
  dfs <- min(n1 - 1, n2 - 1)   # conservative degrees of freedom
  tdata <- (x1_mean - x2_mean) / sqrt((s1^2 / n1) + (s2^2 / n2))
  pvalue <- 2 * pt(abs(tdata), df = dfs, lower.tail = FALSE)
  return(pvalue)
}
library(caret)  # needed for createDataPartition()

trainindex <- createDataPartition(mrm1$acquisition_n, p = 0.75,
                                  list = FALSE,
                                  times = 1)
trainmrm <- mrm1[trainindex, ]
testmrm <- mrm1[-trainindex, ]
The full calculations for each explanatory variable can be found in the associated .rmd file; a sketch of the comparison is shown below.
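A minimal sketch of that comparison, assuming the objects defined above (the exact code lives in the .rmd file):

# Apply the two-sample t-test to each explanatory variable, comparing its
# mean in the training set against its mean in the test set
vars <- c("buzz.event", "media.event", "holidays", "promotions_n",
          "lag_traditional.advertising_n", "lag_Twitter_valence_n",
          "lag_Twitter_volume_n", "lag_FB_impressions_n")
pvalues <- sapply(vars, function(v) ttest(trainmrm[[v]], testmrm[[v]]))
round(pvalues, 3)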
Results: Since the p-values of the t-statistics for all explanatory variables are much larger than the 5% significance level, there is not enough evidence to reject the null hypothesis; the means of the explanatory variables in the training and test sets can therefore be regarded as similar.
Most effective communication channel (traditional advertising, Facebook, Twitter) in stimulating customer acquisition:
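Because every continuous variable has been min-max normalised to the same 0-1 scale, the channel coefficients of the fitted model can be compared directly. A minimal sketch, using the model object fitted above:

# Compare the estimated effects of the three communication channels
channels <- c("lag_traditional.advertising_n", "lag_Twitter_valence_n",
              "lag_Twitter_volume_n", "lag_FB_impressions_n")
sort(coef(model)[channels], decreasing = TRUE)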
Results:
Recommendations for the telecommunications provider