Company XYZ is a worldwide e-commerce site with localized versions of the site.
A data scientist at XYZ noticed that Spain-based users have a much higher conversion rate than users in any other Spanish-speaking country. She therefore went to talk to the international team in charge of Spain and LatAm to see if they had any ideas about why that was happening.
The Spain and LatAm country manager suggested that one reason could be translation: all Spanish-speaking countries saw the same translation of the site, which was written by a Spaniard. They agreed to run a test in which each country would get its own translation written by a local. That is, Argentinian users would see a translation written by an Argentinian, Mexican users one written by a Mexican, and so on. Nothing would change for users from Spain.
After running the test, however, they were surprised to find that the result was negative: the non-localized translation appeared to be doing better!
You are asked to: 1. Confirm that the test is actually negative, i.e., that the old version of the site with a single translation across Spain and LatAm performs better. 2. Explain why that might be happening. Are the localized translations really worse?
#Load libraries and read data
library(dplyr)
library(ggplot2)
library(rpart)
user = read.csv("user_table.csv")
test = read.csv("test_table.csv")
#Check if user is unique by user id
length(user$user_id)==length(unique(user$user_id))
## [1] TRUE
#Check if test is unique by user id
length(test$user_id)==length(unique(test$user_id))
## [1] TRUE
#The two tables do not match: the test table has 454 users that are missing from the user table.
identical(test$user_id,user$user_id)
## [1] FALSE
length(user$user_id)-length(test$user_id)
## [1] -454
#Merge user and test tables into one. all.x = TRUE keeps every row of the
#user table; the 454 users that appear only in the test table are dropped.
df=merge(user, test, by = "user_id", all.x = TRUE)
#Format the date
df$date=as.Date(df$date)
summary(df)
## user_id sex age country
## Min. : 1 F:188382 Min. :18.00 Mexico :128484
## 1st Qu.: 249819 M:264485 1st Qu.:22.00 Colombia : 54060
## Median : 500019 Median :26.00 Spain : 51782
## Mean : 499945 Mean :27.13 Argentina: 46733
## 3rd Qu.: 749543 3rd Qu.:31.00 Peru : 33666
## Max. :1000000 Max. :70.00 Venezuela: 32054
## (Other) :106088
## date source device browser_language
## Min. :2015-11-30 Ads :181693 Mobile:201551 EN : 63079
## 1st Qu.:2015-12-01 Direct: 90738 Web :251316 ES :377160
## Median :2015-12-03 SEO :180436 Other: 12628
## Mean :2015-12-02
## 3rd Qu.:2015-12-04
## Max. :2015-12-04
##
## ads_channel browser conversion test
## Bing : 13670 Android_App:154977 Min. :0.00000 Min. :0.0000
## Facebook: 68358 Chrome :101822 1st Qu.:0.00000 1st Qu.:0.0000
## Google : 68113 FireFox : 40721 Median :0.00000 Median :0.0000
## Other : 4143 IE : 61656 Mean :0.04956 Mean :0.4765
## Yahoo : 27409 Iphone_App : 46574 3rd Qu.:0.00000 3rd Qu.:1.0000
## NA's :271174 Opera : 6084 Max. :1.00000 Max. :1.0000
## Safari : 41033
#Confirm that Spain has a higher conversion rate, using control-group users only
ConversionByCountry=df%>%
group_by(country)%>%
summarise(conversion=mean(conversion[test==0])
)%>%
arrange(desc(conversion))
head(ConversionByCountry)
## Source: local data frame [6 x 2]
##
## country conversion
## (fctr) (dbl)
## 1 Spain 0.07971882
## 2 El Salvador 0.05355404
## 3 Nicaragua 0.05264697
## 4 Costa Rica 0.05225564
## 5 Colombia 0.05208949
## 6 Honduras 0.05090576
#Exclude Spain, since nothing changed for Spanish users in the test
control_test=subset(df, country!="Spain")
#Welch two-sample t-test comparing the conversion rates of the test and control groups
t.test(control_test$conversion[control_test$test==1],control_test$conversion[control_test$test==0])
##
## Welch Two Sample t-test
##
## data: control_test$conversion[control_test$test == 1] and control_test$conversion[control_test$test == 0]
## t = -7.3539, df = 385260, p-value = 1.929e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.006181421 -0.003579837
## sample estimates:
## mean of x mean of y
## 0.04341116 0.04829179
The t-test shows a significant difference between the two groups: the test-group conversion rate is 0.0434, while the control-group rate is 0.0483, about 10% higher in relative terms. The localized translations appear to do worse than the single Spanish translation.
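Since conversion is a binary outcome, the same comparison can also be run as a two-sample proportion test, which should agree closely with the Welch t-test above. A quick sketch against the same `control_test` data frame:

```r
# Conversions (x) and sample sizes (n) by group; names are "0" (control), "1" (test)
x = tapply(control_test$conversion, control_test$test, sum)
n = tapply(control_test$conversion, control_test$test, length)
# Two-sample proportion test, test group first to match the t-test above
prop.test(x = x[c("1", "0")], n = n[c("1", "0")])
```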
There are a few possible explanations for a surprising A/B test result.
First, if we do not have enough data, the results will fluctuate from day to day. We therefore plot the ratio of test to control conversion by day to check the variance:
data_test_by_day=control_test%>%
group_by(date)%>%
summarize(test_vs_control=mean(conversion[test==1])/mean(conversion[test==0]))
ggplot(data=data_test_by_day,aes(x=date, y=test_vs_control))+
geom_line()+ylab("test/control")+geom_hline(yintercept=1,linetype=2,color="blue")
From the plot, test is consistently worse than control on every day. That suggests we do have enough data, and that the problem is more likely a bias in the experiment setup.
Now it’s time to find out whether the test is biased. In an ideal world, the split between test and control should be the same within every segment. One way to check is to build a decision tree where the features are the user dimensions and the outcome variable is whether the user is in test or control. If the tree splits, it means that for certain values of a variable a user is more likely to end up in test or control, which should be impossible under proper randomization. So if the randomization worked, the tree should not split at all (or at least should not separate the two classes well).
#Predict test assignment from user dimensions; any split indicates biased randomization
#(note: the rpart.control argument is maxdepth, not max_depth, which would be silently ignored)
tree=rpart(test~., control_test[,-8],
           control=rpart.control(minbucket=nrow(control_test)/100, maxdepth=2))
tree
## n= 401085
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 401085 99692.820 0.5379757
## 2) country=Bolivia,Chile,Colombia,Costa Rica,Ecuador,El Salvador,Guatemala,Honduras,Mexico,Nicaragua,Panama,Paraguay,Peru,Venezuela 350218 87553.970 0.4987693 *
## 3) country=Argentina,Uruguay 50867 7894.097 0.8079108 *
The randomization looks fine for the countries on the first branch of the split: their mean of the test indicator is about 0.499, i.e., a roughly 50/50 split between test and control. But in Argentina and Uruguay about 80% of users are in the test group and only 20% in control.
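The imbalance the tree found can be confirmed directly by tabulating the share of users assigned to test in each country; if the randomization had worked, `prop_test` should be close to 0.5 everywhere, while Argentina and Uruguay should stand out near 0.8. A sketch using the same data frame and dplyr pipeline style as above:

```r
# Share of users assigned to the test group, by country
assignment = control_test %>%
  group_by(country) %>%
  summarise(prop_test = mean(test), n = n()) %>%
  arrange(desc(prop_test))
head(assignment)
```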
Now check the test-vs-control conversion rates separately for each country:
data_test_by_country=control_test%>%
group_by(country)%>%
summarize(p_value=t.test(conversion[test==1],conversion[test==0])$p.value,
conversion_test=t.test(conversion[test==1],conversion[test==0])$estimate[1],
conversion_control=t.test(conversion[test==1],conversion[test==0])$estimate[2]
)%>%
arrange(p_value)
data_test_by_country
## Source: local data frame [16 x 4]
##
## country p_value conversion_test conversion_control
## (fctr) (dbl) (dbl) (dbl)
## 1 Mexico 0.1655437 0.05118631 0.04949462
## 2 El Salvador 0.2481267 0.04794689 0.05355404
## 3 Chile 0.3028476 0.05129502 0.04810718
## 4 Argentina 0.3351465 0.01372502 0.01507054
## 5 Colombia 0.4237191 0.05057096 0.05208949
## 6 Honduras 0.4714629 0.04753981 0.05090576
## 7 Guatemala 0.5721072 0.04864721 0.05064288
## 8 Venezuela 0.5737015 0.04897831 0.05034367
## 9 Costa Rica 0.6878764 0.05473764 0.05225564
## 10 Panama 0.7053268 0.04937028 0.04679552
## 11 Bolivia 0.7188852 0.04790097 0.04936937
## 12 Peru 0.7719530 0.05060427 0.04991404
## 13 Nicaragua 0.7804004 0.05417676 0.05264697
## 14 Uruguay 0.8797640 0.01290670 0.01204819
## 15 Paraguay 0.8836965 0.04922910 0.04849315
## 16 Ecuador 0.9615117 0.04898842 0.04915381
After controlling for country, the test is clearly non-significant in every market. That is not a great success given that the goal was to improve conversion, but at least the localized translations did not make things worse. The overall negative result was an artifact of the biased randomization: Argentina and Uruguay have much lower baseline conversion rates (roughly 1.2–1.5% vs. about 5% elsewhere) and were heavily over-represented in the test group, which dragged the overall test average down (a case of Simpson's paradox).
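The mechanism is easy to reproduce with a toy example (hypothetical numbers, chosen only to mimic the pattern above): within each market the test converts exactly as well as the control, yet the pooled test average looks worse because the low-converting market is over-represented in test.

```r
# Toy illustration of Simpson's paradox (all numbers hypothetical).
# Two markets with the same conversion rate in test and control:
conv   = c(high = 0.05, low = 0.015)   # per-market conversion rate
w_ctrl = c(high = 0.90, low = 0.10)    # control group: mostly the high-converting market
w_test = c(high = 0.60, low = 0.40)    # test group: low market over-represented
pooled_ctrl = sum(conv * w_ctrl)       # 0.0465
pooled_test = sum(conv * w_test)       # 0.0360
pooled_test < pooled_ctrl              # TRUE: test looks worse overall despite no real effect
```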