If you are a Google Analytics Premium customer, you have access to all of your raw data through BigQuery Export. The visitor, session, and hit level BigQuery Export data is rich enough to perform more advanced analysis, such as statistical modeling and forecasting. This guide demonstrates how BigQuery Export data can be used to build a statistical model to forecast customer behavior and calculate Customer Lifetime Value (CLV).
Fill out this form to complete this analysis using your own Google Analytics Premium data.
Customer Lifetime Value is the amount of money that an individual customer will spend with a given business in the future. It is often used to decide how much to invest in acquiring and retaining individual customers. Additionally, the probability that a customer is still "alive" is used to identify customers who are at risk of churning. Taken together, a strong model of customer behavior serves as the foundation for many marketing activities and measurements.
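The models built later in this guide make this definition concrete by decomposing CLV into two per-customer forecasts: expected future transactions and expected spend per transaction. A minimal sketch with hypothetical numbers (not the output of any real model):

```python
# CLV decomposes into expected future transactions times expected spend
# per transaction. The values below are hypothetical illustrations.
def customer_lifetime_value(expected_transactions, expected_spend_per_txn):
    """Projected future revenue for one customer."""
    return expected_transactions * expected_spend_per_txn

# A customer forecast to make 2.5 more purchases at roughly $40 each:
print(customer_lifetime_value(2.5, 40.0))  # 100.0
```

The remainder of this guide estimates both factors from BigQuery Export data rather than assuming them.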
To follow this guide and complete the Customer Lifetime Value analysis, you need access to Google Analytics Premium BigQuery Export data and a notebook environment with R, rpy2, and the BTYD package installed.
If you are new to IPython notebooks, complete "An Introduction to Notebooks" before proceeding.
This guide uses BigQuery Export to calculate Customer Lifetime Value and related metrics by extracting transaction data from BigQuery, building and validating a statistical model of customer behavior in R, and importing the model outputs back into BigQuery for further analysis.
To extract the data from BigQuery, import the BigQuery functionality into this notebook with the following statements:
import gcp
import gcp.bigquery as bq
If you are using Ecommerce tracking or Goals, all of the data needed to calculate CLV is available via BigQuery Export.
The following query extracts the date, number of transactions, and transaction revenue for all transactions made by first time visitors in a given date range (i.e. a cohort).
It has been customized as follows, but you may wish to modify it to suit your business. For example, a different conversion event can be substituted, or a different visitor identifier, such as a custom dimension, can be used in place of fullVisitorId.
class Column(object):
def __init__(self, name):
self._name = name
def _repr_sql_(self):
return self._name
ga_view_id = [your View id here]
cloud_project_number = [your Cloud Project number here]
visitor_id = Column('fullVisitorId')
conversion = Column('totals.transactions')
conversion_revenue = Column('totals.transactionRevenue')
start_date = '2014-01-01'
end_date = '2014-06-01'
cohort_end_date = '2014-01-15'
end_of_cal_period = '2014-03-15'
sampling_rate = Column('1/50')
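The Column wrapper defined above simply returns the wrapped field name from `_repr_sql_`, which is what lets the notebook's `$variable` interpolation render it into the query text. A quick check of that behavior:

```python
# Same wrapper as defined above: a Column renders as its bare field name.
class Column(object):
    def __init__(self, name):
        self._name = name
    def _repr_sql_(self):
        return self._name

# $conversion in the query body becomes totals.transactions.
conversion = Column('totals.transactions')
print(conversion._repr_sql_())  # totals.transactions
```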
Now that all of the inputs needed have been specified, run the query below to view the raw data.
%%bigquery sql --name event_log_query
SELECT
*
FROM
(
SELECT
date,
$visitor_id,
$conversion_revenue,
$conversion,
FROM
TABLE_DATE_RANGE([$cloud_project_number:$ga_view_id.ga_sessions_],
TIMESTAMP($start_date),
TIMESTAMP($end_date))
WHERE
$conversion IS NOT NULL
) a
JOIN
(
SELECT
$visitor_id
FROM
TABLE_DATE_RANGE([$cloud_project_number:$ga_view_id.ga_sessions_],
TIMESTAMP($start_date),
TIMESTAMP($cohort_end_date))
WHERE
totals.newVisits IS NOT NULL
AND RAND() < $sampling_rate
GROUP BY
1
) b
ON
a.$visitor_id=b.$visitor_id
ORDER BY
date
View the query with the print command to ensure the fields have been set properly. The query as printed below can also be run in the BigQuery web tool.
print event_log_query.sql # View the query
SELECT
  *
FROM (
  SELECT
    date,
    fullVisitorId,
    totals.transactionRevenue,
    totals.transactions,
  FROM
    TABLE_DATE_RANGE([XXXXX:YYYYY.ga_sessions_],
      TIMESTAMP("2014-01-01"),
      TIMESTAMP("2014-06-01"))
  WHERE
    totals.transactions IS NOT NULL
) a
JOIN (
  SELECT
    fullVisitorId
  FROM
    TABLE_DATE_RANGE([XXXXX:YYYYY.ga_sessions_],
      TIMESTAMP("2014-01-01"),
      TIMESTAMP("2014-01-15"))
  WHERE
    totals.newVisits IS NOT NULL
    AND RAND() < 1/50
  GROUP BY
    1
) b
ON
  a.fullVisitorId=b.fullVisitorId
ORDER BY
  date
Run the query and view the results.
event_log_data = event_log_query.results() # Run the query
event_log_data.to_dataframe() # Save the query to a dataframe and view the results
a_date | a_fullVisitorId | a_totals_transactionRevenue | a_totals_transactions | b_fullVisitorId | |
---|---|---|---|---|---|
0 | 20140101 | 5017790038239644653 | 1000000 | 1 | 5017790038239644653 |
1 | 20140101 | 8030597587931533103 | 1000000 | 6 | 8030597587931533103 |
2 | 20140101 | 8007084424397327094 | 4000000 | 3 | 8007084424397327094 |
3 | 20140101 | 7889612318617925253 | 15000000 | 1 | 7889612318617925253 |
4 | 20140101 | 7738341083666717494 | 25000000 | 1 | 7738341083666717494 |
5 | 20140101 | 7525652009656248795 | 30000000 | 2 | 7525652009656248795 |
6 | 20140101 | 8610629730095560978 | 150000000 | 1 | 8610629730095560978 |
7 | 20140101 | 8566403803300969597 | 275000000 | 1 | 8566403803300969597 |
8 | 20140101 | 2737390106049798924 | 299000000 | 1 | 2737390106049798924 |
9 | 20140101 | 2685613046935863042 | 100000000 | 2 | 2685613046935863042 |
10 | 20140101 | 2650073639940285657 | 150000000 | 4 | 2650073639940285657 |
11 | 20140101 | 3456865916325944274 | 99000000 | 7 | 3456865916325944274 |
12 | 20140101 | 3456865916325944274 | 10000000 | 4 | 3456865916325944274 |
13 | 20140101 | 3882569992235139113 | 525000000 | 2 | 3882569992235139113 |
14 | 20140101 | 3914796626505639481 | 1000000 | 2 | 3914796626505639481 |
15 | 20140101 | 4995458969964415911 | 1000000 | 50 | 4995458969964415911 |
16 | 20140101 | 1259414691818766528 | 50000000 | 2 | 1259414691818766528 |
17 | 20140101 | 5078320799045540451 | 240000000 | 1 | 5078320799045540451 |
18 | 20140101 | 889890209775498982 | 10000000 | 12 | 889890209775498982 |
19 | 20140101 | 947174447755019932 | 51000000 | 2 | 947174447755019932 |
20 | 20140101 | 1057455412481369266 | 199000000 | 1 | 1057455412481369266 |
21 | 20140101 | 1081016026814234084 | 10000000 | 1 | 1081016026814234084 |
22 | 20140101 | 1150796867278450940 | 49000000 | 1 | 1150796867278450940 |
23 | 20140101 | 4417026909276501138 | 20000000 | 5 | 4417026909276501138 |
24 | 20140101 | 4718954971222246310 | 129000000 | 2 | 4718954971222246310 |
25 | 20140101 | 4718954971222246310 | 79000000 | 1 | 4718954971222246310 |
26 | 20140101 | 4757220853489744229 | 10000000 | 7 | 4757220853489744229 |
27 | 20140101 | 6487324317020613085 | 199000000 | 1 | 6487324317020613085 |
28 | 20140101 | 6508187150725671277 | 5000000 | 4 | 6508187150725671277 |
29 | 20140101 | 6553405867780627589 | 1000000 | 1 | 6553405867780627589 |
... | ... | ... | ... | ... | ... |
5925 | 20140529 | 2123940850581287851 | 17000000 | 1 | 2123940850581287851 |
5926 | 20140529 | 2123940850581287851 | 39000000 | 2 | 2123940850581287851 |
5927 | 20140529 | 2123940850581287851 | 30000000 | 2 | 2123940850581287851 |
5928 | 20140529 | 2123940850581287851 | 30000000 | 2 | 2123940850581287851 |
5929 | 20140529 | 8812763329271802279 | 2000000 | 2 | 8812763329271802279 |
5930 | 20140529 | 1687926679891894827 | 99000000 | 1 | 1687926679891894827 |
5931 | 20140529 | 9211499580406574233 | 137000000 | 9 | 9211499580406574233 |
5932 | 20140529 | 7650455684740741189 | 25000000 | 1 | 7650455684740741189 |
5933 | 20140529 | 8179477440921671967 | 49000000 | 1 | 8179477440921671967 |
5934 | 20140529 | 8993124317588647958 | 2000000 | 5 | 8993124317588647958 |
5935 | 20140529 | 8993124317588647958 | 20000000 | 1 | 8993124317588647958 |
5936 | 20140529 | 8993124317588647958 | 20000000 | 1 | 8993124317588647958 |
5937 | 20140529 | 8993124317588647958 | 20000000 | 1 | 8993124317588647958 |
5938 | 20140529 | 5931510834014662770 | 1000000 | 1 | 5931510834014662770 |
5939 | 20140529 | 1679842512074369707 | 15000000 | 9 | 1679842512074369707 |
5940 | 20140530 | 5766885201415567301 | 2000000 | 2 | 5766885201415567301 |
5941 | 20140530 | 8993124317588647958 | 2000000 | 1 | 8993124317588647958 |
5942 | 20140530 | 9211499580406574233 | 10000000 | 9 | 9211499580406574233 |
5943 | 20140530 | 9211499580406574233 | 58000000 | 1 | 9211499580406574233 |
5944 | 20140531 | 2123940850581287851 | 59000000 | 9 | 2123940850581287851 |
5945 | 20140531 | 557356876586168799 | 5000000 | 1 | 557356876586168799 |
5946 | 20140531 | 9211499580406574233 | 10000000 | 3 | 9211499580406574233 |
5947 | 20140531 | 8812763329271802279 | 15000000 | 2 | 8812763329271802279 |
5948 | 20140531 | 4186788749218768802 | 5000000 | 1 | 4186788749218768802 |
5949 | 20140531 | 2430553514951030514 | 2000000 | 4 | 2430553514951030514 |
5950 | 20140601 | 1963045431437364100 | 15000000 | 5 | 1963045431437364100 |
5951 | 20140601 | 1963045431437364100 | 1000000 | 1 | 1963045431437364100 |
5952 | 20140601 | 9211499580406574233 | 5000000 | 4 | 9211499580406574233 |
5953 | 20140601 | 8566403803300969597 | 5000000 | 1 | 8566403803300969597 |
5954 | 20140601 | 8812763329271802279 | 137000000 | 1 | 8812763329271802279 |
5955 rows × 5 columns
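On the notebook side, the shape of this query can be mimicked with pandas on a toy event log (the visitor IDs below are hypothetical): transactions (subquery a) are kept only for visitors whose first visit falls within the cohort window (subquery b).

```python
import pandas as pd

# Toy stand-ins for the two subqueries: all transactions (a) and the
# sampled cohort of first-time visitors (b). IDs are hypothetical.
a = pd.DataFrame({
    'fullVisitorId': ['v1', 'v2', 'v3', 'v1'],
    'date': ['20140101', '20140102', '20140201', '20140301'],
    'transactionRevenue': [5.0, 10.0, 2.0, 7.5],
})
b = pd.DataFrame({'fullVisitorId': ['v1', 'v3']})  # new visitors in the cohort window

# The inner join keeps only transactions made by cohort members,
# matching JOIN ... ON a.fullVisitorId = b.fullVisitorId above.
event_log = a.merge(b, on='fullVisitorId').sort_values('date')
print(len(event_log))  # 3
```

Only v1's two transactions and v3's one survive the join; v2 is not in the cohort.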
Now that the data needed for the model is loaded into the notebook, R is used to build a model of customer behavior, which is then used to calculate CLV and related metrics. The BTYD R package is used to run the Pareto/NBD Buy 'Til You Die (BTYD) model, first introduced by Schmittlein, Morrison, and Colombo and later popularized by Fader, Hardie, and Lee. The Pareto/NBD model is popular in the academic marketing literature because of its applicability to areas such as customer retention, churn, and lifetime value. With the increased availability of data and computational resources, these techniques have also seen growing use and attention among businesses.
If you do not already have R and rpy2 installed, as well as the BTYD package, do so now by following the "Setup for Calculating Customer Lifetime Value using BigQuery Export for Google Analytics" notebook.
Load rpy2 and the BTYD package into this notebook.
%load_ext rpy2.ipython
%%R
library(BTYD)
Loading required package: hypergeo
In order to run the BTYD models, import the data from BigQuery into R.
import pandas
df = pandas.DataFrame(event_log_data)
%Rpush df
The code below prepares the data for the Pareto/NBD model, runs it, and then prints the parameter estimates. Running the model may take several minutes.
%%R -i start_date,end_date,cohort_end_date,end_of_cal_period
elog <- data.frame(cust = df$a_fullVisitorId, date = df$a_date, sales = df$a_totals_transactionRevenue)
# transactionRevenue is expressed in micros; convert to currency units
elog$sales <- elog$sales / 10^6
# format dates
elog$date <- as.Date(elog$date, "%Y%m%d")
# set key dates
end.of.cal.period <- as.Date(end_of_cal_period)
T.end <- max(elog$date)
T.star <- as.numeric(T.end - end.of.cal.period)
T.tot <- as.numeric(max(elog$date) - min(elog$date))
dataset <- dc.ElogToCbsCbt(elog, per="day",
T.cal=end.of.cal.period)
# estimate model
cal.cbs <- dataset[["cal"]][["cbs"]]
params <- pnbd.EstimateParameters(cal.cbs)
print(params)
Started making CBS and CBT from the ELOG...
...Completed Freq CBT
Finished filtering out customers not in the birth period.
No dissipation requested.
Started merging same-date transactions...
... Finished merging same-date transactions.
Started Creating Repeat Purchases
Finished Creating Repeat Purchases
Started Building CBS and CBT for calibration period...
Started Building CBT...
...Completed Freq CBT
Started Building CBT...
...Completed Freq CBT
Started making calibration period CBS...
Finished building CBS.
Finished building CBS and CBT for calibration period.
Started building CBS and CBT for holdout period...
Started Building CBT...
...Completed Freq CBT
Started making holdout period CBS...
Finished building CBS.
Finished building CBS and CBT for holdout.
...Finished Making All CBS and CBT
[1] 0.5417495 5.9029028 0.8386304 4.8032654
Before using the model to make forecasts, model fit should be assessed. This can be done using the Mean Absolute Percentage Error (MAPE).
%%R
mape <- function(actual, forecasted) {
n <- length(actual)
(1/n) * sum(abs((actual - forecasted) / actual))
}
%%R
# calculate incremental and cumulative vectors
T.cal <- cal.cbs[,3]
actual.inc.tracking.data <- cbind(dataset[['cal']][['cbt']],
dataset[['holdout']][['cbt']])
actual.inc.tracking.data <- apply(actual.inc.tracking.data, 2, sum)
actual.cum.tracking.data <- cumsum(actual.inc.tracking.data)
expected.cum.trans <- pnbd.ExpectedCumulativeTransactions(params, T.cal, T.tot, length(actual.cum.tracking.data))
# find end of calibration date
end.of.cal.day <- as.numeric(end.of.cal.period - min(elog$date))
total.mape <- mape(actual.cum.tracking.data, expected.cum.trans)
in.sample.mape <- mape(actual.cum.tracking.data[1:end.of.cal.day], expected.cum.trans[1:end.of.cal.day])
out.of.sample.mape <- mape(actual.cum.tracking.data[end.of.cal.day:T.tot], expected.cum.trans[end.of.cal.day:T.tot])
mape.outputs <- data.frame("MAPE"=c(in.sample.mape, out.of.sample.mape, total.mape))
row.names(mape.outputs) <- c("In sample", "Out of sample", "Total")
print(mape.outputs)
                    MAPE
In sample     0.05495860
Out of sample 0.01107181
Total         0.03221948
While there are no hard and fast rules for assessing model fit, a general heuristic is that a MAPE of less than 10% is good, between 10% and 20% is cause for concern, and above 20% is bad.
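For reference, the same MAPE calculation can be written in Python; this simply mirrors the R function above if you prefer to sanity-check fit values on the notebook side.

```python
def mape(actual, forecasted):
    """Mean Absolute Percentage Error of a forecast against actuals."""
    n = len(actual)
    return sum(abs((a - f) / a) for a, f in zip(actual, forecasted)) / n

# A forecast off by 10% on each of two points has a MAPE of 0.1.
print(mape([100.0, 200.0], [110.0, 180.0]))  # 0.1
```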
At this point, if you are confident in the model fit based on the MAPE calculation, you may continue to the Forecasting Spend and then the Compute Model Outputs section. If you are not confident in the model fit or would like to dig deeper into the model, continue to the Analyze the Model section below.
Now that the parameters of the Pareto/NBD have been estimated, they can be used to make projections. For example, the code below prints the expected number of transactions for a new customer in one year.
%R print(round(pnbd.Expectation(params, t=365), digits=1))
[1] 2.8
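Under the hood, pnbd.Expectation evaluates the standard Pareto/NBD closed form in the four estimated parameters (r, alpha, s, beta). A Python sketch of that expression (valid for s not equal to 1), using the calibration-period estimates printed earlier, reproduces the one-year figure:

```python
def pnbd_expectation(r, alpha, s, beta, t):
    """Expected transactions for a newly acquired customer over (0, t]
    under the Pareto/NBD model (Schmittlein, Morrison, and Colombo)."""
    return (r * beta) / (alpha * (s - 1)) * (1 - (beta / (beta + t)) ** (s - 1))

# r, alpha, s, beta as printed by pnbd.EstimateParameters above
params = (0.5417495, 5.9029028, 0.8386304, 4.8032654)
print(round(pnbd_expectation(*params, t=365), 1))  # 2.8
```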
Model fit can also be assessed graphically. The graph below is a histogram of repeat transactions.
%%R
# histogram
pnbd.PlotFrequencyInCalibration(params, cal.cbs, 7)
         [,1]     [,2]     [,3]     [,4]    [,5]     [,6]     [,7]     [,8]
[1,] 1093.000 252.0000 114.0000 54.00000 38.0000 29.00000 20.00000 89.00000
[2,] 1096.556 237.8858 106.1921 60.83081 39.6089 27.86635 20.63852 99.42124
The following two graphs show cumulative and incremental transactions, respectively, for both the data and as fit by the model. The vertical line separates the calibration period data from the holdout data.
%%R
# cum tracking
cum.tracking <- pnbd.PlotTrackingCum(params, T.cal, T.tot,
actual.cum.tracking.data, xlab="Day")
%%R
# inc tracking
inc.tracking <- pnbd.PlotTrackingInc(params, T.cal, T.tot,
actual.inc.tracking.data, xlab="Day", title="Tracking Daily Transactions")
As a final graphical assessment of model fit, the graph below plots actual transactions against conditional expected transactions made by customers in the holdout period, binned by how many transactions the customers made in the calibration period.
%%R
x.star <- dataset$holdout$cbs[,1]
censor <- 7
cond.expectation <- pnbd.PlotFreqVsConditionalExpectedFrequency(params,
T.star, cal.cbs, dataset$holdout$cbs[,1], censor)
cond.expectation
             [,1]        [,2]        [,3]       [,4]      [,5]      [,6]
[1,] 7.319305e-02   0.2460317   0.5964912  0.7777778  1.421053  1.137931
[2,] 3.485406e-02   0.2167215   0.4977087  1.0236040  1.608808  2.090292
[3,] 1.093000e+03 252.0000000 114.0000000 54.0000000 38.000000 29.000000
          [,7]      [,8]
[1,]  0.200000  6.359551
[2,]  0.969861  8.028275
[3,] 20.000000 89.000000
It is illustrative to view a heatmap of the conditional expected transactions for different recency and frequency values.
In the heatmaps below, red is bad and bright yellow is good. People who have purchased both recently and frequently are the best customers.
The P(Alive) heatmap illustrates the "increasing frequency paradox", whereby a given customer who has purchased more frequently may actually be less likely to be alive.
%%R
heatmap.palive.data <- matrix(NA, nrow=10, ncol=200)
heatmap.cet.data <- matrix(NA, nrow=10, ncol=200)
for(i in 1:10) {
heatmap.cet.data[i,] <- pnbd.ConditionalExpectedTransactions(params, T.star=365, x=i, t.x=1:200, T.cal=200)
heatmap.palive.data[i,] <- pnbd.PAlive(params, x=i, t.x=1:200, T.cal=200)
}
image(heatmap.palive.data, axes=FALSE, xlab="Number of Transactions", ylab="Days since last transaction", main="P(Alive) by Recency and Frequency")
axis(1, at=seq(0,1,.1), labels=0:10)
axis(2, at=seq(0,1,.1), labels=seq(200,0,-20))
image(heatmap.cet.data, axes=FALSE, xlab="Number of Transactions", ylab="Days since last transaction", main="Conditional Expected Transactions by Recency and Frequency")
axis(1, at=seq(0,1,.1), labels=0:10)
axis(2, at=seq(0,1,.1), labels=seq(200,0,-20))
In order to complete the Customer Lifetime Value calculation, it is necessary to build a model that forecasts how much each customer will spend per transaction. The Gamma-Gamma model of average spend per customer is run below.
%%R
#cal.cbt <- dc.CreateSpendCBT(elog, is.avg.spend = FALSE)
cal.cbt <- dc.ElogToCbsCbt(elog, per="day", T.cal=end.of.cal.period, cohort.birth.per=as.Date(cohort_end_date), statistic="total.spend")
cal.cbs <- cal.cbt$cal$cbs
cal.cbt <- cal.cbt$cal$cbt
calculateAvgSpend <- function(cbt.row) {
purchaseCols = which(cbt.row != 0)
sum(cbt.row[purchaseCols]) / length(purchaseCols)
}
m.x <- apply(cal.cbt, 1, calculateAvgSpend)
m.x[which(is.na(m.x))] <- 0
spendParams <- spend.EstimateParameters(m.x, cal.cbs[,1])
print(spendParams)
Started making CBS and CBT from the ELOG...
...Completed Freq CBT
Finished filtering out customers not in the birth period.
No dissipation requested.
Started merging same-date transactions...
... Finished merging same-date transactions.
Started Creating Repeat Purchases
Finished Creating Repeat Purchases
Started Building CBS and CBT for calibration period...
Started Building CBT...
...Completed Spend CBT
Started Building CBT...
...Completed Spend CBT
Started making calibration period CBS...
Finished building CBS.
Finished building CBS and CBT for calibration period.
Started building CBS and CBT for holdout period...
Started Building CBT...
...Completed Spend CBT
Started making holdout period CBS...
Finished building CBS.
Finished building CBS and CBT for holdout.
...Finished Making All CBS and CBT
[1] 0.1697479 1.1257365 152.2140630
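The calculateAvgSpend helper above averages each customer's spend over only the days on which a purchase occurred. The same logic in Python, applied to a hypothetical row of daily spend values:

```python
def calculate_avg_spend(cbt_row):
    """Average spend per purchase day, ignoring zero (no-purchase) days."""
    purchases = [v for v in cbt_row if v != 0]
    return sum(purchases) / len(purchases) if purchases else 0.0

# Three purchase days out of five: (10 + 20 + 30) / 3 = 20.
print(calculate_avg_spend([10.0, 0.0, 20.0, 0.0, 30.0]))  # 20.0
```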
%%R
expected.spend <- spend.plot.average.transaction.value(spendParams, m.x, cal.cbs[,1], title = "Actual vs. Expected Avg Transaction Value Across Customers")
%%R
mape(m.x[which(m.x > 0)], expected.spend)
[1] 0.9892284
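The fitted spend parameters (p, q, gamma) also admit a closed-form conditional expectation, as described by Fader and Hardie for the Gamma-Gamma model. A Python sketch of that formula using the estimates printed above: for a customer with no observed repeat transactions it falls back to the population mean, gamma * p / (q - 1), which is why the value 205.49 recurs in the per-customer output later in this guide.

```python
def gg_expected_spend(p, q, gamma, mx, x):
    """Conditional expected average transaction value under the
    Gamma-Gamma spend model (Fader and Hardie). mx is the customer's
    observed average spend and x their transaction count."""
    return (gamma + mx * x) * p / (p * x + q - 1)

# p, q, gamma as printed by spend.EstimateParameters above
p, q, gamma = 0.1697479, 1.1257365, 152.2140630

# With x = 0 the estimate reduces to the population mean:
print(round(gg_expected_spend(p, q, gamma, 0.0, 0), 2))  # 205.49
```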
Once you are confident in the model fit, rerun the model using all of the data. In the previous section, the data from the holdout period was used to assess model fit; to generate a true forecast, you must run the model on all of the data. Running the model may take a few minutes.
%%R
end.of.cal.period <- as.Date(end_date)
dataset <- dc.ElogToCbsCbt(elog, per="day",
T.cal=end.of.cal.period)
# estimate model
cal.cbs <- dataset[["cal"]][["cbs"]]
params <- pnbd.EstimateParameters(cal.cbs)
print(params)
Started making CBS and CBT from the ELOG...
...Completed Freq CBT
Finished filtering out customers not in the birth period.
No dissipation requested.
Started merging same-date transactions...
... Finished merging same-date transactions.
Started Creating Repeat Purchases
Finished Creating Repeat Purchases
Started Building CBS and CBT for calibration period...
Started Building CBT...
...Completed Freq CBT
Started Building CBT...
...Completed Freq CBT
Started making calibration period CBS...
Finished building CBS.
Finished building CBS and CBT for calibration period.
...Finished Making All CBS and CBT
[1] 0.5085131 6.2411749 0.9268665 6.5166227
If you'd like, you can analyze the parameters of the new model by returning to the Analyze the Model section. Otherwise, continue below and calculate some key outputs of the model, which will later be imported back into BigQuery for further analysis.
In addition to forecasting how many times a new customer will purchase in the next year, you can use the model to forecast how many times each existing customer will purchase in the next year, conditional on their past behavior. You can also compute the probability that they are still alive (p_Alive). Both the expected transactions and probability of being alive for each customer are computed below, conditional on their observed behavior.
%%R
x <- cal.cbs[, "x"]
t.x <- cal.cbs[, "t.x"]
T.cal <- cal.cbs[, "T.cal"]
d <- .15 # discount rate to be divided by 365, as we are running the model on daily data
discounted_expected_transactions <- pnbd.DERT(params, x, t.x, T.cal, d / 365)
p_Alive <- pnbd.PAlive(params, x, t.x, T.cal)
output <- data.frame(visitorId=dataset$cust.data$cust, discounted_expected_transactions=discounted_expected_transactions, p_Alive=p_Alive)
head(output)
            visitorId discounted_expected_transactions     p_Alive
1 5177747204596837269                      0.020091172 0.004347853
2  871901349141564204                      0.078373790 0.016960571
3 1868101133499893294                      0.023218149 0.014299225
4 8368278366398443575                      0.023218149 0.014299225
5 8656784264702200562                      0.023218149 0.014299225
6 5149525721801940630                      0.006224129 0.001346940
Finally, compute the expected spend per transaction and combine this with the expected transactions for a projected future revenue number.
%%R
expected_spend <- spend.expected.value(spendParams, m.x, cal.cbs[,1])
output$expected_spend <- expected_spend
output$projected_future_revenue <- output$expected_spend * output$discounted_expected_transactions
output[1:20,]
             visitorId discounted_expected_transactions      p_Alive
1  5177747204596837269                      0.020091172 0.0043478534
2   871901349141564204                      0.078373790 0.0169605714
3  1868101133499893294                      0.023218149 0.0142992255
4  8368278366398443575                      0.023218149 0.0142992255
5  8656784264702200562                      0.023218149 0.0142992255
6  5149525721801940630                      0.006224129 0.0013469399
7  2811328900058349109                      0.023218149 0.0142992255
8  8408928092961375382                      0.023218149 0.0142992255
9  5237025793460417264                      0.023218149 0.0142992255
10 3095251960883073410                      0.006224129 0.0013469399
11 1583858084726727557                      0.306114199 0.0119569411
12 1260552643402035925                      0.023218149 0.0142992255
13 1145004713037811602                      0.023218149 0.0142992255
14 8345286530289651621                      0.012017819 0.0026007300
15 8795080974681952023                      0.023218149 0.0142992255
16 4873862033428478188                      0.020091172 0.0043478534
17   74808852692086562                      0.023218149 0.0142992255
18 7399414240697899145                      0.003234085 0.0004245238
19  318277560721229411                      0.042914586 0.0056332059
20 6771626816731085183                      0.023218149 0.0142992255
   expected_spend projected_future_revenue
1        88.01738                1.7683723
2       660.76719               51.7868292
3       205.49331                4.7711743
4       205.49331                4.7711743
5       205.49331                4.7711743
6       234.50805                1.4596084
7       205.49331                4.7711743
8       205.49331                4.7711743
9       205.49331                4.7711743
10       90.31527                0.5621339
11       19.01605                5.8210828
12      205.49331                4.7711743
13      205.49331                4.7711743
14      173.61389                2.0864602
15      205.49331                4.7711743
16      140.29444                2.8186797
17      205.49331                4.7711743
18       56.26761                0.1819742
19      606.48696               26.0271369
20      205.49331                4.7711743
The output table above can now be imported into BigQuery and analyzed alongside other Google Analytics data.
First, pull the output dataframe created in R back into the notebook.
%Rpull output
output[:5]
visitorId | discounted_expected_transactions | p_Alive | expected_spend | projected_future_revenue | |
---|---|---|---|---|---|
0 | 5177747204596837269 | 0.020091 | 0.004348 | 88.017378 | 1.768372 |
1 | 871901349141564204 | 0.078374 | 0.016961 | 660.767190 | 51.786829 |
2 | 1868101133499893294 | 0.023218 | 0.014299 | 205.493312 | 4.771174 |
3 | 8368278366398443575 | 0.023218 | 0.014299 | 205.493312 | 4.771174 |
4 | 8656784264702200562 | 0.023218 | 0.014299 | 205.493312 | 4.771174 |
Fill any missing values with zero.
output.fillna(value=0, inplace=True)
Create a schema for the table, inferred from the dataframe.
schema = bq.schema(output)
schema
name | data_type | mode | description |
---|---|---|---|
projected_future_revenue | FLOAT | NULLABLE | |
discounted_expected_transactions | FLOAT | NULLABLE | |
visitorId | STRING | NULLABLE | |
p_Alive | FLOAT | NULLABLE | |
expected_spend | FLOAT | NULLABLE |
Now create a table with the schema that was just created. If the table already exists, you can recreate it by passing the truncate=True parameter. The same applies when creating the DataSet that will contain the table.
bq.dataset('outputs').create()
table = bq.table('outputs.jan_june_2014').create(schema, truncate=True)
Finally, populate the table with data from the dataframe. This uses the BigQuery streaming insert API to stream rows from the pandas dataframe into BigQuery.
table.insertAll(output)
To confirm the insert, sample the newly created and populated table.
It can take some time for BigQuery to process the newly inserted data and make it available to be queried (hence the added sleep).
import time
time.sleep(2)
table.sample()
projected_future_revenue | discounted_expected_transactions | visitorId | p_Alive | expected_spend |
---|---|---|---|---|
5.2427016479 | 0.0255127605 | 6367110080781788249 | 0.0153688402 | 205.493311627 |
4.940105496 | 0.0240402252 | 8396673460344463517 | 0.0146848711 | 205.493311627 |
5.3717321056 | 0.0261406664 | 584013359193549091 | 0.0156579647 | 205.493311627 |
0.7338994155 | 0.0072088927 | 2831918728462817167 | 0.0015259381 | 101.804735549 |
5.505683168 | 0.0267925176 | 7554959204332885431 | 0.0159565886 | 205.493311627 |
The data can now be queried and joined with the rest of your Google Analytics data in BigQuery.
The output table can now be used for various analyses, such as attributing projected value to marketing campaigns or ranking customers by projected future revenue.
The following query lists the average Discounted Expected Transactions by the campaign (trafficSource.campaign) that first brought the customer to the site or app.
%%bigquery sql --name det_by_campaign
SELECT
trafficSource.campaign,
round(avg(discounted_expected_transactions), 2)
FROM (
SELECT
date,
$visitor_id,
trafficSource.campaign
FROM
TABLE_DATE_RANGE([$cloud_project_number:$ga_view_id.ga_sessions_],
TIMESTAMP($start_date),
TIMESTAMP($cohort_end_date))
WHERE
totals.newVisits IS NOT NULL
) a
JOIN (
SELECT
visitorId,
discounted_expected_transactions
FROM
[outputs.jan_june_2014]
) b
ON
a.$visitor_id=b.visitorId
GROUP BY
trafficSource.campaign
ORDER BY
2 DESC
print det_by_campaign.sql
SELECT
  trafficSource.campaign,
  round(avg(discounted_expected_transactions), 2)
FROM (
  SELECT
    date,
    fullVisitorId,
    trafficSource.campaign
  FROM
    TABLE_DATE_RANGE([XXXXX:YYYYY.ga_sessions_],
      TIMESTAMP("2014-01-01"),
      TIMESTAMP("2014-01-15"))
  WHERE
    totals.newVisits IS NOT NULL
) a
JOIN (
  SELECT
    visitorId,
    discounted_expected_transactions
  FROM
    [outputs.jan_june_2014]
) b
ON
  a.fullVisitorId=b.visitorId
GROUP BY
  trafficSource.campaign
ORDER BY
  2 DESC
det_by_campaign.results()
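The aggregation in the query above can be sketched in pandas on toy data (the campaign names and values below are hypothetical): join the model output to each customer's acquisition campaign, then average discounted expected transactions per campaign.

```python
import pandas as pd

# Hypothetical acquisition campaigns (table a) and model outputs (table b).
a = pd.DataFrame({'visitorId': ['v1', 'v2', 'v3'],
                  'campaign': ['spring_sale', 'spring_sale', '(not set)']})
b = pd.DataFrame({'visitorId': ['v1', 'v2', 'v3'],
                  'discounted_expected_transactions': [0.02, 0.08, 0.04]})

# Join on visitorId, then average DET per campaign, highest first,
# mirroring the GROUP BY / ORDER BY in the query above.
det_by_campaign = (a.merge(b, on='visitorId')
                    .groupby('campaign')['discounted_expected_transactions']
                    .mean().round(2)
                    .sort_values(ascending=False))
print(det_by_campaign.to_dict())  # {'spring_sale': 0.05, '(not set)': 0.04}
```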
Additionally, you can rank your customers:
%%bigquery sql
SELECT
*
FROM
[outputs.jan_june_2014]
ORDER BY
projected_future_revenue DESC
LIMIT
100
projected_future_revenue | discounted_expected_transactions | visitorId | p_Alive | expected_spend |
---|---|---|---|---|
22822.2919336 | 163.072369334 | 973780889686425985 | 1.0 | 139.951924577 |
8689.76721412 | 49.7137261223 | 1448190165300419034 | 1.0 | 174.796135634 |
8089.81551522 | 208.272739391 | 8146441991291096941 | 0.9824067265 | 38.8424118244 |
6960.04849626 | 85.5454952318 | 1346208889874163371 | 1.0 | 81.3607832581 |
6909.14201651 | 22.059668005 | 8954156796559126629 | 0.9518749335 | 313.202447786 |
6391.30677304 | 16.1936648574 | 4027033553698757889 | 0.5585310462 | 394.679452076 |
5129.93103465 | 188.576544525 | 2087178421607889248 | 1.0 | 27.2034416983 |
4670.54067969 | 41.36917117 | 5561925934643570438 | 0.9465973519 | 112.899063423 |
4607.22695823 | 50.7417640692 | 2701376289868868742 | 0.9519690726 | 90.7975322251 |
3740.4293766 | 178.458872942 | 432249220894065055 | 1.0 | 20.9596155962 |
3642.67507377 | 31.9368960524 | 3026198760894044147 | 0.9632775825 | 114.058519268 |
3565.63054058 | 50.5194661524 | 4247563820652373734 | 0.5643595174 | 70.5793392556 |
2992.39258762 | 98.8304400928 | 6814524778285651728 | 0.9855235527 | 30.2780457601 |
2977.40110619 | 34.5744908773 | 8112456546447613513 | 0.5790246305 | 86.1155444561 |
2832.6997274 | 30.2573994023 | 4160561544311311467 | 0.9260702902 | 93.620065946 |
2564.64918175 | 26.3581962634 | 639415551093729385 | 0.7613023018 | 97.2998742449 |
2432.00143946 | 27.7964440324 | 7761237468422939860 | 0.6202828271 | 87.4932576494 |
2158.61482783 | 30.3106452496 | 232107777890649684 | 0.5912059381 | 71.2163931206 |
2132.43920651 | 54.9440945794 | 2882844997504928135 | 0.7347442033 | 38.8110719238 |
2102.02946164 | 46.7074373789 | 23890503570489015 | 0.9470157839 | 45.004170205 |
1892.88289636 | 160.064711316 | 8556718407239640888 | 0.9838598104 | 11.8257352342 |
1517.81745673 | 98.5717718107 | 7670466641746983080 | 0.894826438 | 15.3980944935 |
1274.27472387 | 55.3308552169 | 5180388235822577527 | 0.9188002763 | 23.0300926828 |
1164.9686515 | 13.0397710118 | 7512264570814647539 | 0.6300207153 | 89.339655615 |
1029.91746028 | 11.7781707187 | 6051563673972803942 | 1.0 | 87.4429047498 |
1020.1048999 | 11.1077117623 | 5307331384439670089 | 0.6200648792 | 91.8375378952 |
987.249420453 | 29.5271413435 | 6520324474901726146 | 0.3926431712 | 33.4353200321 |
966.90896404 | 32.2309001707 | 5594129195724172458 | 0.7081152654 | 29.9994402551 |
947.562226559 | 9.6470100533 | 8148679493167717528 | 0.2681181286 | 98.2234102926 |
763.456231736 | 5.940911953 | 8759838332773748914 | 0.539639666 | 128.508255597 |
744.28644047 | 92.2151046858 | 8097915810112978440 | 0.4749496608 | 8.0711987803 |
663.044532448 | 34.5143581315 | 6147240646085737101 | 0.973153118 | 19.2106870399 |
657.110364662 | 4.5089977109 | 8549638067756568638 | 0.3212413061 | 145.733133347 |
632.841456021 | 7.6335261473 | 1350557531382740441 | 0.9377645329 | 82.9029001555 |
611.299064406 | 2.0678519336 | 5313811588146274193 | 0.2647522822 | 295.620326807 |
609.301266741 | 2.9650661713 | 1206726236350085204 | 0.7304810219 | 205.493311627 |
604.548679831 | 5.2278832686 | 8866922596291283429 | 0.3670788092 | 115.639284346 |
604.536961043 | 8.9226037444 | 8822969636149864445 | 0.3408999518 | 67.7534247131 |
600.036844954 | 14.4611439824 | 8756341907271364135 | 0.8570719153 | 41.4930413309 |
565.04976877 | 7.9398355369 | 2288129847777631722 | 0.7253843985 | 71.1664323708 |
564.906612628 | 6.9041441396 | 2753015974462392416 | 0.4065791685 | 81.8213816522 |
551.510692806 | 24.4233806014 | 3328874598121181772 | 0.9786042431 | 22.5812594008 |
541.099965987 | 13.2977611147 | 2696691412564814741 | 0.9666719962 | 40.6910577893 |
531.493880215 | 9.5699346876 | 8285822360184666751 | 0.9378220826 | 55.5378795743 |
529.848287026 | 23.4640715835 | 4793786268025713476 | 1.0 | 22.5812594008 |
522.245375252 | 8.3113481354 | 1245183828784782831 | 0.9859232763 | 62.835218396 |
521.694823783 | 2.538743571 | 52397172176561414 | 0.6379484539 | 205.493311627 |
521.167055339 | 16.8828408302 | 6538629281919960102 | 0.9942161182 | 30.8696303294 |
518.930593389 | 12.7529393823 | 2554055647449356373 | 1.0 | 40.6910577893 |
514.528952384 | 4.8152205988 | 3453413582959396247 | 0.4085710688 | 106.854699973 |
513.434349485 | 7.6388436835 | 2582785894872606989 | 1.0 | 67.213621689 |
502.996749606 | 22.2749644153 | 5022473798860953467 | 0.9922723249 | 22.5812594008 |
501.8766927 | 25.5225401455 | 585465543470709776 | 1.0 | 19.6640573328 |
487.164900483 | 8.7048387959 | 1891929457439464460 | 0.2740114749 | 55.9648388562 |
485.537446821 | 43.8306190029 | 2030210718423412879 | 0.8140669343 | 11.0775858947 |
484.288947817 | 37.3546524756 | 7810193081915533024 | 0.9364742187 | 12.9646219606 |
480.626255638 | 18.1267300525 | 1175418463087949908 | 0.9464449228 | 26.514779789 |
479.20162052 | 30.6658345558 | 5453922256089579074 | 0.9561655121 | 15.6265638115 |
478.993136292 | 30.6524929005 | 4081303446191497751 | 1.0 | 15.6265638115 |
477.969139654 | 11.7462942873 | 8051780950736933753 | 0.9352836813 | 40.6910577893 |
475.25142639 | 23.105940493 | 9157427544173252340 | 0.8639962719 | 20.5683653749 |
473.525862483 | 5.3318937926 | 8556980614996812424 | 0.4899104124 | 88.8100702865 |
472.603390307 | 36.4533105355 | 239755057356520534 | 0.9862038284 | 12.9646219606 |
471.991101315 | 39.5069205321 | 7260957843834274686 | 0.956008508 | 11.9470486426 |
463.657862862 | 7.2686786918 | 7907955526939561361 | 0.1726828812 | 63.7884658987 |
462.98291121 | 23.5446278137 | 3348282228186199514 | 0.9065474459 | 19.6640573328 |
462.320978067 | 19.3306873047 | 1465186725294156708 | 0.916997862 | 23.9164273251 |
462.092170224 | 16.8734532361 | 3716941579262021090 | 0.9880393865 | 27.3857498971 |
457.85537603 | 17.2679305532 | 5508608129049039409 | 0.9639462657 | 26.514779789 |
452.980690946 | 5.9529971262 | 4045994824993976576 | 0.2539406109 | 76.0928791575 |
451.912699839 | 97.2048784364 | 2239193514829931282 | 0.9412787318 | 4.649074276 |
450.968290188 | 8.1200127489 | 8573148529756678623 | 0.7996240154 | 55.5378795743 |
448.6285591 | 19.8672957578 | 7059597055304393047 | 0.957008795 | 22.5812594008 |
447.708881841 | 90.4180297535 | 4897687230632499917 | 0.9468814123 | 4.9515443221 |
446.544370421 | 8.0403568491 | 5336369422104386303 | 0.8255080092 | 55.5378795743 |
446.415202877 | 29.5586147374 | 6556353297689965583 | 0.9329889697 | 15.1027105581 |
438.871238071 | 4.4360732567 | 4526189125217579369 | 0.8325448942 | 98.932369389 |
434.937899549 | 15.9804640784 | 7540355297618155344 | 0.2475094934 | 27.2168503627 |
430.262957593 | 38.8408595234 | 6740204990613572967 | 0.8058064652 | 11.0775858947 |
426.108103298 | 7.6723869648 | 6244393828523734087 | 0.9367262461 | 55.5378795743 |
424.58710485 | 7.6450002792 | 4470908021329903950 | 0.8255070416 | 55.5378795743 |
412.330376519 | 7.4243089524 | 6223822399759817283 | 0.7723220947 | 55.5378795743 |
411.512661825 | 26.3341747289 | 6455901855247949162 | 0.7812373276 | 15.6265638115 |
411.229428587 | 26.3160496157 | 5915226431149561046 | 0.8233093466 | 15.6265638115 |
405.249547571 | 4.6344474572 | 8779326922972989430 | 0.9003446313 | 87.4429047498 |
392.938141551 | 22.5640405721 | 4239286374996574286 | 0.8374307467 | 17.4143518442 |
388.39052585 | 6.9932544927 | 7059391970616204992 | 0.7817292837 | 55.5378795743 |
385.051134998 | 6.9331263266 | 6262348462563098041 | 0.714976213 | 55.5378795743 |
380.003829156 | 4.3457365723 | 260680850986952112 | 0.9120615032 | 87.4429047498 |
378.678294435 | 33.5541783913 | 1908925935814785690 | 0.571331661 | 11.2855779098 |
377.286713742 | 6.7933222628 | 7790272173176969698 | 0.8573429134 | 55.5378795743 |
370.709218184 | 9.1103362342 | 6168283657963837157 | 0.6215231659 | 40.6910577893 |
369.578368312 | 9.1151991478 | 1547227021618445660 | 0.662332851 | 40.5452873073 |
359.475569647 | 8.8342645578 | 1635752903254561432 | 0.7619126596 | 40.6910577893 |
349.844386922 | 3.8320280499 | 1573754517760710526 | 0.4920147206 | 91.2948398004 |
347.029019806 | 3.9686355434 | 8137441479310363673 | 0.8329174213 | 87.4429047498 |
329.920463302 | 18.945319714 | 4406952470770343104 | 0.6417677323 | 17.4143518442 |
325.239141913 | 10.1296118592 | 2176422954346395906 | 0.7216791741 | 32.1077595502 |
322.950274658 | 2.5273626382 | 1540799444135403059 | 0.3317555974 | 127.781533909 |
322.407162731 | 17.5894196838 | 175986913381835565 | 0.6021968767 | 18.3296077146 |
You can then choose your top customers and learn things about them. The example query below lists all of the products purchased by the top 200 customers (as ranked by projected_future_revenue).
%%bigquery sql
SELECT
top_products,
COUNT(top_products) AS quantity
FROM (
SELECT
$visitor_id,
hits.item.productName AS top_products
FROM
TABLE_DATE_RANGE([$cloud_project_number:$ga_view_id.ga_sessions_],
TIMESTAMP($start_date),
TIMESTAMP($cohort_end_date))
) a
JOIN (
SELECT
visitorId
FROM
[outputs.jan_june_2014]
ORDER BY
projected_future_revenue DESC
LIMIT
200
) b
ON
a.$visitor_id=b.visitorId
GROUP BY
top_products
ORDER BY
quantity DESC;
This notebook is open source and available on GitHub. You may report bugs or feature requests by submitting an issue on GitHub.