用RStudio做“漂亮”的用户旅程分析

说明

这里提供RStudio的两种出图方法,分别用于制作可交互的Sankey图和总揽全局的Chord图。

话不多说,上干货。

方法

事先准备

需先根据每个用户使用App各功能和按钮的时间戳顺序获取表格。

sql如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
SELECT 
user_pseudo_id,
event_timestamp,
event_name
FROM `YourAppChartID.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20210801' AND '20210824'
AND app_info. version = "1.0.11"
AND user_pseudo_id in (
SELECT user_pseudo_id
FROM `YourAppChartID.events_*`
WHERE event_name like '%001ToPlay%'
AND _TABLE_SUFFIX BETWEEN '20210801' AND '20210824')
AND user_pseudo_id in (
SELECT user_pseudo_id
FROM `YourAppChartID.events_*`
WHERE event_name = 'app_remove'
AND _TABLE_SUFFIX BETWEEN '20210801' AND '20210824')
AND user_pseudo_id in (
SELECT user_pseudo_id
FROM `YourAppChartID.events_*`
WHERE event_name = 'first_open'
AND _TABLE_SUFFIX BETWEEN '20210801' AND '20210824')
GROUP BY
user_pseudo_id,
event_timestamp,
event_name
ORDER BY
user_pseudo_id,
event_timestamp

主体部分其实不多,只是做了一些数据筛选,以确保同一时间用户仅触发了一个数据点,避免分析旅程时的混乱。以下部分都是在根据命名等信息作筛选,可根据自身情况删减。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
_TABLE_SUFFIX BETWEEN '20210801' AND '20210824'
AND app_info. version = "1.0.11"
AND user_pseudo_id in (
SELECT user_pseudo_id
FROM `YourAppChartID.events_*`
WHERE event_name like '%001ToPlay%'
AND _TABLE_SUFFIX BETWEEN '20210801' AND '20210824')
AND user_pseudo_id in (
SELECT user_pseudo_id
FROM `YourAppChartID.events_*`
WHERE event_name = 'app_remove'
AND _TABLE_SUFFIX BETWEEN '20210801' AND '20210824')
AND user_pseudo_id in (
SELECT user_pseudo_id
FROM `YourAppChartID.events_*`
WHERE event_name = 'first_open'
AND _TABLE_SUFFIX BETWEEN '20210801' AND '20210824')
GROUP BY

制作Sankey图

准备好两张分别代表节点和连线信息的表格,在Rstuido内输入:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
library(networkD3)

links <- read.table('D:\\Rdata\\sankey0501Archive_links.csv', header=T, sep=',')
nodes <- read.table('D:\\Rdata\\sankey0501Archive_nodes.csv', header=T, sep=',')

setcolor <- 'd3.scaleOrdinal().domain(["B&E","Play","Expect","Ad","Revive","Exit","Download"]).range(["#9ADBC5","#A0DDE0","#DFDD6C","#FE8D6F","#FDC453","#F886A8","#FE8D6F"])'

sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
Target = 'target', Value = 'value', NodeID = 'name',
colourScale = setcolor, LinkGroup = 'linkgroup', NodeGroup = 'nodegroup',
sinksRight = FALSE,
#fontFamily = 'futura', fontSize = 10, nodeWidth = 20,
height = 1080, width =1920
)

这里需要替换为事先准备好的表格的位置:

1
2
links <- read.table('D:\\Rdata\\sankey0501Archive_links.csv', header=T, sep=',')
nodes <- read.table('D:\\Rdata\\sankey0501Archive_nodes.csv', header=T, sep=','

这里是在定义数据分类和对应颜色:

1
setcolor <- 'd3.scaleOrdinal().domain(["B&E","Play","Expect","Ad","Revive","Exit","Download"]).range(["#9ADBC5","#A0DDE0","#DFDD6C","#FE8D6F","#FDC453","#F886A8","#FE8D6F"])'

看看效果:

最左侧为版本玩家总数,最右侧为最终卸载数,中间为各阶段玩家流失具体情况。

制作Chord图

这里只需要准备一张表格即可,数据分类需要根据条目命名来区分,故准备表格时需要在取名字前缀上下功夫:

1
2
3
4
5
library(chorddiag)

mat <- read.table('D:\\Rdata\\chord0405ExpertMdx_links.csv', header=T, sep=',')

chorddiag(mat, type = "bipartite", showTicks = F, groupnameFontsize = 14, groupnamePadding = 10, margin = 90)

同样替换表格位置:

1
mat <- read.table('D:\\Rdata\\chord0405ExpertMdx_links.csv', header=T, sep=',')

看看效果:

左下的部分为各关卡的继续游玩和流失情况,右上为总玩家、新玩家以及卸载情况,环绕四周的有下载新关卡、复活、结算等关键节点。


用RStudio做“漂亮”的用户旅程分析
https://rainyiris.cc/2021/02/15/RStudio-Analytics-Google/
作者
姚麟
发布于
2021年2月15日
许可协议