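From Data to Chart
Let the data you have decide the chart you draw
The authors provide a decision tree that guides us toward a visualization suited to our own data: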
What kind of data do you have? Pick the main type using the buttons below.
Then let the decision tree guide you toward your graphic possibilities.
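This is the well-known website https://www.data-to-viz.com/.
Yan Holtz and Conor Healy, two friends, built this site together in their spare time. Its examples are written in R and Python, so it gives us a large collection of good source code as well as a decision tree that tells us which code to use when. They hand us the tools and also teach us how to use them. Because they share all of this freely, please acknowledge them in your article if the site helps you. For questions about the code, contact Yan Holtz, who is mainly responsible for the code; Conor Healy handles the graphic design.
Visualization framework
Original image: https://www.data-to-viz.com/img/poster/poster_big.png
A worked example based on the site
Plotting an ordered two-variable data frame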
# Libraries
library(tidyverse)
library(hrbrthemes)  # provides theme_ipsum(); needs the Arial Narrow or Roboto Condensed fonts
library(plotly)
library(patchwork)
# install.packages("babynames")
library(babynames)
library(viridis)

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header = TRUE)
data$date <- as.Date(data$date)
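The `as.Date()` call matters because `read.table()` does not produce a Date column by itself, and ggplot2 needs a real Date to draw a continuous time axis. Below is a minimal offline sketch of that step; the `txt` string and the `toy` data frame are made-up stand-ins for the downloaded file, not its actual contents.

```r
# Hypothetical two-column text standing in for the downloaded file
txt <- "date value
2018-04-20 8800.5
2018-04-21 8910.2
2018-04-22 8790.0"

# Read it the same way as the real file, keeping strings as plain character
toy <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
class(toy$date)   # "character" before conversion

# Convert to Date so that ordering and date arithmetic work
toy$date <- as.Date(toy$date)
class(toy$date)   # "Date" after conversion
diff(toy$date)    # gaps between consecutive dates, in days
```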
Here we take only the last ten observations and draw them as a connected scatterplot (points joined by a line).
# Plot
data %>%
  tail(10) %>%
  ggplot(aes(x = date, y = value)) +
  geom_line(color = "#69b3a2") +
  geom_point(color = "#69b3a2", size = 4) +
  ggtitle("Evolution of Bitcoin price") +
  ylab("bitcoin price ($)") +
  theme_ipsum()
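The `tail(10)` step keeps the last ten rows in their original order, which matters because the line joins the points in row order. A minimal sketch of that behaviour, using a made-up data frame (`toy` and its values are hypothetical, not the Bitcoin data):

```r
# Hypothetical ordered data: 30 consecutive days with a rising value
toy <- data.frame(
  date  = as.Date("2018-01-01") + 0:29,
  value = seq(100, by = 5, length.out = 30)
)

# tail() keeps the last n rows and preserves their order
last_ten <- tail(toy, 10)

nrow(last_ten)         # number of rows kept
range(last_ten$date)   # first and last date in the subset
```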
Here we use the last 60 observations and compare a plain line chart with a connected scatterplot, side by side.
# Plot
p1 <- data %>%
  tail(60) %>%
  ggplot(aes(x = date, y = value)) +
  geom_line(color = "#69b3a2") +
  ggtitle("Line chart") +
  ylab("bitcoin price ($)") +
  theme_ipsum()

p2 <- data %>%
  tail(60) %>%
  ggplot(aes(x = date, y = value)) +
  geom_line(color = "#69b3a2") +
  geom_point(color = "#69b3a2", size = 2) +
  ggtitle("Connected scatterplot") +
  ylab("bitcoin price ($)") +
  theme_ipsum()

# patchwork places the two plots side by side
p <- p1 + p2
p
Showing the time series as a scatterplot
# Plot: points only, no connecting line
data %>%
  tail(60) %>%
  ggplot(aes(x = date, y = value)) +
  geom_point(color = "#69b3a2", size = 2) +
  ggtitle("Scatterplot") +
  ylab("bitcoin price ($)") +
  theme_ipsum()
Grouped time series
library(babynames)

# Load dataset
data <- babynames %>%
  filter(name %in% c("Ashley", "Amanda")) %>%
  filter(sex == "F")

# Plot
data %>%
  ggplot(aes(x = year, y = n, group = name, color = name)) +
  geom_line() +
  scale_color_viridis(discrete = TRUE, name = "") +
  theme(legend.position = "none") +
  ggtitle("Popularity of American names in the previous 30 years") +
  theme_ipsum()
Using geom_segment to highlight the direction of change
library(grid)     # needed for arrow function
library(ggrepel)

# data: `data` is the babynames subset built above
tmp <- data %>%
  filter(year > 1970) %>%
  select(year, name, n) %>%
  spread(key = name, value = n, -1)

# data for date: label only a random 30% of the years
tmp_date <- tmp %>% sample_frac(0.3)

tmp %>%
  ggplot(aes(x = Amanda, y = Ashley, label = year)) +
  geom_point(color = "#69b3a2") +
  geom_text_repel(data = tmp_date) +
  geom_segment(
    color = "#69b3a2",
    aes(
      xend = c(tail(Amanda, n = -1), NA),
      yend = c(tail(Ashley, n = -1), NA)
    ),
    arrow = arrow(length = unit(0.3, "cm"))
  ) +
  theme_ipsum()
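The `xend`/`yend` expressions in the `geom_segment()` call are what turn a cloud of points into a path with arrows: each segment starts at one point and ends at the next one. A minimal base R sketch of that shift, using made-up coordinates (`x`, `y` and the `segments` data frame are hypothetical illustrations):

```r
# Hypothetical coordinates of four consecutive points
x <- c(10, 20, 15, 30)
y <- c(1, 3, 2, 5)

# tail(v, n = -1) drops the first element, so each position now holds
# the NEXT point's coordinate; NA pads the last point, which has no successor
xend <- c(tail(x, n = -1), NA)
yend <- c(tail(y, n = -1), NA)

# One row per segment: (x, y) is the start, (xend, yend) is the end
segments <- data.frame(x, y, xend, yend)
print(segments)
```

The final row has `NA` endpoints, so no arrow leaves the last point; ggplot2 drops that row when drawing.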
# Compare a y-axis that starts at 0 with one cut to the range of the data
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header = TRUE)
data$date <- as.Date(data$date)

p1 <- data %>%
  tail(10) %>%
  ggplot(aes(x = date, y = value)) +
  geom_line(color = "#69b3a2") +
  geom_point(color = "#69b3a2", size = 4) +
  ggtitle("Not cutting") +
  ylab("bitcoin price ($)") +
  theme_ipsum() +
  ylim(0, 10000)

p2 <- data %>%
  tail(10) %>%
  ggplot(aes(x = date, y = value)) +
  geom_line(color = "#69b3a2") +
  geom_point(color = "#69b3a2", size = 4) +
  ggtitle("Cutting") +
  ylab("bitcoin price ($)") +
  theme_ipsum()

p1 + p2
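Why the two panels look so different can be put in numbers: the same variation fills only a small part of an axis that starts at zero, but nearly all of an axis cut to the data range. A rough sketch with made-up prices (the `value` vector and the `share_of_axis()` helper are hypothetical; it also ignores the small padding ggplot2 adds around the data by default):

```r
# Hypothetical prices, not the actual Bitcoin values
value <- c(9100, 9250, 9050, 9400, 9300)

# Share of the axis height that the data's variation occupies
share_of_axis <- function(v, limits) {
  diff(range(v)) / diff(limits)
}

full_axis <- share_of_axis(value, c(0, 10000))   # axis starts at zero
cut_axis  <- share_of_axis(value, range(value))  # axis cut to the data range

full_axis
cut_axis
```

With the full axis the variation looks nearly flat, while the cut axis stretches it over the whole panel, so which one is appropriate depends on whether the absolute level or the relative change is the message.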
Reference
https://www.data-to-viz.com/graph/connectedscatter.html