From Data to Viz
Choosing a chart that fits your data
The authors provide a decision tree that guides you to a visualization suited to your data. It opens with the question "What kind of data do you have?": you pick the main type of your data, and the tree then leads you toward the possible graphics.
This is the well-known website https://www.data-to-viz.com/.
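Yan Holtz and Conor Healy are two friends who built this site in their spare time. Its source code is written in R and Python. Besides a large collection of well-written code, the site also gives you a decision tree that tells you which code to use. They did more than hand out tools: they also teach people how to use them. The project is shared freely, so if it helps you, please acknowledge the authors in your articles. For questions about the code, contact Yan Holtz, who is mainly responsible for the code. Conor Healy handles the graphic design.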
Visualization overview
Full-size poster: https://www.data-to-viz.com/img/poster/poster_big.png
An example based on the site
Plotting two ordered numeric variables (a date column and a value column)
- # Libraries
- library(tidyverse)
- library(hrbrthemes)
- # theme_ipsum() needs the Arial Narrow or Roboto Condensed font;
- # hrbrthemes::import_roboto_condensed() installs Roboto Condensed
- library(plotly)
- library(patchwork)
- # install.packages("babynames")
- library(babynames)
- library(viridis)
- # Load dataset from github
- data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=TRUE)
- data$date <- as.Date(data$date)   # text -> Date, so the x axis becomes a date scale
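`read.table()` reads the `date` column as text (or as a factor in R versions before 4.0). That is why the column is converted with `as.Date()` before plotting. Below is a minimal sketch of that step on made-up toy data. It assumes ISO `YYYY-MM-DD` strings, which is one of the formats `as.Date()` tries by default:

```r
# Hypothetical toy data shaped like the example file: a text "date" column
# and a numeric "value" column (the numbers are made up).
toy <- data.frame(
  date  = c("2020-01-01", "2020-01-02", "2020-01-03"),
  value = c(100, 105, 98),
  stringsAsFactors = FALSE
)
class(toy$date)    # "character": not yet usable as a date axis

# as.Date() parses ISO "YYYY-MM-DD" strings without an explicit format
toy$date <- as.Date(toy$date)
class(toy$date)    # "Date"
diff(toy$date)     # gaps of 1 day between consecutive rows
```

Once the column has class `Date`, ggplot2 picks a date scale for the x axis automatically.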
Here we take only the last ten observations and draw them as a line with points.
- # Plot
- data %>%
- tail(10) %>%
- ggplot( aes(x=date, y=value)) +
- geom_line(color="#69b3a2") +
- geom_point(color="#69b3a2", size=4) +
- ggtitle("Evolution of Bitcoin price") +
- ylab("bitcoin price ($)") +
- theme_ipsum()
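`tail(10)` returns the last ten rows in file order. Those are the ten most recent observations only if the file is sorted by date. A small safeguard is to sort first. It is sketched here with a hypothetical helper, `last_n_by_date()`, and toy data:

```r
# Hypothetical helper: the n most recent rows, sorted by date first so the
# result does not depend on the row order in the file.
last_n_by_date <- function(df, n) {
  df <- df[order(df$date), , drop = FALSE]
  tail(df, n)
}

# Toy data deliberately stored out of date order
toy <- data.frame(
  date  = as.Date(c("2020-01-03", "2020-01-01", "2020-01-02", "2020-01-04")),
  value = c(3, 1, 2, 4)
)
recent <- last_n_by_date(toy, 2)
recent$value   # 3 4: the values of Jan 3 and Jan 4
```

In a dplyr pipeline, the same effect comes from adding `arrange(date)` before `tail(10)`.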
Here we use the last 60 observations and compare a line chart with a connected scatterplot, placed side by side.
- # Plot
- p1 <- data %>%
- tail(60) %>%
- ggplot( aes(x=date, y=value)) +
- geom_line(color="#69b3a2") +
- ggtitle("Line chart") +
- ylab("bitcoin price ($)") +
- theme_ipsum()
- p2 <- data %>%
- tail(60) %>%
- ggplot( aes(x=date, y=value)) +
- geom_line(color="#69b3a2") +
- geom_point(color="#69b3a2", size=2) +
- ggtitle("Connected scatterplot") +
- ylab("bitcoin price ($)") +
- theme_ipsum()
- p <- p1 + p2   # patchwork: `+` places the two plots side by side
- p
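patchwork's `+` places `p1` and `p2` next to each other. For comparison, here is the same line-versus-connected-scatter layout in base R graphics. This is a swapped-in technique, not what the post uses:

- `par(mfrow = c(1, 2))` splits the device into two panels.
- `type = "l"` draws lines only.
- `type = "o"` draws points on top of the lines.

The function name and the random-walk toy data are hypothetical:

```r
# Hypothetical helper: line chart and connected scatterplot side by side.
compare_line_vs_connected <- function(date, value) {
  old <- par(mfrow = c(1, 2))   # one row, two panels
  on.exit(par(old))             # restore the previous layout afterwards
  plot(date, value, type = "l", col = "#69b3a2",
       main = "Line chart", xlab = "date", ylab = "value")
  plot(date, value, type = "o", pch = 16, col = "#69b3a2",
       main = "Connected scatterplot", xlab = "date", ylab = "value")
  invisible(NULL)
}

# Toy series of 60 daily values (a random walk, not real prices)
set.seed(42)
toy_date  <- seq(as.Date("2020-01-01"), by = "day", length.out = 60)
toy_value <- 100 + cumsum(rnorm(60))
compare_line_vs_connected(toy_date, toy_value)
```

With 60 points, the connected version shows where each observation sits, at the cost of a busier panel.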
Showing a time series as a scatterplot
- # Plot
- data %>%
- tail(60) %>%
- ggplot( aes(x=date, y=value)) +
- geom_point(color="#69b3a2", size=2) +
- ggtitle("Scatterplot") +
- ylab("bitcoin price ($)") +
- theme_ipsum()
Visualizing grouped time series
- library(babynames)
- # Load dataset
- data <- babynames %>%
- filter(name %in% c("Ashley", "Amanda")) %>%
- filter(sex=="F")
- #plot
- data %>%
- ggplot( aes(x=year, y=n, group=name, color=name)) +
- geom_line() +
- scale_color_viridis(discrete = TRUE, name="") +
- ggtitle("Popularity of American names in the previous 30 years") +
- theme_ipsum()   # a complete theme: put any theme() tweaks after it, or they are overridden
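In the ggplot2 code, `group = name` (together with `color = name`) makes `geom_line()` draw one separate line per name. Without it, a single line would zig-zag through every point. The sketch below spells out the same idea in base R on hypothetical toy data: split the rows by group, then draw one line per piece. The two colours are hand-picked, not computed by `scale_color_viridis()`:

```r
# Hypothetical toy data in long format: one row per (year, name)
toy <- data.frame(
  year = rep(1980:1982, times = 2),
  name = rep(c("A", "B"), each = 3),
  n    = c(10, 20, 30, 5, 4, 3)
)

groups <- split(toy, toy$name)   # one data frame per name
cols <- c(A = "#440154", B = "#21908C")

# Empty plot sized to cover all groups, then one line per group
plot(range(toy$year), range(toy$n), type = "n", xlab = "year", ylab = "n")
for (g in names(groups)) {
  d <- groups[[g]][order(groups[[g]]$year), ]   # connect points in year order
  lines(d$year, d$n, col = cols[[g]])
}
legend("topleft", legend = names(groups), col = cols, lty = 1)
```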
Using geom_segment() to highlight the direction of change
- library(grid) # arrow() and unit() (ggplot2 also re-exports both)
- library(ggrepel)
- # data
- tmp <- data %>%
- filter(year>1970) %>%
- select(year, name, n) %>%
- spread(key = name, value = n)   # long -> wide: one column per name
- # label only a random 30% of the years to keep the plot readable
- tmp_date <- tmp %>% sample_frac(0.3)
- tmp%>%
- ggplot(aes(x=Amanda, y=Ashley, label=year)) +
- geom_point(color="#69b3a2") +
- geom_text_repel(data=tmp_date) +
- geom_segment(color="#69b3a2",
- aes(
- # end point = the next row's values (rows are in year order); the last row has none
- xend=c(tail(Amanda, n=-1), NA),
- yend=c(tail(Ashley, n=-1), NA)
- ),
- arrow=arrow(length=unit(0.3,"cm"))
- ) +
- theme_ipsum()
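The key trick in `geom_segment()` is `xend = c(tail(Amanda, n = -1), NA)`. Dropping the first value and padding with `NA` shifts the column up by one row. Every point then gets the next year's point as its end, and the last year gets no arrow; ggplot2 drops that row with a missing-values warning. This only works because the rows are sorted by year. Below is a base R sketch of the same idea on hypothetical toy data, with `arrows()` standing in for `geom_segment()`:

```r
# Hypothetical helper: the value of the next row, NA for the last row.
# Same result as c(tail(x, n = -1), NA) in the ggplot2 code.
next_value <- function(x) c(tail(x, n = -1), NA)

# Toy counts for two names in four consecutive years,
# already sorted by year (the trick depends on that order).
tmp <- data.frame(year = 2001:2004, A = c(5, 7, 6, 9), B = c(2, 3, 5, 4))

seg <- data.frame(
  x = tmp$A, y = tmp$B,
  xend = next_value(tmp$A), yend = next_value(tmp$B)
)
print(seg)   # the last row has NA end points: no arrow leaves the final year

# Base-graphics analogue: arrows() draws from (x0, y0) to (x1, y1)
has_next <- !is.na(seg$xend)
plot(tmp$A, tmp$B, pch = 16, col = "#69b3a2", xlab = "A", ylab = "B")
arrows(seg$x[has_next], seg$y[has_next], seg$xend[has_next], seg$yend[has_next],
       length = 0.1, col = "#69b3a2")
text(tmp$A, tmp$B, labels = tmp$year, pos = 3)
```

Inside a dplyr pipeline, `dplyr::lead(x)` returns the same shifted vector as `next_value(x)`.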
Cutting the Y axis or not
- # reload the Bitcoin data (data now holds the babynames subset)
- data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=TRUE)
- data$date <- as.Date(data$date)
- p1 <- data %>%
- tail(10) %>%
- ggplot( aes(x=date, y=value)) +
- geom_line(color="#69b3a2") +
- geom_point(color="#69b3a2", size=4) +
- ggtitle("Not cutting") +
- ylab("bitcoin price ($)") +
- theme_ipsum() +
- ylim(0,10000)   # axis starts at 0; ylim() drops any points outside the limits
- p2 <- data %>%
- tail(10) %>%
- ggplot( aes(x=date, y=value)) +
- geom_line(color="#69b3a2") +
- geom_point(color="#69b3a2", size=4) +
- ggtitle("Cutting") +
- ylab("bitcoin price ($)") +
- theme_ipsum()
- p1 + p2
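The two panels differ only in where the y axis starts. One rough way to put numbers on the difference is the fraction of the axis span that the data's own range occupies. It is sketched here with a hypothetical helper, `data_share()`, and made-up values. When the axis starts at zero, a small fluctuation fills only a sliver of the panel. When the axis is fitted to the data, the same fluctuation fills almost all of it:

```r
# Hypothetical helper: share of the axis span covered by the data's range.
data_share <- function(values, limits) {
  diff(range(values)) / diff(limits)
}

vals <- c(95, 100, 98, 104, 101)   # made-up prices

data_share(vals, c(0, 120))     # axis forced to start at 0: 0.075
data_share(vals, range(vals))   # axis fitted to the data: 1
```

In ggplot2 the fitted case comes out a little below 1, because continuous scales are padded by 5% on each side by default. A cut axis makes small changes visible, but it can also exaggerate them.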
Reference
https://www.data-to-viz.com/graph/connectedscatter.html