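Most of us are familiar with SAM/BAM files, which store sequencing reads after they have been aligned to a reference genome. BAM and BED files mainly track which regions of the reference genome the reads aligned to. The formats defined by UCSC (wig, bigWig, and bedGraph) serve a different purpose: they record only the coverage, that is, the sequencing depth, across regions of the reference genome. Files in these formats can be loaded directly into the UCSC Genome Browser for visualization.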
- Wiggle Track Format (WIG): http://genome.ucsc.edu/goldenPath/help/wiggle.html
- bigWig Track Format: http://genome.ucsc.edu/goldenPath/help/bigWig.html
- BedGraph Track Format: http://genome.ucsc.edu/goldenPath/help/bedgraph.html
- bigWigToBedGraph — this program converts a bigWig file to ASCII bedGraph format.
- bigWigToWig — this program converts a bigWig file to wig format.
- bigWigSummary — this program extracts summary information from a bigWig file.
- bigWigAverageOverBed — this program computes the average score of a bigWig over each bed, which may have introns.
- bigWigInfo — this program prints out information about a bigWig file.
samtools depth -r chr12:126073855-126073965 Ip.sorted.bam
chr12 126073855 5
chr12 126073856 15
chr12 126073857 31
chr12 126073858 40
chr12 126073859 44
chr12 126073860 52
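The three columns printed by `samtools depth` are chromosome, 1-based position, and depth. As a minimal sketch of how such output could be summarized, the Python below parses the text and reports the number of positions, the mean depth, and the maximum depth. The function name `summarize_depth` is hypothetical, and the sketch assumes the output has already been captured as a string.

```python
def summarize_depth(depth_text):
    """Summarize `samtools depth` output (chrom, 1-based position, depth)."""
    depths = []
    for line in depth_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    if not depths:
        return {"positions": 0, "mean": 0.0, "max": 0}
    return {
        "positions": len(depths),
        "mean": sum(depths) / len(depths),
        "max": max(depths),
    }


# The six rows shown in the samtools example.
example = "\n".join([
    "chr12 126073855 5",
    "chr12 126073856 15",
    "chr12 126073857 31",
    "chr12 126073858 40",
    "chr12 126073859 44",
    "chr12 126073860 52",
])

if __name__ == "__main__":
    print(summarize_depth(example))
```

Note that `samtools depth` omits zero-coverage positions unless `-a` is given, so a mean computed this way covers only the reported positions.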
First, set the attributes that control how the wig file is displayed in the UCSC Genome Browser:
track type=wiggle_0 name=track_label description=center_label visibility=display_mode color=r,g,b altColor=r,g,b priority=priority autoScale=on|off alwaysZero=on|off gridDefault=on|off maxHeightPixels=max:default:min graphType=bar|points viewLimits=lower:upper yLineMark=real-value yLineOnOff=on|off windowingFunction=mean+whiskers|maximum|mean|minimum smoothingWindow=off|2-16
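type=wiggle_0 is required, and so far it is the only accepted value. All other parameters are optional; see the official documentation for details.
You can generally ignore these parameters unless you are already very familiar with the UCSC Genome Browser.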
track type=wiggle_0 name=hek description=hek
variableStep chrom=chr1 span=10
10008 7
10018 14
10028 27
10038 37
10048 45
10058 43
10068 37
10078 26
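To make the meaning of `span` in the example above concrete, here is a minimal sketch that expands variableStep records into explicit intervals. Each data line gives a 1-based start position, and the value applies to `span` consecutive bases starting there. The function name `wig_to_intervals` is hypothetical, the output uses 0-based, half-open coordinates (the bedGraph convention), and fixedStep sections are deliberately not handled.

```python
def wig_to_intervals(wig_text):
    """Expand variableStep wig records into (chrom, start0, end0, value) tuples.

    Wig positions are 1-based; the returned intervals are 0-based, half-open.
    """
    intervals = []
    chrom, span = None, 1
    for raw in wig_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # display settings and comments carry no data
        if line.startswith("fixedStep"):
            raise ValueError("fixedStep sections are not handled in this sketch")
        if line.startswith("variableStep"):
            params = dict(field.split("=", 1) for field in line.split()[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))  # span defaults to 1
            continue
        start, value = line.split()
        start0 = int(start) - 1  # convert 1-based to 0-based
        intervals.append((chrom, start0, start0 + span, float(value)))
    return intervals


wig_example = """track type=wiggle_0 name=hek description=hek
variableStep chrom=chr1 span=10
10008 7
10018 14
"""

if __name__ == "__main__":
    for row in wig_to_intervals(wig_example):
        print(*row, sep="\t")
```

With `span=10`, the record `10008 7` covers bases 10008 through 10017 in 1-based terms, which becomes the interval 10007 to 10017 in 0-based, half-open coordinates.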
Save this wiggle file to your machine (this satisfies steps 1 and 2 above).
Save this text file to your machine. It contains the chrom.sizes for the human (hg19) assembly (this satisfies step 4 above).
Download the wigToBigWig utility (see step 3).
Run the utility to create the bigWig output file (see step 5):
wigToBigWig wigVarStepExample.gz hg19.chrom.sizes myBigWig.bw
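Finally, the bedGraph format. It is an extension of BED: a four-column BED file (chromosome, start, end, value) preceded by a track line that sets the display attributes for the UCSC Genome Browser. In practice, you usually need to define only a few of these attributes.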
track type=bedGraph name=track_label description=center_label visibility=display_mode color=r,g,b altColor=r,g,b priority=priority autoScale=on|off alwaysZero=on|off gridDefault=on|off maxHeightPixels=max:default:min graphType=bar|points viewLimits=lower:upper yLineMark=real-value yLineOnOff=on|off windowingFunction=maximum|mean|minimum smoothingWindow=off|2-16
- These coordinates are zero-based, half-open.
- Chromosome positions are specified as 0-relative.
- The first chromosome position is 0.
- The last position in a chromosome of length N would be N – 1.
- Only positions specified have data.
- Positions not specified do not have data and will not be graphed.
- All positions specified in the input data must be in numerical order.
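The coordinate rules above can be checked mechanically. The following is a rough sketch of such a check, not an official validator: the function name `check_bedgraph` and the optional `chrom_sizes` dictionary (chromosome name to length) are assumptions made for illustration. It flags rows that do not have four columns, rows with an empty or negative interval, rows that extend past the chromosome end, and rows that are out of order or overlap an earlier row on the same chromosome.

```python
def check_bedgraph(lines, chrom_sizes=None):
    """Return a list of human-readable problems found in bedGraph lines."""
    problems = []
    last_end = {}  # furthest end coordinate seen so far, per chromosome
    for number, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) != 4:
            problems.append(f"line {number}: expected 4 columns, got {len(fields)}")
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            problems.append(f"line {number}: invalid interval {start}-{end}")
        # Half-open coordinates: the last base is N - 1, so end may equal N.
        if chrom_sizes and chrom in chrom_sizes and end > chrom_sizes[chrom]:
            problems.append(f"line {number}: end {end} is past the end of {chrom}")
        if start < last_end.get(chrom, 0):
            problems.append(f"line {number}: out of order or overlapping on {chrom}")
        last_end[chrom] = max(last_end.get(chrom, 0), end)
    return problems


if __name__ == "__main__":
    rows = ["chr1 9997 9999 1", "chr1 9999 10000 2", "chr1 9990 9995 3"]
    for problem in check_bedgraph(rows):
        print(problem)
```

Because the intervals are half-open, a row whose end equals the previous row's start (as in the first two rows here) is adjacent, not overlapping.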
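Here is a bedGraph file that MACS produced as a by-product of calling peaks on ChIP-seq data. A file like this can also be generated directly from a BAM file with other tools: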
track type=bedGraph name="hek_treat_all" description="Extended tag pileup from MACS version 1.4.2 20120305"
chr1 9997 9999 1
chr1 9999 10000 2
chr1 10000 10001 4
chr1 10001 10003 5
chr1 10003 10007 6
chr1 10007 10010 7
chr1 10010 10012 8
chr1 10012 10015 9
chr1 10015 10016 10
chr1 10016 10017 11
chr1 10017 10018 12
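A bedGraph file is essentially a run-length encoding of per-base values: consecutive bases with the same value collapse into one interval. As a rough illustration of how per-base depth (for example, parsed `samtools depth` rows) could be turned into bedGraph intervals, consider the sketch below. The function name `depth_to_bedgraph` and the input tuples are hypothetical, and this is not how MACS computes its extended tag pileup; it only shows the interval-merging idea and the shift from 1-based positions to 0-based, half-open coordinates.

```python
def depth_to_bedgraph(depth_rows):
    """Merge sorted (chrom, 1-based position, depth) rows into bedGraph intervals.

    Returns (chrom, start0, end0, depth) tuples in 0-based, half-open coordinates.
    """
    intervals = []
    for chrom, pos, depth in depth_rows:
        start = pos - 1  # 1-based position to 0-based start
        if intervals:
            prev_chrom, prev_start, prev_end, prev_depth = intervals[-1]
            # Extend the previous interval when this base is adjacent
            # and carries the same depth.
            if prev_chrom == chrom and prev_end == start and prev_depth == depth:
                intervals[-1] = (prev_chrom, prev_start, prev_end + 1, prev_depth)
                continue
        intervals.append((chrom, start, start + 1, depth))
    return intervals


if __name__ == "__main__":
    # Hypothetical per-base rows chosen for illustration.
    rows = [("chr1", 9998, 1), ("chr1", 9999, 1), ("chr1", 10000, 2), ("chr1", 10001, 4)]
    for chrom, start, end, depth in depth_to_bedgraph(rows):
        print(chrom, start, end, depth)
```

Positions missing from the input simply produce a gap between intervals, which matches the bedGraph rule that unspecified positions have no data.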