Computing FASTA sequence lengths with awk on Linux
The FASTA sequence file data.fa:
>Gorai.004G111100.1
ATGGGTACTGCTCCAACCCAGTGCCCTTCTGGAATCACTGCAAATTTCCACGCCAAATTTGATAACAGAACTGAGTTTTC
>Gorai.004G111100.2
ATGTTTTTCATGCTCCGGTGGACAAGATACTCTGGGATGCCGGGGAACAGTTTTTCCTTTTCTTGGCAGACATATGCACATAAAATTCTT
>Gorai.004G111100.3
ATGGGTACTGCTCCAACCCAGTGCCCTTCTGGAATCACTGCAAATTTCCAC
>Gorai.004G111100.4
ATGGGAATGCATGAACTAGCAGCCAAAGTTGATGAGT
First, collapse each FASTA record onto a single line, with the header and the sequence separated by "%". The command is:
awk '/^>/&&NR>1{print "";}{ printf "%s",/^>/ ? $0"%":$0 }' data.fa >data2.fa
Result:
>Gorai.004G111100.1%ATGGGTACTGCTCCAACCCAGTGCCCTTCTGGAATCACTGCAAATTTCCACGCCAAATTTGATAACAGAACTGAGTTTTC
>Gorai.004G111100.2%ATGTTTTTCATGCTCCGGTGGACAAGATACTCTGGGATGCCGGGGAACAGTTTTTCCTTTTCTTGGCAGACATATGCACATAAAATTCTT
>Gorai.004G111100.3%ATGGGTACTGCTCCAACCCAGTGCCCTTCTGGAATCACTGCAAATTTCCAC
>Gorai.004G111100.4%ATGGGAATGCATGAACTAGCAGCCAAAGTTGATGAGT
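The one-liner above is fairly dense, so here is an expanded sketch of the same logic with one rule per step. The temporary input file and the records seq1 and seq2 are made up for illustration. The END rule, which terminates the last record with a newline, is an addition that the original command does not have.

```shell
# Hypothetical demo input: seq1 is wrapped over two lines, seq2 is on one line.
demo=$(mktemp)
printf '>seq1\nATGG\nGTAC\n>seq2\nATG\n' > "$demo"

linearized=$(awk '
    /^>/ && NR > 1 { print "" }                 # end the previous record before each new header
    /^>/           { printf "%s%%", $0; next }  # print the header followed by the "%" separator
                   { printf "%s", $0 }          # append sequence lines without a newline
    END            { print "" }                 # added here: newline after the last record
' "$demo")

printf '%s\n' "$linearized"
rm -f "$demo"
```

Wrapped sequence lines are joined because printf, unlike print, does not append a newline on its own.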
Then compute the lengths:
awk -F"%" '{print $1"\t"length($2)}' data2.fa >data3.fa
Result:
>Gorai.004G111100.1	80
>Gorai.004G111100.2	90
>Gorai.004G111100.3	51
>Gorai.004G111100.4	37
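If the intermediate file is not needed, the two steps can probably be folded into a single awk pass that keeps a running length per record. This is a sketch rather than the method used in the post. The demo file and the variable names name and len are hypothetical, and it assumes Unix line endings (a trailing carriage return would be counted as one extra character per line).

```shell
# Hypothetical demo input: seq1 is wrapped over two lines, seq2 is on one line.
demo=$(mktemp)
printf '>seq1\nATGG\nGTAC\n>seq2\nATG\n' > "$demo"

lengths=$(awk '
    /^>/ {                                       # a new header starts a new record
        if (name != "") print name "\t" len      # report the record that just ended
        name = $0; len = 0; next
    }
    { len += length($0) }                        # add up every sequence line of the record
    END { if (name != "") print name "\t" len }  # report the last record
' "$demo")

printf '%s\n' "$lengths"
rm -f "$demo"
```

Because the length is accumulated line by line, this also works for sequences wrapped over several lines, and no separator character such as "%" has to be chosen.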
1F
Original post: http://tiramisutes.github.io/2015/11/08/fa-length.html
2F
Nice, haha.