Quickly convert FASTA sequences to one line per record with awk and sed

2011/03/27

Sometimes it is convenient to put each FASTA sequence on a single line, as in the example below (input first, then the desired output), because line-oriented tools can then handle one record per line.
Perl or another scripting language can do this, but here are two simple and fast ways to do it with awk and sed.

[code lang="text"]
> sq1
foofoofoobar
foofoofoo
> sq2
quxquxquxbar
quxquxquxbar
quxx
> sq3
paxpaxpax
pax
[/code]

[code lang="text"]
> sq1 foofoofoobarfoofoofoo
> sq2 quxquxquxbarquxquxquxbarquxx
> sq3 paxpaxpaxpax
[/code]

For awk:
[code lang="bash"]
awk '/^>/&&NR>1{print ""}{printf "%s", (/^>/ ? $0" " : $0)}END{print ""}' YourFile
[/code]
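The one-liner is dense, so here is the same idea spelled out as a commented sketch. The function name fasta_to_one_line is a hypothetical helper chosen for illustration; it reads the files given as arguments, or standard input when there are none.

```shell
# fasta_to_one_line [FILE...]  (hypothetical helper name)
# Prints each FASTA record as: header, one space, whole sequence.
fasta_to_one_line() {
    awk '
        /^>/ {                        # header line
            if (NR > 1) print ""      # end the previous record
            printf "%s ", $0          # header plus a separating space
            next
        }
        { printf "%s", $0 }           # sequence line: append, no newline
        END { if (NR > 0) print "" }  # end the last record
    ' "$@"
}

# Demo on the first two sample records from this post.
printf '%s\n' '> sq1' 'foofoofoobar' 'foofoofoo' '> sq2' 'quxquxquxbar' 'quxquxquxbar' 'quxx' |
    fasta_to_one_line
```

The END rule adds the final newline, so the last record ends like every other line and later line-based tools see it as complete.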

For sed:
[code lang="bash"]
sed -n '1{x;d;x};${H;x;s/\n/ /1;s/\n//g;p;b};/^>/{x;s/\n/ /1;s/\n//g;p;b};H' YourFile
[/code]
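To see what the sed command does, it can be laid out one command per line with comments. This is only a sketch of the same logic: the wrapper name fasta_to_one_line_sed is hypothetical, and the unreachable second x after d in the first-line block is left out.

```shell
# fasta_to_one_line_sed [FILE...]  (hypothetical helper name)
fasta_to_one_line_sed() {
    sed -n '
# First line: store the header in the hold space and start the next cycle.
1{
x
d
}
# Last line: append it, fetch the whole record, join it and print it.
${
H
x
s/\n/ /
s/\n//g
p
b
}
# New header: swap it with the finished record, then join and print that record.
/^>/{
x
s/\n/ /
s/\n//g
p
b
}
# Any other line is sequence: append it to the record in the hold space.
H
' "$@"
}

# Demo on the third sample record from this post.
printf '%s\n' '> sq3' 'paxpaxpax' 'pax' | fasta_to_one_line_sed
```

The hold space always contains the record being built. The first substitution turns the newline after the header into a space, and the second removes the newlines between sequence lines.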

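Today I wanted to extract the contigs longer than 500 bp from my assembly result, so I piped the one-line output into awk. The sequence is always the last field of each joined line, so $NF selects it however many words the header contains (in my assembly output it was field 5):
[code lang="bash"]
sed -n '1{x;d;x};${H;x;s/\n/ /1;s/\n//g;p;b};/^>/{x;s/\n/ /1;s/\n//g;p;b};H' YourFile | awk '{if (length($NF)>500) print ">contig-"FNR"\n"$NF}'
[/code]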
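The same filter can also be written as a single awk pass that never builds the one-line form. This is a minimal sketch, assuming every record starts with a header line beginning with ">". The names filter_contigs and emit are hypothetical. As with FNR above, the number in the new name is the record's position in the input.

```shell
# filter_contigs MIN [FILE...]  (hypothetical helper name)
# Prints the records whose sequence is longer than MIN as ">contig-N" plus the sequence.
filter_contigs() {
    min=$1
    shift
    awk -v min="$min" '
        # Print the record collected so far if it is long enough, then reset.
        function emit() {
            if (length(seq) > min) {
                print ">contig-" idx
                print seq
            }
            seq = ""
        }
        /^>/ { emit(); idx++; next }  # header: finish the previous record
        { seq = seq $0 }              # sequence line: extend the current record
        END { emit() }                # finish the last record
    ' "$@"
}

# Demo with a small threshold; for real contigs use: filter_contigs 500 YourFile
printf '%s\n' '>a' 'ACGT' 'ACGT' '>b' 'AC' '>c' 'ACGTA' | filter_contigs 4
```

Passing the threshold with -v keeps the awk program itself unchanged when a different cutoff is needed.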
