Perl - Searching a FASTA file for a motif and returning the title line for each sequence containing the motif
The code below is meant to search a FASTA file, given on the command line, for a user-provided motif. When I run it and enter a motif that I know is in the file, it returns 'Motif not found'. I'm a beginner in Perl, and I can't figure out how to get it to print that the motif was found, let alone return the title line. I would appreciate any help in resolving this.
Thanks.
    use warnings;
    use strict;

    my $motif;
    my $filename;
    my @seq;
    #my $motif_found;
    my $scalar;

    $filename = $ARGV[0];
    open (DNAFILE,$filename) || die "Cannot open file\n";
    @seq = split(/[>]/, $filename);
    print "Enter motif to search for; ";
    $motif = <STDIN>;
    chomp $motif;
    foreach $scalar(@seq) {
        if ($scalar =~ m/$motif/ig) {
            print "Motif found in following sequences\n";
            print $scalar;
        } else {
            print "Motif not found\n";
        }
    }
    close DNAFILE;
There is no point in "rolling your own" FASTA parser. BioPerl has spent years developing one, and it would be silly not to use it.
    use strict;
    use Bio::SeqIO;

    my $usage = "perl dnamotif.pl <fasta file> <motif>";
    my $fasta_filename = shift(@ARGV) or die("Usage: $usage $!");
    my $motif = shift(@ARGV) or die("Usage: $usage $!");

    # Let BioPerl handle the FASTA parsing, one sequence record at a time
    my $fasta_parser = Bio::SeqIO->new(-file => $fasta_filename, -format => 'fasta');
    while(my $seq_obj = $fasta_parser->next_seq())
    {
      printf("Searching sequence '%s'...", $seq_obj->id);
      # index() returns the 0-based offset of the first match, or -1 if there is none
      if((my $pos = index($seq_obj->seq(), $motif)) != -1)
      {
        printf("motif found at position %d!\n", $pos + 1);
      }
      else
      {
        printf("motif not found.\n");
      }
    }
This program finds the (1-based) position of the first motif match in each sequence. It can be edited to find the position of each match. It also may not print things in the format you want/need. I'll leave these issues as "an exercise for the reader." :)
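As a starting point for that exercise, here is a minimal sketch of one way to collect every match rather than only the first. It relies on the optional third argument of Perl's built-in index(), which sets the offset to start searching from. The helper name all_motif_positions and the example sequence are made up for illustration and are not part of BioPerl; in the program above you would pass it $seq_obj->seq() and $motif. Like index() itself, it is case-sensitive, so apply uc() to both arguments if you want a case-insensitive search.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns the 1-based start
# position of every occurrence of $motif in $seq, including overlapping ones.
sub all_motif_positions {
    my ($seq, $motif) = @_;
    my @positions;
    return @positions if $motif eq '';    # an empty motif would loop forever

    my $from = 0;
    # index() returns -1 once there is no further match at or after $from
    while ((my $pos = index($seq, $motif, $from)) != -1) {
        push @positions, $pos + 1;    # convert 0-based offset to 1-based position
        $from = $pos + 1;             # advance one base so overlapping hits are found
    }
    return @positions;
}

# Made-up sequence, for illustration only
my @hits = all_motif_positions('ATGATGATGCC', 'ATGATG');
print "motif found at positions: @hits\n";    # prints "motif found at positions: 1 4"
```

Stepping forward by a single base after each hit is a deliberate choice so that overlapping occurrences are reported; if you only want non-overlapping matches, advance by length($motif) instead.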
If you need to download BioPerl, try this link. Let me know if you have any issues.
For bioinformatics questions like this, I've found the BioStar forum helpful.