Hello guys, first of all I am really sorry for any inconvenience; I am very new around here.
I have a huge text file that looks like this:
I would like to compare Seq1 with the rest of the sequences. Where the letter matches, nothing changes, but where it doesn't, a "-" is put in its place, so
the output is A--D--G, since only A, D and G match. I have no idea how to do it. I am not familiar with programming, so I need suggestions on how I can do it.
In other words, you have a table (aka 2D array) and you want to highlight the conserved columns. A column is conserved if the same character is on every line.
The real question is: Why? You say that you are not familiar with programming. Why are you trying that approach? Is it that you have to program, or do you actually want that output, no matter how?
@keskiverto
I am working on my thesis. There is this data, and I actually have to do this by hand. Since my data is huge, it is going to take too much time, so I thought programming would make it easier for me. A friend told me it would be simple coding, but unfortunately he doesn't know how to do it. I was advised to do it in C++, so here I am, trying to figure it out.
Hi minomic, I really appreciate your help, and thank you for your concern. No, I couldn't solve the problem; I ran the code you gave but I am getting an error. By the way, where should I get the output? In the same file? I am still in the process of understanding your code. Thank you so much again.
Try it on lines of different size.
It would be trivial to fix existing code, but as you need only one line of output I decided to write another variant.
Note that it operates on these assumptions:
1) Your file does not actually contain the words "Sequence1:", etc. It looks as minomic showed. If it does contain them, you will need to skip those parts first.
2) Your sequences are separated by a correct newline symbol and do not have any excess trailing spaces.
3) It has very primitive handling of strings of different length (it simply does not process characters past the length of the shortest string)
4) Reads from file "input.txt" and outputs to screen
If this is okay, here it is:
The screenshot looks like "aligned" sequences due to the "gap" characters. What is the format of it? Fasta? Interstitial collagenases?
I am working on my thesis, there is this data and I have to do this by hand actually.
Use your brain, not your hand. You definitely should, as part of that learning process, find out the available tools and learn their merits and pitfalls. Your thesis supervisor should have introduced you to at least something at the start.
Handling sequences is a "basic wheel" in bioinformatics and thus no thesis should reinvent it. Besides, it seems clear that you simply need to analyze data rather than to develop new algorithms. If it were the latter, you would have a programming background.
Yes, I found that out just 30 minutes ago. Anyway, since I am already involved with it, I downloaded C++ and started to learn. Yes, I just need to analyze data rather than develop an algorithm. But I don't see how this harms you. I am trying to learn something here. Thanks for the advice though.
Yes, my file actually contains words (organism names), so I need to skip those parts, and yes, some strings are longer. Now I am going to take a look at your code and try to understand it.
You can extract the lines and keep them in an array (i.e. a vector),
then string s = array[0]; so s is initialized as the first string.
Now loop through array[1] to array[last]:
- wherever the char array[i][j] is not the same as s[j], replace s[j] with '-'
s is the answer.
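The steps above can be sketched in C++ roughly as follows. This is only a minimal sketch, not the code posted earlier in the thread: the function name `conservedMask` is made up for illustration, the sequences are assumed to be already loaded into a vector, and anything past the length of the shortest sequence is simply dropped.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the first sequence with every position
// that differs in at least one other sequence replaced by '-'.
// Assumption: positions past the shortest sequence are not processed.
std::string conservedMask(const std::vector<std::string>& seqs)
{
    if (seqs.empty()) return "";

    // Only compare up to the length of the shortest sequence.
    std::size_t len = seqs[0].size();
    for (const std::string& s : seqs)
        len = std::min(len, s.size());

    std::string result = seqs[0].substr(0, len);
    for (std::size_t i = 1; i < seqs.size(); ++i)
        for (std::size_t j = 0; j < len; ++j)
            if (seqs[i][j] != seqs[0][j])
                result[j] = '-';   // column j is not conserved
    return result;
}
```

For example, calling it with the made-up sequences "ABCDEFG", "AXXDXXG" and "AYYDYYG" gives "A--D--G".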
EDIT: here, the main part is extracting the lines; it depends on how the data is formatted.
It's an easy format:
keep the names on odd lines (1, 3, 5, ...)
and the "ABCDEFG"s on even lines (2, 4, 6, ...)
Judging by the lines shown in the screenshot, the format probably has entries, where one entry contains two fields: title and sequence.
The title is text on one line that starts with >.
The sequence can be divided into multiple lines. One knows that a sequence field ends when there is EOF, or the > of the next entry.
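Reading entries of that shape could look something like the sketch below. It assumes the file really is laid out as described (a title line starting with >, then one or more sequence lines). The names `Entry`, `readEntries` and `readEntriesFromText` are invented here for illustration, and the handling of Windows line endings is an extra assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one entry: title line plus joined sequence.
struct Entry {
    std::string title;     // text after the leading '>'
    std::string sequence;  // all sequence lines of the entry, concatenated
};

// Reads entries until EOF. A line starting with '>' begins a new entry;
// every other non-empty line is appended to the current entry's sequence.
std::vector<Entry> readEntries(std::istream& in)
{
    std::vector<Entry> entries;
    std::string line;
    while (std::getline(in, line)) {
        // Assumption: tolerate a trailing '\r' from Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;

        if (line[0] == '>') {
            entries.push_back({line.substr(1), ""});
        } else if (!entries.empty()) {
            entries.back().sequence += line;
        }
        // Sequence lines before the first '>' are ignored.
    }
    return entries;
}

// Convenience wrapper for trying the parser on an in-memory string.
std::vector<Entry> readEntriesFromText(const std::string& text)
{
    std::istringstream in(text);
    return readEntries(in);
}
```

To read from a file, pass a `std::ifstream` opened on it to `readEntries`; the `sequence` fields can then be compared column by column.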
MiiNiPaa's code reads one line of text from file at lines 16 and 18. Those should read one entry each, if the format is as the screenshot hints.
Both programs shown so far write output to std::cout (screen). You can redirect the output to a file.
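Redirecting on the command line (for example `program > output.txt`) needs no change to the code. If you would rather write the file from inside the program, one possible sketch is below. The function names and the idea of passing the stream as a parameter are illustrative choices, not something either posted program does.

```cpp
#include <fstream>
#include <ostream>
#include <string>

// Writes the result line to whatever stream it is given
// (std::cout for the screen, or an open file stream).
void writeResult(std::ostream& out, const std::string& result)
{
    out << result << '\n';
}

// Hypothetical helper: writes the result to the named file.
// Returns false if the file could not be opened or written.
bool writeResultToFile(const std::string& path, const std::string& result)
{
    std::ofstream file(path);
    if (!file) return false;
    writeResult(file, result);
    file.flush();
    return file.good();
}
```

With `<iostream>` included, `writeResult(std::cout, result)` still prints to the screen, so the same function serves both cases.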