Hi all,
I'm currently working on a program (a bioinformatics project) that reads multiple files, including a matrix, and writes the results to another file. What I'm having the most trouble with is how to read the matrix file like a coordinate system (for lack of a better term), so that I can look up a value by its row letter and column letter. For example, suppose I have the following amino acid sequences:
fileA: CTTNCLAPLA
fileB: CTTNSITPVA
The program would then read the two files, compare the letters position by position, and look up the number in the matrix that corresponds to each pair of letters. That number in turn determines the probability of a letter in fileA mutating to the letter in fileB.
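One way to picture the "coordinate system" is a nested dictionary, so that matrix[row][col] returns the score for a pair of letters. The sketch below assumes the matrix has already been loaded that way, that the fileA letter selects the row and the fileB letter selects the column, and that both sequences have the same length. The names compare_sequences and demo_matrix are hypothetical, and demo_matrix holds only a few entries copied by hand from the matrix in this post.

```python
def compare_sequences(seq_a, seq_b, matrix):
    """Pair the letters of two equal-length sequences position by position
    and look up each pair's score as matrix[row_letter][column_letter]."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    results = []
    for position, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        # Assumption: the letter from fileA picks the row, the letter from fileB the column.
        score = matrix[a][b]
        results.append((position, a, b, score))
    return results


# Hypothetical excerpt: rows C, N, L and columns C, N, I copied from the matrix in this post.
demo_matrix = {
    "C": {"C": 12, "N": -4, "I": -2},
    "N": {"C": -4, "N": 2, "I": -2},
    "L": {"C": -6, "N": -3, "I": 2},
}

for position, a, b, score in compare_sequences("CNL", "CNI", demo_matrix):
    print(position, a, b, score)
```

This prints one line per position (1 C C 12, then 2 N N 2, then 3 L I 2). With the full matrix loaded from the CSV file, the same function would work on the real fileA and fileB sequences.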
Since the first letter in each file is C, the program would look up the matrix and write the following to a separate file:
C
.
S
The "." means that the matrix score for the pair was 0 but the two letters were not the same.
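The rule for "." can be written as a small function. The post only defines two cases (identical letters, and different letters whose score is 0), so this sketch, with the hypothetical name output_symbol, returns the letter itself for a match, "." for a zero-score mismatch, and a placeholder "?" for every other case, since that rule is not stated here.

```python
def output_symbol(a, b, score):
    """Map one aligned pair of letters and its matrix score to an output symbol."""
    if a == b:
        return a    # same letter in both files: write the letter itself
    if score == 0:
        return "."  # different letters with a matrix score of 0
    return "?"      # placeholder: the rule for other scores is not defined yet


print(output_symbol("C", "C", 12))  # C
print(output_symbol("C", "S", 0))   # .
print(output_symbol("L", "I", 2))   # ?
```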
Here is part of the matrix (the rest wouldn't fit):
NOTE: The matrix I must use is in a .csv file and, I believe, does not include spaces.
_ A R N D C Q E G H I L K
A 2 -2 0 0 -2 0 0 1 -1 -1 -2 -1
R -2 6 0 -1 -4 1 -1 -3 2 -2 -3 3
N 0 0 2 2 -4 1 1 0 2 -2 -3 1
D 0 -1 2 4 -5 2 3 1 1 -2 -4 0
C -2 -4 -4 -5 12 -5 -5 -3 -3 -2 -6 -5
Q 0 1 1 2 -5 4 2 -1 3 -2 -2 1
E 0 -1 1 3 -5 2 4 0 1 -2 -3 0
G 1 -3 0 1 -3 -1 0 5 -2 -3 -4 -2
H -1 2 2 1 -3 3 1 -2 6 -2 -2 0
I -1 -2 -2 -2 -2 -2 -2 -3 -2 5 2 -2
L -2 -3 -3 -4 -6 -2 -3 -4 -2 2 6 -3
K -1 3 1 0 -5 1 0 -2 0 -2 -3 5
M -1 0 -2 -3 -5 -1 -2 -3 -2 2 4 0
F -3 -4 -3 -6 -4 -5 -5 -5 -2 1 2 -5
P 1 0 0 -1 -3 0 -1 0 0 -2 -3 -1
S 1 0 1 0 0 -1 0 1 -1 -1 -3 0
T 1 -1 0 0 -2 -1 0 0 -1 0 -2 0
W -6 2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3
Y -3 -4 -2 -4 0 -4 -4 -5 0 -1 -1 -4
V 0 -2 -2 -2 -2 -2 -2 -1 -2 4 2 -2
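Because the matrix is a CSV file with the column letters in the first row and the row letters in the first column, Python's standard csv module can load it into a nested dictionary for lookups. This is a sketch assuming exactly that layout: a corner cell (shown as "_" above) followed by the letters, and an integer score in every other cell. load_matrix is a hypothetical name, and the .strip() calls are only there in case the cells do contain stray spaces.

```python
import csv
import io


def load_matrix(lines):
    """Read a scoring matrix from CSV lines into matrix[row_letter][col_letter] -> int."""
    reader = csv.reader(lines)
    header = next(reader)
    col_letters = [cell.strip() for cell in header[1:]]  # skip the corner cell
    matrix = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        row_letter = row[0].strip()
        scores = [int(cell) for cell in row[1:]]  # int() tolerates surrounding spaces
        matrix[row_letter] = dict(zip(col_letters, scores))
    return matrix


# Demo on a small in-memory excerpt of the matrix above.
sample = io.StringIO(
    "_,A,R,N\n"
    "A,2,-2,0\n"
    "R,-2,6,0\n"
    "N,0,0,2\n"
)
matrix = load_matrix(sample)
print(matrix["R"]["A"])  # -2
print(matrix["N"]["N"])  # 2
```

For the real file, something like `with open("matrix.csv", newline="") as f: matrix = load_matrix(f)` should work, where "matrix.csv" stands in for the actual filename.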
I apologize if my explanation is confusing. Please let me know if you need any clarification. Any help is greatly appreciated. Thanks in advance!