I’m adding bacterial sugars to gmml and have hit the limit of the current glycam residue nomenclature:
https://glycam.org/docs/forcefield/glycam-naming-2/index.html e.g.:
No linkage code found for possible combo: 8,7,3,4,5 in residue 0KX
Xiao maxed out the alphabet with these additions:
We knew this day was coming. The plan is to start using arbitrary three-character alphanumeric codes. With 3 positions and 62 characters (0-9, A-Z, a-z) we get 62^3 = 238,328 possible codes. So far we’ve used up 2k for existing GLYCAM residues, and we also can’t overlap with names already in use, like the protein names ASN, ALA, etc.
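A rough sketch of what that code space looks like, in Python just to illustrate the idea (gmml itself would need its own version). The character ordering and the `RESERVED` set are assumptions: the set only holds the two protein names mentioned above, and a real one would need every name already in use, including the existing GLYCAM residues.

```python
import itertools
import string

# Assumed ordering: digits, then uppercase, then lowercase (62 symbols).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Illustrative subset only; a real list would cover every name already taken.
RESERVED = {"ASN", "ALA"}


def candidate_codes(reserved=RESERVED):
    """Yield three-character codes in ALPHABET order, skipping reserved names."""
    for chars in itertools.product(ALPHABET, repeat=3):
        code = "".join(chars)
        if code not in reserved:
            yield code


# Size of the full code space before removing reserved names: 62**3.
total = len(ALPHABET) ** 3
```

Generating codes in a fixed order rather than truly at random keeps the assignment reproducible, which seems to match the 000, 001, ... scheme mentioned further down.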
So essentially I need to create a table for looking these up as I won’t be able to use the existing logic that checks for e.g. A vs B.
My doubt is about a case like this:
7,6,2,3,4 in residue 0Lh
We would essentially keep going with the old system (0Lh, 3Lh, etc.), and then when we hit one that has a combination like 7,6,2,3,4 we jump to using 000, 001, etc.
I need something that handles the case where a combination isn’t in the lookup table yet: it needs to be assigned a code. So if I look up a combination like 7,6,2,3,4,1 and it doesn’t exist, and the last entry is 005, then 006 gets inserted for that combination.
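A minimal sketch of that look-up-or-assign behaviour, again in Python for illustration only. `ResidueCodeTable`, `encode`, and the digits/uppercase/lowercase ordering are all hypothetical names and assumptions, not existing gmml code. The key here is just the tuple of linked positions; a real table would probably also need the sugar in the key, and would have to be saved somewhere so codes stay stable between runs.

```python
import string

# Assumed ordering: digits, then uppercase, then lowercase (62 symbols).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def encode(index, width=3):
    """Convert a non-negative integer to a fixed-width base-62 code."""
    base = len(ALPHABET)
    if not 0 <= index < base ** width:
        raise ValueError("code space exhausted")
    chars = []
    for _ in range(width):
        index, digit = divmod(index, base)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


class ResidueCodeTable:
    """Maps a linkage combination to a code, assigning the next free one on a miss."""

    def __init__(self, reserved=()):
        self._codes = {}                 # combination tuple -> assigned code
        self._reserved = set(reserved)   # names that must never be handed out
        self._next_index = 0             # next integer to try encoding

    def lookup(self, combo):
        key = tuple(combo)
        if key not in self._codes:
            self._codes[key] = self._allocate()
        return self._codes[key]

    def _allocate(self):
        # Skip over any code that collides with a reserved name.
        while True:
            code = encode(self._next_index)
            self._next_index += 1
            if code not in self._reserved:
                return code


table = ResidueCodeTable(reserved={"ASN", "ALA"})
first = table.lookup((7, 6, 2, 3, 4))       # not present, so it gets assigned
second = table.lookup((7, 6, 2, 3, 4, 1))   # not present, gets the next code
again = table.lookup((7, 6, 2, 3, 4))       # already present, same code back
```

With an empty table the first two combinations get 000 and 001, and repeating a lookup returns the code already assigned rather than consuming a new one.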