433. Minimum Genetic Mutation

A gene string can be represented by an 8-character long string, with choices from ‘A’, ‘C’, ‘G’, and ‘T’.

Suppose we need to investigate a mutation from a gene string startGene to a gene string endGene where one mutation is defined as one single character changed in the gene string.

  • For example, “AACCGGTT” --> “AACCGGTA” is one mutation.

There is also a gene bank bank that records all the valid gene mutations. A gene must be in bank to make it a valid gene string.

Given the two gene strings startGene and endGene and the gene bank bank, return the minimum number of mutations needed to mutate from startGene to endGene. If there is no such a mutation, return -1.

Note that the starting point is assumed to be valid, so it might not be included in the bank.
 

Example 1:
Example 2:
Constraints:
  • 0 <= bank.length <= 10
  • startGene.length == endGene.length == bank[i].length == 8
  • startGene, endGene, and bank[i] consist of only the characters [‘A’, ‘C’, ‘G’, ‘T’].

From: LeetCode
Link: 433. Minimum Genetic Mutation


Solution:

Ideas:

1. Initialization

  • visited is an array used to keep track of which genes in the bank have been visited.
  • queue is a dynamically allocated array used to perform BFS, holding the genes to be processed. It has bankSize + 1 slots to accommodate the startGene and the genes in the bank.
  • front and rear are variables used to manage the queue.
  • level is used to count the number of mutations.

2. Breadth-First Search (BFS)

  • Enqueue StartGene: The startGene is enqueued to the queue, and BFS begins.
  • Process Each Level: For each level in BFS, the code processes all genes currently in the queue. For each gene in the queue, it’s compared with each gene in the bank.
  • Enqueue Valid Mutations: If a gene in the bank is a valid mutation (differs by exactly one character) and hasn’t been visited, it’s marked as visited and enqueued to the queue for processing in the next level of BFS.
  • Check for EndGene: If the endGene is found in the queue, the function returns the level, representing the minimum number of mutations. If the queue is exhausted and the endGene hasn’t been found, the function returns -1, indicating that there is no valid mutation path from startGene to endGene.
Code:
int minMutation(char * startGene, char * endGene, char ** bank, int bankSize) {
    if (bankSize == 0) return -1;
    
    int *visited = (int *)calloc(bankSize, sizeof(int));
    char **queue = (char **)malloc((bankSize + 1) * sizeof(char *));
    int front = 0, rear = 0;
    
    queue[rear++] = startGene;
    int level = 0;
    
    while (front < rear) {
        int size = rear - front;
        for (int i = 0; i < size; ++i) {
            char *gene = queue[front++];
            if (strcmp(gene, endGene) == 0) {
                free(visited);
                free(queue);
                return level;
            }
            for (int j = 0; j < bankSize; ++j) {
                if (visited[j]) continue;
                int diff = 0;
                for (int k = 0; k < 8; ++k) {
                    if (gene[k] != bank[j][k]) ++diff;
                    if (diff > 1) break;
                }
                if (diff == 1) {
                    visited[j] = 1;
                    queue[rear++] = bank[j];
                }
            }
        }
        ++level;
    }
    
    free(visited);
    free(queue);
    return -1;
}
09-29 14:23