TheJobOverflow | Amazon , Online Assessment | Regex Generator | Password Policy

Question: Amazon , Online Assessment | Regex Generator | Password Policy | 2023

Entering edit mode

Entering edit mode

Problem-1 Regex Generator

In this problem, we are given three strings of equal length, a, b, and c, all composed of uppercase English letters. The task is to construct a special "regex" string according to a unique set of rules. This generated regex must successfully match both string a and string b, while simultaneously not matching string c. The problem defines its own regex syntax, where each position can be either a single literal character (e.g.,

A) or a character class of unique, sorted letters in square brackets (e.g., ``). The primary objective is to find the longest possible valid regex. If multiple regexes of maximum length exist, the one that is lexicographically smallest must be selected. If no such regex can be formed, the function should return "-1".

Solution 1: Greedy Approach

This solution attempts to solve the problem with a simple, greedy strategy. The core idea is to find the first possible position to invalidate the match against string c and then maximize the length at all other positions. This approach is fundamentally flawed because it makes a local decision without considering the global impact on the dual objectives of maximizing length and minimizing lexicographical value, often failing to produce the correct optimal result.

The logic is as follows:

Find Invalidation Point: Iterate through the strings from left to right and find the first index i where a[i] and b[i] are identical, but c[i] is different. This index is chosen as the single point where the regex will be forced not to match c.
Handle Failure: If no such index can be found after checking all positions, it assumes no solution is possible and returns "-1".
Construct Regex: Build the final regex string piece by piece.
- At the chosen invalidation index i, construct a character class that includes every uppercase letter except for the character c[i].
- At every other index j (where j!= i), construct a full character class `` to greedily maximize the length.
Return Result: Join the generated components into a single string and return it.

def findLongestRegex(x, y, z):
    alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    n = len(x)
    full = "[" + alpha + "]"
    idx = -1

    # Find the first position where x and y match, but z differs
    for i in range(n):
        if x[i] == y[i] and x[i]!= z[i]:
            idx = i
            break

    if idx == -1:
        return "-1"

    res =
    for i in range(n):
        if i == idx:
            # At the breaking point, match all except z[i]
            res.append("[" + alpha.replace(z[i], "") + "]")
        else:
            res.append(full)

    return "".join(res)

Complexity Analysis

Let n be the length of the input strings.

Time Complexity: O(n) The algorithm involves a single loop to find the invalidation index, which takes O(n) time. Building the result string also takes O(n) time. The total time complexity is linear.
Space Complexity: O(n) The space required is dominated by the storage for the resulting regex string, which has a length proportional to n.

Solution 2: Dynamic Programming with Memoization (Optimal)

This solution is the correct and robust approach required to solve the problem. The problem's conflicting optimization goals (length vs. lexicographical order) and the need to make globally optimal decisions make it a classic candidate for dynamic programming. By using recursion with memoization, this method explores all valid choices at each position and systematically builds the one true optimal regex.

The logic is as follows:

Define DP State: The problem is solved recursively with a function whose state is (index, has_invalidated).
- index: The current character position we are building a component for.
- has_invalidated: A boolean flag indicating if the regex built so far is already guaranteed not to match string c.
Base Case: The recursion terminates when index reaches the end of the strings. If has_invalidated is true, the path is valid, and an empty string is returned. If false, the path has failed, and a special failure signal is returned.
Recursive Step: For the current index, the function generates all valid components that match a[index] and b[index].
- Candidate Generation: If a[index] == b[index], the options are a literal character (e.g., A) or a character class (e.g., [A]). If they differ, the only option is a character class with both (e.g., ``).
- Recurse and Evaluate: For each candidate component, the function checks if it matches c[index], updates the has_invalidated flag accordingly, and makes a recursive call for the next index.
- Select Best Option: It collects all successful results from the recursive calls. It compares these potential solutions first by length (longer is better) and then lexicographically (smaller is better) to find the optimal choice for the current state.

Memoize and Return: The optimal result for each state (index, has_invalidated) is stored to avoid re-computation. The final answer is the result of the initial call solve(0, false).

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <map>

using namespace std;

string str_a, str_b, str_c;
int n;
map<pair<int, bool>, string> memo;
const string FAILURE = "!";

string get_best(const string& s1, const string& s2) {
    if (s1 == FAILURE) return s2;
    if (s2 == FAILURE) return s1;

    if (s1.length() > s2.length()) {
        return s1;
    }
    if (s2.length() > s1.length()) {
        return s2;
    }
    return min(s1, s2);
}

string solve(int i, bool invalidated) {
    if (i == n) {
        return invalidated? "" : FAILURE;
    }

    if (memo.count({i, invalidated})) {
        return memo[{i, invalidated}];
    }

    string best_res = FAILURE;
    vector<string> options;

    if (str_a[i] == str_b[i]) {
        options.push_back(string(1, str_a[i]));
        options.push_back("[" + string(1, str_a[i]) + "]");
    } else {
        char ch1 = min(str_a[i], str_b[i]);
        char ch2 = max(str_a[i], str_b[i]);
        options.push_back("[" + string(1, ch1) + string(1, ch2) + "]");
    }

    for (const auto& comp : options) {
        bool matches_c = (comp.length() == 1)? (comp == str_c[i]) : (comp.find(str_c[i])!= string::npos);
        bool new_invalidated = invalidated ||!matches_c;
        
        string suffix = solve(i + 1, new_invalidated);

        if (suffix!= FAILURE) {
            best_res = get_best(best_res, comp + suffix);
        }
    }

    return memo[{i, invalidated}] = best_res;
}

string generateRegex(const string& a, const string& b, const string& c) {
    str_a = a;
    str_b = b;
    str_c = c;
    n = a.length();
    memo.clear();

    string result = solve(0, false);

    return (result == FAILURE)? "-1" : result;
}

int main() {
    string a_in, b_in, c_in;
    cin >> a_in >> b_in >> c_in;
    cout << generateRegex(a_in, b_in, c_in) << endl;
    return 0;
}

Complexity Analysis

Let n be the length of the input strings.

Time Complexity: O(n^2) The number of DP states is 2 * n (O(n)). In each state, work is done involving string operations (concatenation, comparison) whose length is proportional to n. This results in a total time complexity of O(n) states * O(n) work/state = O(n^2).
Space Complexity: O(n^2) The space is dominated by the memoization table, which stores O(n) states. The value for each state is a string that can be up to O(n) in length, leading to a total space complexity of O(n^2).

ADD COMMENT • link 7 weeks ago ANIRUDH • 0