Question: IBM CIC, Recent Online Assessment Questions | Unordered Event Logs | 2026
0
Entering edit mode

Problem Statement: Unordered Log Processing

A distributed system generates event logs. Due to a network issue, logs arrived out of order and some were duplicated. Each log entry has the format T<timestamp>:<event_id> where the timestamp is a non-negative integer.

Given N such log strings (with duplicates possible), you must:

  1. Deduplicate: If entries have the same timestamp and the same event_id, they are duplicates. Keep only one.

  2. Sort: Sort by timestamp in ascending order. Break ties by sorting the event_id lexicographically.

  3. Output: Output only the event_ids, one per line, in their final sorted order.

Output Format:

  • Event IDs in sorted order, one per line.

Example Input:

5
T100:boot
T105:db_read
T100:boot
T102:auth_check
T105:cache_miss

Example Output:

boot
auth_check
cache_miss
db_read
ADD COMMENTlink 14 days ago Sarthak • 10
Entering edit mode
0

Unordered Log Processing Solution

Topics Involved / Prerequisites

  • String Parsing
  • Custom Comparators
  • Set Data Structure (Binary Search Trees)

Overview

We must accurately extract the numerical timestamp and the string event ID from the raw text of each log entry.

Storing these parsed entries inside a C++ Set equipped with a custom struct automatically handles both deduplication and our specific two-tier sorting rules.

Once all the logs are parsed and inserted, we simply iterate through the Set from beginning to end to print the event IDs in their perfectly resolved order.

Approach

1. Parsing the Logs

The logs are given in a strict format: T<timestamp>:<event_id>. For each string, we can find the position of the colon (:). Everything from index 1 (skipping the 'T') up to the colon is our integer timestamp. Everything after the colon is our string event_id. We extract these two pieces of data.

2. Auto-Sorting and Deduplication (The Set)

We create a custom struct to hold the timestamp and id. Inside this struct, we overload the < operator to tell C++ exactly how to compare two logs: first by timestamp, and if they tie, by the string ID. By inserting these structs into a std::set, C++ automatically discards any exact duplicates and keeps the entire collection perfectly sorted at all times!

Code Implementation

#include <iostream>
#include <string>
#include <set>
#include <vector>

using namespace std;

// Custom struct to hold our parsed log data
struct Log {
    long long ts;
    string id;
    
    // Operator overload to dictate how the Set sorts and deduplicates
    bool operator<(const Log& other) const {
        if (ts != other.ts) {
            return ts < other.ts; // Sort by timestamp first
        }
        return id < other.id;     // Tie-breaker: alphabetical order
    }
};

void processLogs(int n, vector<string>& logs) {
    set<Log> uniqueLogs;
    
    for (int i = 0; i < n; i++) {
        string s = logs[i];
        
        // Find the colon delimiter
        int colonPos = s.find(':');
        
        // Extract timestamp (skip the 'T' at index 0)
        long long ts = stoll(s.substr(1, colonPos - 1));
        
        // Extract the event ID
        string id = s.substr(colonPos + 1);
        
        // Insert into the set (handles sorting and duplicates automatically)
        uniqueLogs.insert({ts, id});
    }
    
    // Print the final sorted event IDs
    for (auto it = uniqueLogs.begin(); it != uniqueLogs.end(); ++it) {
        cout << it->id << "\n";
    }
}

int main() {
    // Sample Case
    int n = 5;
    vector<string> logs = {
        "T100:boot",
        "T105:db_read",
        "T100:boot",
        "T102:auth_check",
        "T105:cache_miss"
    };
    
    cout << "Output:\n";
    processLogs(n, logs);
    
    /* Expected Output:
    boot
    auth_check
    cache_miss
    db_read
    */
    
    return 0;
}

Time Complexity

  • Time: O(N \log N) - String parsing takes O(L) where L is the length of the string, and inserting into a balanced binary search tree (the Set) takes O(\log N) time for each of the N elements.

Space: O(N) - In the worst-case scenario where there are no duplicates, we store all N elements in our Set.

ADD REPLYlink 5 days ago
Rohit
• 30

Login before adding your answer.

Similar Posts
Loading Similar Posts