Loading Similar Posts
A distributed system generates event logs. Due to a network issue, logs arrived out of order and some were duplicated. Each log entry has the format T<timestamp>:<event_id> where the timestamp is a non-negative integer.
Given N such log strings (with duplicates possible), you must:
Deduplicate: If entries have the same timestamp and the same event_id, they are duplicates. Keep only one.
Sort: Sort by timestamp in ascending order. Break ties by sorting the event_id lexicographically.
Output: Output only the event_ids, one per line, in their final sorted order.
Output Format:
Event IDs in sorted order, one per line.
Example Input:
5
T100:boot
T105:db_read
T100:boot
T102:auth_check
T105:cache_miss
Example Output:
boot
auth_check
cache_miss
db_read
Unordered Log Processing Solution
Topics Involved / Prerequisites
Overview
We must accurately extract the numerical timestamp and the string event ID from the raw text of each log entry.
Storing these parsed entries inside a C++ Set equipped with a custom struct automatically handles both deduplication and our specific two-tier sorting rules.
Once all the logs are parsed and inserted, we simply iterate through the Set from beginning to end to print the event IDs in their perfectly resolved order.
Approach
1. Parsing the Logs
The logs are given in a strict format: T<timestamp>:<event_id>. For each string, we can find the position of the colon (:). Everything from index 1 (skipping the 'T') up to the colon is our integer timestamp. Everything after the colon is our string event_id. We extract these two pieces of data.
2. Auto-Sorting and Deduplication (The Set)
We create a custom struct to hold the timestamp and id. Inside this struct, we overload the < operator to tell C++ exactly how to compare two logs: first by timestamp, and if they tie, by the string ID. By inserting these structs into a std::set, C++ automatically discards any exact duplicates and keeps the entire collection perfectly sorted at all times!
Code Implementation
Time Complexity
Space: O(N) - In the worst-case scenario where there are no duplicates, we store all N elements in our Set.