Longest Consecutive Sequence in an Unsorted Array


Question

You are given an Array of numbers and they are unsorted/random order. You are supposed to find the longest sequence of consecutive numbers in the array. Note the sequence need not be in sorted order within the array. Here is an example :

Input :

A[] = {10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101}  

Output is :

{16,17,18,19,20,21,22}  

The solution needs to be of O(n) complexity.

I am told that the solution involves using a hash table and I did come across few implementations that used 2 hash tables. One cannot sort and solve this because sorting would take O(nlgn) which is not what is desired.

1
19
2/15/2013 6:18:29 AM

Accepted Answer

Here is a solution in Python that uses just a single hash set and doesn't do any fancy interval merging.

def destruct_directed_run(num_set, start, direction):
  while start in num_set:
    num_set.remove(start)
    start += direction
  return start

def destruct_single_run(num_set):
  arbitrary_member = iter(num_set).next()
  bottom = destruct_directed_run(num_set, arbitrary_member, -1) 
  top = destruct_directed_run(num_set, arbitrary_member + 1, 1)
  return range(bottom + 1, top)

def max_run(data_set):
  nums = set(data_set)
  best_run = []
  while nums:
    cur_run = destruct_single_run(nums)
    if len(cur_run) > len(best_run):
      best_run = cur_run
  return best_run

def test_max_run(data_set, expected):
  actual = max_run(data_set)
  print data_set, actual, expected, 'Pass' if expected == actual else 'Fail'

print test_max_run([10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101], range(16, 23))
print test_max_run([1,2,3], range(1, 4))
print max_run([1,3,5]), 'any singleton output fine'
4
9/17/2011 8:50:24 AM

You could have two tables:

  • Start table: (start-point, length)
  • End table: (ending-point, length)

When adding a new item, you would check:

  • Does value + 1 exist in start table? If so, delete it and create a new entry of (value, length + 1) where length is the "current" length. You'd also update the end table with the same end point but greater length.
  • Does value - 1 exist in the end table? If so, delete it and create a new entry of (value, length + 1), and this time update the start table (the starting position will be the same, but the length will be increased)

If both conditions hold, then you're effectively stitching two existing sequences together - replace the four existing entries with two new entries, representing the single longer sequence.

If neither condition holds, you just create a new entry of length 1 in both tables.

After all the values have been added, you can just iterate over the start table to find the key with the largest value.

I think this would work, and would be O(n) if we assume O(1) hash lookup/add/delete.

EDIT: C# implementation. It took a little while to get right, but I think it works :)

using System;
using System.Collections.Generic;

class Test
{
    static void Main(string[] args)
    {
        int[] input = {10,21,45,22,7,2,67,19,13,45,12,
                11,18,16,17,100,201,20,101};

        Dictionary<int, int> starts = new Dictionary<int, int>();
        Dictionary<int, int> ends = new Dictionary<int, int>();

        foreach (var value in input)
        {
            int startLength;
            int endLength;
            bool extendsStart = starts.TryGetValue(value + 1,
                                                   out startLength);
            bool extendsEnd = ends.TryGetValue(value - 1,
                                               out endLength);

            // Stitch together two sequences
            if (extendsStart && extendsEnd)
            {
                ends.Remove(value + 1);
                starts.Remove(value - 1);
                int start = value - endLength;
                int newLength = startLength + endLength + 1;
                starts[start] = newLength;
                ends[start + newLength - 1] = newLength;
            }
            // Value just comes before an existing sequence
            else if (extendsStart)
            {
                int newLength = startLength + 1;
                starts[value] = newLength;
                ends[value + newLength - 1] = newLength;
                starts.Remove(value + 1);
            }
            else if (extendsEnd)
            {
                int newLength = endLength + 1;
                starts[value - newLength + 1] = newLength;
                ends[value] = newLength;
                ends.Remove(value - 1);
            }
            else
            {
                starts[value] = 1;
                ends[value] = 1;
            }
        }

        // Just for diagnostics - could actually pick the longest
        // in O(n)
        foreach (var sequence in starts)
        {
            Console.WriteLine("Start: {0}; Length: {1}",
                              sequence.Key, sequence.Value);
        }
    }
}

EDIT: Here's the single-hashset answer implemented in C# too - I agree, it's simpler than the above, but I'm leaving my original answer for posterity:

using System;
using System.Collections.Generic;
using System.Linq;

class Test
{
    static void Main(string[] args)
    {
        int[] input = {10,21,45,22,7,2,67,19,13,45,12,
                11,18,16,17,100,201,20,101};

        HashSet<int> values = new HashSet<int>(input);

        int bestLength = 0;
        int bestStart = 0;
        // Can't use foreach as we're modifying it in-place
        while (values.Count > 0)
        {
            int value = values.First();
            values.Remove(value);
            int start = value;
            while (values.Remove(start - 1))
            {
                start--;
            }
            int end = value;
            while (values.Remove(end + 1))
            {
                end++;
            }

            int length = end - start + 1;
            if (length > bestLength)
            {
                bestLength = length;
                bestStart = start;
            }
        }
        Console.WriteLine("Best sequence starts at {0}; length {1}",
                          bestStart, bestLength);
    }
}

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon