codility absolute distinct count from an array

Question

so i took the codility interview test yesterday and was informed today that i failed, unfortunately i wasnt given any other information by either codility nor the employer as to where i screwed up so i would appreciate some help in knowing where i went wrong. i know codility pays alot of emphasis on how fast the program runs and how it behaves for large numbers. now i didnt copy paste the questions so this the approx of what i remember

1. count the number of elements in an array a which are absolute distinct, what it means is if an array had -3 and 3 in it these numbers are not distinct because|-3|=|3|. i think an example would clear it up better

a={-5,-3,0,1,-3} the result would be 4 because there are 4 absolute distinct elements in this array.

The question also stated that a.length would be <=10000, and most importantly it stated that assume that the array is sorted in ascending order but i didnt really understand why we would need it to be sorted

IF YOU THINK I MISSED SOMETHING ASK AND I WILL TRY TO CLEAR UP THE QUESTION FURTHER.

here is my code

``````import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

public class test2 {

int test(int[] a){
Set<Integer> s=new HashSet<Integer>();

for(int i=0;i<a.length;i++){

}
return s.size();

}

public static void main(String[] args) {
test2 t=new test2();
int[] a={1,1,1,2,-1};
System.out.println(t.test(a));

}

}
``````
1
11
5/14/2014 10:24:15 AM

If the array is sorted you can find duplicates by looking a neightbours. To compare absolute values to need to start at both the start and the end. This avoid creating a new structure.

EDIT: IMHO HashMap/HashSet is O(log(log(n)) due to collisions, it is only O(1) if there is a perfect hash function. I would have thought not creating object which be much much faster but appears to be only 4x fast on my machine.

In summary, you can see that using a Set is simpler, clearer and easier to maintain. It is still very fast and would be the best solution in 98% of cases.

``````public static void main(String[] args) throws Exception {
for (int len : new int[]{100 * 1000 * 1000, 10 * 1000 * 1000, 1000 * 1000, 100 * 1000, 10 * 1000, 1000}) {
int[] nums = new int[len];
for (int i = 0; i < len; i++)
nums[i] = (int) (Math.random() * (Math.random() * 2001 - 1000));
Arrays.sort(nums);

long timeArray = 0;
long timeSet = 0;
int runs = len > 1000 * 1000 ? 10 : len >= 100 * 1000 ? 100 : 1000;
for (int i = 0; i < runs; i++) {
long time1 = System.nanoTime();
int count = countDistinct(nums);
long time2 = System.nanoTime();
int count2 = countDistinctUsingSet(nums);
long time3 = System.nanoTime();
timeArray += time2 - time1;
timeSet += time3 - time2;
assert count == count2;
}
System.out.printf("For %,d numbers, using an array took %,d us on average, using a Set took %,d us on average, ratio=%.1f%n",
len, timeArray / 1000 / runs, timeSet / 1000 / runs, 1.0 * timeSet / timeArray);
}
}

private static int countDistinct(int[] nums) {
int lastLeft = Math.abs(nums[0]);
int lastRight = Math.abs(nums[nums.length - 1]);
int count = 0;
for (int a = 1, b = nums.length - 2; a <= b;) {
int left = Math.abs(nums[a]);
int right = Math.abs(nums[b]);
if (left == lastLeft) {
a++;
lastLeft = left;
} else if (right == lastRight) {
b--;
lastRight = right;
} else if (lastLeft == lastRight) {
a++;
b--;
lastLeft = left;
lastRight = right;
count++;
} else if (lastLeft > lastRight) {
count++;
a++;
lastLeft = left;
} else {
count++;
b--;
lastRight = right;
}
}
count += (lastLeft == lastRight ? 1 : 2);
return count;
}

private static int countDistinctUsingSet(int[] nums) {
Set<Integer> s = new HashSet<Integer>();
for (int n : nums)
int count = s.size();
return count;
}
``````

prints

For 100,000,000 numbers, using an array took 279,623 us on average, using a Set took 1,270,029 us on average, ratio=4.5

For 10,000,000 numbers, using an array took 28,525 us on average, using a Set took 126,591 us on average, ratio=4.4

For 1,000,000 numbers, using an array took 2,846 us on average, using a Set took 12,131 us on average, ratio=4.3

For 100,000 numbers, using an array took 297 us on average, using a Set took 1,239 us on average, ratio=4.2

For 10,000 numbers, using an array took 42 us on average, using a Set took 156 us on average, ratio=3.7

For 1,000 numbers, using an array took 8 us on average, using a Set took 30 us on average, ratio=3.6

On @Kevin K's point, even Integer can have collision even through it's hash values are unique, it can map to the same bucket as the capacity is limited.

``````public static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}

public static void main(String[] args) throws Exception {
Map<Integer, Integer> map = new HashMap<Integer, Integer>(32, 2.0f);
for (int i = 0; i < 10000 && map.size() < 32 * 2; i++) {
if (hash(i) % 32 == 0)
map.put(i, i);
}
System.out.println(map.keySet());
}
``````

prints

[2032, 2002, 1972, 1942, 1913, 1883, 1853, 1823, 1763, 1729, 1703, 1669, 1642, 1608, 1582, 1548, 1524, 1494, 1456, 1426, 1405, 1375, 1337, 1307, 1255, 1221, 1187, 1153, 1134, 1100, 1066, 1032, 1016, 986, 956, 926, 881, 851, 821, 791, 747, 713, 687, 653, 610, 576, 550, 516, 508, 478, 440, 410, 373, 343, 305, 275, 239, 205, 171, 137, 102, 68, 34, 0]

The values are in reverse order because the HashMap has generated into a LinkedList.

8
7/25/2013 4:48:22 PM

You should have pay attention to the fact that the array is sorted in ascending order.

Lets assume that there are only positive numbers, or the question was not about absolute distinct.

The you could count the Number by iterating trough the list, and increment the counter by one, if the actual element is different from the last. (and +1 for the first element)

If you understand that, you can add the absolute distinct constraint. For example by improving the algorithm by two pointers, one starting from the begin, one from the end. Then you have also to take care that both pointers works like in parallel, so that both pointers will end at 0 or the absoulte lowest number (positive/negative) - This will complicate the whole stuff a bit, but it is possible.