Question: In an array with duplicated elements, how to find its distinct subsets?
Example, when arr = {1,2,3,4} we have 16 distinct subsets as {}, {1}, {1,2},{1,2,3},{1,2,3,4},{1,2,4},{1,3},{1,3,4},{1,4},{2},{2,3},{2,3,4},{2,4},{3},{3,4},{4},
However, when arr = {1,2,2,4} we only have 12 distinct subsets as {}, {1}, {1,2},{1,4},{1,2,2},{1,2,2,4},{1,2,4},{2},{2,2},{2,2,4},{2,4},{4},
Solution:
1) The most intuitive way is to find out all possible subsets and put them into hashtable, and then display only keys. It is easy to implement, however, takes lot of additional memory, assume that when we have an array with 10000 elements while all elememnt is “1”, in this way one needs to have 2^1000 space to store all possible subsets, however, we both know that there are only 1001 distinct subsets as {}, {1},{1,1},…,{1,1….1}.
2) To solve this issue, we need a better approach. This blog suggests an efficient approach that decompose the problem into either generate the subsets from all distinct elements, or generate subsets from all duplicate elements.
The algorithm goes as following:
1) For a given array Arr, find out its elements which only appear once, put them into Arr_distinct. For elements which appear multiple times, put them into hashtable where the key is this element, and the value is the times it appears.
2) Find out all subsets from Arr_distinct, denoted as SubSets
3) For the elements which appear multiple times, create their subsets SubSets_for_num. For example, if “2” appear 3 time, SubSets_for_2 ={2}, {2,2} and {2,2,2}
4) For each element subset in the SubSets, create new datasets which is combined by subset and all the SubSets_for_num created in step 3) and one element at a time. That is, supposse in step 3) we have numbers “2” and “3”, both appear 3 times, then in this iteration all ELEs are only concatenated with SubSets_for_2, the SubSets_for_3 will be added in next iteration. After that, the new geneated dataset are included into SubSets to make it larger.
5) Repeat step 4) untill all the duplicated numbers are merged.
Examples: Arr = {1,2,2,3,3,4,5}
Arr_distinct = {1,4,5}, we have two groups of subsets for duplicated elements, those are: {2},{2,2} and {3}, {3,3}
First of all, we find out all possible subsets (including empty subset) from Arr_distinct. It has 8 subsets as Subset = {{},{1},{1,4},{1,4,5},{1,5},{4},{4,5},{5}}.
public static List<List> CombinationWithUniqueArray(int[] arr,
int index, List list, List<List> solution)
{
solution.Add(list.ToList());
for (int i = index; i < arr.Length; i++)
{
list.Add(arr[i]);
CombinationWithUniqueArray(arr, i + 1, list, solution);
list.RemoveAt(list.Count - 1);
}
return solution;
}
Next, add {2} and {2,2} to all the elements in Subset, now we have the following 16 new subset
newCreatedSubset = {
{2},{1,2},{1,4,2},{1,4,5,2},{1,5,2},{4,2},{4,5,2},{5,2} //add {2}
{2,2},{1,2,2},{1,4,2,2},{1,4,5,2,2},{1,5,2,2},{4,2,2},{4,5,2,2},{5,2,2} // add {2,2}}
After we put the newCreatedSubset back into Subset, the size of Subset becomes 24 as:
Subset = {{},{1},{1,4},{1,4,5},{1,5},{4},{4,5},{5},
{2},{1,2},{1,4,2},{1,4,5,2},{1,5,2},{4,2},{4,5,2},{5,2},
{2,2},{1,2,2},{1,4,2,2},{1,4,5,2,2},{1,5,2,2},{4,2,2},{4,5,2,2},{5,2,2}}
Now, we add {3} and {3,3} to each element in Subset to get 48 new subsets:
newCreatedSubset = {{3},{1,3},{1,4,3},{1,4,5,3},{1,5,3},{4,3},{4,5,3},{5,3},
{2,3},{1,2,3},{1,4,2,3},{1,4,5,2,3},{1,5,2,3},{4,2,3},{4,5,2,3},{5,2,3},
{2,2,3},{1,2,2,3},{1,4,2,2,3},{1,4,5,2,2,3},{1,5,2,2,3},{4,2,2,3},{4,5,2,2,3},{5,2,2,3} // add{3}
{3,3},{1,3,3},{1,4,3,3},{1,4,5,3,3},{1,5,3,3},{4,3,3},{4,5,3,3},{5,3,3},
{2,3,3},{1,2,3,3},{1,4,2,3,3},{1,4,5,2,3,3},{1,5,2,3,3},{4,2,3,3},{4,5,2,3,3},{5,2,3,3},
{2,2,3,3},{1,2,2,3,3},{1,4,2,2,3,3},{1,4,5,2,2,3,3},{1,5,2,2,3,3},{4,2,2,3,3},{4,5,2,2,3,3},{5,2,2,3,3} //add {3,3}}
After we merge the Subset and the NewCreatedSubset, we have 72 distinct subset in total.
public static List<List> DistinctCombination(int[] arr)
{
var dict = new Dictionary<int, int>();
ListuniqueList = new List();
foreach(int num in arr)
{
if (dict.ContainsKey(num))
dict[num]++;
else dict.Add(num, 1);
}
foreach(int num in arr)
{
if(dict[num]==1)
uniqueList.Add(num);
}
var solution = CombinationWithUniqueArray(uniqueList.ToArray(), 0, new List(),
newList<List>());
foreach (int num in dict.Keys)
{
if (dict[num] == 1)
continue;
List<List> appendSolution = new List<List>(),
duplicateElement = new List<List>();
List temp = new List();
for (int i = 0; i < dict[num]; i++)
{
temp.Add(num);
duplicateElement.Add(temp.ToList());
}
foreach (var iter in solution)
{
foreach (var dup in duplicateElement)
{
List newIter = iter.ToList();
newIter.AddRange(dup);
appendSolution.Add(newIter);
}
}
solution.AddRange(appendSolution);
}
return solution;
}