How to iteratively generate k elements subsets from a set of size n in java?
I'm working on a puzzle that involves analyzing all size k subsets and figuring out which one is optimal. I wrote a solution that works when the number of subsets is small, but it runs out of memory for larger problems. Now I'm trying to translate an iterative function written in python to java so that I can analyze each subset as it's created and get only the value that represents how optimized it is and not the entire set so that I won't run out of memory. Here is what I have so far and it doesn't seem to finish even for very small problems:
public static LinkedList<LinkedList<Integer>> getSets(int k, LinkedList<Integer> set)
{
int N = set.size();
int maxsets = nCr(N, k);
LinkedList<LinkedList<Integer>> toRet = new LinkedList<LinkedList<Integer>>();
int remains, thresh;
LinkedList<Integer> newset;
for (int i=0; i<maxsets; i++)
{
remains = k;
newset = new LinkedList<Integer>();
for (int val=1; val<=N; val++)
{
if (remains==0)
break;
thresh = nCr(N-val, remains-1);
if (i < thresh)
{
newset.add(set.get(val-1));
remains --;
}
else
{
i -= thresh;
}
}
toRet.add(newset);
}
return toRet;
}
Can anybody help me debug this function or suggest another algorithm for iteratively generating size k subsets?
EDIT: I finally got this function working, I had to create a new variable that was the same as i to do the i and thresh comparison because python handles for loop indexes differently.
First, if you intend to do random access on a list, you should pick a list implementation that supports that efficiently. From the javadoc on LinkedList:
All of the operations perform as could be expected for a doubly-linked list. Operations that index into the list will traverse the list from the beginning or the end, whichever is closer to the specified index.
An ArrayList is both more space efficient and much faster for random access. Actually, since you know the length beforehand, you can even use a plain array.
To algorithms: Let's start simple: How would you generate all subsets of size 1? Probably like this:
for (int i = 0; i < set.length; i++) {
int[] subset = {i};
process(subset);
}
Where process is a method that does something with the set, such as checking whether it is "better" than all subsets processed so far.
Now, how would you extend that to work for subsets of size 2? What is the relationship between subsets of size 2 and subsets of size 1? Well, any subset of size 2 can be turned into a subset of size 1 by removing its largest element. Put differently, each subset of size 2 can be generated by taking a subset of size 1 and adding a new element larger than all other elements in the set. In code:
processSubset(int[] set) {
int subset = new int[2];
for (int i = 0; i < set.length; i++) {
subset[0] = set[i];
processLargerSets(set, subset, i);
}
}
void processLargerSets(int[] set, int[] subset, int i) {
for (int j = i + 1; j < set.length; j++) {
subset[1] = set[j];
process(subset);
}
}
For subsets of arbitrary size k, observe that any subset of size k can be turned into a subset of size k-1 by chopping of the largest element. That is, all subsets of size k can be generated by generating all subsets of size k - 1, and for each of these, and each value larger than the largest in the subset, add that value to the set. In code:
static void processSubsets(int[] set, int k) {
int[] subset = new int[k];
processLargerSubsets(set, subset, 0, 0);
}
static void processLargerSubsets(int[] set, int[] subset, int subsetSize, int nextIndex) {
if (subsetSize == subset.length) {
process(subset);
} else {
for (int j = nextIndex; j < set.length; j++) {
subset[subsetSize] = set[j];
processLargerSubsets(set, subset, subsetSize + 1, j + 1);
}
}
}
Test code:
static void process(int[] subset) {
System.out.println(Arrays.toString(subset));
}
public static void main(String[] args) throws Exception {
int[] set = {1,2,3,4,5};
processSubsets(set, 3);
}
But before you invoke this on huge sets remember that the number of subsets can grow rather quickly.
You can use org.apache.commons.math3.util.Combinations.
Example:
import java.util.Arrays;
import java.util.Iterator;
import org.apache.commons.math3.util.Combinations;
public class tmp {
public static void main(String[] args) {
for (Iterator<int[]> iter = new Combinations(5, 3).iterator(); iter.hasNext();) {
System.out.println(Arrays.toString(iter.next()));
}
}
}
Output: [0, 1, 2] [0, 1, 3] [0, 2, 3] [1, 2, 3] [0, 1, 4] [0, 2, 4] [1, 2, 4] [0, 3, 4] [1, 3, 4] [2, 3, 4]
Here is a combination iterator I wrote recetnly
package psychicpoker;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import static com.google.common.base.Preconditions.checkArgument;
public class CombinationIterator<T> implements Iterator<Collection<T>> {
private int[] indices;
private List<T> elements;
private boolean hasNext = true;
public CombinationIterator(List<T> elements, int k) throws IllegalArgumentException {
checkArgument(k<=elements.size(), "Impossible to select %d elements from hand of size %d", k, elements.size());
this.indices = new int[k];
for(int i=0; i<k; i++)
indices[i] = k-1-i;
this.elements = elements;
}
public boolean hasNext() {
return hasNext;
}
private int inc(int[] indices, int maxIndex, int depth) throws IllegalStateException {
if(depth == indices.length) {
throw new IllegalStateException("The End");
}
if(indices[depth] < maxIndex) {
indices[depth] = indices[depth]+1;
} else {
indices[depth] = inc(indices, maxIndex-1, depth+1)+1;
}
return indices[depth];
}
private boolean inc() {
try {
inc(indices, elements.size() - 1, 0);
return true;
} catch (IllegalStateException e) {
return false;
}
}
public Collection<T> next() {
Collection<T> result = new ArrayList<T>(indices.length);
for(int i=indices.length-1; i>=0; i--) {
result.add(elements.get(indices[i]));
}
hasNext = inc();
return result;
}
public void remove() {
throw new UnsupportedOperationException();
}
}
I've had the same problem today, of generating all k-sized subsets of a n-sized set.
I had a recursive algorithm, written in Haskell, but the problem required that I wrote a new version in Java.
In Java, I thought I'd probably have to use memoization to optimize recursion. Turns out, I found a way to do it iteratively. I was inspired by this image, from Wikipedia, on the article about Combinations.
Method to calculate all k-sized subsets:
public static int[][] combinations(int k, int[] set) {
// binomial(N, K)
int c = (int) binomial(set.length, k);
// where all sets are stored
int[][] res = new int[c][Math.max(0, k)];
// the k indexes (from set) where the red squares are
// see image above
int[] ind = k < 0 ? null : new int[k];
// initialize red squares
for (int i = 0; i < k; ++i) { ind[i] = i; }
// for every combination
for (int i = 0; i < c; ++i) {
// get its elements (red square indexes)
for (int j = 0; j < k; ++j) {
res[i][j] = set[ind[j]];
}
// update red squares, starting by the last
int x = ind.length - 1;
boolean loop;
do {
loop = false;
// move to next
ind[x] = ind[x] + 1;
// if crossing boundaries, move previous
if (ind[x] > set.length - (k - x)) {
--x;
loop = x >= 0;
} else {
// update every following square
for (int x1 = x + 1; x1 < ind.length; ++x1) {
ind[x1] = ind[x1 - 1] + 1;
}
}
} while (loop);
}
return res;
}
Method for the binomial:
(Adapted from Python example, from Wikipedia)
private static long binomial(int n, int k) {
if (k < 0 || k > n) return 0;
if (k > n - k) { // take advantage of symmetry
k = n - k;
}
long c = 1;
for (int i = 1; i < k+1; ++i) {
c = c * (n - (k - i));
c = c / i;
}
return c;
}
Of course, combinations will always have the problem of space, as they likely explode.
In the context of my own problem, the maximum possible is about 2,000,000 subsets. My machine calculated this in 1032 milliseconds.
Inspired by afsantos's answer :-)... I decided to write a C# .NET implementation to generate all subset combinations of a certain size from a full set. It doesn't need to calc the total number of possible subsets; it detects when it's reached the end. Here it is:
public static List<object[]> generateAllSubsetCombinations(object[] fullSet, ulong subsetSize) {
if (fullSet == null) {
throw new ArgumentException("Value cannot be null.", "fullSet");
}
else if (subsetSize < 1) {
throw new ArgumentException("Subset size must be 1 or greater.", "subsetSize");
}
else if ((ulong)fullSet.LongLength < subsetSize) {
throw new ArgumentException("Subset size cannot be greater than the total number of entries in the full set.", "subsetSize");
}
// All possible subsets will be stored here
List<object[]> allSubsets = new List<object[]>();
// Initialize current pick; will always be the leftmost consecutive x where x is subset size
ulong[] currentPick = new ulong[subsetSize];
for (ulong i = 0; i < subsetSize; i++) {
currentPick[i] = i;
}
while (true) {
// Add this subset's values to list of all subsets based on current pick
object[] subset = new object[subsetSize];
for (ulong i = 0; i < subsetSize; i++) {
subset[i] = fullSet[currentPick[i]];
}
allSubsets.Add(subset);
if (currentPick[0] + subsetSize >= (ulong)fullSet.LongLength) {
// Last pick must have been the final 3; end of subset generation
break;
}
// Update current pick for next subset
ulong shiftAfter = (ulong)currentPick.LongLength - 1;
bool loop;
do {
loop = false;
// Move current picker right
currentPick[shiftAfter]++;
// If we've gotten to the end of the full set, move left one picker
if (currentPick[shiftAfter] > (ulong)fullSet.LongLength - (subsetSize - shiftAfter)) {
if (shiftAfter > 0) {
shiftAfter--;
loop = true;
}
}
else {
// Update pickers to be consecutive
for (ulong i = shiftAfter+1; i < (ulong)currentPick.LongLength; i++) {
currentPick[i] = currentPick[i-1] + 1;
}
}
} while (loop);
}
return allSubsets;
}