Wednesday, April 20, 2011

How can I generate a file of random negative and positive integers in serial?

I want a file of randomly generated positive or negative serial integers. For now, I ask the file contain roughly (no guarantee required) equal numbers of negative and positive, but make it easy to change the proporotions later. By "serial", I mean the kth random negative is equal to -k, and the kth random positive is equal to +k.

This GNU Bash script one-liner would satisfy the file format, but just wouldn't be random.

$ seq -1 -1 -5 && seq 1 5
-1
-2
-3
-4
-5
1
2
3
4
5

This example shows what I'm looking for even better, but is still not random.

$ paste <(seq -1 -1 -5) <(seq 1 5) | tr '\t' '\n'
-1
1
-2
2
-3
3
-4
4
-5
5

Sending one of these through the shuf command makes them lose their serial-ness.

$ paste <(seq -1 -1 -5) <(seq 1 5) | tr '\t' '\n' | shuf-5
4
3
2
-2
1
-1
-4
5
-3

Note that I'm trying to test algorithms that sort lists/arrays of bits (zeros and ones), but if I use 0s and 1s I won't be able to analyse the sort's behaviour or tell if stability was preserved.

From stackoverflow
  • There is no set of numbers that will fit all your criterion. You can't say you want random but at the same time say that the kth negative value == -k and the kth positive value == k. You can either have it random, or not.

    As to what you're trying to do, why not separate the two concerns and test the sort on something like an array of pairs of integers length n. The first of the pair can be zero or 1 and the second of the pair will be your stability tracker (just a count from 0 to n).

    Generate the list of 0's and 1's that you want and shuffle them then add on the tracker integer. Now sort the pairs by their first element.

    The input to your sort will look something like this.

    0, 1
    1, 2
    0, 3
    1, 4
    1, 5
    0, 6
    0, 7
    1, 8
    1, 9
    1, 10
    0, 11
    0, 12
    0, 13
    

    Stable sorts will produce this

    0, 1
    0, 3
    0, 6
    0, 7
    0, 11
    0, 12
    0, 13
    1, 2
    1, 4
    1, 5
    1, 8
    1, 9
    1, 10
    

    Unstable ones will produce the 0's and 1's with the tracker integers out of order.

    ashawley : Perhaps I was unclear, but I think you didn't read it properly. Whether a number is positive or negative *is* random, it's integral value is not.
    ashawley : Concerning your idea of adding serial values for each binary "key": Yes, this is what I meant. However I prefer my hack using positive and negatives as it packs the information together for the algorithm -- requiring a change on "what is binary", but with less to keep track of for my purposes.
  • If I understand correctly, you want to interleave the positive integers and the negative integers randomly. For example: 1 2 -1 3 -2 4 5- 3.

    my $count = 10;
    my $pos   =  1;
    my $neg   = -1;
    
    my @random = map { 
        int(rand 2) 
        ? $pos++ 
        : $neg--
    } 1..$count; 
    
    print "@random\n";
    


    Update:

    To change proportions I'd do this:

    use strict;
    use warnings;
    
    my $next = get_list_generator(.5);
    
    my @random = map $next->(), 1..10; 
    print "@random\n";
    
    my $again = get_list_generator(.25);
    
    my @another = map $again->(), 1..10; 
    print "@another\n";
    
    sub get_list_generator {
        my $prob_positive = shift;
    
        my $pos = 1;
        my $neg = -1;
    
        return sub {
            return rand() <= $prob_positive ? scalar $pos++ : scalar $neg--;
        }
    
    }
    

    The get_list_generator() function returns a closure. This way you can even have multiple list generators going at once.

  • Let's start golf contest? (44)

    perl -le'print rand>.5?++$a:--$b for 1..10'
    

    Edit: daotoad's 40 chars version

    seq 1 10|perl -ple'$_=rand>.5?++$a:--$b'
    
    ashawley : Yes, more succinct is better. Looks like it would be easier to change the proportions.
    daotoad : If its golf you want try: seq 1 10|perl -pe '$_=rand>.5?++$a:--$b' (40 chars)
    ashawley : never seen -l option before. not sure how i missed it.
  • Where 15 is the total amount of numbers generated and tp is the amount of positive numbers you want (effectively indicating the ratio of pos/neg):

    tp=8
    unset p n
    for i in $(printf '%s\n' {1..15} | gsort -R); do
        (( i <= tp )) && \
            echo $((++p)) || \
            echo $((--n))
    done
    
    ashawley : +1 for a Bash soluttion!
    lhunath : (and way shorter than the perl one; that must be a history milestone)
  • #!/bin/bash
    
    pos=0 neg=0
    for i in {1..10}
    do 
        if (( ($RANDOM > 16384 ? ++pos : --neg) > 0 ))
        then echo $pos
        else echo $neg
        fi
    done
    

    I could not quite fit this into a one-liner. Anyone else?

    edit: Ah, a one liner, 65 characters (need to set a and b if you're repeatedly invoking this in the same shell):

    a=0 b=0;for i in {1..10}; do echo $(($RANDOM>16384?++a:--b));done
    
    ashawley : Nitpick: You can say "#!/bin/bash" instead of "In Bash: #!/bin/sh".
    ashawley : $RANDOM > 16384 is hideous but I get it. lhunath answer also struggles with Bash's lack of floating point arithmetic. that's why it's called hacking.
    Brian Campbell : @ashawley Thanks, bash is correct, I don't think think that POSIX sh has $RANDOM. And yes, this is an ugly hack; I was originally trying to fit it into one line, so its not exactly pretty.
  • Here's a Bash one-liner (2?) inspired by lhunath and Brian's answers.

    RANDOM=$$; pos=1; neg=-1; for i in {1..10}; do \
    echo $(( $(echo $RANDOM / 32767 \> 0.5 | bc -l) ? pos++ : neg-- )); done
    

    Here's an Awk script that competes in the golf contest (44).

    seq 1 10|awk '{print(rand()>0.5?++p:--n);}'
    

    This is the clearer idiomatic way to write it:

    seq 1 10 | awk 'BEGIN{srand(); pos=1; neg=-1;}
                    {print (rand() > 0.5 ? pos++ : neg--);}'
    

0 comments:

Post a Comment

Note: Only a member of this blog may post a comment.