Just over 24 hours before I leave London for Kathmandu! I’ll be joining The Mountain Institute‘s expedition to Imja Lake in the Khumbu. During the expedition I’ll be assisting with technical operations, getting updates out via satellite. You can follow our progress on the expedition blog.
In parallel simulations it’s invaluable to be able to be able to provide independent pseudo-random streams to each thread to guarantee reproducible results. We might try to accomplish this by instantiating generators with pseudo-random states for each thread, but this offers no guarantees about the distribution of states. Given enough threads, it quickly becomes likely that two generators are insufficiently separated across the state space. Another strategy is to provide generators with different parameters to each thread. While feasible for a small number of threads, this is not a sustainable solution for all generators.
Tina’s Random Number Generator Library (TRNG) offers two techniques for constructing parallel pseudo-random number generators. Given threads, each which will consume
pseudo-random values, it is sufficient to construct generators whose initial states are at least
steps apart. Blocksplitting works by instantiating
generators to some initial state and then advancing each by
steps for
. As long as you don’t underestimate the requisite number of pseudo-random values required, the streams will be disjoint. TRNG also utilizes Leapfrogging, a method that decomposes the initial random stream into substreams. Given
streams instantiated from consecutive states from some generator, if the state of each generator is advanced by
steps per invocation, the resulting streams are disjoint. Unfortunately this can bring statistical defects in a generator to the surface.
Linear Congruential Generators (LCG) are a simple and venerable class of PRNG of the form:
. They have some fairly abhorrent properties which were first brought to light in Marsaglia‘s 1968 paper, Random Numbers Fall Mainly in the Planes [pdf]. Nevertheless, they’re simple, fast, and widely used1. Pierre L’Ecuyer‘s 1988 paper Efficient and portable combined random number generators discusses some properties of Multiplicative Linear Congruential Generators (MLCG), a simplified variant of LCGs, with the parameter
. Usefully we can cheaply and arbitrarily advance the state of a MLCG.
Given some state , we can find the next state given the formula
. To advance it by
steps, simply compute
. For large values of
, it is impractical to do naively. Instead we precompute
for all
and advance the generator using these steps. Thus, to advance a generator for
steps we compute:
thus requiring only three additional multiplications (or approximately of the work).
The same method can be used to arbitrarily step Xorshift generators. The Xorshift transition function is just the state vector multiplied with a transition matrix in :
. Thus, to advance the state by an arbitrary step
, we find
.
is computed in the same manner that we computed
above.
I strongly recommend against using either Xorshift generators or MLCGs for any serious simulation purpose. In addition to its underlying weaknesses, a MLCG run in Leapfrog mode will yield a predictable low-order bits, while Xorshift generators fail some rudimentary statistical tests. Still, they have been successfully incorporated into more robust generators, and this ‘jumping’ property may be useful in the construction of better parallel PRNGs.
1. For example java.util.Random implements a 48-bit LCG, truncating the lower 16-bits to produce a 32-bit output. Please don’t use this for any simulation that matters.
I’ve just released libRNG v0.1.0. The library is a public domain collection of pseudo-random number generators for C++. It provides several efficient and statistically robust generators. Several of the generators, such as KISS99, pass the TestU01 [pdf] BigCrush battery and should be suitable for most non-cryptographic simulation purposes. Future releases should see sampling from additional distributions, as well as more generators. The library is largely adapted from the public domain C sources in David Jones‘s manuscript Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications [pdf]. In turn, Jones’s work draws on the prolific work of the late George Marsaglia.
Each class in the library is a subclasses of RNG, overriding either get_uint32() or get_uint64(). I examined buffering RNG output (i.e. asking for a bool consumes one bit of a 32-bit output, rather than truncating the output), but the performance hit, coupled with the extra state seems to make this approach fruitless.
The interface is inspired by Haskell’s System.Random, particularly in the use of split function. Given a generator, split() attempts to return two independent generators. This function is useful when seeding multiple threads with sources of randomness. This function is impractical for RNGs with unnecessarily large state, such as Mersenne Twister, but works splendidly for small generators. split() does not offer any rigorous guarantee of independence. I have a few nascent ideas on offering more robust behavior, but need to do a bit more reading before I can make any promises.
libRNG is used as a source of pseudo-random numbers in Dungeon Crawl Stone Soup v0.8! Crawl’s generator combines an alternating-step generator with KISS. The linear congruential generator is advanced at each step, while a 32-bit Galois linear feedback shift register alternatively advances either the multiply-with-carry generator or the Xorshift generator. The result is marginally faster than KISS, passes BigCrush, and should thwart the sort of cryptanalysis that a player could launch from within the game.
The library is released into the public domain, via Unlicense (or if you prefer, CC0). Contact me if you really need an explicit license for some unforeseen reason (see SQLite’s treatment of this). If you use libRNG, find useful modifications or enhancements, kindly contact me. Bugs should be reported on the project’s issues page.
Floating point numbers are great, I simply can’t join the chorus that maligns them. Nevertheless you need to understand your use case when you’re dealing with them. Floats are ideal if you’d like to distinguish between the mass of a boy and a battleship. Converting between Pounds Sterling and Yen where every penny matters? Bad choice. Working with probabilities? Forget it! If you’ve ever implemented a Hidden Markov Model, you know what I’m talking about.
Many of these problems can be alleviated by log-transforming your domain. Multiplication maps to addition, division to subtraction — huzzah! Precision loss is still an issue, and round-off errors will accumulate as the difference between consecutive values exceeds your multiplicative factor. Nevertheless, in most instances a floating point representation would have failed far earlier than the log transformed analogue.
I’ve prepared a C++ datatype implementing common operations on log transformed floats. I chose to explicitly maintain sign. I probably could have done this via bit-packing the underlying type, but this just seems like a bad idea. If the overhead of lugging around a sign is burdensome (or if you don’t want to risk negative probabilities), it can be easily removed.
There is no cast operator to convert a LogReal
Adding and subtracting LogReal
For many values it is the case that: exp(log(x)) != x. Sorry — there’s nothing I can do about the pigeonhole principle.
Fork the code on GitHub! The code is unlicensed. If you find it useful or want to enchance it, e-mail me or send a pull request.
[1] Know a good way to get C-style explicit casts while avoiding kludgey C++ casts and implicit casting? E-mail me.
Augmenting Viterbi’s algorithm with an A* search is a simple technique that can significantly speed parsing. Stu Geman introduced this method to me when I was working on segmentation of tumor array copy-number data.
The trellis of a Viterbi parse is a directed acyclic graph. Edges are labeled with the log emission and transition probabilities. The maximum-likelihood parse corresponds to the shortest path from the start state to the accepting state. Viterbi’s algorithm finds this path by smashing its way through the trellis, traversing every available edge. In the absence of a sharply peaked distribution, this works fine — you pay a fixed cost for every parse.
If you know something about the distribution of your path lengths, you can do a bit better. Dijkstra’s algorithm is a sensible place to start, traversing the shortest edges rather than exploring the entire trellis. However, this approach can fail spectacularly. Since it does not account for progress through the trellis, a low valued edge will be expanded without regard to its distance from the sink node. In this case, the log-factor introduced by maintaining the search’s priority queue becomes a significant factor in the overall run time.
Fortunately, Dijkstra’s algorithm is just a special case of A* Search (one with a zero-valued heuristic). If your problem is amenable to an efficient admissible heuristic, you can reap significant gains by using A* instead of an exhaustive Viterbi search.
Algorithmic shuffling is a surprisingly important subject, with applications to scientific simulations, gambling, and cryptography. Getting a shuffle right is a delicate business. You need an unbiased random number generator (pseudo-random or otherwise depending on your application), and a correct shuffling algorithm. Here I’d like to introduce a pair of naïve algorithms with subtle failures.
One naïve algorithm works by assigning a pseudo-random 32-bit number (float, int) to each element in the array. The array is then sorted by this assigned value. Runtime is the first wall we run up against. For an array of n elements, assigning floats to each element takes time
, but how long does sorting take? It may be tempting to use radix sort, which takes time
. Since we’ve used 32-bit numbers as our key,
. However, k almost surely exceeds the
factor we would have if we used merge sort or quicksort. We therefore arrive at a complexity of
. As if linearithmic time wasn’t bad enough, there’s another caveat: it’s very difficult to sensibly avoid key collisions. How many 32-bit values can we pick before the probability that two are the same exceeds 0.50?
Here we have a straightforward case of the Birthday Problem. If the number of pairs exceeds
(half of our key space) the probability of a key collision exceeds 0.50. Assuming the number of pairs is
we find that
. If we relax the collision probability to 0.05, we find
. Ack! If you’re performing multiple shuffles, for example in Monte Carlo trials, some bias is inevitable.
A second straightforward algorithm to describe is as follows: walk the array, randomly swapping each element with any element in the array. Unlike the previous approach, this algorithm is linear time. It fails for combinatoric reasons.
There are n! permutations of a collection with n elements. At each step of the swapping shuffle, we can exchange an element with any one of the n other elements. There are therefore
equally likely sequences of swaps. Let’s construct a mapping from sequences to permutations. If this algorithm was unbiased, each permutation could be generated by exactly k swaps, where k is equal to the number of swaps divided by the number of permutations. (Note that the existence of some k does not imply an absence of bias.)
is not an integer we conclude that shuffling by swaps does not yield an unbiased shuffle.
How much entropy do these algorithms require? The first requires 32-bits per element (the cost of generating a float.) To shuffle an n element array, the second needs
bits per element. This is equivalent to generating a sequence of n integers on the interval [0,n) using rejection sampling.
If you’re still looking for an unbiased, linear time shuffle check out the a
Suppose you want to simulate the number of times an unbiased coin comes up heads before tails (i.e. sample from the Geometric Distribution where
).[1] Here’s a method that works if you can generated unbiased longs. It’s limited by the size of longs on your machine, but you can easily iterate it if this is an issue.



Edit: A better way to do this is with a builtin that counts the number of consecutive zeroes from the LSB.
We start with instantiating x with a random bitfield. Adding 1 will trigger a carry if the least significant bit (LSB) is 1. Any run of ones starting at the LSB will be twiddled to zero. XORing this with the original value of x zeroes out all bits apart from the initial low-order run, while setting the carry position to one. Subtracting one offsets the bit that was set by the carry. The population count gives the length of an initial run of ones.
Why does this work? The probability that any bit is set is 0.5. The probability that the first two bits are set is 0.25 and so on.
Suppose you wanted to simulate a biased coin that came up heads with probability 0.25 (or any probability of the form
)[2]. Simply change the instantiation of x:

[1] When would you want to do this?
- To Simulate the game presented by the St. Petersburg paradox
- To determine the level of a new Skip list node.
- To impressing your friends at parties.
[2] Daniel Klein, who sorely needs to update his website, pointed out that this process can be extended to arbitrary rational probabilities with a ladder of ANDs and ORs during the instantiation of x. This approach would see diminishing performance returns.
Sortition is the process of making selections by lottery. Juries are chosen this way in the interest of taking a representative sample from a population. Some perks of sortition include equity and resistance to corruption. To realize these benefits, the random selection must be unbiased and subject to public verification. In RFC 3797 the IETF lays out the method that it uses to select the members of its Nominating Committee.
Before a drawing the sortition operator publishes an ordered list of candidates and a trusted third party random source. The RFC suggests aggregating the low order digits of trading volumes, or the results from government lotteries. The output from the random source is canonicalized and used to generate pseudo-random 128-bit integers as follows:


Their use of MD5 is reasonable: everyone has a copy and it proves that “there is nothing up my sleeve.” Then things stray from the ordinary:
[T]otally order the pool of listed volunteers as follows: If there are P volunteers, select the first by dividing the first derived random value by P and using the remainder plus one as the position of the selectee in the published list. Select the second by dividing the second derived random value by P-1 and using the remainder plus one as the position in the list with the first selected person eliminated. Etc. [RFC 3797 §4]
This produces an unbiased selection amongst a collection of
elements and a pseudo-random integer
where
. In other cases it introduces some bias in favor of those elements with a low index. The first
elements will occur with probability
, while the remaining elements occur with probability
. For very large values of
, in this case
, the difference in these probabilities is minuscule and will never be of practical concern. For a given value
, it’s unlikely that there are distinct 32-bit floating point numbers that can represent these quantities.
Nevertheless, for smaller values of
(e.g.
, the nearest power of 2 greater than n), the bias introduced by this method is significant and is compounded as a shuffle is executed. I used this method to shuffle the array [1..128] and calculated the mean value at each index over
trials. I normalized the values by subtracting the means generated by an unbiased shuffle and then normalized the data (
.) The method described above is biased so as to be sensitive to the original ordering of a set of elements. This should be enough reason to prefer rejection sampling instead.

Biased/unbiased shuffle comparison
Just how much randomness can you expect rejection sampling to use? In the worst case, generating an integer on the interval
, we should reject half of the values we generate, giving an expectation of
bits.
