registers vs stacks
Implemented in hardware, a register-based machine is going to be more efficient simply because there are fewer accesses to the slower RAM. In software, however, even a register based architecture will most likely have the "registers" in RAM. A stack based machine is going to be just as efficient in that case.
In addition a stack-based VM is going to make it a lot easier to write compilers. You don't have to deal with register allocation strategies. You have, essentially, an unlimited number of registers to work with.
Update: I wrote this answer assuming an interpreted VM. It may not hold true for a JIT compiled VM. I ran across this paper which seems to indicate that a JIT compiled VM may be more efficient using a register architecture.
This has already been answered, to a certain level, in the Parrot VM's FAQ and associated documents: A Parrot Overview The relevant text from that doc is this:
the Parrot VM will have a register architecture, rather than a stack architecture. It will also have extremely low-level operations, more similar to Java's than the medium-level ops of Perl and Python and the like.
The reasoning for this decision is primarily that by resembling the underlying hardware to some extent, it's possible to compile down Parrot bytecode to efficient native machine language.
Moreover, many programs in high-level languages consist of nested function and method calls, sometimes with lexical variables to hold intermediate results. Under non-JIT settings, a stack-based VM will be popping and then pushing the same operands many times, while a register-based VM will simply allocate the right amount of registers and operate on them, which can significantly reduce the amount of operations and CPU time.
You may also want to read this: Registers vs stacks for interpreter design Quoting it a bit:
There is no real doubt, it's easier to generate code for a stack machine. Most freshman compiler students can do that. Generating code for a register machine is a bit tougher, unless you're treating it as a stack machine with an accumulator. (Which is doable, albeit somewhat less than ideal from a performance standpoint) Simplicity of targeting isn't that big a deal, at least not for me, in part because so few people are actually going to directly target it--I mean, come on, how many people do you know who actually try to write a compiler for something anyone would ever care about? The numbers are small. The other issue there is that many of the folks with compiler knowledge already are comfortable targeting register machines, as that's what all hardware CPUs in common use are.
Traditionally, virtual machine implementors have favored stack-based architectures over register-based due to 'simplicity of VM implementation' ease of writing a compiler back-end - most VMs are originally designed to host a single language and code density and executables for stack architecture are invariably smaller than executables for register architectures. The simplicity and code density are a cost of performance.
Studies have shown that a registered-based architecture requires an average of 47% less executed VM instructions than stack-based architecture, and the register code is 25% larger than corresponding stack code but this increase cost of fetching more VM instructions due to larger code size involves only 1.07% extra real machine loads per VM instruction which is negligible. The overall performance of the register-based VM is that it takes, on average, 32.3% less time to execute standard benchmarks.