What's the method representation in memory?
While thinking a little bit about programming in Java/C# I wondered about how methods which belong to objects are represented in memory and how this fact does concern multi threading.
- Is a method instantiated for each object in memory seperately or do all objects of the same type share one instance of the method?
- If the latter, how does the executing thread know which object's attributes to use?
- Is it possible to modify the code of a method in C# with reflection for one, and only one object of many objects of the same type?
- Is a static method which does not use class attributes always thread safe?
I tried to make up my mind about these questions, but I'm very unsure about their answers.
Solution 1:
Each method in your source code (in Java, C#, C++, Pascal, I think every OO and procedural language...) has only one copy in binaries and in memory.
Multiple instances of one object have separate fields but all share the same method code. Technically there is a procedure that takes a hidden this
parameter to provide an illusion of executing a method on an object. In reality you are calling a procedure and passing structure (a bag of fields) to it along with other parameters. Here is a simple Java object and more-or-less equivalent pseudo-C code:
class Foo {
private int x;
int mulBy(int y) {
return x * y
}
}
Foo foo = new Foo()
foo.mulBy(3)
is translated to this pseude-C code (the encapsulation is forced by the compiler and runtime/VM):
struct Foo {
int x = 0;
}
int Foo_mulBy(Foo *this, int y) {
return this->x * y;
}
Foo* foo = new Foo();
Foo_mulBy(foo, 3)
You have to draw a difference between code and local variables and parameters it operates on (the data). Data is stored on call stack, local to each thread. Code can be executed by multiple threads, each thread has its own copy of instruction pointer (place in the method it currently executes). Also because this
is a parameter, it is thread-local, so each thread can operate on a different object concurrently, even though it runs the same code.
That being said you cannot modify a method of only one instance because the method code is shared among all instances.
Solution 2:
The Java specifications don't dictate how to do memory layout, and different implementations can do whatever they like, providing it meets the spec where it matters.
Having said that, the mainstream Oracle JVM (HotSpot) works off of things called oops - Ordinary Object Pointers. These consist of two words of header followed by the data which comprises the instance member fields (stored inline for primitive types, and as pointers for reference member fields).
One of the two header words - the class word - is a pointer to a klassOop. This is a special type of oop which holds pointers to the instance methods of the class (basically, the Java equivalent of a C++ vtable). The klassOop is kind-of a VM-level representation of the Class object corresponding to the Java type.
If you're curious about the low-level detail, you can find out a lot more by looking in the OpenJDK source for the definition of some of the oop types (klassOop is a good place to start).
tl;dr Java holds one blob of code for each method of each type. The blobs of code are shared among each instance of the type, and hidden this pointers are used to know which instance's members to use.
Solution 3:
I am going to try to answer this in the context of C#.There are basically 3 different types of Methods
- virtual
- non-virtual
- static
When your code is executed, you basically have two kinds of objects that are formed on the heap.
- The object corresponding to the type of the object. This is called Type Object. This holds the type object pointer, the sync block index, the static fields and the method table.
- The object corresponding to the object itself, which contains all the non static fields.
In response to your questions,
- Is a method instantiated for each object in memory seperately or do all objects of the same type share one instance of the method?
This is a wrong way of understanding objects. All methods are per type only. Look at it this way. A method is just a set of instructions. The first time you call a particular method, the IL code is JITed into native instructions and saved in memory. The next time this is called, the address is picked up from the method table and the same instructions are executed again.
2.If the latter, how does the executing thread know which object's attributes to use? Each static method call on a Type results in looking up the method table from the corresponding Type Object and finding the address of the JITed instruction. In case of methods that are not static, the the relevant object on which the method is called is maintained on the thread's local stack. Basically, you get the nearest object on the stack. That is always the object on which we want the method to be called.
3.Is it possible to modify the code of a method in C# with reflection for one, and only one object of many objects of the same type? No, It is not possible now. (And I am thankful for that). The reason is that reflection only allows code inspection. If you figure out what some method actually means, there is no way you are going to be able to change the code in the same assembly.