What is Smali code in Android?
I am trying to learn a little about the Dalvik VM, dex and Smali.
I have read about Smali, but I still cannot clearly understand where it fits in the chain of compilers, or what its purpose is.
Here are some questions:
- As far as I know, Dalvik, like other virtual machines, runs bytecode; in the case of Android, that is dex bytecode.
- What is Smali? Does Android or the Dalvik VM work with it directly, or is it just the same dex bytecode in a more human-readable form?
- Is it something like a disassembler for Windows (like OllyDbg)? There, a program executable consists of machine code bytes (D3, 5F for example), and each one has a corresponding assembly instruction. The Dalvik VM is software rather than hardware, so is Smali a readable representation of its bytecode?
- There is the new ART environment. Does it still use bytecode, or does it execute native code directly?
Thank you in advance.
When you build an application, the apk file contains a .dex file, which holds binary Dalvik bytecode. This is the format that the platform actually understands. However, binary code is not easy to read or modify, so there are tools to convert it to and from a human-readable representation. The most common human-readable format is known as Smali. It plays the same role as the assembly language shown by the disassembler you mentioned: the smali tool assembles the text into dex, and baksmali disassembles dex back into text.
For example, say you have Java code that does something like
int x = 42
Assuming this is the first variable, then the dex code for the method will most likely contain the hexadecimal sequence
13 00 2A 00
If you run baksmali on it, you'd get a text file containing the line
const/16 v0, 42
That is obviously a lot more readable than the binary code. But the platform doesn't know anything about Smali; it's just a tool to make the bytecode easier to work with.
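To make the mapping between those four bytes and the text line concrete, here is a minimal sketch in Python of what a disassembler does for this one instruction. It is not how baksmali is implemented. The function name `disassemble_const16` is made up for illustration, and it prints the literal in decimal only to match the line above. It relies on the documented layout of `const/16`: opcode `0x13`, one byte for the destination register, then a signed 16-bit little-endian literal.

```python
import struct


def disassemble_const16(code: bytes) -> str:
    """Decode a single const/16 instruction (hypothetical helper)."""
    # Layout: opcode byte, destination register byte, then a signed
    # 16-bit little-endian literal.
    opcode, register, literal = struct.unpack("<BBh", code[:4])
    if opcode != 0x13:
        raise ValueError(f"not a const/16 instruction: opcode 0x{opcode:02x}")
    return f"const/16 v{register}, {literal}"


print(disassemble_const16(bytes.fromhex("13002A00")))  # const/16 v0, 42
```

A real disassembler does the same kind of lookup for every opcode in the instruction set, and also resolves references to strings, types, fields and methods stored elsewhere in the .dex file.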
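Dalvik and ART both take .dex files containing Dalvik bytecode. ART does not change the input: it still receives dex bytecode, and compiles it to native code on the device. This is completely transparent to the application developer; the only difference is what happens behind the scenes when the application is installed and run.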
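Since both runtimes consume the same file format, one quick way to confirm that a file really is dex is to look at its first eight bytes. The sketch below assumes the documented magic layout (`dex\n`, three ASCII version digits, then a NUL byte). The helper name `dex_version` is hypothetical, and the sample header with version `035` is just an illustrative value.

```python
DEX_MAGIC_PREFIX = b"dex\n"


def dex_version(header: bytes) -> str:
    """Return the format version from the first 8 bytes of a .dex file."""
    # The file starts with "dex\n", three ASCII version digits, and a NUL byte.
    if len(header) < 8 or header[:4] != DEX_MAGIC_PREFIX or header[7] != 0:
        raise ValueError("not a .dex file")
    return header[4:7].decode("ascii")


# Illustrative header bytes; a real check would read them from classes.dex.
print(dex_version(b"dex\n035\x00"))  # 035
```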
High-level languages include extra tools that make programming easier and save the programmer time. Once a program has been compiled, decompiling it back to the original source code needs a lot of code analysis to determine the structure and flow of the program, most likely in more than one pass. The decompiler would then have to structure the source based on the features of the compiler that produced the code, the version of that compiler, and the operating system it was compiled on, e.g. whether any OS-specific features, frameworks, parsers or external libraries were involved, such as .NET or some.dll, and their versions.
The next best result would be to output the whole program flow as if the source code had been written in one large file, i.e. with no separate objects, libraries, dependencies, inheritance, classes or APIs. This is where the decompiler spits out code that fails to compile, since it has no access to the source code and structure of the other files and dependencies. See example here.
The third and best option is to follow what the operating system is doing based on the programmed instructions, which would be machine code, or dex in the case of Android. Unless you're sitting in the Nebuchadnezzar captained by Morpheus, you won't have time to decode every opcode in the instruction set of the architecture your processor is running, so you'd want something more readable than Unicode characters scrolling on the screen as you monitor the program's execution. This is where assembly code makes the difference: it's almost a direct translation of machine code into a human-readable format. I say "almost" direct because microprocessors have helpers such as microcode, multithreading and pipelining, and hardware accelerators to give a better user experience.
If you have the source code, you edit in the language the code is written in. Similarly, if you don't have the source code and you're editing the compiled app, you're still editing in the language the code is written in; in this case that is machine code, or the next best thing: Smali.
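As a rough illustration of editing at this level, the sketch below takes the `const/16` line from the earlier answer, changes the constant, and encodes it back to bytes. It is a toy that handles only this one instruction with a decimal literal. The function name `assemble_const16` is hypothetical, and a real assembler such as smali also has to rebuild the rest of the .dex file around the changed instruction.

```python
import re
import struct


def assemble_const16(line: str) -> bytes:
    """Encode one `const/16 vAA, literal` line as Dalvik bytecode."""
    match = re.fullmatch(r"\s*const/16\s+v(\d+)\s*,\s*(-?\d+)\s*", line)
    if match is None:
        raise ValueError(f"unsupported instruction: {line!r}")
    register, literal = int(match.group(1)), int(match.group(2))
    if not 0 <= register <= 255 or not -32768 <= literal <= 32767:
        raise ValueError("operand out of range for const/16")
    # Opcode 0x13, register byte, signed 16-bit little-endian literal.
    return struct.pack("<BBh", 0x13, register, literal)


edited = "const/16 v0, 43"  # the original instruction loaded 42
print(assemble_const16(edited).hex(" "))  # 13 00 2b 00
```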
Here's a diagram to illustrate "Dalvik VM, dex and Smali" and "its place in chain of compilers".