Reading Java methods in bytecode

on waitingforcode.com

Last time we discovered the very basic elements of Java's bytecode, such as the representation of classes and fields. This time we will cover a more advanced subject: methods in general.

This article starts with a real-life example in which all the necessary explanations are stored as comments in the bytecode output. After that we'll summarize all the bytecode instructions we have seen in a list.

Example of bytecode methods

Before we start to talk about bytecode, let's take a look at our sample class:

/**
 * Bytecode sample class which helps to illustrate the following concepts in bytecode:
 * - assigning a method parameter to an instance field
 * - looping with foreach
 * - representing arrays
 * - reading static fields
 * - constant pool
 * - representation of overridden methods
 * - invoking other methods
 *
 */
public class BytecodeMethodSample {

  private String userName;

  public void externalInvoker(String[] names) {
    for (String name : names) {
      this.invokedNamer(name);
    }
  }

  private void invokedNamer(String name) {
    System.out.println("Name is: "+name);
    this.userName = name;
    this.userName = this.userName.substring(1);
  }

  @Override
  public String toString() {
    return "Overriden String value";
  }

}
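
The bytecode of such a class is usually inspected with the JDK's javap disassembler. As a hedged sketch, javap can also be started from Java code through java.util.spi.ToolProvider. This assumes Java 9 or later and a full JDK (the jdk.jdeps module must be present). The class name JavapRunner and the method disassemble are hypothetical names chosen for this illustration, and java.lang.Object is used only because it is always available.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

// Hypothetical helper, not part of the original article.
class JavapRunner {

    // Runs javap with the given arguments and returns everything it printed.
    static String disassemble(String... args) {
        // Assumption: a full JDK is used, so the "javap" tool can be found.
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap is not available"));
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        // Standard output and error output are collected in the same buffer.
        javap.run(out, out, args);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Same flags as in the article: -c prints the code, -p includes private members.
        System.out.println(disassemble("-c", "-p", "java.lang.Object"));
    }
}
```

Passing the path of BytecodeMethodSample.class instead of java.lang.Object should give a listing similar to the one shown in this article, although constant pool indexes and some instructions can differ between compiler versions.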

Now, take a look at the output generated with javap -c -p BytecodeMethodSample.class:

public class com.waitingforcode.com.waitingforcode.bytecode.BytecodeMethodSample {
  private java.lang.String userName;

  ## This is the default (no-argument) constructor generated by the compiler. The most
  ## significant instruction is the one at offset 1, where we can observe the call of the
  ## parent constructor. Because our class extends only java.lang.Object, it is the
  ## constructor of java.lang.Object which is invoked.

  public com.waitingforcode.com.waitingforcode.bytecode.BytecodeMethodSample();
    Code:
       0: aload_0       
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return        

  ## This method takes an array of String as its parameter. The numbers before each
  ## instruction are byte offsets inside the method, not line numbers. The first two
  ## instructions, aload_1 and astore_2, copy the array reference from the parameter
  ## (local variable 1) to local variable 2. aload_2 followed by arraylength pushes the
  ## length of the array onto the stack, and istore_3 stores it as an integer in local
  ## variable 3.
  ## 
  ## iconst_0 pushes the integer constant 0 onto the stack. It is stored in local variable 4
  ## (istore 4) and used as the index counter of the foreach loop. The loop condition starts
  ## at offset 8: iload 4 pushes the index, iload_3 pushes the array length and if_icmpge
  ## compares them. If the index is greater than or equal to the length, execution jumps to
  ## the offset specified after if_icmpge (32, the return instruction) and the loop ends.
  ## Otherwise execution continues with the next instruction, at offset 14.
  ## 
  ## The next instructions (aload_2, iload 4, aaload, astore 5) represent the first line of
  ## the foreach loop, where the currently iterated element is assigned to the variable
  ## called name. aload_0 and aload 5 then push this and name as the receiver and the
  ## argument of the method call.
  ## 
  ## At offset 23 we can see an instruction called invokespecial. It is used here because
  ## invokedNamer is a private instance method. The instruction is followed by the index of
  ## the called method in the constant pool (2 in our case). We can see the entry by
  ## generating the output with the -v parameter (verbose). In this case, the following
  ## entry appears on the screen: 
  ## #2 = Methodref          #13.#42        
  ## //  com/waitingforcode/com/waitingforcode/bytecode/BytecodeMethodSample.invokedNamer:
  ## (Ljava/lang/String;)V
  ## 
  ## The iinc instruction increments a local variable (variable 4, the index) by the value
  ## specified after the comma (1 in our case).
  ## 
  ## The last instruction executed in each iteration is goto. As its name indicates, it
  ## redirects the execution to another part of the bytecode, identified by its offset.
  ## In our case, it jumps back to offset 8 (iload 4), where the loop condition is checked
  ## again.

  public void externalInvoker(java.lang.String[]);
    Code:
       0: aload_1       
       1: astore_2      
       2: aload_2       
       3: arraylength   
       4: istore_3      
       5: iconst_0      
       6: istore        4
       8: iload         4
      10: iload_3       
      11: if_icmpge     32
      14: aload_2       
      15: iload         4
      17: aaload        
      18: astore        5
      20: aload_0       
      21: aload         5
      23: invokespecial #2                  // Method invokedNamer:(Ljava/lang/String;)V
      26: iinc          4, 1
      29: goto          8
      32: return        

  ## Here we can see our private invokedNamer method. The first statement calls println on
  ## System.out, which is a static field of java.lang.System. Offsets 0 to 22 show what
  ## happens when we print a concatenated String. The getstatic instruction reads a static
  ## field of the given class, here System.out, and pushes the PrintStream reference onto
  ## the stack. The new instruction at offset 3 creates a StringBuilder, which the compiler
  ## uses to build the concatenation "Name is: " + name. ldc then pushes the hard-coded
  ## String "Name is: " from the constant pool onto the stack.
  ## invokevirtual is another invocation instruction. It calls instance methods and selects
  ## the implementation according to the runtime class of the receiver (here append,
  ## toString, println and substring). invokespecial is reserved for private instance
  ## methods, instance initialization methods (<init>) and superclass methods called
  ## through super. The #${number} after an instruction refers to an entry in the constant
  ## pool. This holds for the invoke* instructions as well as for getstatic, getfield,
  ## putfield, new and ldc.
  ## Starting at offset 25, aload_0 pushes this and aload_1 pushes the first method
  ## parameter. putfield #10 then assigns the parameter to the field called userName.
  ## At offsets 30-31, aload_0 pushes this twice: the first reference is kept for the final
  ## putfield and the second one is consumed by getfield #10, which reads the userName
  ## field described by constant pool entry #10. iconst_1 pushes the integer constant 1
  ## onto the stack as the argument of substring. After calling String.substring, we
  ## overwrite the value of the userName field with its result through, once again,
  ## "putfield #10".

  private void invokedNamer(java.lang.String);
    Code:
       0: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: new           #4                  // class java/lang/StringBuilder
       6: dup           
       7: invokespecial #5                  // Method java/lang/StringBuilder."<init>":()V
      10: ldc           #6                  // String Name is: 
      12: invokevirtual #7                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      15: aload_1       
      16: invokevirtual #7                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      19: invokevirtual #8                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      22: invokevirtual #9                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      25: aload_0       
      26: aload_1       
      27: putfield      #10                 // Field userName:Ljava/lang/String;
      30: aload_0       
      31: aload_0       
      32: getfield      #10                 // Field userName:Ljava/lang/String;
      35: iconst_1      
      36: invokevirtual #11                 // Method java/lang/String.substring:(I)Ljava/lang/String;
      39: putfield      #10                 // Field userName:Ljava/lang/String;
      42: return        

  ## As you can see, overridden methods look the same as normal methods. There is no
  ## trace of the @Override annotation, because it is kept only in the source code.
  ## The method starts with the ldc instruction, which pushes a single-word constant
  ## from the constant pool onto the stack (the String "Overriden String value" in our
  ## case). After that, the areturn instruction returns the previously pushed String.

  public java.lang.String toString();
    Code:
       0: ldc           #12                 // String Overriden String value
       2: areturn       
}
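
The loop and the String concatenation from the listing above can be written back as plain Java. This is a minimal sketch that mirrors the listing, not decompiler output. DesugaredLoop and its two methods are hypothetical names, and both methods collect the messages in a list instead of printing them so that the two forms can be compared. The instructions quoted in the comments are taken from the listing of externalInvoker and invokedNamer; the local variable numbers of this sketch itself would differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class written only to illustrate the listing above.
class DesugaredLoop {

    // Source form: foreach loop and String concatenation with +.
    static List<String> withForeach(String[] names) {
        List<String> messages = new ArrayList<>();
        for (String name : names) {
            messages.add("Name is: " + name);
        }
        return messages;
    }

    // Roughly what the bytecode in the listing does, written as Java.
    static List<String> desugared(String[] names) {
        List<String> messages = new ArrayList<>();
        String[] array = names;            // aload_1, astore_2
        int length = array.length;         // aload_2, arraylength, istore_3
        int index = 0;                     // iconst_0, istore 4
        while (true) {
            if (index >= length) {         // iload 4, iload_3, if_icmpge 32
                break;                     // jump out of the loop
            }
            String name = array[index];    // aload_2, iload 4, aaload, astore 5
            // new StringBuilder, two append calls and toString, as at offsets 3-19
            messages.add(new StringBuilder().append("Name is: ").append(name).toString());
            index++;                       // iinc 4, 1
        }                                  // goto 8
        return messages;
    }

    public static void main(String[] args) {
        String[] names = {"Ann", "Bob"};
        System.out.println(withForeach(names).equals(desugared(names))); // prints "true"
    }
}
```

Newer compilers may translate the concatenation differently (for example without an explicit StringBuilder), so the listing produced on another JDK can differ from the one shown here.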

Bytecode method instructions

The list below shows some of the bytecode instructions we have seen:

  • iconst_0 (iconst_m1 and iconst_0 to iconst_5): pushes the given integer constant onto the stack
  • istore_0 (_0 - _3): pops an integer from the stack and stores it in the local variable indexed with _${number}.
  • iload_0 (_0 - _3): pushes an integer from a local variable onto the stack (_0 means the first local variable, _1 the second etc.)
  • if_icmpge: pops two integers from the stack and jumps to the offset specified after the instruction if the first one is greater than or equal to the second one. Otherwise execution continues with the next instruction. For example:
            11: if_icmpge     32
            
    This instruction jumps to offset 32 if the condition is true. If it is false, execution continues with the instruction following if_icmpge.
  • aaload: pushes a reference read from an array onto the stack
  • arraylength: pushes the length of an array onto the stack
  • aload_0 (_0 - _3): pushes the object reference from a local variable onto the stack
  • astore: pops an object reference from the stack and stores it in a local variable
  • iinc: increments the local variable given as the first operand by the value given as the second operand. The following instruction increments the local variable with index 2 by 1:
    10: iinc 2, 1
    
  • invokevirtual: calls an instance method. The implementation is selected according to the runtime class of the object on which the method is called. It's followed by the index of the method's entry in the constant pool. You can see the use of this instruction in the String.substring fragment of our previous Java code:
    # bytecode representation : 36: invokevirtual #11                 // Method java/lang/String.substring:(I)Ljava/lang/String;
    this.userName.substring(1);
    
    As you can see, it invokes the substring() method of the given String instance through invokevirtual.
  • invokespecial: also calls a method, but it's reserved for private methods of the current class, constructors and superclass methods called through super. It's followed by the index of the method's entry in the constant pool. To illustrate that, take a look at the following bytecode:
    ## public BytecodeMethodSample()
    1: invokespecial #1                  // Method java/lang/Object."<init>":()V
    ## invocation of this.invokedNamer(name) for private void invokedNamer(String name)
    23: invokespecial #2                  // Method invokedNamer:(Ljava/lang/String;)V
    
    As you can see, the first line refers to the constructor of java.lang.Object. The second one points to the private method of the same class called invokedNamer(String name).
  • goto: jumps to another instruction. The offset of the target instruction is defined after the name, as follows:
    # when execution reaches offset 10, it jumps back to the instruction at offset 3
    10: goto 3
    
  • getstatic: pushes the value of a static field of a class onto the stack
  • dup: duplicates the entry on the top of the stack. This instruction is used frequently when a new object is created, because the constructor call consumes one reference and another one is still needed afterwards. In this case, the instructions can be:
    # Creates a new object and leaves the reference to it on the stack
    0: new com/waitingforcode/MyClass
    # Duplicates the reference
    3: dup
    # Initializes the object through its constructor (consumes one of the two references)
    4: invokespecial #1                  // Method com/waitingforcode/MyClass."<init>":()V
    # Stores the remaining reference in a local variable, for example myClass in MyClass myClass = new MyClass(); code
    7: astore_1
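
The difference between invokevirtual and invokespecial from the list above can be observed from plain Java. The sketch below uses hypothetical classes (Base, Derived, DispatchDemo) that are not part of the article's sample. The comments name the instruction a compiler is expected to emit for each call.

```java
// Hypothetical classes, written only to contrast the two invocation instructions.
class Base {
    String who() {
        return "base";
    }
}

class Derived extends Base {
    @Override
    String who() {
        return "derived";
    }

    // super.who() is compiled to invokespecial: the target is fixed to Base.who,
    // whatever the runtime class of this is.
    String parentWho() {
        return super.who();
    }
}

class DispatchDemo {

    // b.who() is compiled to invokevirtual: the implementation is chosen
    // from the runtime class of b (Derived), not from its declared type (Base).
    static String virtualCall() {
        Base b = new Derived();
        return b.who();
    }

    static String specialCall() {
        return new Derived().parentWho();
    }

    public static void main(String[] args) {
        System.out.println(virtualCall()); // prints "derived"
        System.out.println(specialCall()); // prints "base"
    }
}
```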
    

As you can see in the list, some instructions are followed by an underscore (_) and a number. This number is the index of the local variable (or, for iconst, the constant value) the instruction works with. For example, istore_0 stores an integer value in the local variable with index 0. The same rule applies to all instructions suffixed with the pattern _${number}. If an instruction exists both with and without this suffix and the form without the suffix is used, the index is given as an operand after the name of the instruction, as in istore 4.
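
A small sketch can make the local variable indexes more tangible. LocalSlots is a hypothetical class; the instruction sequences in the comments are what javac typically emits for such methods and can be checked with javap -c.

```java
// Hypothetical class illustrating how parameters map to local variable indexes.
class LocalSlots {

    // Static method: there is no this, so the parameters occupy indexes 0 to 4.
    // Typical bytecode: iload_0, iload_1, iadd, iload_2, iadd, iload_3, iadd,
    // iload 4, iadd, ireturn. The fifth parameter has no short form,
    // so its index is written as an operand (iload 4).
    static int sum(int a, int b, int c, int d, int e) {
        return a + b + c + d + e;
    }

    // Instance method: index 0 holds this, so the parameter a is at index 1.
    // Typical bytecode: iload_1, ireturn.
    int first(int a) {
        return a;
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 2, 3, 4, 5));         // prints "15"
        System.out.println(new LocalSlots().first(7));  // prints "7"
    }
}
```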

Unlike the first article, this one was more concrete. We saw different ways of invoking methods. We also discovered how bytecode represents code that works with arrays or static fields.
