Introduction to Java's bytecode reading

When you learn Java concurrency, you hear a lot about barriers. To understand where they come from, it helps to be able to read bytecode, the intermediate form between the code written by a human and the code interpreted by the machine. In this article we'll start learning how to read bytecode, to better understand what happens to our code once the machine runs it.

In the first part of this article we'll see how to turn a compiled .class file into a listing of bytecode instructions. In the second part we'll use this technique to learn some basic bytecode instructions.

Transform .class file into bytecode output in Java

The tool commonly used to read class bytecode is the Java Class File Disassembler, which is run with the javap command. Let's start by looking at the options it accepts:

javap --help
Usage: javap <options> <classes>
where possible options include:
  -help  --help  -?        Print this usage message
  -version                 Version information
  -v  -verbose             Print additional information
  -l                       Print line number and local variable tables
  -public                  Show only public classes and members
  -protected               Show protected/public classes and members
  -package                 Show package/protected/public classes
                           and members (default)
  -p  -private             Show all classes and members
  -c                       Disassemble the code
  -s                       Print internal type signatures
  -sysinfo                 Show system info (path, size, date, MD5 hash)
                           of class being processed
  -constants               Show static final constants
  -classpath <path>        Specify where to find user class files
  -bootclasspath <path>    Override location of bootstrap class files

As you can see in the list, the most important option for us is -c. This option "disassembles" the code, i.e. it prints the bytecode instructions stored in the .class file in a human-readable form. If we run javap without -c, we only get the list of members declared in the given .class file:

Compiled from "BytecodeSample.java"
public class com.waitingforcode.BytecodeSample {
  public com.waitingforcode.BytecodeSample();
  public void doNothing();
  public java.lang.String doNothingWithString(java.lang.String);
}

With the -c option, javap gives a more verbose output:

Compiled from "BytecodeSample.java"
public class com.waitingforcode.BytecodeSample {
  public com.waitingforcode.BytecodeSample();
    Code:
       0: aload_0       
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return        

  public void doNothing();
    Code:
       0: return        

  public java.lang.String doNothingWithString(java.lang.String);
    Code:
       0: aload_1       
       1: areturn       
}

Note that, at this stage, private members are not printed: by default javap shows only package-private, protected and public members (fields and methods).
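The same disassembly can also be triggered from Java code. The sketch below is only an illustration under a few assumptions: it needs a JDK 9 or newer (a plain JRE doesn't ship javap) and it looks the tool up through java.util.spi.ToolProvider. The JavapRunner class and its disassemble method are names invented for this example, and java.lang.Object is used as the default target only because it's always available.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

// Hypothetical helper, not part of the original article.
class JavapRunner {

    // Runs "javap -c <className>" in-process and returns everything it printed.
    static String disassemble(String className) {
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not found, is this a JDK?"));
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out, true);
        PrintWriter errWriter = new PrintWriter(err, true);
        // Same arguments as on the command line: javap -c <className>
        int exitCode = javap.run(outWriter, errWriter, "-c", className);
        outWriter.flush();
        errWriter.flush();
        if (exitCode != 0) {
            throw new IllegalStateException("javap failed: " + err);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Pass a class name as the first argument, or fall back to java.lang.Object.
        String className = args.length > 0 ? args[0] : "java.lang.Object";
        System.out.println(disassemble(className));
    }
}
```

Adding "-p" to the argument list would make the private members appear as well, exactly as on the command line.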

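The remaining options, apart from the classpath ones (-classpath and -bootclasspath), control the verbosity of the printed output. Let's make javap as verbose as possible by invoking javap -c -sysinfo -p -version -v -l BytecodeSample.class. The output is really generous: on top of the disassembled code it contains, according to the option list above, system info about the class file, the line number and local variable tables, and the additional information printed by -v.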

Bytecode basics

We'll start by analyzing a sample output that covers methods (with and without parameters, with and without a return value, in all 4 visibilities), fields (private, public, protected and package-private), constants and enums. We'll also look at two kinds of constructors: without and with parameters. We'll reduce the output to plain bytecode instructions by calling javap -c -p BytecodeSample.class. In this exercise we'll try to reconstruct the .java file directly from the bytecode output, which looks like this:

Compiled from "BytecodeSample.java"
public class com.waitingforcode.BytecodeSample {

  ## As you can see, all fields are grouped together. Even if they appear in the
  ## middle of the class body in the .java file, they're printed at the beginning
  ## of the disassembled class.
  ## Note also that the values of the fields aren't shown here.
  ## You can also see that there are no "magic" replacements for
  ## visibility modifiers and that primitive types are kept.
  
  public static final java.lang.String NAME;

  public int normalAge;

  static final java.lang.String NAME_PP;

  int normalAgePP;

  protected static final java.lang.String NAME_PRO;

  protected int normalAgePro;

  private static final java.lang.String NAME_PRI;

  private int normalAgePri;

  ## Below you can find both constructors (without and with a parameter in the signature).
  ## Notice that the parameter of the second constructor has no name. Only the
  ## parameter's type is printed.
  ## 
  ## As you can see, some lines carry names that look like methods of a programming language.
  ## These are the bytecode instructions executed by the JVM. In the constructors we can spot
  ## the initialization of the field values, done with putfield instructions. We know which
  ## field is assigned thanks to the "// Field ${fieldName}" fragment.
  ## 
  ## The bipush instruction pushes a byte value onto the operand stack as an int. As you can see,
  ## every bipush is followed by 30, and 30 is the value assigned to all int fields of the class.
  ## This instruction can be used only for integers from -128 to 127.

  public com.waitingforcode.BytecodeSample();
    Code:
       0: aload_0       
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: aload_0       
       5: bipush        30
       7: putfield      #2                  // Field normalAge:I
      10: aload_0       
      11: bipush        30
      13: putfield      #3                  // Field normalAgePP:I
      16: aload_0       
      17: bipush        30
      19: putfield      #4                  // Field normalAgePro:I
      22: aload_0       
      23: bipush        30
      25: putfield      #5                  // Field normalAgePri:I
      28: return        

  public com.waitingforcode.BytecodeSample(java.lang.String);
    Code:
       0: aload_0       
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: aload_0       
       5: bipush        30
       7: putfield      #2                  // Field normalAge:I
      10: aload_0       
      11: bipush        30
      13: putfield      #3                  // Field normalAgePP:I
      16: aload_0       
      17: bipush        30
      19: putfield      #4                  // Field normalAgePro:I
      22: aload_0       
      23: bipush        30
      25: putfield      #5                  // Field normalAgePri:I
      28: return        

  public void doNothing();
    Code:
       0: return        

  public java.lang.String doNothingReturn();
    Code:
       0: ldc           #6                  // String text
       2: areturn       

  public java.lang.String returnString();
    Code:
       0: ldc           #7                  // String String
       2: areturn       

  ## If you compare the methods returning void with the methods
  ## returning an object, you can see that both end with some kind of
  ## "return" instruction.
  ## 
  ## Unlike Java source code, which uses a single keyword, bytecode has a "return"
  ## instruction for void methods and an "areturn" instruction that returns an
  ## object reference (such as the String in the next method).

  public java.lang.String returnStringWithParam(java.lang.String);
    Code:
       0: aload_1       
       1: areturn       

  void doNothingPP();
    Code:
       0: return        

  java.lang.String doNothingReturnPP();
    Code:
       0: ldc           #6                  // String text
       2: areturn       

  java.lang.String returnStringPP();
    Code:
       0: ldc           #7                  // String String
       2: areturn       

  ## If you compare the bodies of methods with and without parameters,
  ## you can see a difference in the first executed
  ## instruction. For methods with parameters, aload_${NUMBER} loads onto
  ## the stack the object reference stored in local variable ${NUMBER}.
  ## In an instance method, local variable 0 holds "this", so the first
  ## parameter sits at position 1. A method using two String parameters
  ## would contain aload_1 and aload_2 instructions in its body.

  java.lang.String returnStringWithParamPP(java.lang.String);
    Code:
       0: aload_1       
       1: areturn       

  protected void doNothingPro();
    Code:
       0: return        

  ## A very interesting instruction is used in the next method. As you can see, it
  ## doesn't take any parameters. However, it returns a String. If
  ## we want to know which String is returned, we can look at the first
  ## instruction - ldc - which takes a constant value ("text" in our case) and
  ## pushes it onto the stack. The values come from the constant pool and are identified
  ## by the "#${NUMBER}" expression, which is the index of the retrieved
  ## value in the constant pool.

  protected java.lang.String doNothingReturnPro();
    Code:
       0: ldc           #6                  // String text
       2: areturn       

  protected java.lang.String returnStringPro();
    Code:
       0: ldc           #7                  // String String
       2: areturn       

  protected java.lang.String returnStringWithParamPro(java.lang.String);
    Code:
       0: aload_1       
       1: areturn       

  private void doNothingPri();
    Code:
       0: return        

  private java.lang.String doNothingReturnPri();
    Code:
       0: ldc           #6                  // String text
       2: areturn       

  private java.lang.String returnStringPri();
    Code:
       0: ldc           #7                  // String String
       2: areturn       

  private java.lang.String returnStringWithParamPri(java.lang.String);
    Code:
       0: aload_1       
       1: areturn       
}
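The annotations in the listing mention that bipush only covers integers from -128 to 127 and that the return instruction depends on what the method returns. The sketch below tries to make both rules concrete. It isn't taken from the original article: the InstructionChooser class and its two methods are hypothetical names, and the mnemonics other than bipush, ldc, return and areturn (iconst_*, sipush, ireturn, lreturn, freturn, dreturn) come from the JVM instruction set rather than from the listing. It describes what javac typically emits, not a guarantee for every compiler.

```java
// Hypothetical helper, written for illustration only.
class InstructionChooser {

    // Mnemonic javac typically emits to push an int constant onto the operand stack.
    static String pushInstructionFor(int value) {
        if (value >= -1 && value <= 5) {
            // Dedicated one-byte instructions exist for the most common small constants.
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;   // one signed byte operand: -128..127
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;   // two-byte operand: -32768..32767
        }
        return "ldc";                   // anything larger is read from the constant pool
    }

    // Return instruction matching a method's declared return type.
    static String returnInstructionFor(Class<?> returnType) {
        if (returnType == void.class) return "return";
        if (returnType == long.class) return "lreturn";
        if (returnType == float.class) return "freturn";
        if (returnType == double.class) return "dreturn";
        if (returnType.isPrimitive()) return "ireturn"; // boolean, byte, char, short, int
        return "areturn";                               // any object or array reference
    }

    public static void main(String[] args) {
        System.out.println(pushInstructionFor(30));             // bipush 30
        System.out.println(pushInstructionFor(300));            // sipush 300
        System.out.println(returnInstructionFor(void.class));   // return
        System.out.println(returnInstructionFor(String.class)); // areturn
    }
}
```

This is why every int field initialized to 30 in the constructors above goes through bipush, while the String-returning methods end with areturn.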

As the instructions in the javap output show, the printed form is quite explicit, explicit enough to rebuild the Java class that was used to generate it.
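Working backwards from the listing, the source could look roughly like the sketch below. It's a reconstruction, not the author's original file, and several details are assumptions: the values of the four String constants don't appear in the javap -c -p output, so placeholders are used; the constructor parameter name is invented; and the package declaration and the public modifier of the class are left out so that the snippet compiles from any file name. Since the listing shows no static initializer, the String constants are most likely compile-time constants, which would explain why their values never appear in the disassembled code.

```java
// Reconstruction sketch of com.waitingforcode.BytecodeSample (package and "public" omitted).
class BytecodeSample {

    // Declaration order follows the listing. The String values are placeholders (assumption).
    // The int initializers are copied by javac into every constructor (bipush 30 + putfield).
    public static final String NAME = "name";
    public int normalAge = 30;
    static final String NAME_PP = "name-pp";
    int normalAgePP = 30;
    protected static final String NAME_PRO = "name-pro";
    protected int normalAgePro = 30;
    private static final String NAME_PRI = "name-pri";
    private int normalAgePri = 30;

    public BytecodeSample() {}

    // The parameter name is hypothetical: the listing only keeps its type.
    public BytecodeSample(String ignored) {}

    public void doNothing() {}
    public String doNothingReturn() { return "text"; }       // ldc #6 + areturn
    public String returnString() { return "String"; }        // ldc #7 + areturn
    public String returnStringWithParam(String value) { return value; } // aload_1 + areturn

    void doNothingPP() {}
    String doNothingReturnPP() { return "text"; }
    String returnStringPP() { return "String"; }
    String returnStringWithParamPP(String value) { return value; }

    protected void doNothingPro() {}
    protected String doNothingReturnPro() { return "text"; }
    protected String returnStringPro() { return "String"; }
    protected String returnStringWithParamPro(String value) { return value; }

    private void doNothingPri() {}
    private String doNothingReturnPri() { return "text"; }
    private String returnStringPri() { return "String"; }
    private String returnStringWithParamPri(String value) { return value; }

    public static void main(String[] args) {
        BytecodeSample sample = new BytecodeSample("unused");
        System.out.println(sample.normalAge + " " + sample.doNothingReturn()); // 30 text
    }
}
```

Compiling this class and running javap -c -p on the result should give instructions very close to the listing, although constant pool indexes such as #6 or #7 may differ.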

This article introduced us to the world of reading Java bytecode. In the first part we saw how to print the bytecode stored in a .class file with the javap command. The second part introduced some basic bytecode instructions. We saw how values are pushed onto the stack (bipush, ldc) and how they're assigned to the fields of the class (putfield). We also saw the difference between methods returning void and methods returning something (return and areturn instructions).

