Java and mysterious byte

on waitingforcode.com

For a developer who learned Java mostly through web applications, the reason why the byte type exists may be hard to see. In this article we'll try to explain why Java has bytes.

The article starts with some specifics of bytes in Java. The second part shows when bytes are worth using. The last part is written as a JUnit test case and presents some code using bytes.

Java and bytes

The official Java tutorial defines byte as an "8-bit signed two's complement integer". Let's decode this short definition. First, "8-bit signed" means that a Java byte is stored on 8 bits and that it can hold both negative and positive numbers. "Two's complement" is the binary encoding of the sign: the highest bit is set for negative values, so -1 is written as 11111111 and -128 as 10000000. The minimum value of a byte is -128 and the maximum is 127.
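To make the definition more concrete, here is a minimal sketch that prints the 8 bits behind a few byte values. The class and the helper name (TwoComplementDemo, bits) are only illustrative and don't come from the Java tutorial.

```java
class TwoComplementDemo {

    // Returns the 8-bit pattern of a byte as a string of 0s and 1s.
    static String bits(byte value) {
        // Masking with 0xFF keeps only the low 8 bits once the byte is promoted to int.
        String binary = Integer.toBinaryString(value & 0xFF);
        // Left-pad with zeros so that the result is always 8 characters long.
        return "00000000".substring(binary.length()) + binary;
    }

    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE);
        System.out.println("127  = " + bits((byte) 127));
        System.out.println("-128 = " + bits((byte) -128));
        System.out.println("-1   = " + bits((byte) -1));
    }
}
```

The highest bit acts as the sign: it is 0 for 127 and 1 for every negative value.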

Bytes are primitives but, like all other primitives, they can be wrapped into objects. The wrapper class is java.lang.Byte. It provides methods that convert the wrapped byte to another primitive type, such as double (doubleValue), float (floatValue), int (intValue) or short (shortValue).
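As an illustration, the sketch below puts wrapped bytes into a List (collections can't hold primitives) and converts them back with intValue. ByteWrapperDemo and sum are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;

class ByteWrapperDemo {

    // Sums wrapped bytes into an int, because the total may not fit in a byte.
    static int sum(List<Byte> values) {
        int total = 0;
        for (Byte value : values) {
            total += value.intValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Byte wrapped = Byte.valueOf((byte) 100);
        System.out.println(wrapped.doubleValue());
        System.out.println(wrapped.shortValue());
        List<Byte> values = Arrays.asList((byte) 100, (byte) 100, (byte) -50);
        System.out.println(sum(values));
    }
}
```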

What is the difference between bytes and the other numeric primitive types, such as ints or longs? The main difference is the number of bits needed to store each of them. As we've already seen, a byte needs only 8 bits. The other types need more space: short and char need 16 bits, int and float 32 bits, double and long 64 bits.
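These sizes don't have to be memorized: every wrapper class exposes them through its SIZE constant. A small sketch (the class name PrimitiveSizes is only illustrative):

```java
class PrimitiveSizes {

    // Number of bytes needed for a type, computed from its size in bits.
    static int bytesFor(int sizeInBits) {
        return sizeInBits / Byte.SIZE;
    }

    public static void main(String[] args) {
        System.out.println("byte:   " + Byte.SIZE + " bits");
        System.out.println("short:  " + Short.SIZE + " bits");
        System.out.println("char:   " + Character.SIZE + " bits");
        System.out.println("int:    " + Integer.SIZE + " bits");
        System.out.println("float:  " + Float.SIZE + " bits");
        System.out.println("long:   " + Long.SIZE + " bits");
        System.out.println("double: " + Double.SIZE + " bits");
        System.out.println("a long needs " + bytesFor(Long.SIZE) + " bytes");
    }
}
```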

By the way, what happens when we try to use a byte out of range, for example 128? Such a value must be cast explicitly (byte b = 128 doesn't compile), and the cast keeps only the lowest 8 bits. So 128 becomes the lowest negative byte (-128), 129 becomes -127 and so on: the values go up through 0 (at 256) to 127 (at 383) and come back to -128 at 384. Inversely, if we go below the lowest negative byte (-128) to the next value (-129), we start again at the highest positive byte (127) and decrement until the next wrap (-384 gives -128 and -385 gives 127 again). So even if you try hard to bypass this barrier, it won't work and your code may return incoherent results.
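The wrap-around can be checked with a few casts. This is only a sketch; ByteOverflowDemo and wrap are names invented for the example.

```java
class ByteOverflowDemo {

    // Narrowing cast: only the lowest 8 bits of the int are kept.
    static byte wrap(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        int[] samples = {127, 128, 129, 255, 256, 384, -128, -129, -385};
        for (int sample : samples) {
            System.out.println(sample + " -> " + wrap(sample));
        }
        // The same happens silently when a byte is incremented past its maximum.
        byte counter = Byte.MAX_VALUE;
        counter++;
        System.out.println("127 + 1 -> " + counter);
    }
}
```

Note that the cast never throws an exception: the overflow is silent, which is why it can lead to incoherent results.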

When to use Java bytes?

As we have seen, bytes have a small range and occupy only 8 bits (the least of all numeric primitive types). So should you use a byte every time the stored value is between -128 and 127? Not exactly. In fact, a 32-bit Java Virtual Machine stores primitives in 32-bit slots: a single byte variable, like an int, float, short or char, occupies a whole 32-bit block. But this rule doesn't apply to arrays, which are managed differently. On a typical 32-bit JVM, each array starts with a 12-byte header (8 bytes for the object header, 4 for the array length). The elements stored in the array keep the sizes described previously (1 byte for bytes, 2 bytes for short and char, 4 for ints and floats, 8 for longs and doubles). The total size is then rounded up to the next multiple of 8 bytes (14 is rounded to 16, 20 to 24 etc.). So an array of 2 ints takes more space than an array of 2 bytes (12+4+4=20 bytes, rounded to 24, for ints against 12+1+1=14 bytes, rounded to 16, for bytes). The exact numbers depend on the JVM, but the gap grows with the array length. This is the first reason for bytes to exist: saving space in arrays.
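The array arithmetic can be written down as a small estimator. It is only a sketch under explicit assumptions: a 12-byte array header and sizes rounded up to a multiple of 8 bytes, as on a typical 32-bit HotSpot JVM. Other JVMs may use a different layout, and ArraySizeEstimate is a hypothetical helper, not a JDK API.

```java
class ArraySizeEstimate {

    // Assumption: 8 bytes of object header + 4 bytes for the array length.
    static final int HEADER_BYTES = 12;
    // Assumption: the JVM aligns every object on 8 bytes.
    static final int ALIGNMENT_BYTES = 8;

    // Estimated heap size, in bytes, of an array with the given length and element size.
    static long estimate(int length, int elementBytes) {
        long raw = HEADER_BYTES + (long) length * elementBytes;
        // Round up to the next multiple of the alignment.
        return (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("int[2]     ~ " + estimate(2, 4) + " bytes");
        System.out.println("byte[2]    ~ " + estimate(2, 1) + " bytes");
        System.out.println("int[1000]  ~ " + estimate(1000, 4) + " bytes");
        System.out.println("byte[1000] ~ " + estimate(1000, 1) + " bytes");
    }
}
```

For 2 elements the gap is small, but for 1000 elements the int array is estimated at about 4 times the size of the byte array.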

The next reason to use bytes is to simplify the code logic. If you're storing, for example, the number of countries on each continent, you won't need an int or a long because the value is always lower than 128 (according to worldatlas.com stats, there are 54 countries in Africa, 44 in Asia, 47 in Europe, 23 in North America, 14 in Oceania and 12 in South America). Thanks to the byte type, somebody who doesn't know the exact number of countries on each continent will still know that there are no more than 127 of them per continent. To sum up this point, byte fields have two roles: an executive one (programming) and a documentary one (they help to understand the program).
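A minimal sketch of this idea, using the figures quoted above (ContinentStats is a name made up for the example). It also shows a limit of the approach: the sum of several bytes may need a wider type.

```java
class ContinentStats {

    // Countries per continent: Africa, Asia, Europe, North America, Oceania, South America.
    // The byte type documents that none of these counts exceeds 127.
    static final byte[] COUNTRIES = {54, 44, 47, 23, 14, 12};

    // The total doesn't fit in a byte, so it is accumulated in an int.
    static int total(byte[] counts) {
        int sum = 0;
        for (byte count : counts) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("Total: " + total(COUNTRIES));
    }
}
```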

The third use case of bytes in Java is communication over the network (for example, writing data to URL connections) or, more generally, any communication based on streams. Some Java classes that connect to something external base their data transmission on bytes. For example, all kinds of stream classes (InputStream, OutputStream...) define read and write methods which operate on bytes. We can see it in the JavaDoc of OutputStream's write method: "Writes len bytes from the specified byte array starting at offset off to this output stream".
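The sketch below shows these byte-oriented methods with in-memory streams, so that it runs without any network. StreamBytesDemo and its two helpers are illustrative names; only the stream classes and their read and write methods come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamBytesDemo {

    // Writes len bytes of source, starting at offset off, and returns what was written.
    static byte[] writeSlice(byte[] source, int off, int len) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(source, off, len);
        return out.toByteArray();
    }

    // Reads a stream chunk by chunk until read() returns -1, then decodes it as UTF-8.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
        byte[] slice = writeSlice(data, 6, 5);
        System.out.println(readAll(new ByteArrayInputStream(slice)));
    }
}
```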

Examples of Java bytes use

The JUnit test cases below show the particularities of bytes. Through these examples, you'll be able to see the memory usage, the time needed to copy an int array compared to a byte array, the behavior of bytes on out-of-range values, and some specifics of converting raw bytes to a String.


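import static org.junit.Assert.assertTrue;

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import org.junit.Test;

public class JavaBytesTest {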

  @Test
  public void testMemoryUse() throws InterruptedException {
    System.out.println("--- memory use comparaison ---");
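    /*
     * We assumed that an int array takes more space than a byte array
     * of the same length. This test checks it with a rough measure of
     * the used heap (the garbage collector can distort the numbers).
     */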
    int arraySize = 9012330;
    Runtime runtime = Runtime.getRuntime();
    byte[] byteArray = new byte[arraySize];
    for (int i = 0; i < byteArray.length; i++) {
      byteArray[i] = (byte)i;
    }
    long bytesSize = runtime.totalMemory() - runtime.freeMemory();
    
    int[] intArray = new int[arraySize];
    for (int i = 0; i < intArray.length; i++) {
      intArray[i] = i;
    }
    long intSize = runtime.totalMemory() - runtime.freeMemory() - bytesSize;
    System.out.println("b[] size="+bytesSize);
    System.out.println("i[] size="+intSize);
    assertTrue("Memory taken by ints array should be bigger than the memory taken by bytes array", 
      intSize > bytesSize);
    
    // mark as garbage collectable
    intArray = null;
    byteArray = null;
    // sleep 5 seconds, then run the copy benchmark from this test
    // because we want to execute all the code in the same thread
    Thread.sleep(5000);
    /*
     * More space taken also means a slower copy operation
     * (because there are more bits to copy from the source
     * array to the destination array).
     */
    arrayCopyBenchmark();
  }

  private void arrayCopyBenchmark() throws InterruptedException {
    System.out.println("--- copy benchmark ---");
    int arraySize = 1733300;
    int[] intArray = new int[arraySize];
    for (int i = 0; i < intArray.length; i++) {
      intArray[i] = i;
    }
    long start = System.currentTimeMillis();
    int[] copiedInt = new int[arraySize];
    System.arraycopy(intArray, 0, copiedInt, 0, arraySize);
    long end = System.currentTimeMillis();
    long time = end - start;
    System.out.println("Array (int) copied under "+time+ "ms");
    copiedInt = null;
    intArray = null;

    // 5 seconds sleep before the test on bytes array
    Thread.sleep(5000);

    // test byte
    byte[] byteArray = new byte[arraySize];
    for (int i = 0; i < byteArray.length; i++) {
      byteArray[i] = (byte)i;
    }
    start = System.currentTimeMillis();
    byte[] copiedByte = new byte[arraySize];
    System.arraycopy(byteArray, 0, copiedByte, 0, arraySize);
    end = System.currentTimeMillis();
    time = end - start;
    System.out.println("Array (bytes) copied under "+time+" ms");
  }

  @Test
  public void byteSizeTest() {
    // test the last positive byte value: 127
    byte positiveByte = (byte) 127;
    assertTrue("The last positive byte should be equal to 127 but was "+(int) positiveByte, 
      (int) positiveByte == 127);
    // test the highest negative byte value: -1 (255 wraps around to -1)
    byte negativeByte = (byte) 255;
    assertTrue("The highest negative byte should be equal to -1 but was "+(int)negativeByte, 
      (int)negativeByte == -1);
    negativeByte = (byte) -1;
    assertTrue("The highest negative byte should be equal to -1 but was "+(int)negativeByte, 
      (int)negativeByte == -1);
    // test a value just out of range: a byte holds only 256 distinct values (0 included)
    byte toBiggerByte = (byte) 257;
    assertTrue("toBiggerByte should be equal to 1 (a byte holds 256 values: 128 negative numbers, 0 and 127 positive numbers, "+
      "so 256 wraps around to 0 and 257 to 1) but was "+(int)toBiggerByte, (int)toBiggerByte == 1);
    // test a negative value out of range
    byte negativeToBigger = (byte)-255;
    assertTrue("negativeToBigger should be equal to 1 (-129 wraps around to 127 and the values decrement until -256 = 0, so -255 = 1) but was "+(int)negativeToBigger, 
      (int)negativeToBigger == 1);
    // test 0 byte
    byte zeroByte = (byte) 256;
    byte normalZeroByte = (byte) 0;
    assertTrue("zeroByte should be equal to 0", (int)zeroByte == 0);
    assertTrue("normalZeroByte should be equal to 0", (int)normalZeroByte == 0);
  }

  @Test
  public void testReadingBytesStream() throws IOException {
    System.out.println("--- string reading ---");
    /*
     * Casting a byte to a char gives the expected character only for
     * ASCII values (0 to 127). A negative byte becomes a char that the
     * console usually can't print and shows as ?.
     * So, for the text 123àéôaeo saved as UTF-8 in testEncoding.html, you should
     * see 123??????aeo when the received bytes are cast one by one.
     * 
     * But if you convert the whole array to a String with the right encoding,
     * you'll correctly see the accented letters (String 123àéôaeo).
     */
    HttpURLConnection connection = null;
    InputStream stream = null;
    try {
      URL url = new URL("http://localhost/testEncoding.html");
      connection = (HttpURLConnection) url.openConnection();
      stream = connection.getInputStream();
      StringBuilder bytesString = new StringBuilder();
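      // available() is only an estimate and read() may return fewer bytes
      // than requested; it's enough for this small local file
      byte[] bytes = new byte[stream.available()];
      stream.read(bytes);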
      for (byte b : bytes) {
        bytesString.append((char)b);      
      }
      System.out.println("String constructed with reading bytes one by one :"+bytesString);
      assertTrue("String constructed by casting bytes to chars shouldn't contain à, é or ô but it did", !bytesString.toString().contains("à") && 
          !bytesString.toString().contains("é") &&  !bytesString.toString().contains("ô"));
      String result = new String(bytes, "UTF-8");
      String expected = "123àéôaeo";
      assertTrue("Expected result of conversion is '"+expected+"' but '"+result+"' was received", 
        result.equals(expected));
    } finally {
      // connection stays null when the URL can't be opened
      if (connection != null) {
        connection.disconnect();
      }
    }
  }
        
}

Sample output:

--- string reading ---
String constructed with reading bytes one by one :123??????aeo
--- memory use comparaison ---
b[] size=11144448
i[] size=26106616
--- copy benchmark ---
Array (int) copied under 18ms
Array (bytes) copied under 1 ms

As the tests show, bytes consume less memory than the other primitive types. As a consequence, arrays of bytes are processed (for example, copied) faster than the others. We also discovered why it's not advised to transform bytes to a String without defining the encoding. It can lead to strange output where every non-ASCII byte (b < 0) is printed as ?. Each accented letter gives two '?' because it takes 2 bytes in UTF-8.
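The encoding issue can be reproduced without an HTTP server. The following sketch encodes the same text in memory and compares both conversions; ByteDecodingDemo and its helpers are illustrative names, and the accented letters are written as Unicode escapes so that the source file encoding doesn't matter.

```java
import java.nio.charset.StandardCharsets;

class ByteDecodingDemo {

    // Naive conversion: every byte is cast to a char on its own.
    static String castEachByte(byte[] bytes) {
        StringBuilder builder = new StringBuilder();
        for (byte b : bytes) {
            builder.append((char) b);
        }
        return builder.toString();
    }

    // Correct conversion: the whole array is decoded with an explicit charset.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "123\u00e0\u00e9\u00f4aeo"; // 123àéôaeo
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // 9 characters but 12 bytes: each accented letter takes 2 bytes in UTF-8.
        System.out.println(text.length() + " chars, " + utf8.length + " bytes");
        System.out.println(castEachByte(utf8).equals(text));
        System.out.println(decode(utf8).equals(text));
    }
}
```

Passing the charset explicitly also avoids depending on the platform default encoding.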

This article explained the slightly mysterious presence of bytes among Java's primitive types. Bytes aren't used very often in applications with a lot of memory (such as web applications) but are worth using as much as possible in programs with limited resources (programs on embedded devices). But even resource-rich programs can benefit from bytes, which can help to understand the application's logic.
