Thursday, January 5, 2012

Weak and strong references in Java

Today, as I was busy implementing an id-object mapping class, I was forced to revisit the concept of weak and strong references - the topic not entirely new to me, but one where I have not had any practical experience. And surprisingly, one which is completely foreign to quite some Java developers (at least the ones that I have asked). Everyone knows what a strong reference (or just a reference, in short) is - they are the regular bread-and-butter references we all use in our everyday Java programming. What, therefore, are weak references? I will talk about them in a moment, but first - an example.

Imagine you are implementing the aforementioned id-object mapping in your application - you have a bunch of objects (dynamically created and removed throughout the lifetime of your application) which also have some kind of ids - kind of like entries in a relational database. For some reason, you want to be able to retrieve one of such objects without having a reference to it and knowing only its id (a scenario that may be useful when employed in a client-server situation, where for obvious reasons the client references server's objects by id only). The most naive way to do it is by using a map, like in the below example (the Entity class is some class specific to your application, you can substitute any class, even Object, here).

public class EntityRegistry {
    private Map<Integer, Entity> entityMap
        = new HashMap<Integer, Entity>();

    public void register(int id, Entity entity) {
        entityMap.put(id, entity);
    }

    public Entity get(int id) {
        return entityMap.get(id);
    }
}

What is wrong with this code, aside from it not being thread-safe in any way? The answer is: memory leaks. Since your objects are indefinitely kept in the registry's map (unless you call the register method with null as the second parameter), they are never garbage collected, even if they cease to exist in the rest of your application. This may be OK in some scenarios (where you do not want them to be gone unless explicitly removed from the registry), but not so much in others, where you just want the registry to hold the reference for as long as the object "lives", but not longer.

And this is where weak references come into play. A weak reference is a type of holder object (an object used to wrap another object) that only has, as its name implies, only a weak hold over the object it wraps. When the wrapped object ceases to be referenced by other non-weak references, it is eligible for garbage collection. When such garbage collection happens, all the weak references pointing to such object start to point to null instead. Therefore, lets employ such a weak reference in our revised entity registry class.
public class EntityRegistry {
    private Map<Integer, WeakReference<Entity>> entityMap
        = new HashMap<Integer, WeakReference<Entity>>();

    public void register(int id, Entity entity) {
        entityMap.put(id, new WeakReference<Entity>(entity));
    }

    public Entity get(int id) {
        WeakReference<Entity> ref = entityMap.get(id);
        if (ref != null)
            return ref.get();
        else
            return null;
    }
}

Now, this is much better - you will no longer have unneeded Entity instances lying around only because they are still kept in your EntityRegistry instance. However, this is still not perfect; even after an Entity instance gets garbage collected, the map will still keep a key-value pair for its weak reference, at least until we assign a new entity to the same id (which, depending on our id generation scheme, may never happen if we don't reuse ids). So how can we solve this? The answer is by using a reference queue.

Reference queues are employed when you want to know when a given weak reference becomes invalid. You provide a reference queue as the second constructor argument to a weak reference and when the object to which the reference points gets garbage collected, the weak reference itself is added to the provided reference queue (more precisely, it is added there "at the time of garbage collection or some time later", left at the discretion of a particular JVM). Since a reference queue functions like an ordinary blocking queue, you can poll it to retrieve any already-expired weak references and remove them from the mapping. There is one caveat, however: since the objects these references point to no longer exist, you cannot access them after retrieving the reference from its queue. Therefore, you need to store the id of the entity a reference points to in the reference itself, so that you later know which map key to clear. Confused? Lets then make it cleared with an example! First, we need to create our own subclass of WeakReference (yes, we can do that) which will also hold an entity's id.
class WeakEntityReference extends WeakReference<Entity> {
    private int id;

    public WeakEntityReference(int id, Entity referent, 
      ReferenceQueue<? super Entity> q) {
        super(referent, q);
        this.id = id;
    }
 
    public int getId() {
        return id;
    }
}

And now lets, once again, improve our EntityRegistry class.
public class EntityRegistry {
    private ReferenceQueue<Entity> queue 
        = new ReferenceQueue<Entity>();
    private Map<Integer, WeakEntityReference> entityMap
        = new HashMap<Integer, WeakEntityReference>();

    public void register(int id, Entity entity) {
        cleanUpReferences();
        entityMap.put(id, 
          new WeakEntityReference(id, entity, queue));
    }

    public Entity get(int id) {
        WeakReference<Entity> ref = entityMap.get(id);
        if (ref != null)
            return ref.get();
        else
            return null;
    }
    
    private void cleanUpReferences() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            WeakEntityReference entityRef 
              = (WeakEntityReference) ref;
            entityMap.remove(entityRef.getId());
        }
    }
}

Notice that we have added a new 'queue' field, changed the map to hold WeakEntityReference objects (instead of the generic WeakReference) and created a new private method cleanUpReferences, which is then always called before registering new entities. If you want, you can also call it in the get method; I have chosen not to, because in a system where new entities are created relatively frequently, calling it in register only should be often enough, and since the get method may be called very frequently, I would prefer for it to have the lowest overhead possible.

So, lets write a simple test to see if our registry really works! Before we do, though, let us add one more method to the EntityRegistry class, which will come handy when writing a test.
public class EntityRegistry {
    (...)
    public int size() {
        cleanUpReferences();
        return entityMap.size();
    }
    (...)
}

And now off to write the test itself. We want to test three things:
  1. EntityRegistry correctly holds and returns entities - since this is what it is supposed to do.
  2. EntityRegistry correctly returns null for entities which have already been garbage collected.
  3. EntityRegistry trims its map so that it does not hold expired WeakEntityReference instances.
An example test case can look like this:
public class EntityRegistryTests {
 @Test
 public void test() {
  // Create an empty registry and 
  // check if its size is correct 
  EntityRegistry registry = new EntityRegistry();
  assertEquals(0, registry.size());
  
  // Register a single entity and
  // check if it is correctly returned 
  Entity entity1 = new Entity();
  registry.register(1, entity1);
  assertEquals(1, registry.size());
  assertSame(entity1, registry.get(1));
  
  // Register a second entity and
  // check if both entities are 
  // correctly returned 
  Entity entity2 = new Entity();
  registry.register(2, entity2);
  assertEquals(2, registry.size());
  assertSame(entity1, registry.get(1));
  assertSame(entity2, registry.get(2));
  
  // Check if the entities survive a
  // garbage collection. 
  System.gc();
  assertEquals(2, registry.size());
  assertSame(entity1, registry.get(1));
  assertSame(entity2, registry.get(2));

  // Invalidate the first entity, garbage 
  // collect it and verify it is also
  // gone from the registry 
  entity1 = null;
  System.gc();
  assertEquals(1, registry.size());
  assertNull(registry.get(1));
  assertSame(entity2, registry.get(2));
 }
}

We are done! Our (very simple) entity registry class is fully functional and does not introduce unintended memory leaks. However, this is only the tip of an iceberg regarding how references are handled by the JVM and its garbage colector. In fact, we don't have only two reference types (strong and weak), we have four (strong, soft, weak and phantom). I am not going to describe soft and phantom references in detail here (unless you want to - if so, drop a comment!), but in a nutshell:
  • Soft references are usually used in various cache mechanisms. They generally function like strong references, so the objects they point to are excluded from garbage collection. However, if the JVM gets desperate about reclaiming memory (for example, when the memory gets really low), they will be regarded as weak references and the objects they point to may be garbage collected. Thus objects referenced softly will live longer than the ones referenced weakly, but will not lead to fatal memory leaks.
  • Phantom references are almost identical to weak references. The two main differences are: phantom reference's get method always returns null (so you can never retrieve an object based only on its phantom reference) and a phantom reference is enqueued in a reference queue only after a particular object's finalize method has been called (while weak reference may be enqueued before the finalize method is invoked). Because their get method always returns null (to prevent resurrecting a dead object - though a determined developer can still retrieve it from a phantom reference by using reflection, though of course it is a terrible practice), they do not have much use, but with subclassing they can still be used to clean up after an object (by, for example, holding the object's id in addition to the object itself, kind of like we did with a weak reference in the example above).
If you want to read more on references, you can check out this blog (although it errs when describing the phantom references). You can also find a pretty good example of a complete weak-valued HashMap here. If all you need is a HashMap with weak keys (not values), then there is already one in the standard Java library (javadoc).

And what about you? Have you ever used non-strong references in the past? Can you provide a different example of where would they be useful? If so, do not hesitate to drop a comment!

No comments:

Post a Comment