The overall performance of an application can be contingent upon a variety of factors, and often performance tuning can be more of an art than a science. While performance can be a secondary benefit in some applications, in high throughput applications such as ad servers, performance bottlenecks can act as fatal blows to an otherwise well-designed application.
Given that most applications have a similar purpose and are crafted by developers with similar lines of thinking, there are certain common traps that many developers fall into. Luckily, these are ubiquitous and easy to spot. So, as a performance tuner, if you do not know where to start when fixing your application code, below are three great starting points.
#1: Reflection traps
Reflection is an awesome and powerful tool. However, if used incorrectly, it can be devastating to performance.
Consider the following:
```java
org.apache.logging.log4j.Logger manager =
    org.apache.logging.log4j.LogManager.getLogger();
// This calls LoaderUtil.loadClass(element.getClassName());
// which in turn calls
// java.lang.ClassLoader.loadClass(java.lang.String, boolean),
// which has code that goes like this:
// synchronized (getClassLoadingLock(name)) { ... }
```
At first glance this looks like perfectly harmless code, but if you delve deeper it is not. Because we did not bother to pass the class name explicitly, getLogger() has to discover the calling class reflectively, which goes through the class loader and synchronizes on the class-loading lock, blocking any other thread loading through that same lock. There is now a "use fast reflection" mode that may be useful here. But at the time of writing such a line, we are usually not aware of what goes on underneath, and so we may be contributing to such a bottleneck for no reason whatsoever.
If we take a thread dump and find a lot of threads blocked waiting on a class, that is generally a sign that the object at hand is using some reflection-based framework.
Solution
Dig into the code of the library and make sure it's being used as intended. Also, run a profiler from time to time to make sure this is not a bottleneck in your code.
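As a small illustration of the kind of fix this digging often turns up (the class and method names below are hypothetical, not from any particular framework): reflective lookups are cheap to cache, so resolve them once up front instead of paying the lookup cost, and any associated locking, on every call.

```java
import java.lang.reflect.Method;

public class ReflectionCache {
    public static int square(int x) { return x * x; }

    // Resolve the Method once, at class initialization, instead of
    // calling getMethod() on every invocation.
    private static final Method SQUARE;
    static {
        try {
            SQUARE = ReflectionCache.class.getMethod("square", int.class);
        } catch (NoSuchMethodException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static int squareViaReflection(int x) {
        try {
            return (int) SQUARE.invoke(null, x);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(squareViaReflection(7)); // prints 49
    }
}
```

The same principle applies to loggers: passing the class explicitly (rather than relying on caller discovery) sidesteps the reflective lookup entirely.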
#2: JVM and garbage collection
(For the purposes of this article, I will confine this section to the Concurrent Mark Sweep (CMS) collector.)
The code at IAS is responsible for servicing a huge number of requests per second. The routines that do this are highly iterative and tend to generate tons of temporary objects in Eden space that are used once and immediately discarded.
However, if Eden space is garbage collected in the midst of a request, those temporary objects end up spilling into the survivor space.

Now, as mentioned above, our code produces few genuinely long-lived objects, since most requests create objects on the fly and discard them, and a generational collector is designed around exactly that assumption. A badly tuned GC, however, can still let a lot of spillover objects land in survivor space, where they stay in memory longer than necessary. Repeated full GCs are a sign of such an event.
Solution
Adjusting the survivor ratio can work wonders for performance.
For example, HotSpot's default survivor ratio is 8, meaning Eden is eight times the size of each survivor space, and we don't really need even that much survivor space. For our application, a tiny fraction of it would have been enough. Raising the ratio shrinks the survivor spaces, so at the very least we can set an explicit JVM parameter such as -XX:SurvivorRatio=16 as a knob with which to fine-tune the garbage collector, then measure the effect on our own allocation pattern.
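As a sketch, the tuning knobs might be set on the command line like this. The heap sizes, ratio, and jar name are all illustrative placeholders, not recommendations; the right values come from measuring your own workload.

```shell
# Hypothetical CMS configuration for a request-heavy service.
# -Xms/-Xmx: fixed heap size to avoid resize pauses; -Xmn: young generation size;
# SurvivorRatio=16: Eden is 16x the size of each survivor space.
# GC logging is enabled so the effect of each change can be verified.
java -XX:+UseConcMarkSweepGC \
     -Xms4g -Xmx4g -Xmn1g \
     -XX:SurvivorRatio=16 \
     -Xloggc:gc.log -XX:+PrintGCDetails \
     -jar ad-server.jar
```

After each change, watch the GC log: fewer objects reaching survivor space, and fewer full GCs, means the tuning is moving in the right direction.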
#3: Database calls
Database calls are made throughout a typical application for a variety of reasons. Even an operation as benign as saving an object through an ORM leads to a database call. All of those calls take time and quickly add up. Add to that the cost of checking foreign key constraints when saving an object, a cost that is compounded when the foreign key is not indexed.
The sophisticated and preferred way to eliminate unnecessary database-related waiting is to use a profiler to find the exact time spent in each method and each object. But there is a brute-force method you may find useful if, like me, you do not want to spend a long time setting up a profiler and reading the VisualVM manual. Simply put: send ten requests to your app, take ten thread dumps, and if four out of ten dumps show threads in the same method, you have found the likely bottleneck.
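That brute-force sampling can be scripted in a few lines, assuming a JDK install with jstack on the PATH; `<pid>` is a placeholder for your application's process id, and `com.example` stands in for your own package prefix:

```shell
# Take ten thread dumps, one second apart, then see which of our
# methods appear most often across them.
for i in $(seq 1 10); do
    jstack <pid> > dump-$i.txt
    sleep 1
done
grep -h "at com.example" dump-*.txt | sort | uniq -c | sort -rn | head
```

A method that dominates the tally across independent dumps is where your threads are spending their time.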
Once you’ve found the problem, there are a couple of potential solutions.
Solution 1: Caching
Caching is one amazing way to avoid sending requests to the slow (and potentially unnecessary) bottleneck that is the database, especially for objects that are rarely modified.
Here is some example code that will keep a local cache using the CacheBuilder tools made available by our friends at Google:
```java
LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
    .maximumSize(1000)
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .removalListener(MY_LISTENER)
    .build(new CacheLoader<Key, Graph>() {
        public Graph load(Key key) throws AnyException {
            return createExpensiveGraph(key);
        }
    });
```
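If pulling in Guava is not an option, the core idea can be sketched with the standard library alone. This is a minimal memoizing cache with none of LoadingCache's size bounds or expiry; the class name and loader are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal memoizing cache: computeIfAbsent invokes the loader only on a
// miss, so each key is computed at most once (no eviction or expiry,
// unlike Guava's LoadingCache).
public class SimpleCache<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public SimpleCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        return map.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        SimpleCache<String, Integer> lengths = new SimpleCache<>(s -> {
            System.out.println("loading " + s); // printed only on a miss
            return s.length();
        });
        System.out.println(lengths.get("accounts")); // loads, prints 8
        System.out.println(lengths.get("accounts")); // cached, prints 8 again
    }
}
```

The trade-off is that this cache grows without bound, which is exactly why the Guava version's maximumSize and expireAfterWrite matter in a long-running server.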
Solution 2: Joins in queries
An often overlooked approach to database call performance problems is to apply a CPM/PERT project-management mindset: list all the tasks that take place in the application and map the dependencies between them. This reveals the application's critical path, which you can then focus on optimizing.
For example, instead of going back and forth with calls to your database, use joins to replace loops where possible. This is not a “one size fits all” solution, but in our code, I was able to find a lot of instances where this made a difference.
Consider a schema where an employee belongs to an organizational department. You want to find out the sum of all the salaries of all employees in a particular department. You run a query such as the following:
```sql
select id
from employee
where departmentId = (select departmentId
                      from department
                      where name = 'accounts')
```
Then you loop over this dataset that is returned by doing something like this:
```
sum = 0
for each employee_id in the result:
    sum += (select salary from employee where employee.id = employee_id)
```
While that is a valid approach, it could be optimized by using a join instead of a loop like this:
```sql
select sum(employee.salary)
from employee
join department on employee.departmentId = department.departmentId
where department.name = 'accounts'
```
Pushing the join and the aggregation into the query may seem like asking the database to do more work, but the outcome is that the code makes a single round trip to the database instead of many (possibly thousands).
Solution 3: Compression
This is primarily for cases where a query returns a large amount of data. By adding the useCompression=true parameter to the MySQL connection URL, we trade a smaller network payload for slightly increased CPU consumption. Since the network is usually slower than the CPU, results tend to come back from the database faster, yielding better overall performance.
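For example, the flag goes directly into the JDBC connection URL. This is a configuration sketch: the host, database name, and credentials are placeholders, and it assumes the MySQL Connector/J driver is on the classpath.

```java
// Hypothetical connection setup with wire compression enabled.
String url = "jdbc:mysql://db.example.com:3306/adserver?useCompression=true";
Connection conn = DriverManager.getConnection(url, "user", "password");
```

As with the GC changes, measure before and after: for small result sets the extra CPU cost of compressing can outweigh the network savings.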
Hopefully this post has provided you with useful tips and tricks on how to avoid bottlenecks in Java applications. I tried to list a few places where I have identified code performance issues in my experience at IAS. This list is in no way exhaustive, nor is it always the best way to approach every problem. However, taking these small, non-intrusive steps has often helped me to achieve big results. I hope these steps help do the same for you.