When building high-performance software, you need to make sure you have two things to start you off: solid architecture and streamlined code. When your code runs efficiently, you are not only able to reduce resource consumption and completion time, but effectively evaluate the quality of your software. Here, I will share some ways that you can do this.
In the last article, I discussed how building streamlined software is a lot like building race cars. When we look at a fast car, it might be the sleek design and chassis that initially grabs our attention, but, as any enthusiast knows, you need to open the bonnet to see what really makes it tick. The engine is, after all, the true power behind any car. It’s a complex combination of different components, where even the smallest misplaced bolt can have a massive impact on the overall performance of the vehicle.
In a software system, the engine is represented by the code. Each little section comes together to make it operate and, if one piece of code is poorly optimised, often the whole system will feel slow as a result.
What optimising code means and why it’s important
Code optimisation is the process of trying to find the fastest way in which an operation or problem can be resolved by computation time.
It does not look for the shortest, easiest or even the simplest solution, but the one that allows for execution in the fastest amount of time, using the least amount of memory.
The optimisation of code can be quite an exhaustive discussion, but to at least get the juices flowing, I would like to suggest a few things that you can look at to help your team identify what could provide the biggest performance gains. These include:
- Using the most efficient algorithms
- Optimising your memory usage
- Using functions wisely
- Optimising arrays
- Using powers of two for multidimensional arrays
- Optimising loops
- Working on data structure optimisation
- Identifying the most appropriate search type
- Being careful with the use of operators
- Using tables versus recalculating
- Carefully considering data types
Much like car parts are not all interchangeable with each other, some of the solutions that I suggest may not work as effectively with your software systems. Different programming languages and compilers work differently in how they convert and optimise functions. So, I would suggest, like with everything, you do some research, monitor the results and stick with what works best for you.
Use the most efficient algorithm
Speed, simply put, is about CPU processing. Ideally, what you want to do when optimising any algorithm is figure out which decision tree or branch logic will require the least number of options to work through – or, more specifically, the least CPU time.
How to do this is quite complex as there are so many different ways to apply algorithms, depending on the solution you are trying to reach. This article explains how to think about algorithms quite well.
I’ve found that the easiest way to evaluate and measure this, is to break up each of the processing aspects of your code, apply the following code (I’ve used C++ here, but you can change this for whatever language you prefer), and measure which one performs the fastest:
// your code
printf("%2.3f\n",(float) ((EndTime.QuadPart - BeginTime.QuadPart) *1000000)/Freq.QuadPart);
The above will effectively output the execution speed of a given line of code, which will help you to measure it effectively. It will also provide you with a platform to experiment and find the best combination of algorithms for your code.
Optimise your code for memory
It is important to know how your compiler manages its memory and programs. Knowing this can prevent your code from utilising too much memory and, thereby, potentially slowing down other aspects of computer processing.
This is especially important for graphically heavy applications, like video games. In these cases, processors are required to work with complex algorithms to generate the CGI images and, how you utilise your memory will make a massive difference in the overall performance of the final product.
Tip: You can use monitoring tools (like Zabbix) to help achieve this.
Use functions wisely
Functions, or shared code that can be called multiple times, are utilised to make code more modular, maintainable and scalable. However, if you are not careful when using them, functions can create some performance bottlenecks, especially if applied to recursion (functions called in a repeated loop).
While functions certainly make coding shorter, repeatedly calling a function to do a similar thing ten times is unnecessary expenditure on your memory and CPU utilisation.
Tip: To do this better, you should implement the repetition directly in your function as this will require less processing. I’ve set up a few examples to show this later in the article.
Another thing to consider is how you use inline functions While these are often used to ease some of the processing restrictions on your CPU, there are other ways of reducing strain on your processor. For instance, for smaller functions, you can make use of macros instead, as these allow you to benefit from speed, better organisation and reusability.
Additionally, when passing a big object to a function, you could use pointers or references, as these provide better memory management. I personally prefer to use references because they create code that is way easier to read. They are also useful things to use when you are not worried about changing the value that is passed to the function. If you use an object that is constant, it could be useful to use
const, which will save some time.
The array is one of the most basic data structures that occupies memory space for its elements.
An array’s name is a constant pointer that points at the first element of an array. This means that you could use pointers and pointer arithmetic because of the operations with pointers.
In the example below, we have a pointer to the
int data type that takes the address from the name of the array. In this case, it is
nArray, and we increase that address for one element. The pointer is moved toward the end of the array for the size of the
int data type.
for(int i=0; i<n; i++) nArray[i]=nSomeValue;
Instead of the above code, the following is better:
for(int* ptrInt = nArray; ptrInt< nArray+n; ptrInt++) *ptrInt=nSomeValue;
If you have used double, your compiler will know how far it should move the address.
It may be harder to read code this way, but it will increase the speed of your program. It’s not the most efficient algorithm that you could use, but the syntax is better and this means the code will run faster.
If you use a matrix, and you have the chance to approach the elements of the matrix row by row, always choose this option as this is the most natural way to approach the array members.
Tip: Avoid initialisation of large portions of memory with some elements. If you can’t avoid this type of situation, consider
memset or similar commands.
Use powers of two for multidimensional arrays
A multidimensional array is used to store data that can be referenced across two or three different sets of axis. It makes storing and referencing data easier. If we can perform faster searching, we can save a lot of time in our code, especially when working with large amounts of data.
The advantage of using powers of two for all but the leftmost array size comes when accessing the array. Ordinarily, the compiled code would have to compute a ‘multiply’ to get the address of an indexed element from a multidimensional array, but most compilers will replace a constant multiply with a shift if it can. Shifts are ordinarily much faster than multiplies.
We utilise loops, or repeated sequences, to sort through or iterate on data to perform actions when required. While this sort of recursion is extremely helpful in specific scenarios, most of the time, it generates slow performing code.
Tip: If possible, try to reduce your reliance on loops. You should only really use them if they are needed multiple times, and contain multiple operations within them. Otherwise, if you need to iteratively sort through something, use another type of sorting algorithm to reduce your processing time.
Work on data structure optimisation
Not all data is equal and so, we need to structure data appropriately for our intended solution. Like with most things, data has a big impact on code performance, and so, the way you structure the data you need in your code will play a big part in enhancing its speed.
Tip: Keeping your data in a list means that you can very easily create a program that will outperform one that has been created using an array. Additionally, if you save your data in some form of a tree, you can create a program that will perform faster than one that doesn’t have adequate data structure.
Identify the most appropriate search type: Binary Search or Sequential Search
One of the most common tasks you do when programming is search for some value in data structures. However, you can’t just apply the same searching principles to different data structures. Rather, you should spend some time identifying the most appropriate approach for what we require.
For example, if you are trying to find one number in an array of numbers you could have two strategies:
- Sequential Search: The first strategy is very simple. You have your array and value you are looking for. From the beginning of the array, you start to look for the value and, if you find it, you stop the search. If you don’t find the value, you will be at the end of the array. There are many improvements to this strategy.
- Binary Search: The second strategy requires the array to be sorted. If an array is not sorted, you will not get the results that you’re looking for. If the array is sorted, you split it into two halves. In the first half, the elements of the array are smaller than the middle one. In the other half, the elements are bigger than the middle one. If you find yourself in this situation, where two markers are not situated the way they should be, you know that you don’t have the value you have been looking for.
Sorting through the elements of an array will cost you some time, but if you’re willing to do that, you’ll benefit from faster binary search.
Be careful with the use of operators
Most basic operations, like +=, -=, and * =, when applied to basic data types can slow down your program because they place unnecessary computation on your processor. To be sure that things aren’t getting slowed down, you will need to know how they get transformed into assembler on your computer.
Tip: An interesting way to do this is to replace the postfix increment and decrement with their prefix versions.
Sometimes you can use the operators >> or << instead of multiplication or division, but be careful, because you could end up with mistakes. When you attempt to fix these, you could inadvertently add some range estimations, making the code you started with way slower.
Bit operators and the tricks that go with them could increase the speed of the program, but you should be very careful because you could end up with machine dependent code, which you want to avoid.
Use tables versus recalculating
Often in coding, we need to perform some sort of complex calculation. When dealing with calculations, we can either perform the calculation directly in the code or make use of a table to reference and save on the processing time.
Tables are often easier to work with and the simplest solution to code, but they don’t always scale well.
Remember that in recalculating, you have the potential to use parallelism, and incremental calculation with the right formulations. Tables that are too large will not fit in your cache and, hence, may be slow to access and cannot be optimised further. Much like we mentioned above when discussing data structures, tables should be used with caution.
Carefully consider your data types
When we assign data to variables in code, we often allocate a size to it. This is the amount of memory space the computer makes available when working with this memory. The larger the program, the more it may utilise data. You want to try and use as little data as possible.
On modern 32 and 64-bit platforms, small data types like chars and shorts actually incur extra overhead when converting to and from the default machine word-size data type.
Tip: Be specific about the data that you are using. Utilise chars for small counters, shorts for slightly larger counters and only use longs or ints when you really have to.
On the other hand, one must be wary of cache usage. Using packed data (and in this vein, small structure fields) for large data objects may pay larger dividends in global cache coherence, than local algorithmic optimisation issues.
While finding the most optimal coding solution is ideal, it doesn’t always mean it’s the best way to go about solving problems. Some of the below points are also worth considering:
- Optimising your code for performance using all possible techniques might generate a bigger file with bigger memory footprint.
- You might have two different optimisation goals that conflict with each other. For example, to optimise the code for performance might conflict with optimising the code for less memory footprint and size. You have to find a balance.
- Performance optimisation is a never-ending process; your code might never be fully optimised. There is always more room for improvement to make your code run faster.
- Sometimes you can use certain programming tricks to make code run faster at the expense of not following best practices. Try to avoid implementing cheap tricks, though as this will not pay off long term.
An optimisation test:
To test how well you’ve understood the different optimisation techniques discussed above, here is a coding solution for you have a look at and identify what can be optimised:
Example code (in C++):
#define LEFT_MARGIN_FOR_X -100.0
#define RIGHT_MARGIN_FOR_X 100.0
#define LEFT_MARGIN_FOR_Y -100.0
#define RIGHT_MARGIN_FOR_Y 100.0
using namespace std;
//Get the constant value
cout<<"Enter the constant value b>0"<<endl;
cout<<"b->"; double dB; cin>>dB;
if(dB<=0) return EXIT_FAILURE;
if(dB>1000) return EXIT_FAILURE;
//This is the potential maximum value of the function
//and all other values could be bigger or smaller
double dMaximumValue = (LEFT_MARGIN_FOR_X*LEFT_MARGIN_FOR_X+LEFT_MARGIN_FOR_Y*LEFT_MARGIN_FOR_Y)/ (LEFT_MARGIN_FOR_Y*LEFT_MARGIN_FOR_Y+dB);
double dMaximumX = LEFT_MARGIN_FOR_X;
double dMaximumY = LEFT_MARGIN_FOR_Y;
for(double dX=LEFT_MARGIN_FOR_X; dX<=RIGHT_MARGIN_FOR_X; dX+=1.0)
for(double dY=LEFT_MARGIN_FOR_Y; dY<=RIGHT_MARGIN_FOR_Y; dY+=1.0)
cout<<"Maximum value of the function is="<< dMaximumValue<<endl;
cout<<"Value for x="<<dMaximumX<<endl
<<"Value for y="<<dMaximumY<<endl;
Optimising your code is key in ensuring that your engine is running efficiently. However, to ensure that your system stays that way, you’re going to need to test it continuously. We’ll cover this in the final installment of this series.
When not changing the world of software development, Craig can also be found writing for Critical Hit as well as his own LinkedIn blog, designing board games for his company Risci Games and running long distances for no apparent reason whatsoever. He likes to think that his many passions only add to his magnetic charm and remarkable talent – at least, that’s what he tells himself when he goes to sleep at night!