Introduction

Historically, Firebird performance, like that of most database servers, has mostly been limited by disc and CPU performance. Modern SMP processors and PCI-E based disc sub-systems have eliminated many of the traditional bottlenecks for database servers, but as old bottlenecks are removed new ones open up. And as more server deployments move to the cloud, it is worth investigating the impact of network latency upon Firebird performance.

What is network latency?

The simple explanation is that network latency is the time it takes for packets to be transmitted across the network. The main factor is the physical distance between the two hosts: the further apart they are, the longer the round trip takes. Other factors can also come into play - network and host congestion, hardware problems, and the network protocol used are typically among the most important.
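
A quick way to get a feel for the latency between two hosts is a plain ping (db-server below is a stand-in for the real host name):

    # report min/avg/max round-trip times over 20 probes
    ping -c 20 db-server

The avg figure on the final rtt line corresponds roughly to the round-trip latency discussed throughout this article.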

Overall impact of net latency on Firebird

In general an increase in latency has a deleterious effect upon Firebird. The magnitude of this effect appears to vary according to the version of Firebird used, as well as the load placed on the server.

In order to study this in more detail we have set up a simple client-server test rig.

How the test bench is setup

The test bench is based on a Ryzen 1700 CPU with 8 cores and 16 threads, and has 64GB of RAM. Storage is NVMe based, with a direct connection to the CPU.

The tests run in Docker containers. The client container runs the tests and the server container is configured to run only Firebird. Both containers have unlimited access to the host's resources. They are connected by a virtual network which has effectively zero latency.
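
A rough sketch of the rig, for readers who want to reproduce it (the image names are placeholders, not the actual images used; NET_ADMIN is needed so that tc can shape traffic inside a container):

    # create the virtual network connecting client and server
    docker network create fbtest

    # the server container runs only Firebird
    docker run -d --name fbserver --network fbtest --cap-add NET_ADMIN firebird-server:3.0.6

    # the client container runs the benchmark
    docker run -it --name fbclient --network fbtest --cap-add NET_ADMIN tpcc-client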

Each test is largely randomised, so no two tests with the same configuration will produce identical results. For this reason each test configuration is run at least five times to eliminate anomalies.

Simulating Network Latency

We can’t easily simulate hardware problems on the network, and we don’t currently have a means to measure network congestion. However, it is easy to simulate network latency and measure its effect upon performance. Under Linux we can use the tc (traffic control) utility to add latency:

    tc qdisc add dev eth0 root netem delay ${NET_LATENCY}ms

This command is run on both the client and the server, so for a desired round-trip latency of 100ms each host has its delay set to 50ms.
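
For completeness, the rule can be inspected, adjusted or removed with the matching tc subcommands (eth0 stands in for whichever interface the containers actually use):

    # show the queueing disciplines currently attached to the interface
    tc qdisc show dev eth0

    # change the delay on an existing netem qdisc
    tc qdisc change dev eth0 root netem delay 1ms

    # remove the added latency altogether
    tc qdisc del dev eth0 root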

About TPC-C

These tests have been carried out using a Java-based version of the TPC-C benchmark. This benchmark is designed to measure Online Transaction Processing (OLTP) performance. It models a wholesale enterprise: the business has N warehouses, each serving 10 districts, and each district has 3,000 customers. As the business expands it adds warehouses. We can change the size of the business by specifying the number of warehouses and the number of terminals that can access the database. Increasing the terminals increases the load on the server, while increasing the number of warehouses spreads the load across more data and so decreases it.
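
In practice these two knobs are set in the benchmark's configuration, along these lines (the key names here are illustrative, not necessarily the tool's actual ones):

    # benchmark configuration - illustrative key names
    warehouses = 10   # size of the modelled business
    terminals  = 10   # concurrent connections driving the load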

TPC-C runs five different workloads that simulate order entry, payments, deliveries, order status and stock checking. The main metric of the benchmark is New Orders Per Minute (NOPM).

In its original design by the TPC organisation, terminals were intended to simulate actual users. In this version of the benchmark terminals should be considered more as a connection pool, and it is better to use the NOPM metric as a guide to the number of users that can be supported.

Network latency at the millisecond level

Let’s start with a baseline - 10 Warehouses and 10 Terminals.

10 Warehouses, 10 Terminals
[Chart: 10 W 10 T - NOPM vs. network latency (0.0-3.0 ms) for fb306-cs, fb306-sc and fb306-ss]

This test configuration doesn’t put much load on the server - there are no bottlenecks at any level: CPU, disc I/O or network I/O.

We can clearly see the hypothesis confirmed: all versions of Firebird show a clear degradation in performance as network latency increases.

It is interesting to note that the poorest performer, Super Server, also degrades the most gently.


10 Warehouses, 20 Terminals
[Chart: 10 W 20 T - NOPM vs. network latency (0.0-3.0 ms) for fb306-cs, fb306-sc and fb306-ss]

The results here are similar to the first test. Adding terminals has improved overall performance for Classic and Super Classic, but the performance degradation follows a largely similar trajectory. It is interesting to note that Super Server performance remains largely unchanged. It should also be noted that Classic and Super Classic outperform Super Server even at the 3ms mark.


10 Warehouses, 50 Terminals
[Chart: 10 W 50 T - NOPM vs. network latency (0.0-3.0 ms) for fb306-cs, fb306-sc and fb306-ss]

With 50 terminals and only 8 cores/16 threads we start to see the server itself becoming the bottleneck. Super Server performance has degraded by about 25%, and network latency does not play a major role in this. Classic and Super Classic show some loss of performance, but it is not as severe as what we saw with 20 terminals. In fact the extra terminals offset the latency: new orders per minute at the 3ms mark are much higher than in the previous example.


10 Warehouses, 100 Terminals
[Chart: 10 W 100 T - NOPM vs. network latency (0.0-3.0 ms) for fb306-cs, fb306-sc and fb306-ss]

With 100 terminals we see the same trend continuing. Again the system as a whole is overloaded, with network latency accentuating the problem for Classic and Super Classic.


Initial Summary

It seems clear that, in the absence of other bottlenecks, network latency will have an impact on Firebird performance. As server load increases this impact is lessened, which indicates that the bottleneck(s) have moved back to CPU and disc.

Network latency at the micro-second level, up to 500 micro-seconds

In this section we will look at the changes as latency increases in 100 micro-second increments, starting from zero latency. That is to say, we are looking at very small amounts of latency.

10 Warehouses, 10 Terminals
[Chart: 10 W 10 T - NOPM vs. network latency (0.0-0.5 ms) for fb306-cs, fb306-sc and fb306-ss]

Even with a small amount of latency the same downward trend is visible, and the impact starts to become noticeable around the 300 micro-second mark. Again Super Server underperforms here, but is largely unaffected by the latency.


10 Warehouses, 20 Terminals
[Chart: 10 W 20 T - NOPM vs. network latency (0.0-0.5 ms) for fb306-cs, fb306-sc and fb306-ss]

This is an interesting graph. Other tests we have carried out on this hardware and software configuration have shown that around 20 terminals is the optimum configuration for maximum performance. Here we see all three architectures give their best results with 20 terminals, and network latency has only minimal impact up to the 500 micro-second mark.


10 Warehouses, 50 Terminals
[Chart: 10 W 50 T - NOPM vs. network latency (0.0-0.5 ms) for fb306-cs, fb306-sc and fb306-ss]

Again, at the micro-second level we see limited impact from network latency. It is there, but probably not noticeable. The more important factor is that too many terminals are running, and the overall throughput of new orders is down compared to the throughput attained with just 20 terminals.


10 Warehouses, 100 Terminals
[Chart: 10 W 100 T - NOPM vs. network latency (0.0-0.5 ms) for fb306-cs, fb306-sc and fb306-ss]

As the server is overloaded we see almost zero effect of network latency up to 500 micro-seconds. But overall performance in terms of new orders per minute is down by around 15% for all architectures.


Summary of this section

So long as network latency stays below the 500 micro-second mark it is clear that performance is not heavily impacted in this test - but see below for a discussion about blobs.

Looking at Network Latency from another angle

Let’s spin the data around and look at the impact of network latency on the relationship between new orders and the number of terminals. First, when latency is counted in milliseconds:

[Chart: Fb 3.0.6 Classic - NOPM vs. terminals (10-100), one line per latency from 0.0 to 3.0 ms]
[Chart: Fb 3.0.6 Super Classic - NOPM vs. terminals (10-100), one line per latency from 0.0 to 3.0 ms]
[Chart: Fb 3.0.6 Super Server - NOPM vs. terminals (10-100), one line per latency from 0.0 to 3.0 ms]

For Classic and Super Classic there appears to be a correlation between net latency and the number of terminals: the impact of net latency can be offset by adding terminals (i.e., increasing the size of the connection pool). This eventually tops out, and adding too many terminals will see performance start to fall. The effect is hardly noticeable with Super Server. Although it is counter-intuitive, it looks as if adding connections to the pool would be a good solution for CS and SC under these circumstances.

If we turn to look at the results when net latency can be measured in micro-seconds we see a similar picture:

[Chart: Fb 3.0.6 Classic - NOPM vs. terminals (10-100), one line per latency from 0.0 to 0.5 ms]
[Chart: Fb 3.0.6 Super Classic - NOPM vs. terminals (10-100), one line per latency from 0.0 to 0.5 ms]
[Chart: Fb 3.0.6 Super Server - NOPM vs. terminals (10-100), one line per latency from 0.0 to 0.5 ms]

Because net latency is minimal, the effect of adding terminals is less dramatic. However, the performance boost is sufficient to make it worthwhile.

What we know so far

Network latency is a thing, and in the absence of other bottlenecks it will have an impact on performance. In general this impact is fairly low, as long as the latency can be measured in micro-seconds. But…

When latency is 100 milliseconds or more…

It is clear that as network latency increases performance decreases. Let’s see what happens:

10 Warehouses, 10 Terminals
[Chart: 10 W 10 T - NOPM vs. network latency (0-200 ms) for fb306-cs, fb306-sc and fb306-ss]

With 10 terminals performance crashes. At the 200ms mark NOPM drops to around 20 for CS and SC, with SS hitting 40. Is this a problem? Obviously, with network latency this high, high performance is out of the question. The only question that remains is whether performance would be adequate for a handful of users.

It is interesting to note that Super Server clearly outperforms Classic and Super Classic after around the 3ms point.


10 Warehouses, 20 Terminals
[Chart: 10 W 20 T - NOPM vs. network latency (0-200 ms) for fb306-cs, fb306-sc and fb306-ss]

With 20 terminals we see a similar story, although overall performance is improved - even at the 200ms mark we see double the performance. This is a good sign, because it shows that as net latency increases we can improve performance by adding more connections to the pool. This is contrary to the experience with low latency, where adding terminals degrades performance.


10 Warehouses, 50 Terminals
[Chart: 10 W 50 T - NOPM vs. network latency (0-200 ms) for fb306-cs, fb306-sc and fb306-ss]

This graph is also interesting. Super Server only starts to outperform CS and SC at around the 10ms point, and the performance degradation is much less severe. When we hit the 200ms mark, overall performance is still double that of 20 terminals.


10 Warehouses, 100 Terminals
[Chart: 10 W 100 T - NOPM vs. network latency (0-200 ms) for fb306-cs, fb306-sc and fb306-ss]

Here we see the overall trends continue. When running with this many connections, SS only becomes the best performer after the 25ms mark, and at 200ms all architectures are still processing more new orders than with 50 terminals.


Turning the numbers around again…

As before, let’s look at what happens when we spin the numbers around.

[Chart: Fb 3.0.6 Classic - NOPM vs. terminals (10-100), one line per latency from 0 to 200 ms]
[Chart: Fb 3.0.6 Super Classic - NOPM vs. terminals (10-100), one line per latency from 0 to 200 ms]
[Chart: Fb 3.0.6 Super Server - NOPM vs. terminals (10-100), one line per latency from 0 to 200 ms]

These graphs show the tendency quite clearly: as network latency increases, the connection pool can and must be increased. But in all cases, if high performance is required then net latency must be eliminated.

Network Latency and Blobs

Blobs present a problem as far as network latency is concerned. The Firebird network protocol pulls result sets in batches of rows, packing as many rows as will fit into each packet, so a single SQL request needs only a few exchanges between server and client. Blobs, however, are retrieved separately, and each blob requires several exchanges to set up the transmission. This has a multiplicative effect on the number of packets sent: a result set of ten rows, each with a single blob, will see approximately ten times the number of packets exchanged across the network, and this is before we even consider the size of the blobs themselves.
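
To put rough numbers on it (the three round trips per blob is an assumption for illustration, not a measured figure):

    # illustrative only: 10 rows, an assumed 3 extra round trips per blob,
    # at 100ms round-trip latency
    echo "$(( 10 * 3 * 100 ))ms of pure wait time for one small result set"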

The benchmark we have used does not include blobs, so the graphs above tell us nothing about how net latency affects blob performance. This will be the subject of a future article.

What does this mean for the user?

This benchmark only tells us that data throughput is reduced; it doesn’t tell us how the user experiences the latency. The benchmark produced some data on this, but it is awaiting detailed analysis and will also be the subject of a future article.

What can be done to mitigate the impact of net latency?

The problem of net latency highlights the limits of the client-server model. If network latency is a problem then a three-tier structure is recommended. The middle tier should be as physically close to the database server as possible, and should manage user requests rather than the server receiving them directly. This model is typically deployed as a web interface, although that is not the only way to implement a middle tier.

Firebird configuration settings

Firebird itself has limited options for TCP configuration.

TcpRemoteBufferSize

This controls the maximum size of each packet; the firebird.conf default is 8192 bytes. Changing it to 32767 had a statistically insignificant impact on performance with this particular benchmark. This may well be due to the relatively short row size of the tables and the small result sets returned.
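
The change itself is a single line in firebird.conf:

    # default is 8192 bytes
    TcpRemoteBufferSize = 32767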

Other possibilities

Host O/S configuration

This is an area which would require several articles in itself. Changes are certainly possible; however, it is debatable whether the effort required to study and test them would be repaid with significant performance improvements. Changing the host O/S network config presents many problems (a small illustration follows the list):

  • There are typically hundreds of options, often poorly documented.

  • There is no immutable default config which can be used as a baseline. That is to say, the defaults will change from one O/S version to another. This is particularly a problem on Linux, where values may change with each new kernel release and the distros themselves tweak the values for different use cases.

  • The default configurations are typically conservative.

  • There are many articles which discuss this subject, but they typically use examples from a specific system. The values will not apply across all systems and will often be out of date as new versions of a distro or an O/S appear.

  • Measuring the impact of a change or set of changes is difficult and time consuming. The test runs presented here took many hours to complete.
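
As a small illustration of the kind of knob involved, the sysctl keys below exist on current Linux kernels, but whether changing them actually helps a given deployment is precisely the open question:

    # inspect the current TCP buffer limits
    sysctl net.core.rmem_max net.core.wmem_max
    sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem

    # a reversible change sometimes suggested for latency-sensitive workloads;
    # the value is illustrative, not a recommendation
    sysctl -w net.ipv4.tcp_slow_start_after_idle=0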

Actual Hardware Problems

If the physical distance between the server and the client is not great, it is worth eliminating the possibility of a hardware problem somewhere: bad cables, loose connections, or perhaps a dying switch.
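
Two quick checks under Linux (again, eth0 stands in for the real interface):

    # a failing cable often shows up as a renegotiated, lower link speed
    ethtool eth0

    # rising error or drop counters point at hardware or driver trouble
    ip -s link show dev eth0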

The routine disclaimer and denial of all responsibility

As usual with synthetic benchmarks, this data and the conclusions drawn from it should be taken with a pinch of salt. The results apply to the specific combination of hardware, software and test bench used; other configurations will produce different results. However, the results do confirm the hypothesis, and are in line both with previous experience and with expectations of database server behaviour.

Conclusions

  • These tests do give strong support to the premise that network latency has an impact upon Firebird performance.

  • The data collected so far indicates that performance loss will be noticeable at around the 0.5 millisecond mark.

  • Other bottlenecks such as CPU (and probably disc I/O for mechanical disc-based subsystems) are more likely to be critical than network latency.