Testing
About stress testing
Why doesn't the Turms server provide a stress test report
For two simple Java functions that do the same function, we can easily test their respective performance through JMH. But for projects as large as Turms, there is no such silver bullet. Its complexity is mainly reflected in the following three aspects:
Turms supports a variety of different architectures, and many functions also support opening and closing.
For example, in configuration parameters It is mentioned in the chapter that some use cases do not even require data storage, so under the same conditions, the natural throughput of applications without storage is faster than those with storage.
Another example is our About message accessibility, order and repeatability mentions that the Turms server disables the 100% message delivery is supported. The reason is that there is a price to support 100% message delivery. It requires at least one Redis server to distribute serial numbers at the session level. Every time a message is sent, a serial number needs to be requested. The throughput is naturally not as good as the scenario that does not support 100% message delivery.
Another example is whether the server needs to push the user status when the user logs in. For scenarios that do not need to be pushed, the pressure on the server is naturally much less than that for scenarios that need to be pushed.
Another example is in the Observability System - Log chapter It is mentioned that the Turms server defaults and recommends 100% log sampling of user requests, and 100% sampling requires a large number of I/O operations, and its throughput is naturally not as good as that of operations that do not sample at all.
For the realization of most of the business requests, most of the requests sent by the Turms server to MongoDB are used for user authority verification and data verification, and only a small part are for the final execution of the business functions instructed by the users. For example, if user A bans user B in group 123, the status of user A, user B, and group 123 must be checked separately, and user B will be banned only if all checks pass. Without these checks, the throughput will naturally be much higher, but there is no real project other than toy projects that will not do checks.
The Turms server usually deletes related data through distributed transactions when deleting certain business data. Without distributed transactions, the throughput will naturally be higher than with distributed transactions, but it is easy to generate dirty data.
Turms provides many caching functions and will support more caching functions in the future. Caching is a classic example of trading space for time. Taking the group member cache as an example, when a group member sends a message, the Turms server needs to query the member list of the group. For the scenario where the cache is used, the Turms server can query based on the local Map, and its throughput is naturally much higher than the scenario where the query request is sent to the database without using the cache, but the advantage of the scenario without using the cache lies in the group list High real-time performance.
The stress test results of stand-alone and distributed are also completely different. Even the Turms server will support: In the stand-alone deployment scenario, the Turms server supports Unix Domain Socket without using TCP connections.
To sum up, if Turms only wants to write a good-looking stress test report, the Turms server does not need to store any data, guarantee that messages will arrive, push user status, or perform permission verification, data verification and logging on user requests. Sampling, all business operations do not use transactions, all data is cached for a long time, etc., the final throughput is naturally high, but such a stress test report is like a castle in the air, and not many real scenarios will use such a set of configurations . This is not only the reason why our developers of medium and large applications are reluctant to provide simple stress test reports, but also the reason why we do not trust the pressure test reports provided by other medium and large applications.
When we look at the performance of any application, whether it is fast or slow, we mainly ask "why is it so fast/slow?". For example, when we are researching why the JVM takes up so much memory, if we only see Java's extremely redundant and common object headers, we will sigh "It turns out that it is a redundant design problem, it is the author's bad design, no wonder takes up so much memory”, but if we look at the design and use of Code Cache by JVM, we will sigh again, “It turns out that space is exchanged for time, and it is the author’s good intentions. No wonder it takes up so much memory”, and the evaluation direction is completely different. .
In the final analysis, laymen watch the excitement, and experts watch the way. Not to mention medium and large applications, even if it is a small Java library, we can't believe it when we look at its performance report. For example, Log4j2 shows its excellent performance in its performance report, but if we read its source code, we will find that Log4j2 The implementation is not efficient, and it is Turms' self-developed log implementation based on Netty that really maximizes performance (for details, please refer to Self-developed Implementation design document, and the specific code im.turms.server.common.logging.core.logger.AsyncLogger#doLog
), as long as the comparison The implementation of the source code of the two will find that the two are not at the same level in terms of performance optimization. In order to facilitate users to see the doorway of Turms, the documents are written in detail and the location of key codes will also be marked, so that users can evaluate whether Turms is suitable for their own application scenarios.
turms-performance-testing project (preview document)
Although Turms does not plan to provide a ready-made stress test report, we will customize a distributed stress test platform for the Turms server in the near future. The platform's UI display and report analysis will be in charge of turms-admin, while node control and task execution will be in charge of the Controller node and Agent node in turms-performance-testing respectively.
In particular, the reason why Turms can quickly customize and develop many platforms also benefits from our reasons for secondary development based on Turms mentioned "controllability. The Turms project is 100% open source and has self-developed many basic middleware to ensure the controllability of the underlying technology. It avoids insufficient development motivation in the later stage of the project", so we will not be subject to third-party dependence when we do new projects, and we are full of motivation.