Concurrent Users Estimate?
Are there general rules for how many concurrent users a given hardware configuration can handle (with the default config)?
I'm trying to estimate a number for an 8-core system with 8 GB of RAM.
I'll explain a bit why I'm asking:
I'm trying to load test a Tigase server on an 8-core machine. My Tigase has the default configuration with the MUC component. I'm using tsung to generate 20k users. According to one of the load test articles on this site, a 3 GHz dual core (I guess without MUC) can handle 150k users, so even if MUC has a big performance impact, 20k sounds manageable.
Unfortunately it isn't.
--user-db = mysql
--admins = ...
--user-db-uri = jdbc:mysql://...
config-type = --gen-config-def
--virt-hosts = xxx.net
--comp-class-1 = tigase.muc.MUCComponent
--comp-name-1 = muc
JVM parameters: -server -Xms6G -Xmx6G -XX:+UseLargePages -XX:+UseBiasedLocking -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelCMSThreads=8
My tsung test connects the users and waits for the global ack at 20k users. Every user joins one of 200 rooms (meaning about 100 users per room) and then sends one message every 30 seconds. The CPU usage is OK, and the memory consumption is OK too.
The tsung results say that a request takes 1 min 31 s (mean). The longer the chat phase lasts, the longer the request time gets.
I'm currently looking into the MySQL database, but the ping between the servers is extremely short, so the database server shouldn't be the bottleneck.
Beyond that, I'm clueless...
There are no rules, unfortunately. Almost every installation I was involved in was very different from the others. Maybe this is because Tigase is rarely installed as a "standard" chat service. Normally these are large installations with specific features which affect resource usage.
The test reports published on my website should give you some indication:
- 1mln online users on a cluster of 10 nodes: the max number of concurrent users I could get was 160k per machine, each user has 150 friends in the roster, 8 cores, 16GB RAM
- half a million on a single machine: 500k concurrent users on a single machine, empty roster, SPARC Enterprise T5220, 32GB RAM
- service sharding: max 24k concurrent users on a single machine in single mode, 10k in cluster mode, roster size 150, Intel Atom N270 machines with 2GB of RAM
Normally Tigase can process up to 10k packets per second per single CPU core.
CPU is used by the traffic, memory mainly by the roster.
I hope this helps.
Unfortunately, this might explain some things.
My tsung client logged that it received at most 120.32 Mbit/s, and the Erlang monitor logged that my server received (recvpackets:os_mon@...) 365887.00 packets/s (max) and 2865.00 packets/s (min).
So even if my cores are extremely fast and manage double your 10k, that seems way too much.
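To make the comparison concrete, here is a small Java sketch that checks an observed peak packet rate against the rule of thumb of roughly 10k packets per second per core. The per-core figure is only the rough estimate quoted above, not a measured limit of this machine, and the observed peak is the os_mon figure from the previous post, which may count network-level packets rather than XMPP stanzas. The class and method names are made up for illustration.

```java
/** Rough capacity check based on the "about 10k packets/s per core" rule of thumb. */
final class CapacityCheck {

    /** Estimated packets per second the whole machine can process. */
    static long estimatedCapacity(int cores, long packetsPerSecondPerCore) {
        return (long) cores * packetsPerSecondPerCore;
    }

    /** How many times the observed load exceeds the estimated capacity. */
    static double overloadFactor(long observedPacketsPerSecond, int cores, long perCore) {
        return (double) observedPacketsPerSecond / estimatedCapacity(cores, perCore);
    }

    public static void main(String[] args) {
        int cores = 8;
        long observedPeak = 365_887; // max recvpackets/s reported by the os_mon monitor above
        for (long perCore : new long[] {10_000, 20_000}) { // rule of thumb, and "double" it
            System.out.printf("per core %,d -> capacity %,d packets/s, overload x%.2f%n",
                    perCore, estimatedCapacity(cores, perCore),
                    overloadFactor(observedPeak, cores, perCore));
        }
    }
}
```

Even at twice the quoted per-core rate (160k packets/s for 8 cores), the observed peak is still about 2.3 times the estimate.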
I don't understand, could you please explain what you mean? Do you think Tigase is too slow, or that the traffic is too high? With how many user connections have you run your tests?
Because I'm using MUC, one incoming packet to a channel with 100 users becomes 100 packets, so I guess this kills the server (the outgoing packets count against the 10k, right?).
But now I'm uncertain: how can my masses of packets kill the server when the CPU mostly works at 50 to 60%? I'm clueless again.
I wanted to implement a regular chat system for an online game using XMPP and Tigase. I'm not sure this is possible now.
I want an edit button.
I'm running a tsung test with 20 000 users. These users use the multi-user chat and generate (according to tsung) up to 360 000 packets per second (I guess most of them are generated by the multi-user chat).
If you are right and one core can process 10 000 packets per second, my server can't handle this many packets. But then, why isn't my CPU running at 100%?
Of course, the first step is to find out where the bottleneck is. Make sure logging is switched off on the Tigase side, as it slows the server down significantly. Second, make sure the MUC does not log chat history to the hard drive. These two small tweaks can improve performance considerably.
But this is not all: the user connection rate, i.e. how many logins per second are executed, may also affect the overall system, as it puts stress on the database.
And a final note: the MUC installed with the Tigase server by default is not designed to take full advantage of multi-CPU/multi-core systems, so this might be another place to look for improvements.
The bottleneck seems to be the MUC component itself.
Here is my theory again:
If I have a channel with 100 members, one incoming message triggers 100 outgoing messages that need to be processed. In my load test with 20 000 users in 200 channels (one message every 30 seconds), this led to a peak of 360 000 packets per second and processing times of 800 seconds and more.
I will try to verify this tomorrow, and I hope I'm wrong.
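The fan-out theory can be checked with simple arithmetic. The Java sketch below computes the average incoming and outgoing MUC message rates for the scenario described (20 000 users, 200 rooms, one message every 30 seconds), assuming users are spread evenly over the rooms and sends are spread evenly over time. It relies on the MUC behaviour from XEP-0045, where a groupchat message is reflected to every occupant, including the sender. The class and method names are illustrative only.

```java
/**
 * Average MUC fan-out load, assuming users are spread evenly over rooms and
 * each user sends at a fixed interval with sends spread evenly over time.
 */
final class MucFanOut {

    /** Messages per second arriving at the MUC component. */
    static double incomingPerSecond(int users, double sendIntervalSeconds) {
        return users / sendIntervalSeconds;
    }

    /** Messages per second the MUC component must deliver to room occupants. */
    static double outgoingPerSecond(int users, int rooms, double sendIntervalSeconds) {
        double occupantsPerRoom = (double) users / rooms;
        // every groupchat message is reflected to every occupant, including the sender
        return incomingPerSecond(users, sendIntervalSeconds) * occupantsPerRoom;
    }

    public static void main(String[] args) {
        // scenario from the post: 20 000 users, 200 rooms, one message every 30 s
        System.out.printf("incoming: %.1f msg/s%n", incomingPerSecond(20_000, 30));
        System.out.printf("outgoing: %.1f msg/s%n", outgoingPerSecond(20_000, 200, 30));
    }
}
```

This gives about 667 incoming and 66 667 outgoing messages per second on average. If sends are bunched together, the instantaneous rate is higher than this average.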
I hope you are right. If the MUC component is indeed the bottleneck, then it is relatively easy to fix.
Now I'm curious: how would I fix that? (Besides removing the component; I really need MUC...)
What I meant is that if the problem is identifiable, then it is solvable. Some mysterious problems happen very rarely and are impossible to reproduce in a test environment. Problems like these are extremely hard to solve.
So if you can confirm that the MUC is the bottleneck, "we" can improve the MUC implementation. If MUC does not use all available CPUs/cores, we can fix the code to make better use of them. If MUC uses all the cores but still does not offer good enough performance, we can look at the MUC code and see what else can be optimized and improved.
The thing is that I have never run load tests against the MUC component, so this part may be the least optimized in Tigase. That is actually a good thing to know, because there is most likely a lot of room to make it better.
I tested different scenarios, everything with the default config and 20 000 users:
- 20 000 users distributed over 150 channels, everybody sends ONE message at the same time (as synchronously as possible with tsung). Result: over one minute response time (mean)
- 20 000 users distributed over 150 channels, everybody sends one message, all messages are sent within a 20-second timeframe. Result: 46.32 s response time (mean)
- 20 000 users distributed over 300 channels, everybody sends one message, all messages are sent within a 20-second timeframe. Result: 47.87 s response time (mean)
Originally I thought the high number of packets generated by the MUC component would be the bottleneck; that's why I increased the number of channels for the last test: fewer users per channel means fewer generated packets.
But since the mean response time didn't change, I guess the bottleneck is the number of packets that go into the component. But 1000 incoming packets per second seems pretty low. MUC seems to perform very poorly.
I'll do a last test with one of your tsung scripts that doesn't use MUC. If the performance is much better than with MUC (and I expect that to happen), I will be sure.
Everything is back to the beginning.
The reference test with your tsung script from http://www.tigase.org/files/static/sun-tests/tsung_noroster_1mln_20090427/report.html was actually worse than my last test. I will talk to our admins tomorrow... I just don't get how I get 2-minute response times for simple chat stuff while the CPUs idle at 30 percent.
Please note that if you distribute all users across 150 channels and send 1000 msg/sec, then the system has to handle 1000*133 msg/sec. This is quite heavy traffic.
I wonder what the actual goal of your tests is. I can imagine 3 goals: one would be finding the system limits, the absolute max you can handle with the system; another is finding out how the system performs under your expected average load; and the last is finding out what equipment is required for your production system.
Another note: finding the bottleneck in Tigase is actually quite simple. You just have to look at the server stats during the test. The stats show you the system load, CPU, RAM, and also the queue sizes in all components. So if one of the components (MUC) is the bottleneck, you should see the queues growing on the MUC component.
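The check described above (look for a component whose queue keeps growing) can be automated once queue sizes are sampled periodically during the test. The Java sketch below assumes you already collect one queue-size reading per component per interval; how the samples are obtained (stats page, JMX) is not shown, and the component names and readings in the usage are invented for illustration, not real Tigase output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Flags components whose queue size keeps growing across consecutive samples. */
final class QueueWatch {

    /**
     * Returns the names of components whose queue never shrank between samples
     * and grew overall by more than minGrowth from the first to the last sample.
     */
    static List<String> growingQueues(Map<String, int[]> samples, int minGrowth) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, int[]> e : samples.entrySet()) {
            int[] s = e.getValue();
            if (s.length < 2) continue; // need at least two readings
            boolean neverShrank = true;
            for (int i = 1; i < s.length; i++) {
                if (s[i] < s[i - 1]) { neverShrank = false; break; }
            }
            if (neverShrank && s[s.length - 1] - s[0] > minGrowth) {
                suspects.add(e.getKey());
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        Map<String, int[]> samples = new LinkedHashMap<>();
        // invented readings, for illustration only
        samples.put("c2s", new int[] {3, 0, 5, 1});
        samples.put("sess-man", new int[] {10, 12, 9, 11});
        samples.put("muc", new int[] {50, 400, 1200, 3100});
        System.out.println(growingQueues(samples, 100)); // prints [muc]
    }
}
```

Queues that fluctuate around a small value are normal; a queue that only ever grows during a steady load points at the component that cannot keep up.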
And once again, as I mentioned before, the MUC component might not be able to use all CPU cores on the machine. It wasn't tested under heavy load, so it may well be the bottleneck.
If you are interested, we can work together on improving the MUC.
I am very curious about this thread. My team is working with a Tigase cluster, but we haven't gotten to performance/load testing the cluster yet. When we do, any insight from these tests would be helpful, so please keep the thread going if possible.
Well, I'm trying to replace a home-brewed piece of software. In a real scenario, this current chat server handles 40k users who don't chat much (1 message per 30 minutes per user) in channels with 100 to 150 users.
This server runs on my 8-core, 8 GB machine. I wanted to know how Tigase compares with it. My worst case was: everybody writes his message at nearly the same time. I didn't realize at first that this would generate nearly 3 million packets (20k users * 133 users per channel, about 2.66 million). I can only guess that our current server would also fail in this situation.
But what I find strange is that when I used a load test script from your page (which, as I understand it, only sends presence information), this didn't change anything. And my users have an empty roster. I still get response times for requests of 40 seconds (mean) while the CPU runs at 60% maximum. There has to be a very strange bottleneck somewhere. I will test the speed of the database tomorrow; maybe the system spends a long time waiting.
I haven't used the server stats yet, only YourKit (remote Java profiling) and the tsung reports. I will ta
Shawn, you are very welcome to contribute to the discussion. I appreciate any input, ideas or suggestions. We can all benefit from exchanging thoughts on this topic. Load testing is surprisingly complex, and you only start to realize that after running several tests.
What is your user connection rate? The rate at which new users connect may also affect server performance. If you try to connect users at a greater rate than your database can handle, packets may queue up on the server, which of course slows everything down. Then, if you start your load test on MUC or anything else while your users are still being authenticated, this may blur the overall picture.
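The effect described above is ordinary queueing: when logins arrive faster than the database can authenticate them, the backlog grows for as long as the ramp lasts, and every request behind it waits longer. The Java sketch below models this with two constant rates, which is a simplifying assumption (real database throughput varies), so treat it as a back-of-the-envelope check. The figures in the usage are invented, not measurements from this thread.

```java
/** Back-of-the-envelope model of a login backlog, assuming constant rates. */
final class LoginBacklog {

    /** Pending logins after rampSeconds, if arrivals outpace the database. */
    static double backlogAfter(double arrivalsPerSecond, double dbAuthsPerSecond,
                               double rampSeconds) {
        double excess = arrivalsPerSecond - dbAuthsPerSecond;
        return excess > 0 ? excess * rampSeconds : 0.0;
    }

    /** Seconds a login arriving at the end of the ramp waits before it is served. */
    static double waitAtEndOfRamp(double arrivalsPerSecond, double dbAuthsPerSecond,
                                  double rampSeconds) {
        return backlogAfter(arrivalsPerSecond, dbAuthsPerSecond, rampSeconds) / dbAuthsPerSecond;
    }

    public static void main(String[] args) {
        // invented figures: 500 logins/s offered, database authenticates 200/s, 40 s ramp
        System.out.printf("backlog: %.0f logins%n", backlogAfter(500, 200, 40));
        System.out.printf("wait: %.0f s%n", waitAtEndOfRamp(500, 200, 40));
    }
}
```

With these made-up numbers the last login waits a full minute, which is why it helps to let all logins finish before the chat phase of the test starts.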
And of course, a similar load test on your own software would give you a good comparison.
You should really look at the Tigase server stats, at least the JMX stats. They would give you a very clear picture of what is really going on inside the Tigase server.
I continued my tests using tsung, YourKit (Java profiler) and the server stats.
The main bottleneck is the MUC component, because it's single-threaded.
The MUC component thread is saturated at somewhere between 3k and 6k packets per minute to channels with 100 users. At this point the queues start to fill up and the response times go through the roof.
Basically, in this scenario (and on my machine) MUC needs something like 10 ms per packet (according to the server stats).
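A service time of about 10 ms per packet caps a single thread at roughly 100 packets per second, i.e. 6 000 per minute, which matches the upper end of the saturation range reported above. The Java sketch below turns this into the number of threads needed for a target rate, under the optimistic assumption that throughput scales linearly with threads (no lock contention, enough free cores). The method names are illustrative.

```java
/** Throughput limits derived from a measured per-packet service time. */
final class ServiceTime {

    /** Maximum packets per second one thread can process. */
    static double maxRatePerThread(double serviceTimeMillis) {
        return 1000.0 / serviceTimeMillis;
    }

    /** Threads needed to keep up with a target rate, assuming linear scaling. */
    static int threadsNeeded(double targetPacketsPerSecond, double serviceTimeMillis) {
        return (int) Math.ceil(targetPacketsPerSecond / maxRatePerThread(serviceTimeMillis));
    }

    public static void main(String[] args) {
        double serviceMs = 10.0; // per-packet time reported by the server stats
        System.out.printf("one thread: %.0f packets/s (%.0f per minute)%n",
                maxRatePerThread(serviceMs), maxRatePerThread(serviceMs) * 60);
        // e.g. 20 000 users each sending once within 20 s, i.e. 1000 packets/s incoming
        System.out.println("threads for 1000 packets/s: " + threadsNeeded(1000, serviceMs));
    }
}
```

The result, 10 threads, already exceeds the 8 cores of the test machine, so if the 10 ms is pure CPU time, more threads alone would not be enough and the per-packet cost would also have to come down.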
We are currently discussing whether we have the time to make MUC multithreaded ourselves, because the current implementation wouldn't meet our requirements.
I understand your concerns. I actually have good news for you. Turning the MUC into a multi-threaded component is very easy. Detailed instructions on how to switch a component to multi-threading are in this guide.
The reason MUC does not have this implemented is that the MUC implementation predates the API and the guide. In fact, the API is automatically available to all components; it is just that the default number of threads for a component is 1. This is because processing packets concurrently in a component requires careful coding to avoid resource conflicts.
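The Tigase guide is the place to look for the actual API. To illustrate why careful coding is needed, and one common way around it, the plain-Java sketch below (not Tigase code) routes each packet to one of N single-threaded workers chosen by hashing the room name. All packets for a given room are then processed in order by the same thread, so per-room state is never touched by two threads at once, while different rooms are processed in parallel. A packet is reduced to a room name and a body here, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical router: N single-threaded workers, picked by hashing the room name. */
final class PartitionedProcessor implements AutoCloseable {
    private final List<ExecutorService> workers = new ArrayList<>();
    private final BiConsumer<String, String> handler;

    PartitionedProcessor(int threads, BiConsumer<String, String> handler) {
        for (int i = 0; i < threads; i++) {
            // a single thread per worker keeps processing order within a partition
            workers.add(Executors.newSingleThreadExecutor());
        }
        this.handler = handler;
    }

    /** Same room name -> same worker index, so one room never runs on two threads. */
    int workerFor(String room) {
        return Math.floorMod(room.hashCode(), workers.size());
    }

    void submit(String room, String body) {
        workers.get(workerFor(room)).execute(() -> handler.accept(room, body));
    }

    @Override
    public void close() throws InterruptedException {
        for (ExecutorService w : workers) w.shutdown();
        for (ExecutorService w : workers) w.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        try (PartitionedProcessor p = new PartitionedProcessor(4,
                (room, body) -> System.out.println(Thread.currentThread().getName()
                        + " " + room + ": " + body))) {
            p.submit("lobby", "hello");
            p.submit("lobby", "world"); // handled after "hello", on the same thread
            p.submit("guild-42", "hi");
        }
    }
}
```

The trade-off is that one very busy room can still only use one thread; the design gives up some parallelism in exchange for not needing locks around room state.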
I tried the simple switch to more threads for the MUC but ran into some conflicts. I did not have time to look into them, so I had to revert to single-thread mode.
If you have the time and resources, I think the fix should be simple enough to apply within 2-3 days of work at most.
We did a quick and dirty implementation with an executor pool and a Runnable for each MUC packet. (It's simple because the only connection point between the MUC threads is the imucrepo; simply change the maps in the in-memory MUC repo to ConcurrentHashMaps.) Obviously this isn't stable yet, but it was good enough to do some tests with it.
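Swapping plain maps for ConcurrentHashMaps makes individual map operations thread-safe, but a "get, and create the room if it is missing" done in two steps can still race and create two rooms. The Java sketch below uses computeIfAbsent for atomic room creation and a concurrent set for occupants. The class and method names are hypothetical and do not mirror Tigase's in-memory MUC repository.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical thread-safe in-memory room store. */
final class InMemoryRooms {

    /** A room with a thread-safe occupant set. */
    static final class Room {
        final String name;
        final Set<String> occupants = ConcurrentHashMap.newKeySet();
        Room(String name) { this.name = name; }
    }

    private final ConcurrentMap<String, Room> rooms = new ConcurrentHashMap<>();

    /** Atomically returns the existing room or creates it: never two Room objects per name. */
    Room getOrCreate(String name) {
        return rooms.computeIfAbsent(name, Room::new);
    }

    void join(String room, String nick) {
        getOrCreate(room).occupants.add(nick);
    }

    void leave(String room, String nick) {
        Room r = rooms.get(room);
        if (r != null) r.occupants.remove(nick);
    }

    int occupantCount(String room) {
        Room r = rooms.get(room);
        return r == null ? 0 : r.occupants.size();
    }
}
```

Deleting a room when its last occupant leaves is another check-then-act step and would need extra care, so that a concurrent join cannot land in a room that is being removed.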
We found a major resource problem that affects channel joins and channel leaves. Both operations have a big memory and CPU impact (CPU mainly through GC operations). That means that while our simple multithreaded MUC handled our message load easily, a high number of presence packets to the MUC killed the server. Compared to message packets, the presence packets needed ten times the memory during processing (which results in more GC and therefore higher CPU consumption).
I had to wrap up my evaluation yesterday, so I don't have time to take a closer look at this issue.
Please do not create your own thread pool. Tigase already has an API for this, which is well tested and well established. You do not really have to reinvent the wheel.
For Tigase's MUC, you really only have to change the return value of a single method and test it against possible resource conflicts.
Also, if you decide to write your own MUC implementation from scratch, you should use the Tigase API as well, both to reduce your development effort and to make sure it is fully compatible with the Tigase architecture.
There is a step-by-step guide for component development in Tigase: http://www.tigase.org/content/component-development