How packets are processed by the SM and plugins

For Tigase server plugin development it is important to understand how it all works. There are different kinds of plugins responsible for processing packets at different stages of the data flow. Please read the introduction below before proceeding to the actual coding part.

Introduction

In the Tigase server, plugins are pieces of code responsible for processing particular XMPP stanzas. A separate plugin might be responsible for processing messages, a different one for processing presences, and there might be separate plugins responsible for the iq roster, for iq version, and so on.

A plugin declares the exact XML element name(s), together with their xmlns, it is interested in. So you can, for example, create a plugin which is interested in all packets containing a caps child.

There might be no plugin for a particular stanza element; in that case a default action is applied, which is simply forwarding the stanza to its destination address. There might also be more than one plugin for a specific XML element, in which case they all process the same stanza simultaneously in separate threads, so there is no guarantee of the order in which the stanza is processed by the different plugins.
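
To make this concrete, below is a minimal sketch of what such a plugin can look like. The class and method names (XMPPProcessor, XMPPProcessorIfc, supElements(), supNamespaces(), process(...)) are recalled from the Tigase 5.x plugin API and may differ slightly in your server version; the element and namespace values are only an example.

    import java.util.Map;
    import java.util.Queue;

    import tigase.db.NonAuthUserRepository;
    import tigase.server.Packet;
    import tigase.xmpp.XMPPException;
    import tigase.xmpp.XMPPProcessor;
    import tigase.xmpp.XMPPProcessorIfc;
    import tigase.xmpp.XMPPResourceConnection;

    // Example plugin skeleton: declares interest in <message/> stanzas
    // in the jabber:client namespace.
    public class ExampleMessagePlugin extends XMPPProcessor implements XMPPProcessorIfc {

        @Override
        public String id() {
            return "example-message-plugin";
        }

        @Override
        public String[] supElements() {
            // XML element name(s) this plugin wants to receive
            return new String[] { "message" };
        }

        @Override
        public String[] supNamespaces() {
            // xmlns for each of the elements listed above
            return new String[] { "jabber:client" };
        }

        @Override
        public void process(Packet packet, XMPPResourceConnection session,
                NonAuthUserRepository repo, Queue<Packet> results,
                Map<String, Object> settings) throws XMPPException {
            // Work on the packet here; any packets which should leave the
            // server are added to the 'results' queue (see the forwarding
            // example further below).
        }
    }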

Each stanza goes through the Session Manager component which processes packets in a few steps. Have a look at the picture below:
The picture shows that each stanza is processed by the session manager in 4 steps:

  1. Pre-processing - all loaded pre-processors receive the packet for processing. They work within the session manager thread and have no internal processing queue. Because they run in the Session Manager thread, it is important that they keep processing time to an absolute minimum, as they may affect Session Manager performance.
The intention behind pre-processors is to allow packet blocking. If the pre-processing result is ‘true’ then the packet is blocked and no further processing is performed. (A minimal pre-processor is sketched right after this list.)
  2. Processing - this is the next step the packet goes through if it wasn’t blocked by any of the pre-processors. It is inserted into the queues of all processors which declared interest in this particular XML element. Each processor works in a separate thread and has its own internal, fixed-size processing queue.
  3. Post-processing - if there is no processor for the stanza, the packet goes through all post-processors. The last post-processor, built into the session manager, tries to apply a default action to a packet which hasn’t been processed in step 2. Normally the default action is just forwarding the packet to its destination. Most commonly it is applied to <message/> packets.
  4. Finally, if any of the above 3 steps produced output/result packets, all of them go through all filters, which may or may not block them.
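
As mentioned in step 1, here is a minimal sketch of a blocking pre-processor. The XMPPPreprocessorIfc interface and the preProcess(...) signature are recalled from the Tigase 5.x API and may differ in your version; the blocking rule and the banned address are purely illustrative.

    import java.util.Map;
    import java.util.Queue;

    import tigase.db.NonAuthUserRepository;
    import tigase.server.Packet;
    import tigase.xmpp.XMPPPreprocessorIfc;
    import tigase.xmpp.XMPPProcessor;
    import tigase.xmpp.XMPPResourceConnection;

    // Pre-processor sketch: runs inside the session manager thread, so it
    // must be cheap. Returning 'true' blocks the packet from any further
    // processing.
    public class ExampleBlockingPreprocessor extends XMPPProcessor implements XMPPPreprocessorIfc {

        @Override
        public String id() {
            return "example-blocking-preprocessor";
        }

        @Override
        public boolean preProcess(Packet packet, XMPPResourceConnection session,
                NonAuthUserRepository repo, Queue<Packet> results,
                Map<String, Object> settings) {
            // Illustrative rule only: drop everything coming from one
            // (hypothetical) banned sender, let everything else through.
            String from = packet.getElement().getAttribute("from");
            return (from != null) && from.startsWith("spammer@example.com");
        }
    }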

An important thing to note is that there are two places where packets may be blocked or filtered out: one before the packet is processed by the plugins, and another after processing, where filtering is applied to all results generated by the processor plugins.

It is also important to note that the session manager and processor plugins act as packet consumers. The packet is taken for processing, and once processing is finished the packet is destroyed. Therefore, to forward a packet to a destination, one of the processors must create a copy of the packet, set all properties and attributes, and return it as a processing result. Of course, a processor can generate any number of packets as a result, and result packets can be generated in any of the above 4 steps of the processing. Have a look at the picture below:


If the packet P1 is sent outside of the server, for example to a user on another server or to some component (MUC, PubSub, transport), then one of the processors must create a copy P2 of the packet and set all attributes and destination addresses correctly. Packet P1 has been consumed by the session manager during processing, and a new packet has been generated by one of the plugins.
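
Inside a processor's process(...) method this usually boils down to something like the sketch below. The copyElementOnly() and okResult(...) helpers are Packet methods recalled from the Tigase 5.x API, so the exact names may differ in your version.

    // P1 (the 'packet' argument) is consumed by the session manager, so the
    // plugin emits a copy P2 which actually travels to the destination.
    Packet p2 = packet.copyElementOnly();
    results.offer(p2);

    // Alternatively, when the plugin answers on behalf of the server
    // (for example an <iq/> request), it can generate a reply with the
    // source and destination addresses swapped:
    // results.offer(packet.okResult((String) null, 0));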

The same of course happens on the way back from the component to the user:


The packet from the component is processed and one of the plugins must generate a copy of the packet to deliver it to the user. Of course, packet forwarding is the default action which is applied when there is no plugin for the particular packet.

It is implemented this way because the input packet P1 can be processed by many plugins at the same time; therefore the packet should in fact be immutable and must not change once it has reached the session manager for processing.

The most obvious processing workflow is when a user sends a request to the server and expects a response from the server:


This design has one surprising consequence, though. If you look at the picture below, showing communication between 2 users, you can see that the packet is copied twice before it is delivered to its final destination:


The packet has to be processed twice by the session manager. The first time it is processed on behalf of User A as an outgoing packet, and the second time it is processed on behalf of User B as an incoming packet.

This is to make sure that User A has permission to send the packet out and that all processing is applied to it, and also to make sure that User B has permission to receive the packet and that all processing is applied on the receiving side. If, for example, User B is offline, there is an offline message processor which should put the packet into a database.


Comments

Hi,
This design suggests that in the application we have multiple threads creating plenty of objects (we need a large young generation). The link on Oracle http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html suggests using the throughput collector in this case. Do you agree with this?

This design, actually, allows for high reuse of allocated objects. A single object (XMPP packet) goes through all the required processing. In Tigase the Packet object is kind of unmodifiable and thread safe. By kind of, I mean it is unmodifiable by convention and code implementation; it is not enforced, though.
Therefore, many concurrent threads can process the same XMPP packet concurrently. No extra allocation is necessary.

However, you are partially right. If the service is under a high load, like 50k packets per second, then there are plenty of objects being allocated every second. But this is a result of the server load rather than of the design.

I was experimenting with different GC settings and it seems to me that a large young generation does not work well for Tigase. It causes long delays in processing when the GC goes through all the objects, which in turn causes internal queues to grow and may lead to packet loss. Tigase performs much better when the collection occurs more frequently but takes a shorter time.

I personally recommend using the concurrent mark-sweep collector with incremental mode, which runs the GC in the background whenever possible.
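
For reference, on the HotSpot JVMs of that era (Java 6/7) this maps roughly to the options below; treat the values as illustrative only, and note that CMS and its incremental mode have since been deprecated and removed from newer JDKs.

    # Illustrative JVM options only; availability depends on the JDK version.
    # CMS collector with incremental mode, as recommended above:
    -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
    # Keep the young generation modest (the size here is purely illustrative):
    -Xmn256m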

If you find better settings I would be very grateful if you shared them. Just be aware that testing different GC settings is a very time consuming task.

Hi,
Thanks for the clarification. But what should we choose in case the server works under high load most of the time? I am finding the throughput collector to perform better under very high load.

Would pooling/re-using the un-referenced packet objects help here? Put another way, rather than worrying about GC tuning wouldn't it be better just to reduce heap churn in the first place?

There are, for sure, many ways to optimise and improve stuff. Some of the ideas are very promising but also very expensive from the work effort point of view.
Reusing packet objects seems like an interesting idea; however, it might be very hard to do. First of all, in theory almost every packet is different from the other packets. This is especially true for messages: a message has different content and different source and destination addresses from other messages.
Presences are more alike; the server distributes your status to all buddies, so apart from the destination address the rest is actually the same.
However, on a busy cluster installation traffic gets to 500k packets per second. Keeping track of the packets and checking whether data received from the network matches any already processed packet seems like quite an expensive task.
Right now Tigase does have some specific optimisations which reduce the number of objects in memory and improve performance in many cases, and we are working on more improvements all the time.

In the ending part of this article there is a reference to some "offline message processor". Where should I be looking if I wanted this processor to notify me whenever there is a new message for an offline user?

thanks

Have a look at the code of the processor here:
https://projects.tigase.org/projects/tigase-server/repository/entry/trunk/src/main/java/tigase/xmpp/impl/OfflineMessages.java

Hi there,
I'm developing a module in which I need to update staff status for everyone.
For example, when I open the staff list, I can see the status of all of them.

If I use the roster list, all staff must be friends of each other. With 10,000 or more staff, my first question is: is that OK?

Second question: if I do not use the roster list, my idea is to create a plugin which waits for and listens to online and offline presence, and sends it to one node which broadcasts the user status to all staff and also writes it to the DB.
I have tried to write a plugin, but I can't catch any offline presence packet.
Here is what I did:
I tried to create a plugin the same as your presence plugin, and put in this code:
System.out.println("====================" + packet);

But I couldn't catch any offline packet.
Could you please give me a solution for this issue?

Thank you so much.

There is a simple solution to your problem which does not require any coding in Tigase. Please have a look at the article. It allows you to set up presence packet forwarding in Tigase to a given address which can collect all users' presence information.

Thank you, kobit,
nice feature, however it's still not really what I need.
Because the bot just updates the status in the DB.
If I need to push the status to all online clients, the bot still must send it to all staff.
Also, when creating a bot, I need to make sure the bot always runs and stays connected to Tigase.
Is it a good solution for me?
I would prefer to implement a plugin; is that possible? If not, I'll go on with the bot as you suggest.
Thank you anyway.

I thought you just needed to collect the status of all users and present it somehow online. If you want to distribute a user's status to all other users on the installation, the only sensible solution is that everybody is in everybody else's roster.
Then Tigase would automatically take care of presence status distribution and updates.

Otherwise, I do not see a good solution. Even if you were able to somehow push the user's presence to all other users (and you would have to know who is online and who is not), then, most likely, client software would ignore presence information from users not in the roster.

On the other hand, if you have 100k online users on the installation, then this would mean everybody has 100k entries in their contact list.
Is this really what you need?

This page refers to several pictures, but they are actually not a part of the text.

Fixed, thank you for noticing.

Thank you for fixing them so quickly! I will let you know if I find other places with the same problem.

Dear Kobit,
we want to develop a chat system using Tigase. We have already built your source code under Eclipse and gone through it. Our system will have a messaging portion on a web page, so any user who visits our website may click a chat option as an anonymous user and should be logged in to our server automatically. Can you tell me how, and which portion of your source code should be modified?

There is no need to modify any part of the code. Everything to support your web chat is already implemented and you do not need to change anything.

Please ask your questions in a topic on our forums instead of comments to articles.