Clustering with Tigase 4.2

Submitted by Anonymous on Wed, 2009-10-07 14:09

Hi Kobit,

Hope you are doing well!

I am using Tigase version 4.2 and trying to set up a cluster between two machines. I have a few questions about it and would appreciate it if you could answer them:

1.) In a cluster, is it mandatory to have the session manager on one machine and c2s and s2s on the other, or can I use --gen-config-all on both machines?

At the moment I have the same configuration on both machines (mc1 and mc2):
config-type = --gen-config-all
--cluster-mode=true
--cluster-nodes=mc1,mc2
--cluster-connect-all=true
Will it work? I get an exception when mc2 tries to connect to mc1 on port 5277, which I believe is the cluster component.

2.) I have also written a custom component. Do I have to do anything special to make it run in a cluster? Session Manager, C2S and S2S are the default ones.

Also, is there any example configuration which I can use to see it working?

Cheers

Artur Hefczyc's picture

There is a bug in the Tigase

There is a bug in Tigase server 5.0.0 which causes problems with clustering. I am sorry for this. Please update your installation as described in the online documentation.
More details in this blog entry.

Artur Hefczyc's picture

Ok, it looks like a few

Ok, it looks like a few people are reporting problems with clustering in version 5.0.0. Maybe there is a bug which went undetected during tests. I will set up a cluster on my system tomorrow, check whether it works, and try to replicate the problem.
I am going to post some results by the end of the day tomorrow.

I'm using 5.0.0-b2135.

Artur Hefczyc's picture

What server version do you

What server version do you use?

Hi, Thank you for your reply.

Hi,

Thank you for your reply. Machine check results on host 1:

OK, DNS settings for chat1
OK, SRV record found _xmpp-server._tcp.chat1
OK, SRV record found _xmpp-client._tcp.chat1
OK, DNS settings for chat
OK, SRV record found _xmpp-server._tcp.chat
OK, SRV record found _xmpp-client._tcp.chat
OK, The chat1 host accessible through the network
OK, The chat host accessible through the network

All tests pass on host 2, too. The issue remains, however. I'd be grateful for any further advice and will provide any info you need to help troubleshoot this issue.

Artur Hefczyc's picture

Have a look at this comment

Have a look at this comment describing the network configuration elements which are critical for a working cluster setup.

hi, I'm having the exact same

hi,

I'm having the exact same issue nara fugu has, so I'm very much interested in solving the problem. My init.properties file:

config-type=--gen-config-all
--admins=admin@chat.domain
--virt-hosts = chat.domain
--debug=net,io,xmpp,cluster
--monitoring=jmx:9050,http:9080,snmp:9060
--user-db=mysql
--user-db-uri=jdbc:mysql://chat-db/chat?user=chat&password=password
--cluster-mode=true
--cluster-nodes=chat1,chat2
--sm-cluster-strategy-class=tigase.cluster.strategy.SMNonCachingAllNodes

Cluster nodes seem to be communicating just fine according to the logs, but users connected to one node cannot communicate with users on the other node.

Artur Hefczyc's picture

I am traveling these days and

I am traveling these days and I am unable to run any tests right now. I will check it out on my test system when I am back. Unfortunately, this won't happen before 26th April.

I have already looked at that

I have already looked at that presentation and everything that is listed in there is already configured properly.

The issue seems to be the sess-man communication between servers. Looking at the logs I see messages that both servers are connected, but when I try to send a message between servers I can see that the server can not find a connection for the user:

2010-04-12 15:22:49 SessionManager.getXMPPResourceConnection() FINEST: Searching for resource connection for: test@example.net
2010-04-12 15:22:49 SessionManagerClustered.processPacket() FINEST: Ressource connection found: null

Also, because of this, when I send a message between clients that are not connected to the same server I get "Service not available. (Code 503)". I would expect the message to go into the offline store.

I'll continue investigating but any ideas you have would help. It is disappointing that clustering does not seem to work with the simple configuration advertised in the presentation.

thanks!

Artur Hefczyc's picture

There are just 2 or maybe

There are just 2 or maybe three parameters related to the cluster mode which can be used in the init.properties file. I don't think there is an example for this yet. I will try to write up a proper guide for this.
In the meantime I can recommend the presentation which also describes the configuration options: http://www.tigase.org/cluster-presentation

Is there a basic "cluster"

Is there a basic "cluster" init.properties file that I can use to make my configuration? I think it would help to have an example especially since not all of the config options are listed in the default init.properties file that comes with tigase5. Perhaps there is something I am missing or some parameters that could be tweaked for performance.

thanks!

Artur Hefczyc's picture

Hm, it is hard to tell

Hm, it is hard to tell without looking into the system. I am pretty sure the clustering mode was tested well, but maybe there is still some bug. I will look at it as soon as I can.

I am having a similar problem

I am having a similar problem with tigase 5. I have two nodes that are configured to talk to the same database. In addition to the config being the same, using Pidgin I can log in to either XMPP server with the same user, so I know the same database is being used.

If two users are connected to the same server (both to tig1 or both to tig2) everything works properly: presence, online status etc. If I have userA connected to tig1 and userB connected to tig2 then none of the presence/status information is received and sending messages between clients results in a 503 error.

From the logs it seems that the servers are clustering correctly, and as an admin I get the "cluster node tig1 connected to tig2" messages from sess-man.

If I change the init.properties to use the same session manager on both servers:
c2s/routings/tig.domain = sess-man@tig1.domain

everything works properly.

At this point I'm fairly confident it is not the database and it's not DNS. Any other areas I can investigate?

thanks!

Artur Hefczyc's picture

Hi Xue, what you did here is

Hi Xue, what you did here is redirect all user packets to a session manager on the other cluster node.
While this works for sure, it is not a proper solution. As you pointed out, it gives you only one working session manager, which is vulnerable and not scalable.
But with this test you have at least proven that both nodes are connected to each other and communicate properly. In such a case the network configuration is most likely correct and is not the source of your problems.

In my opinion the most likely cause is that each node connects to a different database. As I can see from the init.properties file you posted some time ago, the database connection string points to 'localhost', so each node connects to a database on its local machine. Unless the database is clustered too, the nodes use separate databases, hence this cannot work properly.
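
The check described above can be sketched in a few lines of Python. This is only an illustration, not part of Tigase: the helper names are made up for this example, and it assumes init.properties uses plain `key = value` lines with the `--cluster-mode` and `--user-db-uri` keys shown elsewhere in this thread.

```python
from urllib.parse import urlparse

# Host names that always mean "this machine" (assumed list, extend as needed).
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def parse_init_properties(text):
    """Parse 'key = value' lines into a dict; blank lines and # comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, because JDBC URIs contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def db_host(jdbc_uri):
    """Return the host of a URI such as jdbc:mysql://host/db?user=..., or None."""
    # Drop the leading 'jdbc:' so urlparse sees an ordinary scheme://host/path URI.
    return urlparse(jdbc_uri.split(":", 1)[1]).hostname


def uses_local_database(props):
    """True when cluster mode is on and --user-db-uri points at the local machine."""
    uri = props.get("--user-db-uri")
    if not uri or ":" not in uri:
        return False
    clustered = props.get("--cluster-mode", "false").lower() == "true"
    return clustered and db_host(uri) in LOOPBACK_HOSTS
```

Running `uses_local_database(parse_init_properties(open("init.properties").read()))` on each node would flag a configuration in which every node talks to its own local database.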

hi, I did connect to same

Hi, I did connect both nodes to the same database.

I also put host entries in both servers' /etc/hosts files, and the servers can ping each other by hostname from the console.

I did an experiment with the routing configuration on the second node:

  <node name="routings">
   <map>
    <entry value="sess-man%40host2" type="String" key=".+"/>
    <entry value="true" type="Boolean" key="multi-mode"/>
   </map>
  </node>

If I change sess-man%40host2 to sess-man%40host1, it actually works as I want. But is this kind of setup a fully distributed cluster? It still requires a central Session Manager.

Artur Hefczyc's picture

At the moment there is no

At the moment no notification is sent to any users that some users went offline because of a cluster node failure.

I am thinking however about a way to implement it.

Hi One more doubt here

Hi

One more question, this time about federation between two Tigase clusters.

Suppose our deployment consists of four Tigase servers/nodes (A, B, C, D). A and B form one cluster (Cluster X) and C and D form another (Cluster Y). Cluster X and Cluster Y are set up to federate with each other, so basically there is an s2s connection between X and Y.

The deployment contains 4 users: User A connected to A, User B connected to B, User C connected to C and User D connected to D. All the users are in each other's rosters.

Now, if one node from Cluster X, say Node A, goes down, what will happen to the presence of User A as seen by his buddies (User B, User C and User D)? Will he appear offline to User C and User D? Note that Cluster X is still functioning, since Node B is still up.

Thanks

Artur Hefczyc's picture

Hm, I am not sure if your

Hm, I am not sure if your question is related to clustering at all. I think it is more related to the s2s connection between 2 servers and what happens if one of the servers goes down. Please note that in some cases it is not even possible to determine that a server went down. Servers don't keep the s2s connection between them open all the time. The connection is dropped if it is not used for a while. Hence, if one server goes down during this time, the other servers have no way of knowing unless they attempt to open a new s2s connection.
What happens then, and whether the server that stayed alive notifies all its users about the other server's failure, depends on the implementation. I think that notifying all users that a server went down and that buddies from this server are offline might be a very expensive task, and I don't think any implementation does that.

Of course, in a clustered environment the case is a bit different. If one node goes down, there are other nodes which know what happened and could in theory notify other servers that some users got disconnected. Tigase doesn't do that. Again, the task is very expensive in terms of the resources needed and I think it is not even worth the effort. If one node goes down, there are other nodes still alive, so in most cases users would just reconnect to another node and be online again. This normally happens within seconds, so from the point of view of users on other servers it doesn't matter.

Hi, I want to understand the

Hi,

I want to understand the HA & Load Balancing behavior of Tigase for Server to Server connections.

Suppose there is a multi-node (clustered) setup of Tigase doing XMPP federation.

Let's assume that there are two nodes in the Tigase setup, "A" and "B". Some users (UserA, UserB) are connected to node "A" and some users (UserC, UserD) to node "B".

There are federated connections to the remote server (another Tigase instance in clustering mode) from both node "A" and node "B".

UserA, UserB, UserC, UserD are added in the buddy list of the remote users (Users on other instance of Tigase).

If node A of Tigase goes down, will the buddies on remote server see UserA & UserB offline?

Thanks,

Artur Hefczyc's picture

I guess you are missing a

I guess you are missing a few things here.
First, you should be aware that the default configuration is most likely not suitable for a clustered deployment.
Secondly, all your cluster nodes must connect to the same database in order to function properly. Ideally this is a good SQL database like MySQL or PostgreSQL.

Your problems with users not seeing each other may be caused by 2 things:
1. The cluster nodes are not connected, which is usually caused by incorrect network (DNS) configuration.
2. The cluster nodes use different databases.

Please check both of the above and let me know whether this solves your problem. If not, I will try to help you further.

Hi, I have followed your

Hi, I have followed your suggestion to set up two nodes across two machines, A and B.

However, if user XXX logs in to A and user YYY logs in to B, XXX cannot see YYY's presence at all, and they cannot send messages to each other.

Also, the default generated config uses Derby, not MySQL, so I had to change it manually. The virtual domain name was also not generated properly in tigase-server.xml, so I had to add it manually.

Am I missing any important steps?

My init.properties is:
--cluster-mode = true
config-type = --gen-config-all
--cluster-nodes = XXX,YYY
--debug = server,xmpp.impl,db,cluster
--user-db = mysql
--admins = admin@XXX
--user-db-uri = jdbc:mysql://localhost/tigasedb?user=tigase&password=tigase12
--virt-hosts = XXX.abc.com
--comp-class-2 = tigase.pubsub.PubSubClusterComponent
--comp-name-2 = pubsub
--comp-class-1 = tigase.muc.MUCComponent
--comp-name-1 = muc
--sm-plugins = +jabber:iq:auth,+urn:ietf:params:xml:ns:xmpp-sasl,+urn:ietf:params:xml:ns:xmpp-bind,+urn:ietf:params:xml:ns:xmpp-session,+jabber:iq:register,+roster-presence,+jabber:iq:privacy,+jabber:iq:version,+http://jabber.org/protocol/stats,+starttls,+msgoffline,+vcard-temp,-http://jabber.org/protocol/commands,+jabber:iq:private,+urn:xmpp:ping,+basic-filter,+domain-filter,+pep,+zlib

Artur Hefczyc's picture

I am afraid I won't be able

I am afraid I won't be able to prepare "in-depth documentation" for Tigase clustering any time soon. You can have a look at 2 presentations I have prepared on Tigase clustering: Tigase clustering presentation video and Clustering in the Tigase server PDF. I hope you will find some useful information there. If not, please come back to me with more questions.

Your questions:

ad. 1. At the moment the Tigase server does not load-balance users itself. The simplest and cheapest way to load-balance users among cluster nodes is DNS round-robin. We use it on a few installations and it is quite effective indeed. The more expensive and more complex (not sure if better) way to do load-balancing is to use specialised hardware routers. I know that Cisco and a few others make such hardware. If you are interested, I can point you to someone who knows more details about it. We are also planning to add load-balancing inside the Tigase server at some point. No timeline for that yet, however.
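
To illustrate why DNS round-robin spreads users across nodes, here is a toy Python model. It is only a sketch: the addresses are made up, and real resolvers cache answers and may reorder them, so the actual distribution is less even than this.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(node_ips, client_count):
    """Hand out node addresses in rotation, one per connecting client."""
    rotation = cycle(node_ips)
    return [next(rotation) for _ in range(client_count)]


# Hypothetical cluster of three nodes and nine connecting clients.
assignments = round_robin_assign(["10.0.0.1", "10.0.0.2", "10.0.0.3"], 9)
print(Counter(assignments))
```

In this idealised model each of the three addresses is handed out three times, which is the effect the DNS setup relies on.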

ad. 2. This depends... From version 4.3.1 Tigase supports pluggable clustering strategies, so what exactly happens depends on the clustering strategy used. The simplest, trivial strategy works as follows: if an XMPP packet cannot be processed on one cluster node (N1) because the user is not connected to that node, it is sent to another node (N2) for processing. If N2 cannot process the packet, it sends it to N3. And so on... Eventually, either one of the nodes processes the packet or the packet is sent back to the first node, where it is processed as a packet to an offline user.
This works quite well for a low number of cluster nodes, up to 3. For more nodes this is not an effective strategy, as it causes lots of network traffic between nodes. Different strategies are possible and some of them are dedicated to specific deployments.
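
The trivial strategy described above can be modelled in Python as follows. This is a simplified simulation, not Tigase code: the function name, the `sessions` dictionary and the node names are assumptions made for the example.

```python
def route_packet(nodes, sessions, first_node, recipient):
    """Forward a packet from node to node until one holds the recipient's session.

    nodes: ordered list of node names.
    sessions: dict mapping node name -> set of users connected to that node.
    Returns (outcome, hops); hops lists every node that handled the packet, in order.
    """
    start = nodes.index(first_node)
    hops = []
    for offset in range(len(nodes)):
        node = nodes[(start + offset) % len(nodes)]
        hops.append(node)
        if recipient in sessions.get(node, set()):
            return ("delivered by " + node, hops)
    # Every node has seen the packet, so it goes back to the first node,
    # which treats the recipient as an offline user.
    hops.append(first_node)
    return ("offline handling on " + first_node, hops)


nodes = ["N1", "N2", "N3"]
sessions = {"N1": {"alice"}, "N3": {"bob"}}
print(route_packet(nodes, sessions, "N1", "bob"))
print(route_packet(nodes, sessions, "N1", "carol"))
```

A packet for a user who is not online visits every node before it returns to the first one, so the number of inter-node hops grows with the cluster size. That is the traffic problem mentioned above for clusters larger than about 3 nodes.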

ad. 3. If a node goes down, all users connected to that node get disconnected. The node is no longer part of the cluster, so the cluster behaves as if that node had never been part of it. Normally all these users should reconnect to a different cluster node or stay offline. The packets to all these users are processed accordingly.

ad. 4. Yes. From Tigase version 4.3.1 there is a dedicated API in the server for pluggable clustering strategies. In fact, a dedicated strategy designed for a specific deployment can lead to huge performance improvements. So this is certainly feasible; however, from my experience I know that there is no ultimate clustering implementation for all kinds of deployments. For large deployments you have to look at the traffic shape, user distribution and a few other parameters and design the best custom strategy. Please note also that to get the best results you cannot have universal strategy code for all XMPP components (Session Manager, MUC, PubSub, ...). Each component needs dedicated strategy code to make sure optimal performance is achieved.

Certainly the clustering strategy API is one of the significant additions. Apart from that, the clustering implementation for the session manager has been reworked to make it more robust and reliable in case of communication problems between cluster nodes.
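
As a rough illustration of the "hashtable" idea from question 4, the Python sketch below keeps a lookup table from user to node so that a packet can be sent straight to the right node. It does not use or mirror the Tigase strategy API; the class and method names are invented for this example, and it ignores how the table would be kept in sync between nodes.

```python
class SessionDirectory:
    """Hypothetical user -> node lookup table shared by the cluster nodes."""

    def __init__(self):
        self._node_by_user = {}

    def user_connected(self, user, node):
        """Record that a user now has a session on the given node."""
        self._node_by_user[user] = node

    def user_disconnected(self, user):
        """Forget a user's session; unknown users are ignored."""
        self._node_by_user.pop(user, None)

    def node_down(self, node):
        """Drop every session that lived on a failed node."""
        self._node_by_user = {
            user: n for user, n in self._node_by_user.items() if n != node
        }

    def target_node(self, user):
        """Return the node to send to directly, or None if the user is offline."""
        return self._node_by_user.get(user)


directory = SessionDirectory()
directory.user_connected("bob", "N3")
print(directory.target_node("bob"))    # lookup instead of asking N2 first
directory.node_down("N3")
print(directory.target_node("bob"))    # bob is treated as offline now
```

The trade-off is that every connect and disconnect has to be propagated to all nodes, which is why the best choice depends on the traffic shape of the deployment.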

I hope this explanation sheds some light on your questions.

Hi Artur, If you can really

Hi Artur,

Could you explain in detail how Tigase clustering works? I could not find any in-depth documentation about it. A few questions I have in mind right now:

1. How are clients load balanced across the different nodes?
2. If there are 3 nodes N1, N2 and N3, what approach is taken when a client connected to N1 sends a message to a client connected to N2?
3. If node N3 in the cluster goes down, how are all the messages directed to clients connected to N3 handled?
4. Can we customize clustering to decrease latency and network traffic by implementing some kind of hashtable strategy, so that N1 talks directly to N3? I don't know how feasible this is.

Also, could you shed some light on the clustering changes you have made in Tigase 4.3 and how they improve the performance and robustness of clustering?

Artur Hefczyc's picture

Hi Sukhi, The network setup

Hi Sukhi,

The network setup required for Tigase clustering, and the DNS configuration in particular, is largely independent of the operating system, so it doesn't matter what system you use (although I strongly recommend Unix-like systems over MS Windows for this kind of software).

Anyway, back to your question.

There are 2 types of DNS names used in your XMPP cluster:

  1. Virtual host names: this is the hostname (domain) visible to your users, like jabber.org, company.com and so on. Your whole cluster, regardless of how many nodes you have, is visible to users as one server working for this virtual domain. It is defined by the --virt-hosts property in the init.properties file, and you can have as many virtual hosts as you like for your XMPP installation. If you query DNS for your virtual domain, it should return the IP address of one of the cluster nodes. Every time you query DNS it may return a different IP address (the IP address of a different cluster node). This is your example.com.
  2. Real host names: these are names unique to each cluster node. They are not related to the virtual names and can have a form like node1.internal.net, node2.spare.internal.net and so on. These are your mc1 and mc2. They are normally not visible to your users and are only used internally by the Tigase cluster nodes. The important thing is that if you query DNS for the node hostname mc1, it must always return one and the same IP address of the proper cluster node. This is what you put in the --cluster-nodes property in the init.properties file.

The simplest way to check whether your virtual names are configured properly is to ping example.com and see whether it returns the IP of one of the cluster nodes. Ideally, every time you ping it, it should display the IP of a different cluster node.

The simplest way to check whether your real names are set correctly is to call 'hostname' command on each node and see what it returns and then ping from one node the other node to see if it can be contacted successfully.

To check what Tigase sees and thinks is the real hostname, look inside the tigase.xml file for the string 'def-hostname'. This key points to the default hostname detected on the system. See whether it is your mc1 or mc2.
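
The manual checks above could also be scripted. The following Python sketch uses only the standard `socket` module; the function names are made up for this example, and the `lookup` parameter exists only so the checks can be tried without touching real DNS. It does not read tigase.xml, and a failed lookup raises `socket.gaierror` rather than being reported as a problem.

```python
import socket


def resolve_all(name, lookup=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses that a name resolves to."""
    _canonical, _aliases, addresses = lookup(name)
    return set(addresses)


def check_cluster_dns(virtual_host, cluster_nodes, lookup=socket.gethostbyname_ex):
    """Return a list of problems found in the virtual and real host name setup."""
    problems = []
    all_node_ips = set()
    for node in cluster_nodes:
        ips = resolve_all(node, lookup)
        # A real host name should map to one and the same address.
        if len(ips) != 1:
            problems.append(f"{node} resolves to {sorted(ips)}, expected exactly one address")
        all_node_ips |= ips
    # The virtual host should only ever point at cluster nodes.
    stray = resolve_all(virtual_host, lookup) - all_node_ips
    if stray:
        problems.append(f"{virtual_host} resolves to non-cluster addresses {sorted(stray)}")
    return problems


def local_name_is_cluster_node(cluster_nodes, local_hostname=None):
    """True when this machine's hostname is one of the configured cluster nodes."""
    name = local_hostname if local_hostname is not None else socket.gethostname()
    return name in cluster_nodes
```

For the setup in this thread one would call `check_cluster_dns("example.com", ["mc1", "mc2"])` on each node and expect an empty list, and `local_name_is_cluster_node(["mc1", "mc2"])` should return True on both machines.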

Thanks for the prompt

Thanks for the prompt reply.

"**..........that DNS is set correctly your cluster nodes hostnames**"

Could you please explain this network configuration a bit more? (Apologies if it sounds like a really stupid question; I am a total novice in this area.)

Here I have two machines, let's say with hostnames mc1 and mc2 respectively. I will specify these hostnames for the --cluster-nodes key in init.properties.

Now I should have a DNS name, say example.com, and when a client connects through example.com, DNS should resolve it to mc1 or mc2. Correct?

We don't have to configure this DNS on the Tigase side. Do you have any resources which I could follow to create such a setup on a Windows XP machine?

I would appreciate your help.

Cheers

Artur Hefczyc's picture

First I would greatly

First, I would strongly recommend using the latest Tigase version, 4.3.1, as clustering has been significantly improved in this version.

ad. 1) Tigase supports full clustering for HA and LB, which means all cluster nodes can have an identical configuration and, when one node goes down, all other nodes can carry on providing full functionality. Therefore it is recommended to use --gen-config-def or --gen-config-all on all nodes. Please note that not all Tigase components fully support clustering yet (MUC and PubSub do not). For such components I suggest using virtual components.

Your configuration is almost correct. You just have to remove line:
--cluster-connect-all=true
It is not needed in your case.

Also please note that network configuration is critical for the Tigase cluster. You have to make sure that DNS is set correctly for your cluster nodes' hostnames. You have mc1 and mc2. You have to make sure that the 'hostname' command on each node returns the name you use for that node in --cluster-nodes and that both names are resolvable by your DNS server.

ad. 2) Yes, you have to implement clustering for your component on your own, as clustering is a component-specific thing. Each component does clustering in a different way. Alternatively, you can use the virtual components mentioned above to get your system working with components which don't work in cluster mode.

What configuration example are you asking for?
