Galera stopped working after reboot

I was testing Galera and it seemed to work initially. But after I configured keepalived, everything got messed up, and I had to restore an image backup I had taken before installing keepalived.

But Galera still isn't working.

Since I am still experimenting, the data wasn't important.

So I cleared the contents of the MySQL datadir and copied over an old copy that I had saved in another directory before initially configuring Galera.

I did the same on both of my nodes, SG01 and SG02.

I then configured Galera on both SG01 and SG02.

Current firewall config on both environments:

forge@sg01-mllmconcepts-com:~$ sudo ufw show added
Added user rules (see 'ufw status' for running firewall):
ufw allow 22
ufw allow 80
ufw allow 443
ufw allow from 192.168.a.b
ufw allow from 192.168.c.d
ufw allow 4567/udp
ufw allow 3306,4444,4567,4568/tcp
root@sg01-mllmconcepts-com:~#

root@sg02-mllmconcepts-com:~# sudo ufw show added
Added user rules (see 'ufw status' for running firewall):
ufw allow 22
ufw allow 80
ufw allow 443
ufw allow from 192.168.a.b
ufw allow from 192.168.c.d
ufw allow 3306,4444,4567,4568/tcp
ufw allow 4567/udp
root@sg02-mllmconcepts-com:~#

SG02 Config:

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration
wsrep_cluster_name="mllm_cluster"
wsrep_cluster_address="gcomm://192.168.a.b,192.168.c.d"

# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="192.168.c.d"
wsrep_node_name="SG02"

SG01 Config:

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration
wsrep_cluster_name="mllm_cluster"
wsrep_cluster_address="gcomm://192.168.a.b,192.168.c.d"

# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="192.168.a.b"
wsrep_node_name="SG01"

Then I brought up SG02 (master) using sudo galera_new_cluster and it worked.

root@sg02-mllmconcepts-com:~# mysql -u root  -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+
root@sg02-mllmconcepts-com:~#

But when I try bringing up SG01, it doesn't work:

forge@sg01-mllmconcepts-com:~$ sudo systemctl start mysql
Job for mariadb.service failed because the control process exited with error code.
See "systemctl status mariadb.service" and "journalctl -xe" for details.
forge@sg01-mllmconcepts-com:~$

The status and journalctl output isn't helping me much :(

forge@sg01-mllmconcepts-com:~$ systemctl status mariadb.service
● mariadb.service - MariaDB 10.3.21 database server
   Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: failed (Result: exit-code) since Thu 2020-01-09 07:32:29 IST; 42s ago
     Docs: man:mysqld(8)
           https://mariadb.com/kb/en/library/systemd/
  Process: 10999 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
  Process: 10997 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 11624 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
  Process: 11405 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-envir
  Process: 11403 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 11392 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
 Main PID: 11624 (code=exited, status=1/FAILURE)
   Status: "MariaDB server is down"

Any help appreciated. Thanks in advance.

It works if I delete the Galera config for SG01 and start mysql. But I really need it to be part of Galera cluster.

1 Reply

mjones (Linode Staff) · Answer 1 · Jan. 13, 2020, 10:26 p.m.

I wasn't able to find anything out of the ordinary in your config or in systemctl's output. There's usually a bit more info below the section of the systemctl output that you included; did anything stand out there? Also, have you checked the MariaDB logs? They're usually located in /var/log/mysql. /var/log/syslog might help too.
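
If it helps, here is a rough sketch of one way to pull the Galera-related lines out of those logs. The helper name galera_log_hints is made up for this example, and the exact log file name under /var/log/mysql is an assumption (it depends on your log_error setting), so adjust the paths to whatever exists on SG01.

```shell
#!/bin/sh
# Hypothetical helper (not part of MariaDB or Galera): print the most recent
# lines of a log file that mention WSREP or are tagged [ERROR].
# Usage: galera_log_hints FILE [COUNT]
galera_log_hints() {
    log_file="$1"
    count="${2:-50}"   # default to the last 50 matching lines
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # -i: case-insensitive, -E: extended regex; \[ matches a literal bracket
    grep -iE 'wsrep|\[error\]' "$log_file" | tail -n "$count"
}

# Example calls (file names are assumptions; run with sudo if needed):
#   galera_log_hints /var/log/mysql/error.log
#   galera_log_hints /var/log/syslog 100
# The full unit log, without the truncation that systemctl status applies:
#   sudo journalctl -u mariadb.service --no-pager -n 200
```

Running it right after a failed start on SG01 should narrow things down to the lines where the node tries to join the cluster, which is usually where the actual reason for the failure shows up.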

If this is an initial setup, I'd recommend removing and reinstalling MariaDB and Galera and redoing the configuration. It's the best way to make sure everything's working and replication's happening as needed. I've had to do this a few times myself on new installs of software, and I sometimes learn more about it the second time I install it.

Other than that, you may want to reach out to the Galera community as they'll have more specialized knowledge about Galera setups.
