
Volume missing after reboot #6

Open
cron410 opened this issue Oct 31, 2017 · 35 comments

Comments

@cron410

cron410 commented Oct 31, 2017

I created a volume on both Docker hosts that connects to a Gluster cluster hosted on two other servers, using the following command:

docker volume create --driver sapk/plugin-gluster --opt voluri="gluster.mydomain.com:volume-app1" --name volume-app1

I had a Docker host lock up yesterday and had to perform a hard reboot on the VM. When it came back up, docker volume ls showed the volume mounted with the local driver instead of sapk/plugin-gluster:latest, which is what the other host still shows. I removed the volume from the troubled host and recreated it; this connected to Gluster again, shows the correct data, and all containers that rely on the volume work correctly.

How do I make the docker volume persist across reboots?

@sapk
Owner

sapk commented Oct 31, 2017

Persistence is maintained via the file /etc/docker-volumes/gluster/gluster-persistence.json. In the case of a managed Docker plugin, this file lives inside the plugin container, so if the plugin is destroyed the persistence is lost. I should mount this file on the host.
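
If you want to peek at what the plugin currently has recorded, the managed plugin's rootfs is reachable from the host. A hedged sketch, assuming the standard rootfs layout Docker uses for managed plugins and the filename mentioned above:

# assumption: standard managed-plugin rootfs path under /var/lib/docker/plugins
PLUGIN_ID=$(docker plugin inspect --format '{{.Id}}' sapk/plugin-gluster)
sudo cat /var/lib/docker/plugins/$PLUGIN_ID/rootfs/etc/docker-volumes/gluster/gluster-persistence.json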

@sapk
Owner

sapk commented Oct 31, 2017

@cron410
Author

cron410 commented Oct 31, 2017

I guess this needs to be built with make? I'm pretty terrible at running even the simplest of compile operations, so I won't be able to test this, only waste your time. Looking at your other Hub repos, I assume this cannot be automated with Docker Hub.

@sapk
Owner

sapk commented Oct 31, 2017

Simply upgrade the plugin: https://hub.docker.com/r/sapk/plugin-gluster/ https://docs.docker.com/engine/reference/commandline/plugin_upgrade/#examples
This may erase the current volume config, and you may have to re-create the volumes.
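
For reference, the upgrade sequence from the linked docs looks roughly like this (standard Docker CLI; the plugin must be disabled before upgrading):

docker plugin disable sapk/plugin-gluster
docker plugin upgrade sapk/plugin-gluster sapk/plugin-gluster:latest
docker plugin enable sapk/plugin-gluster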

@cron410
Author

cron410 commented Oct 31, 2017

That is fine. The data exists elsewhere, so recreating the volume only restores the ability for Docker containers to mount that volume from the Gluster server.

So after following the upgrade process, it failed.

root@localhost:/# docker plugin enable sapk/plugin-gluster

Error response from daemon: rpc error: code = Unknown desc = oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused \"rootfs_linux.go:57: mounting \\\"/etc/docker-volumes/gluster\\\" to rootfs \\\"/var/lib/docker/plugins/dbfcaff7d50960238e085282706623ef279a1fc1c538dbdfb36a3985d066fecb/rootfs\\\" at \\\"/var/lib/docker/plugins/dbfcaff7d50960238e085282706623ef279a1fc1c538dbdfb36a3985d066fecb/rootfs/etc/docker-volumes/gluster\\\" caused \\\"no such device\\\"\""

@cron410
Author

cron410 commented Oct 31, 2017

I performed a forced remove of the plugin with docker plugin remove -f sapk/plugin-gluster, installed it again with docker plugin install sapk/plugin-gluster, and got the same error.

root@localhost:/# docker plugin install sapk/plugin-gluster
Plugin "sapk/plugin-gluster" is requesting the following privileges:
 - network: [host]
 - mount: [/etc/docker-volumes/gluster]
 - device: [/dev/fuse]
 - capabilities: [CAP_SYS_ADMIN]
Do you grant the above permissions? [y/N] y
latest: Pulling from sapk/plugin-gluster
a1a7381f86a6: Download complete
Digest: sha256:75d06c6afa4c0d82839710ce8bba781d66e8bb9fdbf6509ac7971ed4fa24bd31
Status: Downloaded newer image for sapk/plugin-gluster:latest 

Error response from daemon: rpc error: code = Unknown desc = oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused \"rootfs_linux.go:57: mounting \\\"/etc/docker-volumes/gluster\\\" to rootfs \\\"/var/lib/docker/plugins/8044bd305dd96e39425bb368479a3738e6f75edc1a80521315a617a429b5e5fb/rootfs\\\" at \\\"/var/lib/docker/plugins/8044bd305dd96e39425bb368479a3738e6f75edc1a80521315a617a429b5e5fb/rootfs/etc/docker-volumes/gluster\\\" caused \\\"no such device\\\"\""

@sapk
Owner

sapk commented Oct 31, 2017

Can you try creating the folder /etc/docker-volumes/gluster on the host?
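
Something like this, assuming default ownership and permissions are fine:

sudo mkdir -p /etc/docker-volumes/gluster
docker plugin enable sapk/plugin-gluster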

@cron410
Author

cron410 commented Oct 31, 2017

Oh, sorry, I already did that after the first failure. I just tried again after changing the owner of the folder from root to a normal user in the docker group; that didn't work either.

Error response from daemon: rpc error: code = Unknown desc = oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused \"rootfs_linux.go:57: mounting \\\"/etc/docker-volumes/gluster\\\" to rootfs \\\"/var/lib/docker/plugins/8044bd305dd96e39425bb368479a3738e6f75edc1a80521315a617a429b5e5fb/rootfs\\\" at \\\"/var/lib/docker/plugins/8044bd305dd96e39425bb368479a3738e6f75edc1a80521315a617a429b5e5fb/rootfs/etc/docker-volumes/gluster\\\" caused \\\"no such device\\\"\""

@cron410
Author

cron410 commented Oct 31, 2017

root@localhost:/# ll /etc/docker-volumes/
bash: ll: command not found
root@localhost:/# ls -l /etc/docker-volumes/
total 4
drwxr-xr-x 2 appuser docker 4096 Oct 31 10:18 gluster

@sapk
Owner

sapk commented Nov 2, 2017

OK, I updated the Dockerfile to also create the folder in the container. I was too optimistic and made the change without testing on my machine. I will test this further soon.

@cron410
Author

cron410 commented Nov 2, 2017

The volume works again, but it is still missing after a reboot, and I have to run docker volume create --driver sapk/plugin-gluster --opt voluri="gluster.mydomain.com:volume-app1" --name volume-app1 before starting the container. Is there some other flag I should be using to make the volume come back automatically?
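
In the meantime I could script a workaround at boot. An untested sketch: recreate the volume only if it is missing or came back under the wrong driver:

# untested @reboot sketch; volume and driver names match my setup above
if [ "$(docker volume inspect -f '{{.Driver}}' volume-app1 2>/dev/null)" != "sapk/plugin-gluster:latest" ]; then
  docker volume rm volume-app1 2>/dev/null
  docker volume create --driver sapk/plugin-gluster --opt voluri="gluster.mydomain.com:volume-app1" --name volume-app1
fi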

@sapk
Owner

sapk commented Nov 2, 2017

I have restored the old version without the local mount of the config file. I will test it soon to list all the needed changes, and maybe build one plugin variant with a mountpoint (writing the persistence file on the host) and one with the file inside the container (like now), to let the user choose.

@sapk
Owner

sapk commented Nov 14, 2017

After reviewing the code, I found that the reading and saving parts of the persistence logic weren't using the same file.
This should really be fixed in the latest version. For information: v1.0.5...master#diff-c2830bc69fcc3e78c6bd782f1a5d8920L56 vs

err = ioutil.WriteFile(CfgFolder+"/persistence.json", b, 0600)

@greetingsFromPoland

The problem still occurs; you need to remove and re-add the plugin:

    "Id": "5fbb180f3e90158bbc74f459383131890d2a02d0c90e4944e60445fde3712675",
    "Name": "sapk/plugin-gluster:latest",
    "PluginReference": "docker.io/sapk/plugin-gluster:latest",

@cron410
Author

cron410 commented Nov 25, 2017 via email

@cron410
Author

cron410 commented Nov 28, 2017 via email

@cron410
Author

cron410 commented Nov 30, 2017

Still broken. After a host reboot, the volume is completely borked. It shows in docker volume ls but cannot be removed, and is empty when mounted by a container. I removed all containers and ran docker volume rm volume-openvpn and got the error message Error response from daemon: unable to remove volume: remove volume-openvpn: VolumeDriver.Remove: volume volume-openvpn is currently used by a container, even though no containers exist on the host. I removed all images and got the same error removing the volume. Removing the plugin with docker plugin rm sapk/plugin-gluster shows the error Error response from daemon: plugin sapk/plugin-gluster:latest is in use, which is expected after the first error. The plugin can be removed forcefully with

$ docker plugin rm -f sapk/plugin-gluster
sapk/plugin-gluster
$ docker plugin ls
ID                  NAME                DESCRIPTION         ENABLED
$_

The plugin and volume are gone at this point. I can then install the plugin, create the volume, and pull/run the container again and everything is functional. I then wait 5 minutes and reboot the host.

The host comes back up, the volume is empty, and the container is continually restarting. I remove the container and image, and cannot remove the volume.

@sapk
Owner

sapk commented Dec 1, 2017

It seems the plugin doesn't take into account that the attached container was removed, which is why it blocks removal of the volume. For information, a volume plugin doesn't know about running containers and can only keep a count of mount requests. I will look into whether it is now possible to pass an argument to the plugin to force removal. For the original problem, I think Docker (re-)starts the container before the plugin is ready. Do you start the container from systemd/init.d, or let Docker restart the containers that were running before the reboot?
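
If Docker's restart policy brings the container up before the plugin, one possible workaround is to gate the container start on the plugin being enabled. A sketch ("myapp" is a placeholder container name):

# sketch: wait for the plugin to report Enabled=true before starting the container
until [ "$(docker plugin inspect -f '{{.Enabled}}' sapk/plugin-gluster 2>/dev/null)" = "true" ]; do
  sleep 2
done
docker start myapp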

@cron410
Author

cron410 commented Dec 1, 2017 via email

@sapk
Owner

sapk commented Dec 15, 2017

Seems related to #12

@cron410
Author

cron410 commented Dec 15, 2017 via email

@sapk
Owner

sapk commented Mar 28, 2018

@cron410 I have reworked the plugin to share a common base across the multiple Docker volume plugins I have developed. I hope it fixes your issue.

@cron410
Author

cron410 commented Mar 28, 2018 via email

@trajano

trajano commented Mar 28, 2018

Hi, I think something else is off. I am putting this here in case it can serve as a test case for this issue.

I am not getting any errors from the plugin (even without the subdir), and I see the volume in docker volume ls on the swarm node.

However, when I actually try to mount the volume using mount -t glusterfs from another system, I do not see the files. The volume is still there, though, and I can ls it within the Docker container.

Perhaps we should have a test that mounts the GlusterFS volume directly and checks that the file is there.
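
For a manual cross-check, mounting the same volume directly from another machine should show the same files. A sketch (the volume URI is illustrative, borrowed from earlier in this thread):

# manual cross-check; replace the URI with your own gluster volume
sudo mkdir -p /mnt/gluster-check
sudo mount -t glusterfs gluster.mydomain.com:/volume-app1 /mnt/gluster-check
ls -la /mnt/gluster-check
sudo umount /mnt/gluster-check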

@sapk
Owner

sapk commented Mar 28, 2018

@trajano there is a test doing that: https://github.com/sapk/docker-volume-gluster/blob/master/gluster/integration/integration_test.go. It creates a cluster of Gluster nodes (containers) and tries to mount them via the managed and legacy plugins. It creates a folder, writes to it, and compares the data inside.

@trajano

trajano commented Mar 29, 2018 via email

@trajano

trajano commented Mar 29, 2018 via email

@trajano

trajano commented Mar 29, 2018

I saw your recent commits added the explicit mount. Did it work for you?

@trajano

trajano commented Mar 29, 2018

Tried a few more things. The examples with docker-compose and docker run work. However, there are issues when I deploy on the swarm; in Swarm mode it looks like it is just creating a local volume.

@sapk
Owner

sapk commented Mar 29, 2018

@trajano some things don't work, at least on Travis, but FUSE is tricky in that environment (it failed to mount from the CLI too, for example). I need to test it more locally.

@trajano

trajano commented Mar 29, 2018

Okay I thought it worked since the travis build on the branch passed. I'm doing a few more tests on my side too.

@trajano

trajano commented Apr 3, 2018

Tried a reboot again, this time with the legacy plugin (rather than managed). It seems to have the same behavior: if the remote store is not ready (which may take a minute or so), Docker creates a local volume rather than retrying in a loop.

I can verify that it is still using the gluster mount in the service configuration when I do a docker service inspect:

  "Mounts": [
                        {
                            "Type": "volume",
                            "Source": "ci_jenkins_home",
                            "Target": "/var/jenkins_home",
                            "VolumeOptions": {
                                "Labels": {
                                    "com.docker.stack.namespace": "ci"
                                },
                                "DriverConfig": {
                                    "Name": "gluster",
                                    "Options": {
                                        "voluri": "store1,store2:trajano/jenkins"
                                    }
                                }
                            }
                        }
                    ],
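
To catch the local-volume fallback on a given node, inspecting the volume itself shows which driver actually backs it. A quick check, using the stack-prefixed name from above:

docker volume inspect -f '{{.Driver}}' ci_jenkins_home
# expect the gluster driver name; "local" here would confirm the fallback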

@trajano

trajano commented Apr 9, 2018

I took a crack at making my own plugin; so far so good, and it seems to survive reboots, but only for docker stack deployed services. https://hub.docker.com/r/trajano/glusterfs-volume-plugin/ It does not sustain itself when the volume is created using docker volume create (which also applies to docker-compose up) and the host is then rebooted.

@trajano

trajano commented Apr 26, 2018

I think I solved it in mine by storing the volume map data in a BoltDB file inside the rootfs, and I also used "global" scope.

@cron410
Author

cron410 commented Apr 26, 2018 via email
