How to copy a disk image of a server to Amazon S3

Hi,

Does anyone know how to copy a disk image of a whole Linode server to a bucket in Amazon S3?

I have tried following all the articles in the library, logged support tickets, and spoken to a few people at Linode and no one and nothing has been able to tell me how to do it successfully.

The latest I heard from Linode support was that it is not possible to do this since Amazon S3 does not support SSH.

Please note that I do not want to copy the data to my local machine before pushing it to S3. I just want to copy directly from Linode to S3.

Is this possible?

Can you please give me detailed instructions on how to do it?

Thanks

Lance

14 Replies

My usual approach is to use blueprint, etckeeper, and/or chef to summarize the differences between a standard OS image and my ideal system, and duplicity to back up data. But, this does take some planning and forethought, and restores will also take some planning. On the other hand, it is much easier to move to other providers and architectures. (See also tarsnap, for something I haven't used but that looks nice.)

For sending stuff to S3, I find it's easier to use whatever you normally use to send stuff to S3. For me, it's s3cmd, but there's probably others out there.

In the interest of science, I just deployed a fresh Linode (with a 10 GB disk image – I'm not made of money here, yo), booted up Rescue mode, and ssh'd to lish. Long story short, I couldn't make it work. My first attempt was to install s3cmd (apt-get update; apt-get install s3cmd; s3cmd --configure) and try to put the file. It returned immediately, having done nothing:

root@hvc0:~# s3cmd mb s3://awesome-bucket-of-science
Bucket 's3://awesome-bucket-of-science/' created
root@hvc0:~# s3cmd put /dev/xvda s3://awesome-bucket-of-science/disk.img
root@hvc0:~#

So I installed Boto 2.0 from the repository (apt-get install python-boto) and tried to upload that way. It, too, failed, but after doing much more:

root@hvc0:~# python
Python 2.7.2+ (default, Aug 16 2011, 07:03:08) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from boto.s3.connection import S3Connection
>>> conn = S3Connection('<aws access key>', '<aws secret key>')
>>> bucket = conn.create_bucket('awesome-bucket-of-science')
>>> from boto.s3.key import Key
>>> k = Key(bucket)
>>> k.key = 'disk.img'
>>> k.set_contents_from_filename('/dev/xvda')
... a long pause here ...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 713, in set_contents_from_filename
    policy, md5, reduced_redundancy)
  File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 653, in set_contents_from_file
    self.send_file(fp, headers, cb, num_cb, query_args)
  File "/usr/lib/python2.7/dist-packages/boto/s3/key.py", line 535, in send_file
    query_args=query_args)
  File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 423, in make_request
    override_num_retries=override_num_retries)
  File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 618, in make_request
    return self._mexe(http_request, sender, override_num_retries)
  File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 584, in _mexe
    raise e
socket.error: [Errno 32] Broken pipe

I suspect it is dying when trying to find the mimetype and MD5 hash of /dev/xvda. So, I installed a newer version of Boto which has a set_contents_from_stream method to skip this:

root@hvc0:~# apt-get install python-pip
root@hvc0:~# pip install boto --upgrade
...
root@hvc0:~# python
>>> import boto
>>> conn = boto.connect_s3('<aws access key>', '<aws secret key>')
>>> bucket = conn.create_bucket('awesome-bucket-of-science')
>>> from boto.s3.key import Key
>>> k = Key(bucket)
>>> k.key = 'disk.img'
>>> fp = open('/dev/xvda', 'rb')
>>> k.set_contents_from_stream(fp)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 757, in set_contents_from_stream
    % provider.get_provider_name())
boto.exception.BotoClientError: BotoClientError: s3 does not support chunked transfer

So, nope. I think it can certainly be made to work, but I've spent an hour on this and couldn't get it to upload my /dev/xvda, so it's your turn to play around with it for a while! -rt
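One possible way around the "does not support chunked transfer" error is to cut the device into fixed-size parts yourself and send each part separately, which is what an S3 multipart upload expects. The following is only a minimal sketch of the splitting step, not a tested upload tool. The names `iter_parts`, `upload_in_parts`, and the `upload_part` callback are hypothetical and were not part of the attempts above.

```python
import io


def iter_parts(stream, part_size):
    """Yield (part_number, data) pairs read sequentially from stream.

    Part numbers start at 1, matching S3 multipart numbering.
    """
    part_number = 1
    while True:
        chunk = stream.read(part_size)
        if not chunk:
            break
        yield part_number, chunk
        part_number += 1


def upload_in_parts(stream, part_size, upload_part):
    """Pass every part to upload_part(part_number, data).

    upload_part is a caller-supplied callable (an assumption of this
    sketch); it is where the real S3 call would go. Returns the number
    of parts that were sent.
    """
    count = 0
    for number, data in iter_parts(stream, part_size):
        upload_part(number, data)
        count += 1
    return count


if __name__ == "__main__":
    # Demonstration with an in-memory stream instead of /dev/xvda.
    sent = []
    upload_in_parts(io.BytesIO(b"abcdefghij"), 4,
                    lambda num, data: sent.append((num, len(data))))
    print(sent)
```

With Boto 2, the callback could presumably wrap `mp.upload_part_from_file(io.BytesIO(data), part_num=number)` on an object returned by `bucket.initiate_multipart_upload('disk.img')`, followed by `mp.complete_upload()`. I have not run that against a real disk image. S3 requires every part except the last to be at least 5 MB, so the part size should be chosen accordingly.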

The actual raw bit-for-bit image? I'd probably boot into recovery mode (Finnix), then try to use s3cmd to copy /dev/xvda to S3. If it refuses to do so, I'd probably use boto directly.

For a 20 GB image with the default network profile, this should take no less than an hour.
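As a rough check on that estimate, the lower bound on transfer time is just the image size divided by the link speed. The sketch below assumes an outbound cap of 50 Mbit/s, which is my assumption about the "default network profile" and is not stated above. It also ignores protocol overhead and disk read speed.

```python
def transfer_seconds(size_bytes, link_mbit_per_s):
    """Best-case transfer time in seconds for size_bytes over a link
    of link_mbit_per_s megabits per second (decimal units)."""
    bits = size_bytes * 8
    return bits / (link_mbit_per_s * 1_000_000)


if __name__ == "__main__":
    # 20 GB image over an assumed 50 Mbit/s link.
    seconds = transfer_seconds(20 * 10**9, 50)
    print("%.0f seconds (about %.0f minutes)" % (seconds, seconds / 60))
```

Under those assumptions the best case is a little under an hour, so a real transfer with overhead would likely take longer.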

Hi Hoopycat - thanks for the response. I don't know what you mean by bit-for-bit image. I basically want a backup offsite for my whole server so that I can easily spin up a server from it.

How do you use s3cmd to copy from Linode to S3? How would you issue this command? Are you sure this is possible to do in rescue mode?

thanks

Hi again hoopycat,

Thanks a lot for giving it a good shot. I've spent hours trying to get this to work and have not found a way to do it. Trust me, I tried very hard prior to putting a request on here.

Perhaps someone else may have a solution?

Still don't understand "why?" you want to do this.

Data should already be getting backed up nightly (or more often depending on the data).

Starting a fresh VPS from scratch should be scripted out (or if a once in a while process, documented THOROUGHLY).

If you need to spin up a brand new Linode, it will be way faster to start a fresh VPS, run the setup scripts (or do so manually from your config documentation), and restore the data than it will be to prep a new VPS, set up the empty partition, and copy back a boatload (i.e. 20GB) of image data.

I don't see why a proprietary image of Linode's VPS setup stored offsite is that much of an asset.

The proper approach from a Linode perspective is probably to store custom data (like a tarball of your web root, or a latest backup of the databases, or whatnot) somewhere that you can pull it down (like S3, or a "master" linode), and then write a stack script that gets the right packages and config settings going, then pulls down the tarball containing the necessary custom files; this is very simple to do.

When that's done, spinning up a new Linode is as simple as creating a new Linode and selecting the StackScript. Wait a few minutes and poof, out pops a fully configured and ready-to-go Linode.

There are, of course, other solutions (the cat often suggests Chef, I believe), but for a relatively simple setup, writing your own stack script is probably the easiest thing since it requires no infrastructure (since Linode provides it already).

@Guspaz:

The reason is that I want to shut down my linode for a while. I'm not using it now and I'm not sure if I'm going to. But in case I do want to have it back I want an easy way to store it and bring it back at some point down the road. I don't want to keep paying $20/mo if I'm not using it.

@Guspaz:

I guess if I have no idea how to write a script, I'm at a loss. I have no background in sysadmin or programming. Just simple folk.

Then do it the old-fashioned way: document it.

Write step-by-step documentation (fresh VPS, install Apache, install PHP, install ….)

Document the config setups (and make a copy of them to a thumb drive).

And of course your data should be backed up already (and document that process as well).

Then if you need to do a bare metal restore, you have the step by step process (with examples and copies on your thumb drive) to do so.

The key to this method is not to skip ANY step. What seems blatantly obvious at this point in time will be a muddled faint memory 6 weeks/months/etc from now.

Does anyone else know how to do this easily?

Tarsnap is the way to go. It uses Amazon S3.

Has anyone at Linode set up something to do this easily yet?

I was able to do this, although it's still hacky. After rebooting into rescue mode:

update-ca-certificates
apt install python3-pip
pip3 install s3cmd
s3cmd --configure
dd if=/dev/sda | s3cmd put - s3://bucket/linode.img

The important part is that, since v1.5, s3cmd supports reading from stdin when you give - as the file name.
https://serverfault.com/a/690328

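You can also verify the checksum, although it's a bit involved, because Amazon calculates MD5 sums per part in multipart uploads: the final value is the MD5 of the concatenated binary per-part digests, followed by a dash and the number of parts. You need to pay attention to the part size (in my case 15MB) and number of parts (in my case 201).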

for i in {0..200}; do echo $i; dd bs=1M count=15 skip=$((15*$i)) if=/dev/sda | md5sum | grep -o "^[^ ]*" >> checksums.txt; done

https://stackoverflow.com/a/19896823

Then compare the final checksum locally:

# xxd -r -p checksums.txt | md5sum
fa1c909f001e2ca5e21c64e51e0a7be6  -

To the one from amazon (ignore the -201 on the end, that's the number of parts):

# s3cmd ls --list-md5 s3://bucket/linode.img
2019-04-03 22:18 3149922304   fa1c909f001e2ca5e21c64e51e0a7be6-201
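The same check can be sketched in Python instead of a dd loop. This is a minimal sketch, assuming the object was uploaded as a plain multipart upload with a fixed part size. The function name `multipart_etag` is hypothetical, and the 15 MB default simply mirrors the part size mentioned above.

```python
import hashlib


def multipart_etag(path, part_size=15 * 1024 * 1024):
    """Compute the checksum S3 is expected to report for a multipart
    upload of the file at path: the MD5 of the concatenated binary
    per-part MD5 digests, plus '-<number of parts>'.
    """
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            digests.append(hashlib.md5(chunk).digest())
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return "%s-%d" % (combined, len(digests))


if __name__ == "__main__":
    import sys
    # Usage: python etag.py /dev/sda
    print(multipart_etag(sys.argv[1]))
```

The printed value should match the string shown by `s3cmd ls --list-md5`, including the part-count suffix, provided the part size matches the one used for the upload.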
