Writing this article on a tool I came across for Rackspace Cloud Files. The tool is called 'Turbolift' and it can be found on GitHub here. I have seen cases where people want to bulk upload or bulk delete data from a given container, so I am going to run through a few snippets using Turbolift that may help if you ever find yourself in the same situation.
Before you 'git clone' turbolift you will want to install a few prerequisites. Turbolift is a Python-based utility, so a few things are required before installing it. The two main packages you will need are 'python-dev' and 'python-setuptools'. Once you have those installed, you can proceed to installing turbolift by running the following.
git clone git://github.com/cloudnull/turbolift.git
cd turbolift
sudo python setup.py install
After this you can run 'turbolift -h' to see all of the options available for this tool.
The two main options I will touch on are the 'upload' and 'delete' subcommands. You can run the following to view the switches available for each.
turbolift upload -h
turbolift delete -h
The following option will allow you to upload an entire local directory to your Rackspace Cloud Files container:
turbolift -u [CLOUD-USERNAME] -a [CLOUD-API-KEY] --os-rax-auth [REGION] upload -s [PATH-TO-DIRECTORY] -c [CONTAINER-NAME]
Now, there was a bit of an issue I ran into when uploading a bulk directory. By default, when getting the list of files to upload, Turbolift sorts the files by size. If you have a lot of files this can be a time-consuming operation. I was able to work around it by editing the 'optionals.py' file located in the 'turbolift/turbolift/arguments/' directory. You will need to add an extra argument to this file:
optionals.add_argument('--no-sort',
                       action='store_true',
                       help=('By default when getting the list of files to upload '
                             'Turbolift will sort the files by size. If you have a lot '
                             'of files this may be a time consuming operation. This flag will '
                             'disable that function.'))
The --no-sort flag allows the upload to proceed without first sorting the files by size; turbolift simply starts uploading without the sort step.
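To illustrate how a flag like this gates the sort step, here is a minimal sketch. The parser, the 'build_upload_list' function, and the '(path, size)' file list are illustrative stand-ins of my own, not Turbolift's actual internals:

```python
import argparse

# Illustrative stand-in for Turbolift's argument handling; only the
# '--no-sort' flag itself comes from the patch above.
parser = argparse.ArgumentParser(prog='turbolift-upload-sketch')
parser.add_argument('--no-sort',
                    action='store_true',
                    help='Skip sorting the upload list by file size.')

def build_upload_list(files, args):
    """Return the upload list, sorted by size unless --no-sort was given."""
    if args.no_sort:
        return files  # skip the potentially slow sort entirely
    # 'files' is a list of (path, size) tuples in this sketch
    return sorted(files, key=lambda f: f[1])

args = parser.parse_args(['--no-sort'])
files = [('a.txt', 300), ('b.txt', 100), ('c.txt', 200)]
print(build_upload_list(files, args))  # original order, no sort performed
```

With the flag set, the list comes back untouched; without it, the sort runs as before, which is exactly the behavior the patch above adds to Turbolift.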
Now, on to deletes (be careful, because there are no backups for these). There are a few switches you can use to make the process work as fast as possible.
turbolift -u [CLOUD-USERNAME] -a [CLOUD-API-KEY] --os-rax-auth [REGION] delete -c [CONTAINER-NAME]
Keep in mind that by default every operation is done over the public network. If you are running from a cloud server in the same region, you can use the '--internal' option so that these operations happen over the service net (the private network), which speeds things up considerably.
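For example, a service-net delete would look something like the following. Note the placement of '--internal' alongside the other global options here is my assumption; check 'turbolift -h' on your install to confirm:

```shell
turbolift -u [CLOUD-USERNAME] -a [CLOUD-API-KEY] --os-rax-auth [REGION] --internal delete -c [CONTAINER-NAME]
```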
Recently I attempted to delete over a TB of data (about 4 million objects) from a container using a 2GB server. Turbolift practically laughed at me. For a deletion of that size you will want something like a 15-30GB slice; I found a 30GB slice did the trick. It might be a little pricey, but I only needed it for about 6 hours. I could see that turbolift/python was using about 20GB of memory on the server, so keep that in mind when you come up against a deletion of that size. Luckily, turbolift writes its logs to a file named 'turbolift.log'. You know, for your enjoyment.
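As a rough back-of-envelope check, those figures work out to a few KB of memory per queued object, which you can use to size a slice for your own container. This is only an estimate from my single run above; the 1.5-million-object container below is a hypothetical example:

```python
# Observed on my run above: ~20 GB of memory for ~4 million objects.
observed_memory_gb = 20
observed_objects = 4_000_000

# Memory footprint per queued object, in KB.
kb_per_object = observed_memory_gb * 1024 * 1024 / observed_objects
print(round(kb_per_object, 2))  # about 5.24 KB per object

# Rough sizing for a hypothetical 1.5 million object container.
my_objects = 1_500_000
estimated_gb = my_objects * kb_per_object / (1024 * 1024)
print(round(estimated_gb, 1))  # about 7.5 GB, so a 15GB slice has headroom
```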
On an extra note, the 'turbolift clone' option (see 'turbolift clone -h') works great when you want to clone a container to another container in the same region, or to a container in another region, whichever you prefer.
turbolift -u [CLOUD-USERNAME] -a [CLOUD-API-KEY] --os-rax-auth [SOURCE-REGION] clone -sc [SOURCE-CONTAINER-NAME] -tc [DESTINATION-CONTAINER-NAME] -tr [DESTINATION-REGION]
I hope this helps as I know I needed it personally. If you have any questions, feel free to ask.