SSHFS + Linux = SFTP powered cloud storage


Do you like cloud storage? Did you read the comparison between Dropbox, Google Drive, OneDrive, and Box? Still cannot decide? Great! Then this article is for you. After reading it, you will probably decide to get yourself a Linux box and build your own custom cloud storage using Linux and SSHFS.

In computing, SSHFS (SSH Filesystem) is a filesystem client to mount and interact with directories and files located on a remote server or workstation. The client interacts with the remote file system via the SSH File Transfer Protocol (SFTP), a network protocol providing file access, file transfer, and file management functionality over any reliable data stream that was designed as an extension of the Secure Shell protocol (SSH) version 2.0. – Wikipedia

Enable file sharing over SSH (SFTP)

SFTP is the secure variant of the File Transfer Protocol (FTP). A (Debian-based) Linux server only needs an SSH server to serve the home directories of its local users via SFTP. The following command enables this:

sudo apt-get install openssh-server

To install and enable the firewall:

sudo apt-get install ufw
sudo ufw allow 22
sudo ufw enable

To find the public IP address:

curl ifconfig.me

You need this IP address and the default port (22) to connect to your cloud storage. Note that if you run this Linux box at home you need to forward TCP port 22 on your broadband (DSL or Cable) modem/router. You can look up how to do this on your device via portforward.com (disable your ad-blocker to make this website work correctly).
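
To verify that the port forward actually works, you can try to connect from a machine outside your own network; a quick check (the IP address below is only a placeholder for your own public IP):

ssh -p 22 maurits@203.0.113.10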

Advantages of SSHFS over public cloud storage

  • You can also use your cloud server as a web server, application server, database server, mail server, or DNS server (flexibility)
  • The cost per GB of storage and GB transferred is very low (costs)
  • More control over privacy of the data (security)

Disadvantages of SSHFS compared to public cloud storage

  • No automatic backups (data safety), but you can set up an rsync cron job (see the sketch after this list)
  • No web interface available, but you could install one
  • No built-in document versioning, but you can use Git
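
As an illustration of the first point, here is a minimal sketch of such an rsync cron job, assuming a separate backup host that pulls the server's home directory every night (the paths and schedule are made up and should be adapted):

# /etc/cron.d/cloudserver-backup (hypothetical): pull a copy every night at 03:00
0 3 * * * maurits rsync -az --delete maurits@cloudserver:/home/maurits/ /backup/cloudserver/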

SSHFS client for Linux

Linux has support for Filesystem in Userspace (FUSE). This means it supports mounting volumes without having root access. This is especially good when mounting external storage for a specific user. A cloud storage filesystem accessed over SSH is something you typically want to mount for a specific user.
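
Thanks to FUSE, a normal user can already do a one-off SSHFS mount by hand, without any of the automation described below; a quick sketch with example directory names:

sudo apt-get install sshfs
mkdir -p ~/cloudserver
sshfs maurits@cloudserver:/home/maurits ~/cloudserver
# ... work with the remote files ...
fusermount -u ~/cloudserver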

To enable access to your cloud storage on Linux you must first make a directory where you want to mount the cloud storage. It is very convenient when the directory is automatically mounted as soon as it is accessed. This can be achieved by using the AutoFS tool. The AutoFS service (daemon), once installed, takes care of automatically mounting and unmounting the directory.

sudo apt-get install autofs sshfs
sudo nano /etc/auto.sshfs    # create the SSHFS map (contents shown below)
sudo nano /etc/auto.master   # add the map to the master configuration (shown below)
ssh maurits@cloudserver      # connect once so the host key ends up in known_hosts
sudo service autofs restart

Now we have to create an AutoFS configuration that states in which directory the remote location is mounted. The following configuration tells AutoFS to use SSHFS to mount the directory “/home/maurits” from “maurits@cloudserver” onto the local “cloudserver” directory.

maurits@nuc:~$ cat /etc/auto.sshfs
cloudserver -fstype=fuse,rw,nodev,nonempty,noatime,allow_other,max_read=65536,IdentityFile=/home/maurits/.ssh/id_rsa,UserKnownHostsFile=/home/maurits/.ssh/known_hosts :sshfs\#maurits@cloudserver:/home/maurits

At the end of the file “/etc/auto.master” we add the following lines:

# Below are the sshfs mounts
/home/maurits/ssh /etc/auto.sshfs uid=1000,gid=1000,--timeout=30,--ghost

This means that the local directory “/home/maurits/ssh” will hold the directory “cloudserver” that we specified earlier. As you can see I also specified the user that owns the files and the seconds of inactivity after which the directory is unmounted and the SSH connection is ended.

Before everything works you must make sure you add yourself to the “fuse” group using the following command, or the mounting will fail:

sudo usermod -a -G fuse maurits

After doing this you may have to log out and log in again before the changes take effect.
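
To verify that AutoFS works, simply access the directory (which triggers the mount) and check the mount table:

ls /home/maurits/ssh/cloudserver
mount | grep cloudserver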

This setup allows me to edit the remote files as if they are locally available. Other software is not aware that the files are actually on a remote location. This also means I can use my favorite editors and/or stream the media files using my favorite media player.

I used the following sites to find out the above configuration:

  1. http://unix.stackexchange.com/questions/52262/autofs-with-sshfs-not-working-anymore
  2. http://www.mccambridge.org/blog/2007/05/totally-seamless-sshfs-under-linux-using-fuse-and-autofs/
  3. https://help.ubuntu.com/community/SSHFS
  4. http://hublog.hubmed.org/archives/001928.html
  5. https://bbs.archlinux.org/viewtopic.php?id=175257

Enhanced security using EncFS

There is a possibility to enhance the security of your cloud storage by adding EncFS on top of your SSH-mounted filesystem. I’ll leave this as an exercise for the reader. EncFS can encrypt the files (and filenames) on the storage with AES-256. You can read more about that here and here. Using encryption may prevent the data from being leaked in some cases, for instance when a disk is broken and needs replacement. On the downside, not many clients support this.
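
As a starting point, a typical EncFS setup keeps the encrypted data inside the SSHFS mount and exposes a decrypted view in a local directory; a minimal sketch, with example directory names:

sudo apt-get install encfs
# encrypted data lives on the cloud storage, the decrypted view stays local
encfs /home/maurits/ssh/cloudserver/.encrypted /home/maurits/private
# when done, unmount the decrypted view
fusermount -u /home/maurits/private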

SFTP in read-only mode

If you do not want to risk corrupting files due to broken connections while writing, you can choose to run the SFTP subsystem in read-only mode. To do this you need to add the -R flag to the SFTP subsystem line in “/etc/ssh/sshd_config” so that it becomes:

Subsystem sftp /usr/lib/openssh/sftp-server -R

In my experience this type of file corruption does not happen often, but better safe than sorry. Read-only mode will also prevent you from accidentally deleting files. So if you do not need to write anyway, you should put the system in read-only mode for safety reasons. Note that you can still use rsync when the SFTP subsystem is in read-only mode.

Disable password login for SSHD

Using passwords for logging in to SSH is not the most secure solution. If you open up your SSH server to the Internet you should be using public key authentication. You can read about it here and here or follow these clear instructions. After doing that you can disable password login by putting this line into the “/etc/ssh/sshd_config” file:

PasswordAuthentication no
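
In case you have not set up public key authentication yet, the usual steps look roughly like this (run the first three commands as your own user, before disabling password logins):

ssh-keygen -t rsa -b 4096        # generate a key pair, optionally with a passphrase
ssh-copy-id maurits@cloudserver  # copy the public key to the server
ssh maurits@cloudserver          # verify that key-based login works
sudo service ssh restart         # on the server, after editing sshd_config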

SSHFS clients for other platforms

If you are not working on your Linux box, but you want to access your SSHFS cloud storage, you can use one of the following clients (they all support private keys). I personally tested a lot of clients, and although there are plenty of choices, I recommend the following (none of these clients support EncFS):

  • Open source Windows client
  • Open source OSX client
  • Free iOS (iPhone/iPad) client (no media streaming support)
  • Free Android client (no media streaming support)

Final words

We have shown you how to set up your own cloud storage. Some may say it is not as good as Dropbox, Google Drive, or any other commercial provider; others may argue it is better. What is good about it is the large choice of clients available for this kind of cloud storage, thanks to the open source nature of the technology.


Distributed storage for HA clusters (based on GlusterFS)

Nowadays everyone is searching for HA (High Availability) solutions for running more powerful applications or websites. As you can see, we already have a post on our blog about configuring load balancers on Ubuntu (http://www.leaseweb.com/labs/2011/09/setting-up-keepalived-on-ubuntu-load-balancing-using-haproxy-on-ubuntu-part-2/). I will expand on that post with the possibility to run a highly available and balanced system which you can use for your shared hosting or high-traffic website without having a single point of failure.

The actual problem with balanced/clustered solutions is often the content server where you keep all your data: databases, static files, uploaded files. All this content needs to be distributed across all your servers and you have to keep track of modifications, which is not really convenient. Otherwise you will have a single point of failure in your balanced solution. You can use rsync or lsyncd for this, but to make things simpler from an administrative perspective and gain more advantages for the future, you can use a DFS (distributed file system). Nowadays there are plenty of suitable open source options, such as OCFS, GlusterFS, MogileFS, PVFS2, ChironFS, and XtreemFS.

Why do we want to use a DFS?

  • Data sharing between multiple users
  • User mobility
  • Location transparency
  • Location independence
  • Backups and centralized management

For this post, I chose GlusterFS to start with. Why? It is open source, it has a modular, pluggable interface, and you can run it on any Linux-based server without upgrading your kernel. Maybe I will give an overview of other distributed file systems in one of my next blog posts.

We will use 3 dedicated servers like the HP120G6:

1. HP120G6 / 1 x QC X3440 CPU / 2 x 1Gbit NICs / 4GB RAM / 2 x 1TB SATA2 (disks you can add later on the fly :))

Let’s assume that we already have a dedicated server with CentOS 6 installed on the first 1TB hard drive.

Just SSH to your host, prepare the HDDs, and install the GlusterFS server and agent.

#ssh root@85.17.xxx.xx

Now we need to prepare the HDDs; we will use the second 1TB drive for the distributed storage:

#parted /dev/sdb
mklabel gpt
unit TB
mkpart primary 0 1TB
print
quit
#mkfs.ext4 /dev/sdb1

After that we will install all required packages for gluster:

#yum -y install wget fuse fuse-libs automake bison gcc flex libtool
#yum -y install compat-readline5 compat-libtermcap
#wget http://packages.sw.be/rsync/rsync-3.0.7-1.el5.rfx.x86_64.rpm

Let’s download the gluster packages from their website:

#wget http://download.gluster.com/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-core-3.2.4-1.x86_64.rpm
#wget http://download.gluster.com/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-fuse-3.2.4-1.x86_64.rpm
#wget http://download.gluster.com/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-geo-replication-3.2.4-1.x86_64.rpm
#wget http://download.gluster.com/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-rdma-3.2.4-1.x86_64.rpm
#rpm -Uvh glusterfs-core-3.2.4-1.x86_64.rpm
#rpm -Uvh glusterfs-geo-replication-3.2.4-1.x86_64.rpm

We also want to run our storage traffic over a separate network card, so let’s configure eth1 for that using an internal IP range:

#nano /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE="eth1"
HWADDR="XX:XX:XX:XX:XX:XX"
NM_CONTROLLED="yes"
BOOTPROTO="static"
IPADDR="10.0.10.X"
NETMASK="255.255.0.0"
ONBOOT="yes"

We also want to use hostnames while configuring gluster:

#nano /etc/hosts
10.0.10.1       gluster1-server
10.0.10.2       gluster2-server
10.0.10.3       gluster3-server

Ensure that TCP ports 111, 24007, 24008, and 24009 to (24009 + number of bricks across all volumes) are open on all Gluster servers.

You can use the following chains with iptables:

#iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24047 -j ACCEPT
#iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 111 -j ACCEPT
#iptables -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 111 -j ACCEPT
#iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 38465:38467 -j ACCEPT
#service iptables save
#service iptables restart

Now check the version of installed glusterfs:

#/usr/sbin/glusterfs -V

To configure Red Hat-based systems to automatically start the glusterd daemon every time the system boots, enter the following from the command line:

#chkconfig glusterd on
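
The daemon also has to be running right now, not only after the next reboot, so start it if it is not running yet:

#service glusterd start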

GlusterFS offers a single command line utility known as the Gluster Console Manager to simplify configuration and management of your storage environment. The Gluster Console Manager provides functionality similar to the LVM (Logical Volume Manager) CLI or ZFS Command Line Interface, but across multiple storage servers. You can use the Gluster Console Manager online, while volumes are mounted and active.

You can run the Gluster Console Manager on any Gluster storage server. You can run Gluster commands either by invoking the commands directly from the shell, or by running the Gluster CLI in interactive mode.

To run commands directly from the shell, for example:

#gluster peer status

To run the Gluster Console Manager in interactive mode:

#gluster

Upon invoking the Console Manager, you will get an interactive shell where you can execute gluster commands, for example:

gluster > peer status

Before configuring a GlusterFS volume, you need to create a trusted storage pool consisting of the storage servers that will make up the volume. A storage pool is a trusted network of storage servers. When you start the first server, the storage pool consists of that server alone. To add additional storage servers to the storage pool, you can use the probe command from a storage server.

To add servers to the trusted storage pool, use the following command for each server you have, for example:

gluster peer probe gluster2-server
gluster peer probe gluster3-server

Verify the peer status from the first server using the following command:

# gluster peer status
Number of Peers: 2

Hostname: gluster2-server
Uuid: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
State: Peer in Cluster (Connected)

Hostname: gluster3-server
Uuid: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
State: Peer in Cluster (Connected)

This way you can add additional storage servers to your storage pool on the fly. To remove a server from the storage pool, use the following command:

# gluster peer detach gluster3-server
Detach successful

Now we can create a replicated volume.
A volume is a logical collection of bricks where each brick is an export directory on a server in the trusted storage pool. Most of the gluster management operations happen on the volume.
Replicated volumes replicate files throughout the bricks in the volume. You can use replicated volumes in environments where high availability and high reliability are critical.

First we SSH to each server to create a directory and mount the second HDD (storage1 on gluster1-server, storage2 on gluster2-server, storage3 on gluster3-server):

#mkdir /storage1
#mount /dev/sdb1 /storage1

#mkdir /storage2
#mount /dev/sdb1 /storage2

#mkdir /storage3
#mount /dev/sdb1 /storage3

Create the replicated volume using the following command:

# gluster volume create test-volume replica 3 transport tcp gluster1-server:/storage1 gluster2-server:/storage2 gluster3-server:/storage3
Creation of test-volume has been successful
Please start the volume to access data.

You can optionally display the volume information using the following command:

# gluster volume info
Volume Name: test-volume
Type: Replicate
Status: Created
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: gluster1-server:/storage1
Brick2: gluster2-server:/storage2
Brick3: gluster3-server:/storage3

You must start your volumes before you try to mount them. Start the volume using the following command:

# gluster volume start test-volume
Starting test-volume has been successful

Now we need to set up a GlusterFS client to access our volume. Gluster offers multiple options to access gluster volumes:

  • Gluster Native Client – This method provides high concurrency, performance and transparent failover in GNU/Linux clients. The Gluster Native Client is POSIX conformant. For accessing volumes using gluster native protocol, you need to install gluster native client.
  • NFS – This method provides access to gluster volumes with NFS v3 or v4 (see the example after this list).
  • CIFS – This method provides access to volumes when using Microsoft Windows as well as SAMBA clients. For this access method, Samba packages need to be present on the client side.
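
For comparison, mounting the test volume over the built-in NFS server could look something like the line below (Gluster's NFS server speaks NFSv3 over TCP; the exact mount options may differ per setup):

# mount -o vers=3,mountproto=tcp -t nfs gluster1-server:/test-volume /mnt/nfs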

We will use the Gluster Native Client as it is a POSIX-conformant, FUSE-based client running in user space. The Gluster Native Client is recommended for accessing volumes when high concurrency and high write performance are required.

Verify that the FUSE module is installed:

# modprobe fuse
# dmesg | grep -i fuse
fuse init (API version 7.XX)

Install required prerequisites on the client using the following command:

# sudo yum -y install openssh-server wget fuse fuse-libs openib libibverbs

Ensure that TCP and UDP ports 24007 and 24008 are open on all Gluster servers. Apart from these ports, you need to open one port for each brick starting from port 24009. For example: if you have five bricks, you need to have ports 24009 to 24014 open.

You can use the following chains with iptables:

# sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24008 -j ACCEPT
# sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 24009:24014 -j ACCEPT

Install the Gluster Native Client (FUSE component) using the following commands:

#rpm -Uvh glusterfs-fuse-3.2.4-1.x86_64.rpm
#rpm -Uvh glusterfs-rdma-3.2.4-1.x86_64.rpm

After that we need to mount the volume we created before.

To manually mount a Gluster volume, first create the mount point and then use the following commands on each server:

# mkdir -p /mnt/glusterfs
# mount -t glusterfs gluster1-server:/test-volume /mnt/glusterfs

You can configure your system to automatically mount the Gluster volume each time your system starts.
Edit the /etc/fstab file and add the following line on each server:

gluster1-server:/test-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0

To test the mounted volume we can simply use the following command:

# df -h /mnt/glusterfs
Filesystem                    Size  Used Avail Use% Mounted on
gluster1-server:/test-volume    1T  1.3G    1T   1% /mnt/glusterfs
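
A simple way to see the replication in action is to create a file through the mount and check that it shows up in the brick directory on every server (the brick paths as configured above):

# touch /mnt/glusterfs/replication-test.txt
# ls /storage1     # on gluster1-server
# ls /storage2     # on gluster2-server
# ls /storage3     # on gluster3-server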

Now you can use the /mnt/glusterfs mount as HA storage for your files, backups, or even Xen virtualization, and you do not need to worry if one server goes down because of a hardware problem or planned maintenance.

Next time I will write about how to integrate Gluster with NFS without having a single point of failure, so you can use it with any solution that supports the NFS protocol.

Any suggestions and comments are appreciated. Thank you!
