When will Linode move to SSD?

Having used a few SSDs, I am a big fan of this technology. It's quite clear to me that rotating platter hard drives will, within the next ten years, maybe even five, become a thing of the past - and deservedly so; I can't imagine a more antiquated technology than a metal disk spinning around at high speeds with a mechanically actuated arm as the read element.

Given that SSDs are inevitably going to displace rotating drives from the market … when would you wager that we'll see them in Linodes?

As I understand it, Linode uses two 1 TB drives per system; at today's prices, this is probably a couple of hundred bucks per Linode host. 2 TB of SSD would be something like $5,000, so clearly it's cost prohibitive at the moment. But maybe in two or three years the cost will have come down sufficiently?

The benefits of SSDs would be numerous:

  • A potential increase in reliability (although SSDs are quite new, so they haven't been proven in the market, nor refined for as long as platter drives have) - Patriot is offering 10 year warranties on their SSD products and I'd be surprised if other manufacturers don't eventually do the same.

  • Greatly enhanced performance. For many workloads, there is no greater performance enhancement available than an upgrade from a platter drive to an SSD.

  • Reduced heat, noise, power requirements

  • Possible form factor improvements - SSDs can be much smaller than hard drives without sacrificing any performance or increasing manufacturing costs significantly, so maybe a 1U server could hold 4 or 8 1.8-inch SSDs instead of 2 platter drives

So my bet is that in 2 years, we'll see SSDs used experimentally in Linodes and that in 3 years, Linode will switch over to SSDs.

Anyone else have any thoughts?

40 Replies

We complain plenty about lack of disk space here already, and with good reason. It'll be a very long time before SSDs reach parity with spinning platters on the $/GB front.

But maybe the future lies in some relatively small amount of SSD space for each of us for the main system, and then access to a much larger (spinning platter) SAN.

@bji:

I can't imagine a more antiquated technology than a metal disk spinning around at high speeds with a mechanically actuated arm as the read element.
ORLY?

@Vance:

@bji:

I can't imagine a more antiquated technology than a metal disk spinning around at high speeds with a mechanically actuated arm as the read element.
ORLY?

Nice one :lol:

Hard to say which one I hated the most… but probably cassette tapes. Punch cards had a big problem if you dropped the stack and they spilled all over, but they sure beat the heck out of paper tape and cassettes.

@bji:

I can't imagine a more antiquated technology than a metal disk spinning around at high speeds with a mechanically actuated arm as the read element.

Flush toilet technology is even more antiquated than that.

James

@Vance:

@bji:

I can't imagine a more antiquated technology than a metal disk spinning around at high speeds with a mechanically actuated arm as the read element.
ORLY?

Sorry, I guess I wasn't clear:

I can't imagine a more antiquated computer technology still in use than a metal disk spinning around at high speeds blah blah blah …

@Xan:

We complain plenty about lack of disk space here already, and with good reason. It'll be a very long time before SSDs reach parity with spinning platters on the $/GB front.

But maybe the future lies in some relatively small amount of SSD space for each of us for the main system, and then access to a much larger (spinning platter) SAN.

Yes, but in a few years it will be economical to have 2 TB of SSD per server. Admittedly, this only gets them to the current hard disk capacity of a Linode host, but then again, the pace of increasing storage in Linodes has been pretty slow over the past 6 years. So I don't expect Linode will be needing 20 TB per server any time soon.

@bji:

@Xan:

We complain plenty about lack of disk space here already, and with good reason. It'll be a very long time before SSDs reach parity with spinning platters on the $/GB front.

But maybe the future lies in some relatively small amount of SSD space for each of us for the main system, and then access to a much larger (spinning platter) SAN.

Yes, but in a few years it will be economical to have 2 TB of SSD per server. Admittedly, this only gets them to the current hard disk capacity of a Linode host, but then again, the pace of increasing storage in Linodes has been pretty slow over the past 6 years. So I don't expect Linode will be needing 20 TB per server any time soon.
And by the time 2 TB on a single SSD is economical, SSD technology itself will probably be "antiquated" as well, replaced by something even better.

bji: you're also taking into account the 100K+ erase cycles? A place with many Linodes per physical server with unknown I/O workloads is sure to hit that limit sooner than for a conventional dedicated server.

@tronic:

bji: you're also taking into account the 100K+ erase cycles? A place with many Linodes per physical server with unknown I/O workloads is sure to hit that limit sooner than for a conventional dedicated server.

With good wear levelling this is not an issue. An SSD will last longer than a spinning platter drive; Patriot is giving 10 year warranties on their SSDs, and I don't see too many traditional hard drive manufacturers doing the same.

That being said, SSDs are fairly new tech so reliability may not be quite as good now as it is likely to be in a few years for these devices. It's all a bit of an unknown as the drives haven't been in the wild long enough to find out how well the wear levelling algorithms work in the real world.

Intel's original 80GB x25-m was rated to write 100GB per day for 5 years before having issues. Their 160GB model could presumably do that for 10.

Their 64GB x-25e would presumably be able to write 80GB per day for 50 years, or something in that ballpark. Point is, the SLC drives won't wear out during the lifetime of a Linode host machine. Even the MLC drives from Intel probably have a sufficient lifespan to be useful, although they're still far too small.
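For what it's worth, here's a rough sketch of that endurance arithmetic in Python. The 80GB/100GB-per-day/5-year rating is the one quoted above; the linear scaling with capacity and the 10x SLC-over-MLC write endurance factor are assumptions for illustration, not spec-sheet figures.

```python
# Back-of-the-envelope SSD endurance estimate (assumptions noted above).
def rated_years(capacity_gb, gb_written_per_day,
                ref_capacity_gb=80, ref_gb_per_day=100, ref_years=5,
                endurance_multiplier=1):
    """Scale the reference rating (80GB X25-M: 100GB/day for 5 years)."""
    total_writes_gb = ref_gb_per_day * 365 * ref_years      # reference total writes
    total_writes_gb *= capacity_gb / ref_capacity_gb        # assume linear scaling with capacity
    total_writes_gb *= endurance_multiplier                 # assumed SLC-vs-MLC factor
    return total_writes_gb / (gb_written_per_day * 365)

print(rated_years(160, 100))                         # ~10 years for a 160GB X25-M
print(rated_years(64, 80, endurance_multiplier=10))  # ~50 years for a 64GB X25-E
```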

@bji:

I can't imagine a more antiquated computer technology still in use than a metal disk spinning around at high speeds blah blah blah …
I believe the transistor was invented before the rotating hard drive…

@bji:

I can't imagine a more antiquated computer technology still in use than a metal disk spinning around at high speeds blah blah blah …
And the winner is: Babbage's Difference Engine #2 (designed in the 19th century - Nathan Myhrvold has a working copy).

I do believe SSDs will become the future of hard drives. They are becoming cheaper and cheaper, and they are getting bigger. I saw a high-end SSD that holds 1TB of data for $5000. In a couple of years, the price will come down a lot. Expect SSDs in servers very soon.

The 1 TB SSDs have come down a bit, but not quite enough for my personal taste yet…

http://www.engadget.com/2009/08/03/oczs-1tb-colossus-ssd-gets-a-price-and-launch-timeframe/

If space is not a concern, SSDs can already be cheaper than spindle-based disks in some environments.

I recently read an article testing Intel's x25-e drive in high-load environments. They concluded that a single x25-e was as fast (in IOPS) as 18 spindle-based 15K RPM SAS disks in RAID.

Obviously, the SSD costs a fraction of what 18 high-end server drives cost. So if you're concerned about performance and not capacity, putting a few SSDs in RAID-1 or RAID-10 is already cheaper in servers.

Probably more reliable too; SSDs have no moving parts, so they tend to fail more predictably.

Absolutely right. I run a heavily database-driven service at a dedicated provider. (It was born on Linode, and eventually had to move out, so it can definitely be considered a Linode success story.)

Anyway, we were hitting a wall on I/O performance, and were looking at having to spend a lot of money on a massive server upgrade to get more spindles. Instead, we dropped in a single X25-E, and suddenly I/O is not the system bottleneck. It was an unbelievable upgrade. Relative to hard drives, it's cake to stick more CPU power and memory in 1U servers.

@Guspaz:

If space is not a concern

That clearly isn't the case if you read these forums.

@glg:

@Guspaz:

If space is not a concern

That clearly isn't the case if you read these forums.

If each and every Linode customer got their own 80 GB Intel X25-M drive, that would be an additional $250 or so per customer. If the drive is expected to last 3 years before being replaced (a conservative estimate), that's an extra $7 per month or so. Throw in another $3 a month just for "administrative" costs (i.e. Linode's profits on their outlay to buy the SSD up front) and that's $10 per month.
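A quick sketch of that back-of-the-envelope math, using the numbers assumed above (the $250 drive price, 3-year replacement cycle, and $3/month overhead are all guesses from this post, not Linode figures):

```python
# Rough per-customer monthly cost of a dedicated 80GB Intel X25-M.
drive_price = 250.0          # USD, assumed retail price
expected_life_months = 36    # assumed 3-year replacement cycle
admin_overhead = 3.0         # USD/month, assumed handling/margin cost

monthly_cost = drive_price / expected_life_months + admin_overhead
print(f"${monthly_cost:.2f} per month")   # roughly $10/month
```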

I would HAPPILY pay an extra $10 per month for 3.5x the disk space of my Linode 540 plan. The fact that the drive would be 10x faster than my existing Linode shared drive would be a nice bonus.

Of course, the big problem is that it's not possible to put 30 or 40 SSD drives in a single server. The good news is that the cost per GB of SSD drives gets lower the larger the drive gets, so buying a few large capacity drives and splitting users across them is even more economical than my $10 per month estimate. And SSD drives are smaller, quieter, cooler, and use less power than platter drives. So getting a few TB of SSD drives into a Linode host ought to be doable in the not too distant future.

Perhaps Linode should start on the high end. That would be a great way to have a high end plan - a Linode 2880 equivalent with SSD drives. Charge an extra $10/mo on top of the standard Linode 2880 plan, and be the only hosting provider that I know of that hosts a plan on SSDs. And the performance would blow the doors off of a standard Linode 2880 for the types of workloads I expect people are using Linode 2880s for …

Except they wouldn't use x25-m, they'd use x25-e. So, roughly speaking, multiply your lifespans by ten (30 years) since the cells support 10x the writes. At that point, the drive wearing out isn't the lifespan concern, but how long before the drive is too out of date to be useful.

SSDs can be a big win for some applications, and the improved I/O performance can pay for itself in power-consumption reductions, but SSDs aren't always a clear win price/performance wise.

What I'd really like right now is a way to transparently use SSDs to durably buffer writes for the tables on our PostgresDBs to reduce the # of random IOs our HDDs have to handle, but I haven't wanted to mess with Solaris to play with ZFS, and I don't know of a Linux solution.

@eas:

SSDs can be a big win for some applications, and the improved I/O performance can pay for itself in power-consumption reductions, but SSDs aren't always a clear win price/performance wise.

Just because they're not always more cost-effective doesn't mean that they aren't still faster.

While it's true that high sequential read speeds are easier to achieve with magnetic disks (6 magnetic disks in a RAID array can probably match the 500-600 MB/s sequential read speeds of an $895 Fusion-IO drive, while costing about half as much), the same is not true for random read/write performance. It's impractical to match the random read/write performance of an SSD with magnetics, since you'd need so many of them that it ends up being more expensive.

Of course, whether you actually need that performance is another question. But as a user of a 160GB Intel x25-m in my home desktop, I can say that it does make sense at home, if you can afford it. It's really an amazing difference.

> What I'd really like right now is a way to transparently use SSDs to durably buffer writes for the tables on our PostgresDBs to reduce the # of random IOs our HDDs have to handle, but I haven't wanted to mess with Solaris to play with ZFS, and I don't know of a Linux solution.

See btrfs, which is Linux's answer to ZFS. Unfortunately, it's not stable yet, and won't be for some time. It's intended to be the next-gen filesystem, with ext4 acting as the intermediate solution.

However, if all you want to do is buffer writes, shouldn't the OS be able to handle that with write buffering, or writing to an in-memory table and then periodically copying that to an on-disk table? Admittedly this is less reliable since memory goes poof in a failure scenario…
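As a minimal illustration of that "in-memory table, periodically copied to disk" idea - using SQLite as a stand-in rather than the Postgres setup being discussed, and with the same caveat that anything written since the last flush goes poof in a crash:

```python
# Sketch only: write to an in-memory database, periodically copy it to disk.
import sqlite3, time

mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE events (ts REAL, payload TEXT)")

def record(payload):
    mem.execute("INSERT INTO events VALUES (?, ?)", (time.time(), payload))

def flush_to_disk(path="events.db"):
    mem.commit()              # make sure pending writes are part of the snapshot
    disk = sqlite3.connect(path)
    mem.backup(disk)          # copy the whole in-memory database to disk
    disk.close()

record("hello")
flush_to_disk()               # call periodically, e.g. from a timer or cron
```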

@bji:

If each and every Linode customer got their own 80 GB Intel X25-M drive, that would be an additional $250 or so per customer.
And then when you migrate to a new host, we get to ticket the datacenter to pull your drive and move it.

Ick.

@jed:

@bji:

If each and every Linode customer got their own 80 GB Intel X25-M drive, that would be an additional $250 or so per customer.
And then when you migrate to a new host, we get to ticket the datacenter to pull your drive and move it.

Ick.

Or buy a new one, and leave the old one in place for another customer to upgrade to.

@Xan:

I run a heavily database-driven service at a dedicated provider. (It was born on Linode, and eventually had to move out, so it can definitely be considered a Linode success story.)
Congrats! Very curious :D

Are you able to provide any more details? (Or even just some vague hints?)

http://www.youtube.com/watch?v=96dWOEa4Djs

A must watch for SSD enthusiasts.

Great video - highly recommended - even for people who aren't SSD enthusiasts (yet).

@Guspaz:

@eas:

SSDs can be a big win for some applications, and the improved I/O performance can pay for itself in power-consumption reductions, but SSDs aren't always a clear win price/performance wise.

Just because they're not always more cost-effective doesn't mean that they aren't still faster.

While it's true that high sequential read speeds are easier to achieve with magnetic disks (6 magnetic disks in a RAID array can probably match the 500-600 MB/s sequential read speeds of an $895 Fusion-IO drive, while costing about half as much), the same is not true for random read/write performance. It's impractical to match the random read/write performance of an SSD with magnetics, since you'd need so many of them that it ends up being more expensive.

Of course, whether you actually need that performance is another question.

This is where I get off. I am constitutionally ill-suited for any sort of technical discussion where cost is not considered an important variable, even more so when the required performance envelope isn't defined.

> > What I'd really like right now is a way to transparently use SSDs to durably buffer writes for the tables on our PostgresDBs to reduce the # of random IOs our HDDs have to handle, but I haven't wanted to mess with Solaris to play with ZFS, and I don't know of a Linux solution.

See btrfs, which is Linux's answer to ZFS. Unfortunately, it's not stable yet, and won't be for some time. It's intended to be the next-gen filesystem, with ext4 acting as the intermediate solution.

However, if all you want to do is buffer writes, shouldn't the OS be able to handle that with write buffering, or writing to an in-memory table and then periodically copying that to an on-disk table? Admittedly this is less reliable since memory goes poof in a failure scenario…

The key word in the passage you quoted was "durably" and I'll throw in "consistency" for good measure. In a perfect world, I'd rather spend money on SSDs than hardware RAID controllers.

I know btrfs is being held out as Linux's answer to ZFS, but as you say, it's not well proven yet. It's also not clear to me if it can address transparently putting hot blocks on faster storage the way I understand ZFS can. I also wonder about its future. Work on btrfs was largely funded by Oracle, and Oracle is acquiring Sun…

I just thought of something… I think they'd have to rename RAID to allow for SSDs (RAED?). That is, unless of course the price of SSDs goes down below $700 for 256 GB any time soon!

Damn I'm funny.

@Xan:

We complain plenty about lack of disk space here already, and with good reason. It'll be a very long time before SSDs reach parity with spinning platters on the $/GB front.

But maybe the future lies in some relatively small amount of SSD space for each of us for the main system, and then access to a much larger (spinning platter) SAN.

What probably makes more sense is Linodes built the way they are, with SSD space made available over iSCSI. It'll be small, and it'll be expensive, so what you'd probably want to do is run just your database or other I/O intensive application on it.

The IOPS you can push through a Linode are probably one of the biggest limitations of using them at the moment.

Not unless SSDs become faster than active memory…

Don't forget that you can always use a RAM disk if you really want to. You might consider that to be volatile, but there are various ways to deal with that. You can use the RAM for reads and have writes go to disk (high IOPS on reads, low IOPS on writes), or use asynchronous disk backing for the RAM disk (disk may lag behind RAM).

Interestingly, Anand is claiming that he expects Intel's next series of SSDs to include a 600GB option at a bit above $500, which is roughly what the 160GB drive costs today. If this happens (and I'm not as optimistic as Anand), the cost per gigabyte will have dropped by a huge amount.

@Guspaz:

You can use the RAM for reads and have writes go to disk (high IOPS on reads, low IOPS on writes), or use asynchronous disk backing for the RAM disk (disk may lag behind RAM).

How? Leaving it to the kernel by tuning dirty_writeback and other related knobs? LVM/RAID with a ramdisk in the matrix?

For the RAID-1 approach, all writes would be synchronously written to the disk. To get it working, you need to mark your "real" disk with --write-mostly, which tells the kernel that that disk is to be used for writing, and the other disk should be used for reading. See here: http://lkml.indiana.edu/hypermail/linux/kernel/0509.0/1153.html

The advantage of a RAID-based approach is that the RAID rebuilding process handles the automatic population of the RAM disk; the ramdisk device is re-added to the array each boot (although I think you'd have to script that?), and the system rebuilds it and keeps it in sync. Does a lot of the work for you.
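For anyone curious, a sketch of what scripting that might look like (device names are placeholders, and it assumes mdadm plus a kernel ramdisk device are available; this is just the approach described above, not a tested recipe):

```python
# Sketch: RAID-1 of a ramdisk and a real partition, with the real disk
# marked --write-mostly so reads are served from RAM where possible.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def create_array():
    # One-time setup (placeholder device names, run as root).
    run(["mdadm", "--create", "/dev/md0", "--level=1", "--raid-devices=2",
         "/dev/ram0", "--write-mostly", "/dev/sdb1"])

def readd_ramdisk_on_boot():
    # The ramdisk comes up empty after a reboot; re-add it and let the
    # normal RAID rebuild repopulate it from the real disk.
    run(["mdadm", "/dev/md0", "--add", "/dev/ram0"])
```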

This might be a valid reason to bump up the RAM in your Linode, if you intend to do it. But it's not cheap; the cost for storage is $56.75/GB. But considering that Intel enterprise SSD storage costs $12.90/GB at retail, you're looking at an effective markup of only 4.4x.

Linode marks up traditional disk storage by something like 20x (10x if you consider RAID), so 4.4x is not bad.

Thanks.

But I'm curious. Isn't the very concept of VM on Linux geared exactly toward that, with all available memory automatically used for the filesystem page cache? In fact, it should be even better than a RAID setup, because it allows the total data set to be larger than what is currently being worked on, especially in situations with a high read-to-write ratio.

For example, say you have 10GB of total data on disk, but at any given moment only 1GB is actively worked with. That would mean 1GB of hot data in the cache, with older pages replaced by newer ones on demand. It would also mean you need at least 1GB of RAM available for the cache. True, if your reads are spread randomly across the entire 10GB pool, there is no advantage here over the ramdisk.

But if you have enough RAM in the VM cache for the entire data set, then I don't see how a ramdisk would provide any advantage. In fact, relying on the VM cache alone gives you a more stable system, which will simply drop old cache pages if some other process starts requiring RAM.

Or am I missing something?

Ah, but you're also arguing against the utility of SSDs ;)

The only real difference is whether things are cached before or after the first access. And you can entice the OS to cache files in advance by reading them.

Heh, no, I'm not arguing against SSDs. I'm just questioning the reasoning behind using a ramdisk/HDD RAID array over what the kernel already does, and does better - except for the first-access situation, as you suggest, where the ramdisk/HDD array might have an advantage; but that advantage is not always necessary, depending on the kind of load, of course.

By the way, there is one very important bit of math I haven't seen so far in the price-per-GB calculation. What about heat and electricity?

Observed over a period of, say, a year, there is a certain cost inherent to the disks:

Price per GB + kWh electricity + kWh in required cooling = total cost

I don't have numbers (Google suggests no meaningful numbers on the subject), but it seems logical to me that SSDs should draw less power and produce less heat. The question is: is the difference significant enough, and, observed over a certain period of time, does it reduce the total cost of the disks?

It depends on what kinds of disks you use. Are you comparing equal performance, or equal power?

I'll perform a very lopsided comparison and compare an Intel SSD to a 15K RPM 2.5" drive, which would probably be the fastest magnetic disk:

Intel x25-m (160GB):

Read/write: 0.15W

Idle: 0.075W

Seagate Cheetah 15K.7 (300GB, calculated from amperage on +5V and +12V in 6gbit mode):

Idle: 11.62W

Clearly those 15K drives are power hogs; they use something like 5x more power than a 2TB WD GP drive.

Power difference: 11.545W saved

kWh per year saved: 101 kWh

Cost per kWh in Montreal: $0.0545 (residential rate for the first 30 kWh per day; business rates are slightly higher)

Power savings per year: $5.50

Price difference between drives: ~$162

Number of years before SSD saves money on power: ~29

But if you want equivalent capacities, the SSD will take centuries to catch up.
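The same arithmetic as a small sketch, plugging in the figures above (idle power only, assuming the drive runs 24/7):

```python
# Years for the SSD's power savings to cover its price premium.
hdd_idle_watts = 11.62
ssd_idle_watts = 0.075
price_difference = 162.0      # USD, approximate gap between the two drives
cost_per_kwh = 0.0545         # USD, quoted Montreal residential rate

watts_saved = hdd_idle_watts - ssd_idle_watts            # ~11.545 W
kwh_saved_per_year = watts_saved * 24 * 365 / 1000       # ~101 kWh
savings_per_year = kwh_saved_per_year * cost_per_kwh     # ~$5.50
print(price_difference / savings_per_year, "years to break even")  # ~29
```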

@Guspaz:

Cost per kWh in Montreal: $0.0545 (residential rate for the first 30 kWh per day; business rates are slightly higher)

This isn't really representative of a datacenter. Datacenters have to size their batteries, generators, and cooling for the power they supply. And electricity's cheaper in Quebec than it is in a lot of other places.

@ArbitraryConstant:

@Guspaz:

Cost per kWh in Montreal: $0.0545 (residential rate for the first 30 kWh per day; business rates are slightly higher)

This isn't really representative of a datacenter. Datacenters have to size their batteries, generators, and cooling for the power they supply. And electricity's cheaper in Quebec than it is in a lot of other places.

True, if you go by DC charges… they're only a few times higher. I'll use UberBandwidth: $17 per month per amp. That's $17 for 86.4 kWh, or $0.197 per kWh.

That's $19.90 per year saved, or 8 years just to break even. That's too long to amortize hardware. If you were trying to match capacities, I believe the price difference would have been $662, so it would then take 33 years to break even.

My point is, the power savings from an SSD aren't really a big concern unless you're comparing IOPS. At that point, it takes 18x15K drives to match the performance of an Intel x25-e, and the power savings become tangible. But if you're talking about matching the capacity, or matching the number of drives, there's no real cost advantage.
