
Linode.com Status Update 04/06/04

[b]Linux 2.6 on the Hosts[/b]

Six of the 20 host servers are now running version 2.6 of the Linux kernel, with the CFQ (Completely Fair Queuing) disk scheduler.

Now that it has been running on a few boxes for a while, I have a pretty good feel for how it performs. Compared with 2.4, 2.6 is better under some workloads and slightly worse under others (judged by comparing the mrtg graphs and vmstat output from before and after the switch to 2.6). I'm optimistic that there are further gains to be had from some of the VM tuning knobs (/proc/sys/vm/*).
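To experiment with those knobs, it helps to snapshot their current values before and after each change. The helper below is a hypothetical sketch, not a tool from this post; which tunables exist depends on the kernel version.

```python
import os


def read_vm_tunables(path="/proc/sys/vm"):
    """Return {name: value} for every readable tunable under `path`.

    Values are kept as stripped strings because the tunables use
    different formats (single integers, space-separated lists, ...).
    """
    tunables = {}
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if not os.path.isfile(full_path):
            continue
        try:
            with open(full_path) as f:
                tunables[name] = f.read().strip()
        except OSError:
            # Some entries are write-only or need extra privileges; skip them.
            continue
    return tunables


if __name__ == "__main__" and os.path.isdir("/proc/sys/vm"):
    for name, value in read_vm_tunables().items():
        print(f"{name} = {value}")
```

Saving two of these dictionaries and diffing them makes it easy to see exactly which settings changed between benchmark runs.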

Overall, I think 2.6 is "a good thing", and we will eventually move the rest of the hosts to 2.6.

[b]Disk I/O Thrashing Is No More![/b]

The main reason I wanted to move to 2.6 was its I/O performance improvements over 2.4.

Linux is susceptible to what I'd call a "hard-drive denial-of-service attack" when a high rate of random read/write requests fills up the request queue(s). This causes latency problems for every other request and essentially brings things to a crawl.
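A back-of-the-envelope model shows why. In a plain FIFO queue, a new request has to wait for everything queued ahead of it. The numbers below (128 queued requests, roughly 8 ms per random request) are illustrative assumptions, and real elevators reorder requests, so treat the result as an order of magnitude only.

```python
def fifo_wait_ms(queued_requests, service_ms_per_request):
    """Wait before a newly queued request starts, in a plain FIFO model:
    every request already in the queue is serviced first."""
    return queued_requests * service_ms_per_request


# Illustrative assumptions: a full queue of 128 random requests,
# ~8 ms per request (seek plus rotational delay).
print(fifo_wait_ms(128, 8))  # 1024 -> about one second for a single read
```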

That random read/write flood is exactly the workload you get when a Linode is continuously thrashing its swap devices (rapid reads and writes) while the host is under pressure to write out dirty pages (which it always will be, after a while). Unfortunately, the CFQ patch for 2.6 did not solve this problem, and neither did the stock anticipatory or deadline schedulers.

CFQ helps a bit when many threads are doing random I/O (such as during cron job parties), but it doesn't eliminate the possibility of one Linode wedging the entire host. Read on for the solution...

[b]UML I/O Request Token-Limiter patch[/b]

I implemented a simple Token Bucket Filter/Limiter around the asynchronous UBD driver inside UML. The token-bucket method is quite simple. Here's how it works: every second, x tokens are added to the bucket. Each I/O request needs one token, so a request has to wait until the bucket holds a token before it is allowed to perform its I/O.

This method allows a burst (unrestricted) rate until the bucket is empty, and only then does it start throttling. Perfect!
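Here is a minimal Python sketch of the token-bucket idea, for illustration only. It is not the code in token-limiter-v1.patch (that is C inside the UML ubd driver). One assumption differs from the description above: tokens are refilled continuously in proportion to elapsed time rather than x at a time once per second, which gives the same long-run rate with smoother throttling. `TokenBucket`, `rate`, `capacity`, and the injectable `clock` are hypothetical names.

```python
import time


class TokenBucket:
    """Token-bucket limiter: `rate` tokens per second flow in, up to
    `capacity`; each I/O request consumes one token."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # refill rate, tokens per second
        self.capacity = capacity  # bucket size, i.e. the maximum burst
        self.tokens = capacity    # start full so an initial burst passes freely
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self):
        """Take one token if available; False means the request must wait."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self):
        """Block until a token is available, then take it."""
        while not self.try_acquire():
            time.sleep((1 - self.tokens) / self.rate)  # time until next token


# Usage with a fake clock: 10 tokens/s, bursts of up to 5 requests.
fake_time = [0.0]
bucket = TokenBucket(rate=10, capacity=5, clock=lambda: fake_time[0])
burst = [bucket.try_acquire() for _ in range(7)]
print(burst)       # [True, True, True, True, True, False, False]
fake_time[0] += 0.5  # half a second later: 0.5 s * 10 tokens/s refills 5 tokens
after_wait = bucket.try_acquire()
print(after_wait)  # True
```

The first five requests go straight through (the burst), then requests are refused until the refill catches up, which is exactly the "unrestricted until empty, then throttle" behavior described above.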

Links:
[url=http://www.theshore.net/~caker/patches/token-limiter-v1.patch]token-limiter-v1.patch[/url]
[url=http://www.theshore.net/~caker/patches/token-limiter-v1.README]token-limiter-v1.README[/url]

[b][color=darkred]With this patch, a single Linode can no longer wedge the host![/color][/b]

This is a big deal, because until now the only way to fix this when it happened was for me to step in and stop the offending Linode.

The limiter patch is in the 2.4.25-linode24-1um kernel (2.6 to follow shortly).

The defaults are very high, and I doubt any of you will be affected by the limiter under normal usage. I can change the refill values and the bucket size at runtime, so I'll be able to write a monitor for each host that dynamically switches profiles depending on the host's load. This is a big deal! 🙂
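Such a monitor could look roughly like the sketch below. Everything in it is hypothetical: the profile values, the load thresholds, using the host's 1-minute load average as the load signal, and the `apply` callback standing in for however the new refill rate and bucket size would actually be pushed into each running UML.

```python
import os
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class LimiterProfile:
    refill_per_sec: int  # tokens added to each Linode's bucket per second
    bucket_size: int     # maximum burst, in I/O requests


# Hypothetical profiles and load thresholds, for illustration only.
PROFILES = [
    (2.0, LimiterProfile(refill_per_sec=2000, bucket_size=20000)),         # quiet host
    (8.0, LimiterProfile(refill_per_sec=500, bucket_size=5000)),           # busy host
    (float("inf"), LimiterProfile(refill_per_sec=100, bucket_size=1000)),  # overloaded
]


def pick_profile(load_avg):
    """Return the first profile whose load threshold is above load_avg."""
    for threshold, profile in PROFILES:
        if load_avg < threshold:
            return profile
    return PROFILES[-1][1]  # only reached for NaN input


def monitor(apply, interval=10.0):
    """Poll the host load forever and call `apply` (a hypothetical hook that
    would push the values into each running UML) whenever the profile changes."""
    current = None
    while True:
        profile = pick_profile(os.getloadavg()[0])
        if profile != current:
            apply(profile)
            current = profile
        time.sleep(interval)
```

Only calling `apply` when the profile actually changes keeps the monitor from hammering every UML with identical settings on each poll.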

[b]Linux 2.6 for the Linodes[/b]

I haven't officially announced the 2.6-um kernel yet. There are still some bugs and performance issues to work out, and I don't recommend running 2.6-um in production yet. A few adventurous users have been testing it, though, and reporting the quirks involved in getting it running under each distro. I'll try to put together a guide on migrating to 2.6 and release it once the kernel is more stable.

[b]What's New in the UML World?[/b]

New UML patches are long overdue. I think we'll have a new UML release (for both 2.4 and 2.6) within the next two weeks or so.

Besides the usual bug fixes, I know Jeff has been working on AIO support for the I/O driver inside UML. AIO is a new feature implemented in 2.6 (on the hosts). Some of its benefits are:
[list][*] The ability to submit multiple I/O requests with a single system call.
[*] The ability to submit an I/O request without waiting for it to complete, overlapping the request with other processing.
[*] Optimized disk activity, since the kernel can combine or reorder the individual requests of a batched I/O.
[*] Better CPU utilization and system throughput, by eliminating extra threads and reducing context switches.
[/list]
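The third point, combining and reordering a batch, can be illustrated with a toy sort-and-merge pass over hypothetical (start sector, sector count) pairs. This is a simplified sketch of the idea, not the kernel's elevator code: it ignores the difference between reads and writes, request size limits, and deadlines.

```python
from typing import List, Tuple

Request = Tuple[int, int]  # (start_sector, sector_count)


def sort_and_merge(requests: List[Request]) -> List[Request]:
    """Reorder a batch by start sector and merge requests that touch or
    overlap, so the disk sees fewer, longer, more sequential transfers."""
    merged: List[Request] = []
    for start, count in sorted(requests):
        if merged:
            prev_start, prev_count = merged[-1]
            prev_end = prev_start + prev_count
            if start <= prev_end:  # contiguous with or overlapping the previous one
                merged[-1] = (prev_start, max(prev_end, start + count) - prev_start)
                continue
        merged.append((start, count))
    return merged


batch = [(800, 8), (0, 8), (8, 8), (400, 8), (16, 8), (808, 8)]
print(sort_and_merge(batch))  # [(0, 24), (400, 8), (800, 16)]
```

Six scattered requests become three, and the disk head sweeps once from low to high sectors instead of jumping back and forth.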
More about AIO:
http://lse.sourceforge.net/io/aio.html
http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Pulavarty-OLS2003.pdf

That's all!
-Chris


Comments (12)

  1.

    This may be naive, but wouldn’t it help tremendously to have all the swap partitions for a given linode on a different drive?

  2.

    [quote:9a75d3e3be=”diN0″]This may be naive, but wouldn’t it help tremendously to have all the swap partitions for a given linode on a different drive?[/quote]
    It might, but that’s not the point, really. Before this patch, a single UML could consume all of the I/O (say, for a given device, like you suggested). It would still cause the same problem when other Linodes tried to access the device. The same effect can be had with “swap files” that exist on your filesystem (rather than actual ubd images) or heavy I/O on any filesystem.

    With this patch, I am able to guarantee a minimum level of service. Previously that wasn’t possible.

    -Chris

  3.

    Great work chris, I genuinely can’t think of anything else you can improve upon! 😉

  4.

    Chris,

    I tried the 2.6 kernel of Redhat 9 (large) a few days ago. It failed to boot & I had to switch back to 2.4.

    Another forum thread had the same problem.
    dev/ubd/disc0: unknown partition table
    /dev/ubd/disc1: unknown partition table

  5.

    I am really excited about this. As you know I have been one of the most vocal proponents of some system of throttling disk I/O so that an overzealous Linode cannot DOS the host.

    It sounds like this solution will require everyone to upgrade to a 2.6 kernel, which means that it cannot be applied until everyone is ready to go to 2.6 (and it will only be effective when *everyone* has upgraded to this fixed kernel). So I guess the solution is months away. But at least there is a plan in the works to solve this problem for good.

    Great job man! Keep up the good work!

  6.

    Just curious – why not solve this problem in the host kernel instead? Can the host kernel be patched to limit any one of its processes using the I/O token system that you have devised? Then the Linode themselves can run any kernel they want to and the host system will prevent any one from thrashing the disk.

    Ideally this would be some kind of rlimit option, so that it could be applied just to the Linode processes themselves and not to the other processes of the host system.

    I don’t know if the I/O layer that’s deeper in the kernel than the UML ubd driver is harder to work with, though … perhaps it would be far more complex to modify the fundamental Linux I/O code than it is to modify the ubd driver?

  7.

    caker, thanks for all the hard work you’ve put in to keep the linode hosts in top shape.

    It’s rather surprising that CFQ didn’t solve the I/O scheduling problem, though. The algorithm is supposed to be [i]completely fair[/i] towards each thread requesting I/O. 😛

  8.

    [quote:52760ef410=”Quik”]Great work chris, I genuinely can’t think of anything else you can improve upon! 😉[/quote]
    Thanks Quik 🙂

    [quote:52760ef410=”gmt”]Chris,

    I tried the 2.6 kernel of Redhat 9 (large) a few days ago. It failed to boot & I had to switch back to 2.4.

    Another forum thread had the same problem.
    dev/ubd/disc0: unknown partition table
    /dev/ubd/disc1: unknown partition table[/quote]
    You can always ignore this warning message — it’s just telling you that the ubd devices are not partitioned. You’re using the entire block device as one giant ‘partition’.

    To get 2.6 to work under RedHat, first rename /lib/tls to something else (since 2.6-um and NPTL don’t mix yet).

    -Chris

  9.

    [quote:2eaacf3890=”bji”]I am really excited about this. As you know I have been one of the most vocal proponents of some system of throttling disk I/O so that an overzealous Linode cannot DOS the host.

    It sounds like this solution will require everyone to upgrade to a 2.6 kernel, which means that it cannot be applied until everyone is ready to go to 2.6[/quote]
    Not sure where you read that from my post. I’ve already patched the 2.4.25-linode24-1um kernel with the token-limiter patch, and 2.6-um to follow shortly.

    [quote:2eaacf3890=”bji”](and it will only be effective when *everyone* has upgraded to this fixed kernel). So I guess the solution is months away. But at least there is a plan in the works to solve this problem for good.[/quote]
    Most/all of the repeat offenders have already been rebooted into the “linode24” kernel (with the limiter patch). So the solution is in effect right now. But you are correct: there are still many Linodes running un-limited.

    [quote:2eaacf3890=”bji”]Great job man! Keep up the good work![/quote]
    Thanks!

    -Chris

  10.

    [quote:f066e66db0=”bji”]Just curious – why not solve this problem in the host kernel instead? Can the host kernel be patched to limit any one of its processes using the I/O token system that you have devised? Then the Linode themselves can run any kernel they want to and the host system will prevent any one from thrashing the disk.

    Ideally this would be some kind of rlimit option, so that it could be applied just to the Linode processes themselves and not to the other processes of the host system.

    I don’t know if the I/O layer that’s deeper in the kernel than the UML ubd driver is harder to work with, though … perhaps it would be far more complex to modify the fundamental Linux I/O code than it is to modify the ubd driver?[/quote]
    I agree — the correct solution is to get Linux fixed, or perhaps to get UML to use the host more efficiently. Some of the UML I/O rework is already under way (the AIO stuff), but that kind of thing *is* months away…

    One interesting “feature” of the CFQ scheduler is an ionice priority level. But, I wasn’t able to get the syscalls working to test it.

    -Chris

  11.

    [quote:01c9cda963=”griffinn”]caker, thanks for all the hard work you’ve put in to keep the linode hosts in top shape.

    It’s rather surprising that CFQ didn’t solve the I/O scheduling problem, though. The algorithm is supposed to be [i]completely fair[/i] towards each thread requesting I/O. 😛[/quote]
    I’m not sure where the bottleneck is — but as far as I can tell, CFQ and the standard scheduler in 2.4 appear equally (non)responsive in the worst-case scenario. Go figure…

    One interesting thing is that UML uses the no-op elevator. Jeff and I got into a discussion about this, and he says there’s no point in UML doing any request merging, but I disagree. I’d rather have UML do some of its own request merging and reordering than force the host to do it all. Plus, it makes UML look to the host more like a streaming load than a random one…

    Think back to the last set of tiobench benchmark results you’ve seen — look how poorly the random-i/o results are compared to “streaming-read” and “streaming-write”…

    So .. another hack to the UML code (one-liner) to test…

    -Chris

  12.

    Thanks, Caker. I have a tiny linode and I make almost no demands on the system, so far at least. However, fairness is part of what you sell. It sounds like the leaky bucket in the UM kernel solves most of the problem with a minimum of effort. I’ve been implementing fairness algorithms for at least 30 years, so I have a few theoretical observations and questions:

    You appear to be issuing tokens independently to each process at an absolute rate, independent of the actual resource availability. This means that a UML may get limited even if nobody else wants the resource, yes? It might be better for the host kernel to issue tokens at an overall rate to the UMLs. That way a particular UML can use the whole resource if nobody else wants it. Since everybody’s buckets are full, the instant anyone else wants to use the resource, the original user is throttled to 50% as the tokens are returned equally to the two users, and so on as more users are added. That is, the main kernel returns tokens equally to each UML with a non-full bucket, but does not add tokens to a bucket that is already full. The host kernel should dynamically adjust its token generation rate to just keep the resource occupied. I’ve successfully done this in the past by watching the resource: if the resource goes idle while there are any empty buckets, slightly increase the token rate. If the resource never goes idle, slightly decrease the token rate.
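    The scheme proposed in this comment can be sketched in a few lines. This is a hypothetical illustration of the commenter's idea, not something the limiter patch implements: `distribute` hands one round of host-generated tokens equally to the buckets that aren't full (passing any excess on), and `adjust_rate` applies the idle/busy feedback rule. The 5% step is an arbitrary choice.

```python
def distribute(tokens, levels, capacity):
    """Give `tokens` to the buckets that are not full, in equal shares.

    A bucket never goes above `capacity`; whatever a full bucket cannot
    take is re-shared among the rest, and once every bucket is full any
    remaining tokens are simply dropped. Returns the new bucket levels.
    """
    levels = list(levels)
    remaining = tokens
    while remaining > 0:
        open_buckets = [i for i, level in enumerate(levels) if level < capacity]
        if not open_buckets:
            break  # everyone is full: nobody needs more tokens right now
        share = remaining / len(open_buckets)
        remaining = 0
        for i in open_buckets:
            room = capacity - levels[i]
            if share >= room:
                levels[i] = capacity       # bucket fills up...
                remaining += share - room  # ...and its excess is passed on
            else:
                levels[i] += share
    return levels


def adjust_rate(rate, disk_went_idle, any_bucket_empty, step=0.05):
    """Feedback rule from the comment: speed up token generation if the disk
    sat idle while someone was starved, slow it down if the disk never idled."""
    if disk_went_idle and any_bucket_empty:
        return rate * (1 + step)
    if not disk_went_idle:
        return rate * (1 - step)
    return rate


# One heavy user (bucket 1) next to an idle neighbour whose bucket is full:
print(distribute(6, [10, 0], capacity=10))  # [10, 6.0]: the heavy user gets everything
# Once the neighbour is draining its bucket too, the split is 50/50:
print(distribute(6, [0, 0], capacity=10))   # [3.0, 3.0]
```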

    Next issue: Do you “oversubscribe” the host memory? That is, does the sum of the UML memory sizes exceed the size of the host’s real application space? If so, the host swap space is used, causing disk activity at this level. This is independent of the swap activity within each UML as the user exceeds its “real” space and begins to use its swap partition. I’m guessing that host-level swapping does not count against any UML’s bucket, but that UML-level swapping does. That would be the fair way to do this. However, host-level swapping will reduce the overall amount of I/O resource that is available to the users. The algorithm above will account for this.
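    The oversubscription question comes down to simple arithmetic. The figures below (40 Linodes of 64 MB each on a host with 2048 MB available for them) are made up for illustration. A ratio above 1.0 only means host-level swapping is possible, since not all guest memory is necessarily in use at the same time.

```python
def oversubscription(uml_memory_mb, host_app_space_mb):
    """Ratio of total UML memory to the host memory available for the UML
    processes; above 1.0 the host may have to swap guest memory out."""
    return sum(uml_memory_mb) / host_app_space_mb


# Hypothetical host: 40 Linodes of 64 MB each, 2048 MB available for them.
ratio = oversubscription([64] * 40, 2048)
print(ratio)        # 1.25
print(ratio > 1.0)  # True: host-level swapping is possible
```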

    Next issue: Do we have fairness issues with network bandwidth? Do you intend to add a token system to address this?

    Again: I’m a happy camper. These are purely theoretical questions for me.
