10 Lessons Learned Using CoreOS


Gabriel Monroy / CTO, Engine Yard / @gabrtv

About Me


  • Early contributor to Docker and CoreOS
  • Deis project creator and BDFL
  • CTO at Engine Yard

What is Deis?


  • Leading Docker-based PaaS
  • Opinionated developer workflow
  • Consumer of orchestration APIs

How does Deis work?

How did we choose CoreOS?

Winter of 2014

Early Prototypes


  • Mesos / Marathon
  • Flynn / Sampi
  • CoreOS / Fleet

Why CoreOS?


  • Best OS for running containers
  • Designed for distributed systems
  • Strong team with a clear philosophy

It works!


  • 500+ new clusters every day
  • 3,500+ new nodes every day
  • 120+ external contributors

What's Next?


  • v2 Orchestration Previews
  • Deis/CoreUpdate Integration
  • Enhanced Developer UX

Lessons Learned using CoreOS

#1

Docker + systemd is awkward

Docker reparents containers under its daemon


$ ps fax
  PID TTY      STAT   TIME COMMAND
 1051 ?        Ssl    0:37 docker --daemon --host=fd://
 1304 ?        Ssl    0:01  \_ /usr/bin/ceph-mon -d -i ip-10-21-1-30.us-west-2.compute.internal --public-addr 10.21.1.30:6789
 1535 ?        Ssl    0:14  \_ ceph-osd -d -i 0 -k /var/lib/ceph/osd/ceph-0/keyring
 1735 ?        Ssl    0:00  \_ /usr/bin/ceph-mds -d -i ip-10-21-1-30.us-west-2.compute.internal
 2304 ?        Sl     0:00  \_ docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 8888 -container-ip 172.17.0.4 -container-port 8888
 2310 ?        Ss     0:00  \_ /usr/bin/radosgw -n client.radosgw.gateway
 3934 ?        Ssl    0:00  \_ /usr/local/bin/logspout
 4186 ?        Ssl    0:00  \_ /usr/local/bin/publisher
 4426 ?        Sl     0:00  \_ docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 2222 -container-ip 172.17.0.7 -container-port 2222
 4435 ?        Sl     0:00  \_ docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 443 -container-ip 172.17.0.7 -container-port 443
 4444 ?        Sl     0:00  \_ docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 80 -container-ip 172.17.0.7 -container-port 80

...which breaks systemd: the unit's cgroup tracks only the docker run client, not the container



$ systemctl status helloworld.service
● helloworld.service - Hello World Service
   Loaded: loaded (/etc/systemd/units/helloworld.service; linked-runtime; vendor preset: disabled)
   Active: active (running) since Sun 2015-05-03 02:50:40 UTC; 7min ago
 Main PID: 4725 (docker)
   CGroup: /system.slice/system-helloworld.slice/helloworld.service
           └─4743 docker run --name helloworld --rm -p 5000:5000 helloworld

...and can even leave orphaned containers behind when the docker client is killed
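
A rough way to spot orphans is to compare the containers Docker reports as running against the units systemd considers active. This sketch assumes a hypothetical naming convention, matching the helloworld example, where each unit runs a container named after itself (helloworld.service starts --name helloworld). Collecting the two lists from docker ps and systemctl list-units is left out.

```python
"""Find running containers that no active systemd unit accounts for.

Assumes a hypothetical naming convention: the unit "helloworld.service"
runs a container started with "--name helloworld".
"""
from typing import Iterable, List


def unit_to_container(unit: str) -> str:
    """Map a unit name to the container name it is expected to own."""
    suffix = ".service"
    return unit[: -len(suffix)] if unit.endswith(suffix) else unit


def find_orphans(running_containers: Iterable[str], active_units: Iterable[str]) -> List[str]:
    """Return running containers with no matching active unit, sorted."""
    expected = {unit_to_container(u) for u in active_units}
    # The Docker API reports names with a leading "/", so strip it first.
    running = {name.lstrip("/") for name in running_containers}
    return sorted(running - expected)


if __name__ == "__main__":
    containers = ["/helloworld", "/deis-router", "/stale-worker"]
    units = ["helloworld.service", "deis-router.service"]
    print(find_orphans(containers, units))  # ['stale-worker']
```

Anything it returns is a container systemd no longer knows about, and a candidate for docker rm -f.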

#2

Operating etcd is hard

Watch your disk I/O


Monitor Heartbeat Statistics


$ while true; do curl -sL http://127.0.0.1:4001/v2/stats/leader | /opt/bin/jq . ; sleep 1 ; done
{
  "leader": "023059cd43934b1181c24690231dfed1",
  "followers": {
    "1332067017d340b7b2a0af7517d5807a": {
      "latency": {
        "current": 2.024596,
        "average": 2.279488968895424,
        "standardDeviation": 1.682551806925102,
        "minimum": 1.65033,
        "maximum": 42.187369
      },
      "counts": {
        "fail": 0,
        "success": 4083
      }
    },
    "fdee920bc40e48a6842a6a120da55969": {
      "latency": {
        "current": 2.011876,
        "average": 2.426211598236163,
        "standardDeviation": 1.9611706333295913,
        "minimum": 1.805175,
        "maximum": 45.020381
      },
      "counts": {
        "fail": 0,
        "success": 4082
      }
    }
  }
}
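
Watching the loop by eye works during an incident; for alerting, a small script can flag followers whose heartbeat latency or failure count drifts. A minimal sketch, assuming the v2 JSON shape shown above with latencies in milliseconds; the thresholds are hypothetical placeholders, not etcd recommendations.

```python
"""Flag etcd followers with unhealthy heartbeat statistics.

Expects the JSON shape returned by etcd v2's /v2/stats/leader endpoint.
The thresholds are hypothetical placeholders; tune them for your network.
"""
import json
import urllib.request
from typing import Any, Dict, List

MAX_AVG_LATENCY_MS = 10.0  # hypothetical threshold
MAX_FAILS = 0  # any failed heartbeat is worth a look


def unhealthy_followers(stats: Dict[str, Any]) -> List[str]:
    """Return follower IDs whose average latency or failure count exceeds the thresholds."""
    flagged = []
    for member_id, follower in stats.get("followers", {}).items():
        avg_ms = follower["latency"]["average"]
        fails = follower["counts"]["fail"]
        if avg_ms > MAX_AVG_LATENCY_MS or fails > MAX_FAILS:
            flagged.append(member_id)
    return sorted(flagged)


def fetch_leader_stats(url: str = "http://127.0.0.1:4001/v2/stats/leader") -> Dict[str, Any]:
    """Fetch leader statistics; query the current leader to get follower data."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

Run it against the leader, for example print(unhealthy_followers(fetch_leader_stats())); the endpoint only reports followers when queried on the current leader.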

Use an isolated etcd cluster when possible

#3

Fleet works best as a "distributed init"

Use Fleet to manage your cluster "control plane"

Use Constraints for granular placement


"Conflicts" for HA


[X-Fleet]
Conflicts=highly-available@*.service

"MachineOf" for Co-location


[X-Fleet]
MachineOf=other.service

"MachineMetadata" for Host Constraints


[X-Fleet]
MachineMetadata=disk=ssd
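
To see how the three constraints interact, here is a simplified placement filter. It models the constraint semantics rather than reproducing fleet's scheduler: Conflicts takes glob patterns, MachineOf requires the machine already running the named unit, and MachineMetadata requires matching host metadata (in this model, a key with several allowed values matches any of them). The Machine and Constraints types are hypothetical.

```python
"""A simplified model of fleet's X-Fleet placement constraints.

Not fleet's real scheduler: it only filters candidate machines.
"""
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Dict, List, Optional, Set


@dataclass
class Machine:
    machine_id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    units: Set[str] = field(default_factory=set)  # units already scheduled here


@dataclass
class Constraints:
    conflicts: List[str] = field(default_factory=list)  # Conflicts= glob patterns
    machine_of: Optional[str] = None  # MachineOf= unit name
    metadata: Dict[str, Set[str]] = field(default_factory=dict)  # key -> allowed values


def eligible(machines: List[Machine], c: Constraints) -> List[str]:
    """Return the IDs of machines that satisfy every constraint."""
    result = []
    for m in machines:
        # Conflicts: no unit on this machine may match any pattern.
        if any(fnmatch(u, pat) for u in m.units for pat in c.conflicts):
            continue
        # MachineOf: the named unit must already be on this machine.
        if c.machine_of is not None and c.machine_of not in m.units:
            continue
        # MachineMetadata: every key must have one of its allowed values.
        if any(m.metadata.get(k) not in allowed for k, allowed in c.metadata.items()):
            continue
        result.append(m.machine_id)
    return result
```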

Use Global Units for Agents


Agents typically mount the Docker socket


$ docker run -v /var/run/docker.sock:/var/run/docker.sock [options] [image]

...or perform other host-level management functions
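
Mounting the socket lets an agent talk to the Docker Engine HTTP API directly. A minimal standard-library sketch, assuming the default /var/run/docker.sock path and the GET /containers/json endpoint, which lists running containers; a production agent would also need retries and API version pinning.

```python
"""Query the Docker Engine API over its Unix socket (stdlib only)."""
import http.client
import json
import socket
from typing import List


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the Host header; the socket path decides
        # where the request actually goes.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def list_container_names(socket_path: str = "/var/run/docker.sock") -> List[str]:
    """Return the names of running containers, without Docker's leading '/'."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"Docker API returned {resp.status}: {body[:200]!r}")
    finally:
        conn.close()
    return [c["Names"][0].lstrip("/") for c in json.loads(body)]
```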

#4

Fleet unit files tend toward chaos

Pre/post container execution becomes important



[Service]
ExecStartPre=/bin/prepare-image
ExecStartPre=/bin/prepare-container
ExecStart=/bin/run-container
ExecStopPost=-/bin/cleanup-container

Self-contained unit files require using subshells



[Service]
ExecStartPre=/bin/sh -c "prepare the image"
ExecStartPre=/bin/sh -c "prepare the container"
ExecStart=/bin/sh -c "run the container"
ExecStopPost=-/bin/sh -c "cleanup after the container"

Subshells quickly become complicated



[Service]
EnvironmentFile=/etc/environment
ExecStartPre=/bin/sh -c "IMAGE=`/run/deis/bin/get_image /deis/router` && \
                         docker history $IMAGE >/dev/null 2>&1 || \
                         docker pull $IMAGE"
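
One way to keep these files from drifting apart is to generate them from a single template instead of hand-editing each one. The sketch below renders the pre/start/post pattern for one container; the render_unit helper, the /usr/bin/docker path, and the exact directives are illustrative assumptions, not Deis's actual templates. Names are restricted to a conservative character set so nothing needs shell or systemd quoting.

```python
"""Render a self-contained unit file for one Docker container.

Hypothetical helper: it encodes one pre/start/post pattern and is not
Deis's actual unit template.
"""
import re

# Conservative character sets so no shell or systemd quoting is needed.
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*")
_IMAGE_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.:/-]*")


def render_unit(name: str, image: str, description: str = "") -> str:
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"unsafe container name: {name!r}")
    if not _IMAGE_RE.fullmatch(image):
        raise ValueError(f"unsafe image reference: {image!r}")
    if "\n" in description:
        raise ValueError("description must be a single line")
    lines = [
        "[Unit]",
        f"Description={description or name}",
        "",
        "[Service]",
        # Pull the image only if it is not already present locally.
        f'ExecStartPre=/bin/sh -c "docker history {image} >/dev/null 2>&1 || docker pull {image}"',
        # Remove any stale container from a previous run; "-" ignores failure.
        f"ExecStartPre=-/usr/bin/docker rm -f {name}",
        # Run in the foreground so the unit's main process is the docker client.
        f"ExecStart=/usr/bin/docker run --rm --name {name} {image}",
        f"ExecStop=/usr/bin/docker stop {name}",
        # Clean up even if the container exited on its own.
        f"ExecStopPost=-/usr/bin/docker rm -f {name}",
    ]
    return "\n".join(lines) + "\n"
```

The two docker rm -f steps are there for the orphaned-container problem from lesson #1: whatever a killed client left behind is removed before the next start and after every stop.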

#5

Container failover is hard

A thundering herd of rescheduled containers can easily overwhelm a cluster

Downloading images can be a huge bottleneck
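
A common mitigation, something you layer on yourself rather than a fleet feature, is to restart failed-over units in small batches with random jitter, so surviving hosts do not pull images and start containers all at once. A minimal sketch; batch_size, batch_interval, and jitter are hypothetical tuning knobs, and acting on the schedule (starting each unit after its delay) is left to the caller.

```python
"""Spread failover restarts into jittered batches to soften a thundering herd."""
import random
from typing import List, Optional, Tuple


def stagger_schedule(
    units: List[str],
    batch_size: int = 3,
    batch_interval: float = 30.0,
    jitter: float = 5.0,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, float]]:
    """Return (unit, delay_seconds) pairs.

    Units are grouped into batches of at most batch_size; batch k starts
    k * batch_interval seconds from now, and each unit gets an extra
    random offset in [0, jitter] so starts within a batch do not line up.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = rng if rng is not None else random.Random()
    schedule = []
    for i, unit in enumerate(units):
        batch = i // batch_size
        delay = batch * batch_interval + rng.uniform(0.0, jitter)
        schedule.append((unit, delay))
    return schedule
```

With six units and batch_size=2, the delays fall into three bands starting at 0, 30, and 60 seconds, which also spreads out the image downloads.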

#6

Stateful containers are hard

Ceph makes stateful containers possible

Ceph is complicated


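  • Monitors (cluster map and quorum)
  • OSDs (object storage daemons that hold the data)
  • Gateways (RADOS Gateway for the S3/Swift-compatible API)
  • MDSs (metadata servers for CephFS)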

Ceph adds a lot of operational overhead



$ docker exec -it deis-store-daemon bash
root@ip-10-21-1-30:/# ceph -s
    cluster ed8bf4e2-3004-4cc5-8ece-ba4f9ae47eca
     health HEALTH_OK
     monmap e3: 3 mons at {ip-10-21-1-30.us-west-2.compute.internal=10.21.1.30:6789/0,ip-10-21-2-24.us-west-2.compute.internal=10.21.2.24:6789/0,ip-10-21-2-25.us-west-2.compute.internal=10.21.2.25:6789/0}, election epoch 6, quorum 0,1,2 ip-10-21-1-30.us-west-2.compute.internal,ip-10-21-2-24.us-west-2.compute.internal,ip-10-21-2-25.us-west-2.compute.internal
     mdsmap e6: 1/1/1 up {0=ip-10-21-2-24.us-west-2.compute.internal=up:active}, 2 up:standby
     osdmap e28: 3 osds: 3 up, 3 in
      pgmap v270: 1536 pgs, 12 pools, 437 MB data, 340 objects
            22229 MB used, 244 GB / 280 GB avail
                1536 active+clean
  client io 402 B/s rd, 1208 B/s wr, 0 op/s
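
Scripting the health check is one way to keep that overhead manageable. The sketch below parses the plain-text ceph -s output shown above; the text format changes between Ceph releases, so the regular expressions are assumptions tied to this output, and machine-readable output (ceph -s --format json) is the sturdier input where available.

```python
"""Summarize Ceph health from plain-text "ceph -s" output (format varies by release)."""
import re
from typing import Dict


def parse_ceph_status(text: str) -> Dict[str, object]:
    """Extract overall health and OSD counts; "ok" requires HEALTH_OK and all OSDs up and in."""
    health = re.search(r"health\s+(HEALTH_\w+)", text)
    osds = re.search(r"(\d+)\s+osds:\s+(\d+)\s+up,\s+(\d+)\s+in", text)
    if not health or not osds:
        raise ValueError("unrecognized ceph -s output")
    total, up, in_ = (int(g) for g in osds.groups())
    return {
        "health": health.group(1),
        "osds_total": total,
        "osds_up": up,
        "osds_in": in_,
        "ok": health.group(1) == "HEALTH_OK" and up == total and in_ == total,
    }
```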

#7

Be careful with Cloud Init

Don't use cloud-init for configuration management

...even though it's tempting


#cloud-config
---
coreos:
  units:
  - name: etcd.service
  - name: upgrade-fleet-091.service
  - name: stop-update-engine.service
  - name: install-deisctl.service
  - name: ntpdate.service
  - name: timedate-ntp-synchronization.service
  - name: debug-etcd.service
  - name: increase-nf_conntrack-connections.service
  - name: load-overlay-module.service
  - name: fleet.service
write_files:
  - path: /etc/deis-release
  - path: /etc/motd
  - path: /etc/systemd/system/docker.service.d/50-insecure-registry.conf
  - path: /run/deis/bin/get_image
  - path: /run/deis/bin/preseed
  - path: /run/deis/bin/deis-debug-logs
  - path: /home/core/.toolboxrc
  - path: /etc/environment_proxy
  - path: /etc/systemd/coredump.conf
  - path: /etc/systemd/system/ntpd.service.d/debug.conf
  - path: /opt/conf/fleetd-092-custom-binary.conf

Use proper tooling for configuration management


  • Ansible
  • CoreUpdate

#8

The CoreOS Update Engine rocks

Use CI to test all channels


Don't turn it on until you can handle failover


#cloud-config
---
coreos:
  units:
  - name: stop-update-engine.service
    command: start
    content: |
      [Unit]
      Description=stop update-engine

      [Service]
      Type=oneshot
      ExecStart=/usr/bin/systemctl stop update-engine.service
      ExecStartPost=/usr/bin/systemctl mask update-engine.service

Explore using CoreUpdate and updateservicectl yourself

Coming soon to Deis!


August 2014 (9 months ago)

#9

Get your ops story straight

Have crisp answers for


  • Networking
  • Logging
  • Monitoring
  • Security
  • Upgrades

#10

CoreOS is constantly evolving

Thank You!

Questions