The Funtoo Linux project has transitioned to "Hobby Mode" and this wiki is now read-only.
Latest revision as of 13:17, September 9, 2017
Linux Containers, or LXC, is a Linux feature that allows Linux to run one or more isolated virtual systems (with their own network interfaces, process namespace, user namespace, and power state) using a single Linux kernel on a single server.
Status
As of Linux kernel 3.1.5, LXC is usable for isolating your own private workloads from one another. It is not yet ready to isolate potentially malicious users from one another or the host system. For a more mature containers solution that is appropriate for hosting environments, see OpenVZ.
LXC containers don't yet have their own system uptime, and they see everything that's in the host's dmesg output, among other things. But in general, the technology works.
Basic Info
- Linux Containers are based on:
  - Kernel namespaces for resource isolation
  - CGroups for resource limitation and accounting
- app-emulation/lxc is the userspace tool for Linux containers
Control groups
- Control groups (cgroups) in kernel since 2.6.24
- Allows aggregation of tasks and their children
- Subsystems (cpuset, memory, blkio,...)
- accounting - to measure how much resources certain systems use
- resource limiting - groups can be set to not exceed a set memory limit
- prioritization - some groups may get a larger share of CPU
- control - freezing/unfreezing of cgroups, checkpointing and restarting
- No disk quota limitation ( -> image file, LVM, XFS, directory tree quota,...)
Subsystems
root # cat /proc/cgroups
subsys_name hierarchy num_cgroups enabled
cpuset
cpu
cpuacct
memory
devices
freezer
blkio
perf_event
hugetlb
- cpuset -> limits tasks to specific CPU/CPUs
- cpu -> CPU shares
- cpuacct -> CPU accounting
- memory -> memory and swap limitation and accounting
- devices -> device allow deny list
- freezer -> suspend/resume tasks
- blkio -> I/O prioritization (weight, throttle, ...)
- perf_event -> support for per-cpu per-cgroup monitoring perf_events
- hugetlb -> cgroup resource controller for HugeTLB pages
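To see which of these subsystems are actually enabled on a given host, the fourth column of /proc/cgroups can be filtered. The following is a minimal sketch and not part of LXC: the function name enabled_cgroups is hypothetical, and it assumes the usual four-column layout (name, hierarchy, num_cgroups, enabled) with a header line that starts with "#".

```shell
# List cgroup subsystems whose "enabled" column is 1.
# $1 = file to read (optional, defaults to /proc/cgroups)
enabled_cgroups() {
    # Skip the "#subsys_name ..." header, print the name when column 4 is 1.
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}
```

Calling enabled_cgroups with no argument reads the live kernel table; passing a file name is mainly useful for inspecting a saved copy.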
Configuring the Funtoo Host System
Install LXC kernel
Any kernel beyond 3.1.5 will probably work. Personally I prefer kernels that support all the namespaces without sacrificing XFS, FUSE, or NFS support, for example. The checks listed below were introduced later, starting with kernel 3.5; this could also mean that the user namespace is not working optimally.
- User namespace (EXPERIMENTAL) depends on EXPERIMENTAL and on UIDGID_CONVERTED
  - config UIDGID_CONVERTED
    - True if all of the selected software components are known to have uid_t and gid_t converted to kuid_t and kgid_t where appropriate and are otherwise safe to use with the user namespace.
      - Networking - depends on NET_9P = n
      - Filesystems - 9P_FS = n, AFS_FS = n, AUTOFS4_FS = n, CEPH_FS = n, CIFS = n, CODA_FS = n, FUSE_FS = n, GFS2_FS = n, NCP_FS = n, NFSD = n, NFS_FS = n, OCFS2_FS = n, XFS_FS = n
      - Security options - Grsecurity - GRKERNSEC = n (if applicable)
- As of the 3.10.xx kernel, all of the above options are safe to use with user namespaces except for XFS_FS. With kernel >= 3.10.xx you should therefore answer XFS_FS = n if you want user namespace support.
- In your kernel source directory, check init/Kconfig to find out what UIDGID_CONVERTED depends on.
Kernel configuration
These options should be enabled in your kernel to be able to take full advantage of LXC.
- General setup
  - CONFIG_NAMESPACES
    - CONFIG_UTS_NS
    - CONFIG_IPC_NS
    - CONFIG_PID_NS
    - CONFIG_NET_NS
    - CONFIG_USER_NS
  - CONFIG_CGROUPS
    - CONFIG_CGROUP_DEVICE
    - CONFIG_CGROUP_SCHED
    - CONFIG_CGROUP_CPUACCT
    - CONFIG_CGROUP_MEM_RES_CTLR (in 3.6+ kernels it's called CONFIG_MEMCG)
    - CONFIG_CGROUP_MEM_RES_CTLR_SWAP (in 3.6+ kernels it's called CONFIG_MEMCG_SWAP)
    - CONFIG_CPUSETS (on multiprocessor hosts)
- Networking support
  - Networking options
    - CONFIG_VLAN_8021Q
- Device Drivers
  - Character devices
    - Unix98 PTY support
      - CONFIG_DEVPTS_MULTIPLE_INSTANCES
  - Network device support
    - Network core driver support
      - CONFIG_VETH
      - CONFIG_MACVLAN
Once you have lxc installed, you can then check your kernel config with:
root # CONFIG=/path/to/config /usr/sbin/lxc-checkconfig
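As a rough complement to lxc-checkconfig, the option names listed above can also be checked with grep. This is only a sketch: the function name missing_kernel_options is hypothetical, and it assumes an uncompressed kernel config file (a compressed /proc/config.gz would need to be unpacked first, for example with zcat).

```shell
# Print every listed option that is NOT set to y or m in a kernel config file.
# $1 = path to the config file, remaining arguments = option names to check
missing_kernel_options() {
    config_file=$1
    shift
    for opt in "$@"; do
        # An enabled option appears as OPTION=y (built in) or OPTION=m (module).
        grep -Eq "^${opt}=(y|m)$" "$config_file" || echo "$opt"
    done
}
```

For example, missing_kernel_options /usr/src/linux/.config CONFIG_NAMESPACES CONFIG_NET_NS CONFIG_VETH prints nothing when all three options are enabled. The path /usr/src/linux/.config is an assumption about where your kernel sources live.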
Emerge lxc
root # emerge app-emulation/lxc
Configure Networking For Container
Typically, one uses a bridge to allow containers to connect to the network. This is how to do it under Funtoo Linux:
- Create a bridge using the Funtoo network configuration scripts. Name the bridge something like brwan (using /etc/init.d/netif.brwan). Configure your bridge to have an IP address.
- Make your physical interface, such as eth0, an interface with no IP address (use the Funtoo interface-noip template).
- Make netif.eth0 a slave of netif.brwan in /etc/conf.d/netif.brwan.
- Enable your new bridged network and make sure it is functioning properly on the host.
You will now be able to configure LXC to automatically add your container's virtual ethernet interface to the bridge when it starts, which will connect it to your network.
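The bridge steps above could be captured in two small conf.d files. The sketch below writes them from a helper function; the function name write_bridge_conf is hypothetical, the addresses are placeholders supplied by the caller, and the variable names (template, ipaddr, gateway, slaves) are assumed to match the Funtoo network scripts on your system, so check them against the Funtoo networking documentation before relying on them.

```shell
# Write Funtoo netif settings for a bridge (brwan) and its slave interface (eth0).
# $1 = conf.d directory (normally /etc/conf.d)
# $2 = bridge address in CIDR form, $3 = default gateway (both placeholders)
write_bridge_conf() {
    conf_dir=$1
    # The physical interface carries no IP address of its own.
    printf 'template="interface-noip"\n' > "$conf_dir/netif.eth0"
    # The bridge holds the host address and enslaves netif.eth0.
    {
        printf 'template="bridge"\n'
        printf 'ipaddr="%s"\n' "$2"
        printf 'gateway="%s"\n' "$3"
        printf 'slaves="netif.eth0"\n'
    } > "$conf_dir/netif.brwan"
}
```

A call such as write_bridge_conf /etc/conf.d 192.168.1.10/24 192.168.1.1 would produce both files. Note that lxc.network.link in the container configuration has to name the same bridge; the sample configuration on this page uses br0, so one of the two names needs to be adjusted.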
Setting up a Funtoo Linux LXC Container
Here are the steps required to get Funtoo Linux running inside a container. The steps below show you how to set up a container using an existing Funtoo Linux OpenVZ template. It is now also possible to use Metro to build an lxc container tarball directly, which will save you manual configuration steps and will provide an /etc/fstab.lxc file that you can use for your host container config. See Metro Recipes for info on how to use Metro to generate an lxc container.
Create and Configure Container Filesystem
- Start with a Funtoo LXC template, and unpack it to a directory such as /lxc/funtoo0/rootfs/
- Ensure the c1 line is uncommented (enabled) and the c2 through c6 lines are disabled in /lxc/funtoo0/rootfs/etc/inittab
That's almost all you need to get the container filesystem ready to start.
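The inittab adjustment described above can be scripted. This is a minimal sketch with a hypothetical function name, disable_extra_gettys; it assumes the getty entries start at the beginning of the line with the IDs c2 through c6, as in a stock Funtoo inittab.

```shell
# Comment out the c2..c6 getty lines in an inittab file, leaving c1 active.
# $1 = path to the inittab (in this guide: /lxc/funtoo0/rootfs/etc/inittab)
disable_extra_gettys() {
    inittab=$1
    tmp=$(mktemp) || return 1
    # Prefix lines that begin with c2: through c6: with '#', then write the
    # result back in place so the file keeps its owner and permissions.
    sed 's/^\(c[2-6]:\)/#\1/' "$inittab" > "$tmp" && cat "$tmp" > "$inittab"
    rm -f "$tmp"
}
```

Running the function twice is harmless, because lines that are already commented out no longer match the pattern.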
Create Container Configuration Files
Create the following files:
/var/lib/lxc/funtoo0.config
Read "man 5 lxc.conf" for more information about the Linux container configuration file.
## Container
lxc.utsname = funtoo0
lxc.rootfs = /lxc/funtoo0/rootfs/
lxc.arch = x86_64
lxc.console = /var/log/lxc/funtoo0.console # uncomment if you want to log the container's console
#lxc.tty = 6 # if you plan to use the container with physical terminals (e.g. F1..F6)
lxc.tty = 0 # set to 0 if you don't plan to use the container with a physical terminal; also comment out the c1 to c6 respawns in your container's /etc/inittab (e.g. c1:12345:respawn:/sbin/agetty 38400 tty1 linux)
lxc.pts = 1024
## Capabilities (man 7 capabilities)
lxc.cap.drop = sys_module mac_admin mac_override sys_time audit_control audit_write syslog sys_admin sys_rawio
# note: dropping capability sys_resource causes failure to ssh into the funtoo LXC container.
## Devices
lxc.cgroup.devices.deny = a # Deny access to all devices
# Allow to mknod all devices (but not using them)
lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m
lxc.cgroup.devices.allow = c 1:3 rwm # /dev/null
lxc.cgroup.devices.allow = c 1:5 rwm # /dev/zero
lxc.cgroup.devices.allow = c 1:7 rwm # /dev/full
lxc.cgroup.devices.allow = c 1:8 rwm # /dev/random
lxc.cgroup.devices.allow = c 1:9 rwm # /dev/urandom
#lxc.cgroup.devices.allow = c 4:0 rwm # /dev/tty0 # ttys not required if you have lxc.tty = 0
#lxc.cgroup.devices.allow = c 4:1 rwm # /dev/tty1
#lxc.cgroup.devices.allow = c 4:2 rwm # /dev/tty2
#lxc.cgroup.devices.allow = c 4:3 rwm # /dev/tty3
lxc.cgroup.devices.allow = c 5:0 rwm # /dev/tty
lxc.cgroup.devices.allow = c 5:1 rwm # /dev/console
lxc.cgroup.devices.allow = c 5:2 rwm # /dev/ptmx
lxc.cgroup.devices.allow = c 10:229 rwm # /dev/fuse
lxc.cgroup.devices.allow = c 136:* rwm # /dev/pts/*
lxc.cgroup.devices.allow = c 254:0 rwm # /dev/rtc0
## Limits
lxc.cgroup.cpu.shares = 1024
lxc.cgroup.cpuset.cpus = 0 # limits container to CPU0
lxc.cgroup.memory.limit_in_bytes = 1024M
lxc.cgroup.memory.memsw.limit_in_bytes = 2048M
lxc.cgroup.blkio.weight = 500 # requires cfq block scheduler
## Filesystems
lxc.mount.entry = proc proc proc nosuid,nodev,noexec 0 0
lxc.mount.entry = sysfs sys sysfs nosuid,nodev,noexec,ro 0 0
lxc.mount.entry = shm dev/shm tmpfs rw,nosuid,nodev,noexec,relatime,mode=1777,size=256m,create=dir 0 0 # /dev/shm size should be less than half of your container memory limit
lxc.mount.entry = tmpfs run tmpfs nosuid,nodev,noexec,mode=0755,size=128m 0 0
lxc.mount.entry = tmpfs tmp tmpfs nosuid,nodev,noexec,mode=1777,size=128m 0 0
##Example of having /var/tmp/portage as tmpfs in container
#lxc.mount.entry = tmpfs var/tmp/portage tmpfs defaults,size=8g,uid=250,gid=250,mode=0775 0 0
##Example of bind mount
#lxc.mount.entry = /srv/funtoo0 /lxc/funtoo0/rootfs/srv/funtoo0 none defaults,bind 0 0
## Network configuration
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.ipv4 = 192.168.1.2/24
lxc.network.ipv4.gateway = 192.168.1.1
lxc.network.hwaddr = #put your LXC container MAC address here, otherwise you will get a random one
lxc.network.name = eth0
Read "man 7 capabilities" to get more information about Linux capabilities.
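When a configuration file grows to this size, it can help to query single keys, for example to confirm that lxc.network.link names the bridge that exists on the host. The helper below is a sketch only: lxc_conf_get is a hypothetical name, it is not an LXC tool, and it assumes the simple "key = value # comment" layout used in the file above.

```shell
# Print the value(s) assigned to a key in an LXC config file.
# $1 = config file, $2 = key (e.g. lxc.network.link)
# Lines starting with '#' are ignored and trailing "# ..." comments are stripped.
lxc_conf_get() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k != key) next
            sub(/[ \t]*#.*$/, "", v)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            print v
        }' "$1"
}
```

Keys that appear several times, such as lxc.mount.entry, yield one output line per occurrence.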
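Use the following command to generate a random MAC address for lxc.network.hwaddr in the configuration above. The sed expression inserts a colon after every two hexadecimal digits and removes the trailing one:
root # openssl rand -hex 6 | sed 's/\(..\)/\1:/g; s/:$//'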
It is a very good idea to assign a static MAC address to your container using lxc.network.hwaddr. If you don't, LXC will auto-generate a new random MAC every time your container starts, which may confuse network equipment that expects MAC addresses to remain constant.
Occasionally a container will not start with a MAC address generated this way. If you run into that problem, here is a small script that derives the MAC address from the container's IP address. Save the following code as /etc/lxc/hwaddr.sh, make it executable, and run it as /etc/lxc/hwaddr.sh xxx.xxx.xxx.xxx, where xxx.xxx.xxx.xxx is your container's IP address. /etc/lxc/hwaddr.sh:
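#!/bin/bash
# Derive a MAC address from the IPv4 address given as the first argument.
IP=$1
# ${IP//./ } is a bash feature: it replaces the dots with spaces so that
# each octet becomes a separate printf argument.
HA=$(printf "02:00:%02x:%02x:%02x:%02x" ${IP//./ })
echo "$HA"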
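One likely reason a fully random MAC address is rejected is that its first octet has the multicast bit (the lowest bit) set, which is not valid for an interface address. As an alternative sketch, the first octet can be fixed to 02 (locally administered, unicast) and only the remaining five octets randomized. The function name random_unicast_mac is hypothetical, and reading from /dev/urandom with od is a substitution for the openssl command used above.

```shell
# Print a random MAC address whose first octet is 02, i.e. locally
# administered (bit 1 set) and unicast (bit 0 clear).
random_unicast_mac() {
    # Read 5 random bytes as hex pairs, turn the separating blanks and the
    # newline into colons, then drop the leading and trailing colon.
    tail_octets=$(od -An -N5 -tx1 /dev/urandom | tr -s ' \n' '::' | sed 's/^://; s/:$//')
    printf '02:%s\n' "$tail_octets"
}
```

The printed value can be pasted into lxc.network.hwaddr so that the container keeps the same address across restarts.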
LXC Networking
- veth - Virtual Ethernet (bridge)
- vlan - vlan interface (requires device able to do vlan tagging)
- macvlan (mac-address based virtual lan tagging) has 3 modes:
  - private
  - vepa (Virtual Ethernet Port Aggregator)
  - bridge
- phys - dedicated host NIC
Linux Containers and Networking
Enable routing on the host: By default Linux workstations and servers have IPv4 forwarding disabled.
root # echo "1" > /proc/sys/net/ipv4/ip_forward
root # cat /proc/sys/net/ipv4/ip_forward
1
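The check above can be wrapped in a small helper for use in scripts. This is a sketch with a hypothetical function name, ip_forward_enabled; the optional file argument exists only so the function can be pointed at something other than the live kernel setting.

```shell
# Succeed (exit status 0) if IPv4 forwarding is switched on.
# $1 = file to read (optional, defaults to /proc/sys/net/ipv4/ip_forward)
ip_forward_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = "1" ]
}
```

Writing to /proc only lasts until the next reboot. To make the setting permanent, the usual approach is to add the line net.ipv4.ip_forward = 1 to /etc/sysctl.conf.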
Initializing and Starting the Container
You will probably need to set the root password for the container before you can log in. You can use chroot to do this quickly:
root # chroot /lxc/funtoo0/rootfs
(chroot) # passwd
New password: XXXXXXXX
Retype new password: XXXXXXXX
passwd: password updated successfully
(chroot) # exit
Now that the root password is set, run:
root # lxc-start -n funtoo0 -d
The -d option will cause it to run in the background.
To attach to the console:
root # lxc-console -n funtoo0
You should now be able to log in and use the container. In addition, the container should now be accessible on the network.
To directly attach to container:
root # lxc-attach -n funtoo0
To stop the container:
root # lxc-stop -n funtoo0
Ensure that networking is working from within the container while it is running, and you're good to go!
Starting LXC container during host boot
- You need to create a symlink in /etc/init.d/ to /etc/init.d/lxc so that it reflects your container:
root # ln -s /etc/init.d/lxc /etc/init.d/lxc.funtoo0
- Now you can add lxc.funtoo0 to the default runlevel:
root # rc-update add lxc.funtoo0 default
root # rc
 * Starting funtoo0 ... [ ok ]
LXC Bugs/Missing Features
This section is devoted to documenting issues with the current implementation of LXC and its associated tools. We will be gradually expanding this section with detailed descriptions of problems, their status, and proposed solutions.
PID namespaces
Process ID namespaces are functional, but the container can still see the CPU utilization of the host via the system load (i.e. in top).
/dev/pts newinstance
- Some changes may be required to the host to properly implement "newinstance" /dev/pts. See This Red Hat bug.
lxc-create and lxc-destroy
- LXC's shell scripts are badly designed and can easily cause damage; avoid using lxc-create and lxc-destroy.
network initialization and cleanup
- If network.type = phys is used, the interface will be renamed to the value of lxc.network.link after lxc-stop. This was supposed to be fixed in 0.7.4 but still happens in 0.7.5 - http://www.mail-archive.com/lxc-users@lists.sourceforge.net/msg01760.html
- Restarting a container can fail because network resources are still tied up by the already-defunct instance: [1]
graceful shutdown
- To gracefully shut down a container, its init system needs to properly handle the kill -PWR signal
- For funtoo/gentoo make sure that you have:
  - pf:12345:powerwait:/sbin/halt
  - in your container's /etc/inittab
- For debian/ubuntu make sure that you have:
  - pf::powerwait:/sbin/shutdown -t1 -a -h now
  - in your container's /etc/inittab
  - and also comment out any other line starting with pf:powerfail (such as pf::powerwait:/etc/init.d/powerfail start); these are used if you have a UPS monitoring daemon installed!
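Whether a container's inittab is prepared for a graceful shutdown can be checked from the host before sending the signal. The following is a sketch with a hypothetical function name, has_powerwait; it only tests for an active pf entry that uses the powerwait action and does not verify which command that entry runs.

```shell
# Succeed if the inittab has an uncommented "pf" entry with the powerwait action.
# $1 = path to the container's inittab
has_powerwait() {
    # inittab fields are id:runlevels:action:process
    grep -Eq '^pf:[^:]*:powerwait:' "$1"
}
```

For the container in this guide the call would be has_powerwait /lxc/funtoo0/rootfs/etc/inittab.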
funtoo
- Our udev should be updated to contain -lxc in scripts. (This has been done as of 02-Nov-2011, so should be resolved. But not fixed in our openvz templates, so need to regen them in a few days.)
- Our openrc should be patched to handle the case where it cannot mount tmpfs, and gracefully handle this situation somehow. (Work-around in our docs above, which is to mount tmpfs to /libexec/rc/init.d using the container-specific fstab file (on the host).)
- Emerging udev within a container can/will fail when realdev is run, if a device node cannot be created (such as /dev/console) if there are no mknod capabilities within the container. This should be fixed.
References
man 7 capabilities
man 5 lxc.conf
Links
- Fun stuff with LXC
- Try LXD which brings more features to LXC
- There are a number of additional lxc features that can be enabled via patches: [2]
- Ubuntu User Namespaces page
- lxc-gentoo setup script on GitHub
- IBM developerWorks
- Linux Weekly News