
An ISP Perspective, jail(8) Virtual Private Servers

One of the first significant elements of UNIX [1] was process time-sharing [2]. It’s easy to forget these early times, as we now commonly touch relatively inexpensive multi-CPU hardware, with SMP and multi-threading kernels, eclipsing the power of a PDP-11. Computers therefore manage simultaneous processes scaled to levels only the most adventurous could dare imagine back when UNIX first appeared. Active and persistent memory have of course scaled with raw CPU power. And it continues to get faster.

We all know this. We all know about machines, and have come to repeat the design intentions of time-sharing in many forms, including the FreeBSD jail(8) facility: a virtual machine.

The jail(8) subsystem in FreeBSD is well known to be an incredibly secure and durable system for partitioning processes, memory, network, and disk I/O. Building on the simplest of core UNIX subsystems, jail is an elegant base for creating Virtual Private Servers.

(# man 8 jail)

To bastardize this rich and elegant system on FreeBSD:

chroot(2), bound to an IP address, minus some relevant system calls = jail

(Simply add a BSD userland, and a full virtual system is born, with a confined root!)

This material assumes the reader is familiar with the jail(8) utility, and generally familiar with the mechanisms of the underlying jail(2) system call. Further reading on the use and implementation of jail(8) can be found in the paper written by jail’s original author, ‘Jails: Confining the omnipotent root’ (PHK/Watson, FreeBSD Core) [3].

This material aims to share real-world experiences running massively jailed systems, from an ISP perspective. Diverse goals and agendas can be liberated by applying modular, self-contained, and disposable technologies (in short, traditional UNIX principles).
Isaac (.ike) Levy
Materials prepared for AsiaBSDCon 2007 Proceedings, University of Tokyo, Japan.
These materials are Copyright 2006 Isaac Levy, under the terms of the BSD license.

The denial of complexity is the beginning of failure.
- Swiss historian, Jacob Burckhardt

..with proper design, the features come cheaply. This approach is arduous, but continues to succeed.
- UNIX co-creator, Dennis Ritchie

...As in all Utopias, the right to have plans of any significance belonged only to the planners in charge.
- Jane Jacobs, “The Death and Life of Great American Cities” [0]

Audience for these materials:

- UNIX System Administrators with demanding users, and limited hardware resources
- Internet Service Providers who wish to provide robust shared hardware services
- Internet Service Providers with rigorous high-availability requirements, where mutually untrusted users and processes pose a threat to service reliability (uptime)
- Institutions with fast-paced development, learning, or short-lived server requirements

The iMeme Experience, my time at a small jailing ISP (the first of its kind?)

Around 2000 I became a customer at a small web hosting company called iMeme. The iMeme specialty: root-access virtual servers (using FreeBSD jail(8)). My need was to run and further develop the behemoth web application server, Zope. I needed basics: root, a compiler, cron, logfile analysis and reporting tools (a full server). My budget was under $70/mo USD, and back then a dedicated server was unrealistic at that rate; I needed virtual-hosting scaled prices.

By 2002, iMeme hit some stiff ‘problems’ when a partner left. I was then asked to join the company, and we gave it quite a go. During my time at the company we hit a mark of 1000 domains hosted, in around 470 jailed systems.
The ISP was unique in that once you paid for your jailed system online, it was ‘booted’, and you had access to your new server; no Administrator action was necessary.

iMeme, as a company, later died based on external business problems.

Mutually Untrusted Users (and processes)

In 2007, it can be estimated that 785 million people use the IPv4 internet [4], arguably a critical mass. Most of these users have personal computers, yet a great deal of computing today, again, happens on servers, offering services in various contexts.

As the needs of users become more sophisticated and varied, the applications become a uniquely fragmented environment. From a bird’s-eye view, an astounding amount of computing machinery makes all these network applications run. From a micro view, it doesn’t take much computing machinery to run a single Gmail account (from the CPU clock perspective).

With that, the proliferation of network software which looks suspiciously like ‘websites’ (and is perhaps mislabeled as such) is starting to take various business applications off the PC and onto the webserver, en masse. Everything from content and asset management systems, to financial accounting and transaction systems, to the core of the internet: information exchange through blogs, online communities, and on, and on. Through a sort of promiscuity of form [5], http applications are evolving to manifest timeless forms of ‘traditional’ software.

Users of any given ISP always include developers, hackers [6], us. The mass of internet users who do not hack have the same sophisticated and diverse demands. For example, thank MySpace for escalating user expectations in mass-market accessibility in http server applications. With that, iMeme aimed to provide an inexpensive base platform for new internet applications like this to grow.
The real world of iMeme users:

- A hacker: “I want to compile LISP”
- An undergraduate sociology student: “I want to install ‘Foo’ blog software, it’s PHP and the instructions say I need to run Cron”
- A web designer: “I want to run an http server on port 8080”
- A business owner: “I want to run Foo web application for my business.”
- A community leader: “I want to run Mailman List Manager”
- A 13 year old hacker: “I want to run both an IRC and jabber server for my friends”

Most iMeme users simply wanted to hack Python/Zope.

Fairly simple requirements, yet so hard for commodity web hosting to accommodate! Each of these users demands, and deserves, root.

The real world of iMeme users was extremely diverse. From a business perspective, the ‘markets’ served were all considered niche; hosting companies thought we were crazy. However, we felt the internet is merely niches stitched together to make a whole, and jail enabled a unique opportunity to build our ISP in the model of a metropolitan city [7].

Timeless Methodology in Computing (UNIX, the undead in computing)

Ancient UNIX computing revolved around a model which the PC era did away with: server applications feeding thin clients (server + many UNIX terminals). PCs evolved, and network computing became largely a peer-to-peer affair. The internet has now brought a swing of the pendulum back to thin clients, as the Web Browser, as software, takes on the same role a terminal did years ago. UNIX is right there, ready and waiting to handle the applications, with an astounding wealth of time-tested (and some ancient) tools well suited for managing multi-user, multi-process servers.

With that, simple, modular, disposable utilities are vital to meeting the diverse needs of the iMeme user, in providing a full Virtual Private Server environment.
When jail(8) was first introduced to FreeBSD, it was (and still is) a simple utility, written in the spirit of old UNIX. As a simple utility, jail(8) provided iMeme the opportunity to build on the work of others and avoid reinvention and incompatibilities (classic UNIX methodology).

jail(8) therefore proved itself well suited to taking on the complexities of our user needs, which were essentially limitless. Other virtualized system designs come close, but insomuch as most Virtual OS systems take on the monolithic responsibility of providing all system interfaces (virtualized memory, networking, filesystem), they all critically failed to meet the iMeme needs in one area or another, as each was historically built to meet a particular computing problem or use case.

The history of computing is littered with the corpses of Virtual OS systems, all of which end up withering under the sheer weight of the computational responsibilities they take on. However, like UNIX time-sharing, simple and modular components of computational virtualization seem to be the only elements which persist. Subsystems like UNIX users and ACLs, actually the entire concept of UNIX privilege separation, follow in the footsteps of the simple mechanism of time-sharing.

Enter, jail(8), 1998.

As a small and complete utility, jail(8) is much like the invention of the Otis Elevator and its effect on the design of skyscrapers:

“In the era of the staircase all floors above the second were considered unfit for commercial purposes, and all those above the fifth, uninhabitable.” [8]

The jail(8) utility enabled the same sort of liberation of space, and with the same overtones of ‘safety’, if one compares security features to elevator safety concerns (falling).

(Running the risk of sounding silly, I am directly comparing an internet hosting ISP to a skyscraper, and skyscrapers are different from other types of buildings.)
The iMeme Experience (System Specifics)

The iMeme systems were quite simple for UNIX administrators to understand.

We ran high-density 2U (and then 1U) servers, on which we aimed to have approximately 50 jails running at any given time. In 2001, a base account was provisioned 4 GB of disk space, and 100 MB of what we called ‘process space’, the amalgamation of memory and CPU usage. Bandwidth was rarely an issue worth metering back then, so very basic QOS-oriented throttling was performed to ensure every user had a fair slice of available network traffic.

For disk space, we ran scripts from the host server which simply used du and shoved the output into MySQL databases, where we then automated the process of implementing policies of charging for extra disk usage. We chose to give 1 month of ‘grace time’, insomuch as sometimes logfiles would explode, or users would accidentally consume undue disk space, and we felt this was a simple buffer our customers appreciated.

Hard limits for disk space were always a consideration. Disk slices were far too rigid to meet user demands (creating extreme overhead in managing disk space upgrades), though we did experiment with them. A persistent risk was that a user, by choice, accident, or compromise, could consume all the available disk space for a jailing system. With that, again, simple UNIX strategies came back into place to contain the problem. The strategy we ended up liking best was to dedicate a partition exclusively to jails (the majority of available disk), and then perhaps break it into a few chunks to isolate various jailed disk space from each other. Over time, 80 GB slices worked nicely, and with 4x 300 GB drives fitted into 1U, this afforded a sort of ‘neighborhood’ partitioning.
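
The du-based accounting with a grace month described above can be sketched roughly as follows. This is a minimal illustration, not the original iMeme scripts: it assumes the host collects `du -sk` style output once per billing period, and it models ‘grace time’ as billing only when a jail is over quota in two consecutive samples. The function names, the sample paths, and that reading of the grace policy are all assumptions; storing the result in MySQL is left out.

```python
from typing import Dict

# 4 GB base allotment from the text, in the KB units that `du -sk` reports.
QUOTA_KB = 4 * 1024 * 1024


def parse_du(output: str) -> Dict[str, int]:
    """Map each jail directory to its size in KB from `du -sk` style output."""
    usage = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        size_kb, path = line.split(None, 1)
        usage[path.strip()] = int(size_kb)
    return usage


def billable_overage_kb(this_month: Dict[str, int],
                        last_month: Dict[str, int],
                        quota_kb: int = QUOTA_KB) -> Dict[str, int]:
    """Return KB over quota for jails that exceeded it in both samples.

    A jail that is over quota for the first time is skipped: that is the
    assumed one-month grace period (hypothetical policy detail).
    """
    bill = {}
    for path, size_kb in this_month.items():
        over_now = size_kb - quota_kb
        over_before = last_month.get(path, 0) - quota_kb
        if over_now > 0 and over_before > 0:
            bill[path] = over_now
    return bill
```

Keeping the parsing and the policy in separate functions means the billing rule can be changed without touching how the host gathers disk usage.
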
Extreme cases of disk consumption were further restricted on a per-case basis, using file-backed memory disks (disk images); but, especially in recent FreeBSD releases, this incurs an additional I/O penalty, which users do not appreciate (and it soaks RAM on the host system as well). Disk images are not necessarily a practical solution for every jailed system, however flexible they are in providing hard limits to disk space.

Memory and CPU usage was polled on a regular basis for each jail. Shell scripts were originally set up to run as cron jobs inside each jail, totalling cumulative memory consumption and CPU usage by parsing ps(1) output inside the jail and writing the totals to text files in /jail/dir/var/log/. However, this always carried the risk that a user could (trivially) bypass the system to avoid increased billing or otherwise. In their jail, remember, the user has root.

That stated, eventually iMeme moved this system out to the host system with new jailing features in FreeBSD 5.x, insomuch as one can list/kill processes based on the jail id, information available to ps, and processes listed in the /proc filesystem.

FreeBSD 4.x jailing relied heavily on a jailed hostname for host-level process identification (and subsequent management), which created problems. If a user changed their hostname, accidentally or maliciously, havoc would follow for management systems in the host system. FreeBSD 5.x solved this problem by pinning a ‘jail id’ to each process on the system, and providing a sysctl to lock down the ability to change hostnames within a jail.

Jailed process restrictions were then handled neatly using renice(8). Processes which hogged undue CPU were simply reniced by the host server, which released the renice level after 5 minutes to see if the process was again behaving. If not, it was reniced again.
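
As a rough sketch of the host-side polling and renice pass described above: the code below parses text in the shape produced by `ps -ax -o jid,pid,pcpu,rss` (on FreeBSD releases whose ps(1) supports the `jid` keyword), totals CPU and memory per jail id, and builds renice(8) argument lists for hogs. The 50% threshold, the nice value of 10, and all function names are assumptions for illustration; the paper gives only the 5-minute release interval, which would be handled by whatever schedules this pass.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CPU_HOG_PERCENT = 50.0  # assumed threshold; the paper does not state one
PENALTY_NICE = 10       # assumed nice value applied to CPU hogs

Row = Tuple[int, int, float, int]  # (jid, pid, %cpu, rss_kb)


def parse_ps(output: str) -> List[Row]:
    """Parse `ps -ax -o jid,pid,pcpu,rss` style output into rows."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 4:
            continue
        rows.append((int(fields[0]), int(fields[1]),
                     float(fields[2]), int(fields[3])))
    return rows


def totals_by_jail(rows: List[Row]) -> Dict[int, Tuple[float, int]]:
    """Sum %CPU and resident memory per jail id; jid 0 (the host) is skipped."""
    cpu = defaultdict(float)
    rss = defaultdict(int)
    for jid, _pid, pcpu, rss_kb in rows:
        if jid == 0:
            continue
        cpu[jid] += pcpu
        rss[jid] += rss_kb
    return {jid: (cpu[jid], rss[jid]) for jid in cpu}


def renice_commands(rows: List[Row],
                    threshold: float = CPU_HOG_PERCENT,
                    nice: int = PENALTY_NICE) -> List[List[str]]:
    """Build renice(8) argument lists for jailed processes above the threshold."""
    return [["renice", str(nice), "-p", str(pid)]
            for jid, pid, pcpu, _rss in rows
            if jid != 0 and pcpu >= threshold]
```

Because the pass runs on the host and keys on the jail id rather than the hostname, a jailed root user cannot tamper with it from inside the jail.
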
This crude renice strategy was wildly successful in maintaining fair-share CPU and memory usage for processes. Problem processes (things with memory leaks, for example) were then in the hands of the jailed user to deal with, without negatively impacting the other jailed users.

Fork bombs were still a threat, but from FreeBSD 5.x onward, each jail could be set to start with an escalated securelevel, maxprocs could be locked for a jail, and chflags(2) could be disabled in jails via host sysctl settings. And voilà: fork bombs as a threat are mitigated, with relatively minimal management and resource consumption.

Network resource management is far outside the scope of this material; however, it is worth mentioning one thing: at iMeme, each jailing hardware server was conceptually treated like a network border or gateway, with routing and filtering tasks carried out inside the machine. This paradigm shift in management greatly simplified the physical network requirements (making routers and firewalls non-existent). With that, we ran NAT for our external IP blocks, and mapped addresses to our jails, which all ran using a private netblock (192.168.x.x). This NAT strategy had pros and cons and is hardly worth discussion, except to state that it was all run from the host servers, with negligible impact on jailed systems.

Also, back then, ipfw(8) and dummynet(4) were used for very minimal network management: dummynet(4) was configured to provide equal-share bandwidth (ad hoc QOS), and IPFW was crudely used to put out fires. Today, in my Diversaform jail cluster, pf(4) nicely replaces these tools and is becoming the de facto packet filter. In 5 more years there may be something else, but it will still be running from the jailing host hardware.

Large Scale Management Techniques (System Specifics)

At iMeme, we maintained a Master Record Server (obviously a redundant system).
This system primarily kept the MySQL database, which recorded everything from resource usage to billing and contact information. This strategy worked well, provided any modifications/additions to this system were thoroughly tested. This was easy, insomuch as we could replicate this system in one of our jails at any time, and then dispose of the jail.

There was no reason in particular for the MySQL database; it was just used in the beginning and stuck with us reliably.

The website, where users bought jailed systems and managed their account and billing, was all written in Zope, and had PHP elements added over time. This could have been any web technology.

As each iMeme jailed system had some custom tweaks, we maintained a pre-compiled FreeBSD userland, preconfigured with any small tweaks to our environment (like the CPU/memory polling cron job mentioned before). These jailed systems were built and put into cvs(1) repositories for long-term management; however, tar(1) became the deployment tool of choice. Scripts to add new systems would effectively untar the current jailing userland, and then run scripts to add an initial user, add the root password, and start the jail.

Upgrading jails was a trivial technical process. System upgrades were handled similarly, un-tarring updated userland sources to jailed userland directories. Following the hier(7) man page, users’ additional applications ended up in /usr/local, and only in extreme edge cases did a customer application have problems with minor dot upgrades (4.5 to 4.6, for example).

In FreeBSD 5.x, it became clear that running installworld, and tossing it an additional flag for the jailed directories, was even simpler than the tarballs, with the additional benefit of dispensing with keeping userland (binaries!) in CVS.

When monitoring the systems, given how rapidly they can scale with the ease of adding jails, keep monitoring simple, and quiet.
When problems occur on jailed systems, it’s *always* possible that all jails on a particular host are affected, so if they all trip alarms, administrators can get lost in white noise.

One experiment was logger(1)/syslog(3). iMeme tried pushing all jailed logs out to a master syslogd(8) server, with nearly worthless results. The valuable information was covered by the white noise of everything users were doing and running in their systems, and it also produced outright surprising breaches of privacy, so iMeme abandoned this idea immediately. While there are ways to sanely utilize syslogd(8) schemes, they are far outside the scope of this material.

Jailing Redundancy (failure is life)

Jails present a uniquely simplistic mechanism for backup and fail-over. At iMeme, each jailing host kept jails in /usr/local/jails. As time and internal methodology evolved (disk slice strategies, etc...), /usr/local/jails/hostname.jailing.host became a collection of mount points and soft links, but the userland interface to find a given jail was always the same:

/usr/local/jails/hostname.jailing.host/JAIL_DIR

Then, each jailing host both exported, and mounted, all other jail directories as an NFS mount. This carried extreme management benefits, worth the hassle and cursing associated with heavy NFS use. Operations could be carried out on each jailed userland from any jailing host in the cluster!

With that stated, backups and restores became simple operations. Backing up became an operation of tarballing each jail to a backup server (independently redundant), and restores consisted of untarring the jailed userland in the NFS mount of a jailed host. If a jailing host server died, all of its jailed systems could then be rapidly re-distributed and re-started across the whole cluster. This process required Administrator intervention.

Post-iMeme, Diversaform jailed systems are run slightly differently, without NFS.
Each Diversaform jailing host has an identical hardware twin, to which jailed systems are regularly synchronized. If a jailed application requires time-based backups, it is synchronized to another jailing server (itself having a hardware twin). Diversaform systems have also been experimenting with a combination of carp(4) and ggated(8) (GEOM Gate), providing network in