This is just a note I contributed to a thread on sage-members, to get something off my chest about where people should maintain their crontab entries. I sincerely doubt that reading what I have to say will bring you any great illumination.
I’d say, any reasonable SysAdmin should default to /etc/crontab because every other reasonable SysAdmin already knows where it is. If anything is used in addition to /etc/crontab, leave a note in /etc/crontab advising the new guy who just got paged at 3:45am where else to look for crons.
For production systems, I strongly object to the use of per-user crontabs. I’m glad to hear I’m not alone. One of the first things I tend to do in a new environment is write a script that will sniff out all the cron entries.
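In case it is useful, here is the sort of rough sketch I mean, for a Linux-style host; the cron directories and the crontab flags vary by OS, so treat the paths as illustrative:

    #!/bin/sh
    # Sniff out every place cron jobs may be hiding on a Linux-ish host.
    echo "== /etc/crontab =="
    cat /etc/crontab
    for d in /etc/cron.d /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly; do
        [ -d "$d" ] && { echo "== $d =="; ls -l "$d"; }
    done
    # Per-user crontabs (you need root to read other users' entries).
    for u in $(cut -d: -f1 /etc/passwd); do
        crontab -l -u "$u" 2>/dev/null | sed "s/^/$u: /"
    done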
And then there was the shop that used /etc/crontab, user crons, and fcron to keep crons from running over each other. This frustrated me enough that I did a poor job of explaining that job concurrency could easily be ensured by executing a command through (something like) the lockf utility, instead of adding a new layer of system complexity.
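For reference, a crontab entry run through lockf looks something like this; the paths and the zero-second timeout are illustrative, and on Linux flock -n does the same job:

    # /etc/crontab: never let two copies of the nightly report run at once.
    # lockf -t 0 gives up immediately if another copy already holds the lock.
    0 3 * * * root /usr/bin/lockf -t 0 /var/run/nightly-report.lock /usr/local/bin/nightly-report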
I was startled by this YouTube video, where we discover that Bill Gates can make fun of himself. Or, at least, his people can assemble a video where Bill Gates makes fun of himself. Good for Bill! I was then reassured at the consistency of the universe, when it was revealed that Bill really can’t make fun of himself without at least a dozen star cameos to reassure us that it is not so much that he is poking fun at himself, but that he is “acting”.
It is telling that Al Gore has the funniest line.
I hope Bill’s foundation does much good in the world. I almost feel sorry for Microsoft that, after all the effort, Vista has proven to be a turkey. For what it’s worth, from a UI and performance perspective, I prefer Windows XP to Mac OS X. Though I’m not sure that this is praise for Microsoft as much as it is an aversion to the Smug Cult of Apple.
(Yes, I am a contrarian. People hate contrarians. Especially Mac people, who think they have the contrarian cred: the last thing a contrarian wants to encounter is a contradicting contrarian!)
Problem: You have logins to a bajillion things and that is too many unique passwords to remember. Maybe you remember a half dozen passwords, if you’re lucky, but you would prefer to have a unique password for each account so the hackers can’t get you.
One approach is to always generate a new password when you get access to a new account, and store that somewhere safe. Sticky notes on your monitor? A GPG-encrypted file with a regularly-changing passphrase? Either way, you have to account for what happens if someone else gets access to your password list, or if you yourself cannot access it. I am not fond of this approach.
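(If you do go the encrypted-file route anyway, the minimal version is plain symmetric GPG; the filename here is just illustrative:)

    # Encrypt the list with a passphrase; decrypt to the terminal when needed.
    gpg --symmetric passwords.txt      # writes passwords.txt.gpg
    gpg --decrypt passwords.txt.gpg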
My Tip: I suggest that instead of storing passwords, you come up with a couple of ways to “hash” unique passwords depending on, say, a web site’s name.
For example, if you were really lame and used the password “apple” for everything, you’d make things better if, say, you replaced the ‘pp’ part with the first three letters of the web site’s name.
Now, you can get a lot more creative than that, like using a non-dictionary word, mixing up letter cases and punctuation, etc.
Try a more advanced hash:
– Start with a pass-phrase “apples are delicious, I eat one every day”
– Take the last letter from each word: “sesiteyy”
– Capitalize the last half of the result: “sesiTEYY”
– Stick the first three letters of the web site’s name in the middle: “sesi___TEYY”
– If the third letter you insert is a vowel, follow it with a “!”; otherwise, add an “@”
– Change the first letter of the inserted chunk that you can: a becomes a 4, e becomes a 3, i becomes a 1, and o becomes a zero
Now you get:
Yahoo: sesiy4h@TEYY
Google: sesig0o!TEYY
Amazon: sesi4ma!TEYY
MSN: sesimsn@TEYY
Apple: sesi4pp@TEYY
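If you are curious, a recipe like this is easy to script. Here is a rough bash sketch of the steps above, with the letter substitution applied only to the inserted three letters (which is how the example passwords work out); the script name is just for illustration:

    #!/bin/bash
    # Usage: ./sitepass.sh google   ->   sesig0o!TEYY
    site=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    prefix="sesi"            # first half of the derived letters
    suffix="TEYY"            # last half, capitalized
    ins=${site:0:3}          # first three letters of the site name

    case ${ins:2:1} in       # third inserted letter: vowel gets "!", otherwise "@"
        [aeiou]) punct='!' ;;
        *)       punct='@' ;;
    esac

    # Substitute the first inserted letter that has a digit equivalent.
    out=""
    subbed=0
    for (( i = 0; i < ${#ins}; i++ )); do
        c=${ins:$i:1}
        if (( ! subbed )); then
            case $c in
                a) c=4; subbed=1 ;;
                e) c=3; subbed=1 ;;
                i) c=1; subbed=1 ;;
                o) c=0; subbed=1 ;;
            esac
        fi
        out="$out$c"
    done

    echo "${prefix}${out}${punct}${suffix}"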
It is best if you have a few different schemes you can use: some web sites reject strong passwords, so having a really bad password handy is good, and some places warrant extra security. For example, use a different “hash” for your bank passwords, just in case your “every day” hash is compromised.
I have been following the “One Laptop per Child” project for a while now, formerly known as the “Hundred Dollar Laptop” project, though right now the price comes in closer to $200 . . . in November I am looking forward to getting my hands on one with the “Give One Get One” program. I enjoy following developments on the “OLPC News” blog. Today I learned that Microsoft is scrambling resources to shoehorn its normally-bloated Windows Operating System onto this lightweight gem. That makes me smile because it is usually the case that computers like the laptop I am typing on right now are “Designed for Windows(R) XP” or the like, and it is the Open Source community that must scramble to reverse-engineer and build drivers for the new hardware.
Anyway, I was just looking at a post that suggests that since the OLPC is rather ambitious, technologically and culturally, they have no qualms about redesigning the keyboard: no more CAPS LOCK but instead a mode key to shift between the Latin alphabet and the local alphabet. Also, perhaps, a “View Source” key, which could allow kids to poke under the Python hood and check out the code that is running underneath. My goodness!
I’d like to chime in with a “me too” . . . sure, most people don’t find much use for the hood latch on a car, but we’re glad it is there: it allows us to get in if we need to. For the smaller number of people who DO want to play under the hood, the hood release is invaluable. We all learn differently, and those who are going to get into computers ought to be given the access and encouragement to learn.
As for code complexity: you can still view the source on this very page and understand much of it. I understand that Python code is conventionally kept to 80 columns and is highly readable.
As for breaking things: EXACTLY!! The kids ought to have access to break the code on their computers. Rather than turning them into worthless bricks, the worst case is that you reinstall the OS! Talk about a LEARNING experience!! Anyway, programmers use revision control: hopefully an XO could provide some rollback mechanism. :)
It should also be good for long-term security … people will learn that computers execute code, and code can have flaws and exploits. If the kids can monkey with their own code, you KNOW they’re going to have some early transformative learning experience NOT to paste in “cool” code mods from the Class Hacker. ;)
After yesterday’s post, I figured I would have to re-synchronize the slave database from the master, but probably build a more capable machine before doing that. I figured at that point, I might as well try fiddling with MySQL config variables, just to see if a miracle might happen.
At first I twiddled several variables, and noticed only that there was less disk access on the system. This is good, but disk throughput had not proven to be the issue, and replication lag kept climbing. The scientist in me put all those variables back, leaving, for the sake of argument, only one changed.
This morning as I logged in, colleagues asked me what black magic I had done. Check out these beautiful graphs:
The variable in question is innodb_flush_log_at_trx_commit; the MySQL manual explains:

The default value of this variable is 1, which is the value that is required for ACID compliance. You can achieve better performance by setting the value different from 1, but then you can lose at most one second worth of transactions in a crash. If you set the value to 0, then any mysqld process crash can erase the last second of transactions. If you set the value to 2, then only an operating system crash or a power outage can erase the last second of transactions. However, InnoDB’s crash recovery is not affected and thus crash recovery does work regardless of the value. Note that many operating systems and some disk hardware fool the flush-to-disk operation. They may tell mysqld that the flush has taken place, even though it has not. Then the durability of transactions is not guaranteed even with the setting 1, and in the worst case a power outage can even corrupt the InnoDB database. Using a battery-backed disk cache in the SCSI disk controller or in the disk itself speeds up file flushes, and makes the operation safer.
The Conventional Wisdom from another colleague: you want innodb_flush_log_at_trx_commit=1 on a master database, but for a slave, which, as previously noted, is at a disadvantage when committing writes, it can be entirely worthwhile to set innodb_flush_log_at_trx_commit=0, because at worst the slave falls out of sync after a hard system restart. My take-away: go ahead and set this to 0 if your slave is already experiencing excessive replication lag: you’ve got nothing to lose anyway.
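If you want to try it yourself, the variable is dynamic, so you can flip it on a running slave and then persist the change in my.cnf; this assumes a MySQL account with the SUPER privilege:

    # On the running slave:
    mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 0;"
    # And under [mysqld] in my.cnf, so the setting survives a restart:
    #   innodb_flush_log_at_trx_commit = 0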
(Of course, syslog says the RAID controller entered a happier state at around the same time I set this variable, so take this as an anecdote.)
I’ve got a MySQL slave server, and Seconds_Behind_Master keeps climbing. I repaired some disk issues on the server, but the replication lag keeps increasing and increasing. A colleague explained that several times now he has seen a slave get so far behind that it is completely incapable of catching up, at which point the only solution is to reload the data from the master and re-start sync from there. This isn’t so bad if you have access to the innobackup tool.
The server is only lightly loaded. I like to think I could hit some turbo button and tell the slave to pull out all the stops and just churn through the replication log and catch up. So far, I have some advice:
2) MySQL suggests that you can improve performance by using MyISAM tables on the slave, since the slave doesn’t need transactional capability. But I don’t think that will serve you well if the slave is intended as a failover service.
Your options are fairly limited. You can monitor how far behind the slave is . . . and assign less work to it when it starts to lag the master a lot . . . You can make the slave’s hardware more powerful . . . If you have the coding kung-fu, you might also try to “pipeline the relay log.”
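Monitoring the lag, at least, is straightforward; assuming credentials in ~/.my.cnf, something like this will do:

    # Print the slave's replication lag, in seconds, once a minute.
    watch -n 60 'mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master'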
As of 11AM this morning, and until 11AM next Tuesday, I’m “on call” . . . which means that if something breaks, especially at 3AM, I’m the first guy responsible for fixing it.
This is actually a new form of “on call” for me–this is the first time I have been in a “rotation”. At other, smaller companies, I have spent years on-call. Now, that isn’t quite so bad in a small environment where things seldom fail, but it is something of a drag to keep your boss informed of your weekend travel plans so he can watch for pages in your stead. In a larger environment, a week spent on-call can be particularly onerous, because there are plenty of things that will break. But, come the end of the week, you pass the baton . . .
So, this week, I will get my first taste, and over time I will have a better sense as to whether “on call” is better in a smaller environment or a larger one. I have a feeling that while this week could be rough, the larger environment is an overall better deal: there is a secondary on-call person, there is an entire team I can call for advice on different things, and the big company provides nice things like a cellular modem card, and bonus pay for on-call time.
Which means you can just set a series of variables to $1, $2, $3, and so forth. In Anon’s example, the IP address is split into words with tr, and the variables are set nice and easy with set.
Of course, if your script gets complex, you probably want to avoid relying on those bare positional variables, and my original code could be re-expressed accordingly.
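Here is a minimal sketch of both ideas; the named variables in the second half are my own illustration, not the original snippet:

    #!/bin/sh
    # Anon's trick: split an IP address into the positional parameters.
    ip="192.168.1.10"
    set -- $(echo "$ip" | tr '.' ' ')
    echo "octets: $1 $2 $3 $4"

    # One way to stop relying on those variables in a longer script:
    # copy them into named variables before another set or shift clobbers them.
    first=$1; second=$2; third=$3; fourth=$4
    echo "network guess: $first.$second.$third.0"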
Big companies like to try to control consumers with new technology. Consumers invariably defeat this technology. Copy-protected video cassettes, CDs, DVDs . . . DVD “regions” so that a DVD bought in one part of the world can’t play in another part of the world, and of course, you can’t play DVDs on Linux . . . but faster and faster all these restrictions get hacked away with software. The geeks have an understanding that a new technology isn’t really useful until the “Digital Rights Management” has been defeated.
The big corporation Google has been trying to fight, ostensibly, on our behalf as well, lobbying the FCC to auction new radio spectrum for use with open standards, which would give us more raw material to work with that isn’t managed by the big telephone companies. Exciting, esoteric struggles afoot, and you know who I’m rooting for!
In the financial industry, generally accepted accounting practices call for double-entry bookkeeping, a chart of accounts, budgets and forecasting, and repeatable, well-understood procedures such as purchase orders and invoices. An accountant or financial analyst moving from one company to another will quickly understand the books and financial structure of their new environment, regardless of the line of business or size of the company.
There are no generally accepted administration procedures for the IT industry. Because of the ad-hoc nature of activity in a traditional IT shop, no two sets of IT procedures are ever alike. There is no industry-standard way to install machines, deploy applications, or update operating systems. Solutions are generally created on the spot, without input from any external community. The wheel is invented and re-invented, over and over, with the company footing the bill. A systems administrator moving from one company to another encounters a new set of methodologies and procedures each time.
[. . .]
This means that the people who are drawn to systems administration tend to be individualists. They are proud of their ability to absorb technology like a sponge, and to tackle horrible outages single-handedly. They tend to be highly independent, deeply technical people. They often have little patience for those who are unable to also teach themselves the terminology and concepts of systems management. This further contributes to failed communications within IT organizations.
Caveat SysAdmin. It’s just the price we pay for working in a nascent field.
I just completed a feedback form regarding my AppleCare warranty experience. Question 12a gave me a chance to bitch. Question 12b made me smile at my ridiculous expectations:
12a Is there anything else you would like to tell Apple about your recent in-store repair experience at the Apple Retail Store? (NOTE: 2000 character limit)
Replacing the optical drive on a Mac Mini is a simple procedure that takes fifteen minutes, requiring a screwdriver and a putty knife. That I should have to drive to a God damned mall and explain to a “genius” that he doesn’t actually need my password to log in to OS X, wait for twenty minutes as the “genius” engages in manual data entry, then wait “seven to ten business days” for the part to be replaced is FUCKING SAD.
(Note: Hold down command+s during boot, run to the appropriate init level and type “passwd” to reset the password. Even someone who isn’t a “genius” can pull that off!)
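(Roughly, and from memory; the details vary by OS X release, so consider this a sketch rather than a recipe:)

    # Hold Command-S at power-on to reach single-user mode, then:
    /sbin/fsck -fy        # check the filesystem
    /sbin/mount -uw /     # remount the root filesystem read-write
    sh /etc/rc            # bring enough of the system up
    passwd username       # reset the password for "username"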
[NOTE: For some time I have been considering a series of short “Deathmatch” style articles, contrasting similar-but-different words. This post is the “Pilot” for such a series.]
His big point is that programmers need to stop fretting over moving things between memory and disk themselves. He explains that on a modern computer system, RAM is backed by disk, and disk accesses are buffered in RAM, and a lot of work goes in to the kernel to ensure that the system behaves effectively. By managing your own RAM-or-disk conundrum, you end up making a mess of things, because when you go to move an unused “memory” object to disk, the kernel may already have paged the memory region to disk, and what happens is the object then must move from virtual memory on disk, to RAM, to a disk memory cache in RAM, and then back out to disk. It is simpler and more efficient to just ask for a big chunk of memory and let the kernel page things to disk for you.
He then explains some clever things you can do for multi-processor programming. It seems to boil down to trying to give threads their own stack space wherever practical, and managing worker pools as a stack, so that you are most likely to find yourself processing on the same CPU at the lowest level of cache, and least likely to need to pass memory variables between CPUs.
Not that I write multi-threaded applications, but if I ever do, I’ll try to keep this understanding in mind.
(I like what Yelp have done with their down page.)
The short story is that an underground transformer exploded downtown, and the 365 Main data center failed to automatically start their generators, and had to start them manually, cutting power for nearly an hour for some customers, many of which are smaller, trendier web sites like Craigslist, LiveJournal, Yelp and others. (I have interviewed with half of the companies mentioned in Scott’s post.)
You do not want to lose power across a production-class network. This can cause equipment failures, servers that delay booting because they need to run disk consistency checks, servers that stall at boot complaining about a missing keyboard, disk errors, or whatever. Some services may wedge up because they couldn’t talk to the database when they started . . . in some cases you may have machines that have been running for a few years, and may have last rebooted three SysAdmins ago. The running state may be subtly different from the boot state, with no documentation . . .
A few years ago I had a chance to rebuild a production network from the ground up, with a decent budget to do everything the right way: redundant network switches, serial consoles, remote power management . . . I remember talking to my manager as to whether we might want a UPS in each rack. We figured that the data center is supposed to keep the power running, or else. Also, if the data center loses power then we lose our network access anyway . . . perhaps the whole point of this post is that data centers do lose power, so a UPS can be worthwhile. If nothing else, it may leave your systems up and ready to go as soon as the network is restored.
Data centers have UPSes too. Huge ones that you may get to walk through on a tour. The purpose of the UPS is to provide battery power between the time utility power fails and the time the on-site generators begin to provide energy. I don’t know enough to comment on this particular case, but I do recall touring a data center in Emeryville, where the guide explained that batteries become less effective over time, and a lot of data centers fail to test their batteries regularly. When batteries are wired in series, one bad battery can bring down the entire string, so even though you have a generator on-site, the UPS can fail before you manage to transfer to generator power. While this stuff is beyond my expertise, I’m inclined to believe that this is what happened at 365 Main yesterday: a data center should not only test its failover-to-generator procedure on a regular basis, it also needs to ensure sufficient battery capacity to keep systems running for the time it would reasonably take to switch to generator power.
On the weekend of July 22 and 23, I and about 400 other folks attended WordCamp 2007 in San Francisco. This is a conference about WordPress blogging software, and blogging itself. I am usually a bit wary of killing my weekend by spending the bulk of it with a bunch of nerds. Especially bloggers. But then, I am a nerd, and this is, I admit, a blog . . . that and registration was merely $25 and covered my food for the weekend. That’s a pretty compelling deal for the unemployed! Added value was found at the open bar on Saturday night at one of my favorite bars: Lucky 13.
Here are notes I compiled during the Saturday presentations.