ServerA> virsh -c qemu:///system list
Id Name State
----------------------------------------------------
5 testvm0 running
ServerA> virsh -c qemu+ssh://ServerB list # Test connection to ServerB
Id Name State
----------------------------------------------------
The real trick is working around a bug whereby the target disk must already exist. So, on ServerB, I created an empty 8G disk image:
This will take some time. I used virt-manager to attach to testvm0’s console and ran a ping test. At the end of the procedure, virt-manager lost the console. I reconnected via ServerB and found that the VM hadn’t missed a ping.
And now:
ServerA> virsh -c qemu+ssh://ServerB/system list
Id Name State
----------------------------------------------------
4 testvm0 running
Admittedly, having to create the target disk by hand is a little jankier than I’d like, but this is definitely a nice proof of concept and another nail in the coffin of NAS as a hard dependency for VM infrastructure. Any day you can kiss expensive, proprietary single-point-of-failure (SPOF) dependencies goodbye is a good day.
I inherited a bunch of Proxmox. It is a rather nice, freemium (nagware) front end to virtualization on Linux. One of my frustrations is that the local NAS is pretty weak, so we mostly run VMs on local disk. That is compounded by another frustration: Proxmox doesn’t let you build local RAID on the VM hosts. That is especially sad because it is based on Debian, and at least with Ubuntu, building software RAID at boot is really easy. If only I could easily manage my VMs on Ubuntu . . .
And then, on your local Ubuntu workstation (you are a SysAdmin, right?):
sudo apt-get install virt-manager
Then, when you run virt-manager, you can connect to the remote host(s) via SSH and, whee! Full console access! So far the only kink I have had to iron out is that guest PXE boot requires switching the Source device to vtap. The system also supports live migration, but that looks like it depends on a shared network filesystem. More to explore.
I have been working with AWS to automate disaster recovery: sync data up to S3 buckets (or, sometimes, EBS), then write Ansible scripts to deploy a bunch of EC2 instances, restore the data, and configure the systems.
Restoring data from Glacier is kind of a pain to automate. You have to iterate over the items in a bucket and issue restore requests for each item. But it gets more exciting than that on the billing end: Glacier restores can be crazy expensive!
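For what it is worth, the iterate-and-restore loop looks roughly like the sketch below. It assumes a boto3 S3 client, and that objects a lifecycle rule moved to Glacier show up in listings with a `StorageClass` of `GLACIER`. The seven-day restore window and the cheaper `Bulk` retrieval tier are hypothetical defaults to adjust, since the tier you pick is a big part of how expensive a restore gets.

```python
# Sketch: request a restore for every archived object in a bucket.
# Assumes `s3` is a boto3 S3 client; days and tier are hypothetical defaults.

def request_restores(s3, bucket, days=7, tier="Bulk", prefix=""):
    """Issue a restore request for each GLACIER-class object under prefix.

    Returns the list of keys a restore was requested for.
    """
    requested = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Only archived objects need (or accept) a restore request.
            if obj.get("StorageClass") != "GLACIER":
                continue
            s3.restore_object(
                Bucket=bucket,
                Key=obj["Key"],
                RestoreRequest={
                    "Days": days,  # how long the restored copy stays readable
                    "GlacierJobParameters": {"Tier": tier},  # Expedited/Standard/Bulk
                },
            )
            requested.append(obj["Key"])
    return requested
```

A real call would look like `request_restores(boto3.client("s3"), "my-backup-bucket")`, where the bucket name is made up. S3 answers with a `RestoreAlreadyInProgress` error if you ask again while a restore is still running, so a re-runnable version would need to catch that.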
2) Amazon Glacier will also charge you if you delete data that hasn’t been in there for at least three months (90 days). If you Glacier something, you will pay to store it for at least three months. So, Glacier your archive data, but not something like a rolling backup.
3) When you get a $,$$$ bill one month because you were naive, file a support request, and they can refund some of the money.
I have some racks in a data center that were designed to use 3-phase PDUs. This allows for greater density, since a 3-phase circuit delivers more watts than a single-phase circuit of the same amperage. The way it works is that the PDU breaks the circuit into three branches, and each branch spans two of the three legs. Something like:
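To put rough numbers on item 2, here is a sketch of the early-deletion math. It assumes the three-month minimum is 90 days, billed pro rata on a 30-day month; the price per GB-month is a hypothetical parameter, not a current AWS rate.

```python
MIN_DAYS = 90        # assumed minimum storage duration ("three months")
DAYS_PER_MONTH = 30  # assumed pro-rata month length

def early_deletion_charge(gb, days_stored, price_per_gb_month, min_days=MIN_DAYS):
    """Charge for the unused remainder of the minimum storage period."""
    remaining_days = max(0, min_days - days_stored)
    return gb * price_per_gb_month * remaining_days / DAYS_PER_MONTH

def billed_days(days_stored, min_days=MIN_DAYS):
    """Days you effectively pay for: actual storage time or the minimum."""
    return max(days_stored, min_days)
```

With a hypothetical $0.01/GB-month, deleting 1,000 GB after a week costs `early_deletion_charge(1000, 7, 0.01)`, about $27.67, on top of the week you actually stored. A weekly rolling backup pays `billed_days(7)`, which is 90 days, for every copy: roughly 13 times what the retention policy suggests.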
Branch A: leg X->Y
Branch B: leg Y->Z
Branch C: leg Z->X
The load needs to be balanced as evenly as possible across the three branches. When the PDU tells me the draw on Branch A is, say, 3A, I am never really sure whether that is because of equipment on Branch A or … Branch C. This only becomes a problem when one branch reports chronic overloading and I want to balance the loads.
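Here is a sketch of why the reading is ambiguous. If the number the PDU reports is really a leg (line) current, Kirchhoff's current law says each leg carries the phasor difference of the two branches that share it. The code assumes a delta hookup exactly as in the branch list above, positive phase sequence, and purely resistive (unity power factor) loads; server power supplies are usually close to, but not exactly, that.

```python
import cmath
import math

# Angle of each branch's line-to-line voltage, assuming positive (X-Y-Z)
# phase sequence. Branch A = X->Y, B = Y->Z, C = Z->X, as in the list above.
BRANCH_ANGLE_DEG = {"A": 0.0, "B": -120.0, "C": 120.0}

def branch_phasor(amps, branch):
    """Branch current as a complex phasor, assuming a unity power factor
    load (current in phase with that branch's voltage)."""
    return cmath.rect(amps, math.radians(BRANCH_ANGLE_DEG[branch]))

def leg_currents(a, b, c):
    """Leg (line) current magnitudes, in amps, for branch currents a, b, c."""
    i_xy = branch_phasor(a, "A")
    i_yz = branch_phasor(b, "B")
    i_zx = branch_phasor(c, "C")
    # Kirchhoff's current law at each leg's connection point.
    return {
        "X": abs(i_xy - i_zx),
        "Y": abs(i_yz - i_xy),
        "Z": abs(i_zx - i_yz),
    }
```

`leg_currents(10, 0, 0)` shows 10A on legs X and Y, while `leg_currents(0, 0, 10)` shows 10A on legs X and Z. Leg X alone cannot tell those two apart, which is exactly the Branch A versus Branch C puzzle. With all three branches balanced, each leg carries √3 times the branch current.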
I chatted with a friend who has a PhD in this stuff. He said that if my servers don’t all have similar characteristics, then the math says I want to randomly distribute the load. That is hard to do on an existing rack.
I came up with an alternative approach. Most of my servers fall into clusters of similar hardware and what I would call a “load profile.” I had a rack powered by two unbalanced 3-phase circuits. I counted through the rack’s server inventory and classified each host into a series of cohorts based on their hardware and load profile. Two three-phase circuits give me six branches to work with, so I divided each cohort’s total by six to come up with a “branch quota.” Something like:
<= 2 hadoop node, hardware type A
<= 2 hadoop node, hardware type B
1 spark node, hardware type A
1 spark node, hardware type C
1 misc node, hardware type C
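I did this with pencil and paper, but the quota arithmetic and a simple round-robin deal are easy to sketch. The cohort names and host counts below are hypothetical, picked so the quotas come out like the list above, and the approach assumes hosts within a cohort draw roughly the same power.

```python
import math

BRANCHES = 6  # two three-phase circuits x three branches each

def branch_quotas(cohorts, branches=BRANCHES):
    """Per-branch ceiling for each cohort: host count / branches, rounded up."""
    return {name: math.ceil(count / branches) for name, count in cohorts.items()}

def deal_round_robin(cohorts, branches=BRANCHES):
    """Deal hosts across branches like cards. The slot counter carries over
    between cohorts, so branch totals stay within one host of each other."""
    layout = [[] for _ in range(branches)]
    slot = 0
    for name, count in cohorts.items():
        for _ in range(count):
            layout[slot % branches].append(name)
            slot += 1
    return layout

# Hypothetical inventory for one rack fed by two circuits.
EXAMPLE = {
    "hadoop/type A": 12,
    "hadoop/type B": 9,
    "spark/type A": 6,
    "spark/type C": 6,
    "misc/type C": 5,
}
```

`branch_quotas(EXAMPLE)` gives 2, 2, 1, 1, 1, matching the list above. The round-robin deal never puts more of a cohort on a branch than its quota, but unlike the pad-and-paper version it makes no attempt to move as few servers as possible.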
I then sat down with a pad and paper, one circuit on each side of the sheet, and wrote down which servers I had and where each branch stood relative to quota. So, one circuit might have started off as:
I then moved servers around, updating each branch’s tally with pencil and eraser as I went, like a diligent Dungeon Master. (Also very important: updating the PDU receptacle configuration.)
The end result:
I started moving servers around on Thursday at 15:00 and was done at about 16:30, which is also when the Hadoop cluster went idle. It kicked back up again at 17:00 and started spinning down around midnight.
What is important is keeping the sustained load on each branch under the straight green line, which represents 80% of circuit capacity. On the left you can see that the second circuit in particular had two branches running “hot”; after the rebalancing, the branch loads run closer together and top out at the green line.
I have a Nagios check to monitor the power being drawn through our PDUs. While setting all this up, I had to figure out alert thresholds. The convention for disk capacity is warn at 80%, critical at 90%. I also had this gem to work with:
The National Electrical Code requires that the continuous current drawn from a branch circuit not exceed 80% of the circuit’s maximum rating. “Continuous current” is any load sustained continuously for at least 3 hours.
Lately I have been getting a lot of warning notifications about circuits exceeding 80%. Ah, but the NEC says that is only a problem if they stay above 80% for three hours or more. So, I dug through the Nagios documentation and split my check into two services:
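In plugin terms, those two thresholds boil down to something like this sketch. It uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function itself is hypothetical, not the internals of check_sentry, and the 30A circuit in the example is made up.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def load_status(amps, rating, warn=0.80, crit=0.90):
    """Classify a PDU current reading against the circuit rating.

    warn=0.80 mirrors the NEC continuous-load limit; crit=0.90 borrows the
    disk-capacity convention. Returns (exit_code, status_line).
    """
    if rating <= 0 or amps < 0:
        return UNKNOWN, "UNKNOWN - bad reading or rating"
    pct = amps / rating
    if pct >= crit:
        state, label = CRITICAL, "CRITICAL"
    elif pct >= warn:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    return state, f"{label} - {amps:.1f}A of {rating:.0f}A ({pct:.0%})"
```

On a hypothetical 30A circuit that means a warning at 24A and a critical at 27A; `load_status(25, 30)` returns `(1, "WARNING - 25.0A of 30A (83%)")`.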
# PDU load at 90% of circuit rating
define service{
    use                       generic-service
    hostgroup_name            pdus
    service_description       Power Load Critical
    notification_options      c,u,r
    check_command             check_sentry
    contact_groups            admins
}
# PDU sustained load at 80% of circuit rating for 3 hours
define service{
    use                       generic-service
    hostgroup_name            pdus
    service_description       Power Load High
    notification_options      w,r
    first_notification_delay  180    ; 180 minutes = 3 hours
    check_command             check_sentry
    contact_groups            admins
}
The first service limits notifications to critical and unknown states (plus recoveries). In the second, first_notification_delay (counted in minutes with the default interval_length of 60) should cover the “don’t bug me unless it has been happening for three hours” caveat, and I set that service to notify only on warnings and recoveries.
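To convince myself the delay does what I want, here is a rough model of “only alert once the load has stayed high for three hours.” It is a simplification that assumes time-ordered samples expressed as a fraction of circuit rating; real Nagios also has soft/hard states and retry intervals that this ignores.

```python
from datetime import datetime, timedelta

def sustained_alert_time(samples, threshold=0.80, hold=timedelta(hours=3)):
    """Return the first sample time at which the load has stayed at or above
    `threshold` for `hold`, or None if that never happens.

    samples: list of (datetime, fraction_of_rating), in time order.
    Any sample below the threshold resets the clock, like a service that
    recovers before its delayed first notification goes out.
    """
    start = None
    for when, load in samples:
        if load >= threshold:
            if start is None:
                start = when
            if when - start >= hold:
                return when
        else:
            start = None
    return None
```

A half-hourly series that sits at 85% from noon onward alerts at 15:00; dip below 80% once along the way and the three-hour clock starts over.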
------0-------0----|-2-------2-------|------0-------0----|-----------------|
----0---3---0---3--|-----3-------3---|----0-------0------|---------0-------|
--0-------0--------|-----------------|-------------------|-2---2-------0---|
-------------------|-----------------|-------------------|-----------------|
-------------------|---0-------0-----|--0-------0--------|-----------------|
G B D D G B D D E G D E G D G B D G B D A A B G
I have been thinking about whether and where there are “drone notes” to work a roll around. I started with cleaning up the timing. (My knowledge of this song consists mainly of YouTube videos from India.) The bars go:
father finger x2
where are you? x2
here I am! x2
how do you do? x1
Each note here corresponds to a syllable, with quarter-note spacing on the last phrase. At least at the speed I play right now, there is no room to add a roll.
I did make two changes:
1) The fourth and eighth Ds in the first phrase moved from fifth string to fourth. Now the middle finger needs to be ready on that third fret a bit earlier, but we get out of plucking the fifth string twice in a row.
2) To avoid double-plucking in the third phrase, I went from G D D to G B D. They sound equally good to me, and to get G D D you can just keep your finger on the third fret.
Tommy cannot get enough of this song, so I picked it out on the banjo. My first attempt at building my own tab:
-----0-0-----0-0--|-2-------2-------|---0-0-----0-0---|------------------
---0-------0------|-----3-------3---|-----------------|-----0--------0---
-0-------0--------|-----------------|-----------------|-2-2---0--2-2---0-
------------------|-----------------|-----------------|------------------
------------------|---0-------0-----|-0-------0-------|------------------
G B D D G B D D E G D E G D G D D G D D A A B G A A B G
I am a total, total, total newbie, so the wife helped me pick the notes out on the piano. I then shifted them three white keys to the left (C → G, E → B), because that is what sounded right on the banjo. And I had picked out my own notes chart using the “Fine Chromatic Tuner” app:
0   1   2   3   4   5   6    (fingers on frets..)
D   D#  E   F   F#  G
B   C   C#  D   D#
G   G#  A   A#  B   C   C#
D   D#  E   F   F#  G   G#
G   -----------------
The revelation was when I “discovered” (I’m sure I have been told but it hadn’t been relevant at the time) that the fifth “G” string matches first string, fifth fret. Before that I was picking 2, then 5 on the D string.
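The chart is just semitone arithmetic, so here is a small sketch of it. It assumes open G tuning (strings 1 to 5: D, B, G, D, G) as in the chart, ignores octaves, and reads “shifted them left 3” as three white keys down, which happens to be five semitones for both C → G and E → B.

```python
# Chromatic scale using sharps, as in the chart.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Open G tuning: string number -> open note (octaves ignored).
OPEN_G = {1: "D", 2: "B", 3: "G", 4: "D", 5: "G"}

def transpose(note, semitones):
    """Shift a note name by some semitones (negative = down), ignoring octaves."""
    return NOTES[(NOTES.index(note) + semitones) % 12]

def note_at(string, fret, tuning=OPEN_G):
    """Note sounded on `string` with a finger on `fret` (0 = open)."""
    return transpose(tuning[string], fret)

def frets_for(note, string, tuning=OPEN_G, max_fret=12):
    """All frets up to max_fret on `string` that sound `note`."""
    return [f for f in range(max_fret + 1) if note_at(string, f, tuning) == note]
```

`frets_for("G", 1)` gives `[5]`, the same note as the open fifth string, and `transpose("C", -5)` gives `"G"`.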
With any luck I can pick around on this and mush it into some roll somehow. It would be nice if the second bit mapped to a chord . . .
There seems to be some backlash going on against the religion of “Agile Software Development,” and it is best summarized by PragDave, reminding us that the “Agile Manifesto” first places “Individuals and interactions over processes and tools,” and yet there are now a lot of Agile Processes and Tools which you can buy into . . .
He then summarizes how to work agilely:
What to do:
Find out where you are
Take a small step towards your goal
Adjust your understanding based on what you learned
Repeat
How to do it:
When faced with two or more alternatives that deliver roughly the same value, take the path that makes future change easier.
Sounds like sensible advice. I think I’ll print that out and tape it on my display to help me keep focused.
I had the worst experience at work today: I had to prepare a computer for a new employee. That’s usually a pretty painless procedure, but this user was to be on Windows, and I had to … well, I had to call it quits after making only mediocre progress. This evening I checked online to make sure I’m not insane. A lot of people hate Windows 8, so I enjoyed clicking through a few reviews online, and then I just had to respond to Badger25’s review of Windows 8.1:
I think you are being way too easy on Windows 8.1 here, or at least insulting to the past. This isn’t a huge step backwards to the pre-Windows era: in DOS you could get things done! This is, if anything, a “Great Leap Forward” in which anything that smells of traditional ways of doing things has been purged in order to strengthen the purity of a failed ideology.
As far as boot speed, I was used to Windows XP booting in under five seconds. That was probably the first incarnation of Windows I enjoyed using. I just started setting up a Windows 8 workstation today for a business user, and it is the most infuriatingly obtuse operating system I have ever, in decades, had to deal with. (I am a Unix admin, so I’ve seen things….) This thing does NOT boot fast, or at least it does not reboot fast, because of all the updates which must be slowly applied.
Oddly enough, it seems that these days the best computer UIs are offered by Linux distros (which have weird gaps in usability), then Macs, then … I wouldn’t suggest Windows 8 for anyone except possibly those with physical or mental disabilities. Anyone who is used to DOING THINGS with computers is going to feel like they are using the computer with their head wrapped in a Hefty bag. The thing could trigger panic attacks.
Monday is another day. I just hope the new employee doesn’t rage quit.
I think the El Camino BRT could be a great project to transform El Camino Real from a ghetto of 1950s strip malls into the sort of place where people would go to enjoy shopping. Maybe. Anyway, the news that a dedicated lane from Santa Clara to Palo Alto could make the bus faster than cars excited me. I’ll try to be at the Sunnyvale meeting this evening, and I also submitted my own enthusiasm to our governments via Transform’s handy link:
I used to commute along El Camino from Mountain View to Palo Alto. I switched to the bus out of environmental concerns. El Camino has the best transit service in the county but it still took 2-3 times longer to take the bus than it would have taken to drive. Now it sounds like you could get BRT running on El Camino FASTER than cars? YES!! If the cars get slowed a bit that’s not such a big deal, especially since any driver going any distance knows that Central Expressway / Alma is a much nicer car trip. Even though I now live 1.5 miles off of El Camino in Sunnyvale, if there were excellent transit services I would be tempted to hop on the 55, walk, or bike to enjoy the transit corridor, especially for trips up to Mountain View or Palo Alto or Stanford Shopping Center. What a pleasure it would be to not have to hassle with parking, traffic, or the Caltrain schedule. If it were sufficiently fast, I would totally use that as a commute option up to Menlo Park.
Also, I’d probably be more inclined to visit Santa Clara.
I have a playbook which installs and configures NRPE. The packages and services are different on Red Hat versus Debian-based systems, but my site configuration is the same. I burnt a fair amount of time trying to figure out how to allow the configuration tasks to notify a single handler. The result looks something like:
# Debian or Ubuntu
- name: Ensure NRPE is installed on Debian or Ubuntu
  when: ansible_pkg_mgr == 'apt'
  apt: pkg=nagios-nrpe-server state=latest
- name: Set nrpe_handler to nagios-nrpe-server
  when: ansible_pkg_mgr == 'apt'
  set_fact: nrpe_handler='nagios-nrpe-server'
# RHEL or CentOS
- name: Ensure NRPE is installed on RHEL or CentOS
  when: ansible_pkg_mgr == 'yum'
  yum: pkg={{item}} state=latest
  with_items:
    - nagios-nrpe
    - nagios-plugins-nrpe
- name: Set nrpe_handler to nrpe
  when: ansible_pkg_mgr == 'yum'
  set_fact: nrpe_handler='nrpe'
# Common
- name: Ensure NRPE will talk to Nagios Server
  lineinfile: dest=/etc/nagios/nrpe.cfg regexp='^allowed_hosts=' line='allowed_hosts=nagios.domain.com'
  notify:
    - restart nrpe
### A few other common configuration settings ...
Then, over in the handlers file:
# Common
- name: restart nrpe
  service: name={{nrpe_handler}} state=restarted
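This works because a handler only runs when a task that notifies it reports a change, and lineinfile only reports a change when it actually edits the file. Here is a rough Python model of that behavior; it is a simplified take on lineinfile with state=present (replace the last line matching regexp, otherwise append), not Ansible’s actual code.

```python
import re

# Service name per package manager, mirroring the set_fact tasks above.
NRPE_SERVICE = {"apt": "nagios-nrpe-server", "yum": "nrpe"}

def ensure_line(lines, regexp, line):
    """Simplified lineinfile (state=present): returns (new_lines, changed)."""
    pattern = re.compile(regexp)
    new_lines = list(lines)
    hits = [i for i, text in enumerate(new_lines) if pattern.search(text)]
    if hits:
        new_lines[hits[-1]] = line  # only the last match is replaced
    else:
        new_lines.append(line)  # no match: add the line at the end
    return new_lines, new_lines != list(lines)

def handlers_to_run(changed_flags, pkg_mgr):
    """A notified handler runs once per play, and only if some notifying
    task reported a change."""
    return [NRPE_SERVICE[pkg_mgr]] if any(changed_flags) else []
```

Run `ensure_line` twice against the same lines and the second pass reports no change, so NRPE only gets restarted when the configuration actually moves.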
We had company over Wednesday evening. Friends of the family who have cat-sat for us. They brought dim sum. After dinner we sat around chatting. I got a call on my mobile from a 408 number. I took it.
“Are you the owner of Maxwell?”
“I am. Is he causing trouble?”
It was the opposite. I grabbed a cardboard box and hustled down to the corner, where a small crowd had gathered. The woman who had called me said he had been standing in the street, looking the other way, when the car hit him. He died instantly. She removed him from the street and found my number on the tag. We hugged. She was obviously a cat person, who was glad that he had a collar, a bell, and an identification tag.
I brought him home. He rested briefly where his feline companion Maggie took a last opportunity to groom him. The young woman who drove the car and her father came by to express their remorse and see if they could make amends, but there was nothing to be done. The young woman was in tears. She wants to be a veterinarian. Her father remembered dogs that had been lost to cars. We agreed that the Humane Society might receive a donation. We shook hands several times. What a way to meet the neighbors.
Maxwell napping in the front yard in June.
In the back yard, a shallow grave was dug. Maxwell was wrapped in a familiar fabric and laid to rest. Words were said.
It will take some time to feel his absence and truly mourn his departure. He might have lived a much longer life as a house cat, but he loved the outdoors and was well known in the neighborhood. He lived as he chose and while his end was violent, it was swift and he did not suffer.
Apple ships some nice hardware, but Mac OS is not my cup of tea. So, I run Ubuntu (Kubuntu) within VMware Fusion as my workstation. It has nice features like sharing the clipboard between host and guest, and the ability to share files with the guest. Yay.
At work, I have a Thunderbolt display, which is a very comfortable screen to work at. When I leave my desk, the VMware guest transfers to the Retina display on my Mac. That is where the trouble starts. You can have VMware give it a lower resolution or full Retina resolution, but in either case the screen size changes and I have to move my windows around.
The fix?
1) In the guest OS, set the display size to: 2560×1440 (or whatever works for your favorite external screen …)
Now I can use Exposé to drag my VM between the Thunderbolt display and the Mac’s Retina display, and back again, and things are really comfortable.
The only limitation is that since the aspect ratios differ slightly, the Retina display shows my VM environment in a slight letterbox, but it is not all that obvious on a MacBook Pro.
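The size of that letterbox is plain aspect-ratio arithmetic. This sketch assumes a 15-inch Retina MacBook Pro panel of 2880×1800 (16:10) and that Fusion scales the 2560×1440 (16:9) guest uniformly to fit without cropping; both are assumptions, not measurements.

```python
def fit(src_w, src_h, dst_w, dst_h):
    """Uniformly scale src to fit inside dst without cropping.

    Returns (scaled_w, scaled_h, side_bar, top_bar): the scaled size plus
    the bar width on each side and the bar height at top and bottom, in pixels.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    w, h = round(src_w * scale), round(src_h * scale)
    return w, h, (dst_w - w) // 2, (dst_h - h) // 2
```

`fit(2560, 1440, 2880, 1800)` returns `(2880, 1620, 0, 90)`: full width, with 90-pixel bars at the top and bottom, which is why the letterbox is easy to overlook.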
I reported the following to the FBI, to LogMeIn123.com, to CenturyLink, and to Bing, and now I’ll share the story with you.
Yesterday, May 12, 2014, a relative was having trouble with Netflix. So she went to Bing and did a search for her ISP’s technical support:
Bing leads you to a convenient toll-free number to call for technical support!
She called the number: 844-835-7605 and spoke with a guy who had her go to LogMeIn123.com so he could fix her computer. He opened up something that revealed to her the presence of “foreign IP addresses” and then showed her the Wikipedia page for the Zeus Trojan Horse. He explained that she would need to refresh her IP address and that their Microsoft Certified Network Security whatevers could do it for $350 and they could take a personal check since her computer was infected and they couldn’t do a transaction online.
So, she conferenced me in. I said that she could just reinstall Windows, but he said no, as long as the IP was infected it would need to be refreshed. I said, well, what if we just destroyed the computer. No, no, the IP is infected. “An IP address is a number: how can it get infected?” I then explained that I was a network administrator . . . he said he would check with his manager. That was the last we heard from him.
I advised her that this sounded very very very much like a phishing scam and that she should call the telephone number on the bill from her ISP. She did that and they were very interested in her experience.
I was initially very worried that she had a virus that managed to fool her into calling a different number for her ISP. I followed up the next day, using similar software to VNC into her computer. I checked the browser history and found that the telephone number was right there in Bing for all the world to see. She doesn’t have a computer virus after all! (I’ll take a closer look tonight . . .)
I submitted a report to the FBI, LogMeIn123.com, Bing, and CenturyLink. And now I share the story here. It’s a phishing scam that doesn’t even require an actual computer virus to work!