dannyman.toldme.com

This page features every post I write, and is dedicated to Andrew Ho.

June 24, 2015
About Me, Good Reads, Technical

What’s the Deal With One on Ones?

Link: https://dannyman.toldme.com/2015/06/24/one-on-ones/

Early in my career, I didn’t interact much with management. For the past decade, the companies I have worked at have scheduled regular one-on-one meetings with my immediate manager. At the end of my tenure at Cisco, thanks to a growing rapport and adjacent cubicles, I communicated with my manager several times a day, on all manner of topics.

One of the nagging questions I’ve never really asked myself is: what is the point of a one-on-one? I never really looked at it beyond being a thing managers are told to do, a minor tax on my time. At Cisco, I found value in harvesting bits of gossip as to what was going on in the levels of management between me and the CEO.

Ben Horowitz has a good piece on his blog. In his view, the one-on-one is an important end point of an effective communication architecture within the company. The employee should drive the agenda, perhaps to the point of providing a written agenda ahead of time. “This is what is on my mind,” giving management an opportunity to listen, refine strategies, clarify expectations, un-block, and provide insight up the management chain. He suggests some questions to help get introverted employees talking.

I am not a manager, but as an employee, the take-away is the need to conjure an agenda: what is working? What is not working? How can we make not merely the technology, but the way we work as a team and a company, more effective?

Feedback Welcome

April 23, 2015
Linux, Technical

Live Migrate VM with Local Storage: virsh migrate

Link: https://dannyman.toldme.com/2015/04/23/virsh-migrate-local-storage-live/

In my previous post I lamented that it seemed I could not migrate a VM between virtual hosts without shared storage. This is incorrect. I have just completed a test run based on an old mailing list message:

ServerA> virsh -c qemu:///system list
 Id    Name                           State
----------------------------------------------------
 5     testvm0                        running

ServerA> virsh -c qemu+ssh://ServerB/system list # Test connection to ServerB
 Id    Name                           State
----------------------------------------------------

The real trick is to work around a bug whereby the target disk needs to be created. So, on ServerB, I created an empty 8G disk image:

ServerB> sudo dd if=/dev/zero of=/var/lib/libvirt/images/testvm0.img bs=1 count=0 seek=8G

Or, if you’re into backticks and remote pipes:

ServerB> sudo dd if=/dev/zero of=/var/lib/libvirt/images/testvm0.img bs=1 count=0 seek=`ssh ServerA ls -l /var/lib/libvirt/images/testvm0.img | awk '{print $5}'`
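If you would rather script that step, the same kind of empty file can be created from Python. This is a minimal sketch and not part of the original test run: the function name create_sparse_image is made up for illustration, and it assumes the filesystem supports sparse files.

```python
import os


def create_sparse_image(path, size_bytes):
    """Create an empty file of size_bytes without writing any data blocks.

    Hypothetical helper: roughly what `dd bs=1 count=0 seek=SIZE` does.
    Note that mode "wb" discards any existing content at path.
    """
    with open(path, "wb") as image:
        image.truncate(size_bytes)
    return os.path.getsize(path)


# Example, using the path from this post (needs write access to the directory):
# create_sparse_image("/var/lib/libvirt/images/testvm0.img", 8 * 1024**3)
```

To match an existing source image, pass the byte count that `ls -l` reports on ServerA as size_bytes.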

Now, we can cook with propane:

ServerA> virsh -c qemu:///system migrate --live --copy-storage-all --persistent --undefinesource testvm0 qemu+ssh://ServerB/system

This will take some time. I used virt-manager to attach to testvm0’s console and ran a ping test. At the end of the procedure, virt-manager lost console. I reconnected via ServerB and found that the VM hadn’t missed a ping.

And now:

ServerA> virsh -c qemu+ssh://ServerB/system list
 Id    Name                           State
----------------------------------------------------
 4     testvm0                        running

Admittedly, the need to manually create the target disk is a little more janky than I’d like, but this is definitely a nice proof of concept and another nail in the coffin of NAS being a hard dependency for VM infrastructure. Any day you can kiss expensive, proprietary SPOF dependencies goodbye is a good day.


April 22, 2015
Linux, Technical

Easy VM Management on Ubuntu!

Link: https://dannyman.toldme.com/2015/04/22/virt-manager-hell-yeah/

I may have just had a revelation.

I inherited a bunch of ProxMox. It is a rather nice, freemium (nagware) front-end to virtualization in Linux. One of my frustrations is that the local NAS is pretty weak, so we mostly run VMs on local disk. That compounds with another frustration where ProxMox doesn’t let you build local RAID on the VM hosts. That is especially sad because it is based on Debian and at least with Ubuntu, building software RAID at boot is really easy. If only I could easily manage my VMs on Ubuntu . . .

Well, turns out, we just shake the box:

On one or more VM hosts, check if your kernel is ready:

sudo apt-get install cpu-checker
kvm-ok

Then, install KVM and libvirt: (and give your user access..)

sudo apt-get install qemu-kvm libvirt-bin ubuntu-vm-builder bridge-utils
sudo adduser `id -un` libvirtd

Log back in, verify installation:

virsh -c qemu:///system list

And then, on your local Ubuntu workstation: (you are a SysAdmin, right?)

sudo apt-get install virt-manager

Then, upon running virt-manager, you can connect to the remote host(s) via SSH, and, whee! Full console access! So far the only kink I have had to iron is that for guest PXE boot you need to switch Source device to vtap. The system also supports live migration but that looks like it depends on a shared network filesystem. More to explore.

UPDATE: You CAN live migrate between local filesystems!

It will get more interesting to look at how hard it is to migrate VMs from ProxMox to KVM+libvirt.


April 8, 2015
Technical, Technology

AWS: Glacier Restores Can Get Pricey

Link: https://dannyman.toldme.com/2015/04/08/aws-melt-glaciers-slowly/

I have been working with AWS to automate disaster recovery: sync data up to S3 buckets (or, sometimes, EBS), then write Ansible scripts to deploy a bunch of EC2 instances, restore the data, and configure the systems.

Restoring data from Glacier is kind of a pain to automate. You have to iterate over the items in a bucket and issue restore requests for each item. But it gets more exciting than that on the billing end: Glacier restores can be crazy expensive!
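The iterate-and-request loop might look something like the sketch below. It is an illustration rather than the script used here: it assumes a boto3 S3 client is passed in, the function name request_glacier_restores and the seven-day default are made up, and a real run should expect errors for objects whose restore is already in progress.

```python
def request_glacier_restores(client, bucket, days=7):
    """Ask S3 to restore every Glacier-class object in bucket.

    client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    days is how long the restored copy stays available (assumed default).
    Returns the keys for which a restore was requested.
    """
    requested = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # Empty buckets or empty pages carry no "Contents" key.
        for obj in page.get("Contents", []):
            if obj.get("StorageClass") != "GLACIER":
                continue
            client.restore_object(
                Bucket=bucket,
                Key=obj["Key"],
                RestoreRequest={"Days": days},
            )
            requested.append(obj["Key"])
    return requested
```

Every request made this way counts toward the restore billing described next, so it is worth knowing how much data the loop will touch before running it.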

A few things I learned this week:

1) Amazon Glacier restore fees are based on how quickly you want to restore your data. You can restore up to 5% of your total S3 storage on a given day for free. If you restore more than that they start to charge you and at the end of the month you’re confused by the $,$$$ bill.

2) Amazon Glacier will also charge you money if you delete data that hasn’t been in there for at least three months. If you Glacier something, you will pay to store it for at least three months. So, Glacier your archive data, but for something like a rolling backup, no Glacier.

3) When you get a $,$$$ bill one month because you were naive, file a support request and they can get you some money refunded.
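Taking the 5% figure in point 1 at face value, a quick back-of-envelope calculation shows how many days a restore would need to be spread over to stay inside the free allowance. This is only a sketch: the function name is made up, the percentage is the one quoted above, and current AWS pricing should be checked before relying on it.

```python
FREE_DAILY_PERCENT = 5  # figure quoted in this post; an assumption, not current pricing


def days_to_restore_free(total_stored_gb, restore_gb, daily_percent=FREE_DAILY_PERCENT):
    """Return how many days restore_gb must be spread over to stay free."""
    if total_stored_gb <= 0 or daily_percent <= 0:
        raise ValueError("storage and percentage must be positive")
    daily_allowance_gb = total_stored_gb * daily_percent / 100
    # Round up without float surprises: -(-a // b) is ceiling division.
    return int(-(-restore_gb // daily_allowance_gb))


# Example: with 1000 GB stored, restoring 300 GB at 50 GB per day takes 6 days.
```

In other words, a full restore at the free rate takes about twenty days, which is worth knowing before promising anyone a recovery time.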


March 27, 2015
Technical

Balancing 3-Phase Power

Link: https://dannyman.toldme.com/2015/03/27/balancing-3-phase-power/

I have some racks in a data center that were designed to use 3-phase PDUs. This allows for greater density, since a 3-phase circuit delivers more watts. The way three-phase works is that the PDU breaks the circuit into three branches, and each branch is split across two of the legs. Something like:

Branch A: leg X->Y
Branch B: leg Y->Z
Branch C: leg Z->X
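To put a number on "more watts": for a balanced load, three-phase power is the square root of three times the line-to-line voltage times the current per leg. The sketch below compares that with a single-phase circuit. The 208 V and 30 A figures are assumptions chosen for illustration, not the ratings of these racks, and power factor is ignored.

```python
import math


def three_phase_watts(line_to_line_volts, amps_per_leg):
    """Power of a balanced three-phase circuit, ignoring power factor."""
    return math.sqrt(3) * line_to_line_volts * amps_per_leg


def single_phase_watts(volts, amps):
    """Power of a single-phase circuit, ignoring power factor."""
    return volts * amps


# Assumed example ratings: 208 V, 30 A.
three = three_phase_watts(208, 30)   # roughly 10.8 kW
single = single_phase_watts(208, 30)  # 6240 W
print(round(three), single)
```

Under those assumed ratings the three-phase circuit carries about 1.7 times the power of the single-phase one over the same size of breaker.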

The load needs to be balanced as evenly as possible across the three branches. When the PDU tells me the power draw on Branch A is, say, 3A, I am not really sure whether that is because of equipment on Branch A or … Branch C? This only becomes a problem when one branch reports chronic overloading and I want to balance the loads.

I chatted with a friend who has a PhD in this stuff. He said that if my servers don’t all have similar characteristics, then the math says I want to randomly distribute the load. That is hard to do on an existing rack.

I came up with an alternative approach. Most of my servers fall into clusters of similar hardware and what I would call a “load profile.” I had a rack powered by two unbalanced 3-phase circuits. I counted through the rack’s server inventory and classified each host into a series of cohorts based on their hardware and load profile. Two three-phase circuits give me six branches to work with, so I divided each cohort’s total by six to come up with a “branch quota.” Something like:

<= 2 hadoop node, hardware type A
<= 2 hadoop node, hardware type B
   1 spark node, hardware type A
   1 spark node, hardware type C
   1 misc node, hardware type C
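The quota arithmetic is just a ceiling division per cohort. A small sketch, with made-up cohort counts (the real inventory numbers are not listed in this post) and a hypothetical function name:

```python
def per_branch_quota(cohort_counts, branches=6):
    """Return the most hosts of each cohort that any one branch should carry."""
    # -(-n // branches) is ceiling division on integers.
    return {cohort: -(-count // branches) for cohort, count in cohort_counts.items()}


# Hypothetical rack inventory, for illustration only.
counts = {"hadoopA": 11, "hadoopB": 10, "sparkA": 6, "sparkC": 6, "miscC": 6}
print(per_branch_quota(counts))
```

When a cohort does not divide evenly by six, the result is an upper bound, which is why the hadoop lines above read "<= 2" rather than a flat number.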

I then sat down with a pad and paper, one circuit on either side of the sheet, and wrote down what servers I had, and where the circuit was relative to quota. So, one circuit might have started off as:

Circuit A, Branch XY:
INVENTORY
hadoopA-12
hadoopA-13
hadoopA-14
sparkA-5
miscC-5

QUOTA
hadoopA: RM 1
hadoopB: ADD 2
sparkA:  --
sparkC:  ADD 1
misc:    --
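The pencil-and-paper bookkeeping can also be sketched in code. This is an illustration, not a tool that was used here: the function name is made up, and it assumes host names follow the "cohort-number" pattern seen in the inventory above.

```python
from collections import Counter


def quota_adjustments(hosts, quota):
    """Compare one branch's hosts against the per-branch quota.

    Host names are assumed to look like "<cohort>-<number>".
    Returns {cohort: "RM n" | "ADD n" | "--"}.
    """
    have = Counter(name.rsplit("-", 1)[0] for name in hosts)
    result = {}
    for cohort, target in quota.items():
        delta = target - have[cohort]  # Counter returns 0 for absent cohorts
        if delta > 0:
            result[cohort] = "ADD %d" % delta
        elif delta < 0:
            result[cohort] = "RM %d" % -delta
        else:
            result[cohort] = "--"
    return result


branch_xy = ["hadoopA-12", "hadoopA-13", "hadoopA-14", "sparkA-5", "miscC-5"]
quota = {"hadoopA": 2, "hadoopB": 2, "sparkA": 1, "sparkC": 1, "miscC": 1}
print(quota_adjustments(branch_xy, quota))
```

Fed the Branch XY inventory, it reproduces the worksheet: remove one hadoopA, add two hadoopB, add one sparkC.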

I then moved servers around, updating each branch circuit with pencil and eraser as I went, like a diligent Dungeon Master. (And also, very important: the PDU receptacle configuration.)

The end result:

[Graph: pdu4a branch loads]

[Graph: pdu4b branch loads]

I started moving servers around Thu 15:00, and was done about 16:30, which is also when the hadoop cluster went idle. It kicked back up again at 17:00, and started spinning down around midnight.

What is important is to keep sustained load on the branches under the straight green line, which represents 80% of circuit capacity. You can see on the left that the second circuit, especially, had two branches running “hot”; after the re-balancing, the branch loads fly closer together and top off at the green line.


March 23, 2015
Technical

Nagios: Delay WARNING Notifications

Link: https://dannyman.toldme.com/2015/03/23/nagios-delay-warning-notifications/

I have a Nagios check to monitor the power being drawn through our PDUs. While setting all this up, I had to figure out alert thresholds. The convention for disk capacity is warn at 80%, critical at 90%. I also had this gem to work with:

National Electric Code requires that the continuous current drawn from a branch circuit not exceed 80% of the circuit’s maximum rating. “Continuous current” is any load sustained continuously for at least 3 hours.

(Thanks to Mike Pennington, via http://serverfault.com/a/413307/72839)

So, I went with 80% warning, 90% critical.

Lately I have been getting a lot of warning notifications about circuits exceeding 80%. Ah, but the NEC says that is only a problem if they are at 80% for more than three hours. So, I dug through the Nagios documentation and split my check out into two services:

# PDU load at 90% of circuit rating
define service{
    use                     generic-service
    hostgroup_name          pdus
    service_description     Power Load Critical
    notification_options    c,u,r
    check_command           check_sentry
    contact_groups          admins
}

# PDU sustained load at 80% of circuit rating for 3 hours
define service{
    use                       generic-service
    hostgroup_name            pdus
    service_description       Power Load High
    notification_options      w,r
    first_notification_delay  180
    check_command             check_sentry
    contact_groups            admins
}

The first part limits regular notifications to critical alerts. In the second case, the first_notification_delay should cover the “don’t bug me unless it has been happening for three hours” caveat and I set that service to only notify on warnings and recovery.
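The rule that the delay approximates, "over 80% continuously for three hours," can be sketched outside Nagios like this. It is an illustration only: the function name, the sample format, and the 30A circuit in the example are assumptions, and Nagios itself does not run this code.

```python
def sustained_over(samples, limit_amps, window_minutes=180):
    """Return True if load stayed above limit_amps for window_minutes.

    samples is assumed to be a time-ordered list of (minute, amps) readings.
    Any reading at or below the limit resets the clock.
    """
    start = None
    for minute, amps in samples:
        if amps > limit_amps:
            if start is None:
                start = minute
            if minute - start >= window_minutes:
                return True
        else:
            start = None
    return False


# Hypothetical 30A circuit: 80% is 24A. One reading every 30 minutes.
limit = 30 * 0.8
steady = [(m, 25.0) for m in range(0, 181, 30)]
print(sustained_over(steady, limit))
```

A single dip below the limit restarts the three-hour clock, so short spikes never turn into a notification.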


March 23, 2015
Banjo

“Father Finger” Banjo Tab v2

Link: https://dannyman.toldme.com/2015/03/23/father-finger-banjo-tab-v2/

------0-------0----|-2-------2-------|------0-------0----|-----------------|
----0---3---0---3--|-----3-------3---|----0-------0------|---------0-------|
--0-------0--------|-----------------|-------------------|-2---2-------0---|
-------------------|-----------------|-------------------|-----------------|
-------------------|---0-------0-----|--0-------0--------|-----------------|
  G B D D G B D D    E G D   E G D      G B D   G B D      A   A   B   G

I have been thinking about whether and where there are “drone notes” to work a roll around. I started with cleaning up the timing. (My knowledge of this song consists mainly of YouTube videos from India.) The bars go:

father finger x2
where are you? x2
here i am! x2
how do you do? x1

Each note here corresponds to a syllable, with quarter-note spacing on the last phrase. At least at the speed I play at this time, there is no roll to be added.

I did make two changes:
1) The fourth and eighth Ds in the first phrase moved from fifth string to fourth. Now the middle finger needs to be ready on that third fret a bit earlier, but we get out of plucking the fifth string twice in a row.
2) To avoid double-plucking in the third phrase, I went from G D D to G B D. They sound equally good to me, and to get G D D you can just keep your finger on the third fret.


March 11, 2015
Banjo

“Father Finger” Banjo Tab

Link: https://dannyman.toldme.com/2015/03/11/father-finger-banjo-tab/

Tommy cannot get enough of this song, so I picked this out on the banjo. My first attempt at building my own tab:

-----0-0-----0-0--|-2-------2-------|---0-0-----0-0---|------------------
---0-------0------|-----3-------3---|-----------------|-----0--------0---
-0-------0--------|-----------------|-----------------|-2-2---0--2-2---0-
------------------|-----------------|-----------------|------------------
------------------|---0-------0-----|-0-------0-------|------------------
 G B D D G B D D    E G D   E G D     G D D   G D D     A A B G  A A B G

As a total total total newbie, the wife helped me pick the notes out on the piano. I then shifted them left 3: C –> G, E –> B, because that is what was sounding right on the banjo. And I had picked out my own notes chart using the “Fine Chromatic Tuner” app:

0  1  2  3  4  5  6   (fingers on frets..)
D  D# E  F  F# G
B  C  C# D  D#
G  G# A  A# B  C  C#
D  D# E  F  F# G  G#
G  -----------------
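The chart follows a simple rule: each fret raises the open string by one semitone. A short sketch that regenerates the rows, assuming the open-string tuning shown in the chart (D, B, G, D, with a short fifth string at G); the function names are made up for illustration.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_at(open_note, fret):
    """Return the note sounded at the given fret of a string tuned to open_note."""
    return NOTES[(NOTES.index(open_note) + fret) % 12]


def string_row(open_note, frets=7):
    """Return the notes for frets 0 through frets - 1 on one string."""
    return [note_at(open_note, fret) for fret in range(frets)]


# Strings one through four, as tuned in the chart above.
for open_note in ["D", "B", "G", "D"]:
    print("  ".join(string_row(open_note)))
```

Calling note_at("D", 5) gives G, which is the same note as the open fifth string.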

The revelation was when I “discovered” (I’m sure I have been told but it hadn’t been relevant at the time) that the fifth “G” string matches first string, fifth fret. Before that I was picking 2, then 5 on the D string.

With any luck I can pick around on this and mush it into some roll somehow. It would be nice if the second bit mapped to a chord . . .

Good times!


March 6, 2015
JIRA, Religion, Technical, Technology

HOWTO: Develop Software Agilely

Link: https://dannyman.toldme.com/2015/03/06/howto-program-agile/

There seems to be some backlash going on against the religion of “Agile Software Development” and it is best summarized by PragDave, reminding us that the “Agile Manifesto” first places “Individuals and interactions over processes and tools” — there are now a lot of Agile Processes and Tools which you can buy in to . . .

He then summarizes how to work agilely:

What to do:

Find out where you are
Take a small step towards your goal
Adjust your understanding based on what you learned
Repeat

How to do it:

When faced with two or more alternatives that deliver roughly the same value, take the path that makes future change easier.

Sounds like sensible advice. I think I’ll print that out and tape it on my display to help me keep focused.


November 22, 2014
Sundry, Technical, Technology, Testimonials

Windows 8 Is a Horrible Horrible Operating System

Link: https://dannyman.toldme.com/2014/11/22/windows-8-wtf-microsoft/

I had the worst experience at work today: I had to prepare a computer for a new employee. That’s usually a pretty painless procedure, but this user was to be on Windows, and I had to … well, I had to call it quits after making only mediocre progress. This evening I checked online to make sure I’m not insane. A lot of people hate Windows 8, so I enjoyed clicking through a few reviews online, and then I just had to respond to Badger25’s review of Windows 8.1:

I think you are being way too easy on Windows 8.1 here, or at least insulting to the past. This isn’t a huge step backwards to the pre-Windows era: in DOS you could get things done! This is, if anything, a “Great Leap Forward” in which anything that smells of traditional ways of doing things has been purged in order to strengthen the purity of a failed ideology.

As far as boot speed, I was used to Windows XP booting in under five seconds. That was probably the first incarnation of Windows I enjoyed using. I just started setting up a Windows 8 workstation today for a business user and it is the most infuriatingly obtuse Operating System I have ever, in decades, had to deal with. (I am a Unix admin, so I’ve seen things….) This thing does NOT boot fast, or at least it does not reboot fast, because of all the updates which must be slowly applied.

Oddly enough, it seems that these days, the best computer UIs are offered by Linux distros, and they have weird gaps in usability, then Macs, then … I wouldn’t suggest Windows 8 on anyone except possibly those with physical or mental disabilities. Anyone who is used to DOING THINGS with computers is going to feel like they are using the computer with their head wrapped in a hefty bag. The thing could trigger panic attacks.

Monday is another day. I just hope the new employee doesn’t rage quit.


November 11, 2014
About Me, Letters to The Man, News and Reaction, Testimonials

El Camino BRT Could be Faster Than Driving

Link: https://dannyman.toldme.com/2014/11/11/el-camino-brt-could-be-faster-than-driving/

The Friends of Caltrain sent me e-mail touting progress on public transportation and density along the Peninsula, with provocative news that for the first time in its history, Santa Clara could build a transit service that is faster than driving.

I think the El Camino BRT could be a great project to transform El Camino Real from a ghetto of 1950s strip malls into the sort of place where people would go to enjoy shopping. Maybe. Anyway, the news that a dedicated lane from Santa Clara to Palo Alto could make the bus faster than cars excited me. I’ll try to be at the Sunnyvale meeting this evening, and I also submitted my own enthusiasm to our governments via Transform’s handy link:

I used to commute along El Camino from Mountain View to Palo Alto. I switched to the bus out of environmental concerns. El Camino has the best transit service in the county but it still took 2-3 times longer to take the bus than it would have taken to drive. Now it sounds like you could get BRT running on El Camino FASTER than cars? YES!! If the cars get slowed a bit that’s not such a big deal, especially since any driver going any distance knows that Central Expressway / Alma is a much nicer car trip. Even though I now live 1.5 miles off of El Camino in Sunnyvale, if there were excellent transit services I would be tempted to hop on the 55, walk, or bike to enjoy the transit corridor, especially for trips up to Mountain View or Palo Alto or Stanford Shopping Center. What a pleasure it would be to not have to hassle with parking, traffic, or the Caltrain schedule. If it were sufficiently fast, I would totally use that as a commute option up to Menlo Park.

Also, I’d probably be more inclined to visit Santa Clara.

Thanks,
-danny


October 7, 2014
Ansible, Technical

Ansible: Set Conditional Handler

Link: https://dannyman.toldme.com/2014/10/07/ansible-set-conditional-handler/

I have a playbook which installs and configures NRPE. The packages and services are different on Red Hat versus Debian-based systems, but my site configuration is the same. I burnt a fair amount of time trying to figure out how to allow the configuration tasks to notify a single handler. The result looks something like:

# Debian or Ubuntu
- name: Ensure NRPE is installed on Debian or Ubuntu
  when: ansible_pkg_mgr == 'apt'
  apt: pkg=nagios-nrpe-server state=latest

- name: Set nrpe_handler to nagios-nrpe-server
  when: ansible_pkg_mgr == 'apt'
  set_fact: nrpe_handler='nagios-nrpe-server'

# RHEL or CentOS
- name: Ensure NRPE is installed on RHEL or CentOS
  when: ansible_pkg_mgr == 'yum'
  yum: pkg={{item}} state=latest
  with_items:
    - nagios-nrpe
    - nagios-plugins-nrpe

- name: Set nrpe_handler to nrpe
  when: ansible_pkg_mgr == 'yum'
  set_fact: nrpe_handler='nrpe'

# Common
- name: Ensure NRPE will talk to Nagios Server
  lineinfile: dest=/etc/nagios/nrpe.cfg regexp='^allowed_hosts=' line='allowed_hosts=nagios.domain.com'
  notify:
    - restart nrpe

### A few other common configuration settings ...

Then, over in the handlers file:

# Common
- name: restart nrpe
  service: name={{nrpe_handler}} state=restarted

The trick boiled down to using the set_fact module.

