No steak today

Today is Wednesday, and we’ve got a routine in the Gordon office of having a bbq (steak, of course) on the balcony at lunchtime. It’s normally the only day I head into a Sun office, because I’m one of those almost-hardcore work from home (wfh) people who find it easier to work harder and more efficiently from the home office.

Anyway, today we’re on our final approach with the RTI (a term which, surprisingly, is not listed in the glossary) for the project. I’ve just knocked out a collection of changes for a manpage, and we’re talking with the gatekeeper in our irc channel… no time for wandering up to the office and being sociable today.




Jeremy Allison on Programming

Not sure how I stumbled across it (perhaps from Simon Phipps) but I’ve just spent a few minutes reading through Jeremy Allison’s latest on what it takes to be a good programmer. It was a very interesting read, and I totally understand the thoughts in his head when he found out that his office-mate @ Google is around 27. That makes him about the same age as Spoonboy… and reminded me how I felt when I started back at uni, realising that my classmates had been born in the year that Crocodile Dundee was released.

I don’t think of myself as being old, but sometimes I do wonder.

Technorati tags: Jeremy Allison, Google, Programming, Art, Crocodile Dundee, Younguns




What a way to spend an Easter weekend

J and I spent Saturday and Sunday in the Mt Majura forest in the A.C.T. being support crew for Spoonboy at the Australian 24 hour Solo mountainbike championships.

It was exhausting. Otoh, the camaraderie between the support crews was a beautiful thing and both J and I really appreciated it.

Of course, we didn’t go and actually ride anywhere, but we still wound up with sore muscles from static positioning and dodgy posture throughout the event.

Fortunately for us we didn’t have to drive back to Sin City (Sydney) yesterday, but were able to crash at J and Spoonboy’s sister’s place in Gundaroo a mere 40km away. Boy oh boy were we glad of that! Despite only managing to grab 40 minutes sleep (0400 to 0440) yesterday morning, we realised that we should try to stay awake for a few more hours until after dinner to try to get the circadian rhythms and body clock back on track. So after a beautiful meal at Crowe’s Wine Bar (stumbling distance, fortunately) we crashed out. 8:30pm and I’m not sure whether we were awake when our heads hit the pillow, but we slept through until birdsong woke us this morning… and then crashed out again :-)

Of course, having a few glasses of a major sponsor’s product last night didn’t hurt our sleeping either.

Managed to get almost 12 hours sleep, had a leisurely breakfast and some quality time with our nephew and then headed back north.

Not sure when the results from the event will be posted on their website (our rider came 19th overall out of 61 or 62) but when the results are up I’ll point to them along with my photo gallery from the event. I managed to take about 700 photos… surely some of them have turned out ok ;-)

Technorati tags: Mountain bike, MTBA, Solo 24hr racing, Australian Championships 2007, Spoonboy




6539777 Cannot disable mpxio on x86/x64 platforms without serious pain

Yesterday I logged `6539777 Cannot disable mpxio on x86/x64 platforms without serious pain`_, which is the real root cause of `6539612 Run “stmsboot -d” on x86 platform will cause system boot failure`_.

Unfortunately, whatever process is used to push bug data from internal (bugster) to external (`b.o.o.`_) completely screwed up the workaround entry. This makes me mad not just because I spent a heap of time writing it carefully, but because the information that it presents to you via `b.o.o.`_ is useless!

AAAARRRRRGHHHHHH

So herewith is the workaround field, unadulterated and (hopefully!) useful if you ever find yourself in this situation:

ALWAYS exclude your root fibre-channel controller from the mpxio-disabled list. A modification to the /kernel/drv/fp.conf file is required. Determine your bootpath and then make the appropriate modifications:

# /usr/sbin/eeprom boot-path
bootpath=/pci@1d,0/pci1022,7450@4/pci1077,132@1/fp@0,0/sd@w266000c0ffe92245,8:a

!! we want the piece from /pci@1d,0 up to (but not including) /fp@0,0

# cat >> /kernel/drv/fp.conf
name="fp" parent="/pci@1d,0/pci1022,7450@4/pci1077,132@1" port=0 mpxio-disable="no";
^D

!! hit control-D here

# /sbin/bootadm update-archive

If you find yourself in the situation where your system has not come back up and is stuck trying to ``fsck /``, then log in as root when prompted and run:

# mount | grep "/ on"
/ on /pci@1d,0/pci1022,7450@4/pci1077,132@1/fp@0,0/sd@w266000c0ffe92245,8:a read/write/setuid/devices/dev=780240
# mount -o remount,rw,logging /devices/pci@1d,0/pci1022,7450@4/pci1077,132@1/fp@0,0/disk@w266000c0ffe92245,8:a /

Note the change from "sd" to "disk" - this is due to the way that fibre-channel luns are presented by the device tree on the x86 architecture.
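The two path manipulations in the workaround (trimming the bootpath down to the controller for fp.conf, and turning the mounted device path into the /devices path for the remount) can be sketched as a pair of small shell functions. This is only an illustration under the assumption that your paths look like the ones in this entry; the function names are hypothetical and not part of Solaris, so check the results by eye before pasting them anywhere.

```shell
#!/bin/sh
# Sketch only: both helper names are made up for this example.

# Print everything in the boot path before the /fp@... component,
# i.e. the value to use for parent="..." in /kernel/drv/fp.conf.
fp_parent_from_bootpath() {
    printf '%s\n' "$1" | sed -e 's|/fp@.*$||'
}

# Turn the device path reported by mount into the /devices path used for
# the remount, swapping the "sd" node name for "disk".
remount_device_from_mount_path() {
    printf '/devices%s\n' "$1" | sed -e 's|/sd@|/disk@|'
}
```

For example, passing the bootpath value shown above to fp_parent_from_bootpath prints the controller path used in the fp.conf line, and passing the same path to remount_device_from_mount_path prints the device argument used in the mount -o remount command.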

Now I need to log a bug against `b.o.o.`_ itself. grrrrrrrr

.. _6539777 Cannot disable mpxio on x86/x64 platforms without serious pain: http://bugs.opensolaris.org/view_bug.do?bug_id=6539777
.. _b.o.o.: http://bugs.opensolaris.org
.. _6539612 Run “stmsboot -d” on x86 platform will cause system boot failure: http://bugs.opensolaris.org/view_bug.do?bug_id=6539612





Congratulations to the new OpenSolaris Governing Board members

The polls have now closed in the OGB election for 2007, and I would like to congratulate our new OGB Overlords :-)

James D. Carlson, Alan Coopersmith, Casper Dik, Glynn Foster, Stephen Lau, Rich Teer and Keith M. Wesolowski.

I’d like to be amongst that group – perhaps next year?

Thank you to the previous members of the OGB, too – we know you’ve put in a lot of effort to make things work.

Good luck guys, you’ve got a lot of work to do and don’t forget that there are plenty of people who are willing to help if you ask.

Of just as much if not more importance was the question about ratifying our Constitution, which passed.

Technorati tags: OpenSolaris, OpenSolaris Governing Board, OGB




I’ve now got a 64bit nVidia driver with Xorg on snv_60

When I upgraded to snv_60 I noticed that the X server wouldn’t start. This is due to there being no 64bit nVidia driver integrated into that build. Alan Coopersmith is working with nVidia to resolve it, but in the meantime somebody on #opensolaris pointed me to http://www.nvidia.com/object/solaris_display_1.0-9755.html which has the 64bit driver ready for use.

The workaround for running Xorg in 32bit mode is

# svccfg -s x11-server setprop options/server=/usr/X11/bin/i386/Xorg

And if you want to run Xorg in 64bit mode, use

# svccfg -s x11-server setprop options/server=/usr/X11/bin/amd64/Xorg

Don’t forget to re-run

# /sbin/bootadm update-archive ; reboot

after you’ve installed the new package. Once you’ve rebooted and logged in again, you can make use of the nVidia control panel:

[Screenshot: nVidia control panel]



An in-your-face lesson about power draw

On Tuesday I got my act together, wandered down to Dick Smith Electronics at North Sydney and purchased the internal drive power-splitter cable that I’d been meaning to get for weeks. This was all part of the grand plan to install another 2 SATA disks inside my Ultra20-M2… along with the 4 36GB scsi disks I’ve got attached in a multipack.

Great idea, but with insufficient regard for the limitations of my hardware.

First off, the sata cables that I used were “standard” pc cables, so their plug length was about 2x the plug length that Sun uses when building these boxes. Not a problem on the motherboard end, but definitely a problem when you want to close the case if you don’t rotate your additional disks by 90 degrees.

Secondly, current and power draw. The Ultra20 and Ultra20-M2 come with a 400W psu, which as far as I’m aware is plentiful enough to run the box with 2 disks and each PCI and PCI-Express slot filled, but not if you want to add extra disks. That’s what I hadn’t bothered to think about. My disk0 and disk1 are 320GB Seagate ST3320620AS SATA disks which run just fine. The two disks I added were a 200GB Seagate ST3200822AS and a 300GB Maxtor 6V300F0.

Earlier this evening I noticed that a lot of processes were starting to hang – firefox, thunderbird, gaim, xchat … apache, postgres, tomcat … and shortly thereafter everything decided to not respond. I was able to run a reboot -dq though, so I got some data.

On getting back to the grub screen, I received the worrying message that the system thought I had no slice 0 on my boot disk. Eeeeek! Power-off and power-on … boot up … login … hang.

Hard hang. No chance of using F1+A to get out of this. Power-off it was.

While I’d been waiting to see whether the hang was really hard I did a bit of thinking. What was the last change I made to the system? [added new disks internally] Was it disk-related? [yes] Am I an idiot? [quite possibly]

Power-off, unscrew the case, remove the extra disks and re-set the original drive power cable, power-on, boot up with no problems whatsoever.

I figure I’m now on the lookout for an appropriate hba with external connections along with an external enclosure to house the drives. Could be a while. In the meantime I think I’ve learnt my lesson.




Perhaps I jumped to the wrong conclusion

Yesterday I wrote an entry `wherein I blamed power draw`_ for causing the hard hangs I’ve experienced since I LU’d to snv_60. After a bit more downtime overnight and serious worrying about the safety of my data (photos…. I can’t lose them!) I’ve done a bit more analysis and come up with this explanation: it’s a dodgy disk.

Not my preferred explanation, but one which fits the evidence better than the power draw theory.

After I removed the two extra disks, I still had the problem. Ergo, the power draw theory is unlikely to be the cause.

I tried copying files off my `camera’s`_ CF card into my photo storage area three times. Each time I did, the cp process and then every other process making use of somewhere under /scratch would hang, with a stack trace like this:

> fffffffed2910340::print proc_t p_tlist|::findstack -v
stack pointer for thread fffffffed0ca5280: ffffff0005673b60
[ ffffff0005673b60 _resume_from_idle+0xf8() ]
ffffff0005673ba0 swtch+0x17f()
ffffff0005673bd0 cv_wait+0x61(fffffffec7dd2b16, fffffffec7dd2ad0)
ffffff0005673c20 txg_wait_open+0x7f(fffffffec7dd2a00, 2a3968)
ffffff0005673c60 dmu_tx_wait+0x92(ffffffff23b55800)
ffffff0005673d60 zfs_write+0x2de(fffffffef1bc9680, ffffff0005673e20, 0, fffffffee7401e88, 0)
ffffff0005673dd0 fop_write+0x3f(fffffffef1bc9680, ffffff0005673e20, 0, fffffffee7401e88, 0)
ffffff0005673e90 write+0x2ad(4, fe400000, 800000)
ffffff0005673ec0 write32+0x1e(4, fe400000, 800000)
ffffff0005673f10 sys_syscall32+0x101()

After managing to pull my head out and think about this for a moment, I realised that the problem had not occurred before I LU’d to snv_60. When I did the LU, I activated my alternate BE (on my second disk) and made it the logical lefthand side of the mirror. Whenever I hit a specific part of the filesystem in multiuser/64bit mode, all IOs would hang.

A failsafe boot to 32bit followed by a zpool scrub didn’t find anything wrong with the pool or its filesystems, but when I rebooted again I saw the dreaded GRUB error message that it couldn’t find my root partition. Another failsafe boot and “format…label” later and once more, a hard hang while doing heavy IO to and from the pool.

I removed the disk, attached the jumper to the end of it which forces 1.5Gbps (SATA-I) speeds, re-inserted and rebooted. I’ve now been up and running for nearly 45 minutes and doing some fairly heavy IO …. looks ok for the moment.

I’m more confident that I’ve nailed the source of the problem, but we’ll just have to wait and see.

Update: Another possibility, given that I’ve stumbled across 6536905 biosdev 1.4,1.5 changes render SATA disks under old framework invisible to LU, is that there’s a bios bug which prevents the second onboard SATA channel from operating at full SATA-II speeds. Not exactly sure how I’m going to investigate this idea though.

.. _camera’s: http://www.jmcpdotcom.com/roller/jmcp/entry/why_are_black_eos400d_bodies
.. _wherein I blamed power draw: http://www.jmcpdotcom.com/roller/jmcp/entry/20070325





Weirdness in Sovietistan

While browsing my blog’s referrers list today, I saw hits from Planet SLUG which features James Purser, who recently interviewed me on an OpenSolaris Round Table.

That led me here, which has photos of Soviet-era bus shelters. Weird and wonderful all at the same time. For a nation that was so top-down driven and controlled, the variety of designs and styles is amazing.

Technorati tags: Soviet Union, Bus shelter, SLUG, Open Source On The Air, k-sit.com




There’s a bug in my Ultra20-M2 bios :(

Today I spent a bit of time making a live-upgrade from snv_57 to snv_60 work. I’m not sure that I did things quite the right way, but…

I had to manually pkgadd the PatchPro packages to my alternate BE (SUNWppror SUNWpprou SUNWppro-plugin-sunos-base) before LU would allow me to continue. This is not what I wanted, because I don’t use PatchPro at all. I deliberately uninstalled it as soon as I possibly could.

The alternate BE seemed to still have StarOffice8 installed, so of course LU upgraded that too. Again, I don’t want StarOffice8, I want OpenOffice.org 2.x instead.

My non-global zones (which have their zonepaths on zfs) were copied to the new BE’s / so on reboot I had to move the contents out of the way before zfs mount -a would succeed.

And, most annoying of all, /sbin/biosdev stumbled across a bug (6536905 biosdev 1.4,1.5 changes render SATA disks under old framework invisible to LU – not on b.o.o and will most probably have its synopsis changed) which meant that /sbin/biosdev couldn’t tell LU what a valid bios-registered boot device was.

After running /sbin/biosdev -d and having a chat with the RE for the bug, I came up with the following hackaround – replace /sbin/biosdev with a shell script which outputs the correct information. In my case, with a dual-channel glm card installed and 6 scsi disks attached to it along with 4 SATA disks hanging off the motherboard, I need this:

#!/bin/sh
echo "0x80 /pci@0,0/pci-ide@5/ide@0/cmdk@0,0"
echo "0x81 /pci@0,0/pci10de,370@6/pci1000,1000@9/sd@2,0"
echo "0x82 /pci@0,0/pci10de,370@6/pci1000,1000@9/sd@4,0"
echo "0x83 /pci@0,0/pci10de,370@6/pci1000,1000@9/sd@5,0"
echo "0x84 /pci@0,0/pci10de,370@6/pci1000,1000@9/sd@1,0"
echo "0x85 /pci@0,0/pci-ide@5/ide@1/cmdk@0,0"
echo "0x86 /pci@0,0/pci-ide@5,1/ide@0/cmdk@0,0"
echo "0x87 /pci@0,0/pci-ide@5,1/ide@1/cmdk@0,0"
exit 0
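
Before dropping a stub like this in place of /sbin/biosdev it may be worth sanity-checking what it prints. The following is a minimal sketch, assuming that each line should be a hex BIOS drive number followed by a single absolute device path, as in the script above, and that no drive number should appear twice. I don’t know what LU itself validates, and check_biosdev_lines is a hypothetical helper name, not a Solaris tool. It also assumes a POSIX awk (nawk on Solaris).

```shell
#!/bin/sh
# Hypothetical helper: read stub output on stdin and return 0 only if
# every line looks like "<0xNN> </absolute/device/path>", no BIOS drive
# number is repeated, and there is at least one line of input.
check_biosdev_lines() {
    awk '
        BEGIN                     { bad = 0 }
        NF != 2                   { bad = 1 }   # exactly two fields per line
        $1 !~ /^0x[0-9a-fA-F]+$/  { bad = 1 }   # first field is a hex number
        $2 !~ /^\//               { bad = 1 }   # second field is an absolute path
        seen[$1]++                { bad = 1 }   # drive number already seen
        END                       { exit (bad || NR == 0) }
    '
}
```

Piping the output of the stub script into check_biosdev_lines and looking at the exit status is enough to catch a typo such as a duplicated 0x8N entry or a missing path.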

Technorati tags: Solaris, OpenSolaris, LiveUpgrade, biosdev, bios bug, Sun Ultra20 M2