[pca] Patch 142901-05: installation woes
Scott Severtson
scott.severtson at digitalmeasures.com
Fri Apr 23 21:18:55 CEST 2010
All,
Finally, we have a resolution to this issue from Sun/Oracle:
---
There are a number of issues all combining to create the problem you see.
1. Issue #6850329 causes 139556-08 to fail to install; the failure leaves
behind files in /usr/lib/libc, suffixed by the patchadd PID:
---
# ls -ltr /usr/lib/libc/libc_hwcap2.so.1.*
-rwxr-xr-x 1 root bin 1411388 Oct 3 2008 /usr/lib/libc/libc_hwcap2.so.1.167626
---
2. After applying 141445-09 (10u8 Kernel Upgrade), we have the following:
---
# mount -p | grep libc
/usr/lib/libc/libc_hwcap2.so.1.167626 - /lib/libc.so.1 lofs - no
---
Now we have the 10u7 hardware optimised libc mounted onto /lib/libc.so.1.
This is very wrong.
The mount is determined at boot time by "moe":
---
# /usr/bin/moe -32 '/usr/lib/libc/*$HWCAP'
/usr/lib/libc/libc_hwcap2.so.1.167626
---
"moe" is confused by the libc_hwcap2.so.1.$$ files, and is mounting the
failed 10u7 patch files.
3. Installing 142901-05 delivers a new hardware optimised libc; we are
fatally wounded. The rebooted machine now has 10u7 libc, and the kernel from
post 10u8. At that stage we cannot boot at all, and the provided bootadm
core verifies this as it shows that symbols are corrupted.
I can reproduce this at will on AMD gear. In your case, you Live Upgraded
from 10u6 to 10u8, but this still left failed libc_hwcap2.so.1.$$ in the new
Boot Environment. I reproduced this by starting with 10u6, incurring Issue
#6850329, adding the fix to #6850329, adding the rest of the Kernel Upgrades
till I got to 142901-05, then the reboot -- -r failed.
The issue with "moe" choosing the wrong hardware optimised libc is tracked
via 6748925.
If there are more machines involved:
1. Check what kernel is installed (i.e. "uname -a" after the system is
rebooted will suffice). If the installed kernel is 141445-09, continue.
2. Check that /usr/lib/libc/libc_hwcap2.so.1 matches
141445-09/SUNWcsl/reloc/usr/lib/libc/libc_hwcap2.so.1 (size and checksum);
if they match, continue.
3. Run "mount -p | grep libc"; if it shows that libc_hwcap2.so.1.$$ is
mounted, continue.
4. Unmount /lib/libc.so.1
5. Remove the /usr/lib/libc/libc_hwcap2.so.1.$$ files (making sure not to
remove /usr/lib/libc/libc_hwcap2.so.1)
6. Remount /usr/lib/libc/libc_hwcap2.so.1 onto /lib/libc.so.1, using the
same options as "mount -p" previously showed.
---
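For anyone who wants to script checks 2 and 3 from the list above, here is a rough sketch, not something Sun supplied. It assumes a POSIX-style shell (ksh or bash rather than the old /bin/sh) and a POSIX awk (nawk on Solaris 10). The function names are made up for illustration, and the "purely numeric suffix" test is my own reading of what a patchadd PID suffix looks like.

```shell
# Succeed if two files have the same CRC and byte count (step 2).
# "cksum" reading stdin prints only "<crc> <size>", so the names drop out.
same_cksum() {
    a=$(cksum < "$1") && b=$(cksum < "$2") && [ "$a" = "$b" ]
}

# Print leftover libc_hwcap*.so.1.<PID> files found in a directory.
# The directory defaults to /usr/lib/libc; the real libc_hwcap2.so.1
# never matches because the glob requires a further ".suffix".
list_stale_hwcap() {
    dir=${1:-/usr/lib/libc}
    for f in "$dir"/libc_hwcap*.so.1.*; do
        [ -f "$f" ] || continue
        case "${f##*.so.1.}" in
            ''|*[!0-9]*) ;;              # suffix is not purely numeric: skip
            *) printf '%s\n' "$f" ;;
        esac
    done
}

# Read "mount -p" output on stdin (step 3); succeed if a PID-suffixed
# file is the one mounted on /lib/libc.so.1.
stale_libc_mounted() {
    awk '$3 == "/lib/libc.so.1" && $1 ~ /\.so\.1\.[0-9]+$/ { found = 1 }
         END { exit !found }'
}
```

Typical use would be "mount -p | stale_libc_mounted && list_stale_hwcap", with same_cksum given the installed libc_hwcap2.so.1 and the copy from the unpacked 141445-09 patch directory.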
We were unable to remount /lib/libc.so.1 as recommended, as it was in use by
a number of processes. Instead, we simply moved the libc_hwcap2.so.1.$$ into
a temporary directory and rebooted; "moe" was then able to find the correct
file version.
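A minimal sketch of that move-aside step, for illustration only: the function name is hypothetical, it assumes ksh or bash, and it treats any libc_hwcap*.so.1 file with a purely numeric suffix as a patchadd leftover. It does not unmount or reboot anything.

```shell
# Move leftover libc_hwcap*.so.1.<PID> files from one directory into a
# holding directory, leaving the real libc_hwcap*.so.1 files in place.
# Usage: quarantine_stale_hwcap <source-dir> <holding-dir>
quarantine_stale_hwcap() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$src"/libc_hwcap*.so.1.*; do
        [ -f "$f" ] || continue          # glob matched nothing, or not a file
        case "${f##*.so.1.}" in
            ''|*[!0-9]*) continue ;;     # suffix is not purely numeric: leave it
        esac
        mv "$f" "$dest"/ && printf 'moved %s\n' "$f"
    done
}
```

It would be called as "quarantine_stale_hwcap /usr/lib/libc /var/tmp/libc_hwcap.stale" (the holding path is just an example), followed by a reboot. A directory under /var/tmp survives the reboot, whereas /tmp is cleared.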
FYI: Sun/Oracle/Enda O'Connor provided the solution one month ago, but we did
not have time to revisit the issue until now. We've now applied this fix to
multiple machines, and are rolling this into our upgrade procedure for the
rest of our systems.
Thanks to everyone on the PCA mailing list for your help and advice - I
doubt we would have gotten a successful resolution without your assistance.
--Scott