Capturing core files in Solaris is pretty straightforward – even more so if you’ve used JASS to secure the system. By default JASS will give you a nice /etc/coreadm.conf file:
COREADM_GLOB_PATTERN=/var/core/core_%n_%f_%u_%g_%t_%p
COREADM_GLOB_CONTENT=default
COREADM_INIT_PATTERN=core
COREADM_INIT_CONTENT=default
COREADM_GLOB_ENABLED=yes
COREADM_PROC_ENABLED=no
COREADM_GLOB_SETID_ENABLED=yes
COREADM_PROC_SETID_ENABLED=no
COREADM_GLOB_LOG_ENABLED=yes
This ensures that we keep all our core files in a sensible place, and that they have enough information in the filename to identify where they came from.
With some visualisation applications requiring Linux – more specifically Red Hat Enterprise Linux (RHEL) – you’ll find the handy coreadm tool missing. Core file management is instead configured in the kernel configuration file, /etc/sysctl.conf.
We’ve got three main challenges in RHEL:
- enable core dumps from setuid processes
- remove file size limits for core dumps
- stick them all in a sensible place, and give the core files sensible names
To accomplish all of this, we need to add the following lines into /etc/sysctl.conf:
fs.suid_dumpable = 2
kernel.core_pattern = /var/core/core_%h_%e_%u_%g_%t_%p
And then to make sure we aren’t imposing limits on our core files, we add the following to /etc/sysconfig/init:
DAEMON_COREFILE_LIMIT='unlimited' # don't limit our core file sizes
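If you want to see what core file limit a process has actually ended up with, Python's standard resource module can report it. This is only a minimal sketch: the core_limits helper is a name made up for this example, and it reports the limits of the process that runs it, so run it from the same environment as the program you care about (daemons started by init scripts may well differ from your login shell).

```python
import resource


def core_limits():
    """Return the (soft, hard) core file size limits of this process.

    Values are in bytes, or the string "unlimited" when no limit applies.
    """
    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    return show(soft), show(hard)


if __name__ == "__main__":
    soft, hard = core_limits()
    print(f"core file size limit: soft={soft} hard={hard}")
```

A soft limit of 0 means no core file will be written for that process, whatever the core pattern says.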
Luckily there are just a couple of differences between Solaris and Linux when it comes to naming our core files:
Solaris | Variable | Linux
%n | nodename | %h
%f | executable name | %e
%u | UID | %u
%g | GID | %g
%t | epoch time | %t
%p | PID | %p
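If you have a stack of Solaris patterns to carry over, the token mapping is easy to script. The sketch below is illustrative only: solaris_to_linux_pattern and TOKEN_MAP are names invented for this example, it uses the %h and %e tokens that appear in the kernel.core_pattern line, and it assumes any token it doesn't know about means the same thing on both systems.

```python
import re

# Solaris coreadm tokens that are spelled differently in a Linux core_pattern.
# Anything not listed here is passed through unchanged (an assumption).
TOKEN_MAP = {"n": "h", "f": "e"}


def solaris_to_linux_pattern(pattern):
    """Rewrite the % tokens of a coreadm pattern for kernel.core_pattern."""
    def swap(match):
        token = match.group(1)
        return "%" + TOKEN_MAP.get(token, token)

    # Matching "%" plus one character at a time keeps a literal "%%" intact.
    return re.sub(r"%(.)", swap, pattern)


if __name__ == "__main__":
    print(solaris_to_linux_pattern("/var/core/core_%n_%f_%u_%g_%t_%p"))
```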
Once you’ve updated /etc/sysctl.conf you can refresh the settings by running sysctl:
[root@altix ~]# /sbin/sysctl -p
< list of kernel tunables >
fs.suid_dumpable = 2
kernel.core_pattern = /var/core/core_%h_%e_%u_%g_%t_%p
< list of kernel tunables >
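The point of packing all those fields into the filename is that you can pull them back out later. Here's a rough sketch of doing that for the core_%h_%e_%u_%g_%t_%p layout. The parse_core_name function is hypothetical, and it assumes the hostname contains no underscores (the executable name may).

```python
import os
from datetime import datetime, timezone


def parse_core_name(path):
    """Split a core_<host>_<exe>_<uid>_<gid>_<epoch>_<pid> filename into fields."""
    name = os.path.basename(path)
    if not name.startswith("core_"):
        raise ValueError(f"not a recognised core file name: {name}")
    # The last four fields are numeric, so peel them off from the right.
    rest, uid, gid, epoch, pid = name[len("core_"):].rsplit("_", 4)
    # Assumption: no underscore in the hostname, so the first one ends it.
    host, executable = rest.split("_", 1)
    return {
        "host": host,
        "executable": executable,
        "uid": int(uid),
        "gid": int(gid),
        "time": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "pid": int(pid),
    }


if __name__ == "__main__":
    # A made-up filename, purely for illustration.
    print(parse_core_name("/var/core/core_altix_my_app_500_500_1244049462_4242"))
```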
This is something that crops up again and again – a bit of a FAQ. Very often, the machine you’re using your web browser on is not the machine you want to download software to. When you’re dealing with multi-gigabyte files (like an OpenSolaris ISO), downloading them and then having to copy them over a LAN is just duplicating the pain.
And that’s even assuming your server isn’t in a remote data centre somewhere.
However, help is at hand with the trusty command line tool, wget. Now that Sun appear to be embedding session IDs in the download URLs, it’s possible to log into the Sun Download Centre (SDLC), find the file you want, right click, and select ‘Copy Link Location’ (or whatever your browser calls the option).
Then start a terminal session on your server, and execute the following:
wget -O filename.ext "SDLC_URL_goes_here"
wget will head off to the URL, and you’ll find that filename.ext will be downloaded and saved.
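If wget isn't installed on the server but Python is, much the same thing can be done with the standard library. This is just a sketch: fetch is a name made up for this example, and it does none of wget's retrying or resuming. It streams the response to disk in chunks, so a multi-gigabyte ISO never has to fit in memory.

```python
import shutil
import urllib.request


def fetch(url, filename, chunk_size=1024 * 1024):
    """Download url and save it as filename, one chunk at a time."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return filename


if __name__ == "__main__":
    # Paste the copied link in place of the placeholder before running.
    fetch("SDLC_URL_goes_here", "filename.ext")
```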
Excellent news has arrived from the HPC Developer OpenSolaris community, via Bruce Rothermal. Traditionally it’s been very hard to get involved in HPC – you need a lot of kit, a lot of software, and some knowledge to get it all set up.
Sun have solved all of this by making available a Virtual Machine Image for VirtualBox (or VMware) which contains an entire HPC stack:
- Sun Grid Engine and Zones
- MPI and HPC Cluster Tools
- compilers, scripting languages, and more
The HPC Developer Stack provides a simple, easy way to start getting up to speed with the same technologies and tools that are used on monster installs like TACC’s Ranger.
Grab the download, fire up VirtualBox, and start getting involved in the world of HPC.
It’s an annoying and recurring problem – your previously configured and well behaved Solaris machine has now dropped off the network, and no-one can log in. Going in via the console shows that all LDAP lookups fail, and that’s why no-one can log in.
/var/adm/messages is filled with cheery messages like this:
ldap_cachemgr[173]: [ID 293258 daemon.error] libsldap: Status: 0 \
Mesg: Empty config file: '/var/ldap/ldap_client_file'
You’ll also find the LDAP client SMF service has gone into maintenance mode:
bash-3.00$ svcs ldap/client
STATE STIME FMRI
maintenance 17:17:42 svc:/network/ldap/client:default
So what happened? The Solaris ldap_cachemgr process regularly talks to your LDAP servers, and at a pre-defined interval (usually 12 hours) it refreshes the client config. This has a number of benefits, not least of which is that you can make one change in the LDAP directory, and then have your clients all update themselves automatically.
This is great for putting a new LDAP server into play, or for doing a server migration.
The problem arises when /var, where the two LDAP configuration files are stored, is full. Unfortunately ldap_cachemgr doesn’t bother to check that it can save the new config – so it tries to replace the two existing config files, fails, and ends up writing zero byte files in their place.
Luckily the fix is a simple one – copy ldap_client_cred and ldap_client_file from another working server into /var/ldap, and then clear the maintenance state on the LDAP client service so that ldap_cachemgr is started again.
bash-3.00$ svcadm clear ldap/client
bash-3.00$ svcs ldap/client
STATE STIME FMRI
online 17:17:42 svc:/network/ldap/client:default
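Since the tell-tale sign is a pair of zero-byte files, it's easy to check for the damage from a script, for example across a batch of machines. A minimal sketch follows. The empty_config_files helper is invented for this example, and it treats a missing file the same as an empty one.

```python
import os

LDAP_DIR = "/var/ldap"
CONFIG_FILES = ("ldap_client_file", "ldap_client_cred")


def empty_config_files(directory=LDAP_DIR, names=CONFIG_FILES):
    """Return the config file names that are missing or zero bytes long."""
    damaged = []
    for name in names:
        path = os.path.join(directory, name)
        try:
            if os.path.getsize(path) == 0:
                damaged.append(name)
        except OSError:
            # Missing or unreadable: report it as damaged too.
            damaged.append(name)
    return damaged


if __name__ == "__main__":
    bad = empty_config_files()
    if bad:
        print("LDAP client config needs restoring:", ", ".join(bad))
```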
The workaround is to make sure that the /var partition never fills up. A /var that is 100% full is bad for a number of reasons, so you need monitoring in place that raises an alert before this happens.
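As a starting point for that kind of alerting, here's a small sketch using Python's shutil.disk_usage. The function names and the 90% threshold are assumptions made for this example, and the figure can differ a little from what df reports because of reserved blocks. Wire the result into whatever monitoring you already run.

```python
import shutil


def percent_used(path="/var"):
    """Return how full the filesystem holding path is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def needs_alert(path="/var", threshold=90.0):
    """True when the filesystem is at or above the threshold percentage."""
    return percent_used(path) >= threshold


if __name__ == "__main__":
    if needs_alert("/var"):
        print(f"WARNING: /var is {percent_used('/var'):.1f}% full")
```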
The bug in ldap_cachemgr is being tracked with SunSolve Bug ID 6495683 – “LDAP client files & cred files are deleted when /var is full”
The ldap_cachemgr can also be queried to find out which server it’s bound to – and also when it will next be refreshing the LDAP client configuration. Pass it the -g option:
bash-3.00$ /usr/lib/ldap/ldap_cachemgr -g
cachemgr configuration:
server debug level 0
server log file "/var/ldap/cachemgr.log"
number of calls to ldapcachemgr 30
cachemgr cache data statistics:
Configuration refresh information:
Previous refresh time: 2009/06/03 05:17:42
Next refresh time: 2009/06/03 17:17:42
Server information:
Previous refresh time: 2009/06/03 09:57:42
Next refresh time: 2009/06/03 11:17:42
server: 192.168.13.101, status: UP
Cache data information:
Maximum cache entries: 256
Number of cache entries: 0
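If you want to keep an eye on the refresh schedule from a script, the "Next refresh time" lines are simple to pick out. The sketch below is a hypothetical helper, not part of any Solaris tooling, and it assumes the output keeps the layout shown above, with each block introduced by a heading line ending in a colon.

```python
from datetime import datetime

# A cut-down copy of the output shown above.
SAMPLE = """\
Configuration refresh information:
Previous refresh time: 2009/06/03 05:17:42
Next refresh time: 2009/06/03 17:17:42
Server information:
Previous refresh time: 2009/06/03 09:57:42
Next refresh time: 2009/06/03 11:17:42
server: 192.168.13.101, status: UP
"""


def next_refresh_times(output):
    """Map each section heading to its 'Next refresh time' as a datetime."""
    times = {}
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.endswith(":"):
            # A heading such as "Server information:" starts a new section.
            section = line[:-1]
        elif line.startswith("Next refresh time:"):
            stamp = line.split(":", 1)[1].strip()
            times[section] = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
    return times


if __name__ == "__main__":
    for section, when in next_refresh_times(SAMPLE).items():
        print(f"{section}: next refresh at {when}")
```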