UNIX Consulting and Expertise
Golden Apple Enterprises Ltd. » Posts in 'Solaris' category

OpenSolaris – turmoil in the community

The continued silence from Oracle is causing a bit of a stir in the OpenSolaris community. The OGB (the governing board for the OpenSolaris community) has given Oracle an ultimatum – appoint a liaison to the community by August 16th, or the OGB will dissolve and dump things back in Oracle’s lap.

Peter Tribble has a good take from the OGB’s point of view here, and Ben Rockwood shares his frustrations here.

In the meantime, the Nexenta guys (who count a number of excellent ex-Sun Solaris chaps amongst their number) have said to sit tight and wait for some news. Out of all the community distributions, Nexenta seem to have the talent and business plan to push forward a solid product built around Oracle’s sources.

As well as checking out Nexenta Core, I’d recommend keeping an eye on Alasdair Lumsden’s efforts to get a community OpenSolaris distribution up and running.

The most notable silence so far on the OpenSolaris lists has been from Joyent – they’re heavy users of OpenSolaris, and it’s pretty key to their business. Are they rolling their own custom distribution internally?

Oracle’s attitude to user groups, smaller Sun partners, and communities around products like OpenSolaris and Lustre has been appalling. Lack of communication and transparency is the least of the problems.

Yes, Sun was a big company, and yes, integration of a bottom-up culture like Sun’s into a top-down culture like Oracle was always going to be painful. But it’s been a year since Oracle bought Sun, and it’s not like they didn’t know what they were getting.

I’m sitting tight for Oracle OpenWorld in September, because there will be a slew of relevant announcements then. Yes, the continued silence from Oracle is pretty poor – but it’s the way they run things, and hopefully post OpenWorld we’ll be seeing some changes in the way Oracle operates.

Installing mod_evasive with Sun’s Webstack

I’ve been running Webstack builds on some of my servers for a while now, and have been pretty happy with the performance and the ease of configuration. One of my webhosts deals with some pretty high traffic, and odds are that such a visible machine will sooner or later come under a DoS attack.

mod_evasive is an Apache module specifically designed to deal with this. From the author’s site:

mod_evasive is an evasive maneuvers module for Apache to provide evasive action in the event of an HTTP DoS or DDoS attack or brute force attack. It is also designed to be a detection and network management tool, and can be easily configured to talk to ipchains, firewalls, routers, and etcetera. mod_evasive presently reports abuses via email and syslog facilities.

So this is how you go about installing mod_evasive when using Sun’s Webstack build of Apache. Apache’s extension tool (apxs) makes this a quick and simple task, but bear in mind that you will need the Sun Studio compiler installed on your build box. Because you’re not throwing this together on a live webserver, right?

Just to provide the numbers for the build environment I’ve used in this example – I’ve got Sun Studio 12 Update 1 installed, and the box is running Solaris 10 10/09 with Webstack 1.5, which gives me Apache 2.2.11. However there’s nothing too version-specific in any of this, and the process should be pretty much the same for different versions of Webstack and Solaris 10.

First of all, head on over to Jonathan Zdziarski’s site to download the latest version (1.10.1 as of writing this).

bash-3.00# wget http://www.zdziarski.com/blog/wp-content/uploads/2010/02/mod_evasive_1.10.1.tar.gz
--09:29:02--  http://www.zdziarski.com/blog/wp-content/uploads/2010/02/mod_evasive_1.10.1.tar.gz
           => `mod_evasive_1.10.1.tar.gz'
Resolving www.zdziarski.com... 209.51.159.242
Connecting to www.zdziarski.com|209.51.159.242|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20,454 (20K) [application/x-tar]

100%[====================================>] 20,454        62.37K/s            

09:29:03 (62.25 KB/s) - `mod_evasive_1.10.1.tar.gz' saved [20454/20454]

Then uncompress the archive and extract the files:

bash-3.00# gzcat mod_evasive_1.10.1.tar.gz | tar -xvf -
x mod_evasive, 0 bytes, 0 tape blocks
x mod_evasive/.cvsignore, 26 bytes, 1 tape blocks
x mod_evasive/LICENSE, 18103 bytes, 36 tape blocks
x mod_evasive/Makefile.tmpl, 470 bytes, 1 tape blocks
x mod_evasive/README, 14269 bytes, 28 tape blocks
x mod_evasive/mod_evasive.c, 19395 bytes, 38 tape blocks
x mod_evasive/mod_evasive20.c, 18242 bytes, 36 tape blocks
x mod_evasive/mod_evasiveNSAPI.c, 15621 bytes, 31 tape blocks
x mod_evasive/test.pl, 406 bytes, 1 tape blocks
x mod_evasive/CHANGELOG, 1373 bytes, 3 tape blocks

With Webstack, apxs can be found at /opt/webstack/apache2/2.2/bin/apxs.

Simply call apxs and get it to build the Apache 2.0 version of the mod_evasive module:

bash-3.00# /opt/webstack/apache2/2.2/bin/apxs -cia mod_evasive20.c

Important point here – if you expect this to work, you’ll need at least the following setup:

bash-3.00# export PATH=/usr/ccs/bin:/opt/sunstudio12.1/bin:$PATH

apxs will run off, compile the module, and copy everything into place, and then the final message it gives you is this:

[activating module `evasive20' in /etc/opt/webstack/apache2/2.2/conf.d/modules-32.load]

And sure enough, we’ve now got:

bash-3.00# grep evasive /etc/opt/webstack/apache2/2.2/conf.d/modules-32.load
LoadModule evasive20_module   /var/opt/webstack/apache2/2.2/libexec/mod_evasive20.so

Looking good so far, but we have a final chunk of configuration to put into place. mod_evasive needs a few tunables added to control how it responds to traffic. These are some sensible defaults which I’d recommend trying out initially:

<IfModule mod_evasive20.c>
    DOSHashTableSize    3097
    DOSPageCount        2
    DOSSiteCount        50
    DOSPageInterval     1
    DOSSiteInterval     1
    DOSBlockingPeriod   10
</IfModule>

I highly recommend reading through the README that came with the source, and then keeping a sharp eye on what your webserver does, to see if you need to tweak any defaults. I’d also suggest adding the email alerting option inside the IfModule configuration:

DOSEmailNotify    [email protected]
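To get a feel for how those tunables interact, here’s a toy simulation of the page-rate rule – this is not mod_evasive’s actual code, just a sketch of the logic: more than DOSPageCount (2) requests for the same URI from one client within DOSPageInterval (1 second) flags that client for blocking. The log format and file names are made up for the example:

```shell
# Hypothetical access log: epoch second, client IP, URI
cat > /tmp/requests.txt <<'EOF'
100 10.0.0.5 /index.html
100 10.0.0.5 /index.html
100 10.0.0.5 /index.html
101 10.0.0.9 /index.html
EOF

# Count hits per client+URI inside a 1-second window (DOSPageInterval);
# more than 2 hits (DOSPageCount) flags the client for blocking
awk '
{
    key = $2 "|" $3
    if ($1 - start[key] < 1) { count[key]++ }
    else { start[key] = $1; count[key] = 1 }
    if (count[key] > 2) blocked[$2] = 1
}
END { for (ip in blocked) print ip " would get a 403" }
' /tmp/requests.txt
# -> 10.0.0.5 would get a 403
```

In the real module, a blocked client then gets 403s for DOSBlockingPeriod seconds, with the timer resetting on every further request it makes.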

Now you just need to restart Apache:

bash-3.00# svcadm restart sun-apache22
bash-3.00# svcs sun-apache22
STATE          STIME    FMRI
online          9:56:05 svc:/network/http:sun-apache22

mod_evasive comes with a test script – test.pl – and I’d recommend running that in your test/build environment, to check that everything works as it should.

Hopefully this has shown how easy it is to build mod_evasive DoS protection into Sun’s Webstack build of Apache running on Solaris 10.

Growing swap on a ZFS filesystem

Recently I had to tackle a badly installed Solaris machine which hadn’t been configured with enough swap space. Luckily it had been built with a ZFS root filesystem, which made dealing with this a lot less painful.

First of all we need to get the details of our current swap setup:

bash-3.00# swap -l
swapfile             dev  swaplo blocks   free
/dev/zvol/dsk/rpool/swap 256,2      16 4194288 4194288

Next step is to increase the size of the ZFS volume backing swap under the root pool (here the default, rpool).

bash-3.00# zfs set volsize=4G rpool/swap

Once the volume has been grown, we need to actually add the new space as swap. The normal swap command will do this – we just need to make sure we’re pointing it at the correct ZFS device:

bash-3.00# env NOINUSE_CHECK=1 swap -a /dev/zvol/dsk/rpool/swap $((8+4194288))
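The numbers are worth unpacking: swap -l reports in 512-byte blocks, so the original 4194288-block region is the first 2GB of the (now 4GB) volume, and the new region has to start beyond it – the kernel rounds the starting offset up to page alignment for you. A quick sanity check in the shell (plain arithmetic, nothing Solaris-specific):

```shell
# Existing swap region, as reported by `swap -l`:
# swaplo 16, length 4194288, both in 512-byte blocks
blocks=4194288
swaplo=16

# 4194288 blocks * 512 bytes = 2047 MB -- the original 2GB region
echo "$(( blocks * 512 / 1048576 )) MB in the existing region"

# First block past the existing region -- matches the swaplo (4194304)
# that `swap -l` reports for the second region
echo "next region starts at block $(( swaplo + blocks ))"
```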

Let’s just check the status via ZFS:

bash-3.00# zfs list rpool/swap
NAME         USED  AVAIL  REFER  MOUNTPOINT
rpool/swap     4G  3.16G  2.03G  -

And finally we can see the new swap space we’ve just added:

bash-3.00# swap -l
swapfile             dev  swaplo blocks   free
/dev/zvol/dsk/rpool/swap 256,2      16 4194288 4194288
/dev/zvol/dsk/rpool/swap 256,2  4194304 4194304 4194304

A simple handful of commands, and no downtime – adding extra swap space using ZFS on Solaris is pretty painless. In another post I’ll explore how to grow ZFS filesystems like /var.

Installing OpenSolaris with the Automated Installer

One of the new features of OpenSolaris is AI – the Automated Installer. If you were hoping to use your existing Jumpstart setup to install OpenSolaris over the network, get ready for some disappointment – it won’t work.

AI replaces Jumpstart for network installs. If you’re deploying OpenSolaris from scratch, this is fine – build one machine manually, set it up as an AI server, and roll out the rest. If you’ve got an existing Solaris infrastructure, however, this becomes much more of a pain. An additional issue to take into account is that AI can only deploy OpenSolaris to SPARC machines which support WANBoot – so you’ll need to check your OBP versions.

This is the current matrix of what can be installed and how:

OS            Install methods          OS that can be installed
------------  -----------------------  ------------------------
Solaris       Jumpstart                Solaris SPARC
              Jumpstart + JET          Solaris x86

OpenSolaris   AI                       Solaris SPARC
              Jumpstart                Solaris x86
              Jumpstart + JET (4.7)    OpenSolaris SPARC
                                       OpenSolaris x86

I’m sure lots of people are in no hurry to migrate their Jumpstart server to OpenSolaris – especially as it’s probably a SPARC box.

The big advantage of AI is that it’s very easy to get going. I’m going to work through deploying AI to demonstrate how simple it is. In this environment, the OpenSolaris machine that will be used as an AI server is also my workstation. I’ve got a Netra T1 running a complex Jumpstart + JET setup with lots of customisations – I don’t want to replace that, but I do want to use its DHCP server.

AI is managed via the installadm tool. It’s probably not installed on your OpenSolaris machine by default, so you’ll need to add it:

pfexec pkg install SUNWinstalladm-tools

Once you’ve got the tools in place, you need to set up the install server. Download an OpenSolaris image – but be careful! The Live CD ISO cannot be used to set up an AI server; you have to download the AI version of the ISO.

Head on over to http://hub.opensolaris.org/bin/view/Main/downloads to grab the relevant ISO.

Once you’ve got the ISO, you can set up the install server. It works based on services – each OS release for each platform is treated as a different service. Then you add clients, and tell each client which service it will use to boot from.

I’m going to be sticking AI under /export – the traditional place in Solaris for shared filesystems. I just want to create one install service for OpenSolaris 06/09 x86, which I’ll call 0609x86.

The full command line is:

installadm create-service -n (service_name) \
	-s (source_AI_ISO) (AI_service_data_directory)

Here’s the full command line along with the output:

root@grond:/export# /usr/sbin/installadm create-service -n 0609x86 \
	-s /export/torrents/osol-0906-ai-x86.iso \ 
	/export/aiserver/osol-0906-ai-x86
Setting up the target image at /export/aiserver/osol-0906-ai-x86 ...
Registering the service 0609x86._OSInstall._tcp.local

Detected that DHCP is not set up on this server.
If not already configured, please create a DHCP macro
named dhcp_macro_0609x86 with:
   Boot server IP (BootSrvA) : 192.168.13.100
   Boot file      (BootFile) : 0609x86
   GRUB Menu      (GrubMenu) : menu.lst.0609x86
If you are running Sun's DHCP server, use the following
command to add the DHCP macro, dhcp_macro_0609x86:
   /usr/sbin/dhtadm -g -A -m dhcp_macro_0609x86 -d :BootSrvA=192.168.13.100: \
	BootFile=0609x86:GrubMenu=menu.lst.0609x86:

Additionally, if the site specific symbol GrubMenu
is not present, please add it as follows:
   /usr/sbin/dhtadm -g -A -s GrubMenu -d Site,150,ASCII,1,0

Note: Be sure to assign client IP address(es) if needed
(e.g., if running Sun's DHCP server, run pntadm(1M)).
Service discovery fallback mechanism set up

Helpfully, installadm tells us what commands to run on our DHCP server. First we’ll need to add the GrubMenu symbol (it won’t exist by default) and then we can add in the DHCP macro for the service. Just copy and paste the two commands on your Jumpstart server.

With that out of the way, we can now setup a client. In this case, I have a Sun v20z with a MAC address of 00:09:3d:12:ff:80 on bge0.

We need to run installadm to create the client, giving it the MAC address and telling it which install service to use. The command line is:

installadm create-client -e (MAC_address) -n (AI_service_name_to_use) \
	-t (AI_service_data_directory)

Here’s the full command line with the output:

root@grond:/export# /usr/sbin/installadm create-client \
	-e 00:09:3d:12:ff:80 -n 0609x86 \
	-t /export/aiserver/osol-0906-ai-x86
Setting up X86 client...
Service discovery fallback mechanism set up

Detected that DHCP is not set up on this server.
If not already configured, please create a DHCP macro
named 0100093D12FF80 with:
   Boot server IP (BootSrvA) : 192.168.13.100
   Boot file      (BootFile) : 0100093D12FF80
If you are running Sun's DHCP server, use the following
command to add the DHCP macro, 0100093D12FF80:
   /usr/sbin/dhtadm -g -A -m 0100093D12FF80 -d :BootSrvA=192.168.13.100: \
	BootFile=0100093D12FF80:GrubMenu=menu.lst.0100093D12FF80:

Note: Be sure to assign client IP address(es) if needed
(e.g., if running Sun's DHCP server, run pntadm(1M)).

Once again installadm will helpfully tell us what commands we need to run on our DHCP server to add the macros for this client.
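If you’re scripting this rather than copy-pasting, the macro name installadm asks for is derivable: it’s the DHCP client identifier, which for Ethernet is the hardware type 01 followed by the MAC address with the colons stripped and the hex digits uppercased. A quick sketch:

```shell
# Build the DHCP client ID / AI macro name from a MAC address:
# "01" (ARP hardware type for Ethernet) + MAC, colons removed, uppercased
mac="00:09:3d:12:ff:80"
client_id="01$(printf '%s' "$mac" | tr -d ':' | tr '[:lower:]' '[:upper:]')"
echo "$client_id"
# -> 0100093D12FF80
```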

Over on the Jumpstart server, here’s the output of dhtadm showing us the configured macros on the Sun DHCP server (with some line breaks to make it a bit more readable):

bash-3.00# dhtadm -P
Name                    Type            Value
==================================================
dhcp_macro_0609x86      Macro           :BootSrvA=192.168.13.100: \
	BootFile=0609x86:GrubMenu=menu.lst.0609x86:
0100093D12FF80          Macro           :BootSrvA=192.168.13.100: \
	BootFile=0100093D12FF80:GrubMenu=menu.lst.0100093D12FF80:
v20z                    Macro           :BootFile=0100093D12FF80: \
	BootSrvA=192.168.13.101:
192.168.13.0            
GrubMenu                Symbol          Site,150,ASCII,1,0

I’ve removed all of the other stuff that Jumpstart puts in there to clearly show the AI macros that have been added.

At this stage, we can just SSH into the V20z’s ILOM, power on the chassis, and go into the BIOS to change the boot order. PXEboot will then send out a DHCP request, and we’ll then see the OpenSolaris grub menu.

From that point onwards it’s a hands-off install. For more details on the entire process, have a read through the OpenSolaris Automated Installer Guide.

Having played around with AI for a bit now, I’m not that impressed to be honest. I can see that it could be easier for new users who’ve never touched Solaris before – as you can see, it doesn’t take much to setup an install server and configure clients.

However, there’s a big installed base of Solaris users out there, and they’ve all got Jumpstart. AI lacks the features, flexibility and power of Jumpstart – it’s not ready as a replacement just yet. So being forced to use it to deploy OpenSolaris just means many existing Solaris shops won’t bother – integration with Jumpstart for OpenSolaris could well speed up its acceptance and adoption.

With so many Solaris users out there I think that OpenSolaris needs a lot of work to become a credible upgrade or migration path. Both AI and the new IPS packaging system show promise, but they’re a long way from being usable replacements to existing Solaris technologies.

Playing with Solaris processor sets

The idea behind processor sets has been around for a decade or so in the HPC arena. You’ve got certain jobs that require a certain amount of CPU resources, or a certain IO profile, so you want to dedicate some CPUs just to them. Solaris has had processor controls built in since the dark days of 2.6.

*Note:* I’m going to be freely talking about CPUs as the processing unit. This is all on T2ks and so I know that they’re not *real* CPUs – call them thread processing units or something, but for simplicity this document will just call them CPUs and be done with it.

The actual management of processor sets is very straightforward, and I’ll be playing about with them on one of my favourite bits of kit – the Sun T2000.

First of all we use the psrinfo command to view the status of our processors:

bash-3.00# psrinfo
0       on-line   since 11/21/2006 20:24:57
1       on-line   since 11/21/2006 20:24:58
2       on-line   since 11/21/2006 20:24:58
3       on-line   since 11/21/2006 20:24:58
4       on-line   since 11/21/2006 20:24:58
5       on-line   since 11/21/2006 20:24:58
6       on-line   since 11/21/2006 20:24:58
7       on-line   since 11/21/2006 20:24:58
8       on-line   since 11/21/2006 20:24:58
9       on-line   since 11/21/2006 20:24:58
10      on-line   since 11/21/2006 20:24:58
11      on-line   since 11/21/2006 20:24:58
12      on-line   since 11/21/2006 20:24:58
13      on-line   since 11/21/2006 20:24:58
14      on-line   since 11/21/2006 20:24:58
15      on-line   since 11/21/2006 20:24:58
16      on-line   since 11/21/2006 20:24:58
17      on-line   since 11/21/2006 20:24:58
18      on-line   since 11/21/2006 20:24:58
19      on-line   since 11/21/2006 20:24:58
20      on-line   since 11/21/2006 20:24:58
21      on-line   since 11/21/2006 20:24:58
22      on-line   since 11/21/2006 20:24:58
23      on-line   since 11/21/2006 20:24:58
24      on-line   since 11/21/2006 20:24:58
25      on-line   since 11/21/2006 20:24:58
26      on-line   since 11/21/2006 20:24:58
27      on-line   since 11/21/2006 20:24:58
28      on-line   since 11/21/2006 20:24:58
29      on-line   since 11/21/2006 20:24:58
30      on-line   since 11/21/2006 20:24:58
31      on-line   since 11/21/2006 20:24:58

Let’s do a quick network performance test with iperf to see what sort of throughput we can get when all processing units are able to process network IO:

bash-3.00# ./iperf --client np1unx0006 --time 60 --dualtest
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to np1unx0006, TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[  5] local 192.168.105.62 port 37438 connected with 192.168.105.59 port 5001
[  4] local 192.168.105.62 port 5001 connected with 192.168.105.59 port 63459
[  5]  0.0-60.0 sec  3.77 GBytes    540 Mbits/sec
[  4]  0.0-60.0 sec  3.62 GBytes    518 Mbits/sec

At the same time, let’s have a look with mpstat to get an idea of what the processors are dealing with while this is going on.

The important columns here are intr, showing the number of interrupts each CPU is handling. We also need to keep an eye on the number of system calls each CPU is fielding (syscl), and on the context switches and involuntary context switches (csw and icsw respectively), to make sure jobs are completing before the scheduler kicks them off the CPU.

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   96   0  248  3481    0 7126    4   34  439    0  6041    2  25   0  74
  1  122   0  171  1332    0 2796    2   24  340    0  2369    2  14   0  85
  2   79   0  216   646    0 1472    0   18  226    0   202    0   5   0  95
  3   30   0  143   356    0  829    0   16  137    0    23    0   2   0  98
  4   47   0  260   618    0 1514    0   18  163    0    74    0   3   0  97
  5   56   0  257   714    0 1662    1   19  234    0   311    1   6   0  94
  6   67   0  466  1085    0 2593    1   19  588    0  1234    0  17   0  82
  7   26   0  268   894    0 2031    0   18  202    0   136    0   4   0  96
  8  241   0  341   993    0 2286    0   22  258    0   358    1   7   0  91
  9  190   0  292  1431    0 3102    1   21  257    0  1551    1   9   0  90
 10  114   0  336  1155    0 2580    0   18  286    0   429    0   6   0  94
 11   28   0  283   837    0 1883    1   18  551    0  1283    1  15   0  84
 12    0   0    1     2    0    3    0    1    0    0     0    0   0   0 100
 13    0   0    1     3    0    4    0    1    1    0     1    0   0   0 100
 14    3   0    2     5    0    9    0    1    4    0     3    0   0   0 100
 15    0   0    0     9    0    0    8    0    4    0 534955   75  25   0   0
 16   64   0  423  1299  110 2418    0   18  286    0    59    0   5   0  95
 17   89   0  454  1473    0 3233    0   19  319    0   793    1   7   0  92
 18   46   0  397   960    1 2217    0   18  290    0    39    0   4   0  96
 19   79   0  321  1048    2 2340    2   19  494    0  2073    2  15   0  83
 20   79   0  205   852    1 1773    1   21  313    0  1493    1  14   0  85
 21   27   0 19965 41259 41036  635   15   28 2862    0   415    0  47   0  53
 22   65   0  129  1069    0 2274    1   21  139    0  1053    1   7   0  92
 23   62   0  134   681    0 1446    1   20  370    0   931    1  14   0  85
 24  115   0  260   799    0 1986    0   22  212    0   313    0   4   0  95
 25  113   0  273   962    1 2225    1   22  266    0   684    1   7   0  93
 26   73   0  312  1241    0 2862    0   23  271    0   663    0   6   0  94
 27  115   0  270   862    0 2017    0   22  201    0   209    1   5   0  95
 28  179   0  225   689    0 1548    0   17  213    0   302    1   5   0  94
 29   42   0  224   656    0 1507    0   15  163    0   134    0   3   0  97
 30   40   0  298   774    0 1821    1   14  459    0  1316    1  17   0  83
 31   27   0  227   649    0 1544    1   15  644    0  1418    1  18   0  82

From this we can see we’re getting fairly decent throughput over GigE, and that the interrupts are spread across all the CPUs.
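Rather than scanning the whole table by eye, a quick awk over a saved mpstat snapshot will pull out the busy CPUs – column 5 is intr. Here’s a sketch against a trimmed, made-up snapshot file:

```shell
# A few rows saved from mpstat (made-up snapshot for the example)
cat > /tmp/mpstat.txt <<'EOF'
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   96   0  248  3481    0 7126    4   34  439    0  6041    2  25   0  74
 12    0   0    1     2    0    3    0    1    0    0     0    0   0   0 100
 21   27   0 19965 41259 41036  635   15   28 2862    0   415    0  47   0  53
EOF

# Skip the header line, then print any CPU fielding over 1000 interrupts
awk 'NR > 1 && $5 > 1000 { print "CPU " $1 ": " $5 " intr" }' /tmp/mpstat.txt
# -> CPU 0: 3481 intr
# -> CPU 21: 41259 intr
```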

Now let’s create a processor set, and stick half our CPUs in it.

The command is psrset with the -c option to create a set. As this is the first processor set it will be processor set 1 – the next would be 2, etc. etc.

Remember we can get the number of our CPUs from the psrinfo command.

bash-3.00# psrset -c 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
created processor set 1
processor 0: was not assigned, now 1
processor 1: was not assigned, now 1
processor 2: was not assigned, now 1
processor 3: was not assigned, now 1
processor 4: was not assigned, now 1
processor 5: was not assigned, now 1
processor 6: was not assigned, now 1
processor 7: was not assigned, now 1
processor 8: was not assigned, now 1
processor 9: was not assigned, now 1
processor 10: was not assigned, now 1
processor 11: was not assigned, now 1
processor 12: was not assigned, now 1
processor 13: was not assigned, now 1
processor 14: was not assigned, now 1
processor 15: was not assigned, now 1

Now that we’ve assigned half our CPUs to processor set 1, we want to disable interrupt handling for them. We could use the psradm command to do it on a per CPU basis, but it’s much easier to just apply the setting to the entire processor set.

bash-3.00# psrset -f 1

The -f option disables interrupt handling, and the 1 is the processor set we want to apply this to.

We can check the effect by calling psrinfo again:

bash-3.00# psrinfo
0       no-intr   since 12/19/2006 18:15:15
1       no-intr   since 12/19/2006 18:15:15
2       no-intr   since 12/19/2006 18:15:15
3       no-intr   since 12/19/2006 18:15:15
4       no-intr   since 12/19/2006 18:15:15
5       no-intr   since 12/19/2006 18:15:15
6       no-intr   since 12/19/2006 18:15:15
7       no-intr   since 12/19/2006 18:15:15
8       no-intr   since 12/19/2006 18:15:15
9       no-intr   since 12/19/2006 18:15:15
10      no-intr   since 12/19/2006 18:15:15
11      no-intr   since 12/19/2006 18:15:15
12      no-intr   since 12/19/2006 18:15:15
13      no-intr   since 12/19/2006 18:15:15
14      no-intr   since 12/19/2006 18:15:15
15      no-intr   since 12/19/2006 18:15:15
16      on-line   since 11/21/2006 20:24:58
17      on-line   since 11/21/2006 20:24:58
18      on-line   since 11/21/2006 20:24:58
19      on-line   since 11/21/2006 20:24:58
20      on-line   since 11/21/2006 20:24:58
21      on-line   since 11/21/2006 20:24:58
22      on-line   since 11/21/2006 20:24:58
23      on-line   since 11/21/2006 20:24:58
24      on-line   since 11/21/2006 20:24:58
25      on-line   since 11/21/2006 20:24:58
26      on-line   since 11/21/2006 20:24:58
27      on-line   since 11/21/2006 20:24:58
28      on-line   since 11/21/2006 20:24:58
29      on-line   since 11/21/2006 20:24:58
30      on-line   since 11/21/2006 20:24:58
31      on-line   since 11/21/2006 20:24:58

Rock on! psrinfo clearly shows that half our CPUs will no longer handle interrupts. Let’s kick off another iperf throughput test and see what happens:

bash-3.00# ./iperf --client np1unx0006 --time 60 --dualtest
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to np1unx0006, TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[  4] local 192.168.105.62 port 37419 connected with 192.168.105.59 port 5001
[  5] local 192.168.105.62 port 5001 connected with 192.168.105.59 port 63457
[  4]  0.0-60.0 sec  3.36 GBytes    481 Mbits/sec
[  5]  0.0-60.0 sec  3.05 GBytes    436 Mbits/sec

Looking at mpstat we can clearly see the effects:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  1    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  2    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  3    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  4    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  5    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  6    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  7    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  8    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  9    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 10    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 11    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 12    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 13    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 14    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 15    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 16  336   0  276  1854  112 3372   12   49  844    0 135008   24  37   0  39
 17  261   0  115  2236    1 4594    6   51 1050    0 27470    7  37   0  56
 18  104   0  105  1699    3 3506    9   36 1071    0 103500   17  38   0  44
 19  106   0  148   855    1 1800    2   28  286    0 10881    2   8   0  89
 20  290   0 19102 42492 42200  838   28   42 2676    0 74629   17  60   0  23
 21  256   0  801  1952    0 4397    5   39 1272    0  2196    2  29   0  68
 22  209   0  475  1191    0 2663    2   38  552    0   776    1  12   0  87
 23  260   0  500  1134    4 2540    2   38  597    0 13071    4  13   0  84
 24  455   0  752  2213    1 5038    4   41  916    0 10316    4  20   0  77
 25  500   0  803  2485    0 5499    4   45 1352    0 17171    5  31   0  64
 26  654   0  683  1773    0 4009    5   45  933    0  2119    8  19   0  73
 27  503   0  516  1812    0 3952    5   45  748    0 21682    6  16   0  79
 28  552   0  860  2332    0 5093    7   40 1065    0 12217   16  21   0  63
 29  480   0  688  2292    0 4996    4   47  924    0  1395    3  17   0  80
 30  663   0  476  1553    0 3357    5   45  658    0  2680    9  16   0  75
 31  485   0  445  1520    0 3297    4   47  716    0  1167    2  16   0  82

We can see the non-interrupt handling CPUs in processor set 1 are totally idle – they’re just sitting there, twiddling their thumbs, and laughing at the other 16 CPUs working their socks off.

Involuntary context switches aren’t causing us an issue, so we can see that even with the reduced number of CPUs handling the interrupts, they’ve still managed to deal with the load.

Now let’s see what happens when we execute the single-thread iperf process inside processor set 1. We can control this by using the psrset command to launch our app.

bash-3.00# psrset -e 1 ./iperf --client np1unx0006 --time 60 --dualtest
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to np1unx0006, TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[  4] local 192.168.105.62 port 37419 connected with 192.168.105.59 port 5001
[  5] local 192.168.105.62 port 5001 connected with 192.168.105.59 port 63457
[  4]  0.0-60.0 sec  3.36 GBytes    481 Mbits/sec
[  5]  0.0-60.0 sec  3.05 GBytes    436 Mbits/sec

And mpstat should give us an idea of what’s happening:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0  7751    0 16048    3    0  179    0 15235    3  42   0  55
  1    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  2    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  3    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  4    0   0    0   201    0  403    6    0 2478    0  8254    3  81   0  16
  5    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  6    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  7    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  8    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
  9    0   0    1     3    0    4    0    0    2    0     0    0   0   0 100
 10    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 11    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 12    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 13    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 14    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
 15    0   0    0     8    0    0    7    0    4    0 532794   75  25   0   0
 16  313   0  942  1346  114 2533    1   20  459    0   419    1  13   0  86
 17  306   0  450   687    1 1529    0   13  297    0   558    1  13   0  86
 18  399   0  467   653    3 1442    1   14  255    0  3559   13  13   0  74
 19  221   0  326   509    1 1164    0   11  196    0   401    1   7   0  92
 20  646   0 10825 47201 47171   90    7   18 2405    0  1210    4  55   0  40
 21  156   0  673   757    0 1769    0   15  346    0   201    1   9   0  91
 22  220   0  959  1055    0 2467    0   15  488    0   397    2  12   0  86
 23  204   0  844   791    2 1844    0   14  443    0   223    1  12   0  88
 24  341   0  718  1033    1 2439    1   13  401    0  2571   10  14   0  76
 25  205   0  570   804    0 1945    0   13  314    0   376    1   9   0  90
 26  262   0  369   584    0 1379    0   12  204    0   422    1   7   0  92
 27  199   0  348   519    0 1226    0   13  180    0   533    2   6   0  92
 28  356   0  434   726    0 1730    0   17  247    0   515    2   9   0  89
 29  393   0  267   428    0 1043    1   18  197    0   999    3  11   0  86
 30  367   0  491   812    0 1829    0   17  298    0   449    1  10   0  89
 31  302   0  424   675    0 1538    0   15  245    0   339    1   8   0  91

Well, that’s broken things. How come the processors in the set are now handling interrupts?

It looks like executing the binary inside the processor set still generates interrupts – but these are unlikely to be network I/O. Check out the number of syscalls being generated! It’s likely an artefact of my poor choice of application – iperf generates a huge amount of interrupts and can really cane your ethernet interfaces.

We could use dtrace to have a real poke around, but I think that should be the topic for another day.

Now we’ve finished playing around, we need to re-enable interrupt handling on those CPUs. As the -f flag to psrset disabled interrupt handling, -n is the option we need to re-enable it on a processor set.

bash-3.00# psrset -n 1

Now the CPUs are handling interrupts again, we need to delete the processor set. We do this by passing the psrset command the -d option, and giving it the processor set number:

bash-3.00# psrset -d 1
removed processor set 1

Finally let’s run psrinfo and double check the state of our CPUs:

bash-3.00# psrinfo
0       on-line   since 12/19/2006 18:23:42
1       on-line   since 12/19/2006 18:23:42
2       on-line   since 12/19/2006 18:23:42
3       on-line   since 12/19/2006 18:23:42
4       on-line   since 12/19/2006 18:23:42
5       on-line   since 12/19/2006 18:23:42
6       on-line   since 12/19/2006 18:23:42
7       on-line   since 12/19/2006 18:23:42
8       on-line   since 12/19/2006 18:23:42
9       on-line   since 12/19/2006 18:23:42
10      on-line   since 12/19/2006 18:23:42
11      on-line   since 12/19/2006 18:23:42
12      on-line   since 12/19/2006 18:23:42
13      on-line   since 12/19/2006 18:23:42
14      on-line   since 12/19/2006 18:23:42
15      on-line   since 12/19/2006 18:23:42
16      on-line   since 11/21/2006 20:24:58
17      on-line   since 11/21/2006 20:24:58
18      on-line   since 11/21/2006 20:24:58
19      on-line   since 11/21/2006 20:24:58
20      on-line   since 11/21/2006 20:24:58
21      on-line   since 11/21/2006 20:24:58
22      on-line   since 11/21/2006 20:24:58
23      on-line   since 11/21/2006 20:24:58
24      on-line   since 11/21/2006 20:24:58
25      on-line   since 11/21/2006 20:24:58
26      on-line   since 11/21/2006 20:24:58
27      on-line   since 11/21/2006 20:24:58
28      on-line   since 11/21/2006 20:24:58
29      on-line   since 11/21/2006 20:24:58
30      on-line   since 11/21/2006 20:24:58
31      on-line   since 11/21/2006 20:24:58

Solaris processor sets are the easiest to use of all the resource controls built into the OS. We can peg things like zones, individual applications, or even specific processes, to their own processor sets to control and manage resource usage. This gives us some really fine grained control over how the system is used, and with a machine like the T2000 it allows us to really scale performance.
