A squid-2.5.STABLE7 box running FreeBSD 4.8p, with the following hardware (should Hyper-Threading be disabled?):
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (3060.23-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,
HTT,TM,PBE>
Hyperthreading: 2 logical CPUs
real memory = 1073573888 (1048412K bytes)
avail memory = 1041555456 (1017144K bytes)
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.4.10> port
0xdcc0-0xdcff mem 0xfcf20000-0xfcf3ffff irq 7 at device 4.0 on pci1
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
options=3<rxcsum,txcsum>
inet xx.xx.xx.xx netmask 0xffffff00 broadcast xx.xx.xx.xx
inet 10.10.0.xxx netmask 0xffffff00 broadcast 10.10.0.255
ether 00:xx:xx:xx:xx:xx
media: Ethernet autoselect (1000baseTX <full-duplex>)
status: active
In the production environment, it handles more than 3000 requests
per second.
            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
       656     0    2086797        817     0    9504793     0
     13011     0    1849640      19003     0    7706106     0
       735     0    2450112        898     0    7379960     0
     11862     0    3460007      16745     0    7874831     0
      2436     0    3105506       2592     0    7652123     0
As you can see, throughput stays below 100 Mbit/s even though a
gigabit adapter is used. As the load increases,
client_http.hit_median_svc_time climbs higher and higher until it
becomes unacceptable:
client_http.hit_median_svc_time = 0.070143 seconds
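The output byte counters in the netstat sample can be turned into line rates with a little arithmetic. This is only a sketch: it assumes each row covers a one-second interval (the interval is not stated here), and the function name is made up for illustration. Under that assumption the busiest row works out to roughly 76 Mbit/s, which matches the "under 100 Mbit/s" observation.

```python
# Output byte counts copied from the netstat sample above.
OUTPUT_BYTES = [9504793, 7706106, 7379960, 7874831, 7652123]

# Assumption: each netstat row covers one second.
INTERVAL_SECONDS = 1.0


def to_mbit_per_s(byte_count, interval=INTERVAL_SECONDS):
    """Convert a byte count over `interval` seconds to Mbit/s (10^6 bits)."""
    return byte_count * 8 / interval / 1_000_000


rates = [to_mbit_per_s(b) for b in OUTPUT_BYTES]

for byte_count, rate in zip(OUTPUT_BYTES, rates):
    print(f"{byte_count:>8} bytes -> {rate:6.2f} Mbit/s")
print(f"peak: {max(rates):.2f} Mbit/s")
```

If the sampling interval was different, pass it as the second argument and the rates scale accordingly.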
I found that when 'netstat -an|wc -l' reports fewer than 2700 lines,
hit_median_svc_time is quite small (0.000911 s or less), but once the
count exceeds 2800, hit_median_svc_time climbs.
When hit_median_svc_time is high, the CPU is not heavily loaded:
(top result)
last pid: 40547; load averages: 0.40, 0.46, 0.43
up 1+05:55:25 19:39:42
34 processes: 1 running, 33 sleeping
CPU states: 35.0% user, 0.0% nice, 8.6% system, 7.8% interrupt, 48.6% idle
Mem: 388M Active, 313M Inact, 250M Wired, 52M Cache, 112M Buf, 1664K Free
Swap: 4096M Total, 104K Used, 4096M Free
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
39605 nobody -6 0 366M 365M biord 48:17 47.07% 47.07% squid
So I added some timing counters to the Squid sources, covering the
span from where http->start is set to httpRequestFree. I found that
the HTTP request processing time itself is quite low (less than 10%
of hit_median_svc_time), but the calls to default_read_method and
default_write_method take a lot of time. This agrees with the top
output (CPU load is not high, while system/interrupt time is somewhat
elevated).
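The measurement idea (accumulate the time spent inside the read/write calls and compare it with the whole request lifetime) can be sketched outside Squid. The following is an illustration in Python, not the actual patch to the C sources; the class name `PhaseTimer` and the phase labels are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock                 # injectable so it can be tested
        self.totals = defaultdict(float)   # phase name -> seconds

    @contextmanager
    def measure(self, phase):
        start = self.clock()
        try:
            yield
        finally:
            self.totals[phase] += self.clock() - start

    def share(self, phase, whole):
        """Fraction of `whole` time that was spent in `phase`."""
        total = self.totals[whole]
        return self.totals[phase] / total if total else 0.0


timer = PhaseTimer()
with timer.measure("request"):      # stands in for the request lifetime
    with timer.measure("io"):       # stands in for a read/write call
        time.sleep(0.01)

print(f"io share of request time: {timer.share('io', 'request'):.0%}")
```

A large "io" share combined with a mostly idle CPU would point at time spent waiting in the kernel rather than in request processing.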
netstat shows:
# netstat -an|grep FIN|wc -l
0
# netstat -an|grep WAIT|wc -l
1000
# netstat -an|wc -l
2960
About one third of all connections are in TIME_WAIT. The calls to close(fd) are fast.
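Note that 'netstat -an|wc -l' also counts header lines and non-TCP sockets, so a per-state tally of TCP rows gives a cleaner picture. A minimal sketch, assuming the usual six-column TCP row layout of 'netstat -an'; the sample text and its addresses are invented for illustration and are not taken from this box.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Tally TCP connection states from `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local addr, foreign addr, state
        if len(fields) == 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Invented sample, only to show the expected input shape.
SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  10.10.0.1.3128         10.10.0.2.40001        ESTABLISHED
tcp4       0      0  10.10.0.1.3128         10.10.0.3.40002        TIME_WAIT
tcp4       0      0  10.10.0.1.3128         10.10.0.4.40003        TIME_WAIT
tcp4       0      0  *.3128                 *.*                    LISTEN
udp4       0      0  *.3130                 *.*
"""

for state, count in count_tcp_states(SAMPLE).most_common():
    print(f"{state:<12} {count}")
```

On a live system the same function could be fed the text captured from running netstat -an, for example via the standard subprocess module.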
I have adjusted the sysctl values as follows:
kern.ipc.somaxconn=8192
kern.maxfiles=99328
kern.maxfilesperproc=98304
net.inet.ip.portrange.last=20000
net.inet.tcp.sendspace=8192
net.inet.tcp.recvspace=8192
I have disabled access logging to disk by using /dev/null as the
filename. The log lines are still generated.
So neither the CPU nor the memory is saturated. Is it possible to get
past the bottleneck of about 2700 concurrent connections (is it a
hard limit or not?), and even past the 100 Mbit/s bottleneck? Would
lowering the TIME_WAIT timeout help? Thanks.
------------------------------------
Regards,