OBSOLETE Patch-ID# 123037-01
Download this patch from My Oracle Support
Your use of the firmware, software and any other materials contained
in this update is subject to My Oracle Support Terms of Use, which
may be viewed at My Oracle Support.
For further information on patching best practices and resources, please
see My Oracle Support.
Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved.
Keywords: qmaster scheduler qmon qstat qconf memory usage root security
Synopsis: Obsoleted by: 124519-01 N1 Grid Engine 6.0: maintenance patch
Date: May/04/2006
Install Requirements: See Special Install Instructions
Solaris Release: 7 8 9 10
SunOS Release: 5.7 5.8 5.9 5.10
Unbundled Product: N1 Grid Engine
Unbundled Release: 6.0
Xref: See patch matrix below
Topic:
Relevant Architectures: sparc
Bugs fixed with this patch:
Changes incorporated in this version: 4737342 6287945 6291033 6319223 6359054 6363823 6364440 6365380 6368747 6382156 6383513 6384682 6384698 6384709 6384812 6387206 6389526 6390494 6391238 6397383 6397987 6398008 6398723 6400729 6401993 6407513 6411230 6412215
Patches accumulated and obsoleted by this patch: 118094-07 121956-01
Patches which conflict with this patch:
Patches required with this patch:
Obsoleted by: 124519-01
Files included with this patch:
<install_dir>/bin/sol-sparc/qacct
<install_dir>/bin/sol-sparc/qalter
<install_dir>/bin/sol-sparc/qconf
<install_dir>/bin/sol-sparc/qdel
<install_dir>/bin/sol-sparc/qhost
<install_dir>/bin/sol-sparc/qmake
<install_dir>/bin/sol-sparc/qmod
<install_dir>/bin/sol-sparc/qmon
<install_dir>/bin/sol-sparc/qping
<install_dir>/bin/sol-sparc/qsh
<install_dir>/bin/sol-sparc/qstat
<install_dir>/bin/sol-sparc/qsub
<install_dir>/bin/sol-sparc/qtcsh
<install_dir>/bin/sol-sparc/sge_coshepherd
<install_dir>/bin/sol-sparc/sge_execd
<install_dir>/bin/sol-sparc/sge_qmaster
<install_dir>/bin/sol-sparc/sge_schedd
<install_dir>/bin/sol-sparc/sge_shadowd
<install_dir>/bin/sol-sparc/sge_shepherd
<install_dir>/bin/sol-sparc/sgepasswd
<install_dir>/examples/jobsbin/sol-sparc/work
<install_dir>/lib/sol-sparc/libXltree.so
<install_dir>/lib/sol-sparc/libcrypto.so.0.9.7
<install_dir>/lib/sol-sparc/libdrmaa.so
<install_dir>/lib/sol-sparc/libspoolb.so
<install_dir>/lib/sol-sparc/libspoolc.so
<install_dir>/lib/sol-sparc/libssl.so.0.9.7
<install_dir>/utilbin/sol-sparc/adminrun
<install_dir>/utilbin/sol-sparc/berkeley_db_svc
<install_dir>/utilbin/sol-sparc/checkprog
<install_dir>/utilbin/sol-sparc/checkuser
<install_dir>/utilbin/sol-sparc/db_archive
<install_dir>/utilbin/sol-sparc/db_checkpoint
<install_dir>/utilbin/sol-sparc/db_deadlock
<install_dir>/utilbin/sol-sparc/db_dump
<install_dir>/utilbin/sol-sparc/db_load
<install_dir>/utilbin/sol-sparc/db_printlog
<install_dir>/utilbin/sol-sparc/db_recover
<install_dir>/utilbin/sol-sparc/db_stat
<install_dir>/utilbin/sol-sparc/db_upgrade
<install_dir>/utilbin/sol-sparc/db_verify
<install_dir>/utilbin/sol-sparc/filestat
<install_dir>/utilbin/sol-sparc/fstype
<install_dir>/utilbin/sol-sparc/gethostbyaddr
<install_dir>/utilbin/sol-sparc/gethostbyname
<install_dir>/utilbin/sol-sparc/gethostname
<install_dir>/utilbin/sol-sparc/getservbyname
<install_dir>/utilbin/sol-sparc/infotext
<install_dir>/utilbin/sol-sparc/loadcheck
<install_dir>/utilbin/sol-sparc/now
<install_dir>/utilbin/sol-sparc/openssl
<install_dir>/utilbin/sol-sparc/qrsh_starter
<install_dir>/utilbin/sol-sparc/rlogin
<install_dir>/utilbin/sol-sparc/rsh
<install_dir>/utilbin/sol-sparc/rshd
<install_dir>/utilbin/sol-sparc/sge_share_mon
<install_dir>/utilbin/sol-sparc/spooldefaults
<install_dir>/utilbin/sol-sparc/spooledit
<install_dir>/utilbin/sol-sparc/spoolinit
<install_dir>/utilbin/sol-sparc/testsuidroot
<install_dir>/utilbin/sol-sparc/uidgid
Problem Description:
6412215 encryption and/or decryption of passwords fails because crypto engine is not seeded
6411230 Job sequence number is corrupted when restarting the qmaster daemon
6407513 Scheduler hangs after a qmaster crash and restart
6401993 qstat -u <user> crashes
6400729 weak authentication and authorization in CSP mode
6398723 Tickets are not reset for running jobs after disabling the ticket policy
6398008 Off-by-one overrun in communication library
6397987 several buffer overruns
6397383 qmaster deadlock when reporting file cannot be written
6391238 qrsh does not accept -o/-e/-j
6390494 qrsh issue with interactive jobs and directory write permissions
6389526 commlib closes wrong connection on SSL error
6387206 CSP revocation lists are not supported
6384812 qstat produces non-well-formed XML output
6384709 slow scheduler performance for jobs with hard queue requests
6384698 scheduler memory use grows if pe jobs are running
6384682 "qstat -j" aborts
6383513 resource filtering in qselect broken
6368747 Job tickets are not correctly shown in qstat for non-running jobs
6365380 possible buffer overflow in sge_exec_job()
6364440 qconf -mhgrp <hostgroup> results in glibc error message and abort
6363823 qsub -w w changes -sync behavior
6319223 subordinate properties lost on qmaster restart
6291033 Unclear share calculation of running jobs
6287945 Interrupting qrsh while pending does not remove job
4737342 interactive jobs leave behind output/error files if prolog/epilog are run
The following Change Requests (CRs) are related only to Grid Engine running
under the Microsoft Windows operating system family
6382156 qloadsensor.exe consumes too much time and delivers too high load values
6359054 On hosts with 3 GB RAM, qloadsensor.exe shows only 2GB RAM
(from 121956-01)
6366691 utilbin/<arch>/rsh can be used to gain root access
(from 118094-07)
6355263 reschedule of a parallel job crashes the qmaster
6354164 drmaa does not work on hp11 platform
6354143 mutually subordinating queues suspend each other simultaneously
6353526 reprioritize field in qmon cluster config missing
6351728 installation of qmaster failed when using /etc/services
6349972 DRMAA crashes during some operations on bulk jobs
6349818 additionally started schedd/execd daemons may not stop if started while qmaster is down
6348517 job finishes although the terminate method is still running
6348516 job finish does not terminate all processes of a job
6348299 qconf -mstree aborts
6346704 qrsh -V doesn't always work.
6346696 connection to Berkeley DB RPC server can timeout
6342005 a scheduler configuration change with a sharetree can result in a usage leak
6339756 Quotes in qtask file can result in memory corruption
6338314 occasional "failed to deliver job" errors due to SIGPIPE in sge_execd
6336519 changing the cwd flag in qmon - qalter has no effect
6333467 sgemaster -migrate may not delete qmaster lock file and may break shadowd functionality
6333407 configuring the halflife_decay_list crashes the qmaster
6332877 qstat -pe filter does not work
6332876 qstat -U does not consider queue access for job and project access for queues
6329832 qconf and qmaster accept invalid settings for queue complex_values
6328703 fstype does not recognize nfs4 share in all cases
6327427 qping core dump with enabled message content dump
6322498 calendar syntax "week mon=0-21" corrupts SGE and may crash qmaster
6320869 sge_qmaster daemon is running on both the master and shadow nodes after a long network failure
6320683 Binary switch reversed in job category and can cause application to hang
6319233 Parsing of context variable options fails for values containing commas in single quotes
6319231 unable to delete a configuration of a non existing host
6319228 Backslash line continuation is broken for host groups
6318660 the system hold on an array task can vanish
6318018 shepherd doesn't handle qrlogin/qrsh jobs correctly
6317048 Memory leaks in drmaa library, japi_wait and drmaa_job2sge_job
6317028 Quotes in job category can result in memory corruption
6316995 qconf -mp prints error messages two times
6315111 doing a qalter -l rsc=val on running jobs breaks consumable debit
6313445 Qrsh tries to free invalid pointer
6307557 qhost returns wrong total_memory value on MacOSX 10.3
6306834 consumables as thresholds are not working correctly with pe jobs
6306229 wrong soft requests decision
6305095 qstat schema files are incomplete
6304490 qconf -as/-ah leads to segmentation fault
6304471 qlogin -R does not work as documented
6304466 qmaster crashes with large number of qconf -aattr calls
6303671 DRMAA can abort in the middle of a session if NIS becomes unavailable
6301047 qstat -s p doesn't show pending array tasks while there are tasks of this job running
6299982 Slow submission rate with drmaa_run_job()
6295791 qacct -h should not resolve hostnames
6294875 CSP: consolidate error output if cert CA on client and server don't match
6294052 suspend threshold is not working for calendar disabled queues
6293411 NFS write error on host <NFS server>: Permission denied.
6292926 qconf -mattr can crash qmaster
6292751 admin mail information is incorrect
6292742 tight integration - qrsh_exit_code file not written
6291023 qstat -j <name> doesn't print delimiter between jobs
6291016 qmon startup and queue add/modify warning messages
6289455 qstat -XML output does not match the schema
6288626 default PATH variable set for job insufficient for non-login shell jobs
6287955 strange reservation
6287946 qconf -[dm]attr gets confused by shortcuts
6287935 qmod -sq can kill a pe job in t state
6287865 qrsh default job names are not consistent with documented job name limitations
6287862 qhost -l for complexes is broken
6287850 Allow SIGTRAP to enable debugging
6287847 qstat -j shows wrong message for parallel jobs which can't be dispatched
6286510 delivery of queue based signals to execd repeated endlessly
6282996 use of IP address as host name disables unique hostname resolving
6275789 soft requirements on load values are ignored
6268799 confusing execd startup messages and delays in case of problems
6256590 qconf -mq disallows 2057 host-specific profiles in slots configuration
6255111 Binary jobs are problematic for starter and epilog scripts
6253860 First character is lost in quoting
6250692 accounting(5) record can't be made available immediately after job finish
6242169 Multi-threaded, multi-CPU username problems
6207868 wording with qconf -cq should be changed
6287953 repeated logging of the error message: "failed building category string for job N"
The following Change Requests (CRs) are related only to Grid Engine running
under the Microsoft Windows operating system family
6353638 default process priority on windows freezes the whole system until job is finished
6348478 install script sets rsh_daemon to /usr/sbin/in.rshd on win32-x86
6314019 qloadsensor.exe uses up more and more handles
6279523 qlogin on windows does not work!
(from 118094-06)
6299939 distribution should contain all Berkeley DB utilities
6299351 qrsh fails when execd_param INHERIT_ENV=false and no ARC set in sge_execd environment
6299345 No error messages in case SSL initialization fails
6298233 no user notification or command hanging if an immediate job cannot be scheduled
6298056 INHERIT_ENV and SET_LIB_PATH are not reset by setting execd_params to NONE
(from 118094-05)
6295165 finished array job tasks can be rescheduled if master/scheduler daemons are stopped/started
6294397 wrong drmaa jnilib link on MacOS
6288588 jobs submitted with -v PATH do not retain $TMPDIR prefixed by N1GE as required for tight integration
6288156 sge_shepherd SEGV's when it tries to fopen the usage file
6287958 suspend not working under Mac OS X
6287867 tight integration: temporary files are not deleted at task exit
6286533 job wallclock monitoring and enforcement considers prolog/epilog runtime part of net job runtime
6285898 qconf -Xattr does not resolve fqdn hostnames
6283308 overhead with job execution could lead to overoptimistic backfilling and break resource reservation
6281462 qmaster profiling can only be turned on by restarting qmaster
6281440 resource allocation shown by qstat/qhost not consistent with resource utilization
6280698 Resource filtering with qhost broken
6279409 qconf -tsm command generates too much data (very large schedd_runlog file)
6279402 drmaa_exit() causes qmaster error logging if host is no admin host
6278727 qstat -xml -urg output contains badly formatted numbers
6278147 drmaa_job_ps() returns DRMAA_PS_QUEUED_ACTIVE for finished array job rather than DRMAA_PS_DONE
6277909 qconf -mq coredumps
6274467 qmon kills a system
6273217 race condition with qsub -sync and drmaa_wait() if job exits directly after being submitted
6273006 qstat -j "" results in a segmentation fault
6269411 Close integration causes job scripts with multiple mprun commands to be killed
6269305 qrsh/qsh/qlogin reject -js option
6268707 job_load_adjustments does not work correctly when parallel jobs are submitted
6267932 high CPU load of qmaster even on empty cluster
6267245 Repeated logging of the same message produces giant logging files
6267238 Multithreaded DRMAA may crash due to use of sge_strtok()
6266450 performance bottleneck with subordinate list
6266392 Performance problem with qconf -mattr exechost XX XX global
6265154 Wildcards in PE Name Cause Unusual Behavior
6264592 drmaa_control(DRMAA_JOB_IDS_SESSION_ALL, DRMAA_CONTROL_SUSPEND|RESUME) returns INVALID_JOB error
6260656 incomplete resource reservation with array jobs
6252525 qmon: complex attributes not removable
6252469 misleading qstat -j messages in case of resource reservation
6250603 qmon crash (segmentation fault) on Solaris64
6218877 qstat -t is broken
4769608 qalter shows wrong priority number when using negative priorities with -p option
The following Change Request (CR) is related only to Grid Engine running
under the Microsoft Windows operating system family
6239470 Avoid that sge_execd has to be started by the Domain Administrator
(from 118094-04)
6260729 Can't select 'slots' in select box when adding consumables for execution host
6260024 qmon cluster queue modify cancel does not work correctly
6259380 potential qmaster segmentation fault
6256530 cqueues/all.q trashed after qmaster shutdown with 1362 hosts
6256457 pe jobs disappear in t state (execd doesn't know this job)
6255902 qmake in dynamic allocation mode core dump
6255850 the usage in projects is never spooled while the qmaster
6255804 job in error state breaks qstat -f -xml
6255336 execd sends an empty job report for a pe slave task
6255329 qmaster does not store sharetree usage on shutdown
6253266 failed array tasks are rescheduled only one by one
6253093 qstat -f -pe make breaks
6252524 Missing success message with qconf -Aprj
6252465 qsub option parameter string only supports 2048 character strings
6251943 japi does not work with host aliasing
6251172 reserved jobs prevent other jobs from starting
6247889 qsub -sync y return code behaviour broken
6247239 sequence nr of execd load reports corrupted
6247238 qsub fails to work correctly with -b n -cwd
6247211 qstat -explain E does not print queue errors correctly
6245487 qhost -h <hostname> does not show selected host
6244865 a series of matching soft queue requests is not counted separately
6244808 scheduler does not get all objects on a qmaster or scheduler startup
6244229 misleading qstat -j message when the scheduler is not running
6244215 qsub -b y must fail if no command is specified
6242779 qsub -now yes not working on CSP system
6242181 Failed drmaa_control (DRMAA_CONTROL_TERMINATE) causes deadlock
6242172 Multi-threaded args parsing problems
6242165 Profiling library never frees thread slots
6242057 jobs which request consumable resources which are set to infinity are not scheduled
6242055 Consumable request may not be 0 if PE requested
6241544 qstat -F dies in case of an infinite integer setting
6241487 termination script may not be ignored when a job is submitted with -notify
6241430 error message "no execd known on host"
6241401 Conflicting requirements should have the same meaning with qstat and qsub
6241378 Reservation of wrong hosts
6241376 qstat -U aborts
6240739 qstat -s hu shows pending jobs only
6239660 qmaster profiling doesn't start at qmaster startup
6239569 qmaster does not accept new connections if number of execd's exceed FD_SETSIZE
6239394 Spooledit fails during database upgrade
6236475 DRMAA segfaults with > 255 threads
6236472 qsub -sync y doesn't remove session directories
6236469 JAPI: Can be made to start two event client threads
6236261 BDB install on NFSv4 share
6234836 Need a means to purge host or hostgroup specific cluster queue
6234371 error message from execd about endpoint is not unique
6233162 global scheduler messages are reported multiple times
6232074 load formula is not working for pe jobs
6231366 deadlock in the qmaster due to qconf -k[s|e]
6230846 execd logs an error message when a tight pe job in "t" state is deleted
6229373 An array pe job can set queues into error state
6229277 qselect uses sge_qstat file
6229253 a parallel array job can kill the qmaster
6228786 Long delay when starting up large pe jobs
6228350 Execd messages file contains incorrectly-formatted lines
6226085 suspend_interval is ignored when enabling jobs due to suspend_thresholds change
6225570 sharetree has a usage leak
6222930 After shadowd takes over there is a long delay before execd connects to new qmaster
6222861 error message "no execd known on host"
6222811 scheduler can get out of sync
6222237 huge CPU and memory overhead when modifying complex attributes
6221244 releasing user hold state through qrls may not require manager privileges
6221231 qsub -sync y return code behaviour broken
6221167 sge_schedd segfaults in case of a restart and a running pe job.
6220060 wrong calendar settings kill the qmaster
6219999 changing of local execd_spool_dir is fault prone
6218430 Problems with load values if execution daemons run in a Solaris zone on x86
6215730 qdel failed to delete qrsh (login) job on a Solaris box when Secure Shell is used
6205060 SGE tools segfault when gid can't be looked up
6199256 qconf -[a|A|m|M]stree kills qmaster
6194719 starter_method is ignored with binary jobs that are started without a shell
6186597 qconf error diagnosis broken
6178843 qconf changes to complex doesn't display all the changes made upon exit
5085004 qstat -f -q all.q@HOSTNAME does not resolve hostname
(from 118094-03)
6216020 pending job task deletion may not work
6215580 execd messages file contains errors for tight integrated jobs
6211309 qmaster running out of file descriptors
6211243 The qstat -ext -xml command is broken with N1GE6 Update 2 patch
6205648 error in commlib read/write timeout handling
(from 118094-02)
6201042 qdel "*" produces error logging in qmaster messages file
6201040 Exit 99 jobs are not rescheduled to hosts where they ran before
6201039 qconf -ks gives bad error message if scheduler isn't running
6201038 reduce the impact of qstat on the overall performance
6201033 qmaster might fail if jobs are deleted which have multiple hold states applied
6200013 arch script does not know about /lib64
6199261 a sharetree delete can kill qmon
6196578 backup fails, when...
6195249 QMON Cluster Queue Window: heading line words do not fit the column width
6194729 Subordinate queue thresholds are not spooled with BDB
6194713 Only first subordinate queue will be suspended at qmaster restart
6194625 subordinate queues consume excessive memory
6194002 sgemaster -migrate on qmaster host tries to start second qmaster
6193866 backup/restore does not work under Linux and others..
6193361 Jobs fail in case of NFS execd installation on volumes exported without root write privileges
6193348 qconf -mq does not output the subordinate_list correctly
6191366 tightly integrated pe jobs: scheduler doesn't respect usage of pe tasks in sharetree calculation
6190164 too many array tasks are deleted
6189289 a cluster queue can be deleted even though it is referenced in another cluster queue
6189286 memory leak in the scheduler with consumables as load thresholds
6185211 Job environments should not include Grid Engine dynamic library path
6185208 qmon and equal job arguments
6185169 qmon returns an error dialog, when editing a calendar
6185136 Job customize shows weird characters for fields, additional fields cannot be added
6184466 scheduler does not look ahead to consider queue calendars state transitions
6184460 qmod -[d|e] cannot handle the following qnames: "[0-9]*"
6183365 qconf -sstree gives a SIGBUS error
6180529 meaningless job error state diagnosis text in qstat -j
6176181 qdel "" kills qmaster
6176177 restoring a backup does not restore the job_scripts dir.
6176115 Show qmaster/execd application status in qping
6174915 qconf has wrong exit status
6174821 segmentation fault when vmemsize limit is reached
6174331 Option "-v VAR" does not fetch from environment
6174326 qconf -sq displays "slots" in the complex_values line
6174301 N1GE6: qsub -js and negative job_share numbers act strangely/unexpectedly
5108639 qconf -sstree seg faults with large share trees
5108635 $ARCH required in path for qloadsensor and qidle.
5104789 mail sent by qmaster leaves zombie processes
5104270 Cannot add calendar with \ syntax
5102442 qconf -de <live_exec_host> crashes qmaster
5102320 memory leak in the scheduler, with pe jobs and resource requests
5097732 Need detailed error messages from communication layer
5095907 qacct -l is not working
5094016 o-tickets assigned to departments are ignored
5092487 hard resource requests ignored in parallel jobs
5090162 qmake does not export shell env. vars
5089255 Submit to a queue domain is never scheduled
5089222 scheduling weirdness with wild-card PE's
5086108 wrong message appears when queue instance enters error state
5085010 qmon customize filter for running jobs does not filter
5075968 Thread enabled commlib coredumps on exit on a 32bit Solaris x86 box
(from 118094-01)
5085392 qstat -j -xml generates no parsable XML output
5084317 Invalid job_id's in reporting file (only l24_amd64)
5083115 Need more verbose diagnosis msg if execd port is already bound
5083102 hostgroup changes do not always take effect.
5082490 qstat -ext -urg omits time info
5081839 qconf -ahgrp fails if no hgrp name is specified
5081822 Deleting a queue instance slots value actually adds it
5081821 qstat XML output typo
5080856 QCONF: qconf -mc segfaults
5080853 DRMAA doesn't reject jobs that never will be dispatchable
5080852 qconf -aq <queue>@<host> crashes qmaster
5080851 qalter/qdel/qmod abort
5080840 problems when qconf -mattr is used in conjunction with host_aliases file
5080839 qconf -mq displays "slots" in the complex_values line
5080836 qhost outputs NCPU as float
5080833 qconf -mattr dumps core if used incorrectly
5080784 qselect crash
5080779 qconf -de host does not update the host groups
5079572 Resending queue signals broken
5079514 execd shutdown with sgeexecd fails when host aliases are used
5078783 Wallclock time limit in qmon
5077589 schedd and qmaster get out of sync - no scheduling for long time
5077549 qsub -N "@" brings the qmaster down
5077165 reprioritize_interval description in sched_conf(5) needs improvement
5076491 qmaster clients may not reconnect after qmaster outage
5076372 it should be possible to use "|" with qsub -N
5076358 it should be possible to use "." and "$" with qsub -N
5075936 qmon's queue filtering doesn't work
5075849 a registering event client can get events before it got its total update
5075451 sched_conf(5) reprioritize_interval should default to 0
5075398 variable syntax : equal sign support
5075346 Sharetree doesn't work correctly
5074788 jobs on hold due to -a time cause qmaster/schedd get out of sync
5073218 qconf -aq <queue>@<host> crashes qmaster
5072772 sge_qmaster constantly rewrites spool files of tightly integrated parallel jobs
5072481 Deleted pending job appears in qstat
5072005 drmaa_run_job() may change the current directory
5071987 Qmaster requires a local conf in order to start.
5071918 qmod -e '@<host>' causes segmentation fault in qmaster
5071914 scheduler ignores queue seqno for queue sorting
5071539 qping doesn't support host_aliases file
5071525 qalter abort
5071522 Startup of qmaster changes act_qmaster to `hostname`
5071502 calendars broken
5071498 projects not available after sge_qmaster restart
5063987 qmaster cannot bind port below 1024 on Linux
5063316 PE job submit error, when qmaster is busy
5063311 high memory usage of schedd and qmaster (schedd_job_info)
Patch Installation Instructions:
--------------------------------
For Solaris 7, 8, 9 and 10 releases, refer to the man pages for instructions
on using 'patchadd' and 'patchrm' scripts provided with Solaris. Any other
special or non-generic installation instructions should be described below
as special instructions. The following example installs a patch to a
standalone machine:
example# patchadd /var/spool/patch/104945-02
The following example removes a patch from a standalone system:
example# patchrm 104945-02
For additional examples please see the appropriate man pages.
See the "Special Install Instructions" section below before installing this
patch.
Patch requirements and patch matrix for N1 Grid Engine 6 packages
-----------------------------------------------------------------
The patches below update an N1 Grid Engine 6 distribution to N1 Grid
Engine 6 Update 8 (N1GE 6.0u8). The "-help" output of most commands will
print the version string "N1GE 6.0u8" after the patch has been applied.
All packages of an N1 Grid Engine 6 distribution must have the same patch
level (exception: ARCo - see the requirements in the READMEs of the ARCo
patches). Please refer to the patch matrix below, which lists the patches
that update the distribution to the most recent patch level.
It is neither supported nor possible to mix different patch levels of
binaries and the "common" package in a single N1 Grid Engine cluster.
1. Patches for packages in Sun pkgadd format
--------------------------------------------
Package name*   OS*                     Architecture*   Patch-Id
-----------------------------------------------------------------
SUNWsgee        Solaris, Sparc, 32bit   sol-sparc       123037-01
SUNWsgeex       Solaris, Sparc, 64bit   sol-sparc64     123038-01
SUNWsgeex       Solaris, x86            sol-x86         123039-01
SUNWsgeeax      Solaris, x64 (AMD64)    sol-amd64       123040-01
SUNWsgeec       all                     common          118132-08
SUNWsgeea       all                     arco            118133-06
SUNWsgeed       all                     doc             119846-02
*Package Name = see pkginfo(1)
*OS = Operating system
*Architecture = N1 Grid Engine binary architecture string or
"common" = architecture independent packages
"arco" = Accounting and Reporting console
"doc" = PDF documentation
"gemm" = Grid Engine Management Module for Sun
Control Station (SCS) (tar.gz only)
2. Patches for packages in tar.gz format
----------------------------------------
OS*                           Architecture   Patch-Id
-----------------------------------------------------
Solaris, Sparc, 32bit         sol-sparc      123041-01
Solaris, Sparc, 64bit         sol-sparc64    123042-01
Solaris, x86                  sol-x86        123043-01
Solaris, x64 (AMD64)          sol-amd64      123044-01
Linux kernel 2.4/2.6, x86     lx24-x86       123045-01
Linux kernel 2.4/2.6, AMD64   lx24-amd64     123046-01
IBM AIX 4.3                   aix43          123047-01
IBM AIX 5.1                   aix51          123048-01
Apple Mac OS X                darwin         123049-01
HP HP-UX 11                   hp11           123050-01
SGI Irix 6.5                  irix65         123051-01
Microsoft Windows             win32-x86      123052-01
all                           common         118092-08
all                           arco           118093-06
all                           doc            119861-02
Solaris, Linux                gemm           120435-02
Special Install Instructions:
-----------------------------
Content
-------
Patch Installation
Stopping the N1 Grid Engine cluster from starting new jobs
Shutting down the N1 Grid Engine daemons
Installing the patch and restarting the software
New functionality delivered with N1GE 6.0 Update 7
Reworked "qstat -xml" output
Reworked PE range matching algorithm in the scheduler
New monitoring feature in qmaster
New parameter for specialized job deletion
New reporting parameter to control accounting file flush time
New functionality delivered with N1GE 6.0 Update 6
Berkeley DB database tools are included in the distribution
New functionality delivered with N1GE 6.0 Update 4
New "qconf -purge" option
Berkeley DB spooling on NFSv4 under Solaris 10 supported
Execd installation in Solaris 10 zones supported
Faster execution daemon reconnect in CSP mode
New functionality delivered with N1GE 6.0 Update 2
Avoid setting of LD_LIBRARY_PATH
DRMAA Java[TM] language binding delivered with this patch
New qstat options to optimize memory overhead and speed of qstat
Tuning parameter for sharetree spooling
Patch Installation
------------------
NOTE: This patch requires that you update your Berkeley DB database files
if you are upgrading from N1GE 6.0u1 or 6.0. Please read the full
notes below before applying this patch.
These installation instructions assume that you are running a homogeneous
N1 Grid Engine cluster (called "the software") where all hosts share the
same directory for the binaries. If you are running the software in a
heterogeneous environment (a mix of different binary architectures), you
need to apply the patches for all binary architectures as well
as the "common" and "arco" packages. See the patch matrix above for
details about the available patches.
If you installed the software on local filesystems, you need to install
all relevant patches on all hosts where you installed the software
locally.
By default, there should be no running jobs when the patch is installed.
There may be pending batch jobs, but no pending interactive jobs (qrsh,
qmake, qsh, qtcsh).
It is possible to install the patch with running batch jobs. To avoid a
failure of the active 'sge_shepherd' binary, it is necessary to move (not
copy) the old shepherd binary aside prior to the installation of the
patch, as described in step A below.
You cannot install the patch while interactive jobs, 'qmake' jobs or
parallel jobs which use the tight integration support
(control_slaves=true is set in the PE configuration) are running.
A. Stopping the N1 Grid Engine cluster from starting new jobs
--------------------------------------------------------------
Disable all queues so that no new jobs are started:
# qmod -d '*'
Optional (only needed if there are running jobs which should continue to
run when the patch is installed):
# cd $SGE_ROOT/bin
# mv <arch>/sge_shepherd <arch>/sge_shepherd.sge60
It is important that the binary is moved with the "mv" command. It should
not be copied, because copying could crash an active shepherd process
which is running a job when the patch is installed.
B. Shutting down the N1 Grid Engine daemons
-------------------------------------------
You need to shut down (and later restart) the qmaster and scheduler
daemons and all running execution daemons.
Shut down the execution daemons on all your execution hosts. Log in to
each execution host and stop the execution daemon:
# /etc/init.d/sgeexecd softstop
Then login to your qmaster machine and stop qmaster and scheduler:
# /etc/init.d/sgemaster stop
Now verify with the 'ps' command that all N1 Grid Engine daemons on all
hosts are stopped. If you decided to rename the 'sge_shepherd' binary so
that running jobs can continue to run during the patch installation, you
must not kill the 'sge_shepherd' binary (process).
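A check like the following can be run on each host (illustrative; the
exact 'ps' options may vary by platform). If you renamed the shepherd
binary, its processes may still appear in the output and must be left
alone:
# ps -ef | grep sge_ | grep -v grep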
C. Installing the patch and restarting the software
---------------------------------------------------
Now install the patch with "patchadd" or by unpacking the 'tar.gz' files
included in this patch, as outlined above.
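As a sketch (the archive name below is a placeholder, not the real file
name of this patch), the two installation variants might look like this:
# patchadd /var/spool/patch/123037-01
or, for the tar.gz distribution:
# cd $SGE_ROOT
# gzip -dc /path/to/<patch_archive>.tar.gz | tar xvpf -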
Berkeley DB database update needed
----------------------------------
NOTE: This update is *not* needed if you already installed N1GE 6.0u3
or higher. The update is only needed if you are upgrading from
N1GE 6.0u2 or earlier.
After installing this patch, and before restarting your cluster, you
need to update your Berkeley DB (BDB) database in the following case:
- you chose the BDB spooling option (not needed for classic
spooling), either locally or with the BDB RPC option, and you are
upgrading your cluster from N1 Grid Engine 6.0 or 6.0u1 to N1 Grid
Engine 6.0u2 or higher
1. For safety reasons, please make a full backup of your existing
configuration. To perform a backup, use this command:
% inst_sge -bup
2. Upgrade your BDB database. This is done as follows:
% inst_sge -updatedb
Restarting the software
-----------------------
Please login to your qmaster machine and execution hosts and enter:
# /etc/init.d/sgemaster
# /etc/init.d/sgeexecd
After restarting the software, you may again enable your queues:
# qmod -e '*'
If you renamed the shepherd binary, you may safely delete the old
binary once all jobs which were running prior to the patch installation
have finished.
New functionality delivered with N1GE 6.0 Update 7
--------------------------------------------------
1. Reworked "qstat -xml" output
-------------------------------
The schema for "qstat -xml" and the "qstat -xml" output have been
reworked to ensure consistency between them and to make them easy to
parse via JAXB. The most noticeable change is the date output, which now
follows the XML datetime format.
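As an illustration (the element name and value below are examples only,
not taken verbatim from this patch), a timestamp in the reworked output
now appears in XML dateTime form:
# qstat -xml
...
<JAT_start_time>2006-05-04T14:30:00</JAT_start_time>
...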
2. Reworked PE range matching algorithm in the scheduler
--------------------------------------------------------
The PE range matching algorithm is now adaptive and learns from past
decisions. This leads to much faster scheduling decisions for PE ranges.
The behavior can be controlled by a new scheduling configuration
parameter, SELECT_PE_RANGE_ALG, which also allows restoring the old
behavior. See sge_conf(5) for more information.
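A minimal sketch of setting the parameter, assuming it is placed in the
"params" field of the scheduler configuration and that "auto" is a
supported value (both are assumptions; consult the man page for the
authoritative syntax and values):
# qconf -msconf
...
params    SELECT_PE_RANGE_ALG=auto
...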
3. New monitoring feature in qmaster
------------------------------------
The monitoring feature provides detailed statistics about what the
qmaster threads are doing and how busy they are. The statistics can be
accessed via "qping -f" or from the qmaster messages file. The feature is
controlled by two qmaster configuration parameters:
MONITOR_TIME          specifies the time interval for the statistics
LOG_MONITOR_MESSAGE   enables/disables logging of the monitoring
                      messages into the qmaster messages file
See sge_conf(5) for more information.
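A minimal sketch, assuming the parameters are added to the qmaster_params
line of the global cluster configuration (the interval and the qping
host/port placeholders are illustrative):
# qconf -mconf
...
qmaster_params    MONITOR_TIME=0:0:10 LOG_MONITOR_MESSAGE=true
...
# qping -f <qmaster_host> <qmaster_port> qmaster 1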
4. New parameter for specialized job deletion
---------------------------------------------
A new "execd_param" (configured in the global cluster configuration):
ENABLE_ADDGRP_KILL=true
can be configured to enable addition code within the execution host to
delete jobs. If this parameter is set then the supplementary group id's
are used to identify all processes which are to be terminated when a job
should be deleted. It has only effect for following architectures:
sol*
lx*
osf4
tru64
See sge_conf(5) under "gid_range" for more information.
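A minimal sketch of enabling this via the global cluster configuration
(keep any execd_params entries you already have on the same line):
# qconf -mconf
...
execd_params    ENABLE_ADDGRP_KILL=true
...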
5. New reporting parameter to control accounting file flush time
----------------------------------------------------------------
A new reporting parameter, "accounting_flush_time", controls the flush
period for the accounting file. Previously, both the accounting and
reporting files were flushed at the same interval. Now they can be set
independently. Additionally, buffering of the accounting file can now be
disabled, allowing accounting data to be written to the accounting file
as soon as it becomes available.
See sge_conf(5) for more information.
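As a sketch (the other reporting_params values shown are illustrative,
not prescriptions), the parameter is added to the reporting_params line
of the global cluster configuration; a value of 00:00:00 is assumed here
to mean unbuffered accounting writes:
# qconf -mconf
...
reporting_params    accounting=true reporting=false flush_time=00:00:15 \
                    accounting_flush_time=00:00:00
...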
New functionality delivered with N1GE 6.0 Update 6
--------------------------------------------------
1. Berkeley DB database tools are included in the distribution
--------------------------------------------------------------
All Berkeley DB database tools are now part of the N1 Grid Engine
distribution (not for the Microsoft Windows platform):
db_archive
db_checkpoint
db_deadlock
db_dump
db_load
db_printlog
db_recover
db_stat
db_upgrade
db_verify
The HTML documentation for these tools is part of the "common" patch and
can be found in:
<sge_root>/doc/bdbdocs
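For example (the spool directory below is a placeholder for your BDB
spooling directory), the bundled db_stat tool can display cache
statistics for the spooling database environment:
# $SGE_ROOT/utilbin/sol-sparc/db_stat -m -h <bdb_spool_dir>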
New functionality delivered with N1GE 6.0 Update 4
--------------------------------------------------
1. New "qconf -purge" option
----------------------------
"qconf -purge" deletes all hosts or hostgroups settings from a cluster
queue. This facilitates the uninstallation of host or hostgroups. See
qstat(1) for more a description how to use this parameter
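An illustrative call might look like the following (the attribute name,
queue name and host are placeholders, and the exact argument order should
be checked against the man page):
# qconf -purge queue slots all.q@<hostname>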
2. Berkeley DB spooling on NFSv4 under Solaris 10 supported
-----------------------------------------------------------
The Berkeley DB database can now be installed on an NFSv4 mounted
filesystem on Solaris 10.
For performance reasons it is recommended to use NFSv4 BDB spooling only
when the NFSv4 mount provides an excellent high speed connection to the
file server.
3. Execd installation in Solaris 10 zones supported
---------------------------------------------------
The execution daemon installation in Solaris 10 zones is supported. If
execution daemons are installed in the global zone and in local zones,
you need to ensure that the additional group ID ranges (-> "gid_range" in
the cluster configuration) of the global zone and the local zones do not
overlap. Local zones may use the same additional group ID range on the
same host.
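A minimal sketch, assuming the ranges below are otherwise unused on your
system (the values and host names are illustrative only); the global zone
host and a local zone host each get their own non-overlapping range in
their host-local cluster configurations:
# qconf -mconf <global_zone_host>
...
gid_range    20000-20100
...
# qconf -mconf <local_zone_host>
...
gid_range    20200-20300
...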
4. Faster execution daemon reconnect in CSP mode
------------------------------------------------
The Certificate Security Protocol (CSP) has been reworked and is now
fully integrated into the communication library layer. This allows a
faster reconnect of execution daemons after a qmaster or execution
daemon restart.
New functionality delivered with N1GE 6.0 Update 2
--------------------------------------------------
1. Avoid setting of LD_LIBRARY_PATH; inherited job environment
--------------------------------------------------------------
There are two new "execd_params" (defined in the global or local cluster
configuration) which control the environment inherited by a job:
SET_LIB_PATH
INHERIT_ENV
By default, SET_LIB_PATH is false and INHERIT_ENV is true. If
SET_LIB_PATH is true and INHERIT_ENV is true, each job will inherit the
environment of the shell that started the execd, with the N1GE lib
directory prepended to the lib path. If SET_LIB_PATH is true and
INHERIT_ENV is false, the environment of the shell that started the execd
will not be inherited by jobs, and the lib path will contain only the
N1GE lib directory. If SET_LIB_PATH is false and INHERIT_ENV is true,
each job will inherit the environment of the shell that started the execd
with no additional changes to the lib path. If SET_LIB_PATH is false and
INHERIT_ENV is false, the environment of the shell that started the execd
will not be inherited by jobs, and the lib path will be empty.
Environment variables which are normally overwritten by the shepherd,
such as PATH or LOGNAME, are unaffected by these new parameters.
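For example, to give jobs a clean environment that does not inherit the
shell environment of the execd (a sketch; combine with any other
execd_params you already use):
# qconf -mconf
...
execd_params    INHERIT_ENV=false SET_LIB_PATH=false
...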
2. DRMAA Java[TM] language binding delivered with this patch
------------------------------------------------------------
The DRMAA Java language binding is now available. The DRMAA Java language
binding library and documentation are contained in the patch for the
"common" package.
3. New qstat options to optimize memory overhead and speed of qstat
-------------------------------------------------------------------
The qstat client command has been enhanced to reduce the overall amount
of memory which is requested from the qmaster. To enable these changes it
is necessary to change the qstat default behavior. This is possible by
defining a cluster-global or user-specific sge_qstat file. More
information can be found in the sge_qstat(5) manual page. In addition, two new
qstat options ("-u" and "-s") have been introduced to be used with the
sge_qstat default file. Find more information in qstat(1).
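A minimal sketch of a user-private defaults file (the locations
$SGE_ROOT/$SGE_CELL/common/sge_qstat and $HOME/.sge_qstat and the options
shown are assumptions based on sge_qstat(5); adjust to your needs):
# cat $HOME/.sge_qstat
-u $user
-s r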
4. Tuning parameter for sharetree spooling
------------------------------------------
A new "qmaster_param" (configured in the global cluster configuration):
STREE_SPOOL_INTERVAL=<time>
can be configured to control the interval for how often the sharetree
usage is spooled. The interval can be set to any time in the following
formats:
HH:MM:SS or
<int>
E.g.:
STREE_SPOOL_INTERVAL=0:05:00
STREE_SPOOL_INTERVAL=300
This parameter is a tuning parameter only. It has the biggest effect on a
system that uses classic spooling with a large sharetree and a slow
filesystem.
README -- Last modified date: Tuesday, August 11, 2015