
Novalink

Novalink is a sort of "replacement" for the HMC. In a traditional installation all OpenStack services (Neutron, Cinder, Nova etc.) run on the PowerVC host; the Nova service, for example, requires one process for each Managed System:

# ps -ef | grep [n]ova-compute
nova       627     1 14 Jan16 ?        06:24:30 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_10D5555.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_10D5555.log
nova       649     1 14 Jan16 ?        06:30:25 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_65E5555.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_65E5555.log
nova       664     1 17 Jan16 ?        07:49:27 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1085555.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_1085555.log
nova       675     1 19 Jan16 ?        08:40:27 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_06D5555.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_06D5555.log
nova       687     1 18 Jan16 ?        08:15:57 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6575555.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_6575555.log

Besides the extra load, all PowerVC actions had to go through the HMC. PowerVC and the HMC were single points of contact for every action, which could cause slowness in large environments. In 2016 IBM came up with a solution: a special LPAR on each Managed System that can do everything an HMC would normally do. This special LPAR is called NovaLink. If this LPAR is created on all Managed Systems, PowerVC stops querying the HMC and queries the NovaLink LPARs directly, where some OpenStack services (Nova, Neutron, Ceilometer) are also running. It is a Linux LPAR (currently Ubuntu or RHEL) which has a CLI and an API.

---------------------------------------------------------------------

Novalink Install/Update

+--------------------------------+ Welcome +---------------------------------+
|                                                                            |
| Welcome to the PowerVM NovaLink Install wizard.                            |
|                                                                            |
| (*) Choose to perform an installation.                                     |
| This will perform an installation of the NovaLink partition, its core      |
| components and REST APIs, and all needed Virtual I/O servers.              |
|                                                                            |
| ( ) Choose to repair a system.                                             |
| This will repair the system by performing a rescue/repair of existing      |
| Virtual I/O servers and NovaLink partitions.                               |
| Choose this option if PowerVM is already installed but is corrupted        |
| or there is a failure.                                                     |
|                                                                            |
|                                                                            |
| <Next> <Cancel>                                                            |
|                                                                            |
+----------------------------------------------------------------------------+
<Tab>/<Alt-Tab> between elements | <Space> selects | <F12> next screen


Novalink is a standard LPAR whose I/O is provided by the VIOS (therefore no physical I/O is required), with a special permission bit to enable PowerVM management authority. If you install the NovaLink environment on a new managed system, the NovaLink installer creates the NovaLink partition automatically. It creates the Linux and VIOS LPARs and installs the operating systems and the NovaLink software. It creates logical volumes from the VIOS rootvg for the NovaLink partition. (The VIOS installation files (extracted mksysb files from the VIOS DVD ISO) need to be added to the NovaLink installer manually: https://www.ibm.com/support/knowledgecenter/POWER8/p8eig/p8eig_creating_iso.htm)


If you install the NovaLink software on a system that is already managed by an HMC, use the HMC to create a Linux LPAR and set its powervm_mgmt_capable flag to true (the NovaLink partition must be granted the capability of PowerVM management):
$ lssyscfg -m p850 -r lpar --filter "lpar_ids=1"
name=novalink,lpar_id=1,lpar_env=aixlinux,state=Running,resource_config=1,os_version=Unknown,logical_serial_num=211FD2A1,default_profile=default,curr_profile=default,work_group_id=none,shared_proc_pool_util_auth=0,allow_perf_collection=0,power_ctrl_lpar_ids=none,boot_mode=norm,lpar_keylock=norm,auto_start=1,redundant_err_path_reporting=0,rmc_state=active,rmc_ipaddr=129.40.226.21,time_ref=0,lpar_avail_priority=127,desired_lpar_proc_compat_mode=default,curr_lpar_proc_compat_mode=POWER8,suspend_capable=0,remote_restart_capable=0,simplified_remote_restart_capable=0,sync_curr_profile=0,affinity_group_id=none,vtpm_enabled=0,powervm_mgmt_capable=0
$ chsyscfg -m seagull -r lpar -i lpar_id=1,powervm_mgmt_capable=1

powervm_mgmt_capable flag is valid for Linux partitions only:
0 - do not allow this partition to provide PowerVM management functions
1 - enable this partition to provide PowerVM management functions


PowerVM NovaLink by default installs Ubuntu, but also supports RHEL. The installer provides an option to install RHEL after the required setup or configuration of the system completes. For easier installation of PowerVM NovaLink on multiple servers, set up a netboot (bootp) server to install PowerVM NovaLink from a network.

Installation log files are in /var/log/pvm-install, and the NovaLink installer creates an installation configuration file, /var/log/pvm-install/novalink-install.cfg (which can be used if we need to restore the NovaLink partition). Updating PowerVM NovaLink is currently driven entirely through Ubuntu's apt package system.

---------------------------------------------------------------------

Novalink and HMC

NovaLink provides a direct connection to the PowerVM server rather than proxying through an HMC. For example a VM create request in PowerVC goes directly to NovaLink, which then communicates with PowerVM. This allows improved scalability (from 30 to 200+ servers), better performance, and better alignment with OpenStack.

Hosts can be managed by NovaLink only (without an HMC), or can be co-managed (NovaLink and HMC together). In this co-managed setup either NovaLink or the HMC is the master. Both of them have read access to the partition configuration, but only the master can make changes to the system. Typically NovaLink will be the co-management master; however, if a task has to be done from the HMC (like a firmware upgrade), we can explicitly request master authority for the HMC, perform the action, and then give the authority back to NovaLink.


HMC: saves the LPAR configuration in the FSP NVRAM, uses the FSP lock mechanism, and receives events from the FSP/PHYP
NovaLink: receives events from the PHYP only; it is not aware of the FSP and does not receive FSP events

In co-management mode there are no partition profiles. In OpenStack, the concept of a flavor is similar to profiles, and these are all managed by OpenStack, not the HMC or NovaLink. For example, you can activate a partition with the current configuration, but not with a profile.

To update the firmware on a system that is managed by NovaLink only, use the ldfware command on the service partition. If the system is co-managed by NovaLink and an HMC, firmware updates can be performed only from the HMC. The HMC must be set to master mode to update the firmware. After the firmware update is finished, master mode can be set back to NovaLink. (The current operation has to finish before the change completes; a force option is also possible.)

In HMC CLI:
$ chcomgmt -m <managed_system> -o setmaster -t norm              <--set HMC to be master on the specified Man. Sys.
$ chcomgmt -m <managed_system> -o relmaster                      <--set Novalink to be master again

In Novalink CLI:
$ pvmctl sys list -d master                                      <--list master (-d: display)
$ pvmctl <managed_system> set-master                            <--set Novalink to be master

---------------------------------------------------------------------

Novalink partition and services

NovaLink is not part of PowerVC, but the two technologies work closely together. If NovaLink is installed on a host, even if an HMC is connected to it, PowerVC must manage that host through the NovaLink partition. The NovaLink LPAR (with the installed software packages) provides OpenStack services and can perform virtualization tasks in the PowerVM/Hypervisor layer. The following OS packages provide these functions in NovaLink:
-ibmvmc-dkms: the device driver kernel module that allows NovaLink to talk to the hypervisor
-pvm-core: the base NovaLink package; it primarily provides a shared library to the REST server
-pvm-rest-server: the Java web server used to run the REST API service
-pvm-rest-app: the REST application that provides all the REST APIs and communicates with pvm-core
-pypowervm: the pypowervm library provides a Python-based API wrapper for interaction with the PowerVM API
-pvm-cli: provides the Python-based CLI (pvmctl)

A meta package called pvm-novalink ensures dependencies between all these packages. When updating, just update pvm-novalink and it will handle the rest.
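
As a minimal sketch (assuming the NovaLink apt repository is already configured on the Ubuntu-based NovaLink partition), an update could look like this:
$ sudo apt-get update                        <--refresh the package lists
$ sudo apt-get install pvm-novalink          <--the meta package pulls in pvm-core, pvm-rest-server, pvm-rest-app, pypowervm and pvm-cli as dependencies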

NovaLink contains two system services that should always be running:
- pvm-core
- pvm-rest

If you are not able to complete tasks on NovaLink, verify whether these services are running. Use the systemctl command to view the status of these services and to stop, start, and restart these services. (Generally restarting pvm-core will cause pvm-rest to also restart.)
# systemctl status pvm-core / pvm-rest
# systemctl stop pvm-core / pvm-rest
# systemctl start pvm-core / pvm-rest
# systemctl restart pvm-core / pvm-rest


With these installed packages NovaLink provides 2 main groups of services: OpenStack services and NovaLink core services:


OpenStack Services
- nova-powervm: Nova is the compute service of OpenStack. It handles VM management (creating VMs, adding/removing CPU/RAM, ...)
- networking-powervm: the network service of OpenStack (Neutron). Provides functions to manage SEAs, VLANs, ...
- ceilometer-powervm: Ceilometer is the monitoring service of OpenStack. It collects monitoring data for CPU, network, memory, and disk usage

These services use the pypowervm library, a Python-based library that interacts with the PowerVM REST API.


NovaLink Core Services 
These services communicate with the PHYP and the VIOS and provide a direct connection to the managed system.
- REST API: based on the API used by the HMC. It also provides a Python-based software development kit.
- CLI: provides shell interaction with PowerVM. It is also Python based.

---------------------------------------------------------------------

RMC with PowerVM NovaLink

The RMC connection between NovaLink and each LPAR is routed through a dedicated internal virtual switch (the mandatory name is MGMTSWITCH) and the virtual network uses PVID 4094.

It uses an IPv6 link, and VEPA mode has to be configured, so LPARs cannot communicate directly with each other; network traffic goes out to the switch first. Once it is configured correctly, NovaLink and the client LPARs can communicate for DLPAR and mobility. The minimum RSCT version to use RMC with NovaLink is 3.2.1.0. The management vswitch is required for LPARs deployed using PowerVC; the HMC can continue using RMC through the existing mechanisms.

The LPARs are using virtual Ethernet adapters to connect to NovaLink through a virtual switch. The virtual switch is configured to communicate only with the trunk port. An LPAR can therefore use this virtual network only to connect with the NovaLink partition. LPARs can connect with partitions other than the NovaLink partition only if a separate network is configured for this purpose.

---------------------------------------------------------------------

Novalink CLI (pvmctl, viosvrcmd)

The NovaLink command-line interface (CLI) is provided by the Python-based pvm-cli package. It uses the pvmctl and viosvrcmd commands for most operations. Execution of the pvmctl command is logged in the file /var/log/pvm/pvmctl.log, and commands can only be executed by users who are in the pvm_admin group. The admin user (i.e. padmin) is added to the group automatically during installation.
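
A quick sanity check (a hedged example; padmin is simply the default admin user mentioned above):
$ id padmin | grep pvm_admin                 <--verify membership in the pvm_admin group
$ sudo tail -5 /var/log/pvm/pvmctl.log       <--recent pvmctl command executions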

pvmctl

It runs operations against an object: pvmctl OBJECT VERB

Supported OBJECT types:
ManagedSystem (sys)
LogicalPartition (lpar or vm)
VirtualIOServer (vios)
SharedStoragePool (ssp)
IOSlot (io)
LoadGroup (lgrp)
LogicalUnit (lu)
LogicalVolume (lv)
NetworkBridge (nbr or bridge)
PhysicalVolume (pv)
SharedEthernetAdapter (sea)
VirtualEthernetAdapter (vea or eth)
VirtualFibreChannelMapping (vfc or vfcmapping)
VirtualMediaRepository (vmr or repo)
VirtualNetwork (vnet or net)
VirtualOpticalMedia (vom or media)
VirtualSCSIMapping (scsi or scsimapping)
VirtualSwitch (vswitch or vsw)

Supported operations (VERB) example:
logicalpartition (vm,lpar) supported operations: create, delete, list, migrate, migrate-recover, migrate-stop, power-off, power-on, restart, update
IOSlot (io) supported operations: attach, detach, list
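
Since pvmctl is a standard Python command-line tool, the usual help switches should list the objects and verbs for the installed version (an assumption, not an exhaustive reference):
$ pvmctl --help                              <--lists the supported OBJECT types
$ pvmctl lpar --help                         <--lists the supported VERBs for LogicalPartition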

---------------------------------------------------------------------

pvmctl listing objects

$ pvmctl lpar list
Logical Partitions
+----------+----+----------+----------+----------+-------+-----+-----+
| Name     | ID | State    | Env      | Ref Code | Mem   | CPU | Ent |
+----------+----+----------+----------+----------+-------+-----+-----+
| novalink | 2  | running  | AIX/Lin> | Linux p> | 2560  | 2   | 0.5 |
| pvc      | 3  | running  | AIX/Lin> | Linux p> | 11264 | 2   | 1.0 |
| vm1      | 4  | not act> | AIX/Lin> | 00000000 | 1024  | 1   | 0.5 |
+----------+----+----------+----------+----------+-------+-----+-----+

$ pvmctl lpar list --object-id id=2
Logical Partitions
+----------+----+---------+-----------+---------------+------+-----+-----+
| Name     | ID | State   | Env       | Ref Code      | Mem  | CPU | Ent |
+----------+----+---------+-----------+---------------+------+-----+-----+
| novalink | 2  | running | AIX/Linux | Linux ppc64le | 2560 | 2   | 0.5 |
+----------+----+---------+-----------+---------------+------+-----+-----+

$ pvmctl lpar list -d name id state --where LogicalPartition.state=running
name=novalink,id=2,state=running
name=pvc,id=3,state=running

$ pvmctl lpar list -d name id state --where LogicalPartition.state!=running
name=vm1,id=4,state=not activated
name=vm2,id=5,state=not activated

---------------------------------------------------------------------

pvmctl creating objects:

creating an LPAR:
$ pvmctl lpar create --name vm1 --proc-unit .1 --sharing-mode uncapped --type AIX/Linux --mem 1024 --proc-type shared --proc 2
$ pvmctl lpar list
Logical Partitions
+-----------+----+-----------+-----------+-----------+------+-----+-----+
| Name      | ID | State     | Env       | Ref Code  | Mem  | CPU | Ent |
+-----------+----+-----------+-----------+-----------+------+-----+-----+
| novalink> | 1  | running   | AIX/Linux | Linux pp> | 2560 | 2   | 0.5 |
| vm1       | 4  | not acti> | AIX/Linux | 00000000  | 1024 | 2   | 0.1 |
+-----------+----+-----------+-----------+-----------+------+-----+-----+


creating a virtual ethernet adapter:
$ pvmctl vswitch list
Virtual Switches
+------------+----+------+---------------------+
| Name       | ID | Mode | VNets               |
+------------+----+------+---------------------+
| ETHERNET0  | 0  | Veb  | VLAN1-ETHERNET0     |
| MGMTSWITCH | 1  | Vepa | VLAN4094-MGMTSWITCH |
+------------+----+------+---------------------+

$ pvmctl vea create --slot 2 --pvid 1 --vswitch ETHERNET0 --parent-id name=vm1

$ pvmctl vea list
Virtual Ethernet Adapters
+------+------------+------+--------------+------+-------+--------------+
| PVID | VSwitch    | LPAR | MAC          | Slot | Trunk | Tagged VLANs |
+------+------------+------+--------------+------+-------+--------------+
| 1    | ETHERNET0  | 1    | 02224842CB34 | 3    | False |              |
| 1    | ETHERNET0  | 4    | 1A05229C5DAC | 2    | False |              |
| 1    | ETHERNET0  | 2    | 3E5EBB257C67 | 3    | True  |              |
| 1    | ETHERNET0  | 3    | 527A821777A7 | 3    | True  |              |
| 4094 | MGMTSWITCH | 1    | CE46F57C513F | 6    | True  |              |
| 4094 | MGMTSWITCH | 2    | 22397C1B880A | 6    | False |              |
| 4094 | MGMTSWITCH | 3    | 363100ED375B | 6    | False |              |
+------+------------+------+--------------+------+-------+--------------+

---------------------------------------------------------------------

pvmctl updating/deleting objects

Update the desired memory on vm1 to 2048 MB:
$ pvmctl lpar update -i name=vm1 --set-fields PartitionMemoryConfiguration.desired=2048
$ pvmctl lpar update -i id=2 -s PartitionMemoryConfiguration.desired=2048


Delete an LPAR:
$ pvmctl lpar delete -i name=vm4
[PVME01050010-0056] This task is only allowed when the partition is powered off.
$ pvmctl lpar power-off -i name=vm4
Powering off partition vm4, this may take a few minutes.
Partition vm4 power-off successful.
$ pvmctl lpar delete -i name=vm4

---------------------------------------------------------------------

Additional commands

$ pvmctl vios power-off -i name=vios1            <--shutdown VIOS
$ pvmctl lpar power-off --restart -i name=vios1   <--restart LPAR

$ mkvterm -m sys_name -p vm1                     <--open a console
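
Assuming rmvterm is available alongside mkvterm (as on the HMC/IVM), the console session can be closed again:
$ rmvterm -m sys_name -p vm1                     <--close the console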

---------------------------------------------------------------------

viosvrcmd

viosvrcmd runs VIOS commands from the NovaLink LPAR on the specified VIO server. The command is passed to the VIO server over the underlying RMC connection.

An example: 
Allocating a logical unit from an existing SSP on the VIOS at partition id 2. The allocated logical unit is then mapped to a virtual SCSI adapter in the target LPAR.

$ viosvrcmd --id 2 -c "lu -create -sp pool1 -lu vdisk_vm1 -size 20480"    <--create a Logical Unit on VIOS (vdisk_vm1)
Lu Name:vdisk_vm1
Lu Udid:955b26de3a4bd643b815b8383a51b718

$ pvmctl lu list
Logical Units
+-------+-----------+----------+------+------+-----------+--------+
| SSP   | Name      | Cap (GB) | Type | Thin | Clone     | In use |
+-------+-----------+----------+------+------+-----------+--------+
| pool1 | vdisk_vm1 | 20.0     | Disk | True | vdisk_vm1 | False |
+-------+-----------+----------+------+------+-----------+--------+

$ pvmctl scsi create --type lu --lpar name=vm1 --stor-id name=vdisk_vm1 --parent-id name=vios1

---------------------------------------------------------------------

Backups

PowerVM NovaLink automatically backs up hypervisor (LPAR configurations) and VIOS configuration data by using cron jobs. Backup files are stored in the /var/backups/pvm/SYSTEM_MTMS/ directory. VIOS configuration data is copied from the VIOS (/home/padmin/cfgbackups) to Novalink.

$ ls -lR /var/backups/pvm/8247-21L*03212E3CA
-rw-r----- 1 root pvm_admin 2401 Jun 1 00:15 system_daily_01.bak
-rw-r----- 1 root pvm_admin 2401 May 30 00:15 system_daily_30.bak
-rw-r----- 1 root pvm_admin 2401 May 31 00:15 system_daily_31.bak
-rw-r----- 1 root pvm_admin 2401 Jun 1 01:15 system_hourly_01.bak
-rw-r----- 1 root pvm_admin 2401 Jun 1 02:15 system_hourly_02.bak
-rw-r----- 1 root pvm_admin 4915 Jun 1 00:15 vios_2_daily_01.viosbr.tar.gz
-rw-r----- 1 root pvm_admin 4914 May 30 00:15 vios_2_daily_30.viosbr.tar.gz
-rw-r----- 1 root pvm_admin 4910 May 31 00:15 vios_2_daily_31.viosbr.tar.gz
-rw-r----- 1 root pvm_admin 4911 Jun 1 00:15 vios_3_daily_01.viosbr.tar.gz
-rw-r----- 1 root pvm_admin 4911 May 30 00:15 vios_3_daily_30.viosbr.tar.gz
-rw-r----- 1 root pvm_admin 4910 May 31 00:15 vios_3_daily_31.viosbr.tar.gz
-rw-r----- 1 root pvm_admin 4909 Jun 1 01:15 vios_3_hourly_01.viosbr.tar.gz
-rw-r----- 1 root pvm_admin 4909 Jun 1 02:15 vios_3_hourly_02.viosbr.tar.gz

The hypervisor (partition configuration) backup can be manually initiated by using the bkprofdata command:
$ sudo bkprofdata -m gannet -o backup
$ ls -l /etc/pvm
total 8
drwxr-xr-x 2 root root 4096 May 26 17:32 data
-rw-rw---- 1 root root 2401 Jun 2 17:05 profile.bak
$ cat /etc/pvm/profile.bak
FILE_VERSION = 0100
CONFIG_VERSION = 0000000000030003
TOD = 1464901557123
MTMS = 8247-21L*212E3CA
SERVICE_PARTITION_ID = 2
PARTITION_CONFIG =
lpar_id\=1,name\=novalink_212E3CA,lpar_env\=aixlinux,mem_mode\=ded,min_mem\=2048,desired_mem\=2560,max_mem\=16384,hpt_ratio\=6,mem_expansion\=0.00,min_procs\=1,desired_procs\=2,max_procs\=10,proc_mode\=shared,shared_proc_pool_id\=0,sharing_mode\=uncap,min_proc_units\=0.05,desired_proc_units\=0.50,max_proc_units\=10.00,uncap_weight\=128,allow_perf_collection\=0,work_group_id\=none,io_slots\=2101001B/none/0,"virtual_eth_adapters\=3/1/1//0/0/0/B2BBCA66F6F1/all/none,6/1/4094//1/0/1/EA08E1233F8A/all/none","virtual_scsi_adapters\=4/client/2/vios1/2/0,5/client/3/vios2/2/0",auto_start\=1,boot_mode\=norm,max_virtual_slots\=2000,lpar_avail_priority\=127,lpar_proc_compat_mode\=default
PARTITION_CONFIG =
lpar_id\=2,name\=vios1,lpar_env\=vioserver,mem_mode\=ded,min_mem\=1024,desired_mem\=4096,max_mem\=16384,hpt_ratio\=6,mem_expansion\=0.00,min_procs\=2,desired_procs\=2,max_procs\=64,proc_mode\=shared,shared_proc_pool_id\=0,sharing_mode\=uncap,min_proc_units\=0.10,desired_proc_units\=1.00,max_proc_units\=10.00,uncap_weight\=255,allow_perf_collection\=0,work_group_id\=none,"io_slots\=21010013/none/0,21030015/none/0,2104001E/none/0","virtual_eth_adapters\=3/1/1//1/0/0/36BACB2677A6/all/none,6/1/4094//0/0/1/468CA1242EC8/all/none",virtual_scsi_adapters\=2/server/1/novalink_212E3CA/4/0,auto_start\=1,boot_mo
...
...


The VIOS configuration data backup can be manually initiated by using the viosvrcmd --id X -c "viosbr" command:
$ viosvrcmd --id 2 -c "viosbr -backup -file /home/padmin/cfgbackups/vios_2_example.viosbr"
Backup of this node (gannet2.pbm.ihost.com.pbm.ihost.com) successful
$ viosvrcmd --id 2 -c "viosbr -view -file /home/padmin/cfgbackups/vios_2_example.viosbr.tar.gz"


$ viosvrcmd --id X -c "backupios -cd /dev/cd0 -udf -accept"              <--creates bootable media
$ viosvrcmd --id X -c "backupios -file /mnt [-mksysb]"                   <--for NIM backup on NFS (restore with installios or mksysb)
$ viosvrcmd --id X -c "backupios -file /mnt [-mksysb] [-nomedialib]"     <--exclude optical media

---------------------------------------------------------------------


Oracle Basics

Oracle Server: an Oracle instance + an Oracle database
Oracle Instance: consists of memory and process structures that provide access to the database
Oracle Database: consists of data files, control files and redo log files

Oracle has changed the database naming convention starting with Oracle 12.2. Oracle database 18c (year 2018) is the full release of 12.2.0.2. The recommended database product to target would be 19c as it offers a greater duration of support by Oracle to March 2026.

Beginning with release 12.2.0.2, new releases will be annual. The version will be the last two digits of the release year. The release originally planned as 12.2.0.2 will now be release 18 (for 2018), and the release originally planned as 12.2.0.3 will be release 19. Releases 18 and 19 will be treated as under the umbrella of 12.2 for Lifetime Support purposes.

Instead of Patch Sets, Patch Set Updates, and Database Bundle Patches, the new releases will be maintained with Release Updates (RU) and Release Update Revisions (RUR).

----------------------------

Instance overview

 

SGA (System Global Area)
The SGA is an area of memory allocated when an Oracle Instance starts up, it consists of a Shared Pool, Database Buffer Cache, Redo log buffer cache etc. The SGA's size and function are controlled by parameters in init.ora or spfile.

PGA (Program Global Area)
PGA is a reserved memory for each user process that connects to an Oracle database. The PGA is allocated when a process is created and deallocated when the process is terminated. In contrast to the SGA, which is shared by several processes, the PGA is an area that is used by only one process.

USER PROCESS --> connection established (session)--> SERVER PROCESS --> ORACLE INSTANCE
When a user connects to the Oracle server a user process is created. After the connection is established a server process is started (PGA) which interacts with the Oracle instance during this session. In a dedicated server configuration, one server process is spawned for each connected user process. In a shared server configuration, user processes are distributed among a pre-defined number of server processes.

BACKGROUND PROCESSES:
Database Writer (DBWn): writes dirty blocks (blocks which have been modified) from the DB buffer cache to the data files.
Log Writer (LGWR): writes from the redo log buffer cache to the redo log files (every 3 seconds, after commit, ...)
Archiver (ARCn): backs up (archives) the filled online redo log files before they can be reused
System Monitor (SMON): when the DB is reopened after a failure, SMON performs recovery (uses redo log files to update the database files)
Process Monitor (PMON): when a process fails, PMON cleans up (rolling back transactions, releasing locks, ...)

Log Switch
When an online redo log file is filled up, the Oracle server begins writing to the next online redo log file. The process of switching from one redo log to another is called a log switch. The archiver process (ARCn) initiates backing up (archiving) the filled log files at every log switch. It automatically archives the online redo log before the log can be reused.

Checkpoint (CKPT)
An event called a checkpoint occurs when DBWn writes all the modified database buffers in the SGA, including both committed and uncommitted data, to the data files. At checkpoint the checkpoint number is written into the data file headers and into the control files. Because all the database changes up to the checkpoint have been recorded in the data files, redo log entries before the checkpoint no longer need to be applied to the data files if instance recovery is required.

----------------------------

Database overview


The Oracle database architecture includes logical and physical structures that make up the database:
- physical structure: control files, online redo log files, data files
- logical structure: tablespaces, segments, extents, data blocks

Physical structure:
Control file: during DB creation a control file is created which contains the name of the DB, location of redo log files, timestamp....
Redo log files: they record all changes made to data and provide a recovery mechanism. Usually they are in /oracle/SID directory.
Data Files: Each tablespace consists of one or more files called data files. These are physical files which contain the data in the DB.

Logical structure:
Tablespace: (one or more) data files can be grouped logically into tablespaces (like a VG which can have multiple disks)
Table: the information is stored in tables, which consist of rows and columns (like an LV in the VG)



Data Blocks, Extents, Segments:
Oracle data blocks are the smallest units of storage that the Oracle server can allocate. One data block corresponds to one or more operating system blocks allocated from an existing data file. Above the data blocks are the extents. An extent is a specific number of data blocks that is allocated to store information. When more space is needed, it is allocated in extents. (Like adding one or more PPs to an LV.)

In a tablespace above extents the data is grouped logically into segments. For example, each table's data is stored in its own data segment, while each index's data is stored in its own index segment. Oracle allocates space for segments in extents. Therefore, when the existing extents of a segment are full, Oracle allocates another extent for that segment. Because extents are allocated as needed, the extents of a segment may or may not be contiguous on disk. The segments also can span files, but the individual extents cannot. A segment cannot span tablespaces; however, a segment can span multiple data files that belong to the same tablespace. Each segment is made up of one or more extents.
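
To see this block/extent/segment hierarchy from SQL*Plus, here is a hedged example (SCOTT.EMP is only a placeholder segment; any table you own works):
$ sqlplus / as sysdba
SQL> SELECT segment_name, segment_type, extent_id, blocks
     FROM dba_extents
     WHERE owner='SCOTT' AND segment_name='EMP'
     ORDER BY extent_id;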


----------------------------

PFILE, SPFILE

Oracle parameters are stored in parameter files (pfile, spfile). During startup the parameter file is used to set up the Oracle parameters. A PFILE is a text file, and if it is modified, the instance must be shut down and restarted in order to make the new values effective (/oracle/SID/920_64/dbs/initSID.ora). An SPFILE is a binary file and is maintained by the Oracle server; the ALTER SYSTEM command is used to change the value of instance parameters (/oracle/SID/102_64/dbs/spfileSID.ora). An SPFILE is created from an initSID.ora file (PFILE) using the CREATE SPFILE command. By default, if you do not specify PFILE in your STARTUP command, Oracle uses the server parameter file (SPFILE).
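
A minimal sketch of switching between the two (run from SQL*Plus; the default file locations under $ORACLE_HOME/dbs are assumed):
$ sqlplus / as sysdba
SQL> CREATE SPFILE FROM PFILE;    -- builds spfileSID.ora from initSID.ora
SQL> CREATE PFILE FROM SPFILE;    -- the reverse direction, useful for editing parameters as text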

----------------------------

Database Start/Stop

Starting an instance includes the following tasks:
-reading the initialization file in the following order: spfileSID.ora, if not found then spfile.ora, initSID.ora
-allocating the SGA
-starting the background processes
-opening the alertSID.log

STARTUP command: STARTUP [FORCE] [RESTRICT] [PFILE=filename] [OPEN [RECOVER] [database] | MOUNT | NOMOUNT]
NOMOUNT: creates the SGA and starts up the background processes but does not provide access to the db
MOUNT: mounts the database for certain DBA activities but does not provide user access to the database
OPEN: enables users to access the database

PFILE=file: enables a nondefault parameter file to be used to configure the instance
FORCE: aborts the running instance before performing a normal startup
RESTRICT: enables only users with RESTRICTED SESSION privilege to access the database
RECOVER: begins media recovery when the database starts


It is also possible to open the database in READ WRITE mode, in READ ONLY mode or in restricted mode (RESTRICTED SESSION)

SHUTDOWN command:
NORMAL          no new connections, waits for users to disconnect, db and redo buffers written to disk, db dismounted
TRANSACTIONAL   no new transactions, clients are disconnected when their current transactions end, then an immediate shutdown follows (no recovery needed)
IMMEDIATE       does not wait for current calls to finish, Oracle rolls back active transactions, disconnects clients and dismounts the db (no recovery needed)
ABORT           does not wait and does not roll back... instance recovery is needed, which occurs automatically at the next startup
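
A typical start/stop sequence from SQL*Plus, stepping through the stages described above (a sketch, not a full operational procedure):
$ sqlplus / as sysdba
SQL> STARTUP NOMOUNT              -- SGA allocated, background processes started
SQL> ALTER DATABASE MOUNT;        -- control files opened, DBA tasks possible
SQL> ALTER DATABASE OPEN;         -- users can access the database
SQL> SHUTDOWN IMMEDIATE           -- rolls back, disconnects clients, dismounts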

--------------------------

RMAN

The acronym RMAN stands for Oracle's Recovery Manager, with an emphasis on the word Recovery. Backups are worthless if you can't use them to restore lost data! RMAN is Oracle's recommended standard for database backups for any sized organization. The RMAN utility is an executable file, and its task can be automated by scripts using its command-line version.

--------------------------

Relinking Executables

You can relink the product executables manually by using the relink shell script located in the $ORACLE_HOME/bin directory. You must relink the product executables every time you apply an operating system patch or after an operating system upgrade. Before relinking executables, you must shut down all executables that run in the Oracle home directory that you are relinking. In addition, shut down applications linked with Oracle shared libraries. The relink script does not take any arguments. Depending on the products that have been installed in the Oracle home directory, the relink script relinks all Oracle product executables.

To relink product executables, run the following command:
$ relink

--------------------------

oraenv and coraenv

The oraenv and coraenv scripts are created during installation. These scripts set environment variables based on the contents of the oratab file and provide a central means of updating all user accounts with database changes and a mechanism for switching between databases specified in the oratab file.

The oraenv or coraenv script is usually called from the user's shell startup file (for example, .profile or .login). It sets the ORACLE_SID and ORACLE_HOME environment variables and includes the $ORACLE_HOME/bin directory in the PATH environment variable setting. When switching between databases, users can run the oraenv or coraenv script to set these environment variables.

coraenv script: % source /usr/local/bin/coraenv
oraenv script: $ . /usr/local/bin/oraenv

--------------------------

Listener

There is a special process called the listener, whose responsibility is to listen for incoming connection requests.
There are 3 operating system configuration files under $ORACLE_HOME/network/admin:
listener.ora: configures the listener
tnsnames.ora: contains a list of service names
sqlnet.ora: contains client side information (e.g. client domain...)

lsnrctl: starts/stops the listener

aix11:orap44 2> lsnrctl
...
LSNRCTL> help
The following operations are available
An asterisk (*) denotes a modifier or extended command:

start               stop                status
services            version             reload
save_config         trace               spawn
change_password     quit                exit

Oracle recommends that you reserve a port for the listener in the /etc/services file of each Oracle Net Services node on the network. The default port is 1521. The entry lists the listener name and the port number. For example: oraclelistener 1521/tcp

In this example, oraclelistener is the name of the listener as defined in the listener.ora file. Reserve multiple ports if you intend to start multiple listeners.
If you intend to use Secure Sockets Layer, then you should define a port for TCP/IP with Secure Sockets Layer in the /etc/services file. Oracle recommends a value of 2484. For example: oraclelistenerssl 2484/tcps

--------------------------

Oracle ASM, RAC, Data Guard


ASM (Automatic Storage Management)

ASM is Oracle's recommended storage management solution. Oracle ASM uses disk groups to store data files. A disk group consists of multiple disks and for each ASM disk group, a level of redundancy is defined (normal (mirrored), high (3 mirrors), or external (no ASM mirroring)). When a file is created within ASM, it is automatically striped across all disks allocated to the disk groups. The performance is comparable to the performance of raw devices. ASM allows disk management to be done using SQL statements (such as CREATE, ALTER, and DROP), Enterprise Manager or with command line.

ASM is a single DB instance (as a normal DB instance would be), with its own processes.
# ps -ef | grep asm          <--shows what ASM uses (it has pmon, smon...)

ASM requires a special type of Oracle instance to provide the interface between a traditional Oracle instance and the storage elements presented to AIX. The Oracle ASM instance mounts disk groups to make ASM files available to database instances. An Oracle ASM instance is built on the same technology as an Oracle Database instance. The ASM software component is shipped with the Grid Infrastructure software.
Most commonly used storage objects that are mapped to ASM disks are AIX raw hdisks and AIX raw logical volumes. The disks or logical volumes are presented by special files in the /dev directory:
- Raw hdisks as /dev/rhdisknn or
- Raw logical volume as /dev/ASMDataLVnn

To properly present those devices to the Oracle ASM instance, they must be owned by the Oracle user (chown oracle.dba /dev/rhdisknn) and the associated file permission must be 660 (chmod 660 /dev/rhdisknn). Raw hdisks cannot belong to any AIX volume group and should not have a PVID defined. One or more raw logical volumes presented to the ASM instance could be created on the hdisks belonging to the AIX volume group.
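
Putting the device preparation together, a hedged example (hdisk10 is a placeholder for a candidate ASM disk):
# chown oracle:dba /dev/rhdisk10             <--the raw device must be owned by the Oracle user
# chmod 660 /dev/rhdisk10
# lspv | grep hdisk10                        <--verify: no PVID, not in any volume group
# lsattr -El hdisk10 -a reserve_policy       <--for shared (RAC) disks this should be no_reserve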

For systems that do not use external redundancy, ASM provides its own internal redundancy mechanism and additional high availability by way of failure groups. A failure group, which is a subset of a diskgroup, by definition is a collection of disks that can become unavailable due to a failure of one of its associated components; e.g., controllers or entire arrays.  Thus, disks in two separate failure groups (for a given diskgroup) must not share a common failure component.

In a diskgroup usually 2 failgroups are defined, for mirroring purposes inside ASM. At OS level it looks like:
oradata-PCBDE-rz2-50GB-1     <--Failgroup1
oradata-PCBDE-rz3-50GB-1     <--Failgroup2

In this case storage extension is possible only by 2 disks at a time (from 2 separate storage boxes, in the optimal case), and in a disk group all the disks should have the same size. When you have 2 disks in a failgroup and you create a 50GB tablespace, ASM stripes it across the disks (25-25GB on each disk). When you add 2 more disks, ASM starts rebalancing the data, so you will have 4x12.5GB on each disk.

If hdisks are not part of an AIX volume group, their PVIDs can be cleared using the chdev command:
# chdev –l hdiskn –a pv=yes
# chdev –l hdiskn –a pv=clear

PVIDs are physically stored in the first 4k block of the hdisk, which happens to be where Oracle stores the ASM, OCR and/or Voting disk header. For ASM managed disks hdisk numbering is not important. Some Oracle installation documentation recommends temporarily setting PVIDs during the install process (this is not the preferred method). Assigning or clearing a PVID on an existing ASM managed disk will overwrite the ASM header, making data unrecoverable without the use of KFED (See Metalink Note #353761.1)

AIX 5.3 TL07 (and later) has a specific set of Oracle ASM related enhancements. The "mkvg" and "extendvg" commands now check for the presence of an ASM header before writing PVID information on an hdisk. The command will fail and return an error message if an ASM header signature is detected:
0516-1339 /usr/sbin/mkvg: Physical volume contains some 3rd party volume group.
0516-1397 /usr/sbin/mkvg: The physical volume hdisk3, will not be added to the volume group.
0516-862 /usr/sbin/mkvg: Unable to create volume group.

The force option (-f) will not work for an hdisk with an ASM header signature. If an hdisk formerly used by ASM needs to be used for another purpose, the ASM header area can be cleared using the AIX "dd" command:
# dd if=/dev/zero of=/dev/rhdisk3 bs=4096 count=10

Using the chdev utility with the pv=yes or pv=clear operation does not check for an ASM signature before setting or clearing the PVID area.
AIX 6.1 TL06 and AIX 7.1 introduced a rendev command that can be used for permanent renaming of the AIX hdisks.

ASM devices have a header which contains an ASM ID. To extract it, do:
# dd if=/dev/$disk bs=1 skip=72 count=32 2>/dev/null
These IDs can be used to map the old and the new devices and therefore create new ASM device files which point to the correct, new disks.
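
A small sketch to dump the ASM ID of several candidate disks in one go (the disk names are placeholders):
# for d in hdisk4 hdisk5 hdisk6; do echo "$d: $(dd if=/dev/$d bs=1 skip=72 count=32 2>/dev/null)"; done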


rendev, lkdev
An ASM disk has no PVID and so looks like it's unassigned. An AIX admin can therefore mistakenly think the disk is free and add it to a volume group, thus destroying data. Use rendev to rename ASM hdisks to something more obviously ASM, e.g. hdiskasm5, and if necessary update the Oracle ASM device scan path. Also, lkdev can be used as an extra level of protection. The "lkdev" command is used to lock the disk to prevent the device from inadvertently being altered by a system administrator at a later time. It locks the device so that any attempt to modify the device attributes (chdev, chpath) or remove the device or one of its paths (rmdev, rmpath) will be denied. The ASM header name can also be added as a comment when using lkdev, to make it even more obvious.

# rendev -l hdisk4 -n hdiskASMd01
# lkdev -l hdiskASMd01 -a -c OracleASM


mknod (old)
If rendev is not available, device files are created in /dev using "mknod /dev/asm_disk_name c maj min" to have the same major and minor number as the disk device to be used. The Oracle DBA will use these device names created with mknod.
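
A hedged sketch of the old mknod method (the major/minor numbers 18 and 5 are placeholders; take the real ones from ls -l of the rhdisk device):
# ls -l /dev/rhdisk5                         <--note the major and minor numbers of the character device
# mknod /dev/asm_disk01 c 18 5               <--same major/minor as /dev/rhdisk5
# chown oracle:dba /dev/asm_disk01
# chmod 660 /dev/asm_disk01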

--------------------------------------------

Oracle Clusterware 

Starting with Oracle Database 11g Release 2, Oracle has packaged Oracle Clusterware, Automatic Storage Management and the listener as a single package called "Oracle Grid Infrastructure".

Oracle Clusterware provides basic clustering services at the operating system level, it is the technology that transforms a server farm into a cluster. Theoretically Oracle Clusterware can be used to provide clustering services to other applications (not Oracle).

With Oracle Clusterware you can provide a cold failover cluster to protect an Oracle instance from a system or server failure. The basic function of a cold failover cluster is to monitor a database instance running on a server, and if a failure is detected, to restart the instance on a spare server in the cluster. Network addresses are failed over to the backup node. Clients on the network experience a period of lockout while the failover takes place and are then served by the other database instance once the instance has started.

It consists of these components:
crsd (Cluster Ready Services):
It manages resources (start/stop of services, failovers...), it requires public and private interfaces and the Virtual IP (VIP) and it runs as root. Failure of the CRS daemon can cause node failure and it automatically reboots nodes to avoid data corruption because of the possible communication failure between the nodes.

ocssd (Oracle Cluster Synchronization Services): 
It provides synchronization between the nodes, and manages locking and runs as oracle user. Failure of ocssd causes the machine to reboot to avoid split-brain situation. This is also required in a single instance configuration if ASM is used.

evmd (Event Management Logger):
The Event Management daemon spawns a permanent child process called "evmlogger" and generates the events when things happen. It will restart automatically on failures, and if evmd process fails, it does not halt the instance. Evmd runs as "oracle" user.

oprocd:
Oprocd provides I/O Fencing solution for the Oracle Clusterware. (Fencing is isolating a node when it is malfunctioning.) It is the process monitor for the oracle clusterware. It runs as "root" and failure of the Oprocd process causes the node to restart. (log file is in /etc/oracle/oprocd)


Important components at storage side:
-OCR (Oracle Cluster Repository/Registry)
Any resource that is going to be managed by Oracle Clusterware needs to be registered as a CRS resource, and then CRS stores the resource definitions in the OCR.
It is a repository of the cluster, which is a file (disk) in ASM (ocr-rz4-256MB-1).
crsstat       <--this will show what OCR consists of
ocrcheck      <--shows OCR disks


-VOTE DISK:
It is a file (disk) in ASM that manages node membership. It is needed to have the necessary quorum (ora_vot1_raw_256m). 3 disks are needed; in the optimal case every disk is from a different storage box. If you don't have 3 storage boxes, then create the vote disks on 2 boxes and do an NFS mount to the RAC nodes for the 3rd voting disk.
crsctl query css votedisk      <-- shows vote disks

vote disk movement:
create a new voting disk device then: dd if=/dev/<old device> of=/dev/<new device> bs=4096

Oracle Clusterware provides seamless integration with Oracle Real Application Clusters (Oracle RAC) and Oracle Data Guard. (A RAC environment uses shared storage, while in a Data Guard setup each node has its own separate storage.)

Checking CRS network topology:
# /ora_u02/app/oracle/product/crs/bin/oifcfg getif -global
en7  199.206.206.32  global  public
en11  112.24.254.8  global  cluster_interconnect


--------------------------------------------


RAC (Real Application Cluster)

RAC is based on Oracle Clusterware and in a RAC environment, two or more computers (each with an instance) concurrently access a single database. This allows an application or user to connect to either computer and have access to the data. It combines the processing power of multiple interconnected computers to provide system redundancy and scalability. Unlike the cold cluster model where one node is completely idle, all instances and nodes can be active to scale your application.




ASM with RAC:


With the release of 12cR1 & 12cR2 Oracle no longer supports the use of raw logical volumes with the DB and RAC (see My Oracle Support note “Announcement of De-Support of using RAW devices in Oracle Database Version 12.1” (Doc ID 578455.1)). Oracle continues to support the coexistence of PowerHA with Oracle clusterware.

If using a file system for your Oracle Database 12c RAC data files (rather than ASM), you'll need to use a cluster file system. Oracle ACFS allows file system access by all members in a cluster at the same time. That requirement precludes JFS and JFS2 from being used for Oracle Database 12c RAC data files. IBM Spectrum Scale is an Oracle RAC 12c certified cluster file system.


Finding out the nodes of RAC (olsnodes):
(As oracle user "crstat -t" should work as well)
# /u02/app/oracle/product/10.2/crs/bin/olsnodes
aix001-ora-rac1
aix002-ora-rac2

In Oracle RAC versions prior to 11.2, when a node gets rebooted due to scheduling problems, the process which initiates the reboot is oprocd. When the oprocd process reboots the node there should be only one entry in errpt (SYSTEM SHUTDOWN BY USER). There should not be a 'SYSDUMP' entry since oprocd does not initiate a sysdump. A 'SYSDUMP' entry is an indication that other problems may be the root cause of node reboots.

In Oracle RAC 11g Release 2, severe operating system scheduling issues are detected by the Oracle cssdagent and cssmonitor processes and the node is rebooted.

Files to check if oprocd or css... rebooted the node:
before 11: /etc/oracle/oprocd/<node>.oprocd.lgl.<time stamp>
11GR2: /etc/oracle/lastgasp/cssagent_<node>.lgl, /etc/oracle/lastgasp/cssmonit_<node>.lgl 

In the ocssd.log file on the other node (not on the node which was rebooted) there could be entries like these:
# tail -200 /pscon_u01/app/oracle/product/crs/log/aix12/cssd/ocssd.log
[    CSSD]2010-05-18 01:13:53.446 [4114] >WARNING: clssnmPollingThread: node aix11 (1) at 90% heartbeat fatal, eviction in 1.047 seconds
[    CSSD]2010-05-18 01:13:54.439 [4114] >WARNING: clssnmPollingThread: node aix11 (1) at 90% heartbeat fatal, eviction in 0.054 seconds
[    CSSD]2010-05-18 01:13:54.493 [4114] >TRACE:   clssnmPollingThread: Eviction started for node aix11 (1), flags 0x040f, state 3, wt4c 0
..
[    CSSD]2010-05-18 01:13:54.551 [2829] >TRACE:   clssnmDiscHelper: aix11, node(1) connection failed, con (1112cb1f0), probe(0)
[    CSSD]2010-05-18 01:13:54.551 [2829] >TRACE:   clssnmDeactivateNode: node 1 (aix11) left cluster

Oracle RAC clusterware has strict timeout requirements for VIP address failover in case of a public network failure. When DNS servers are unreachable due to a public network failure, DNS name resolution calls such as getaddrinfo may hang for the default AIX query timeout duration of 5 minutes. Name resolution calls made by Oracle processes can thus delay the VIP failover. To reduce such delays, the DNS query timeout can be reduced to 1 minute, by adding the following options line in /etc/resolv.conf for all RAC cluster nodes:
"options timeout:1"

No reboot is necessary to activate this change. If you need even faster VIP failover the timeout can be further reduced to a value of 0; provided your network infrastructure (network and DNS servers) has the speed to serve name queries within a few (5-6) seconds. If you use a value of 0 for timeout and your DNS or network is slow to respond, DNS name lookups will start to fail prematurely.
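
A resolv.conf sketch with the reduced timeout (the nameserver addresses are placeholders):
# cat /etc/resolv.conf
nameserver 10.1.1.10
nameserver 10.1.1.11
options timeout:1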

--------------------------------------------

Oracle RAC IPs

Summary:
-At least 2 NICs will be needed and /etc/hosts should contain private, public, and virtual IP addresses
-Configure them with Public and Private IPs (ifconfig will show these)
-DNS registration for: Public, VIP
(The virtual IPs do not have to be configured with ifconfig; VIPCA takes care of them.)

Public IP: (server IP address from OS side)
- DNS registrations + IP configuration for AIX (as usual)
- servers in cluster should be in same subnet

Virtual IP: (VIP is used by Oracle for RAC failover)
-same subnet as Public IP
-DNS registration needed (not needed to be configured during installation, RAC will take care of them)
-same interface name on each node (like en2)

Private IP: (for RAC heartbeat)
-separate interface from public IP,
-same interface name on each node (like en1)
-separate network from public IP (something like 192.168...)
-no DNS registration

SCAN IP: (Single Client Access Name, managed by Oracle, so users can use only 1 name to reach cluster)
(SCAN works by replacing a hostname or IP list with virtual IP addresses (VIP))
- DNS registration: single DNS domain name that resolves to all of the IP addresses in your RAC cluster (one for each node)
- not needed to be configured during install, RAC will do it
- in /etc/hosts, looks something like this: myscan.mydomain.com IN A 122.22.22.22 IN A 122.22.22.23 IN A 122.22.22.24

aix-sd31:
en0: 10.4.31.254     aix-sd31                  <--Public (DNS)
en0: 10.4.31.25      aix-sd31-vip              <--Virtual IP (DNS)
en0: RACD001.domain.com 10.4.31.26             <--SCAN IP 1 (DNS)
en0: RACD001.domain.com 10.4.31.27             <--SCAN IP 2 (DNS)
en0: RACD001.domain.com 10.4.31.28             <--SCAN IP 3 (DNS)
en1: 169.254.214.76  aix-sd31-priv             <--Private IP


IMPORTANT
!!!!! FOR ORACLE RAC BUILDS PASSWORDLESS SSH IS NEEDED TO LOCALHOST AS WELL !!!!!
!!!!! FOR THE ORACLE AND GRID USERS THE LOCAL AND THE OTHER NODE'S PUBLIC KEY SHOULD BE IN THE AUTHORIZED_KEYS FILE !!!!!
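
A hedged sketch of setting this up for one user (run as oracle, then repeat for grid; racnode2 is a placeholder for the other node):
$ ssh-keygen -t rsa                                             <--accept the defaults, empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys               <--own key, so ssh to localhost works
$ ssh racnode2 cat .ssh/id_rsa.pub >> ~/.ssh/authorized_keys    <--append the other node's public key
$ ssh localhost date; ssh racnode2 date                         <--both must return without a password prompt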


--------------------------------------------

Data Guard 

A Data Guard configuration consists of the primary database that contains the original data and any copy of that data in separate databases (on different servers) that are kept in synch with the primary. In 11gR2 it can consist of up to 30 databases, in any combination of RAC, non-RAC, physical, logical, or snapshot.

In this setup it can be used for failover for the primary database or the copies of the production data can be used in read-only mode for reporting purposes etc.

Transitions from one database role to another are called switchovers (planned events) or failovers (unplanned events), where Data Guard can actually execute all of the tasks of the transition with just a few commands.

Data Guard broker is itself a background Oracle monitor process (DMON) that provides a complex set of role management services governing all of the databases in a configuration.  This broker controls the redo transport and is accountable for transmitting defect-free archive logs from any possible archive location. The Log Apply Services within Data Guard are responsible for maintaining the synchronization of transactions between the primary and standbys.


Data Guard does not use shared storage; it is most applicable for DR scenarios.

Finding out Data Guard primary (prod) or standby (shadow) node:

# ps -ef | grep mrp
  orap02 5496874       1   2   Jan 28      - 291:26 ora_mrp0_P02     <--if you see this process then it is the standby (mrp: media recovery process)

--------------------------------------------

Oracle - Tuning


Resource Limits

ulimits: use smit chuser or edit /etc/security/limits to create a stanza for the Oracle/grid user and set -1 (unlimited) for everything except core.

oracle:
 data = -1
 stack = -1
 fsize_hard = -1
 cpu_hard = -1
 data_hard = -1
 stack_hard = -1
 fsize = -1
 nofiles = -1
 cpu = -1
 rss = -1

Soft File Descriptors: at least 1024
Hard File Descriptors: at least 65536

maxuproc: maximum number of PROCESSES allowed per user (smit chgsys). Set this value to 16384 (16k)
ncargs: 128
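
A sketch of setting these from the command line (user name and values as discussed above; adjust to your own standards):
# chuser fsize=-1 fsize_hard=-1 data=-1 data_hard=-1 stack=-1 stack_hard=-1 cpu=-1 cpu_hard=-1 rss=-1 nofiles=-1 oracle
# chdev -l sys0 -a maxuproc=16384            <--maximum processes per user
# chdev -l sys0 -a ncargs=128                <--ARG/ENV list size in 4KB blocks
# lsattr -El sys0 -a maxuproc -a ncargs      <--verify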

--------------------------------------

IO (FC adapter, disk)

FC adapter:
max_xfer_size should be increased from default 1MB to 2MB. The default adapter DMA memory size is 16 MB which increases to 128 MB when a non default max_xfer_size is used. Larger DMA size can be important for performance with many concurrent large block I/Os.

num_cmd_elems might need to be increased if fcstat -e reports a persistent nonzero value for No Command Resource Count. If fcstat -e reports a persistent, non-zero value for No DMA Resource Count, contact support.
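
A hedged example of checking and raising these values (fcs0 is a placeholder; repeat per adapter):
# lsattr -El fcs0 -a max_xfer_size -a num_cmd_elems       <--current values
# fcstat -e fcs0 | grep -i "No Command Resource"          <--a persistently increasing count suggests raising num_cmd_elems
# chdev -l fcs0 -a max_xfer_size=0x200000 -P              <--2MB; -P defers the change until the adapter is reconfigured (e.g. reboot)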

Disk:
queue wait and queue overflow detected through iostat -Dl might indicate a need to increase queue depth. max_transfer might need to be adjusted upward depending on the largest I/O requested by Oracle (a typical starting point for Oracle on AIX is 0x100000 (1 MB)).

As of AIX 5.3, the optimal setting for LTG size is dynamically calculated during the varyonvg process and does not need to be manually set. The varyonvg '-M' parameter should not be used as it will override the dynamically calculated LTG size. It is recommended that all hdisks within a given VG have the same 'max_transfer' (and other attribute) values. In order to change hdisk attribute values, any associated filesystems should be unmounted and the VG varied off.
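
A sketch of the disk-level checks and changes discussed above (hdisk10 and the values are placeholders; remember the unmount/varyoff note, or use -P and reboot):
# iostat -Dl 5 3                                          <--watch avgwqsz and sqfull in the queue section
# lsattr -El hdisk10 -a queue_depth -a max_transfer       <--current values
# chdev -l hdisk10 -a queue_depth=32 -a max_transfer=0x100000 -P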

ASM considerations for standalone Oracle 11gR2:
ASM will use asynchronous I/O by default, so filesystemio_options=ASYNC (default) is appropriate. For clustered ASM (e.g. RAC) configurations, SCSI reservation must be disabled on all ASM hdisk and hdiskpower devices (e.g. reserve_policy=no_reserve). The standalone use of ASM, hdisks and hdiskpower devices does not need to have SCSI reservation disabled.

--------------------------------------

IO (VG, LV, FS)


VG should be created as scalable VG. If ASM is not used, max interpolicy striping (pp spreading) is suggested when logical volumes are created. To get the most benefit from spreading physical partitions across the LUNs, use a small physical partition size, for example, 32 MB or 64 MB.

Buffered file I/O on JFS2
The default filesystemio_options=ASYNC, which means all data spaces, redo log file systems, and control file systems are using the kernel buffers rather than writing directly to disk. In this case, it does not matter whether redo log file systems and control file systems are 512 b or 4 KB block size file systems. Oracle on AIX best performance is, however, usually achieved using CIO (though there are exceptions).

Concurrent I/O (CIO) on JFS2
Set the Oracle parameter filesystemio_options=SETALL, or mount the filesystems with the CIO option. It is not necessary to both set SETALL and mount the filesystems with the CIO option, although no harm is done either way. Metalink note 272520.1 indicates that mounting with CIO is needed, while IBM believes it is not needed. IBM is working with Oracle to fix the Metalink note.

If using CIO with SETALL, a CIO mount, or both, you must create separate file systems for redo logs and control files (or a single filesystem for both), with an agblksize of 512 rather than the default 4 KB. The ioo parameters aio_fsfastpath and posix_aio_fsfastpath accelerate CIO; they are enabled by default in AIX 6.1 and 7.1.
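
A hedged example of creating such a redo log filesystem (the volume group name, size and mount point are placeholders):
# crfs -v jfs2 -g oravg -m /oracle/SID/origlogA -a size=8G -a agblksize=512 -a logname=INLINE
# mount /oracle/SID/origlogA
# lsfs -q /oracle/SID/origlogA                            <--verify the 512 byte agblksize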

With AIX 6.1, IBM introduced a new open flag, O_CIOR, which is the same as O_CIO but allows subsequent open calls without CIO. The advantage of this enhancement is that other applications like cp, dd, cpio, and dbv can access database files in read-only mode without having to open them with CIO. Starting with Oracle 11.2.0.2, when AIX 6.1 is detected, Oracle uses the O_CIOR option to open a file on JFS2. Therefore you should no longer mount the filesystems with the mount option "-o cio". (The mount option noatime, suggested for Oracle 10g binaries, is fixed in 11.2.0.2.)

IBM mount advice for database files:
- Data files: Use CIO via filesystemio_options=SETALL and the default agblksize (4k); mount with no options.
- Redo logs: Create with agblksize of 512 and mount with no options. With SETALL, Oracle does direct I/O for redo logs.
- Control files: Create with agblksize of 512 and mount with no options. With SETALL, Oracle does direct I/O for control files.
- Archive logs: Mount -o rbrw. Do not use CIO; use the jfs2 rbrw option.
- Dumps: Mount -o rbrw. (See the example after this list.)
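
A hedged example of creating and mounting such filesystems; the VG, LV and mount point names are placeholders and the size is illustrative only:
# crfs -v jfs2 -g INSToriglogvg -m /oracle/SID/origlogA -A yes -a size=8G -a agblksize=512 -a logname=INLINE    (512-byte agblksize + inline log for redo)
# mount /oracle/SID/origlogA
# mount -o rbrw /oracle/SID/oraarch                              (archive log filesystem mounted with the jfs2 release-behind option instead of CIO)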

General rules:
- All vgs scalable
- LUNs no larger than 500G
- Preferred number of LUNs in a VG: 10 or more (with exceptions, see later); minimum 4 for extra small DBs/VGs (around 100 GB)
- PP size preferably no larger than 32 MB (16 MB for LUNs smaller than 250 GB, 32 MB for 250 GB to 500 GB)
- All LVs with "maximum allocation"
- All jfs2 filesystems with INLINE log
- Filesystems for online redo logs formatted with 512 fragment size

Rules for high volume DBs
- Extra vgs for online log and mirror log – INSToriglogvg, INSTmirrlogvg – small LUNs, minimum number of LUNS in vg at least 4
- Extra vgs for highest volume tablespaces, same rules as for general vgs apply – 1 filesystem per vg

--------------------------------------

File System Options

The DIO and CIO features included in AIX improve file system performance to a level comparable to raw logical volumes. Before Oracle Database 11g, DIO and CIO could not be enabled at the file level on JFS/JFS2. Therefore, the Oracle home directory and data files had to be placed in separate file systems for optimal performance. The Oracle home directory was placed on a file system mounted with default options, with the data files and logs on file systems mounted using the dio or cio options.

With Oracle Database 11g, you can enable DIO and CIO on JFS/JFS2 at the file level by setting the FILESYSTEMIO_OPTIONS parameter in the server parameter file to setall or directIO. This enables CIO on JFS2 and DIO on JFS for all data file Input-Output. Because the directIO setting disables asynchronous
Input-Output, it should normally not be used. As a result of this 11g feature, you can place data files on the same JFS/JFS2 file system as the Oracle home directory and still use DIO or CIO for improved performance.

However, you should still place Oracle Database logs on a separate JFS2 file system for optimal performance. The optimal configuration is to create the file system with the agblksize=512 option and to mount it with the cio option. Redo is a natural bottleneck for high-update databases because the Oracle redo disks must accept the sum of all disk update rates. Once redo logs and disks are optimized, the only way to relieve a redo bottleneck is faster redo storage.


For improved performance, create separate file systems for redo logs and control files (or a single file system for both), with an agblksize of 512 bytes rather than the default of 4 KB.

Note: To use the Oracle RAC option, you must place data files on an ASM disk group or on a GPFS file system. You cannot use JFS or JFS2. DIO is implicitly enabled when you use GPFS.

--------------------------------------

Asynchronous I/O

Asynchronous I/O (AIO) allows a program to initiate an I/O operation and then continue with other work in parallel with the I/O operation. Oracle Database 12c often requires multiple server and user processes running at the same time, and therefore takes full advantage of the AIO services provided by AIX. AIO is implemented with AIO server processes. The configuration values minservers, maxservers and maxreqs control the AIO server configuration of AIX.

AIO kernel extensions are loaded at system boot (always loaded), AIO servers stay active as long as there are service requests, and the number of AIO servers is dynamically increased or reduced based on demand of the workload. The aio_server_inactivity parameter defines after how many seconds idle time an AIO server will exit. AIO tunables are now based on logical CPU count, and hence it is usually not necessary to tune minservers, maxservers, and maxreqs as in the past.
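
On AIX 6.1 and 7.1 these AIO settings are exposed as ioo tunables; a quick way to review them (display only, the values shown are whatever your system reports):
# ioo -a | grep aio                       (aio_minservers, aio_maxservers, aio_maxreqs, aio_server_inactivity, posix_aio_* ...)
# ioo -L aio_maxservers                   (show current, default and possible values for a single tunable)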

For Oracle Database 12c, the database defaults to asynchronous I/O (AIO) enabled and concurrent I/O (CIO) disabled. In general, a good starting point is to set filesystemio_options=setall in your init*.ora configuration file. This setting enables AIO (which is the default) and CIO operation. CIO operation is built upon direct I/O (DIO) with the additional function of inode locking. Note that there may be workloads (e.g. sequential reads) where cached I/O performs better than CIO.
When using CIO/DIO, the Oracle setting of DB_FILE_MULTIBLOCK_READ_COUNT (the maximum number of blocks read in one I/O operation during a sequential scan) needs to be considered. Also, the alignment of the database blocksize and the file system block size (agblksize) has to be considered.

From Oracle Database 11g Release 2 version 11.2.0.2 and later, Oracle opens the files using "O_CIOR" which is similar to "O_CIO", but allows subsequent open calls without CIO, so that you no longer need to mount the JFS2 filesystems with mount option "-o cio" and other OS tools and third part tools can access the database files without any issues.

To display the number of asynchronous Input-Output servers running, enter the following commands as the root user:
# pstat -a | grep -c aios
# ps -k | grep aioserver

Check the number of active asynchronous Input-Output servers periodically, and change the values of the minservers and maxservers parameters if required. The
changes take place when the system is restarted.

--------------------------------------

IOCP (I/O Completion Ports)

On AIX on POWER systems, enable I/O completion ports (IOCP) to ensure successful database and grid infrastructure installation.
To check if the IOCP module is enabled, run the following command and look for status "Available" in the output,

$ lsdev | grep iocp
iocp0 Available  I/O Completion Ports

If IOCP is in the "Defined" state, enable it (using smitty, or with the commands below).

Activate iocp:
# lsdev -Cc iocp; lsattr -El iocp0
# mkdev -l iocp0; chdev -l iocp0 -P -a autoconfig='available'

--------------------------------------

Oracle Block Size

During read operations, entire operating system blocks are read from the disk. If the database block size is smaller than the operating system file system block size, then Input-Output bandwidth is inefficient. If you set the Oracle Database block size to be a multiple of the file system block size, then you can increase performance by up to 5 percent. The DB_BLOCK_SIZE initialization parameter sets the database block size. However, to change the value of this parameter, you must re-create the database. To see the current value of the DB_BLOCK_SIZE parameter, run the SHOW PARAMETER DB_BLOCK_SIZE command in SQL*Plus.

You can configure the Oracle Database block size for better Input-Output throughput. On AIX, you can set the value of the DB_BLOCK_SIZE initialization parameter to between 2 KB and 32 KB, with a default of 4 KB. For databases on raw partitions, the Oracle Database block size is a multiple of the operating system physical block size (512 bytes on AIX). Oracle recommends smaller Oracle Database block sizes (2 KB or 4 KB) for online transaction processing or mixed workload environments and larger block sizes (8 KB, 16 KB, or 32 KB) for decision support system workload environments.

--------------------------------------

Log Archive Buffers

By increasing the LOG_BUFFER size, you may be able to improve the speed of archiving the database, particularly if transactions are long or numerous. Monitor the log file Input-Output activity and system throughput to determine the optimum LOG_BUFFER size. Tune the LOG_BUFFER parameter carefully to ensure that the overall performance of normal database activity does not degrade.

--------------------------------------

Server Side Caching

Server-side caching is a new feature introduced in AIX 7.1 TL04 SP02 and AIX 7.2. This feature is supported for use with Oracle Database to improve the performance of read I/O intensive workloads on AIX. Server-side caching provides the capability to cache application data stored in the SAN on solid state devices (SSD), Flash Storage LUNs, or virtual disks provided by VIOS on the AIX server. After server-side caching is enabled in AIX, all read I/O requests are first redirected to the caching area created on the fast SSDs, Flash Storage or VIOS virtual disk on the server. This feature can be enabled or disabled dynamically; no reboot is required and the changes are transparent to the running application or workload. This works only in Oracle Database non-RAC environments.
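
The feature is managed with the cache_mgt command; a hedged sketch of the typical flow, with hypothetical disk and partition names (verify the exact syntax against the cache_mgt documentation for your AIX level):
# cache_mgt device list                                   (list devices that can be used as cache)
# cache_mgt pool create -d hdisk1 -p cmpool0              (create a cache pool on an SSD/flash hdisk)
# cache_mgt partition create -p cmpool0 -s 80G -P part1   (carve a cache partition out of the pool)
# cache_mgt partition assign -t hdisk2 -P part1           (assign the cache partition to the target SAN hdisk)
# cache_mgt cache start -t hdisk2                         (start caching; cache_mgt cache stop disables it dynamically)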

--------------------------------------

Write Behind

The write behind feature enables the operating system to group write Input-Output together, up to the size of a partition. You can improve performance by doing this,
because the number of Input-Output operations is reduced. The file system divides each file into 16 KB partitions to increase write performance, limit the number of dirty pages in memory, and minimize disk fragmentation. The pages of a particular partition are not written to disk until the program writes the first byte of the next 16KB partition. To set the size of the buffer for write behind to eight 16 KB partitions, enter the following command:
# /usr/sbin/vmo -o numclust=8

--------------------------------------

Sequential Read Ahead

Note: The information in this section applies only to file systems, and only when neither DIO nor CIO are used.

The VMM anticipates the need for pages of a sequential file. It observes the pattern in which a process accesses a file. When the process accesses two consecutive pages of the file, the VMM assumes that the program continues to access the file sequentially, and schedules additional sequential reads of the file. These reads overlap the program processing and make data available to the program faster. The following VMM thresholds, implemented as kernel parameters, determine the number of pages it reads
ahead:
- minpgahead: it stores the number of pages read ahead when the VMM first detects the sequential access pattern.
- maxpgahead: it stores the maximum number of pages that VMM reads ahead in a sequential file.

Set the minpgahead and maxpgahead parameters to appropriate values for the application. The default values are 2 and 8 respectively. Use the ioo command to change these values (older documentation refers to vmo/vmtune; on current AIX levels these are ioo tunables). You can use higher values for the maxpgahead parameter in systems where the sequential performance of striped logical volumes is of paramount importance.
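
A minimal check/set sketch (the value shown is only an example; on JFS2 the equivalent tunables are j2_minPageReadAhead and j2_maxPageReadAhead):
# ioo -a | egrep "pgahead|PageReadAhead"       (display current JFS and JFS2 read-ahead values)
# ioo -p -o maxpgahead=16                      (-p makes the change persistent across reboots)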

--------------------------------------

Disk IO Pacing

Disk IO pacing is an AIX mechanism that makes it possible to limit the number of pending IO requests to a file. This prevents disk IO intensive processes from saturating the CPU, so the response time of interactive and CPU-intensive processes does not deteriorate. You can achieve disk IO pacing by adjusting two system parameters: the high-water mark and the low-water mark. When a process writes to a file that already has high-water mark pending IO requests, the process is put to sleep. The process wakes up when the number of outstanding IO requests falls to or below the low-water mark.

You can use the smit command to change the high and low-water marks. Determine the water marks through trial-and-error. Use caution when setting the water marks, because they affect performance. Tuning the high and low-water marks has less effect on disk Input-Output larger than 4 KB.
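
The system-wide water marks are the sys0 attributes maxpout and minpout; a small sketch (the values shown are the AIX 6.1/7.1 defaults, used here only as an example):
# lsattr -El sys0 -a maxpout -a minpout            (display the current high/low water marks)
# chdev -l sys0 -a maxpout=8193 -a minpout=4096    (change them dynamically; they can also be set per filesystem as mount options)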

You can determine disk IO saturation by analyzing the result of iostat, in particular, the percentage of iowait and tm_act. A high iowait percentage combined
with high tm_act percentages on specific disks is an indication of disk saturation. (A high iowait alone is not necessarily an indication of an Input-Output bottleneck.)

--------------------------------------

IOO tunables j2_nBufferPerPagerDevice and j2_dynamicBufferPreallocation

Do not change these unless there is a high delta in the vmstat -v counter "external pager filesystem I/Os blocked with no fsbuf". If this value keeps growing, first increase
j2_dynamicBufferPreallocation from 16 (16k slabs) to 32 and monitor. If that does not help, then consider raising j2_nBufferPerPagerDevice,
which is the starting value for dynamic buffer allocation.
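
A hedged way to watch and adjust this (the value 32 is just the next step suggested above):
# vmstat -v | grep "filesystem I/Os blocked"           (watch the delta of this counter over time, not its absolute value)
# ioo -p -o j2_dynamicBufferPreallocation=32           (only if the counter keeps growing; monitor again afterwards)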

Do not change AIX restricted tunables without advice from IBM AIX support. In AIX 6.1, j2_nBufferPerPagerDevice is a restricted tunable, while j2_dynamicBufferPreallocation is not.

Here are some default values for three ioo parameters:
- j2_dynamicBufferPreallocation=128
- numfsbufs=1024 (legacy jfs)
- maxpgahead=16 (legacy jfs)

--------------------------------------

Resilvering (mirroring) with Oracle Database

If you disable mirror write consistency for an Oracle data file allocated on a raw logical volume, then the Oracle Database crash recovery process uses resilvering to
recover after a system failure. This resilvering process prevents database inconsistencies or corruption.

During crash recovery, if a data file is allocated on a logical volume with multiple copies, then the resilvering process performs a checksum on the data blocks of all the copies. It then performs one of the following:
- If the data blocks in a copy have valid checksums, then that copy is used to update the copies that have invalid checksums.
- If all copies have blocks with invalid checksums, then the blocks are rebuilt using the redo log file.

On AIX, the resilvering process works only for data files allocated on raw logical volumes for which mirror write consistency is disabled. Resilvering is not required for data files on mirrored logical volumes with mirror write consistency enabled, because mirror write consistency ensures that all copies are synchronized. If the system fails while mirror write consistency is disabled, then run the syncvg command to synchronize the mirrored logical volume before starting Oracle Database.

Note: If a disk drive fails, then resilvering does not occur. You must run the syncvg command before you can reactivate the logical volume. Oracle supports resilvering for data files only. Do not disable mirror write consistency for redo log files.

--------------------------------------

Paging space

Oracle documentation suggests the following values as a starting point for an Oracle Database:

RAM                       Swap space
Between 1 GB and 2 GB     1.5 times the size of RAM
Between 2 GB and 16 GB    Equal to the size of RAM
More than 16 GB           16 GB
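
The currently configured paging space can be checked (and grown) with standard AIX commands; the paging space name and size below are only examples:
# lsps -a                  (list all paging spaces with their size and %used)
# lsps -s                  (summary: total paging space and percentage used)
# chps -s 8 hd6            (add 8 logical partitions to the hd6 paging space if it needs to grow)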


--------------------------------------
--------------------------------------
--------------------------------------
--------------------------------------
--------------------------------------


MEMORY

In general, AIX support suggests AIX 7.1 defaults for Oracle.



--------------------------------------

Oracle Large Page Usage

The general recommendation for most Oracle databases on AIX is to utilize 64KB page size and not 16MB page size for the SGA.

AIX 6.1 and 7.1 support three or four page sizes, depending on the hardware: 4 KB (default), 64 KB (medium), 16 MB (large), and 16 GB (huge).
Page sizes of 64 KB and 16 MB have been shown to benefit Oracle performance by reducing kernel lookaside processing to resolve virtual to physical addresses. Oracle 11g uses 64 KB pages for dataspaces by default. Oracle Automatic Memory Management (AMM) uses the 64 KB page size by default for the SGA and database (with the exception of the TNS Listener). This is the suggested value, since 64 KB pages have been found to yield nearly the same performance benefit as 16 MB pages while requiring no special management.

A 64 KB page size for data, text, and stack regions is useful in environments with a large SGA (for example, 64 GB or more) and many online transaction processing (OLTP) users. For smaller Oracle instances, 4 KB is sufficient for data, text, and stack. 64 KB page use for data, text, and stack is implemented separately from 64 KB pages for the SGA, by means of an environment variable exported in the oracle user's environment (AME by default uses a 4 KB page size):
$ export LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K
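
To verify which page sizes the system supports at all (output is hardware dependent; the values below are typical for a POWER system supporting 4 KB, 64 KB and 16 MB pages):
$ pagesize -a
4096
65536
16777216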


--------------------------------------

SGA tuning

LOCK_SGA = FALSE is the default, which means the SGA is not pinned in memory. AIX performance support generally suggests not pinning the SGA. This is the suggested value, since it has been found that 64 KB pages provide nearly the same performance benefit as 16 MB pages, require no special management, and minimize the potential negative impact of an incorrectly configured SGA size.

Some additional info:
Oracle versions prior to 10.2.0.4 will allocate only two types of pages for the SGA, 4KB and 16MB. The SGA initialization process during the startup of the instance will try to allocate 16MB pages for the shared memory if LOCK_SGA is set to TRUE. If the LOCK_SGA is set to FALSE, the 4KB page will be used and no pinning will occur.

The primary motivation for considering the pinning of SGA memory is to prevent Oracle SGA from ever being paged out. In a properly tuned Oracle on AIX environment there should not be any paging activity to begin with, so SGA related pages should stay resident in physical memory even without explicitly pinning them. In improperly configured or tuned environments where the demand for computational pages exceeds the physical memory available to them, SGA pinning will not address the underlying issue and will merely cause other computational pages (e.g. Oracle server or user processes) to be paged out. This can potentially have as much or more impact on overall Oracle performance as the paging of infrequently used SGA pages.

When we say that memory is "pinned", it actually means that the page table prohibits page stealing and swapping. In other words, the page stealing daemon cannot throw pages from this page table out and replace them with other pages.

If not done properly, Oracle SGA pinning and/or the use of large pages can potentially result in significant performance issues and/or system crashes. And, for many Oracle workloads, SGA pinning is unlikely to provide significant additional benefits. It should therefore only be considered where there is a known performance issue that could not be addressed through other options, such as VMM parameter tuning.

You can determine the SGA size by running the ipcs command as the oracle user.

Use the svmon command to monitor the use of pinned memory during the operation of the system. Oracle Database attempts to pin memory only if the LOCK_SGA parameter is
set to true. If the SGA size exceeds the size of memory available for pinning, then the portion of the SGA exceeding these sizes is allocated to ordinary shared memory.

svmon reports an "available" metric. This metric can be used to more easily determine how much remaining memory is available to applications. The available metric reports the additional amount of physical memory that can be used for applications without incurring paging. When the amount of available memory gets low, this is an indication that the system is close to paging.

# svmon -G -O unit=auto
Unit: auto
--------------------------------------------------------------------------------------
               size       inuse        free         pin     virtual  available   mmode
memory        2.00G     578.04M       1.44G     430.34M     463.48M      1.47G     Ded
pg space    512.00M       4.10M

               work        pers        clnt       other
pin         354.30M          0K       14.3M       61.8M
in use      463.48M          0K     114.56M


--------------------------------------

Oracle process memory footprint

The AIXTHREAD_SCOPE environment variable can be used to control whether an AIX process runs with process-wide contention scope (the default) or with system-wide contention scope. System-wide contention scope significantly reduces the memory required for each database process. AIX operates most effectively with Oracle Database 12c and Oracle RAC when using system-wide contention scope (AIXTHREAD_SCOPE=S). Both AIX 7.1 and AIX 6.1 set the default environment variable AIXTHREAD_SCOPE=S (1:1), and Oracle recommends system-wide scope (AIXTHREAD_SCOPE=S), so this environment variable no longer needs to be set explicitly.

Some additional info:
The default value of the AIXTHREAD_SCOPE environment variable is P, which specifies process-wide contention scope. When using process-wide contention scope, Oracle threads are mapped to a pool of kernel threads. When Oracle is waiting on an event and its thread is swapped out, it may return on a different kernel thread with a different thread ID. Oracle uses the thread ID to post waiting processes, so it is important for the thread ID to remain the same. When using systemwide contention scope, Oracle threads are mapped to kernel threads statically, one to one. For this reason, Oracle recommends that you use systemwide contention. The use of systemwide contention is especially critical for Oracle Real Application Clusters (Oracle RAC) instances.

If you set systemwide contention scope, then significantly less memory is allocated to each Oracle process.

Bourne, Bash, or Korn shell:
Add to the ~/.profile or /usr/local/bin/oraenv script: AIXTHREAD_SCOPE=S; export AIXTHREAD_SCOPE

C shell:
Add to the ~/.login or /usr/local/bin/coraenv script: setenv AIXTHREAD_SCOPE S

--------------------------------------

AME

In the initial AME implementation, 64 KB pages were not supported when AME was enabled, which could have a significant impact on Oracle database performance, so the initial AME implementation was not certified for use with the Oracle database. When AME is enabled on those levels, AIX always uses a 4 KB page size instead of 64 KB for the Oracle database. Starting with AIX 7.2 TL1, AIX supports 64 KB pages under AME using a hardware compression engine, and this is what is currently being certified for use with the Oracle database.

--------------------------------------

Virtual processor folding

This is a feature of Power Systems in which unused virtual processors are taken offline until the demand requires that they be activated. The default is to allow virtual processor folding, and this should not be altered without consulting AIX support. (schedo parameter vpm_fold_policy=2).

For Oracle database environments it is strongly suggested to set the schedo parameter vpm_xvcpus to a value of 2 (schedo -p -o vpm_xvcpus=2), as we have seen AIX incorrectly folding too many processors when the parameter is left at its default of 0. It is dynamic, not requiring a reboot. This is a critical setting in a RAC environment when using LPARs with processor folding enabled. If this setting is not adjusted, there is a high risk of RAC node evictions under light database workload conditions.

This setting says that a minimum of 2 additional VPs will be online (i.e. not folded/disabled) at all times. With shared processor systems using RAC, the minimum recommended value for vpm_xvcpus is 2, meaning there will be a minimum of 3 unfolded CPUs (the default 1 plus the 2 additional ones). This is recommended to avoid RAC reboot issues. A resource issue can be created when one Oracle process enters a tight loop polling on a fd and the Oracle process that is supposed to send to that fd does not get scheduled. Once that sending event occurs, things go back to normal and AIX housekeeping can run as well.


--------------------------------------
--------------------------------------
--------------------------------------
--------------------------------------
--------------------------------------


NETWORK


These values are generally suggested for Oracle, and can be considered as starting points. Please note that all udp settings are specific to RAC; the RAC interconnect uses UDP for interprocess communication:
sb_max >= 1MB (1048576) and must be greater than the maximum tcp or udp send or recvspace
tcp_sendspace = 262144
tcp_recvspace = 262144
udp_sendspace = db_block_size * db_file_multiblock_read_count
udp_recvspace= 10 * (udp_sendspace)
rfc1323 = 1 (see Recent suggestions and open issues)
tcp_fastlo = 1. This is new in AIX 7.1 (no -p -o tcp_fastlo=1). The tcp_fastlo default setting is off or ‘0’. (test it first)

Ephemerals (non-defaults suggested for a large number of connecting hosts or a high degree of parallel query; also to avoid install-time warnings)
tcp_ephemeral_low=32768
tcp_ephemeral_high=65535
udp_ephemeral_low=32768
udp_ephemeral_high=65535


Some additional consideration for RAC network as part of the 10 GigE:
LACP timeout: Use the “long timeout” switch setting for the amount of time to wait before sending LACPDUs.
Flow control: Enable flow control at the switch port and on the server side ports (using HMC) for the 10GE adapter or 10GE HEA configuration.
UDP tuning: Tune the udp_sendspace and udp_recvspace until there are no “socket buffer overflows” in netstat -s
Jumbo frames: Enable Jumbo frames on every hop (server side, switch side)

Adapter-port-specific MTU settings will be overridden by setting ‘mtu_bypass = on’. This is complemented by ‘tcp_pmtu_discover = 1’ for path MTU discovery.
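
A hedged example of both settings (the interface name is a placeholder; mtu_bypass enables largesend on the interface, and 1 is already the AIX default for tcp_pmtu_discover):
# chdev -l en0 -a mtu_bypass=on
# no -p -o tcp_pmtu_discover=1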


Network tunables (with command):
# no -p -o udp_sendspace=262144; no -p -o udp_recvspace=655360; no -p -o tcp_sendspace=262144; no -p -o tcp_recvspace=262144
# no -p -o rfc1323=1; no -p -o sb_max=4194304; no -r -o ipqmaxlen=512 #(needs reboot); no -p -o use_isno=1

for each active network interface (i.e. en0, en1, en2 .. etc.):
# chdev -l enX -a state='up' -a rfc1323='1' -a tcp_mssdflt='1448' -a tcp_nodelay='1' -a tcp_recvspace='262144' -a tcp_sendspace='262144'

--------------------------------------
--------------------------------------
--------------------------------------
--------------------------------------
--------------------------------------

Oracle DB parameters

DB_BLOCK_SIZE
Specifies the default size of Oracle database blocks. This parameter cannot be changed after the database has been created, so it is vital that the correct value is chosen at the beginning. Optimal
DB_BLOCK_SIZE values vary depending on the application. (Typical values are 8KB for OLTP workloads and 16KB to 32KB for DSS workloads. If you plan to use a 2KB DB_BLOCK_SIZE with JFS2
file systems, be sure to create the file system with agblksize=2048.)

DB_BLOCK_BUFFERS or DB_CACHE_SIZE
The primary purpose of the DB buffer cache area(s) is to cache frequently used data (or index) blocks in memory in order to avoid or reduce physical I/Os to disk. In general, you want just enough DB buffer cache allocated to achieve the optimum buffer cache hit rate. Increasing the size of the buffer cache beyond this point may actually degrade performance due to increased overhead of managing the larger cache memory area.

DISK_ASYNCH_IO
AIX fully supports Asynchronous I/O for file systems (JFS, JFS2, GPFS and Veritas) as well as raw devices. This parameter should always be set to TRUE (the default value).

FILESYSTEMIO_OPTIONS
Setting the FILESYSTEMIO_OPTIONS parameter in the server parameter file to SETALL or DIRECTIO, enables CIO on JFS2 and DIO on JFS for all data file IO. Because the DIRECTIO setting disables asynchronous IO it should normally not be used. As a result of this 12c feature, you can place data files on the same JFS/JFS2 file system as the Oracle home directory and still use DIO or CIO for improved performance. (You should still place Oracle Database logs on a separate JFS2 file system for optimal performance.)

DB_WRITER_PROCESS and DBWR_IO_SLAVES
These parameters specify how many database writer processes are used to update the database disks when disk block buffers in database buffer cache are modified. Multiple database writers are often used to get around the lack of Asynchronous I/O capabilities in some operating systems, although it still works with operating systems that fully support Asynchronous I/O, such as AIX.
Normally, the default values for these parameters are acceptable and should only be overridden in order to address very specific performance issues and/or at the recommendation of Oracle Support.

SHARED_POOL_SIZE
An appropriate value for the SHARED_POOL_SIZE parameter is very hard to determine before statistics are gathered about the actual use of the shared pool. The good news is that starting with Oracle 9i, it is dynamic, and the upper limit of shared pool size is controlled by the SGA_MAX_SIZE parameter. So, if you set the SHARED_POOL_SIZE to an initial value and you determine later that this value is too low; you can change it to a higher one, up to the limit of SGA_MAX_SIZE. Remember that the shared pool includes the data dictionary cache (the tables about the tables and indexes), the library cache (the SQL statements and execution plans), and also the session data if the shared server is used. Thus, it is not difficult to run out of space. Its size can vary from a few MB to very large, such as 20 GB or more, depending on the applications’ use of SQL statements. It depends mostly on the number of tables in the databases; the data dictionary will be larger for a lot of tables and the number of the different SQL statements that are active or used regularly.

SGA_MAX_SIZE
Starting with Oracle 9i, the Oracle SGA size can be dynamically changed. It means the DBA just needs to set the maximum amount of memory available to Oracle (SGA_MAX_SIZE) and the initial values of the different pools: DB_CACHE_SIZE, SHARED_POOL_SIZE, LARGE_POOL_SIZE etc… The size of these individual pools can then be increased or decreased dynamically using the ALTER SYSTEM
command, provided the total amount of memory used by the pools does not exceed SGA_MAX_SIZE. If LOCK_SGA = TRUE, this parameter defines the amount of memory Oracle allocates at DB startup in “one piece”! Also, SGA_TARGET is ignored for the purpose of memory allocation in this case.

SGA_TARGET
SGA_TARGET specifies the total size of all SGA components. If SGA_TARGET is specified, then the following memory pools are automatically sized: Buffer cache (DB_CACHE_SIZE), Shared pool (SHARED_POOL_SIZE), Large pool (LARGE_POOL_SIZE), Java pool (JAVA_POOL_SIZE), Streams pool (STREAMS_POOL_SIZE)

MEMORY_TARGET, MEMORY_MAX_TARGET (11g)
MEMORY_TARGET specifies the Oracle system-wide usable memory. The database tunes memory to the MEMORY_TARGET value, reducing or enlarging the SGA and PGA as needed. If MEMORY_TARGET parameter is used, memory cannot be pinned. It is not recommended to use the MEMORY_TARGET parameter together with the SGA_MAX_SIZE and SGA_TARGET.

PGA_AGGREGATE_TARGET
PGA_AGGREGATE_TARGET specifies the target aggregate PGA memory available to all server processes attached to the instance. Setting PGA_AGGREGATE_TARGET to a nonzero value has the effect of
automatically setting the WORKAREA_SIZE_POLICY parameter to AUTO. This means that SQL working areas used by memory-intensive SQL operators (such as sort, group-by, hash-join, bitmap merge, and bitmap create) will be automatically sized. A nonzero value for this parameter is the default since, unless you specify otherwise, Oracle sets it to 20% of the SGA or 10 MB, whichever is greater.

PRE_PAGE_SGA
PRE_PAGE_SGA defaults to ‘TRUE’ in 12.1 (prior to 12.1 the default was ‘FALSE’). With this set to TRUE, all segments are allocated to the maximum and PRE_PAGE_SGA reads (“touches”) all the memory pages at startup. This can result in slower startup times, but the advantage is that all further requests to SGA memory are supposed to hit real physical memory, so AIX does not need to do any additional allocations after startup.

This “touching” of memory segments makes ANY Oracle process slower to start, because it is not done just during instance startup but also for any new Oracle process (i.e. a new database connection shadow process). The efficiency of this "touching" depends on the page size used for the SGA. For example, an 80 MB SGA using 64 KB pages would need to "touch" 1250 pages, whereas an SGA using 16 MB pages would only need to "touch" 5 pages. To pin memory you set LOCK_SGA to 'TRUE'; to use 16 MB pages one also needs to pin memory. If persistent memory usage issues are encountered, overriding the default pre_page_sga of ‘TRUE’ and setting it to ‘FALSE’ may be beneficial.
If you are planning to use the “In-Memory” feature of Oracle Database 12c, 18c or 19c, it is recommended to keep pre_page_sga = TRUE (the default).

Adaptive Features
It has been found helpful to test turning this feature off to eliminate it as a cause of performance related issues.
Try setting OPTIMIZER_ADAPTIVE_FEATURES to FALSE.

--------------------------------------

Article 1


fcstat

The fcstat command reports statistics directly from the FC adapter firmware and the FC driver. Protocols such as TCP/IP are designed to tolerate packet loss and out-of-order packets with minimal disruption, but the FC protocol is intolerant of missing, damaged or out-of-order frames and is incapable of re-transmitting a single missing frame.

This moves error recovery into the SCSI layer and can result in waiting for commands to time out. In some cases an error frame is not detected by either the target or the initiator, so it just waits 30 or 60 seconds for completion until the timeout. These are often the result of a physical layer problem such as a damaged fibre channel cable, a faulty or degraded laser in an SFP (in a storage controller, switch or host), a failing ASIC in a switch, or a slow draining device causing frames to be discarded. Regardless of the cause, identifying and resolving fibre channel transport related problems is necessary before any I/O performance tuning is attempted.

It is also important to ensure the SCSI layer does not overwhelm the target ports or LUNs with excessive I/O requests. Increasing num_cmd_elems may result in driving more I/O to a storage device, resulting in even worse I/O service times. (errpt and iostat can help uncover some of these problems.) However, acceptable I/O service times can differ: for example, some shops demand less than 2 ms service times where others may tolerate 11 ms. The disk technology affects the expected I/O service time, as does the availability of write and/or read cache.

If queuing is occurring in the disk driver (iostat shows a non-zero value in qfull), this should be resolved first, for example by increasing queue_depth or by adding additional storage resources (if I/O service times are too high). After ensuring there are no fibre channel physical layer problems, average I/O response times are in a good range (not exceeding 15 ms) and there is no queuing (qfull) in the disk driver, then we can tune the adapter.

-----------------------------------

Normally fcstat statistics are reset only when the server is rebooted or the fcs device is reconfigured. fcstat -Z fcsX resets the statistics manually, which can be useful for daily monitoring.

fcstat fcsX        shows fc adapter statistics
fcstat -D fcsX     shows additional fcs related details
fcstat -e fcsX     shows all stats, which includes the device-specific statistics (driver statistics, link statistics, and FC4 types)
fcstat -Z fcsX     resets statistics

-----------------------------------

root@aix1:/ # fcstat fcs0
FIBRE CHANNEL STATISTICS REPORT: fcs0
Device Type: 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
(adapter/pciex/df1000f114108a0)
Serial Number: 1C041083F7
Option ROM Version: 02781174
ZA: U2D1.11X4                                      <--firmware version
World Wide Node Name: 0x20000000C9A8C4A6           <--adapter WWN
World Wide Port Name: 0x10000000C9A8C4A6           <--adapter WWPN
FC-4 TYPES:
 Supported: 0x00000120000000000000000000000000000000000000
 Active:    0x00000100000000000000000000000000000000000000
Class of Service: 3
Port Speed (supported): 8 GBIT                     <--8Gb adapter
Port Speed (running):   8 GBIT                     <--running at 8Gb
Port FC ID: 0x6df640                               <--adapter FC ID (first 2 digits after x will show switch id, here 6d)
Port Type: Fabric                                  <--connected in Fabric
Attention Type: Link Up                            <--link status

Seconds Since Last Reset: 270300                   <--stats have been collected for this many seconds

        Transmit Statistics     Receive Statistics
        -------------------     ------------------
Frames: 2503792149              704083655
Words:  104864195328            437384431872

LIP Count: 0
NOS Count: 0
Error Frames: 0                                    <--affects io when frames are damaged or discarded
Dumped Frames: 0                                   <--affects io when frames are damaged or discarded
Link Failure Count: 0
Loss of Sync Count: 8
Loss of Signal: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 31                          <--fast increase may result in buffer to buffer credit problems, damaged FC frames, discards
Invalid CRC Count: 0                               <--affects io when frames are damaged or discarded                                               

...
Elastic buffer overrun count: 0                    <--may occur with link failures

IP over FC Adapter Driver Information
  No DMA Resource Count: 3207
  No Adapter Elements Count: 126345

FC SCSI Adapter Driver Information
  No DMA Resource Count: 3207                      <--IOs queued at the adapter due to lack of DMA resources (increase max_xfer_size)
  No Adapter Elements Count: 126345                <--IO was temporarily blocked/queued (increase num_cmd_elems)
  No Command Resource Count: 133                   <--there was no free cmd_elems (increase num_cmd_elems)

IP over FC Traffic Statistics
  Input Requests:   0
  Output Requests:  0
  Control Requests: 0
  Input Bytes:  0
  Output Bytes: 0

FC SCSI Traffic Statistics
  Input Requests:   6777091279
  Output Requests:  2337796
  Control Requests: 116362
  Input Bytes:  57919837230920
  Output Bytes: 39340971008

Adapter Effective max transfer value:  0x100000   <--value set in the kernel regardless of ODM (must be equal or greater than hdisk max_coalesce)

-----------------------------------

Port FC ID
We can get some information about the switch in hex. Here it is 0x6df640, which is six hex digits:
1st 2 digits after x: domain id of the SAN switch, we can call it "switch id" (here 6d)
2nd 2 digits after x: port ID (but could be some virtualized interpretation as well, here f6),
3rd 2 digits after x: loop id if in loop mode (00)

Checking the "switch id" will show whether the ports of an FC adapter are connected to different fabrics (switches) or not. Keep in mind that there may be more switches in a fabric, so multiple "switch ids" do not guarantee multiple fabrics.

If we check a 4-port adapter and the first two hex digits are the same for each port, we can say that they are connected to the same switch.
fcs0: Port FC ID: 0xd1e6c0                  <--Fabric 1 (switch id: d1)
fcs1: Port FC ID: 0xd1e7c0                  <--Fabric 1 (switch id: d1)
fcs2: Port FC ID: 0x6de6c0                  <--Fabric 2 (switch id: 6d)
fcs3: Port FC ID: 0x6de7c0                  <--Fabric 2 (switch id: 6d)


Error frames, Dumped frames, Invalid CRC count:
These may be the result of a physical transport layer problem, which can result in damaged fibre channel frames as they arrive at the adapter. These counters are usually not incrementing on frames being transmitted but rather on frames received.

For each CRC error, AIX will log an errpt entry indicating a damaged frame. CRC errors can occur anywhere in the fabric and are usually related to a bad SFP or a bad FC cable. These errors will affect I/O processing for a single read or write operation, but the driver will retry these. These are the most difficult to troubleshoot.


Link Failure Count, Loss of Sync Count, Loss of Signal:
These indicate the health of the physical link between the switch and the host HBA. If these error counters increase daily, we generally suspect a problem with an SFP or FC cable between the switch and the FC HBA. These can affect I/O processing on the host.


Invalid Tx Word Count:
These are incremented when the HBA receives damaged words from the switch. In many cases this will not affect I/O processing, but it is an early indication of a problem. On certain switch models this may be due to an improper port fill word setting. If not, it may indicate a bad SFP or cable between the HBA and the switch. This error counter is only relevant for communication at the physical layer (Tx/Rx) between the switch and the HBA.


Elastic buffer overrun count:
This counter could increment due to Link Failure Count, Loss of Sync Count, Loss of Signal, Invalid Tx Word Count or old unsupported host HBA adapter firmware levels.

-----------------------------------

No DMA Resource Count:
It means additional I/O DMA memory is needed to initiate (larger) I/Os from the adapter. When the adapter driver is unable to initiate an I/O request because there is no free DMA resource, the "No DMA Resource" counter is incremented and the I/O request waits. Increasing max_xfer_size can help in this situation.

No Adapter Elements Count:
The number of times since boot that an I/O was temporarily blocked due to an inadequate num_cmd_elems. If it shows non-zero values, increasing num_cmd_elems can help.

No Command Resource Count:
When the adapter driver is unable to initiate an I/O request due to no free cmd_elems (num_cmd_elems), the "No Command Resource" counter is incremented and the I/O request waits for adapter buffer resources (checking for free command elements for the adapter). Resources will be available when a currently running I/O request is completed. Increasing num_cmd_elems can help to avoid this situation.

If the "No Command Resource Count" and/or the "No DMA Resource Count" continues to increment (and max_xfer_size and num_cmd_elems are already set to their maximum values), then the adapter I/O workload capability has been exceeded. In this case the I/O load should be reduced by moving load to additional resources, for example adding more FC adapters and balancing the I/O workload. Another workaround would be to reduce num_cmd_elems. (See the example below.)
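
A hedged tuning sketch for the adapter attributes (the values are examples only; -P defers the change until the adapter is reconfigured or the system is rebooted, and on VIOS/NPIV setups the virtual and physical adapter settings should be kept consistent):
# lsattr -El fcs0 -a num_cmd_elems -a max_xfer_size                    (current ODM values)
# chdev -l fcs0 -a num_cmd_elems=1024 -a max_xfer_size=0x200000 -P     (increase gradually, then clear and re-check the fcstat counters)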

-----------------------------------

fcstat -D fcsX  can display additional info:
(Values preceded by a 0x are in hex. All values below are reported in hex, not decimal.)

Driver Statistics:
  Number of interrupts:   76534
  Number of spurious interrupts:    0
  Long term DMA pool size:    0x800000
  I/O DMA pool size:  0x1000000                        <--currently active I/O DMA pool size in the driver

  FC SCSI Adapter Driver Queue Statistics              <--adapter driver
    Number of active commands:   0
    High water mark of active commands:    11
    Number of pending commands:  0
    High water mark of pending commands:   1
    Number of commands in the Adapter Driver Held off queue:  0
    High water mark of number of commands in the Adapter Driver Held off queue:   0

  FC SCSI Protocol Driver Queue Statistics             <--protocol driver
    Number of active commands:   0
    High water mark of active commands:    11
    Number of pending commands:  4
    High water mark of pending commands:   5


Number of active commands:
Represents the I/O workload. Active commands are commands that have left the adapter driver and have been handed off to the adapter hardware for transport to the end device. These commands have not received a completion status and are considered active.

High watermark of active commands:
The "high water mark of active commands" represents the peak (highest) number of active commands. If I/O service times are low and the high water mark of active commands is around num_cmd_elems, then increasing num_cmd_elems may improve I/O performance. In certain error recovery scenarios the "high water mark of active commands" can increase up to the num_cmd_elems limit. When tuning, clear these counters and monitor them for a few days to confirm there are no errors.

High watermark of pending commands:
The "high water mark of pending commands" represents the peak (highest) number of pending commands. (These are pending because the number of active commands has reached the num_cmd_elems limit and the additional commands above that limit are pending.)

If high water mark for active + pending is near to or is exceeding the num_cmd_elems, we recommend increasing num_cmd_elems to cover this water mark to improve the IO performance. Rule to follow: num_cmd_elems > (High water mark for active commands + High water mark for pending commands)

The increase for num_cmd_elems is always recommended to be done gradually until 'No Command Resource Count' counter stops increasing.

If, with large sequential I/Os (like backups), there are high average read and write service times and the number of active/peak commands is also high (but there are no physical layer problems and no queuing in the adapter or disk), then the storage server is unable to service these I/O requests in a timely manner, or the I/O load is greater than the LUN / storage controller capability (like handling within a ~15 ms window). The solution could be adding additional storage resources, for example distributing the I/O workload to additional LUNs and/or storage controllers.

-----------------------------------

Link to some IBM descriptions: https://www.ibm.com/support/pages/node/6198385

-----------------------------------

Adapter busy %

There is no busy% for adapters in AIX; it is derived from the disk stats. The adapter busy% is simply the sum of the disk busy%.
So if the adapter busy% is, for example, 350%, then you have the equivalent of 3.5 disks busy on that adapter. Or it could be 7 disks at 50% busy, or 14 disks at 25%, and so on.

There is no way to determine the real adapter utilization, and in fact it is not clear what it would really mean. The adapter has a dedicated on-board CPU that is always busy (probably with no real OS), and we don't run nmon on these adapter CPUs to find out what they are really doing.

-----------------------------------

iSCSI (NetApp)

iSCSI

iSCSI (Internet SCSI) provides access to storage devices by carrying SCSI commands over a TCP/IP network.  iSCSI was developed by IBM and Cisco in 1998 and submitted as a draft standard in March 2000.

NetApp is one of the leading companies in the storage hardware industry. In the early 1990s, NetApp's storage systems offered the NFS and SMB protocols (based on TCP/IP), and in 2002 NetApp added the Fibre Channel (FC) and iSCSI protocols. The iSCSI protocol is configured to use TCP port 3260.

The iSCSI protocol allows clients (called initiators) to send SCSI commands to storage devices (targets) on remote servers. It competes with Fibre Channel, but unlike Fibre Channel which usually requires dedicated cabling, iSCSI can be run over long distances using existing network infrastructure.

Differences between iSCSI, NAS and SAN
NAS and iSCSI both use TCP/IP (LAN), but the protocols are different. NAS uses NFS and CIFS/SMB, while iSCSI uses a variant of the SCSI protocol, so iSCSI has a close relationship with SAN. Actually, only the transmission medium is different: for SAN the SCSI protocol is packaged in Fibre Channel, for iSCSI it is packaged in TCP/IP. In both cases blocks are transferred, so iSCSI is not file-based (like NAS) but a block-based transmission (we will see hdisk devices in AIX). (Many NAS systems can offer file-based services such as SMB/CIFS and NFS, but can also offer block-based iSCSI.)

--------------------------------------------

Initiator / Target / IQN

To transport SCSI commands over the IP network, an iSCSI driver must be installed, which creates the initiator (the iscsi0 device). In AIX 7.2 TL3, MPIO (multipathing) support for the iSCSI software initiator was introduced.

iSCSI targets are devices that respond to iSCSI commands. An iSCSI device can be a storage device, or it can be an intermediate device such as a bridge between IP and Fibre Channel devices. 

iSCSI Initiator  <--client (LUNs are mapped here), connected to a storage device  (via Net. Switch), sends SCSI commands to SCSI target
iSCSI Target   <--server (storage system), responds to iSCSI commands and exports local volumes (LUNs) to the initiator node.

Each initiator and target has a unique iSCSI name, the iSCSI qualified name (IQN). An IQN represents a worldwide unique name for each initiator or target, in the same way that worldwide node names (WWNNs) are used to identify devices in a Fibre Channel fabric. An IQN has a reversed hostname format, for example: iqn.2018-06.com.ibm.stglabs.aus.p9fwc-iscsi2.hostid.2



Some notes:

Performance considerations:

VG Considerations:
Configure volume groups on iSCSI devices to be in an inactive state after reboot, and activate the iSCSI-backed volume groups manually. Volume groups are activated during a different boot phase than the iSCSI software driver; for this reason, it is not possible to activate iSCSI volume groups during the boot process (see the sketch below).
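
A minimal sketch, assuming a hypothetical VG name:
# chvg -a n iscsivg            (do not varyon automatically at boot)
# varyonvg iscsivg             (activate manually, e.g. from a late rc script, once iscsi0 and its disks are configured)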


-----------------------------------------------

sanlun lun show                                short overview about the Netapp LUNs (sanlun command comes with Netapp Utility package)
sanlun lun show -v                            detailed overview about the Netapp LUNs

lsmpio -l hdisk81                              shows path details (shows which path using which Controller IP)
lsmpio -ql hdisk81                             it is a very good command to check if LUN is really there or not !!!!!!!!!!!!
                                               (it does not use ODM, it makes a real IO operation to reach the LUN and get some details)   
lsmpio -Sl hdisk81                             shows statistics (Adapter/SCSI errors....)

lsattr -El iscsi0                             show settings of iscsi0
lsdev -p iscsi0                                show LUNs of iscsi0 

lsiscsi                                       show target ip, iqn, port
iscsi_st -f -i 10.10.128.15                    show target IQN info and tcp ports (IP is the target IP)

netstat -an | grep 3260                       shows which IP is used locally and to which IP it is connected remote (target)
tcpdump -i en1 host 10.10.128.15               check if iscsi packets appear in the output (IP is the target IP)

on VIOS, check the paths for all LUNs which are used as vscsi devices:
for i in `/usr/ios/cli/ioscli lsmap -all | grep hdisk | awk '{ print $3 }'`; do lsmpio -l $i; done    

lspath check of a LUN (use the whole "connection" column for the below command):
lspath -l hdisk81 -HF "name path_id parent connection path_status status"      

more info about the path:
lspath -AHE -l hdisk81 -p iscsi0 -w "iqn.1782-08.com.netapp:sn.e1787cbba71411eab2b6d939eb1588a7:vs.29,10.10.128.15,0xcbb,0x50000000000000"    

for enabling failed paths:
lspath | grep -v Ena | awk '{print "chpath -s enable -l " $2 " -p " $3 }'                   

-----------------------------------------------

iSCSI related filesets on AIX (with Netapp Storage):

# lslpp -l | grep -i iscsi
  NetApp.MPIO_Host_Utilities_Kit.iscsi             Kit iSCSI Disk ODM Stanzas
  devices.common.IBM.iscsi.rte  7.2.5.200  APPLIED    Common iSCSI Files
  devices.iscsi.disk.rte     7.2.5.0  APPLIED    iSCSI Disk Software
  devices.iscsi.tape.rte     7.2.0.0  COMMITTED  iSCSI Tape Software 
  devices.iscsi_sw.rte     7.2.5.200  APPLIED    iSCSI Software Device Driver
  devices.pci.14102203.diag  7.2.0.0  COMMITTED  IBM 1 Gigabit-TX iSCSI TOE
  devices.pci.14102203.rte   7.2.0.0  COMMITTED  IBM 1 Gigabit-TX iSCSI TOE
  devices.pci.1410cf02.diag  7.2.0.0  COMMITTED  1000 Base-SX PCI-X iSCSI TOE
  devices.pci.1410cf02.rte   7.2.0.0  COMMITTED  1000 Base-SX PCI-X iSCSI TOE
  devices.pci.1410d002.diag  7.2.4.0  COMMITTED  1000 Base-TX PCI-X iSCSI TOE
  devices.pci.1410d002.rte   7.2.0.0  COMMITTED  1000 Base-TX PCI-X iSCSI TOE
  devices.pci.1410e202.diag  7.2.0.0  COMMITTED  IBM 1 Gigabit-SX iSCSI TOE
  devices.pci.1410e202.rte   7.2.0.0  COMMITTED  IBM 1 Gigabit-SX iSCSI TOE
  devices.pci.77102e01.diag  7.2.0.0  COMMITTED  1000 Base-TX PCI-X iSCSI TOE
  devices.pci.77102e01.rte   7.2.0.0  COMMITTED  PCI-X 1000 Base-TX iSCSI TOE

-----------------------------------------------

iSCSI configuration
(there are 2 types of discovery methods: file based or ODM based. I used ODM based below; for file based, add the iSCSI target info to the ‘/etc/iscsi/targets’ file)

0. for MPIO at least 2 IPs should be configured in different subnets (these are our paths; 2 additional IPs are needed from the storage team on the storage side too)

1. check the iscsi drivers and ping the target (by default port 3260 is used; make sure it is not blocked by a firewall)
# lslpp -l | grep -i iscsi  ; ping <target> 

2. set iqn for iscsi0, lsattr -El iscsi0 will show this value (storage team can give the initiator details)
# chdev -l iscsi0 -a initiator_name=iqn.2018-06.com.ibm.my_host1:0ab116bb     

3. check discovery policy, I used odm (if needs to be changed: chdev -a disc_policy=odm -l iscsi0)
# lsattr -El iscsi0 | grep disc_policy                                        

4. mkiscsi commands add the iSCSI target data to the ODM; each path needs its own mkiscsi command
# mkiscsi -l iscsi0 -g static -t iqn.1986-03.com.ibm:2145.d59-v7k2.node1 -n 3260 -i 10.10.10.10
# mkiscsi -l iscsi0 -g static -t iqn.1986-03.com.ibm:2145.d59-v7k2.node2 -n 3260 -i 10.10.20.10
(the details should be given by storage team, -i iSCSI target IP -t iSCSI target name)

5. discover the disks
# cfgmgr -vl iscsi0
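
6. (optional) verify the new LUNs and their paths; the hdisk number below is just an example
# lsdev -p iscsi0                        (LUNs configured behind iscsi0)
# lsmpio -l hdisk81                      (check that the expected paths are Enabled)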

-----------------------------------------------

Netapp recommendations for iSCSI LUNs




-----------------------------------------------

Reset interface:

If there are problems, resetting the interface used for iscsi may help:
ifconfig en0 down
rmdev -l en0
cfgmgr

For info:
Once we had many disk/IO/path errors in errpt, which came hourly/daily. We checked with the storage team and they did not find any errors; we did the interface reset but it did not help either.
Then we asked the network team, and they saw CRC errors on the network switch port; changing the cable/SFP on their side fixed the problem.


-----------------------------------------------










RPM - DNF

DNF (Dandified YUM)

DNF is the next version of YUM (YUM is a package manager for RPMs).
dnf roughly maintains CLI compatibility with yum, and existing AIX Toolbox repositories created for yum work with dnf too, so no changes are needed on the repository side.

yum is based on Python 2, and Python 2 is out of support, so there was a need to move to Python 3; dnf works with Python 3.

AIX Toolbox News: https://www.ibm.com/support/pages/node/6833478
DNF install: https://community.ibm.com/community/user/power/blogs/sangamesh-mallayya1/2021/05/28/dnf-is-now-available-on-aix-toolbox
DNF config details: https://developer.ibm.com/tutorials/awb-configuring-dnf-create-local-repos-ibm-aix/
Just in case if all rpm need to be removed: https://community.ibm.com/community/user/power/blogs/jan-harris1/2022/05/25/destroyrpms

On AIX dnf uses the repository conf file: /opt/freeware/etc/dnf/dnf.conf
[AIX_Toolbox]
name=AIX generic repository
baseurl= file:///export/rpms/AIX_Toolbox/
enabled=1

...
...

[AIX_Toolbox_73]
name=AIX 7.3 specific repository
baseurl= file:///export/rpms/AIX_Toolbox_73/
enabled=0

[CUST_RPMS]
name=Customer specific RPMs
baseurl= file:///export/rpms/CUST_RPMS
enabled=1


By default dnf caches data, such as package and repository data, to the /var/cache/dnf directory. This speeds up dnf so that it doesn't have to keep querying this information from the Internet.
There are times when you may want to delete this cached data, for example when a repository has updated packages but your system has incorrect or stale cached data, which may cause various problems when attempting to install a package: dnf clean all


The dnf cache is built up automatically over time as you perform various dnf actions such as installing or updating packages; however, we have the option to build the cache manually so that future actions will be quicker, using the ‘makecache’ argument: dnf makecache

-------------------------------------

DNF options

Special options can be used with the dnf commands, for example: dnf install httpd --verbose

some dnf options:
--cacheonly      <--run from the system cache; don't update the cache and use it even if it is expired. (DNF uses a separate cache for each user, the root user cache is the system cache.)
--disablerepo / --enablerepo=<repoid>   <--temporarily disable/enable active repositories for the purpose of the current dnf command.
--downloadonly                   <--download without performing any rpm transaction (install/upgrade/erase).
--downloaddir / --destdir=<path>   <--download packages to this dir (it has to be used with --downloadonly, or with the download, modulesync, reposync or system-upgrade commands (dnf-plugins-core)).
--exclude=<package-file-spec>    <--exclude packages specified by <package-file-spec> from the operation.
--nogpgcheck                     <--skip checking GPG signatures on packages (if RPM policy allows).
--refresh                        <--set metadata (cache) as expired before running the command.
--repo / repoid=<repoid>         <--enable just specific repositories by an id or a glob.
--showduplicates                 <--show duplicate packages in repositories. Applicable for the list and search commands.
--verbose                        <--Verbose operation, show debug messages.
--version                        <--Show DNF version and exit.  (-v can be used aswell)
--assumeyes                      <--Automatically answer yes for all questions. (-y can be used as well)
--setopt=<parameter=value>       <--override the configuration file (for example: --setopt=sslverify=false)
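
A combined example with some of these options, downloading a package and its dependencies into a directory without installing anything (a sketch; the target directory is just an example):
dnf install httpd -y --downloadonly --downloaddir=/tmp/rpms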

-------------------------------------

DNF Plugins

The core DNF functionality can be extended with plugins. 
There are officially supported Core DNF plugins and also third-party Extras DNF Plugins.

dnf-plugins-core can be installed on Linux to have the Core DNF plugins: download, repomanage, reposync...
dnf download ...       <--download binary (rpm) or source packages without installing them (!!! this is missing on AIX)
dnf repomanage ...     <--prints the newest or oldest packages in a repository specified by <path>, for easy piping to xargs or similar programs
dnf repodiff ...       <--lists the differences between two or more repositories
dnf reposync ...       <--makes local copies of remote repos (syncs a remote repo to a local dir; packages that are already present locally are not downloaded again)


On AIX most of the plugins are in dnf-utils. dnf-plugins-core on AIX is not a real rpm; it only pulls in dependencies, for example python3.9-dnf-plugins-core-4.0.16-32_52.aix7.2.ppc.rpm, which contains the python files of repodiff, reposync...
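
To get these plugin commands on AIX, dnf-utils can be installed (a sketch, assuming the AIX Toolbox repository is configured):
dnf install dnf-utils -y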


# dnf repoquery -l dnf-utils
...
/opt/freeware/bin/repodiff
/opt/freeware/bin/repomanage
/opt/freeware/bin/repoquery
/opt/freeware/bin/reposync
/opt/freeware/libexec/dnf-utils


!!! the createrepo command comes from the separate createrepo_c package, not from a dnf plugin (it is not in dnf-utils)
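
If the createrepo command is missing, the createrepo_c package can be installed separately (a sketch, assuming the package is available in the configured repository):
dnf install createrepo_c -y
createrepo --version                              <--verify the command is available (it is used in the examples below)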


======================================================

/opt/freeware/etc/dnf/dnf.conf                    <--main dnf configuration file (dnf commands by default use this file)

dnf repolist                                     <--list repositories
dnf search bash*                                  <--list packages starting with bash... in the repo
dnf list bash*                                    <--list packages starting with bash... which are installed + in the repo

dnf list installed                                <--list installed packages (a package is installed if it is in RPMDB, same as rpm -qa)
dnf list installed x*                             <--list installed packages starting with "x"

dnf list available                                <--list packages that are available to install (a package is available if it is not installed but present in a repo)
dnf list upgrades                                 <--list updates available for installed packages ("update", "updates" are deprecated)
dnf check-upgrade                                <--same as above

Officially the "installed", "available"... options should have two dashes (--) in front, like --installed, --available...
A dnf list command should look like: dnf [options] list --installed (e.g.: dnf --config=/tmp/dnf.conf.remote list --upgrades)
By default "dnf list" uses the "--all" option, which lists all packages present in the rpmdb, in a repo, or both (installed + available = full repo content)

dnf install <package>                             <--install a package + dependencies (more packages: dnf install package1 package2 …)
dnf install <package> -y                          <--install a package without asking anything before install (assumes yes)
dnf install <package> -v                          <--install with verbose output
dnf install <package_name>-<version_info>         <--install a specific version (like: dnf install gcc-6.3.0-1)
dnf localinstall </path/to/package>               <--install a package from local path instead of a repository
dnf install httpd-1.4.rpm                         <--install a local rpm file with dnf
dnf reinstall httpd                               <--if a package has a problem it can be reinstalled

dnf remove <package>                              <--remove a package

dnf upgrade                                       <--upgrade all possible installed packages ("update" is deprecated)
dnf upgrade <package>                             <--upgrade a package (with its dependencies if needed)
dnf upgrade -x httpd                              <--exclude the httpd package from the upgrade
dnf downgrade <package>                           <--downgrade a package

dnf history                                       <--lists the history of dnf transactions (same as "dnf history list")
dnf history info <transaction_ID>                 <--gives details about the specified history transaction id
dnf history undo <transaction_ID>                 <--rolls back the given transaction id

dnf info bash                                     <--show info about a specific package
dnf provides /opt/freeware/bin/bash               <--list the package which provides that file (command)  ("dnf repoquery --file..." or "rpm -qf ..." show some info as well)
dnf repoquery -l bash                             <--list files in a package (--list or "rpm -ql bash" is the same if it is already installed)

dnf makecache                                     <--download and caches metadata for repositories
dnf makecache --refresh
dnf clean all                                     <--cleans up cache


dnf --config /tmp/dnf.conf check-upgrade          <--list available updates for the installed packages (it will not do the update)
dnf --disablerepo="*" --enablerepo="epel" list available       <--lists packages only in a specific repo (use output of "yum repolist" )
dnf --disablerepo=* --enablerepo=LIVE* list Centrify*          <--lists installed and available packages from LIVE* repos


Plugins:

createrepo --checksum sha --update /etc/repo      <--update the repo metadata after a new package is copied there (createrepo is a standalone command from createrepo_c, not a dnf subcommand)
createrepo --quiet --update --skip-stat /export/rpms/AIX_Toolbox_72
--quiet                                           <--run quietly
--update                                          <--if metadata exists and the rpm is unchanged (based on file size and mtime) then reuse the existing metadata rather than recalculating it
--skip-stat                                       <--skip the stat() call on files when using --update (assumes that if the file name is the same then the file is still the same)


dnf download httpd                                <--!!NOT on AIX!!! download the rpm without installing it (download plugin can be installed: dnf install dnf-plugins-core)
dnf repomanage --old /export/rpms/AIX_Toolbox     <--list older packages
dnf repomanage --new /export/rpms/AIX_Toolbox     <--list newest packages

download (sync) a repo locally from a conf file (this conf file contains the IBM repo address):
dnf reposync --newest-only --downloadcomps --download-metadata --download-path=/export/rpms --config=/tmp/dnf.ibm.conf --repoid=AIX_Toolbox_72 --arch=ppc

same as above, but with --urls nothing is downloaded; it just shows the URLs of the packages:
dnf reposync --newest-only --downloadcomps --download-metadata --download-path=/home/tmp/dnf --config=/tmp/dnf.ibm.conf --repoid=AIX_Toolbox --arch=ppc --urls

dnf reposync \                    
--newest-only \                   <--download only newest packages per repo
--downloadcomps \                 <--download and uncompress comps.xml. Consider using --download-metadata which downloads all available repo metadata
--download-metadata \             <--download repository metadata. Downloaded copy is instantly usable as a repository, no need to run createrepo_c on it
--download-path=/export/rpms/ \   <--path under which the downloaded repositories are stored
--config=/opt/freeware/etc/dnf/dnf.conf.remote \              <--config file to use
--repoid=AIX_Toolbox_72 \         <--which repo to synchronize
--arch=ppc                        <--download packages of given architectures


repodiff
dnf repodiff --repofrompath=o,file:///export/rpms/AIX_Toolbox/ --repofrompath=n,https://public.dhe.ibm.com/aix/aixtoolbox/RPMS/ppc/ --repo-old=o --repo-new=n

repodiff without ssl verification:
dnf repodiff --setopt=sslverify=false --repofrompath=o,file:///export/rpms/AIX_Toolbox/ --repofrompath=n,https://public.dhe.ibm.com/aix/aixtoolbox/RPMS/ppc/ --repo-old=o --repo-new=n
