Yann Neuhaus

dbi services technical blog

Error getting repository data for ol6_x86_64_userspace_ksplice, repository not found

Thu, 2020-05-28 01:58

During ODA deployments I could see that, starting with release 18.7, immediately after reimaging or patching the ODA, I was getting regular errors in the root mailbox. The error message came every hour, at 13 and 43 minutes past the hour.
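A quick way to see how far this goes is to count the occurrences of the error in the root mailbox (a simple check; /var/spool/mail/root is the default local mailbox on the ODA's Oracle Linux):

[root@ODA01 ~]# grep -c "ol6_x86_64_userspace_ksplice" /var/spool/mail/root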

Problem analysis

ksplice is now implemented and used in ODA version 18.7. It is an open-source extension of the Linux kernel that allows security patches to be applied to a running kernel without the need for reboots, avoiding downtime and improving availability.

Unfortunately, there is an implementation bug and an email alert is generated every 30 minutes in the root Linux user's mailbox.

The error message is the following one:
1 Cron Daemon Mon Jan 6 18:43 26/1176 "Cron export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin && [ -x /usr/bin/ksplice ] && (/usr/bin/ksplice --cron user upgrade; /usr/bin/ksp"
 
 
Message 910:
From root@ODA01.local Mon Jan 6 17:43:02 2020
Return-Path:
Date: Mon, 6 Jan 2020 17:43:02 +0100
From: root@ODA01.local (Cron Daemon)
To: root@ODA01.local
Subject: Cron export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin && [ -x /usr/bin/ksplice ] && (/usr/bin/ksplice --cron user upgrade; /usr/bin/ksplice --cron xen upgrade)
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
Status: R
 
Error getting repository data for ol6_x86_64_userspace_ksplice, repository not found

During my troubleshooting, I found a community discussion: https://community.oracle.com/thread/4300505?parent=MOSC_EXTERNAL&sourceId=MOSC&id=4300505
This is considered by Oracle as a bug in the 18.7 release and is tracked under Bug 30147824:
Bug 30147824 – ENABLING AUTOINSTALL=YES WITH KSPLICE SENDS FREQUENT EMAIL TO ROOT ABOUT MISSING OL6_X86_64_USERSPACE_KSPLICE REPO

Workaround

While waiting for a final solution, the following workaround can be implemented. Oracle's workaround is simply not to execute ksplice.

[root@ODA01 ~]# cd /etc/cron.d
 
[root@ODA01 cron.d]# ls -l
total 20
-rw-r--r--. 1 root root 113 Aug 23 2016 0hourly
-rw-r--r--. 1 root root 818 Dec 18 19:08 ksplice
-rw-------. 1 root root 108 Mar 22 2017 raid-check
-rw-------. 1 root root 235 Jan 25 2018 sysstat
-rw-r--r--. 1 root root 747 Dec 18 19:08 uptrack
 
[root@ODA01 cron.d]# more ksplice
# Replaced by Ksplice on 2019-12-18
# /etc/cron.d/ksplice: cron job for the Ksplice client
#
# PLEASE DO NOT MODIFY THIS CRON JOB.
# Instead, contact Ksplice Support at ksplice-support_ww@oracle.com.
#
# The offsets below are chosen specifically to distribute server load
# and allow for Ksplice server maintenance windows. This cron job
# also only contacts the Ksplice server every Nth time it runs,
# depending on a load control setting on the Ksplice server.
#
# If you would like to adjust the frequency with which your
# systems check for updates, please contact Ksplice Support at
# ksplice-support_ww@oracle.com
13,43 * * * * root export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin && [ -x /usr/bin/ksplice ] && (/usr/bin/ksplice --cron user upgrade; /usr/bin/ksplice --cron xen upgrade)
 
[root@ODA01 cron.d]# mv ksplice /root/Extras/
 
[root@ODA01 cron.d]# ls -l
total 16
-rw-r--r--. 1 root root 113 Aug 23 2016 0hourly
-rw-------. 1 root root 108 Mar 22 2017 raid-check
-rw-------. 1 root root 235 Jan 25 2018 sysstat
-rw-r--r--. 1 root root 747 Dec 18 19:08 uptrack
 
[root@ODA01 cron.d]# ls -l /root/Extras/ksplice
-rw-r--r--. 1 root root 818 Dec 18 19:08 /root/Extras/ksplice
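Once a release containing the fix for Bug 30147824 is installed, the cron job can be restored just as easily (assuming it was parked under /root/Extras as above):

[root@ODA01 ~]# mv /root/Extras/ksplice /etc/cron.d/
[root@ODA01 ~]# ls -l /etc/cron.d/ksplice
-rw-r--r--. 1 root root 818 Dec 18 19:08 /etc/cron.d/ksplice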


Documentum Upgrade – Switch from Lockbox to AEK key

Wed, 2020-05-27 13:00

As you probably know already, Documentum removed the support for the RSA libraries and the RSA Lockbox starting with Documentum version 16.7. This means that if you are planning to upgrade to 16.7 or a higher version, you will first need to remove the lockbox from your installation and extract your AEK key from it before starting the upgrade process. To be completely exact, upgrading to 16.7 or above in place (using VMs for example) normally does that for you, but if you are using the docker images from OpenText, it is not going to happen. Therefore, it is always good to know how to do it anyway.

The process to “downgrade” from the Lockbox to the AEK key (the other way around wasn’t really an upgrade in terms of security anyway…) is pretty simple and, if I’m not mistaken, it is now in the Upgrade & Migration Guide of Documentum 16.7. This can be prepared while the repository is running but it will require a quick restart to be applied. If you are facing any issue, you can also go back to the Lockbox first and figure out what the issue is later.

It’s very simple and straightforward to extract the AEK key from the Lockbox but there is one requirement to meet. Indeed, you will need the latest version of the dm_crypto_create utility, which has been packaged starting with:

  • Documentum 7.2 P42
  • Documentum 7.3 P23
  • Documentum 16.4 P10

If you are using an older version or an older patch, you will first need to upgrade/patch to one of these versions. It **might** be possible to just take the binary from one of these patches and use it on older versions of Documentum but that is probably not supported and it would need to be tested first to make sure it doesn’t break in the process. If you want to test that, just make sure to use the correct version of Lockbox libraries (version 3.1 for CS 7.3 and lower // version 4.0 for CS 16.4).
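To know whether you already are on one of these patch levels, the Content Server banner in the repository log is a quick check (a sketch; it relies on the same log location used later in this post):

[dmadmin@cs-0 ~]$ grep "Content Server" $DOCUMENTUM/dba/log/GR_REPO1.log | head -1
    OpenText Documentum Content Server (version 16.4.0200.0256  Linux64.Oracle)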

Once you are on the correct version/patch level, the extraction is just one simple command. Therefore, it can be automated easily in the upgrade process. Before starting, let's prepare the environment and make sure everything is currently working with the Lockbox. I'm using a demo environment I built a few weeks ago on Kubernetes to show how it works:

[dmadmin@cs-0 ~]$ cd $DOCUMENTUM/dba
[dmadmin@cs-0 dba]$
[dmadmin@cs-0 dba]$ cp -R secure secure_$(date "+%Y%m%d")
[dmadmin@cs-0 dba]$
[dmadmin@cs-0 dba]$ cd secure
[dmadmin@cs-0 secure]$ ls -ltr
total 24
-rw-rw-r-- 1 dmadmin dmadmin 3750 Mar 30 13:30 lockbox.lb
-rw-rw-r-- 1 dmadmin dmadmin    3 Mar 30 13:30 lockbox.lb.FCD
drwxrwx--- 2 dmadmin dmadmin  152 Mar 30 17:41 ldapdb
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ gr_repo="GR_REPO1"
[dmadmin@cs-0 secure]$ s_ini="$DOCUMENTUM/dba/config/${gr_repo}/server.ini"
[dmadmin@cs-0 secure]$ 
[dmadmin@cs-0 secure]$ grep "crypto" ${s_ini}
crypto_mode = AES256_RSA1024_SHA256
crypto_keystore = Local
crypto_lockbox = lockbox.lb
crypto_keyname = CSaek
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ lb_name=$(grep "^crypto_lockbox" ${s_ini} | sed 's,crypto_lockbox[[:space:]]*=[[:space:]]*,,')
[dmadmin@cs-0 secure]$ aek_name=$(grep "^crypto_keyname" ${s_ini} | sed 's,crypto_keyname[[:space:]]*=[[:space:]]*,,')
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ echo "Lockbox Name: ${lb_name} -- AEK Name: ${aek_name}"
Lockbox Name: lockbox.lb -- AEK Name: CSaek
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ read -s -p "  --> Please put here the Lockbox Passphrase and press Enter: " lb_pp; echo
  --> Please put here the Lockbox Passphrase and press Enter:
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ read -s -p "  --> Please put here the AEK Passphrase and press Enter: " aek_pp; echo
  --> Please put here the AEK Passphrase and press Enter:
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ # If the below command returns '0', then the Lockbox name, AEK passphrase and the AEK Key name are correct
[dmadmin@cs-0 secure]$ # but it doesn't really test the Lockbox passphrase
[dmadmin@cs-0 secure]$ dm_crypto_create -lockbox ${lb_name} -lockboxpassphrase ${lb_pp} -keyname ${aek_name} -passphrase ${aek_pp} -check


Key - CSaek uses algorithm AES_256_CBC.

** An AEK store with the given passphrase exists in lockbox lockbox.lb and got status code returned as '0'.
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ # If the below command returns 'Reset host done', then the Lockbox name and Lockbox passphrase are both correct
[dmadmin@cs-0 secure]$ dm_crypto_manage_lockbox -lockbox ${lb_name} -lockboxpassphrase ${lb_pp} -resetfingerprint
Lockbox lockbox.lb
Lockbox Path $DOCUMENTUM/dba/secure/lockbox.lb
Reset host done
[dmadmin@cs-0 secure]$

 

Once everything is ready & verified, extracting the AEK key is a piece of cake:

[dmadmin@cs-0 secure]$ # AEK passphrase isn't needed to extract the AEK key
[dmadmin@cs-0 secure]$ dm_crypto_create -lockbox ${lb_name} -lockboxpassphrase ${lb_pp} -keyname ${aek_name} -removelockbox -output ${aek_name}


Retrieved key 'CSaek' from lockbox 'lockbox.lb' and stored as '$DOCUMENTUM/dba/secure/CSaek'
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ ls -ltr
total 24
-rw-rw-r-- 1 dmadmin dmadmin 3750 Mar 30 13:30 lockbox.lb
-rw-rw-r-- 1 dmadmin dmadmin    3 Mar 30 13:30 lockbox.lb.FCD
drwxrwx--- 2 dmadmin dmadmin  152 Mar 30 17:41 ldapdb
-rw-r----- 1 dmadmin dmadmin  144 May  8 21:52 CSaek
[dmadmin@cs-0 secure]$

 

When it’s done, a new file has been created with the name specified in the “-output” parameter (${aek_name} above, meaning “CSaek”). The only remaining step is to reflect this change in the server.ini and restart the repository:

[dmadmin@cs-0 secure]$ grep "crypto" ${s_ini}
crypto_mode = AES256_RSA1024_SHA256
crypto_keystore = Local
crypto_lockbox = lockbox.lb
crypto_keyname = CSaek
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ sed -i 's,^crypto_lockbox,#&,' ${s_ini}
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ grep "crypto" ${s_ini}
crypto_mode = AES256_RSA1024_SHA256
crypto_keystore = Local
#crypto_lockbox = lockbox.lb
crypto_keyname = CSaek
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ 
[dmadmin@cs-0 secure]$ $DOCUMENTUM/dba/dm_shutdown_${gr_repo}
Stopping Documentum server for repository: [GR_REPO1]


        OpenText Documentum iapi - Interactive API interface
        Copyright (c) 2018. OpenText Corporation
        All rights reserved.
        Client Library Release 16.4.0200.0080

Connecting to Server using docbase GR_REPO1.GR_REPO1
[DM_SESSION_I_SESSION_START]info:  "Session 010f123450009905 started for user dmadmin."

Connected to OpenText Documentum Server running Release 16.4.0200.0256  Linux64.Oracle
Session id is s0
API> shutdown,c,T,T
...
OK
API> exit
Bye
Waiting for 90 seconds for server pid, 4188, to disappear.

Fri May  8 21:53:39 UTC 2020: Waiting for shutdown of repository: [GR_REPO1]
Fri May  8 21:53:39 UTC 2020: checking for pid: 4188

Fri May  8 21:53:49 UTC 2020: Waiting for shutdown of repository: [GR_REPO1]
Fri May  8 21:53:49 UTC 2020: checking for pid: 4188

repository: [GR_REPO1] has been shutdown
checking that all children (4214 4218 4219 4263 4305 4348 4542 4557 4584 4766) have shutdown
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ $DOCUMENTUM/dba/dm_start_${gr_repo}
starting Documentum server for repository: [GR_REPO1]
with server log: [$DOCUMENTUM/dba/log/GR_REPO1.log]
server pid: 5055
[dmadmin@cs-0 secure]$
[dmadmin@cs-0 secure]$ head -20 $DOCUMENTUM/dba/log/${gr_repo}.log


    OpenText Documentum Content Server (version 16.4.0200.0256  Linux64.Oracle)
    Copyright (c) 2018. OpenText Corporation
    All rights reserved.

2020-05-08T21:54:02.033346      5055[5055]      0000000000000000        [DM_SERVER_I_START_SERVER]info:  "Docbase GR_REPO1 attempting to open"

2020-05-08T21:54:02.033471      5055[5055]      0000000000000000        [DM_SERVER_I_START_KEY_STORAGE_MODE]info:  "Docbase GR_REPO1 is using database for cryptographic key storage"

2020-05-08T21:54:02.033507      5055[5055]      0000000000000000        [DM_SERVER_I_START_SERVER]info:  "Docbase GR_REPO1 process identity: user(dmadmin)"

2020-05-08T21:54:02.833702      5055[5055]      0000000000000000        [DM_SESSION_I_INIT_BEGIN]info:  "Initialize Post Upgrade Processing."

2020-05-08T21:54:02.835032      5055[5055]      0000000000000000        [DM_SESSION_I_INIT_BEGIN]info:  "Initialize Base Types."

2020-05-08T21:54:02.837725      5055[5055]      0000000000000000        [DM_SESSION_I_INIT_BEGIN]info:  "Initialize dmRecovery."

2020-05-08T21:54:02.846022      5055[5055]      0000000000000000        [DM_SESSION_I_INIT_BEGIN]info:  "Initialize dmACL."

[dmadmin@cs-0 secure]$

 

That’s basically it. You are now running with the plain AEK key, just like before the introduction of the Lockbox a few years ago with the Documentum 7.x versions.

 

Maybe one last note on the “dfc.crypto.repository” property of the dfc.properties. As you might know, this property has been introduced to support the move to the lockbox so that Documentum knows which repository should be used for all crypto needs. This parameter should apparently stay, even if you remove the lockbox and continue with the AEK file only, because the decryption logic is on the server side and the DFC clients still need to know which repository can help on that part. Maybe that will change though…
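To double-check that the property is still there after removing the lockbox, a simple grep on the client configuration is enough (a sketch; the dfc.properties location below is the usual $DOCUMENTUM_SHARED one and the value shown is just an example pointing to the Global Registry repository):

[dmadmin@cs-0 ~]$ grep "dfc.crypto.repository" $DOCUMENTUM_SHARED/config/dfc.properties
dfc.crypto.repository=GR_REPO1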

 


티베로 – The AWR-like “Tibero Performance Repository”

Tue, 2020-05-26 08:56
By Franck Pachot

In a previous post I introduced Tibero as The most compatible alternative to Oracle Database. Compatibility is one thing, but one day you will want to compare the performance. I’ll not do any benchmark here but show you how we can look at the performance with TPR – the Tibero Performance Repository – as an equivalent of AWR – the Oracle Automatic Workload Repository. And, as I needed to run some workload, I attempted to run something that was written with Oracle Database in mind: Kevin Closson’s SLOB – Silly Little Oracle Benchmark. The challenge is to make it run on Tibero and get a TPR report.

SLOB

I’ve downloaded SLOB from:


git clone https://github.com/therealkevinc/SLOB_distribution.git
tar -zxvf SLOB_distribution/2019.11.18.slob_2.5.2.tar.gz

and I’ll detail what I had to change in order to have it running on Tibero.

sqlplus

The client command line is “tbsql” and has a very good compatibility with sqlplus:


[SID=t6a u@h:w]$ tbsql -h
Usage: tbsql [options] [logon] [script]
options
-------
  -h,--help        Displays this information
  -v,--version     Displays version information
  -s,--silent      Enables silent mode. Does not display the 
                   start-up message, prompts and commands
  -i,--ignore      Ignore the login script (eg, tbsql.login)
logon
-----
  [username[/password[@connect_identifier]]]
script
------
  @filename[.ext] [parameter ...]
[SID=t6a u@h:w]$ tbsql
tbSQL 6
TmaxData Corporation Copyright (c) 2008-. All rights reserved.
SQL> help
HELP
 ----
 Displays the Help.
{H[ELP]|?} topic
where topic is
 ! {exclamation} % {percent} @ {at}   @@ {double at}
 / {slash}       ACCEPT      APPEND   ARCHIVE LOG
 CHANGE          CLEAR       COLUMN   CONNECT
 DEFINE          DEL         DESCRIBE DISCONNECT
 EDIT            EXECUTE     EXIT     EXPORT
 HELP            HISTORY     HOST     INPUT
 LIST            LOADFILE    LOOP     LS
 PASSWORD        PAUSE       PING     PRINT
 PROMPT          QUIT        RESTORE  RUN
 SAVE            SET         SHOW     TBDOWN
 SPOOL           START       UNDEFINE VARIABLE
 WHENEVER

However, there are a few things I had to change.

The silent mode is “-s” instead of “-S”

The “-L” (no re-prompt if the connection fails) doesn’t exist: tbsql does not re-prompt, and leaves you in the CLI rather than exiting.

sqlplus does not show feedback for fewer than 5 rows; tbsql always shows it by default. We can get the same output as sqlplus by setting SET FEEDBACK 6

tbsql returns “No Errors” where sqlplus returns “No errors”

You cannot pass additional spaces in the connection string: sqlplus ignores them but tbsql complains.

All those were easy to change in the setup.sh and runit.sh scripts.
I actually defined a sqlplus() bash function to handle those:


sqlplus(){
 set -- ${@// /}
 set -- ${@/-L/}
 set -- ${@/-S/-s}
 sed \
 -e '1 i SET FEEDBACK 6' \
 | tbsql $* | sed \
 -e 's/No Errors/No errors/'
 echo "--- call stack ---  ${FUNCNAME[0]}()$(basename $0)#${LINENO}$(f=0;while caller $f;do((f++));done|awk '{printf " <- "$2"()@"$3"#"$1}')" >&2
 echo "--- parameters ---  tbsql $*" >&2
}

As you can see, I've added the printing of the bash call stack, which I used to find those issues. Here is the idea:

# print call stack in bash:
echo "=== call stack === ${FUNCNAME[0]}()$(basename $0)#${LINENO}$(f=0;while caller $f;do((f++));done|awk '{printf " <- "$2"()@"$3"#"$1}')" >&2 pic.twitter.com/qhVEMSJkeU

— Franck Pachot (@FranckPachot) January 5, 2020

tnsping

The equivalent of TNSPING is TBPROBE. It takes a host:port and displays nothing, but returns a 0 return code when the connection is ok and 3 when it failed. Note that there are other statuses, like 1 when the connection is ok but the database is read-only, and 2 when it is in mount or nomount. You see here an architecture difference with Oracle: there is no listener, it is the database itself that listens on a port.
A little detail with no importance here: my database port is 8629 as mentioned in the previous post, but tbprobe actually connects to 8630:


[SID=t6a u@h:w]$ strace -fyye trace=connect,getsockname,recvfrom,sendto tbprobe localhost:8629
connect(4, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(4, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
getsockname(4, {sa_family=AF_NETLINK, nl_pid=41494, nl_groups=00000000}, [12]) = 0
sendto(4, {{len=20, type=RTM_GETADDR, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=1589797671, pid=0}, {ifa_family=AF_UNSPEC, ...}}, 20, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 20
connect(4, {sa_family=AF_INET, sin_port=htons(8630), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(4<TCP:[127.0.0.1:50565->127.0.0.1:8630]>, {sa_family=AF_INET, sin_port=htons(50565), sin_addr=inet_addr("127.0.0.1")}, [28->16]) = 0
connect(4, {sa_family=AF_INET, sin_port=htons(8630), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
recvfrom(4<TCP:[127.0.0.1:50565->127.0.0.1:8630]>, "\0\0\0\0\0\0\0@\0\0\0\0\0\0\0\0", 16, 0, NULL, NULL) = 16
recvfrom(4<TCP:[127.0.0.1:50565->127.0.0.1:8630]>, "\0\0\0\2\0\0\0\17\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6"..., 64, 0, NULL, NULL) = 64
sendto(4<TCP:[127.0.0.1:50565->127.0.0.1:8630]>, "\0\0\0\204\0\0\0\f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 28, 0, NULL, 0) = 28
recvfrom(4<TCP:[127.0.0.1:50565->127.0.0.1:8630]>, "\0\0\0\204\0\0\0\f\0\0\0\0\0\0\0\0", 16, 0, NULL, NULL) = 16
recvfrom(4<TCP:[127.0.0.1:50565->127.0.0.1:8630]>, "\0\0\0e\0\0\0\0\0\0\0\1", 12, 0, NULL, NULL) = 12
+++ exited with 0 +++

The return code is correct. What actually happens is that Tibero does not seem to use Out-Of-Band data but another port to communicate when the default port is in use. And this is the next port number, mentioned in my instance process list as "listener_special_port":


[SID=t6a u@h:w]$ cat /home/tibero/tibero6/instance/t6a/.proc.list
Tibero 6   start at (2019-12-16 14:03:00) by 54323
shared memory: 140243894161408 size: 3221225472
shm_key: 847723696 1 sem_key: -1837474876 123 listener_pid: 7026 listener_port: 8629 listener_special_port: 8630 epa_pid: -1
7025 MONP
7027 MGWP
7028 FGWP000
7029 FGWP001
7030 PEWP000
7031 PEWP001
7032 PEWP002
7033 PEWP003
7034 AGNT
7035 DBWR
7036 RCWP

This means that when my database listener is on 8629, TBPROBE returns "ok" for 8628 and 8629.


[SID=t6a u@h:w]$ for p in {8625..8635} ; do tbprobe  localhost:$p ; echo "localhost:$p -> $?" ; done
localhost:8625 -> 3
localhost:8626 -> 3
localhost:8627 -> 3
localhost:8628 -> 0
localhost:8629 -> 0
localhost:8630 -> 3
localhost:8631 -> 3
localhost:8632 -> 3
localhost:8633 -> 3
localhost:8634 -> 3
localhost:8635 -> 3

Anyway, this has no importance here and I just ignore the tnsping test done by SLOB:


tnsping(){
 true
}
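Since SLOB only checks the return code, an alternative to short-circuiting the check completely would be a tnsping() that really calls tbprobe and maps the return codes described above (just a sketch, assuming the target is passed as host:port):

tnsping(){
 # tbprobe return codes: 0=open, 1=open read-only, 2=mount/nomount, 3=connection failed
 tbprobe "$1"
 case $? in
  0|1|2) echo "tbprobe $1: database reachable" ;;
  *)     echo "tbprobe $1: connection failed" >&2 ; return 1 ;;
 esac
}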
DCL and DDL

Tibero SQL does not allow creating a user with the GRANT statement and needs an explicit CREATE USER. In setup.sh I did the following replacement:


--GRANT CONNECT TO $user IDENTIFIED BY $user;
CREATE USER $user IDENTIFIED BY $user;
GRANT CONNECT TO $user;

The implicit creation of a user with a grant statement is an oraclism that is not very useful anyway.
PARALLEL and CACHE attributes are not allowed at the same place as in Oracle:


-- PARALLEL CACHE PCTFREE 99 TABLESPACE $tablespace
-- STORAGE (BUFFER_POOL KEEP INITIAL 1M NEXT 1M MAXEXTENTS UNLIMITED);
PCTFREE 99 TABLESPACE $tablespace
STORAGE (BUFFER_POOL KEEP INITIAL 1M NEXT 1M MAXEXTENTS UNLIMITED) PARALLEL CACHE;

--NOCACHE PCTFREE 99 TABLESPACE $tablespace
--STORAGE (BUFFER_POOL KEEP INITIAL 1M NEXT 1M MAXEXTENTS UNLIMITED);
PCTFREE 99 TABLESPACE $tablespace
STORAGE (BUFFER_POOL KEEP INITIAL 1M NEXT 1M MAXEXTENTS UNLIMITED);

They just go at the end of the statement and that's fine.
Tibero has no SYSTEM user by default, so I create one and, at the same time, create the IOPS tablespace:


tbsql sys/tibero <<SQL
 create user system identified by manager;
 grant dba to system;
 create tablespace IOPS ;
SQL

Those are the changes I've made to be able to run the SLOB setup.sh and runit.sh.

I also had to change a few things in slob.sql

There is no GET_CPU_TIME() in DBMS_UTILITY so I comment it out:


--v_end_cpu_tm := DBMS_UTILITY.GET_CPU_TIME();
--v_begin_cpu_tm := DBMS_UTILITY.GET_CPU_TIME();

I didn’t check for an equivalent here and just removed it.

The compatibility with Oracle is very good, so the queries on V$SESSION are the same. The only thing I changed is the SID userenv, which is called TID in Tibero:


SELECT ((10000000000 * (SID + SERIAL#)) + 1000000000000) INTO v_my_serial from v$session WHERE sid = ( select sys_context('userenv','tid') from dual);

I got a TBR-11006: Invalid USERENV parameter before this change.

This session ID is larger so I removed the precision:


--v_my_serial NUMBER(16);
v_my_serial NUMBER;

I got a TBR-5111: NUMBER exceeds given precision. (n:54011600000000000, p:16, s:0) before changing it.

The dbms_output was hanging so I disabled it:


--SET SERVEROUTPUT ON   ;

I didn't try to understand the reason. That's a very bad example of troubleshooting, but I just wanted it to run for now.

Again, without trying to understand further, I replaced all PLS_INTEGER by NUMBER as I got: TBR-5072: Failure converting NUMBER to or from a native type

setup.sh

In slob.conf I changed only UPDATE_PCT to 0 for a read-only workload and changed “statspack” to “awr”, and I was able to create 8 schemas:


./setup.sh IOPS 8 </dev/null

(I redirect /dev/null to stdin because my sqlplus() function above reads it)

runit.sh

Now ready to run SLOB.
This is sufficient to run it but I want to collect statistics from the Tibero Performance Repository (TPR).

Tibero Performance Repository is the equivalent of AWR. The package to manage it is DBMS_TPR instead of DBMS_WORKLOAD_REPOSITORY. It has a CREATE_SNAPSHOT procedure, and a REPORT_TEXT_LAST one to generate the report between the two last snapshots, without having to get the snapshot IDs.
I’ve replaced the whole statistics() calls in runit.sh by:


tbsql system/manager <<<'exec dbms_tpr.create_snapshot;';

before
and:


tbsql system/manager <<<'exec dbms_tpr.create_snapshot;';
tbsql system/manager <<<'exec dbms_tpr.report_text_last;';

after.

Now running:


sh runit.sh 4 </dev/null

and I can see the 4 sessions near 100% in CPU.
Note that they are threads in Tibero; you need to run "top -H" to see the detail:

TPR (Tibero Performance Repository) Report
1 session LIO

Here is the report for 5 minutes of running on one session:


--------------------------------------------------------------------------------
               *** TPR (Tibero Performance Repository) Report ***
--------------------------------------------------------------------------------

              DB Name : t6a
                  TAC : NO
                  TSC : NO
         Instance Cnt : 1
              Release : 6   r166256 ( LINUX_X86_64 )
            Host CPUs : 48

   Interval condition : 758 (snapshot id) (#=1 reported)
Report Snapshot Range : 2020-05-26 20:02:37 ~ 2020-05-26 20:07:38
  Report Instance Cnt : 1 (from Instance NO. 'ALL')
         Elapsed Time : 5.02 (mins)
              DB Time : 4.99 (mins)
       Avg. Session # : 1.00

I’m running a simple installation. No TSC (Tibero Standby Cluster, the Active Data Guard equivalent) and no TAC (the Oracle RAC equivalent).
This report is a run with one session only (DB time = Elapsed time) and Avg. Session # is 1.00.


================================================================================
 2.1 Workload Summary
================================================================================

                            Per Second           Per TX         Per Exec         Per Call
                       ---------------  ---------------  ---------------  ---------------
          DB Time(s):             1.00             7.68             0.00            42.80
           Redo Size:         3,096.93        23,901.92             0.52       133,167.86
       Logical Reads:       394,911.05     3,047,903.26            66.40    16,981,175.29
       Block Changes:            26.60           205.33             0.00         1,144.00
      Physical Reads:             0.00             0.00             0.00             0.00
     Physical Writes:             1.31            10.08             0.00            56.14
          User Calls:             0.02             0.18             0.00             1.00
              Parses:             0.02             0.18             0.00             1.00
         Hard Parses:             0.00             0.00             0.00             0.00
              Logons:             0.01             0.05             0.00             0.29
            Executes:         5,947.03        45,898.85             1.00       255,722.14
           Rollbacks:             0.00             0.00             0.00             0.00
        Transactions:             0.13             1.00             0.00             5.57

394,911 logical reads per second is comparable to what I can get from Oracle on this Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz, but let's see what happens with some concurrency. I miss the DB CPU(s) figure, which quickly shows whether I am running mostly on CPU, but it can be calculated from other parts of the report (DB CPU time is 299,134, which covers the 5 minutes of elapsed time). The "Per Call" column is interesting as it has more meaning than the "Per Execution" one, which counts the recursive executions.
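As a quick sanity check (assuming the DB CPU time statistic is in milliseconds like the other time-model figures), the missing "DB CPU(s) per second" can be derived from the elapsed time:

[SID=t6a u@h:w]$ echo "scale=2; 299134 / (5.02 * 60 * 1000)" | bc
.99

which is the expected ~1 session active on CPU.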

4 sessions LIO

Another report from the run above with 4 concurrent sessions:


--------------------------------------------------------------------------------
               *** TPR (Tibero Performance Repository) Report ***
--------------------------------------------------------------------------------

              DB Name : t6a
                  TAC : NO
                  TSC : NO
         Instance Cnt : 1
              Release : 6   r166256 ( LINUX_X86_64 )
            Host CPUs : 48

   Interval condition : 756 (snapshot id) (#=1 reported)
Report Snapshot Range : 2020-05-26 19:54:48 ~ 2020-05-26 19:59:49
  Report Instance Cnt : 1 (from Instance NO. 'ALL')
         Elapsed Time : 5.02 (mins)
              DB Time : 19.85 (mins)
       Avg. Session # : 1.00

Ok, there’s a problem in the calculation of “Avg. Session #” which should be 19.85 / 5.02 = 3.95 for my 4 sessions.

About the Host:


================================================================================
 1.1 CPU Usage
================================================================================

                Total       B/G           Host CPU Usage(%)
               DB CPU    DB CPU  ----------------------------------
Instance#    Usage(%)  Usage(%)   Busy   User    Sys   Idle  IOwait
-----------  --------  --------  -----  -----  -----  -----  ------
          0       7.9       7.9    7.9    7.8    0.1   92.1     0.0

mostly idle as I have 48 vCPUs there.


====================================================================
 1.2 Memory Usage
====================================================================
HOST Mem Size :    322,174M
                      Total SHM Size :      3,072M
                   Buffer Cache Size :      2,048M
               Avg. Shared Pool Size :     288.22M
                  Avg. DD Cache Size :       8.22M
                  Avg. PP Cache Size :      63.44M
                       DB Block Size :          8K
                     Log Buffer Size :         10M

My workload size (80MB defined in slob.conf) is much smaller than the buffer cache,
and then I expect an average of 4 active sessions on CPU:


====================================================================
 2.1 Workload Summary
====================================================================
Per Second          Per TX        Per Call
                  --------------- --------------- ---------------
      DB Time(s):            3.96           38.41           79.38
       Redo Size:        3,066.07       29,770.52       61,525.73
   Logical Reads:      427,613.99    4,151,993.87    8,580,787.33
   Block Changes:           26.36          255.90          528.87
  Physical Reads:            0.00            0.00            0.00
 Physical Writes:            0.99            9.61           19.87
      User Calls:            0.05            0.48            1.00
          Parses:            0.03            0.29            0.60
     Hard Parses:            0.00            0.00            0.00
          Logons:            0.02            0.16            0.33
        Executes:       67,015.04      650,694.45    1,344,768.53
       Rollbacks:            0.00            0.00            0.00
    Transactions:            0.10            1.00            2.07

An average of 3.96 sessions is not so bad. But 427,613 LIOPS is only a bit higher than in the 1-session run, now with 4 concurrent sessions: 4x the CPU usage for only 10% more throughput…

At this point I must say that I’m not doing a benchmark here. I’m using the same method as I do with Oracle, for which I know quite well how it works, but here Tibero is totally new for me. I’ve probably not configured it correctly and the test I’m doing may not be correct. I’m looking at the number here only to understand a bit more how it works.

The Time Model is much more detailed than the Oracle one:


================================================================================
 2.2 Workload Stats
================================================================================

                                                                         DB         DB
            Category                                       Stat         Time(ms)   Time(%)
--------------------  -----------------------------------------  ---------------  --------
Request Service Time  -----------------------------------------        1,190,760    100.00
                                            SQL processing time        1,200,818    100.84
                           reply msg processing time for others                0      0.00
                                             commit by msg time                0      0.00
                              SQL processing (batchupdate) time                0      0.00
                                           rollback by msg time                0      0.00
                                                   msg lob time                0      0.00
                                                    msg xa time                0      0.00
                                                   msg dpl time                0      0.00
                                            msg dblink 2pc time                0      0.00
                                                  msg tsam time                0      0.00
                                             msg long read time                0      0.00
      SQL Processing  -----------------------------------------        1,200,818    100.84
                                       SQL execute elapsed time          369,799     31.06
                                          csr fetch select time          355,744     29.88
                                             parse time elapsed           83,992      7.05
                                       sql dd lock acquire time           25,045      2.10
                                                ppc search time            5,975      0.50
                                          csr fetch insert time              115      0.01
                                          csr fetch delete time               37      0.00
                                        hard parse elapsed time                2      0.00
                                        total times to begin tx                1      0.00
                                      failed parse elapsed time                1      0.00
                                      sql dml lock acquire time                0      0.00
                                              cursor close time                0      0.00
                        stat load query hard parse elapsed time                0      0.00
                        stat load query soft parse time elapsed                0      0.00
                              csr fetch direct path insert time                0      0.00
                                           csr fetch merge time                0      0.00
                                                 optimizer time                0      0.00
                                          csr fetch update time                0      0.00
                                 stat load query row fetch time                0      0.00
              Select  -----------------------------------------               17      0.00
                                           table full scan time               11      0.00
                                                 hash join time                6      0.00
                                                      sort time                0      0.00
                                        op_proxy execution time                0      0.00
              Insert  -----------------------------------------                1      0.00
                                              tdd mi total time                1      0.00
                                            tdi insert key time                0      0.00
                                            tdd insert row time                0      0.00
                                              tdi mi total time                0      0.00
                                            tdi fast build time                0      0.00
              Update  -----------------------------------------                0      0.00
                                             tdd update rp time                0      0.00
                                            tdd update row time                0      0.00
                                              tdd mu total time                0      0.00
                                    idx leaf update nonkey time                0      0.00
              Delete  -----------------------------------------               15      0.00
                                            tdd delete row time               15      0.00
                                             tdd delete rp time                0      0.00
                                              tdd md total time                0      0.00
                                              tdi md total time                0      0.00
                                            tdi delete key time                0      0.00

I am surprised to spend 7% of the time in parsing, as I expect the few queries to be parsed only once there, which is confirmed by the Workload Summary above.

Having a look at the ratios:


================================================================================
 3.1 Instance Efficiency
================================================================================

                              Value
                           --------
      Buffer Cache Hit %:    100.00
           Library Hit %:    100.00
                PP Hit %:     99.54
             Latch Hit %:     98.44
        Redo Alloc Hit %:    100.00
         Non-Parse CPU %:     92.95

confirms that 7% of the CPU time is about parsing.
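That figure can be cross-checked against the Time Model above (values assumed to be in milliseconds): parse time elapsed divided by DB time gives the same ~7%, DB time and DB CPU being nearly equal here since the run is almost entirely on CPU:

[SID=t6a u@h:w]$ echo "scale=4; 83992 / 1190760 * 100" | bc
7.0500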

We have many statistics. Here are the time-based ones:


================================================================================
 6.1 Workload Stats (Time-based)
================================================================================

                                     Stat         Time(ms)    Avg. Time(ms)              Num             Size
-----------------------------------------  ---------------  ---------------  ---------------  ---------------
                      SQL processing time        1,200,818        120,081.8               10                0
                Inner SQL processing time        1,200,817        120,081.7               10                0
                         req service time        1,190,760         79,384.0               15                0
                              DB CPU time        1,120,908              1.6          705,270                0
                 SQL execute elapsed time          369,799              0.0       20,174,294                0
                    csr fetch select time          355,744              0.0       20,174,264                0
                         tscan rowid time          187,592              0.0       20,174,102                0
               PSM execution elapsed time          129,820              0.0       60,514,594                0
                       parse time elapsed           83,992              0.0       20,174,308                0
                     tdi fetch start time           47,440              0.0       20,175,956        1,660,966
                 sql dd lock acquire time           25,045              0.0       20,171,527                0
                 isgmt get cr in lvl time           25,004              0.0       20,390,418       22,268,417
                        isgmt get cr time           21,500              0.0       22,479,405                0
                tscan rowid pick exb time           18,734              0.0      106,200,851                0
                    tscan rowid sort time           15,529              0.0       20,173,326                0
                         ppc search time            5,975              0.0       20,174,325                0
                           dd search time            4,612              0.0       40,344,560                0
...
                  hard parse elapsed time                2              0.1               13                0
...
                failed parse elapsed time                1              0.1                5                0
...

This parse time is not hard parse time.
The non-time-based statistics are conveniently ordered by count, which is more useful than alphabetical order:


================================================================================
 6.2 Workload Stats (Number-based)
================================================================================

                                     Stat              Num             Size         Time(ms)
-----------------------------------------  ---------------  ---------------  ---------------
                     candidate bh scanned      128,715,142      153,638,765                0
               consistent gets - no clone      128,700,485                0                0
  fast examines for consistent block gets      128,700,414                0                0
                    consistent block gets      128,698,143                0                0
                 block pin - not conflict      128,691,338                0              796
                              block unpin      128,691,332                0              333
                      rowid sort prefetch      106,200,861       18,945,339                0
                     tscan rowid pick exb      106,200,851                0           18,734
                          dd search count       40,344,560                0            4,612
Number of conflict DBA while scanning can       24,917,669                0                0

...
                    tdi fetch start total       20,175,956        1,660,966           47,440
           parse count (for all sessions)       20,174,308                0           83,992
                         csr fetch select       20,174,264                0          355,744
                              tscan rowid       20,174,102                0          187,592
                         tscan rowid sort       20,173,326                0           15,529
               memory tuner prof register       20,171,661       20,171,661              198
                            execute count       20,171,528                0                0
                      sql dd lock acquire       20,171,527                0           25,045
...
                      parse count (total)                9                0                0
...

and this clearly means that I had as many parses as executions. Then I realized that there is nothing about a parse-to-execute ratio in the "Instance Efficiency" section, which is the only ratio I look at in the Oracle "Instance Efficiency". Even if the terms are similar, there is something different from Oracle.
The only documentation I’ve found is in Korean: https://technet.tmaxsoft.com/download.do?filePath=/nas/technet/technet/upload/kss/tdoc/tibero/2015/02/&fileName=FILE-20150227-000002_150227145000_1.pdf
According to this, "parse time elapsed" is the time spent in "parse count (for all sessions)", and covers parsing, syntax and semantic analysis, which is what we call soft parse in Oracle. "parse count (total)" is parsing, transformation and optimization, which is what we call hard parse in Oracle. With this very limited knowledge, it looks like, even if called from PL/SQL, a soft parse occurred for each execution. Look also at the "dd" statistics: "sql dd lock acquire time" is 2.1% of DB time and looks like serialization on the Data Dictionary. And, even if it does not take a lot of time ("dd search time" is low), "dd search count" is incremented 2 times per execution: the soft parse of the small select has to read the Data Dictionary definitions.
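Putting a number on it from the statistics above confirms the parse-to-execute ratio: almost exactly one soft parse per execution:

[SID=t6a u@h:w]$ echo "scale=4; 20174308 / 20171528" | bc
1.0001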

Of course we have wait events (not timed events as this section does not include the CPU):


================================================================================
 3.2 Top 5 Wait Events by Wait Time
================================================================================

                                                  Time          Wait          Avg           Waits       DB
                         Event            Waits  -outs(%)       Time(s)      Wait(ms)           /TX   Time(%)
------------------------------  ---------------  --------  ------------  ------------  ------------  --------
           spinlock total wait          787,684      0.00         45.47          0.06  2,540,916.13      3.82
          dbwr write time - OS                7      0.00          0.16         23.14         22.58      0.01
spinlock: cache buffers chains            9,837      0.00          0.12          0.01     31,732.26      0.01
               redo write time               36      0.00          0.03          0.78        116.13      0.00
     log flush wait for commit               31      0.00          0.03          0.81        100.00      0.00

Only “spinlock total wait” is significant here.
Here is the section about latches:


================================================================================
 7.7 Spinlock(Latch) Statistics
================================================================================

                                            Get    Get     Avg.Slps       Wait           Nowait   Nowait     DB
                          Name          Request   Miss(%)     /Miss       Time(s)       Request   Miss(%)   Time(%)
------------------------------  ---------------  --------  --------  ------------  ------------  --------  --------
         SPIN_SPIN_WAITER_LIST        2,190,429      4.07      3.60         51.69             0      0.00      4.34
                    SPIN_ALLOC      161,485,088     10.40      3.71         41.42             0      0.00      3.48
                SPIN_LC_BUCKET       60,825,921      0.53     25.55          3.31             0      0.00      0.28
               SPIN_ROOT_ALLOC      126,036,981      0.00     79.52          0.16   126,040,272      0.00      0.01
               SPIN_BUF_BUCKET      257,429,840      0.00     15.82          0.12            34      0.00      0.01
               SPIN_RECR_UNPIN      121,038,780      0.00     57.14          0.08        36,039      0.01      0.01
                 SPIN_SQLSTATS       40,383,027      0.36      0.61          0.07             0      0.00      0.01

These look like very generic latches protected by spinlocks.

While I was there I ran a Brendan Gregg Flamegraph during the run:


perf script -i ./perf.data | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl --width=1200 --hash --cp > /tmp/tibero-slob.svg

If you have ever traced Oracle PL/SQL call stacks, you can see that Tibero is a completely different implementation. But the features are very similar. TPR is really like AWR. And there is also an ASH equivalent: you may have seen in the previous post that I have set ACTIVE_SESSION_HISTORY=Y in the .tip file (Tibero instance parameters). You can query a v$active_session_history view.
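As a quick illustration of that ASH equivalent (only a sketch: the column names below are assumed to mirror Oracle's v$active_session_history, which I have not verified in detail on Tibero):

tbsql system/manager <<'SQL'
 -- columns assumed to match the Oracle view; adjust if they differ on Tibero
 select sample_time, session_id, sql_id, event
   from v$active_session_history
  where sample_time > sysdate - 5/24/60;
SQL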

SQL_TRACE

Given the similarity with Oracle, let’s do a good old ‘tkprof’.
I run the SLOB call with SQL_TRACE enabled:


[SID=t6a u@h:w]$ tbsql user1/user1

tbSQL 6

TmaxData Corporation Copyright (c) 2008-. All rights reserved.

Connected to Tibero.

SQL> alter session set sql_trace=y;

Session altered.

SQL> @slob 1 0 0 300 10240 64 LITE 0 FALSE 8 16 3 0 .1 .5 0

PSM completed.

SQL> alter session set sql_trace=y;

Session altered.

The raw trace has been generated, which contains things like this:


=========================================================
PARSING IN CSR=#8 len=87, uid=172, tim=1590498896169612
SELECT COUNT(c2)
                        FROM cf1
                        WHERE ( custid > ( :B1 - :B2 ) ) AND  (custid < :B1)
END OF STMT
[PARSE] CSR=#8 c=0, et=29, cr=0, cu=0, p=0, r=0, tim=1590498896169655
[EXEC] CSR=#8 c=0, et=48, cr=0, cu=0, p=0, r=0, tim=1590498896169720
[FETCH] CSR=#8 c=0, et=307, cr=66, cu=0, p=0, r=1, tim=1590498896170044
[PSM] OBJ_ID=-1, ACCESS_NO=0 c=0, et=14, cr=0, cu=0, p=0, r=0, tim=1590498896170090
[STAT] CSR=#8 ppid=4826 cnt_in_L=1 cnt=1 dep=0 'column projection (et=2, cr=0, cu=0, co=63, cpu=0, ro=1)'
[STAT] CSR=#8 ppid=4826 cnt_in_L=64 cnt=1 dep=1 'sort aggr (et=6, cr=0, cu=0, co=63, cpu=0, ro=1)'
[STAT] CSR=#8 ppid=4826 cnt_in_L=64 cnt=64 dep=2 'table access (rowid) CF1(3230) (et=258, cr=64, cu=0, co=63, cpu=0, ro=60)'
[STAT] CSR=#8 ppid=4826 cnt_in_L=0 cnt=64 dep=3 'index (range scan) I_CF1(3235) (et=20, cr=2, cu=0, co=2, cpu=0, ro=60)'
CSR_CLOSE #8
[PSM] OBJ_ID=-1, ACCESS_NO=0 c=0, et=62, cr=0, cu=0, p=0, r=0, tim=1590498896170170
=========================================================

And all this can be aggregated by TBPROF:

tbprof tb_sqltrc_17292_63_2107964.trc tb_sqltrc_17292_63_2107964.txt sys=yes sort="prscnt"

Which gets something very similar to Oracle tkprof:


SELECT COUNT(c2)
                        FROM cf1
                        WHERE ( custid > ( :B1 - :B2 ) ) AND  (custid < :B1)

 stage     count       cpu   elapsed   current     query      disk      rows
-----------------------------------------------------------------------------
 parse         1      0.00      0.00         0         0         0         0
  exec    829366     28.41     27.68         0         0         0         0
 fetch    829366    137.36    135.18         0  55057089         0    829366
-----------------------------------------------------------------------------
   sum   1658733    165.77    162.86         0  55057089         0    829366

    rows  u_rows        execution plan
  ----------------------------------------------------------
   829366       -       column projection (et=859685, cr=0, cu=0, co=52250058, cpu=0, ro=829366)
   829366       -        sort aggr (et=3450526, cr=0, cu=0, co=52250058, cpu=0, ro=829366)
   53079424       -       table access (rowid) CF1(3230) (et=106972426, cr=53079424, cu=0, co=52250058, cpu=0, ro=49761960)
   53079424       -        index (range scan) I_CF1(3235) (et=9256762, cr=1977665, cu=0, co=1658732, cpu=0, ro=49761960)

But here I'm puzzled. From the TPR statistics, I got the impression that each SQL statement in the PSM (Persistent Stored Module – the PL/SQL compatible procedural language) was soft parsed, but I was wrong: I see only one parse call here. Is there something missing there? I don't think so. The total time in this profile is 162.86 seconds during a 300-second run without any think time. Writing the trace file is not included there. If I compare the logical reads per second during the SQL time, 55057089/162.86=338064, I am near the value I got without sql_trace.

I leave it with unanswered questions about the parse statistics. The most important thing, which was the goal of this post, is that there are lots of troubleshooting tools, similar to Oracle's, and I was able to run something that was really specific to Oracle, with sqlplus, SQL and PL/SQL, without the need for many changes.


How to add storage on ODA X8-2M

Tue, 2020-05-26 06:30

Recently I had to add some storage on an ODA X8-2M that I deployed early February. At that time the latest available release was ODA 18.7. In this post I would like to share my experience and the challenge I faced.

ODA X8-2M storage extension

As per the Oracle datasheet, the ODA X8-2M initially comes with 2 NVMe SSDs installed, for a usable capacity of 5.8 TB. It can be extended, by slots of 2 disks, up to 12 NVMe SSDs, which brings the ASM storage up to 29.7 TB of usable capacity.
In my configuration we initially had 4 NVMe SSDs and we wanted to add 2 more.

Challenge

During the procedure to add the disks, I was surprised to see that with release 18.7 the usual expand storage command was not recognized.

[root@ODA01 ~]# odaadmcli expand storage -ndisk 2
Command 'odaadmcli expand storage' is not supported

What the hell is going on here? This was always possible on previous ODA generations and previous releases!
Looking closer at the documentation, I could see the following note:
Note: In this release, you can add storage as per your requirement, or deploy the full storage capacity for Oracle Database Appliance X8-2HA and X8-2M hardware models at the time of initial deployment of the appliance. You can only utilize whatever storage you configured during the initial deployment of the appliance (before the initial system power ON and software provisioning and configuration). You cannot add additional storage after the initial deployment of the X8-2HA and X8-2M hardware models, in this release of Oracle Database Appliance, even if the expandable storage slots are present as empty.

Hmmm, 18.5 was still allowing it. Fortunately, version 18.8 had just been released at that time, and post-installation storage expansion is possible again with that release.
I then had to patch my ODA to release 18.8 first. A good blog post about ODA 18.8 patching from one of my colleagues can be found here: Patching ODA from 18.3 to 18.8. Coming from 18.3, 18.5 or 18.7 would follow the same process.
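Before launching the expansion, it is worth double-checking that the appliance really runs 18.8 or later, for example (output shortened; the exact layout of odacli describe-component varies between releases):

[root@ODA01 ~]# odacli describe-component
System Version
---------------
18.8.0.0.0
...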

Adding disks on the ODA
Checking ASM usage

Let's first check the current ASM usage:

grid@ODA01:/home/grid/ [+ASM1] asmcmd
 
ASMCMD> lsdg
State Type Rebal Sector Logical_Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 512 4096 4194304 12211200 7550792 3052544 2248618 0 Y DATA/
MOUNTED NORMAL N 512 512 4096 4194304 12209152 2848956 3052032 -102044 0 N RECO/

Check state of the disk

Before adding a new disk, all current disks need to be healthy.

[root@ODA01 ~]# odaadmcli show disk
NAME PATH TYPE STATE STATE_DETAILS
 
pd_00 /dev/nvme0n1 NVD ONLINE Good
pd_01 /dev/nvme1n1 NVD ONLINE Good
pd_02 /dev/nvme3n1 NVD ONLINE Good
pd_03 /dev/nvme2n1 NVD ONLINE Good

We are using 2 ASM disk groups:
[root@ODA01 ~]# odaadmcli show diskgroup
DiskGroups
----------
DATA
RECO

Run orachk

It is recommended to run orachk and make sure the ODA is healthy before adding new disks:

[root@ODA01 ~]# cd /opt/oracle.SupportTools/orachk/oracle.ahf/orachk
[root@ODA01 orachk]# ./orachk -nordbms

Physical disk installation

In my configuration I already have 4 disks. The 2 additional disks will therefore be installed in slots 4 and 5. After the disk is plugged in, we need to power it on:

[root@ODA01 orachk]# odaadmcli power disk on pd_04
Disk 'pd_04' already powered on

It is recommended to wait at least one minute before plugging in the next disk. The LED of the disk should also shine green. Similarly, we can power on the next disk once it is plugged into slot 5 of the server:

[root@ODA01 orachk]# odaadmcli power disk on pd_05
Disk 'pd_05' already powered on

Expand the storage

The following command is used to expand the storage with the 2 new disks:
[root@ODA01 orachk]# odaadmcli expand storage -ndisk 2
Precheck passed.
Check the progress of expansion of storage by executing 'odaadmcli show disk'
Waiting for expansion to finish ...

Check expansion

At the beginning of the expansion, we can check and see that the 2 new disks have been detected and are being initialized:
[root@ODA01 ~]# odaadmcli show disk
NAME PATH TYPE STATE STATE_DETAILS
 
pd_00 /dev/nvme0n1 NVD ONLINE Good
pd_01 /dev/nvme1n1 NVD ONLINE Good
pd_02 /dev/nvme3n1 NVD ONLINE Good
pd_03 /dev/nvme2n1 NVD ONLINE Good
pd_04 /dev/nvme4n1 NVD UNINITIALIZED NewDiskInserted
pd_05 /dev/nvme5n1 NVD UNINITIALIZED NewDiskInserted

Once the expansion is finished, we can check that all our disks, including the new ones, are OK:
[root@ODA01 ~]# odaadmcli show disk
NAME PATH TYPE STATE STATE_DETAILS
 
pd_00 /dev/nvme0n1 NVD ONLINE Good
pd_01 /dev/nvme1n1 NVD ONLINE Good
pd_02 /dev/nvme3n1 NVD ONLINE Good
pd_03 /dev/nvme2n1 NVD ONLINE Good
pd_04 /dev/nvme4n1 NVD ONLINE Good
pd_05 /dev/nvme5n1 NVD ONLINE Good

We can also query the ASM instance and see that the 2 new disks in slots 4 and 5 are online:
SQL> col PATH format a50
SQL> set line 300
SQL> set pagesize 500
SQL> select mount_status, header_status, mode_status, state, name, path, label from v$ASM_DISK order by name;
 
MOUNT_S HEADER_STATU MODE_ST STATE NAME PATH LABEL
------- ------------ ------- -------- ------------------------------ -------------------------------------------------- -------------------------------
CACHED MEMBER ONLINE NORMAL NVD_S00_PHLN9440011FP1 AFD:NVD_S00_PHLN9440011FP1 NVD_S00_PHLN9440011FP1
CACHED MEMBER ONLINE NORMAL NVD_S00_PHLN9440011FP2 AFD:NVD_S00_PHLN9440011FP2 NVD_S00_PHLN9440011FP2
CACHED MEMBER ONLINE NORMAL NVD_S01_PHLN94410040P1 AFD:NVD_S01_PHLN94410040P1 NVD_S01_PHLN94410040P1
CACHED MEMBER ONLINE NORMAL NVD_S01_PHLN94410040P2 AFD:NVD_S01_PHLN94410040P2 NVD_S01_PHLN94410040P2
CACHED MEMBER ONLINE NORMAL NVD_S02_PHLN9490009MP1 AFD:NVD_S02_PHLN9490009MP1 NVD_S02_PHLN9490009MP1
CACHED MEMBER ONLINE NORMAL NVD_S02_PHLN9490009MP2 AFD:NVD_S02_PHLN9490009MP2 NVD_S02_PHLN9490009MP2
CACHED MEMBER ONLINE NORMAL NVD_S03_PHLN944000SQP1 AFD:NVD_S03_PHLN944000SQP1 NVD_S03_PHLN944000SQP1
CACHED MEMBER ONLINE NORMAL NVD_S03_PHLN944000SQP2 AFD:NVD_S03_PHLN944000SQP2 NVD_S03_PHLN944000SQP2
CACHED MEMBER ONLINE NORMAL NVD_S04_PHLN947101TZP1 AFD:NVD_S04_PHLN947101TZP1 NVD_S04_PHLN947101TZP1
CACHED MEMBER ONLINE NORMAL NVD_S04_PHLN947101TZP2 AFD:NVD_S04_PHLN947101TZP2 NVD_S04_PHLN947101TZP2
CACHED MEMBER ONLINE NORMAL NVD_S05_PHLN947100BXP1 AFD:NVD_S05_PHLN947100BXP1 NVD_S05_PHLN947100BXP1
CACHED MEMBER ONLINE NORMAL NVD_S05_PHLN947100BXP2 AFD:NVD_S05_PHLN947100BXP2 NVD_S05_PHLN947100BXP2

CACHED MEMBER ONLINE DROPPING SSD_QRMDSK_P1 AFD:SSD_QRMDSK_P1 SSD_QRMDSK_P1
CACHED MEMBER ONLINE DROPPING SSD_QRMDSK_P2 AFD:SSD_QRMDSK_P2 SSD_QRMDSK_P2
 
14 rows selected.

The operating system recognizes the disks as well:
grid@ODA01:/home/grid/ [+ASM1] cd /dev
 
grid@ODA01:/dev/ [+ASM1] ls -l nvme*
crw-rw---- 1 root root 246, 0 May 14 10:31 nvme0
brw-rw---- 1 grid asmadmin 259, 0 May 14 10:31 nvme0n1
brw-rw---- 1 grid asmadmin 259, 1 May 14 10:31 nvme0n1p1
brw-rw---- 1 grid asmadmin 259, 2 May 14 10:31 nvme0n1p2
crw-rw---- 1 root root 246, 1 May 14 10:31 nvme1
brw-rw---- 1 grid asmadmin 259, 5 May 14 10:31 nvme1n1
brw-rw---- 1 grid asmadmin 259, 10 May 14 10:31 nvme1n1p1
brw-rw---- 1 grid asmadmin 259, 11 May 14 14:38 nvme1n1p2
crw-rw---- 1 root root 246, 2 May 14 10:31 nvme2
brw-rw---- 1 grid asmadmin 259, 4 May 14 10:31 nvme2n1
brw-rw---- 1 grid asmadmin 259, 7 May 14 14:38 nvme2n1p1
brw-rw---- 1 grid asmadmin 259, 9 May 14 14:38 nvme2n1p2
crw-rw---- 1 root root 246, 3 May 14 10:31 nvme3
brw-rw---- 1 grid asmadmin 259, 3 May 14 10:31 nvme3n1
brw-rw---- 1 grid asmadmin 259, 6 May 14 10:31 nvme3n1p1
brw-rw---- 1 grid asmadmin 259, 8 May 14 10:31 nvme3n1p2
crw-rw---- 1 root root 246, 4 May 14 14:30 nvme4
brw-rw---- 1 grid asmadmin 259, 15 May 14 14:35 nvme4n1
brw-rw---- 1 grid asmadmin 259, 17 May 14 14:38 nvme4n1p1
brw-rw---- 1 grid asmadmin 259, 18 May 14 14:38 nvme4n1p2
crw-rw---- 1 root root 246, 5 May 14 14:31 nvme5
brw-rw---- 1 grid asmadmin 259, 16 May 14 14:35 nvme5n1
brw-rw---- 1 grid asmadmin 259, 19 May 14 14:38 nvme5n1p1
brw-rw---- 1 grid asmadmin 259, 20 May 14 14:38 nvme5n1p2

Check ASM space

Querying the ASM disk groups, we can see that both disk groups have received additional space in proportion to the percentage assigned to DATA and RECO during appliance creation. In my case it was a 50-50 split between DATA and RECO.

grid@ODA01:/dev/ [+ASM1] asmcmd
 
ASMCMD> lsdg
State Type Rebal Sector Logical_Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL Y 512 512 4096 4194304 18316288 13655792 3052544 5301118 0 Y DATA/
MOUNTED NORMAL Y 512 512 4096 4194304 18313216 8952932 3052032 2949944 0 N RECO/
ASMCMD>
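As an alternative to asmcmd, a quick sanity check can also be done with a query against the ASM instance (a simple sketch; the column selection is mine):

SQL> select name, total_mb, free_mb, round(free_mb/total_mb*100,1) pct_free from v$asm_diskgroup;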

Conclusion

Adding new disks on an ODA is quite easy and fast. Surprisingly, with ODA release 18.7 you are not able to expand ASM storage once the appliance is installed, which is really a regression: you lose the ability to extend the storage. Fortunately, this has been solved in ODA version 18.8.

Cet article How to add storage on ODA X8-2M est apparu en premier sur Blog dbi services.

6 things Oracle could do for a better ODA

Tue, 2020-05-26 04:44
Introduction

With the latest ODA X8 range, at least 80% of the customers could find an ODA configuration that fits their needs. For the others, either they can't afford it, or they are already in the Cloud, or they need extremely large storage or EXADATA performance. Among these 80% of customers, only a few choose ODA. Let's see how Oracle could improve the ODA to make it a must.

1) Make robust and reliable releases

This is the main point. ODA is built on Linux, Grid Infrastructure and database software, nearly identical to what you can find on a classic Linux server. But it comes bundled with odacli, a central CLI to manage deployment, database creation, updates, migrations and so on. And it sometimes has annoying bugs. More reliable releases would also make patching less tricky, and customers much more confident in this kind of operation.

It would also be nice to have long-term releases on ODA, like for the database: one long-term release each year, deeply tested and receiving only bug fixes, for those customers who prefer stability and reliability, that is, most of them.

2) Make a real GUI

An appliance is something that eases your life. You unpack the server, you plug it in, you press the power button, and you start configuring it with a smart GUI. ODA is not yet that nice. The GUI is quite basic, and most of us use the CLI to have complete control over all the features. So please Oracle, make the GUI a real strength of the ODA, with a pinch of Cloud Control features but without the complexity of Cloud Control. That would be a game-changer.

3) Integrate Data Guard management

Data Guard works fine on ODA, but you'll have to set up the configuration yourself. Most customers plan to use Data Guard if they are using Enterprise Edition. And actually, ODA doesn't know about Data Guard: you'll need to configure everything as if it were a standard server. In my dreams, an ODA could be paired with another one, with standby databases automatically created on the paired ODA, duplicated and synchronized with the primaries. Then we could easily switch over and switch back from the GUI, without any specific knowledge.

There is a lot of work to achieve this, but it could be a killer feature.

4) Get rid of GI for ODA lites

Yes, Grid Infrastructure adds complexity. And a “lite” appliance means simplified appliance. GI is needed mainly because we need ASM redundancy, and ASM is really nice. It’s actually better than RAID. But do you remember how ASM was configured in 10g? Just by deploying a standalone DBhome and creating a pfile with instance_type=ASM. That’s it. No dependencies between ASM and the other DBHomes. Once ASM is on the server, each instance can use it. And it could make patching easier for sure.

5) Make IPs modifiable

Because sometimes you need to change the public IP address of an ODA, or its name. Moving to another datacenter is a good example. For now, changing IPs is only possible when the appliance is not yet deployed, meaning unused. You can change the network configuration manually, but don't expect future patches to work afterwards. An easy function to change the network configuration on a deployed ODA would be welcome.

6) Be proud of this product!

Last but not least. Yes, Cloud is the future. And Oracle Cloud Infrastructure is a great piece of Cloud. But it will take time for customers to migrate to the Cloud. Some of them are not even considering Cloud at all for the moment. They want on-premises solutions. ODA is THE solution that perfectly fits between OCI and EXADATA. It's a great product, it's worth the money and it has many years to live. To promote these appliances, maybe Oracle could make ODA better integrated with OCI, as a cross-technology solution: being able to back up and restore the ODA configuration to the Cloud, to put a standby database in OCI from the GUI, to duplicate a complete environment to the Cloud for testing purposes, …

Conclusion

ODA is a serious product, but it still needs several improvements to increase its popularity.

Cet article 6 things Oracle could do for a better ODA est apparu en premier sur Blog dbi services.

Automate AWS deployments with Ansible + Terraform

Tue, 2020-05-26 01:40
Automate AWS deployments with Ansible + Terraform

This installation is done from a bastion server that is already available with the proper network permissions.
For other deployment types, you should adapt it to your needs.

Install requirements

Ansible installation:

sudo apt update
sudo apt install -y software-properties-common
sudo apt-add-repository --yes --update ppa:ansible/ansible
sudo apt install -y ansible

ref: https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html#installing-ansible-on-ubuntu

Terraform installation:

wget https://releases.hashicorp.com/terraform/0.12.24/terraform_0.12.24_linux_amd64.zip
sudo unzip -d /usr/local/bin/ ./terraform_0.12.24_linux_amd64.zip

ref: https://www.techrepublic.com/article/how-to-install-terraform-on-ubuntu-server/

AWS Client installation:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

ref: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html

Connect your environment to your AWS cloud
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: eu-central-1
Default output format [None]: json

ref: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html#cli-quick-configuration

Create a new Ansible project

That is my favorite Ansible layout, the best one I have seen so far:

mkdir -p ~/my_aws_project/inventory
mkdir -p ~/my_aws_project/playbooks
mkdir -p ~/my_aws_project/roles
Add the EC2 dynamic inventory

Ansible can work with a different kind of inventory called a dynamic inventory.
Instead of having a static declaration of your inventory in a file, you can generate it from a source.
This source can be a database, an Active Directory, etc. A dynamic inventory is a script that outputs JSON in a structure that Ansible can handle. We could develop a script that discovers our EC2 infrastructure ourselves, but that would take some time. Or we can use the one already provided with Ansible:

Install prerequisites:

sudo apt install -y python3-pip
sudo pip3 install boto
sudo pip3 install ansible
sudo rm /usr/bin/python
sudo ln -s /usr/bin/python3 /usr/bin/python

Get the EC2 dynamic inventory:

wget -O ~/my_aws_project/inventory/ec2.py \
https://raw.githubusercontent.com/ansible/ansible/stable-2.9/contrib/inventory/ec2.py
wget -O ~/my_aws_project/inventory/ec2.ini \
https://raw.githubusercontent.com/ansible/ansible/stable-2.9/contrib/inventory/ec2.ini
chmod +x ~/my_aws_project/inventory/ec2.py

There are multiple configuration options you can set in the ini file. For this blog I will change these variables:

regions = eu-central-1
vpc_destination_variable = private_ip_address

Test the inventory script:

~/my_aws_project/inventory/ec2.py ## ---> return JSON description of your AWS infrastructure

I changed those two settings because I want to work in a single AWS region and use private IP addresses only, since my bastion is already inside the AWS infrastructure.

Add a role for our deployment

I'll create a role whose only purpose is to deploy my infrastructure into AWS.

ansible-galaxy init --init-path ~/my_aws_project/roles ec2_instances_dep
Wire the components together (Ansible configuration)

To have that layout working in the simplest way, I use this configuration:

## file ~/my_aws_project/ansible.cfg
[defaults]
roles_path = ./roles
inventory  = ./inventory/ec2.py

Test the ansible inventory:

cd ~/my_aws_project
ansible-inventory --graph ## ---> return the Ansible interpreted inventory
Terraform with Ansible

When I need to do something with Ansible, I first check in the list of modules if the work is already done.
And, nicely, there is a module for Terraform.

So I can use this module in the main task file of my role:

## ~/my_aws_project/roles/ec2_instances_dep/tasks/main.yml
---
- name: I create a directory to store my Terraform config
  file:
    path: "~/aws_terraform"
    state: directory
    recurse: yes

- name: I copy my Terraform template into the working directory created above
  template:
    src: "my_ec2_infra.tf"
    dest: "~/aws_terraform/my_ec2_infra.tf"

- name: I deploy my configuration into AWS from Ansible
  terraform:
    project_path: "~/aws_terraform"
    force_init: true
    state: "present"
  register: r_aws

- name: I do whatever I need to do in my EC2 infrastructure
  debug: msg="update, install, create user, start services, etc"

- name: I destroy my AWS infrastructure 
  terraform:
    project_path: "~/aws_terraform"
    state: "absent"
Terraform content

Add this file into the role's templates directory: ~/my_aws_project/roles/ec2_instances_dep/templates

## file ~/my_aws_project/roles/ec2_instances_dep/templates/my_ec2_infra.tf
provider "aws" {
  region = "eu-central-1"
}

resource "aws_instance" "dba-essential" {
  count                       = "5"
  ami                         = "ami-0e342d72b12109f91"
  availability_zone           = "eu-central-1a"
  instance_type               = "t2.micro"
  associate_public_ip_address = false
  security_groups             = ["my_sg_01"]
  vpc_security_group_ids      = ["sg-602eff2724d52a0b7"]
  key_name                    = "my_key_01"

  root_block_device {
    delete_on_termination = true
    encrypted             = false
    volume_size           = 15
    volume_type           = "gp2"
  }

  tags = {
    Owner           = "Nicolas"
    Name            = "crash-test-${count.index + 1}"
  }

}
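Optionally, an output block can be added to the same template so that the role can read values back from the registered variable r_aws. This is my own addition, not part of the original role, and the output name is arbitrary:

## hypothetical addition at the end of templates/my_ec2_infra.tf
output "dba_private_ips" {
  value = aws_instance.dba-essential[*].private_ip
}

## and, in roles/ec2_instances_dep/tasks/main.yml, after the terraform task
- name: I display the private IPs returned by Terraform
  debug:
    msg: "{{ r_aws.outputs.dba_private_ips.value }}"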
Create a playbook to call the role
## file ~/my_aws_project/playbooks/deploy.yml
---
- name: Deploy my infrastructure
  hosts: localhost
  roles:

    - ec2_instances_dep
Run the playbook
cd my_aws_project
ansible-playbook playbooks/deploy.yml

Boom! Here it is. Now imagine that you can generate a unique key and a unique directory for each deployment, and you can deploy as much infrastructure as your credit card will accept.
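As a side note, once the instances exist (and before any destroy task runs), the dynamic inventory lets you target them by tag with an ad-hoc command. This is only a sketch: it assumes the ec2.py group naming convention (tag_<key>_<value>), that the my_key_01 key pair is available locally, and that ec2-user is the AMI default user:

cd ~/my_aws_project
ansible tag_Owner_Nicolas -m ping -u ec2-user --private-key ~/.ssh/my_key_01.pem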

I hope this helps, and please comment below for any questions.

Cet article Automate AWS deployments with Ansible + Terraform est apparu en premier sur Blog dbi services.

Issue deleting a database on ODA?

Mon, 2020-05-25 15:33

I recently faced an issue deleting databases on an ODA. I was getting the following error whatever database I wanted to delete: DCS-10001:Internal error encountered: null.

Through this blog post, I would like to share my experience on this case, hoping it will help you if you are facing the same problem. On this project I was using ODA releases 18.5 and 18.8 and faced the same problem on both versions. On 18.3 and previous releases this was not the case.

Deleting the database

With odacli I tried to delete my TEST database, running the following command:

[root@ODA01 bin]# odacli delete-database -in TEST -fd
{
"jobId" : "bcdcbf59-0fe6-44b7-af7f-91f68c7697ed",
"status" : "Running",
"message" : null,
"reports" : [ {
"taskId" : "TaskZJsonRpcExt_858",
"taskName" : "Validate db d6542252-dfa4-47f9-9cfc-22b4f0575c51 for deletion",
"taskResult" : "",
"startTime" : "May 06, 2020 11:36:38 AM CEST",
"endTime" : "May 06, 2020 11:36:38 AM CEST",
"status" : "Success",
"taskDescription" : null,
"parentTaskId" : "TaskSequential_856",
"jobId" : "bcdcbf59-0fe6-44b7-af7f-91f68c7697ed",
"tags" : [ ],
"reportLevel" : "Info",
"updatedTime" : "May 06, 2020 11:36:38 AM CEST"
} ],
"createTimestamp" : "May 06, 2020 11:36:38 AM CEST",
"resourceList" : [ ],
"description" : "Database service deletion with db name: TEST with id : d6542252-dfa4-47f9-9cfc-22b4f0575c51",
"updatedTime" : "May 06, 2020 11:36:38 AM CEST"
}

The job failed with a DCS-10001 error:

[root@ODA01 bin]# odacli describe-job -i "bcdcbf59-0fe6-44b7-af7f-91f68c7697ed"
 
Job details
----------------------------------------------------------------
ID: bcdcbf59-0fe6-44b7-af7f-91f68c7697ed
Description: Database service deletion with db name: TEST with id : d6542252-dfa4-47f9-9cfc-22b4f0575c51
Status: Failure
Created: May 6, 2020 11:36:38 AM CEST
Message: DCS-10001:Internal error encountered: null.
 
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
database Service deletion for d6542252-dfa4-47f9-9cfc-22b4f0575c51 May 6, 2020 11:36:38 AM CEST May 6, 2020 11:36:50 AM CEST Failure
database Service deletion for d6542252-dfa4-47f9-9cfc-22b4f0575c51 May 6, 2020 11:36:38 AM CEST May 6, 2020 11:36:50 AM CEST Failure
Validate db d6542252-dfa4-47f9-9cfc-22b4f0575c51 for deletion May 6, 2020 11:36:38 AM CEST May 6, 2020 11:36:38 AM CEST Success
Database Deletion May 6, 2020 11:36:39 AM CEST May 6, 2020 11:36:39 AM CEST Success
Unregister Db From Cluster May 6, 2020 11:36:39 AM CEST May 6, 2020 11:36:39 AM CEST Success
Kill Pmon Process May 6, 2020 11:36:39 AM CEST May 6, 2020 11:36:39 AM CEST Success
Database Files Deletion May 6, 2020 11:36:39 AM CEST May 6, 2020 11:36:40 AM CEST Success
Deleting Volume May 6, 2020 11:36:47 AM CEST May 6, 2020 11:36:50 AM CEST Success
database Service deletion for d6542252-dfa4-47f9-9cfc-22b4f0575c51 May 6, 2020 11:36:50 AM CEST May 6, 2020 11:36:50 AM CEST Failure

Troubleshooting

In dcs-agent.log, located in the /opt/oracle/dcs/log folder, you might see the following errors:

2019-11-27 13:54:30,106 ERROR [database Service deletion for 89e11f5d-9789-44a3-a09d-2444f0fda99e : JobId=05a2d017-9b64-4e92-a7df-3ded603d0644] [] c.o.d.c.j.JsonRequestProcessor: RPC request invocation failed on request: {"classz":"com.oracle.dcs.agent.rpc.service.dataguard.DataguardActions","method":"deleteListenerEntry","params":[{"type":"com.oracle.dcs.agent.model.DB","value":{"updatedTime":1573023492194,"id":"89e11f5d-9789-44a3-a09d-2444f0fda99e","name":"TEST","createTime":1573023439244,"state":{"status":"CONFIGURED"},"dbName":"TEST","databaseUniqueName":"TEST_RZB","dbVersion":"11.2.0.4.190115","dbHomeId":"c58cdcfd-e5b2-4041-b993-8df5a5d5ada4","dbId":null,"isCdb":false,"pdBName":null,"pdbAdminUserName":null,"enableTDE":false,"isBcfgInSync":null,"dbType":"SI","dbTargetNodeNumber":"0","dbClass":"OLTP","dbShape":"odb1","dbStorage":"ACFS","dbOnFlashStorage":false,"level0BackupDay":"sunday","instanceOnly":true,"registerOnly":false,"rmanBkupPassword":null,"dbEdition":"SE","dbDomainName":"ksbl.local","dbRedundancy":null,"dbCharacterSet":{"characterSet":"AL32UTF8","nlsCharacterset":"AL16UTF16","dbTerritory":"AMERICA","dbLanguage":"AMERICAN"},"dbConsoleEnable":false,"backupDestination":"NONE","cloudStorageContainer":null,"backupConfigId":null,"isAutoBackupDisabled":false}}],"revertable":false,"threadId":111}
! java.lang.NullPointerException: null
! at com.oracle.dcs.agent.rpc.service.dataguard.DataguardOperations.deleteListenerEntry(DataguardOperations.java:2258)
! at com.oracle.dcs.agent.rpc.service.dataguard.DataguardActions.deleteListenerEntry(DataguardActions.java:24)
! at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
! at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
! at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
! at java.lang.reflect.Method.invoke(Method.java:498)
! at com.oracle.dcs.commons.jrpc.JsonRequestProcessor.invokeRequest(JsonRequestProcessor.java:33)
! ... 23 common frames omitted
! Causing: com.oracle.dcs.commons.exception.DcsException: DCS-10001:Internal error encountered: null.
! at com.oracle.dcs.commons.exception.DcsException$Builder.build(DcsException.java:68)
! at com.oracle.dcs.commons.jrpc.JsonRequestProcessor.invokeRequest(JsonRequestProcessor.java:45)
! at com.oracle.dcs.commons.jrpc.JsonRequestProcessor.process(JsonRequestProcessor.java:74)
! at com.oracle.dcs.agent.task.TaskZJsonRpcExt.callInternal(TaskZJsonRpcExt.java:65)
! at com.oracle.dcs.agent.task.TaskZJsonRpc.call(TaskZJsonRpc.java:182)
! at com.oracle.dcs.agent.task.TaskZJsonRpc.call(TaskZJsonRpc.java:26)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:82)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:37)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:39)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:10)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:82)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:37)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:39)
! at com.oracle.dcs.agent.task.TaskZLockWrapper.call(TaskZLockWrapper.java:64)
! at com.oracle.dcs.agent.task.TaskZLockWrapper.call(TaskZLockWrapper.java:21)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:82)
! at com.oracle.dcs.commons.task.TaskApi.call(TaskApi.java:37)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:39)
! at com.oracle.dcs.commons.task.TaskSequential.call(TaskSequential.java:10)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:82)
! at com.oracle.dcs.commons.task.TaskWrapper.call(TaskWrapper.java:17)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
! at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
! at java.lang.Thread.run(Thread.java:748)
2019-11-27 13:54:30,106 INFO [database Service deletion for 89e11f5d-9789-44a3-a09d-2444f0fda99e : JobId=05a2d017-9b64-4e92-a7df-3ded603d0644] [] c.o.d.a.z.DCSZooKeeper: DCS node id is - node_0
2019-11-27 13:54:30,106 DEBUG [database Service deletion for 89e11f5d-9789-44a3-a09d-2444f0fda99e : JobId=05a2d017-9b64-4e92-a7df-3ded603d0644] [] c.o.d.a.t.TaskZJsonRpc: Task[TaskZJsonRpcExt_124] RPC request 'Local:node_0@deleteListenerEntry()' completed: Failure

The key error to note is: 'Local:node_0@deleteListenerEntry()' completed: Failure
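If you want to check quickly whether you are hitting the same problem, grepping the agent log for this RPC call is enough (log path as mentioned above; the number of lines kept is arbitrary):

[root@ODA01 ~]# grep -i deleteListenerEntry /opt/oracle/dcs/log/dcs-agent.log | tail -5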

Explanation

This problem comes from the fact that the listener.ora file has been customized. As per Oracle Support, on an ODA the listener.ora should never be customized and the default listener.ora file should be used. I still have an SR open with Oracle Support to clarify the situation, as I'm fully convinced that this is a regression:

  1. It was always possible in previous ODA versions to delete a database with a customized listener file
  2. We need to customize the listener when setting up Data Guard on Oracle 11.2.0.4 (still supported on ODA)
  3. We need to customize the listener when doing a duplication, as dynamic registration is not possible while the database is in nomount state and the database is restarted during the duplication.

Moreover, other ODA documents still refer to customizing the listener.ora file when using ODA:
White paper: STEPS TO MIGRATE NON-CDB DATABASES TO ACFS ON ORACLE DATABASE APPLIANCE 12.1.2
Deploying Oracle Data Guard with Oracle Database Appliance – A WhitePaper (2016-7) (Doc ID 2392307.1)

I will update the post as soon as I have some feedback from Oracle support on this.

The workaround is to put back the default listener.ora file at the time of the deletion, which may require a maintenance window for some customers.

Solution/Workaround

Backup of the current listener configuration

OK, so let's back up our current listener configuration first:

grid@ODA01:/home/grid/ [+ASM1] cd $TNS_ADMIN
 
grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] cp -p listener.ora ./history/listener.ora.20200506

Default ODA listener configuration

The backup of the default listener configuration is the following:

grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] cat listener19071611AM2747.bak
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
ASMNET1LSNR_ASM=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_ASMNET1LSNR_ASM=ON # line added by Agent
VALID_NODE_CHECKING_REGISTRATION_ASMNET1LSNR_ASM=SUBNET # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
VALID_NODE_CHECKING_REGISTRATION_LISTENER=SUBNET # line added by Agent

Stopping the listener

Let’s stop the listener :

grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] srvctl stop listener -listener listener
 
grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] srvctl status listener -listener listener
Listener LISTENER is enabled
Listener LISTENER is not running

Put back the default listener configuration

grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] mv listener.ora listener.ora.before_db_del_20200506
 
grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] cp -p listener19071611AM2747.bak listener.ora

Start the listener

grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] srvctl start listener -listener listener
 
grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] srvctl status listener -listener listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): oda01

Delete database

We will try to delete the database again by running the same odacli command :

[root@ODA01 bin]# odacli delete-database -in TEST -fd
{
"jobId" : "5655be19-e0fe-4452-b8a9-35382c67bf96",
"status" : "Running",
"message" : null,
"reports" : [ {
"taskId" : "TaskZJsonRpcExt_1167",
"taskName" : "Validate db d6542252-dfa4-47f9-9cfc-22b4f0575c51 for deletion",
"taskResult" : "",
"startTime" : "May 06, 2020 11:45:01 AM CEST",
"endTime" : "May 06, 2020 11:45:01 AM CEST",
"status" : "Success",
"taskDescription" : null,
"parentTaskId" : "TaskSequential_1165",
"jobId" : "5655be19-e0fe-4452-b8a9-35382c67bf96",
"tags" : [ ],
"reportLevel" : "Info",
"updatedTime" : "May 06, 2020 11:45:01 AM CEST"
} ],
"createTimestamp" : "May 06, 2020 11:45:01 AM CEST",
"resourceList" : [ ],
"description" : "Database service deletion with db name: TEST with id : d6542252-dfa4-47f9-9cfc-22b4f0575c51",
"updatedTime" : "May 06, 2020 11:45:01 AM CEST"
}

Unfortunately, the deletion fails again with another error: DCS-10011:Input parameter 'ACFS Device for delete' cannot be NULL.

This is due to the fact that the previous deletion attempt already removed the corresponding ACFS volumes for the database (DATA and REDO). We have to create them manually again. I have already described this solution in a previous post: Database deletion stuck in deleting-status.

After recreating the corresponding ACFS volumes, we can retry our database deletion:

[root@ODA01 bin]# odacli delete-database -in TEST -fd
{
"jobId" : "5e227755-478b-46c5-a5cd-36687cb21ed8",
"status" : "Running",
"message" : null,
"reports" : [ {
"taskId" : "TaskZJsonRpcExt_1443",
"taskName" : "Validate db d6542252-dfa4-47f9-9cfc-22b4f0575c51 for deletion",
"taskResult" : "",
"startTime" : "May 06, 2020 11:47:53 AM CEST",
"endTime" : "May 06, 2020 11:47:53 AM CEST",
"status" : "Success",
"taskDescription" : null,
"parentTaskId" : "TaskSequential_1441",
"jobId" : "5e227755-478b-46c5-a5cd-36687cb21ed8",
"tags" : [ ],
"reportLevel" : "Info",
"updatedTime" : "May 06, 2020 11:47:53 AM CEST"
} ],
"createTimestamp" : "May 06, 2020 11:47:53 AM CEST",
"resourceList" : [ ],
"description" : "Database service deletion with db name: TEST with id : d6542252-dfa4-47f9-9cfc-22b4f0575c51",
"updatedTime" : "May 06, 2020 11:47:53 AM CEST"
}

This time it is successful:

[root@ODA01 bin]# odacli describe-job -i "5e227755-478b-46c5-a5cd-36687cb21ed8"
 
Job details
----------------------------------------------------------------
ID: 5e227755-478b-46c5-a5cd-36687cb21ed8
Description: Database service deletion with db name: TEST with id : d6542252-dfa4-47f9-9cfc-22b4f0575c51
Status: Success
Created: May 6, 2020 11:47:53 AM CEST
Message:
 
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Validate db d6542252-dfa4-47f9-9cfc-22b4f0575c51 for deletion May 6, 2020 11:47:53 AM CEST May 6, 2020 11:47:53 AM CEST Success
Database Deletion May 6, 2020 11:47:53 AM CEST May 6, 2020 11:47:54 AM CEST Success
Unregister Db From Cluster May 6, 2020 11:47:54 AM CEST May 6, 2020 11:47:54 AM CEST Success
Kill Pmon Process May 6, 2020 11:47:54 AM CEST May 6, 2020 11:47:54 AM CEST Success
Database Files Deletion May 6, 2020 11:47:54 AM CEST May 6, 2020 11:47:54 AM CEST Success
Deleting Volume May 6, 2020 11:48:01 AM CEST May 6, 2020 11:48:05 AM CEST Success
Delete File Groups of Database TEST May 6, 2020 11:48:05 AM CEST May 6, 2020 11:48:05 AM CEST Success

Restore our customized listener configuration

We can now restore our customized configuration as follows :

grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] srvctl stop listener -listener listener
 
grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] srvctl status listener -listener listener
Listener LISTENER is enabled
Listener LISTENER is not running
 
grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] mv listener.ora.before_db_del_20200506 listener.ora
 
grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] srvctl start listener -listener listener
 
grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] srvctl status listener -listener listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): oda01

We can also confirm that the listener started successfully by displaying the running tnslsnr processes:

grid@ODA01:/u01/app/18.0.0.0/grid/network/admin/ [+ASM1] ps -ef | grep tnslsnr | grep -v grep
grid 14922 1 0 10:52 ? 00:00:00 /u01/app/18.0.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid 97812 1 0 12:07 ? 00:00:00 /u01/app/18.0.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit

Conclusion

Starting with ODA release 18.5, database deletion fails if the listener has been customized. The workaround is to restore the default listener configuration before executing the deletion. For some customers this might imply a maintenance window.

Cet article Issue deleting a database on ODA? est apparu en premier sur Blog dbi services.

How to use DBMS_SCHEDULER to improve performance ?

Sun, 2020-05-24 18:28

From an application point of view, the Oracle scheduler DBMS_SCHEDULER allows you to improve performance by parallelizing your processing.

Let's start with the following PL/SQL code, which inserts rows serially from a metadata table into a target table. In my example, the metadata table does not contain the data directly but a set of SQL statements to be executed, whose returned rows must be inserted into the target table My_Target_Table_Serial:

Let’s verify the contents of the source table called My_Metadata_Table:

SQL> SELECT priority,dwh_id, amq_name, sql_statement,scope from dwh_amq_v2;
ROWNUM  DWH_ID  AMQ_NAME SQL_STATEMENT          SCOPE
1	7	AAA1	 SELECT SUM(P.age pt.p	TYPE1
2	28	BBB2  	 SELECT CASE WHEN pt.p	TYPE1
3	37	CCC3	 "select cm.case_id fr"	TYPE2
4	48	DDD4	 "select cm.case_id fr"	TYPE2
5	73	EEE5	 SELECT DISTINCT pt.p	TYPE1
6	90	FFF6 	 SELECT LAG(ORW pt.p	TYPE1
7	114	GGG7	 SELECT distinct pt.	TYPE1
8	125	HHH8	 SELECT DISTINCT pt.p	TYPE1
...
148    115     ZZZ48    SELECT ROUND(TO_NUMBER TYPE2

Now let’s check the PL/SQL program :

DECLARE
  l_errm VARCHAR2(200);
  l_sql  VARCHAR2(32767) := NULL;
  sql_statement_1  VARCHAR2(32767) := NULL;
  sql_statement_2  VARCHAR2(32767) := NULL;
  l_amq_name VARCHAR2(200);
  l_date NUMBER;
BEGIN
  SELECT TO_NUMBER(TO_CHAR(SYSDATE,'YYYYMMDDHH24MISS')) INTO l_date FROM dual;
  FOR rec IN (SELECT dwh_id, amq_name, sql_statement,scope 
                FROM My_Metadata_Table,
                     (SELECT dwh_pit_date FROM dwh_code_mv) pt
               WHERE dwh_status = 1
                 AND (pt.dwh_pit_date >= dwh_valid_from AND pt.dwh_pit_date < dwh_valid_to) 
               ORDER BY priority, dwh_id) LOOP
    ...
    sql_statement_1 := substr(rec.sql_statement, 1, 32000);
    sql_statement_2 := substr(rec.sql_statement, 32001);
    IF rec.SCOPE = 'TYPE1' THEN 
      -- TYPE1 LEVEL SELECT
      l_sql := 'INSERT /*+ APPEND */ INTO My_Target_Table_Serial (dwh_pit_date, AMQ_ID, AMQ_TEXT, CASE_ID, ENTERPRISE_ID)'||CHR(13)|| 'SELECT DISTINCT TO_DATE(code.dwh_pit_date, ''YYYYMMDDHH24MISS''),'||rec.dwh_id|| ',''' ||rec.amq_name ||''', case_id, 1'||CHR(13)
      || ' FROM (SELECT dwh_pit_date FROM dwh_code) code, ('||sql_statement_1;
      EXECUTE IMMEDIATE l_sql || sql_statement_2 || ')';
      COMMIT;    
    ELSE 
      -- TYPE2 LEVEL SELECT
      l_sql :=  'INSERT /*+ APPEND */ INTO My_Target_Table_Serial (dwh_pit_date, AMQ_ID, AMQ_TEXT, CASE_ID, ENTERPRISE_ID)
      SELECT DISTINCT TO_DATE(code.dwh_pit_date, ''YYYYMMDDHH24MISS''), '||rec.dwh_id|| ',''' ||rec.amq_name || ''', cm.case_id, cm.enterprise_id'||CHR(13)
      || '  FROM (SELECT dwh_pit_date FROM dwh_code) code, v_sc_case_master cm, v_sc_case_event ce, ('||sql_statement_1;
              
      EXECUTE IMMEDIATE l_sql || sql_statement_2 || ') pt'||CHR(13)
      || ' WHERE cm.case_id = ce.case_id'||CHR(13) 
      || '   AND cm.deleted IS NULL AND cm.state_id <> 1'||CHR(13)
      || '   AND ce.deleted IS NULL AND ce.pref_term = pt.pt_name';
      COMMIT;         
    END IF;
    ...
   END LOOP:
END;
Number of Rows Read : 148 (meaning 148 SQL statements to execute)
START : 16:17:46
END : 16:57:42
Total :  40 mins

 

As we can see, each SQL statement is executed serially. Let's check the audit table recording the loading time (insert time) and the scheduling:

CREATE_DATE		NAME	START_DATE		END_DATE            LOADING_TIME
22.05.2020 16:46:34	AAA1	22.05.2020 16:46:34	22.05.2020 16:57:42    11.08mins
22.05.2020 16:42:05	BBB2	22.05.2020 16:42:05	22.05.2020 16:46:34    04.29mins
22.05.2020 16:41:15	CCC3	22.05.2020 16:41:15	22.05.2020 16:42:05    50sec
22.05.2020 16:40:42	DDD4	22.05.2020 16:40:42	22.05.2020 16:41:15    32sec
22.05.2020 16:40:20	EEE5	22.05.2020 16:40:20	22.05.2020 16:40:42    22sec
22.05.2020 16:37:23	FFF6	22.05.2020 16:37:23	22.05.2020 16:40:20    02.57mins
22.05.2020 16:37:12	GGG7	22.05.2020 16:37:12	22.05.2020 16:37:23    11sec
...
22.05.2020 16:36:03	ZZZ148	22.05.2020 16:17:35	22.05.2020 16:17:46    11sec

To summarize:

  • The 148 rows (148 SQL statements) coming from the source table are loaded serially in 40 minutes.
  • The majority of rows take less than 1 minute to load (e.g. Name = CCC3, DDD4, EEE5, GGG7 and ZZZ148).
  • A few rows take more than a couple of minutes to load.
  • The maximum loading time is 11.08 minutes for the name AAA1.
  • Each row must wait for the previous row to complete its load before starting its own (compare the previous END_DATE with the current START_DATE).

To optimize the process, let's try to load all the rows coming from the source table in parallel, using the Oracle scheduler DBMS_SCHEDULER.

Instead of executing the INSERT command directly in the loop, let's create a job through DBMS_SCHEDULER:

FOR rec IN (SELECT priority,dwh_id, amq_name, sql_statement,scope 
                FROM My_Metadata_Table,
                     (SELECT dwh_pit_date FROM dwh_code_mv) pt
               WHERE dwh_status = 1
                 AND (pt.dwh_pit_date >= dwh_valid_from AND pt.dwh_pit_date < dwh_valid_to) 
               ORDER BY priority, dwh_id) LOOP

     l_amq_name := rec.amq_name;
       IF rec.SCOPE = 'TYPE1' THEN 
        -- TYPE1 LEVEL SELECT
         ...
  
            --Execute Job to insert the AMQ : Background process
            DBMS_SCHEDULER.CREATE_JOB (
            job_name             => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
            job_type             => 'PLSQL_BLOCK',
            job_action           => 'BEGIN
                                      LOAD_DATA(''CASE'','||''''||l_amq_name||''''||','||rec.priority||','||l_date||','||v_SESSION_ID||','||i||');
                                     END;',
            start_date    =>  sysdate,  
            enabled       =>  TRUE,  
            auto_drop     =>  TRUE,  
            comments      =>  'job for amq '||l_amq_name);
          END IF;
        ELSE 
            ...
            END IF;
        END IF; 
      i := i +1;
  END LOOP;
Number of Rows Read : 148 (meaning 148 SQL statements to execute)
START : 08:14:03
END : 08:42:32
Total :  27.57 mins

To summarize:

  • The 148 rows (148 SQL statements) coming from the source table are now loaded in parallel in 27.57 minutes instead of 40 minutes serially.
  • The options of DBMS_SCHEDULER are:
    • As we are limited in the number of characters for the "job_action" parameter, we insert the data through a PL/SQL procedure, LOAD_DATA.
    • The job is executed immediately (start_date=sysdate) and dropped immediately after its execution (auto_drop=TRUE), so completed jobs disappear from the scheduler views (see the polling sketch below).
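Note that the submitting session returns as soon as the jobs are created. If the caller needs to wait for the whole load to finish, one possibility (not part of the original code) is to poll the scheduler views: since auto_drop=TRUE, completed jobs simply disappear from USER_SCHEDULER_JOBS. A minimal sketch, assuming the same l_date timestamp suffix as above:

DECLARE
  l_date    NUMBER := 20200523081403;  -- same run timestamp as used when creating the jobs
  l_running NUMBER;
BEGIN
  LOOP
    SELECT COUNT(*)
      INTO l_running
      FROM user_scheduler_jobs
     WHERE job_name LIKE 'AMQ\_%\_' || l_date ESCAPE '\';
    EXIT WHEN l_running = 0;
    DBMS_SESSION.SLEEP(5);  -- DBMS_LOCK.SLEEP on releases older than 18c
  END LOOP;
END;
/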

Now let's check how the jobs are scheduled. Since we loop 148 times, I expect to have 148 jobs:

First, let's check that the rows (remember, one row = one INSERT into the target table from the source table) are loaded in parallel:

CREATE_DATE 	    NAME START_DATE 	        END_DATE 				       
22.05.2020 16:46:34 AAA1 23.05.2020 08:14:04	23.05.2020 08:21:19
22.05.2020 16:42:05 BBB2 23.05.2020 08:14:04	23.05.2020 08:20:43
22.05.2020 16:41:15 CCC3 23.05.2020 08:14:04	23.05.2020 08:21:59
22.05.2020 16:40:42 DDD4 23.05.2020 08:14:03	23.05.2020 08:15:29
22.05.2020 16:40:20 EEE5 23.05.2020 08:14:03	23.05.2020 08:15:05
22.05.2020 16:37:23 FFF6 23.05.2020 08:14:03	23.05.2020 08:14:47
22.05.2020 16:37:12 GGG7 23.05.2020 08:14:03	23.05.2020 08:15:59
...                     
22.05.2020 16:36:03 ZZZ148 22.05.2020 16:17:35 22.05.2020 16:17:46

This is the case: all rows have the same start date, meaning they all start in parallel. Let's query all_scheduler_job_run_details to check that we have our 148 jobs:

SQL> select count(*) from all_scheduler_job_run_details where job_name like '%20200523081403';

  COUNT(*)
----------
       148
SQL> select log_date,job_name,status,req_start_date from all_scheduler_job_run_details where job_name like '%20200523081403';
LOG_DATE		JOB_NAME		        STATUS		REQ_START_DATE
23-MAY-20 08.42.41	AMQ_P3J147_20200523081403	SUCCEEDED	23-MAY-20 02.42.32
23-MAY-20 08.42.32	AMQ_P2J146_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.56	AMQ_P2J145_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.33	AMQ_P2J144_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.22	AMQ_P2J143_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.03	AMQ_P2J141_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.36.50	AMQ_P2J142_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.33.57	AMQ_P2J140_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
--Only the first 8 rows are displayed

To summarize:

  • We have 148 jobs, all started, most of the time in parallel (jobs with the same REQ_START_DATE; Oracle parallelizes the jobs in blocks).
  • My PL/SQL process now takes 27.57 minutes instead of 40 minutes.

But if we look at the details, we have a lot of small jobs, i.e. jobs whose run_duration is less than 1 minute:

SQL> select run_duration from all_scheduler_job_run_details where job_name like '%20200523081403' order by run_duration;

RUN_DURATION
+00 00:00:04.000000
+00 00:00:07.000000
+00 00:00:09.000000
+00 00:00:10.000000
+00 00:00:13.000000
+00 00:00:15.000000
+00 00:00:20.000000
+00 00:00:27.000000
+00 00:00:33.000000
+00 00:00:35.000000
+00 00:00:36.000000
+00 00:00:38.000000
+00 00:00:43.000000
+00 00:00:46.000000
+00 00:00:51.000000
+00 00:00:52.000000

As we have a lot of small, short-lived jobs, it is more interesting to use lightweight jobs instead of regular jobs.

In contrast to regular jobs, lightweight jobs:

  • Require less metadata, so they have quicker create and drop times.
  • Are suited for short-lived jobs (small jobs, where run_duration is low).

Let’s rewrite our PL/SQL process using lightweight jobs :

To use lightweight jobs, first create a program suitable for a lightweight job :

begin
dbms_scheduler.create_program
(
    program_name=>'LIGHTWEIGHT_PROGRAM',
    program_action=>'LOAD_AMQ',
    program_type=>'STORED_PROCEDURE',
    number_of_arguments=>6, 
    enabled=>FALSE);
END;

Add the arguments (parameters) and enable the program :

BEGIN
dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>1,
argument_type=>'VARCHAR2',
DEFAULT_VALUE=>NULL);

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>2,
argument_type=>'VARCHAR2');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>3,
argument_type=>'NUMBER');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>4,
argument_type=>'NUMBER');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>5,
argument_type=>'VARCHAR');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>6,
argument_type=>'NUMBER');

dbms_scheduler.enable('lightweight_program');  
end;

In the PL/SQL code, let's create the lightweight job, not forgetting to set the argument values before running the job:

DECLARE
...
BEGIN
....
LOOP
DBMS_SCHEDULER.create_job (
job_name        => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
program_name    => 'LIGHTWEIGHT_PROGRAM',
job_style       => 'LIGHTWEIGHT',
enabled         => FALSE);
                  
 DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 1,
   argument_value          => rec.scope);
   
DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 2,
   argument_value          => l_amq_name);
   
DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 3,
   argument_value          => rec.priority);

DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 4,
   argument_value          => l_date);   

DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 5,
   argument_value          => v_SESSION_ID);  

DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 6,
   argument_value          => i); 

dbms_scheduler.run_job('AMQ_P'||rec.priority||'j'||i||'_'||l_date,TRUE);
...
END LOOP;
Number of Rows Read : 148 (meaning 148 SQL statements to execute)
START : 18:08:56
END : 18:27:40
Total : 18.84 mins

 

Let's check that we still have 148 jobs running in parallel:

SQL> select count(*) from all_scheduler_job_run_details where job_name like '%20200524175036';

  COUNT(*)
----------
       148
SQL> select log_date,job_name,status,req_start_date from all_scheduler_job_run_details where job_name like '%20200524175036';

LOG_DATE           JOB_NAME     STATUS	        REQ_START_DATE
24-MAY-20 05.50.51 AB1C		SUCCEEDED	24-MAY-20 05.50.36
24-MAY-20 05.50.56 AB1D		SUCCEEDED	24-MAY-20 05.50.51
24-MAY-20 05.51.14 AB1E		SUCCEEDED	24-MAY-20 05.50.56
24-MAY-20 05.51.49 AB1I		SUCCEEDED	24-MAY-20 05.51.14
24-MAY-20 05.52.14 AB1P		SUCCEEDED	24-MAY-20 05.51.49
24-MAY-20 05.52.34 AB1L		SUCCEEDED	24-MAY-20 05.52.14
24-MAY-20 05.52.55 AB1N		SUCCEEDED	24-MAY-20 05.52.34
24-MAY-20 05.53.17 AB1M		SUCCEEDED	24-MAY-20 05.52.55
24-MAY-20 05.53.29 AB1K		SUCCEEDED	24-MAY-20 05.53.17
24-MAY-20 05.53.39 AB1O		SUCCEEDED	24-MAY-20 05.53.29
24-MAY-20 05.53.57 AB1U		SUCCEEDED	24-MAY-20 05.53.39
24-MAY-20 05.54.07 AB1V		SUCCEEDED	24-MAY-20 05.53.57

To summarize:

  • We have 148 jobs, all started, most of the time in parallel.
  • My PL/SQL process now takes 18.84 minutes (lightweight jobs) instead of 27.57 minutes (regular jobs).
  • If we compare regular jobs vs. lightweight jobs, the former seems to schedule the jobs randomly (starting jobs in blocks of 4, 5, 6… 8) while the latter schedules jobs in blocks of 3 or 4 (as we can see above).

Conclusion:

  • DBMS_SCHEDULER (regular jobs or lightweight jobs) can significantly improve your PL/SQL performance by transforming your serial process into a parallel one.
  • If you have small, short-lived jobs, use lightweight jobs instead of regular jobs.
  • Don't underestimate the development time (development, testing, bug fixing) needed to transform a serial process into a parallel one. Creating 1 job is different from creating more than 100 or 1000 jobs through a PL/SQL loop (concurrency problems, CPU used to create/drop the jobs).
  • As a developer, you are responsible for managing your jobs (create, drop, purge) in order not to saturate the job_queue_processes limit, which is also used by a lot of critical Oracle processes (a quick check is sketched below).
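For that last point, a quick check like this one (a sketch; adapt to the views you have access to) shows the configured limit and how many scheduler jobs are currently running:

SQL> show parameter job_queue_processes
SQL> select count(*) as running_jobs from dba_scheduler_running_jobs;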

Cet article How to use DBMS_SCHEDULER to improve performance ? est apparu en premier sur Blog dbi services.

PostgreSQL Shared Buffers vs free RAM

Wed, 2020-05-20 08:54

PostgreSQL, like all other database engines, modifies the table and index blocks in shared buffers. People think that the main goal of buffered reads is to act as a cache to avoid reading from disk. But that's not the main reason, as this is not mandatory. For example, PostgreSQL expects that the filesystem cache is used. The primary goal of shared buffers is simply to share them, because multiple sessions may want to read and write the same blocks, and concurrent access is managed at block level in memory. Without shared buffers, you would need to lock a whole table. Most database engines use the shared buffers for caching. Allocating more memory can keep the frequently used blocks in memory rather than accessing disk. And because they manage the cache with methods suited to the database (performance and reliability), they don't need another level of cache and recommend direct I/O to the database files. But not PostgreSQL. In order to keep the database engine simple and portable, PostgreSQL relies on the filesystem cache. For example, no multiblock read is implemented. Reading contiguous blocks should be optimized by the filesystem read-ahead.
But this simplicity of code implies complexity in configuring a PostgreSQL database system: how much to set for shared_buffers, and how much to keep free in the OS to be used for the filesystem cache?

I am not giving any answer here. And I think there is, unfortunately, no answer. The documentation is vague: it defines the recommended value as a percentage of the available RAM. That makes no sense to me. The cache is there to keep frequently used blocks, and that depends on your application workload, not on the available RAM. There is also this idea that, because there is double buffering, you should allocate the same size to both caches. That makes no sense either. If you keep blocks in the shared buffers, they will not be frequently accessed from the filesystem and will not stay in the filesystem cache. Finally, don't think the defaults are good. The default shared_buffers is always too small, and the reason is that a small value eases the installation on Linux without having to think about SHM size. But you need more, and you will have to configure SHM and huge pages for it.

My only recommendation is to understand your workload: what is the set of data that is frequently used (lookup tables, index branches, active customers, last orders,…)? This should fit in the shared buffers. The second recommendation is to understand how it works. For the set of data that is frequently read only, this is not so difficult: you need to have an idea of the postgres cache algorithm (LRU/clock sweep in shared buffers, which have a static size) and of the filesystem cache (LRU over a variable amount of free memory). For the set of data that is modified, this is much more complex.
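If you want to see what actually sits in the shared buffers, the pg_buffercache extension gives a per-relation view. A minimal sketch, assuming the extension is installed in the database you query and the default 8kB block size:

-- create extension pg_buffercache;   -- once, as superuser
select c.relname, count(*) as buffers, pg_size_pretty(count(*) * 8192) as size
  from pg_buffercache b
  join pg_class c on b.relfilenode = pg_relation_filenode(c.oid)
 where b.reldatabase in (0, (select oid from pg_database where datname = current_database()))
 group by c.relname
 order by 2 desc
 limit 10;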

Let's start with the reads. As an example, I have run a workload that reads at random within a small set of 150MB. I used pgio from Kevin Closson for that. I've executed multiple runs, varying the shared_buffers from smaller than my working set to 2x larger: 50MB, 100MB, 150MB, 200MB, 250MB, 300MB. Then, for each size of shared buffers, I've run with variations of the available RAM (which is used by the filesystem cache): 50MB, 100MB, 150MB, 200MB, 250MB, 300MB, 350MB, 400MB. I ensured that I have enough physical memory so that the system does not swap.

I determined the free memory in my system:


[ec2-user@ip-172-31-37-67 postgresql]$ sync ; sync ; sync ; free -hw
              total        used        free      shared     buffers       cache   available
Mem:           983M         63M        835M        436K          0B         84M        807M
Swap:            0B          0B          0B

I also use https://medium.com/@FranckPachot/proc-meminfo-formatted-for-humans-350c6bebc380


[ec2-user@ip-172-31-37-67 postgresql]$ awk '/Hugepagesize:/{p=$2} / 0 /{next} / kB$/{v[sprintf("%9d MB %-s",int($2/1024),$0)]=$2;next} {h[$0]=$2} /HugePages_Total/{hpt=$2} /HugePages_Free/{hpf=$2} {h["HugePages Used (Total-Free)"]=hpt-hpf} END{for(k in v) print sprintf("%-60s %10d",k,v[k]/p); for (k in h) print sprintf("%9d MB %-s",p*h[k]/1024,k)}' /proc/meminfo|sort -nr|grep --color=auto -iE "^|( HugePage)[^:]*" #awk #meminfo
 33554431 MB VmallocTotal:   34359738367 kB                    16777215
      983 MB MemTotal:        1006964 kB                            491
      946 MB DirectMap2M:      968704 kB                            473
      835 MB MemFree:          855628 kB                            417
      808 MB MemAvailable:     827392 kB                            404
      491 MB CommitLimit:      503480 kB                            245
      200 MB Committed_AS:     205416 kB                            100
       78 MB DirectMap4k:       79872 kB                             39
       69 MB Cached:            71436 kB                             34
       55 MB Active:            56640 kB                             27
       43 MB Inactive:          44036 kB                             21
       42 MB Inactive(file):    43612 kB                             21
       29 MB Slab:              30000 kB                             14
       28 MB AnonPages:         29260 kB                             14
       28 MB Active(anon):      29252 kB                             14
       26 MB Active(file):      27388 kB                             13
       14 MB SUnreclaim:        14876 kB                              7
       14 MB SReclaimable:      15124 kB                              7
       13 MB Mapped:            14212 kB                              6
        4 MB PageTables:         4940 kB                              2
        2 MB Hugepagesize:       2048 kB                              1
        1 MB KernelStack:        1952 kB                              0
        0 MB Shmem:               436 kB                              0
        0 MB Inactive(anon):      424 kB                              0
        0 MB HugePages Used (Total-Free)
        0 MB HugePages_Total:       0
        0 MB HugePages_Surp:        0
        0 MB HugePages_Rsvd:        0
        0 MB HugePages_Free:        0
        0 MB Dirty:                 4 kB                              0

Then, in order to control how much free RAM I leave ($fs_MB), I allocate the remainder as huge pages:


sudo bash -c "echo 'vm.nr_hugepages = $(( (835 - $fs_MB) / 2 ))' > /etc/sysctl.d/42-hugepages.conf ; sysctl --system"

This limits the RAM available for the filesystem cache because huge pages cannot be used for it. And the huge pages can be used for the postgres shared buffers:


sed -i -e "/shared_buffers/s/.*/shared_buffers = ${pg_MB}MB/" -e "/huge_pages =/s/.*/huge_pages = on/" $PGDATA/postgresql.conf
grep -E "(shared_buffers|huge_pages).*=" $PGDATA/postgresql.conf
/usr/local/pgsql/bin/pg_ctl -l $PGDATA/logfile restart

Note that I actually stopped postgres first, to be sure that no huge pages are in use when resizing them:


for pg_MB in 50 100 150 200 250 300 350 400
do
for fs_MB in 400 350 300 250 200 150 100 50
do
/usr/local/pgsql/bin/pg_ctl -l $PGDATA/logfile stop
# set huge pages and shared buffers
/usr/local/pgsql/bin/pg_ctl -l $PGDATA/logfile restart
# run pgio runit.sh
done
done
100% Read only workload on 150MB

Here is the result. Each slice on the z-axis is a size of shared_buffers allocated by postgres. The x-axis is the size of the available RAM that can be used for the filesystem cache by the Linux kernel. The y-axis is the number of tuples read during the run.

You will never get optimal performance when the frequently read set doesn't fit in shared buffers. When the read set is larger than the shared buffers, you need more free RAM just to limit the performance loss. The frequently read set of data should fit in shared buffers.

50% updates on 150MB

Here is the same run where I only changed PCT_UPDATE to 50 in pgio.conf

This looks similar but there are two main differences, one visible here and another that is not represented in this graphic because I aggregated several runs.

First, increasing the shared buffers above the set of frequently manipulated data still improves performance, which was not the case with reads. As soon as the shared buffers are above the working set of 150MB, the buffer cache hit ratio is at 100%. But that's for reads. Updates generate a new version of the data, and both versions will have to be vacuumed and checkpointed.

Here is a graph about blks_read/s which shows that for a read-only workload we do not do any physical reads (I/O calls from the database to the filesystem) as soon as the working set fits in shared buffers. When we write, the read calls still improve when we increase the shared buffers a bit above the modified set. And the physical read efficiency is the best when there is as much free RAM as shared buffers.

Second point about the write workload: performance is not homogeneous at all. Vacuum and checkpoint happen at regular intervals and make the performance unpredictable. When showing the tuples/second for the 50% write workload, I aggregated many runs to display the maximum throughput achieved. Having the modified set of data fitting in free RAM helps to lower this variation, as it avoids immediate physical write latency. The balance between shared buffers and free RAM is then a balance between high performance and predictability of performance: keep free RAM as a performance "buffer" between userspace access and disk writes and reads. There are also many parameters to tune this, like the frequency of vacuum and checkpoints, and this makes memory tuning very complex. Note that I changed only the shared_buffers here. When increasing shared_buffers for high write throughput, there are other structures to increase, like the WAL segments.
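Just to illustrate the kind of knobs involved, here are some of the parameters that play in this area (the values are arbitrary examples, not recommendations; they depend entirely on the workload):

# postgresql.conf -- illustrative values only
shared_buffers = 200MB
max_wal_size = 2GB                      # larger WAL allowance -> less frequent checkpoints
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9      # spread checkpoint writes over time
autovacuum_vacuum_scale_factor = 0.05   # vacuum heavily-updated tables more often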

The filesystem cache adds another level of unpredictable performance. For example, you may run a backup that reads the whole database, bringing its blocks into the LRU list. And I didn't do any sequential scans here; they benefit from filesystem buffers with pre-fetching. All this makes any quick recommendation incorrect. Buffer cache hit ratios make no sense for estimating the response time improvement, as they are not aware of the filesystem cache hits. But looking at them per relation, table or index, may help to estimate which relations are frequently accessed. Because that's what matters: not the size of the database, not the size of your server RAM, not the general buffer cache hit ratio, but the size of the data that is read and written frequently by your application.
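To look at buffer cache hits per relation (keeping in mind that a miss here may still be served by the filesystem cache), pg_statio_user_tables is a simple starting point, as a sketch:

-- blocks found in shared buffers vs. requested from the filesystem, per table
select relname,
       heap_blks_hit, heap_blks_read,
       round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 1) as heap_hit_pct,
       idx_blks_hit, idx_blks_read
  from pg_statio_user_tables
 order by heap_blks_read + idx_blks_read desc nulls last
 limit 10;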

Cet article PostgreSQL Shared Buffers vs free RAM est apparu en premier sur Blog dbi services.

Oracle Standard Edition on AWS ☁ socket arithmetic

Wed, 2020-05-20 03:57
By Franck Pachot

.
Note that I've written about Oracle Standard Edition 2 licensing before, but a few rules have changed. This is written in May 2020.
TL;DR: 4 vCPUs count for 1 socket and 2 sockets count for 1 server, whether hyper-threading is enabled or not.

The SE2 rules

I think the Standard Edition rules are quite clear now: maximum server capacity, cluster limit, minimum NUP, and processor metric. Oracle has them in the Database Licensing guideline.

2 socket capacity per server

Oracle Database Standard Edition 2 may only be licensed on servers that have a maximum capacity of 2 sockets.

We are talking about capacity, which means that even when you remove a processor from a 4-socket server, it is still a 4-socket server. You cannot run Standard Edition if the server has the possibility of more than 2 sockets, whether there is a processor in the socket or not.

2 socket used per cluster

When used with Oracle Real Application Clusters, Oracle Database Standard Edition 2 may only be licensed on a maximum of 2 one-socket servers

This one is not about capacity. You can remove a processor from a bi-socket server to make it a one-socket server, and then build a cluster running RAC in Standard Edition with 2 of those nodes. The good thing is that you can even use an Oracle hypervisor (OVM or KVM), LPAR or Zones to pin only one socket for the usage of Oracle, and use the other for something else. The bad thing is that, as of 19c, RAC with Standard Edition is not possible anymore. You can run the new SE HA, which allows more on one node (up to the 2 socket rule) because the other node is stopped (the 10 days rule).

At least 10 NUP per server

The minimum when licensing by Named User Plus (NUP) metric is 10 NUP licenses per server.

Even when you don't choose the processor metric, you need to count the servers. For example, if your vSphere cluster runs on 4 bi-socket servers, you need to buy 40 NUP licenses even if you count a smaller population of users.

Processor metric

When licensing Oracle programs with … Standard Edition in the product name, a processor is counted equivalent to a socket; however, in the case of multi-chip modules, each chip in the multi-chip module is counted as one occupied socket.

A socket is a physical slot where you can put a processor. This is what counts for the "2 socket capacity per server" rule. An occupied socket is one with a processor, physically or pinned with an accepted hard partitioning hypervisor method (Solaris Zones, IBM LPAR, Oracle OVM or KVM,…). This is what counts for the "2 socket occupied per cluster" rule. Intel is not concerned by the multi-chip module exception.

What about the cloud?

So, the rules mention servers, sockets and processors. How does this apply to modern computing where you provision a number of vCPU without knowing anything about the underlying hardware? In the AWS shared responsibility model you are responsible for the Oracle Licences (BYOL – Bring Your Own Licences) but they are responsible for the physical servers.

Oracle established the rules (which may or may not be referenced by your contract) in the Licensing Oracle Software in the Cloud Computing Environment (for educational purposes only – refer to your contract if you want the legal interpretation).

This document is only for AWS and Azure. There’s no agreement with Google Cloud, and therefore you cannot run Oracle software under license there. The same goes for your local cloud provider: you are reduced to hosting on physical servers. The Oracle Public Cloud has its own rules and you can license Standard Edition on a compute instance with up to 16 OCPU, and one processor license covers 4 OCPU (which is 2 hyper-threaded Intel cores).

Oracle authorizes running its software on those 2 competitor public clouds. But they generally double the licenses required on competitor platforms in order to be cheaper on their own. They did that on-premises a long time ago for IBM processors and they do that now for Amazon AWS and Microsoft Azure.

So, the arithmetic is based on the following idea: 4 vCPUs count for 1 socket and 2 sockets count for 1 server.

Note that there was a time when it was 1 socket = 2 cores, which meant 4 vCPUs when hyper-threading is enabled but 2 vCPUs when not. They have changed the document and we now count vCPUs without looking at cores or threads. Needless to say, for an optimal performance/price ratio in SE you should disable hyper-threading in AWS in order to have your processes running on full cores. And use instance caging to limit the user sessions in order to leave a core available for the background processes.
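
As an illustration, instance caging only needs two parameters to be set (a minimal sketch; the values are hypothetical examples for a self-managed instance — on RDS you would go through a parameter group instead):

-- a resource manager plan must be active for instance caging to take effect
alter system set resource_manager_plan='DEFAULT_PLAN' scope=both;
-- cap Oracle at 6 of the 8 vCPUs, keeping headroom for the background processes
alter system set cpu_count=6 scope=both;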

Here are the rules:

  • 2 socket capacity per server: maximum 8 vCPU
  • 2 socket occupied per cluster: forget about RAC in Standard Edition and RAC in AWS
  • Minimum NUP: 10 NUP are ok to cover the maximum allowed 8 vCPU
  • Processor metric: 1 license covers 4 vCPU
Example

The maximum you can use for one database:
2 SE2 processor licences = 1 server = 2 sockets = 8 AWS vCPU
2 SE2 processor licences = 8 cores = 16 OCPU in Oracle Cloud

The cheaper option means smaller capacity:
1 SE2 processor licence = 1 socket = 4 AWS vCPU
1 SE2 processor licence = 4 cores = 8 OCPU in Oracle Cloud

As you can see, the difference between Standard and Enterprise Edition in the clouds is much smaller than on-premises, where a socket can run more and more cores. The per-socket licensing was made at a time when processors had only a few cores. With this evolution, Oracle realized that SE was too cheap. They caged the SE2 usage to 16 threads per database and limit it further on their competitors’ clouds. Those limits are not technical but governed by revenue management: they provide a lot of features in SE but also need to ensure that large companies still require EE.

But…

… there’s always an exception. It seems that Amazon has a special deal to allow Oracle Standard Edition on AWS RDS with EC2 instances up to 16 vCPU:

You know that I always try to test what I’m writing in a blog post. So, at least as of the publishing date and with the tested versions, it relies on some verified facts.
I started an AWS RDS Oracle database on db.m4.4xlarge, which is 16 vCPU. I installed the instant client on my bastion host to access it:


sudo yum localinstall http://yum.oracle.com/repo/OracleLinux/OL7/oracle/instantclient/x86_64/getPackage/oracle-instantclient19.5-basic-19.5.0.0.0-1.x86_64.rpm
sudo yum localinstall http://yum.oracle.com/repo/OracleLinux/OL7/oracle/instantclient/x86_64/getPackage/oracle-instantclient19.5-sqlplus-19.5.0.0.0-1.x86_64.rpm

This is Standard Edition 2:


[ec2-user@ip-10-0-2-28 ~]$ sqlplus admin/FranckPachot@//database-1.ce45l0qjpoax.us-east-1.rds.amazonaws.com/ORCL

SQL*Plus: Release 19.0.0.0.0 - Production on Tue May 19 21:32:47 2020
Version 19.5.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Tue May 19 2020 21:32:38 +00:00

Connected to:
Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Version 19.7.0.0.0

On 16 vCPU:


SQL> show parameter cpu_count

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cpu_count                            integer     16

On AWS:

SQL> host curl http://169.254.169.254/latest/meta-data/services/domain
amazonaws.com

With more than 16 sessions active on CPU:

SQL> @ snapper.sql ash 10 1 all
Sampling SID all with interval 10 seconds, taking 1 snapshots...

-- Session Snapper v4.31 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)


---------------------------------------------------------------------------------------------------------------
  ActSes   %Thread | INST | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------------------------------------
   19.29   (1929%) |    1 | 3zkr1jbq4ufuk   | 0         | ON CPU                              | ON CPU
    2.71    (271%) |    1 | 3zkr1jbq4ufuk   | 0         | resmgr:cpu quantum                  | Scheduler
     .06      (6%) |    1 |                 | 0         | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2020-05-19 21:34:00, seconds=10, samples_taken=49, AAS=22.1

PL/SQL procedure successfully completed.

I also checked on CloudWatch (the AWS monitoring from the hypervisor) that I am running 100% on CPU.

I tested this on a time-limited free lab environment (this configuration is expensive). I didn’t check whether hyper-threading was enabled or not (my guess: disabled), and I didn’t test whether setting CPU_COUNT would enable instance caging (SE2 is supposed to be internally caged at 16 CPUs, but I see more sessions on CPU here).

Of course, I shared my surprise (follow me on Twitter if you like this kind of short info about databases – I don’t really look at the numbers but it seems I may reach 5000 followers soon so I’ll continue at the same rate):

Do you think that running @OracleDatabase Standard Edition 2 in @awscloud is limited to 8 vCPU, as mentioned in https://t.co/J5IIQl49t1 Licensing Oracle Software in the Cloud Computing Environment?
Seems we can go further. This is RDS db.m4.4xlarge: pic.twitter.com/denby1JaLM

— Franck Pachot (@FranckPachot) May 19, 2020


and I’ll update this post when I have more info about this.

Cet article Oracle Standard Edition on AWS ☁ socket arithmetic est apparu en premier sur Blog dbi services.

SQL Server: Synchronize logins on AlwaysOn replicas with dbatools

Tue, 2020-05-19 10:47

The SQL Server environment  I worked with today has dozens of SQL Server instances using AlwaysOn Availability Groups for High Availability.
When a login is created on the Primary replica of an Availability Group it is not synchronized automatically on secondary replicas. This might cause some issues after a failover (Failed logins).

Since this is not done automatically by SQL Server out of the box the DBA has to perform this task.
To avoid doing this with T-SQL (sp_help_revlogin) or SSMS, I use the magical dbatools and perform the following tasks once a week.

  1. Get the number of logins on each instance.
$primary = Get-DbaLogin -SqlInstance 'SQL01\APPX'
$primary.Count

$secondary = Get-DbaLogin -SqlInstance 'SQL02\APPX'
$secondary.Count
PS C:\> $primary.Count
41
PS C:\> $secondary.Count
40
  2. If the numbers don’t match, I use the Copy-DbaLogin function to synchronize the missing logins.
Copy-DbaLogin -Source 'SQL01\APPX' -Destination 'SQL02\APPX' `
	-Login (($primary | Where-Object Name -notin $secondary.Name).Name)

PS C:\>
Type             Name         Status     Notes
----             ----         ------     -----
Login - SqlLogin loginName    Successful

Obviously, there are several drawbacks to this process:

  • Having the same number of logins doesn’t mean they are actually the same.
    Some logins can be missing on each side while both instances still show the same count.
  • I need to know which instance is the current primary replica (-Source in Copy-DbaLogin).
  • This is a manual process I do on every pair of instances using AlwaysOn.
  • I want a script that can manage any number of secondary replicas.

So I decided to write a new script that would automatically synchronize logins from the primary replica to all secondary replicas. The only parameter I want to use as input for this script is the name of the listener.

Here is the simplest version of this script I could write:

$lsn = 'APP01-LSTN'

$primaryReplica =    Get-DbaAgReplica -SqlInstance $lsn | Where Role -eq Primary
$secondaryReplicas = Get-DbaAgReplica -SqlInstance $lsn | Where Role -eq Secondary

# primary replica logins
$primaryLogins = (Get-DbaLogin -SqlInstance $primaryReplica.Name)

$secondaryReplicas | ForEach-Object {
    # secondary replica logins
    $secondaryLogins = (Get-DbaLogin -SqlInstance $_.Name)

    $diff = $primaryLogins | Where-Object Name -notin ($secondaryLogins.Name)
    if($diff) {
        Copy-DbaLogin -Source $primaryReplica.Name -Destination $_.Name -Login $diff.Name
    }   
}

Using just the listener name with Get-DbaAgReplica I can get all the replicas by Role, either Primary or Secondary.
Then I just need to loop through the secondary replicas and call Copy-DbaLogin.

I use a Central Management Server as an inventory for my SQL servers. I have groups containing only listeners.

CMS

The list of listeners can be easily retrieved from the CMS with Get-DbaRegisteredServer.

$Listeners= Get-DbaRegisteredServer -SqlInstance 'MyCmsInstance' | Where-Object {$_.Group -Like '*Prod*Listener'};

Now, looping through each listener I can sync dozens of secondary replicas in my SQL Server Prod environment with a single script run.
I had some issues with instances having multiple availability groups so I added: “Sort-Object -Unique”.
Notice I also filtered out some logins I don’t want to synchronize.

$Listeners = Get-DbaRegisteredServer -SqlInstance 'MyCmsInstance' | Where-Object {$_.Group -Like '*Prod*Listener*'};

foreach ($lsn in $Listeners) {

    $primaryReplica =    Get-DbaAgReplica -SqlInstance $lsn.ServerName | Where Role -eq Primary | Sort-Object Name -Unique
    $secondaryReplicas = Get-DbaAgReplica -SqlInstance $lsn.ServerName | Where Role -eq Secondary | Sort-Object Name -Unique
    <#
    Some instances have more than 1 AvailabilityGroup
        => Added Sort-Object -Unique
    #>

    # primary replica logins
    $primaryLogins = (Get-DbaLogin -SqlInstance $primaryReplica.Name -ExcludeFilter '##*','NT *','BUILTIN*', '*$')
    
    $secondaryReplicas | ForEach-Object {
        # secondary replica logins
        $secondaryLogins = (Get-DbaLogin -SqlInstance $_.Name -ExcludeFilter '##*','NT *','BUILTIN*', '*$')
        
        $diff = $primaryLogins | Where-Object Name -notin ($secondaryLogins.Name)
        if($diff) {
            Copy-DbaLogin -Source $primaryReplica.Name -Destination $_.Name -Login ($diff.Name) -WhatIf
        } 
    }  
}

Do not test this script in Production. Try it in a safe environment first, then remove the “-WhatIf” switch.
The next step for me might be to run this script on a schedule. Or even better, trigger the execution after an Availability Group failover?
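
One possible way to do that (a hedged sketch, not tested here; the job name is hypothetical) is a SQL Agent alert on error 1480, which is raised when an availability replica changes role, responding by starting a job that runs the synchronization script:

-- Hypothetical sketch: alert on AG role change (error 1480) and respond
-- by starting an Agent job that runs the login synchronization script.
EXEC msdb.dbo.sp_add_alert
    @name = N'AG role change',
    @message_id = 1480,
    @include_event_description_in = 1,
    @job_name = N'Sync AG Logins';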

Copy-DbaLogin is one of many dbatools commands that can be very useful to synchronize objects between instances.

Cet article SQL Server: Synchronize logins on AlwaysOn replicas with dbatools est apparu en premier sur Blog dbi services.

Oracle Text : Using and Indexing – the CONTEXT Index

Sat, 2020-05-16 17:46

Everybody has already faced performance problems with Oracle CLOB columns.

The aim of this blog is to show you (always from a real user case) how to use one of Oracle Text Indexes (CONTEXT index) to solve performance problem with CLOB column.

The oracle text complete documentation is here : Text Application Developer’s Guide

Let’s start with the following SQL query, which takes more than 6 minutes (6:18) to execute:

SQL> set timing on
SQL> set autotrace traceonly
SQL> select * from v_sc_case_pat_hist pat_hist where upper(pat_hist.note) LIKE '%FC IV%';

168 rows selected.

Elapsed: 00:06:18.09

Execution Plan
----------------------------------------------------------
Plan hash value: 1557300260

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |   521 |  9455K| 24285   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DWH_CODE      |     1 |    28 |     2   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN                   | PK_DWH_CODE   |     1 |       |     1   (0)| 00:00:01 |
|*  3 |  VIEW                               |               |   521 |  9455K| 24285   (1)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL                 | CASE_PAT_HIST |   521 |   241K| 24283   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DWH_CODE_NAME"='DWH_PIT_DATE')
       filter("DWH_CODE_NAME"='DWH_PIT_DATE')
   3 - filter("DWH_VALID_FROM"<=TO_NUMBER("PT"."DWH_PIT_DATE") AND
              "DWH_VALID_TO">TO_NUMBER("PT"."DWH_PIT_DATE"))
   4 - filter(UPPER("NOTE") LIKE '%FC IV%')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     134922  consistent gets
     106327  physical reads
        124  redo size
      94346  bytes sent via SQL*Net to client
      37657  bytes received via SQL*Net from client
        338  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        168  rows processed

SQL>

 

Checking the execution plan, the Oracle optimizer does a Full Scan to access the table CASE_PAT_HIST.

As the SQL function upper(pat_hist.note) is used, let’s try to create a function-based index:

CREATE INDEX IDX_FBI_I1 ON SCORE.CASE_PAT_HIST (UPPER(note))
                                                *
ERROR at line 1:
ORA-02327: cannot create index on expression with datatype LOB

 

The message is clear: we cannot create an index on a column with datatype LOB.

Moreover :

  1. Even if my column datatype were not a CLOB, the search criterion LIKE ‘%FC IV%’ prevents the use of any index since the Oracle optimizer has no idea with which letter the string starts, so it will scan the whole table.
  2. Indeed, only the following search criteria would use the index:
    1. LIKE ‘%FC IV’
    2. LIKE ‘FC IV%’
    3. LIKE ‘FC%IV’

So to improve the performance of my SQL query and to index my CLOB column, the solution is to create an Oracle Text Index :

In the Oracle 12.1 release, three different Oracle Text index types exist:

  • CONTEXT: Suited for indexing collections or large coherent documents.
  • CTXCAT: Suited for small documents or text fragments.
  • CTXRULE: Used to build a document classification or routing application.

So, let’s try to create an oracle text index of type CONTEXT :

SQL> CREATE INDEX IDX_CPH_I3 ON SCORE.CASE_PAT_HIST LOB(note) INDEXTYPE IS CTXSYS.CONTEXT;

Index created.

Elapsed: 00:00:51.76
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SCORE','CASE_PAT_HIST');

PL/SQL procedure successfully completed.

Elapsed: 00:00:25.20
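
One operational point to keep in mind (not covered by this test case): a CONTEXT index is not maintained synchronously by DML on the base table, so it has to be synchronized according to your loading process. A minimal sketch, assuming the synchronization is scheduled right after the ETL load:

-- synchronize the CONTEXT index with the pending DML on the base table
EXEC CTX_DDL.SYNC_INDEX('IDX_CPH_I3');

Alternatively, a SYNC (ON COMMIT) or SYNC (EVERY …) clause can be specified in the PARAMETERS string at index creation time.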

 

Now we have to change the query’s WHERE clause in order to use the CONTAINS operator:

SQL> SELECT * from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 0;

170 rows selected.

Elapsed: 00:00:00.82

Execution Plan
----------------------------------------------------------
Plan hash value: 768870586

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |  2770 |    49M|  3355   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DWH_CODE      |     1 |    28 |     2   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN                   | PK_DWH_CODE   |     1 |       |     1   (0)| 00:00:01 |
|*  3 |  VIEW                               |               |  2770 |    49M|  3355   (1)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID       | CASE_PAT_HIST |  2770 |  1284K|  3353   (1)| 00:00:01 |
|*  5 |    DOMAIN INDEX                     | IDX_CPH_I3    |       |       |   483   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DWH_CODE_NAME"='DWH_PIT_DATE')
       filter("DWH_CODE_NAME"='DWH_PIT_DATE')
   3 - filter("DWH_VALID_FROM"<=TO_NUMBER("PT"."DWH_PIT_DATE") AND
              "DWH_VALID_TO">TO_NUMBER("PT"."DWH_PIT_DATE"))
   5 - access("CTXSYS"."CONTAINS"("NOTE",'%FC IV%',1)>0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


Statistics
----------------------------------------------------------
         59  recursive calls
          0  db block gets
       3175  consistent gets
        417  physical reads
        176  redo size
      95406  bytes sent via SQL*Net to client
      38098  bytes received via SQL*Net from client
        342  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        170  rows processed

 

Now the query is executed in less than a second.

Checking the execution plan, we note an access to the table SCORE.CASE_PAT_HIST through a DOMAIN INDEX represented by our CONTEXT index (IDX_CPH_I3).

Now let’s compare the results given by the LIKE and the CONTAINS operators:

SQL> SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE upper(note) LIKE '%FC IV%'
  2  UNION ALL
  3  SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1;

  COUNT(*)
----------
       168
       170

 

The CONTAINS clause returns 2 more rows; let’s check:

SQL> select note from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1 and case_id in (1,2);

NOTE
--------------------------------------------------------------------------------
Text Before , functional class (FC) IV
Text Before , WHO FC-IV

 

For the LIKE clause, the wildcard %FC IV% returns :

  • Text Before FC IV
  • FC IV Text After
  • Text Before FC IV Text After

For the CONTAINS clause, the wildcard %FC IV% returns :

  • Text Before FC IV
  • FC IV Text After
  • Text Before FC IV Text After
  • Text Before FC%IV
  • FC%IV Text After
  • Text Before FC%IV Text After

So, in terms of functionality, the LIKE and CONTAINS clauses are not exactly the same since the former returns less data than the latter.

If we translate the CONTAINS clause into a LIKE clause, we should write: “LIKE ‘%FC%IV%’”.

For my customer’s case, the CONTAINS clause is correct; the business confirmed that this data must be returned.

The CONTEXT Oracle Text index has some limitations:

  • You cannot combine several CONTAINS predicates with the OR / AND operators, otherwise you will face the Oracle error “ORA-29907: found duplicate labels in primary invocations”
  • SQL> SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1 or CONTAINS(note, '%FC TZ%', 1) > 1;
    SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1 or CONTAINS(note, '%BE TZ%', 1) > 1
                                                                                               *
    ERROR at line 1:
    ORA-29907: found duplicate labels in primary invocations

 

To solve this issue, let’s rewrite the SQL through an UNION clause :

SQL> SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%FC IV%', 1) > 1
  2  UNION
  3  SELECT count(*) from v_sc_case_pat_hist pat_hist WHERE CONTAINS(note, '%BE TZ%', 1) > 1;

  COUNT(*)
----------
       170
       112

 

Conclusion :

  • The benefits of creating an Oracle Text index (a CONTEXT index in our case) include fast response times for text queries with the CONTAINS, CATSEARCH and MATCHES Oracle Text operators. We decreased the response time from 6:18 minutes to less than a second.
  • CATSEARCH and MATCHES are respectively the operators used for the CTXCAT and CTXRULE indexes, which I will present in a next blog.
  • A Transparent Data Encryption-enabled column does not support Oracle Text indexes.
  • Always check that the data returned by the CONTAINS clause corresponds to your business needs.

Cet article Oracle Text : Using and Indexing – the CONTEXT Index est apparu en premier sur Blog dbi services.

Always free / always up tmux in the Oracle Cloud with KSplice updates

Thu, 2020-05-14 15:35
By Franck Pachot

.
I used to have many VirtualBox VMs on my laptop. But now, most of my labs are in the Cloud. Easy access from everywhere.

GCP

There’s the Google Cloud free VM, which is not limited in time (I still have the 11g XE I created 2 years ago running there), able to use 40% of a CPU with 2GB of RAM:


top - 21:53:10 up 16 min,  4 users,  load average: 9.39, 4.93, 2.09
Tasks:  58 total,   2 running,  56 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.9 us,  8.2 sy,  0.0 ni, 12.6 id,  0.0 wa,  0.0 hi,  0.3 si, 66.0 st
GiB Mem :    1.949 total,    0.072 free,    0.797 used,    1.080 buff/cache
GiB Swap:    0.750 total,    0.660 free,    0.090 used.    0.855 avail Mem

This is cool, always free but I cannot ssh to it: only accessible with Cloud Shell and it may take a few minutes to start.

AWS

I also use an AWS free tier, but this one is limited in time: 1 year.


top - 19:56:11 up 2 days, 13:09,  2 users,  load average: 1.00, 1.00, 1.00
Tasks: 110 total,   2 running,  72 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.4 us,  0.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si, 87.4 st
GiB Mem :      1.0 total,      0.1 free,      0.7 used,      0.2 buff/cache
GiB Swap:      0.0 total,      0.0 free,      0.0 used.      0.2 avail Mem

A bit more CPU, throttled by the hypervisor, and only 1GB of RAM. However, I can create it with a public IP and access it through SSH. This is especially interesting to run other AWS services, as the AWS CLI is installed in this Amazon Linux. But it is limited in time and in credits.

OCI

Not limited in time, the Oracle Cloud OCI free tier allows 2 VMs with 1GB RAM and 2 vCPUs throttled to 1/8:


top - 20:01:37 up 54 days,  6:47,  1 user,  load average: 0.91, 0.64, 0.43
Tasks: 113 total,   4 running,  58 sleeping,   0 stopped,   0 zombie
%Cpu(s): 24.8 us,  0.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.2 si, 74.8 st
KiB Mem :   994500 total,   253108 free,   363664 used,   377728 buff/cache
KiB Swap:  8388604 total,  8336892 free,    51712 used.   396424 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
18666 opc       20   0  112124    748    660 R  25.2  0.1   6:20.18 yes
18927 opc       20   0  112124    700    612 R  24.8  0.1   0:06.53 yes

This is what I definitely chose as a bastion host for my labs.

tmux

I run mainly one thing here: TMUX. I have windows and panes for all my work and it stays there in this always free VM that never stops. I just ssh to it and ‘tmux attach’ and I’m back to all my terminals opened.

Ksplice

This, always up, with ssh opened to the internet, must be secured. And as I have all my TMUX windows opened, I don’t want to reboot. No problem: I can update the kernel with the latest security patches without reboot. This is Oracle Linux and it includes Ksplice.
My VM is up since February:


[opc@b ~]$ uptime
 20:15:09 up 84 days, 15:29,  1 user,  load average: 0.16, 0.10, 0.03

The kernel is from October:


[opc@b ~]$ uname -r
4.14.35-1902.6.6.el7uek.x86_64

But I’m actually running a newer kernel:


[opc@b ~]$ sudo uptrack-show --available
Available updates:
None

Effective kernel version is 4.14.35-1902.301.1.el7uek
[opc@b ~]$

This kernel is from April. Yes, 2 months more recent than the last reboot. I have simply updated it with uptrack, which downloads and installs the latest Ksplice rebootless kernel updates:

Because there’s no time limit, no credit limit, it is always up, ready to ssh to and running the latest security patches without rebooting or even quitting my tmux sessions, the Oracle Autonomous Linux free tier compute instance is the one I use every day. You need to open a free trial to get it (https://www.oracle.com/cloud/free/) but it is easy to create a compute instance which is flagged ‘always free’.

Cet article Always free / always up tmux in the Oracle Cloud with KSplice updates est apparu en premier sur Blog dbi services.

Oracle Database Appliance: which storage capacity to choose?

Thu, 2020-05-14 12:14
Introduction

If you’re considering ODA for your next platform, you surely already appreciate the simplicity of the offer. 3 models with few options, this is definitely easy to choose from.

One of the other benefits is the 5-year hardware support; combined with software updates generally available for ODAs up to 7 years old, you can keep your ODA running even longer for non-critical databases and/or if you have a strong Disaster Recovery solution (including Data Guard or Dbvisit standby). Some of my customers are still using X4-2s and are confident in their ODAs because they have been quite reliable across the years.

Models and storage limits

One of the main drawbacks of the ODA: it doesn’t have unlimited storage. Disks are local NVMe SSDs (or in a dedicated enclosure), and it’s not recommended (although technically possible) to add storage through a NAS connection.

3 ODA models are available: X8-2S and X8-2M are one-node ODAs, and X8-2HA is a two-node ODA with DAS storage including SSD and/or HDD (High Performance or High Capacity version).

Please refer to my previous blog post for more information about the current generation.

Storage on ODA is always dedicated to database related files: datafiles, redologs, controlfiles, archivelogs, flashback logs, backups (if you do it locally on ODA), etc. Linux system, Oracle products (Grid Infrastructure and Oracle database engines), home folders and so on reside on internal M2 SSD disks large enough for a normal use.

X8-2S/X8-2M storage limit

ODA X8-2S is the entry-level ODA. It only has one CPU, but with 16 powerful cores and 192GB of memory it’s anything but a low-end server. 10 empty storage slots are available in the front pane but don’t expect to extend the storage. This ODA is delivered with 2 disks and doesn’t support adding more disks. That’s it. With the two 6.4TB disks, you’ll have a RAW capacity of 12.8TB.

ODA X8-2M is much more capable than its little brother. Physically identical to the X8-2S, it has two CPUs and twice the amount of RAM. This 32-core server fitted with 384GB of RAM is a serious player. It’s still delivered with two 6.4TB disks but, unlike the S version, all the 10 empty storage slots can be populated to reach a stunning 76.8TB of RAW storage. This is still not unlimited, but the limit is actually quite high. Disks can be added by pairs, so you can have 2-4-6-8-10-12 disks for various configurations and for a maximum of 76.8TB RAW capacity. Only disks dedicated for ODA are suitable, and don’t expect to put in bigger disks as it only supports the same 6.4TB disks as those embedded with the base server.

RAW capacity means without redundancy, and you will lose half of the capacity with ASM redundancy. It’s not possible to run an ODA without redundancy, if you think about it. ASM redundancy is the only way to secure data, as no RAID controller is inside the server. You already know that disk capacity and real capacity always differ, so Oracle included several years ago in the documentation the usable capacity depending on your configuration. The usable capacity includes reserved space for a single disk failure (15% starting from 4 disks).

On base ODAs (X8-2S and X8-2M with 2 disks only), the usable storage capacity is actually 5.8TB and no space is reserved for disk failure. If a disk fails, there is no way to rebuild redundancy as only one disk survives.

Usable storage is not database storage, don’t miss that point. You’ll need to split this usable storage between DATA area and RECO area (actually ASM diskgroups). Most often, RECO is sized between 10% and 30% of usable storage.

Here is a table with various configurations. Note that I didn’t include ASM high redundancy configurations here, I’ll explain that later.

Nb disks  Disk size TB  RAW cap. TB  Official cap. TB  DATA ratio  DATA TB  RECO TB
       2           6.4         12.8               5.8         90%     5.22     0.58
       2           6.4         12.8               5.8         80%     4.64     1.16
       2           6.4         12.8               5.8         70%     4.06     1.74
       4           6.4         25.6               9.9         90%     8.91     0.99
       4           6.4         25.6               9.9         80%     7.92     1.98
       4           6.4         25.6               9.9         70%     6.93     2.97
       6           6.4         38.4              14.8         90%    13.32     1.48
       6           6.4         38.4              14.8         80%    11.84     2.96
       6           6.4         38.4              14.8         70%    10.36     4.44
       8           6.4         51.2              19.8         90%    17.82     1.98
       8           6.4         51.2              19.8         80%    15.84     3.96
       8           6.4         51.2              19.8         70%    13.86     5.94
      10           6.4         64                24.7         90%    22.23     2.47
      10           6.4         64                24.7         80%    19.76     4.94
      10           6.4         64                24.7         70%    17.29     7.41
      12           6.4         76.8              29.7         90%    26.73     2.97
      12           6.4         76.8              29.7         80%    23.76     5.94
      12           6.4         76.8              29.7         70%    20.79     8.91

X8-2HA storage limit

Storage is more complex on X8-2HA. If you’re looking for complete information about its storage, review the ODA documentation for all the possibilities.

Briefly, X8-2HA is available in two flavors: High Performance, the one I highly recommend, or High Capacity, which is nice if you have really big databases you want to store on only one ODA. But this High Capacity version will make use of spinning disks to achieve such an amount of TB, definitely not the best solution for performance. The 2 nodes of this ODA are empty: no disk in the front panel, just empty space. All data disks are in a separate enclosure connected to both nodes with SAS cables. Depending on your configuration, you’ll have 6 to 24 SSDs (HP) or a mix of 6 SSDs and 18 HDDs (HC). When your first enclosure is filled with disks, you can also add another storage enclosure of the same kind to eventually double the total capacity. Usable storage goes from 17.8TB to 142.5TB for HP, and from 114.8TB to 230.6TB for HC.

Best practice for storage usage

First you should consider that ODA storage is high-performance storage for high database throughput. Thus, storing backups on ODA is nonsense. Backups are files written once and mainly destined to be erased without ever being used. Don’t lose precious TB for that. Moreover, if backups are done in the FRA, they are actually located on the same disks as DATA. That’s why most configurations will be done with 10% to 20% of RECO, not more, because we definitely won’t put backups on the same disks as DATA. 10% for RECO is a minimum; I wouldn’t recommend setting less than that, the Fast Recovery Area always being a problem if too small.

During deployment you’ll have to choose between NORMAL and HIGH redundancy. NORMAL is quite similar to RAID1, but at the block level and without requiring disk parity (you need 2 or more disks). HIGH is available starting from 3 disks and makes each block exist 3 times on 3 different disks. HIGH seems better, but you lose even more precious space, and it doesn’t protect you from other failures like a disaster in your datacenter or user errors. Most of the failure protection systems embedded in the servers actually consist of doubling the components: power supplies, network interfaces, system disks, and so on. So increasing the security of block redundancy without increasing the security of the other components is not necessary in my opinion. The real solution for increased failure protection is Data Guard or Dbvisit: 2 ODAs, in 2 different geographical regions, with databases replicated from 1 site to the other.

Estimate your storage needs for the next 5 years, and even more

Are you able to do that? It’s not that simple. Most of the time you can estimate for the next 2-3 years, but more than that is highly uncertain. Maybe a new project will start and will require much more storage? Maybe you will have to provision more databases for testing purposes? Maybe your main software will leave Oracle to go to MS SQL or PostgreSQL in 2 years? Maybe a new CTO will arrive and decide that Oracle is too expensive and build a plan to get rid of it. We never know what’s going to happen in such a long time. But at least you can provide an estimation with all the information you have now and your own margin.

Which margin should I choose?

You probably plan to monitor the free space on your ODA. Based on the classic threshold, more than 85% of disk usage is something you should not reach, because you may not have a solution for expanding storage. 75% is in my opinion a good limit you shouldn’t exceed on ODA. So consider 25% less usable space than available when you do your calculations.

Get bigger to last longer

I don’t like wasting money or resources on things that don’t need it, but in that particular case, I mean on ODA, after years working on X3-2, X4-2 and newer versions, I strongly advise choosing the maximum number of extensions you can. Maybe not 76TB on an ODA X8-2M if you only need 10TB, but 50TB is definitely more secure for 5 years and more. Buying new extensions could be challenging after 3 or 4 years, because you have no guarantee that these extensions will still be available. You can live with memory or CPU contention, but without enough disk space, it’s much more difficult. Order your ODA fully loaded to make sure no extension will be needed.

The more disks you get, the faster and more secure you are

Last but not least, having more disks on your ODA maximizes the throughput, because ASM is mirroring and striping blocks on all the disks. For sure, on NVMe disks you probably won’t use all that bandwidth. More disks also add more security for your data. Losing one disk in a 4-disk ODA requires the rebalancing of 25% of your data to the 3 safe disks, and rebalancing is not immediate. Losing one disk in an 8-disk ODA requires the rebalancing of much less data, actually 12.5%, assuming you have the same amount of data on the 2 configurations.

A simple example

You need a single-node ODA with expandable storage. So ODA X8-2M seems fine.

You have an overview of your databases growth trend and plan to double the size in 5 years. Starting from 6TB, you plan to reach 12TB at a maximum. As you are aware of the threshold you shouldn’t reach, you know that you’ll need 16TB of usable space for DATA (maximum of 75% of disk space used). You want to make sure to have enough FRA so you plan to set DATA/RECO ratio to 80%/20%. Your RECO should be set to 4TB. Your ODA disk configuration should have at least 20TB of usable disk space. A 8-disk ODA is 19.8TB of usable space, not enough. A 10-disk ODA is 24.7TB of usable space for 19.76TB of DATA and 4.94TB of RECO, 23% more than needed, a comfortable additional margin. And don’t hesitate to take a 12-disk ODA (1 more extension) if you want to secure your choice and be ready for unplanned changes.

Conclusion

Storage on ODA is quite expensive, but don’t forget that you may not find a solution for an ODA with insufficient storage. Take the time to make your calculation, keep a strong margin, and think long-term. Being long-term is definitely the purpose of an ODA.

Cet article Oracle Database Appliance: which storage capacity to choose? est apparu en premier sur Blog dbi services.

20c: AWR now stores explain plan predicates

Wed, 2020-05-13 12:00
By Franck Pachot

.
In a previous post https://blog.dbi-services.com/awr-dont-store-explain-plan-predicates/ I explained this limitation in gathering filter and access predicates by Statspack and then AWR, because of old bugs about reverse parsing of predicates. Oracle listens to its customers through support (enhancement requests), through the community (votes on database ideas), and through the product managers who participate in User Groups and the ACE program. And here it is: in 20c the predicates are collected by AWR and visible with DBMS_XPLAN and AWRSQRPT reports.

I’ll test with a very simple query:


set feedback on sql_id echo on pagesize 1000

SQL> select * from dual where ascii(dummy)=42;

no rows selected

SQL_ID: g4gx2zqbkjwh1

I used the “FEEDBACK ON SQL” feature to get the SQL_ID.

Because this query is fast, it will not be gathered by AWR except if I ‘color’ it:


SQL> exec dbms_workload_repository.add_colored_sql('g4gx2zqbkjwh1');

PL/SQL procedure successfully completed.

Coloring a statement is the AWR feature to use when you want a statement to always be gathered, for example when you have optimized it and want to compare the statistics.
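
For completeness (a small sketch using the same SQL_ID): once the comparison is done, the coloring can be removed so that the statement is again captured only when it qualifies as a top statement. I keep the coloring for the rest of this test.

exec dbms_workload_repository.remove_colored_sql('g4gx2zqbkjwh1');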

Now running the statement between two snapshots:


SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> select * from dual where ascii(dummy)=42;

no rows selected

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

Here, I’m sure it has been gathered.

Now checking the execution plan:


SQL> select * from dbms_xplan.display_awr('g4gx2zqbkjwh1');

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID g4gx2zqbkjwh1
--------------------
select * from dual where ascii(dummy)=42

Plan hash value: 272002086

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ASCII("DUMMY")=42)


18 rows selected.

Here I have the predicate. This is a silly example but the predicate information is very important when looking at a large execution plan trying to understand the cardinality estimation or the reason why an index is not used.

Of course, this is also visible from the ?/rdbms/admin/awrsqrpt report:

What if you upgrade?

AWR gathers the SQL plan only when it is not already there. Then, when you upgrade to 20c, only the new plans will get the predicates. Here is an example where I simulate the pre-20c behaviour with “_cursor_plan_unparse_enabled”=false:


SQL> alter session set "_cursor_plan_unparse_enabled"=false;

Session altered.

SQL> exec dbms_workload_repository.add_colored_sql('g4gx2zqbkjwh1');

PL/SQL procedure successfully completed.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> select * from dual where ascii(dummy)=42;

no rows selected

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> select * from dbms_xplan.display_awr('g4gx2zqbkjwh1');

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID g4gx2zqbkjwh1
--------------------
select * from dual where ascii(dummy)=42

Plan hash value: 272002086

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

13 rows selected.

No predicate here. Even if I re-connect to reset “_cursor_plan_unparse_enabled”:


SQL> connect / as sysdba
Connected.
SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> select * from dual where ascii(dummy)=42;

no rows selected

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> select * from dbms_xplan.display_awr('g4gx2zqbkjwh1');

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID g4gx2zqbkjwh1
--------------------
select * from dual where ascii(dummy)=42

Plan hash value: 272002086

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

13 rows selected.

This will be the situation after upgrade.

If you want to re-gather all sql_plans, you need to purge the AWR repository:


SQL> execute dbms_workload_repository.drop_snapshot_range(1,1e22);

PL/SQL procedure successfully completed.

SQL> execute dbms_workload_repository.purge_sql_details();

PL/SQL procedure successfully completed.

SQL> commit;

This clears everything, so I do not recommend doing that at the same time as the upgrade, as you may want to compare some performance with the past. Anyway, we have time, and maybe this fix will be backported to 19c.

There is very little chance that this fix is ported to Statspack, but you can do it yourself as I mentioned in http://viewer.zmags.com/publication/dd9ed62b#/dd9ed62b/36 (“on Improving Statspack Experience”) with something like:


sed -i -e 's/ 0 -- should be//' -e 's/[(]2254299[)]/--&/' $ORACLE_HOME/rdbms/admin/spcpkg.sql

Cet article 20c: AWR now stores explain plan predicates est apparu en premier sur Blog dbi services.

SQL Server 2019: Copy files through SQL Server

Mon, 2020-05-11 10:28

Three new interesting extended stored procedures come with SQL Server 2019.
I was very interested to discover these new stored procedures:

  • Sys.xp_copy_file is to copy a specific file from a folder to another
    • Syntax: exec master.sys.xp_copy_file ‘source folder+file’, ‘destination folder+file’
    • Example:
      exec master.sys.xp_copy_file 'C:\Temp\Sources\File1.txt', 'C:\Temp\Destinations\File1.txt'

Before the command:

After the command:


As you can see in my test, you get these 2 messages indicating the success of the query:
Commands completed successfully.
Time: 0.315s

  • Sys.xp_copy_files is to copy multiple files through the wildcard character from a folder to another
    • Syntax: exec master.sys.xp_copy_files ‘source folder+files’, ‘destination folder’
    • Example:
      exec master.sys.xp_copy_files 'C:\Temp\Sources\File*.txt', 'c:\Temp\Destinations\'

Before the command:

After the command:

  • Sys.xp_delete_files is to delete a specific file or multiple files through the wildcard character in a folder
    • Syntax: exec master.sys.xp_delete_files ‘source folder+files’
    • Example:
      exec master.sys.xp_delete_files 'c:\Temp\Destinations\File*.txt'

Before the command:

After the command:

I went a little further to see how it reacts if the file already exists or when we update the source.
I first copy the file 'c:\Temp\Sources\file1.txt' to 'c:\Temp\Destinations\file1.txt'.
It has one line: “first Line”.

I add a second line, “second line”, to the source.
I copy the file again and here is the result…

As you can see, the second line is in the destination.
Without warning or question, the file is overwritten.

The funny part is that when you create a file manually (file6.txt) and want to delete it through SQL Server, it’s not possible.

Only files owned by the SQL Server service account will be deleted…

Before SQL Server 2019, to do these operations, we used xp_cmdshell. Now with this new version, we can easily copy or delete files in a folder.
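
For example (a hedged sketch with hypothetical paths), a step in a nightly Agent job could refresh a staging copy of the backup files without enabling xp_cmdshell:

-- hypothetical paths: clear yesterday's staged copies, then stage tonight's backups
exec master.sys.xp_delete_files 'E:\Staging\*.bak';
exec master.sys.xp_copy_files 'D:\Backup\*.bak', 'E:\Staging\';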
It’s very good and useful! A customer has already asked me to implement it…
See you!

Cet article SQL Server 2019: Copy files through SQL Server est apparu en premier sur Blog dbi services.

How to install MABS (Microsoft Azure Backup Server) and use it to protect on-premise virtual machines

Mon, 2020-05-11 04:08

In my last blog posts I tested the MARS (Microsoft Azure Recovery Service) agent through Windows Admin Center and showed how to use Azure Backup to recover files/folders or System State for protected servers.
To continue this overview of Azure protection possibilities, today I will install a MABS on one of my on-premises virtual servers and use it to protect my other on-premises virtual servers.

To do so, go to your Azure Recovery Services vault. A vault is a place in Azure where backups will be stored. The Recovery Services vault gives you the possibility to manage backup servers, monitor backups, manage access to those backups and also specify how the vault will be replicated.
In your Recovery Services vault, go to Getting Started, Backup and select where your workload is running, here on-premises, and what you want to back up, here Hyper-V Virtual Machines and Microsoft SQL Server. This selection can be changed later when you protect your virtual machines. Press the Prepare Infrastructure button:

Once done you have to download installation files for the MABS and also credential to connect your MABS to your Azure Recovery Server vault:

Next step is to copy downloaded files on your future MABS server and start installation.
This server should fit some minimum requirements:

  • Two cores and 8GB RAM
  • Windows server 2016 or 2019 64 bits
  • Part of a domain
  • .NET 3.5 has to be installed if you want to install SQL Server (will be required during the installation)

If requirements are fulfilled you can start the installation:

After a prerequisite check, the wizard asks you if you want to install a new SQL Server 2017 instance to be used with MABS (Reporting Services will also be installed for reporting capabilities) or to use an existing one. Here we will install a new one:

Installation settings are displayed afterwards with the space requirements and the possibility to change the file locations for the MABS. A password is also requested for the SQL Service accounts created during the SQL Server installation. A summary of the future installation is displayed and the install can start, first checking that the necessary Windows Features are installed:

The next step is to connect your MABS to your Service Recovery Vault, you have to use the Vault credential file downloaded before:

Backups are encrypted for security reasons; a passphrase has to be typed or generated and has to be saved in a secure location, like a KeePass file locally or Key Vault in Azure. Once done, the server is registered with Microsoft Azure Backup and the installation can start:

At the end all should be green. I had an issue when I tried to use an existing instance of SQL Server during another installation; the problem came from the fact that my instance didn’t have the SQL Client Tools installed. So if you use an existing SQL instance, check this beforehand, otherwise you will have to redo the complete installation. But here the installation succeeded:

After installation you should find a shortcut on your desktop to open your MABS, let’s open it and go to the Management part in order to add new on-premise servers to your MABS to back up their workload:

After you click on the Add button a wizard appears. You can select the type of server you want to protect, here Windows Servers and the deployment method you will use to deploy the agent, here agent installation:

A list of servers located in the same domain as your MABS is displayed and you can add the servers where you want to deploy the agent:

Then you have to specify administrator credentials for the servers where you want to install the agent. DPM (Data Protection Manager, in fact your MABS) will use those credentials to install the agent on the servers. A restart method is also requested in case a restart is necessary after the agent installation. Once those steps are finished, the protection agents are installed on the servers:

When the installation completes successfully, you can see your selected servers in your MABS:

Those servers are marked as unprotected because for the moment you haven’t created any rules to protect the workloads running on them.
It will be the goal of my next blog-post.
See you soon!

Cet article How to install MABS (Microsoft Azure Backup Server) and use it to protect on-premise virtual machines est apparu en premier sur Blog dbi services.

Oracle Materialized View Refresh : Fast or Complete ?

Sun, 2020-05-10 13:07

Contrary to views, materialized views avoid executing the SQL query for every access by storing the result set of the query.

When a master table is modified, the related materialized view becomes stale and a refresh is necessary to have the materialized view up to date.

I will not explain materialized view concepts; the Oracle Data Warehousing Guide is perfect for that.

I will show you, from a real user case, all the steps you have to follow to investigate and tune your materialized view refresh.

And, as very often in performance and tuning tasks, most of the performance issue comes from the way the SQL is written and designed (here the SQL statement loading the materialized view).

 

First of all, let’s say that spending almost 50 mins (20% of my DWH load) refreshing materialized views is too much:

The first step is to check which materialized view has the highest refresh time :

SELECT * 
FROM (
      SELECT OWNER,
             MVIEW_NAME,
             CONTAINER_NAME,
             REFRESH_MODE,
             REFRESH_METHOD,
             LAST_REFRESH_TYPE,
             STALENESS,
             ROUND((LAST_REFRESH_END_TIME-LAST_REFRESH_DATE)*24*60,2) as REFRESH_TIME_MINS
       FROM ALL_MVIEWS
       WHERE LAST_REFRESH_TYPE IN ('FAST','COMPLETE')
      )
ORDER BY REFRESH_TIME_MINS DESC;

OWNER   MVIEW_NAME                      CONTAINER_NAME                  REFRESH_MODE  REFRESH_METHOD LAST_REFRESH_TYPE STALENESS      REFRESH_TIME_MINS
------- ------------------------------- ------------------------------- ------------- -------------- ----------------- --------------------------------
SCORE   MV$SCORE_ST_SI_MESSAGE_HISTORY   MV$SCORE_ST_SI_MESSAGE_HISTORY DEMAND        FAST           FAST              FRESH          32.52
SCORE   MV$SCORE_ST_SI_MESSAGE           MV$SCORE_ST_SI_MESSAGE         DEMAND        FAST           FAST              FRESH          16.38
SCORE   MV$SC1_MYHIST2_STOP              MV$SC1_MYHIST2_STOP            DEMAND        FORCE          COMPLETE          NEEDS_COMPILE  .03
SCORE   MV$SC1_MYHIST2_START             MV$SC1_MYHIST2_START           DEMAND        FORCE          COMPLETE          NEEDS_COMPILE  .03
SCORE   MV$SC1_RWQ_FG_TOPO               MV$SC1_RWQ_FG_TOPO             DEMAND        FORCE          COMPLETE          NEEDS_COMPILE  .02

Almost all of the refresh time comes from two mviews: MV$SCORE_ST_SI_MESSAGE_HISTORY and MV$SCORE_ST_SI_MESSAGE.

Thanks to the columns ALL_MVIEWS.LAST_REFRESH_DATE and ALL_MVIEWS.LAST_REFRESH_END_TIME, we got the SQL statements and the execution plans related to the refresh operation:

The first operation is a “Delete“:

The second operation is an “Insert“:
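
(The statement screenshots are not reproduced here. As a hedged sketch of how such statements can be identified — it assumes the Diagnostics Pack is licensed, and the timestamps are hypothetical placeholders for the refresh window found above:)

-- list the SQL captured by ASH/AWR during the refresh window
SELECT sql_id, sql_text
FROM   dba_hist_sqltext
WHERE  sql_id IN (
         SELECT sql_id
         FROM   dba_hist_active_sess_history
         WHERE  sample_time BETWEEN TIMESTAMP '2020-05-10 02:00:00'
                                AND TIMESTAMP '2020-05-10 02:50:00'
       );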

Let’s extract the PL/SQL procedure doing the refresh used by the ETL tool :

dbms_mview.refresh('SCORE.'||l_mview||'','?',atomic_refresh=>FALSE);
--'?' = Force : If possible, a fast refresh is attempted, otherwise a complete refresh.

Given that, here are the questions which come to mind:

  1. My materialized view can be fast-refreshed, so why does it take more than 48 mins to refresh?
  2. With atomic_refresh set to false, Oracle normally optimizes the refresh by using parallel DML and truncate DDL, so why is a “Delete” operation done instead of a faster “Truncate”?

To answer the first point, and to be sure that my materialized view can be fast refreshed, we can use the explain_mview procedure and check the capability called “REFRESH_FAST”:

SQL> truncate table mv_capabilities_table;

Table truncated.

SQL> exec dbms_mview.explain_mview('MV$SCORE_ST_SI_MESSAGE_HISTORY');

PL/SQL procedure successfully completed.

SQL> select capability_name,possible,related_text,msgtxt from mv_capabilities_table;

CAPABILITY_NAME                POSSIBLE             RELATED_TEXT            MSGTXT
------------------------------ -------------------- ----------------------- ----------------------------------------------------------------------------
PCT                            N
REFRESH_COMPLETE               Y
REFRESH_FAST                   Y
REWRITE                        Y
PCT_TABLE                      N                    ST_SI_MESSAGE_HISTORY_H relation is not a partitioned table
PCT_TABLE                      N                    ST_SI_MESSAGE_HISTORY_V relation is not a partitioned table
PCT_TABLE                      N                    DWH_CODE                relation is not a partitioned table
REFRESH_FAST_AFTER_INSERT      Y
REFRESH_FAST_AFTER_ONETAB_DML  Y
REFRESH_FAST_AFTER_ANY_DML     Y
REFRESH_FAST_PCT               N											PCT is not possible on any of the detail tables in the materialized view
REWRITE_FULL_TEXT_MATCH        Y
REWRITE_PARTIAL_TEXT_MATCH     Y
REWRITE_GENERAL                Y
REWRITE_PCT                    N											general rewrite is not possible or PCT is not possible on any of the detail tables
PCT_TABLE_REWRITE              N                    ST_SI_MESSAGE_HISTORY_H relation is not a partitioned table
PCT_TABLE_REWRITE              N                    ST_SI_MESSAGE_HISTORY_V relation is not a partitioned table
PCT_TABLE_REWRITE              N                    DWH_CODE				relation is not a partitioned table

18 rows selected.

Let’s try to force a complete refresh with atomic_refresh set to FALSE in order to check if the “Delete” operation is replaced by a “Truncate” operation:

Now we have a “Truncate“:

--c = complete refresh
 dbms_mview.refresh('SCORE.'||l_mview||'','C',atomic_refresh=>FALSE);

Plus an “Insert” :

Let’s check now the refresh time :

SELECT * 
FROM ( SELECT OWNER, 
			  MVIEW_NAME, 
			  CONTAINER_NAME, 
			  REFRESH_MODE, 
			  LAST_REFRESH_TYPE, 
			  STALENESS, 
			  round((LAST_REFRESH_END_TIME-LAST_REFRESH_DATE)*24*60,2) as REFRESH_TIME_MINS 
	   FROM ALL_MVIEWS 
	   WHERE LAST_REFRESH_TYPE IN ('FAST','COMPLETE')
	 ) 
ORDER BY REFRESH_TIME_MINS DESC;

OWNER   MVIEW_NAME                       CONTAINER_NAME                   REFRESH_MODE LAST_REFRESH_TYPE STALENESS           REFRESH_TIME_MINS
------- -------------------------------- -------------------------------- ------------ ----------------- -------------------------------------
SCORE   MV$SCORE_ST_SI_MESSAGE           MV$SCORE_ST_SI_MESSAGE           FAST         COMPLETE 	 FRESH                            6.75
SCORE   MV$SCORE_ST_SI_MESSAGE_HISTORY   MV$SCORE_ST_SI_MESSAGE_HISTORY   FAST         COMPLETE 	 FRESH                               1

 

Conclusion (for my environment) :

  • The “Complete” refresh (7.75 mins) is much faster than the “Fast” refresh (48.9 mins).
  • The parameter “atomic_refresh=FALSE” works only with a “complete” refresh, so “truncate” is only possible with “complete”.
  • It’s not a surprise to have “Complete” faster than “Fast” since the materialized views are truncated instead of being deleted.

Now, I want to understand why the “Fast” refresh takes so long (48.9 mins).

In order to be fast refreshed, a materialized view requires materialized view logs storing the modifications propagated from the base tables to the container table (a regular table with the same name as the materialized view, which stores the result set returned by the query).
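
As a reminder, such a log is created on each master table; a minimal sketch (the table name and column list are placeholders, not the real definitions from this case):

CREATE MATERIALIZED VIEW LOG ON score.some_master_table
  WITH ROWID, SEQUENCE (col1, col2)
  INCLUDING NEW VALUES;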

Let’s check the base tables used in the SQL statement loading the materialized view:

Focus on the table names after the “FROM” clause:

  • ST_SI_MESSAGE_HISTORY_H
  • ST_SI_MESSAGE_HISTORY_V
  • DWH_CODE

Let’s check the number of rows in each source table:

SQL> SELECT 'DWH_CODE' as TABLE_NAME,NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME = 'DWH_CODE'
  2  UNION ALL
  3  SELECT 'ST_SI_MESSAGE_HISTORY_H' as TABLE_NAME,NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME = 'ST_SI_MESSAGE_HISTORY_H'
  4  UNION ALL
  5  SELECT 'ST_SI_MESSAGE_HISTORY_V' as TABLE_NAME,NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME = 'ST_SI_MESSAGE_HISTORY_V';

TABLE_NAME                NUM_ROWS
----------------------- ----------
DWH_CODE                         1
ST_SI_MESSAGE_HISTORY_H    4801733
ST_SI_MESSAGE_HISTORY_V    5081578

 

To be fast refreshed, the MV$SCORE_ST_SI_MESSAGE_HISTORY materialized view requires materialized view logs on the ST_SI_MESSAGE_HISTORY_H, ST_SI_MESSAGE_HISTORY_V and DWH_CODE tables:

SQL> SELECT LOG_OWNER,MASTER,LOG_TABLE
  2  FROM all_mview_logs
  3  WHERE MASTER IN ('DWH_CODE','ST_SI_MESSAGE_H','ST_SI_MESSAGE_V');

LOG_OWNER   MASTER                 LOG_TABLE
----------- ------------------ ----------------------------
SCORE       ST_SI_MESSAGE_V        MLOG$_ST_SI_MESSAGE_V
SCORE       ST_SI_MESSAGE_H        MLOG$_ST_SI_MESSAGE_H
SCORE       DWH_CODE               MLOG$_DWH_CODE

 

As the materialized view logs only contain the modifications made to the base tables, let’s check their contents (the number of rows modified in the base tables) just before executing the fast refresh:

SQL> SELECT
  2              owner,
  3              mview_name,
  4              container_name,
  5              refresh_mode,
  6              last_refresh_type,
  7              staleness
  8          FROM
  9              all_mviews
 10          WHERE
 11              last_refresh_type IN (
 12                  'FAST',
 13                  'COMPLETE'
 14              )
 15  ;

OWNER   MVIEW_NAME                     CONTAINER_NAME                 REFRESH_MODE LAST_REFRESH_TYPE STALENESS
------- ------------------------------ ------------------------------ ------------ ----------------- -------------
SCORE   MV$SCORE_ST_SI_MESSAGE         MV$SCORE_ST_SI_MESSAGE         DEMAND       COMPLETE          NEEDS_COMPILE
SCORE   MV$SCORE_ST_SI_MESSAGE_HISTORY MV$SCORE_ST_SI_MESSAGE_HISTORY DEMAND       COMPLETE          NEEDS_COMPILE

 

STALENESS = NEEDS_COMPILE means that the materialized view needs to be refreshed because its base tables have been modified. This is expected, since we stopped the ETL process just before executing the refresh procedure in order to look at the content of the materialized view logs.

The contents of the materialized view logs are:

SQL> SELECT * FROM "SCORE"."MLOG$_DWH_CODE";

M_ROW$$               SNAPTIME$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$  XID$$
--------------------- --------- --------- --------- ---------------- ----------------
AAAbvUAAUAAABtjAAA    01-JAN-00 U         U         02               1125921382632021
AAAbvUAAUAAABtjAAA    01-JAN-00 U         N         02               1125921382632021

SQL> SELECT * FROM "SCORE"."MLOG$_ST_SI_MESSAGE_V";

no rows selected

SQL> SELECT * FROM "SCORE"."MLOG$_ST_SI_MESSAGE_H";

no rows selected

SQL>

I’m a little bit surprised because:

  • Given the refresh time, I expected a lot of modifications coming from the big tables ST_SI_MESSAGE_V (5081578 rows) and ST_SI_MESSAGE_H (4801733 rows), not just from DWH_CODE (1 row).

After analyzing the ETL process, it appears that only this table (DWH_CODE) is modified every day, with the sysdate. It is a metadata table which contains only one row identifying the loading date.

If we check the SQL statement loading the materialized view, this table is used to populate the column DWH_PIT_DATE (see the print screen above).

But since this table is joined with ST_SI_MESSAGE_H and ST_SI_MESSAGE_V, the Oracle optimizer must do a full scan of the materialized view MV$SCORE_ST_SI_MESSAGE_HISTORY (more than 500K rows) to populate each row with exactly the same value:

SQL> select distinct dwh_pit_date from score.mv$score_st_si_message_history;

DWH_PIT_DATE
------------
09-MAY-20

It makes no sense to have a column that always holds the same value: we clearly have a materialized view design problem here. Whatever refresh mode is used (“Complete” or “Fast”), all the rows of the materialized view are scanned to populate the column DWH_PIT_DATE.

To solve this issue, let’s check the materialized view dependencies:

SQL> SELECT DISTINCT NAME
  2  FROM ALL_DEPENDENCIES
  3  WHERE TYPE = 'VIEW'
  4  AND REFERENCED_OWNER = 'DWH_LOAD'
  5  AND REFERENCED_NAME IN ('MV$SCORE_ST_SI_MESSAGE','MV$SCORE_ST_SI_MESSAGE_HISTORY')
  6  ORDER BY NAME;

NAME
--------------------------------------------------------------------------------
V$LOAD_CC_DMSG
V$LOAD_CC_FMSG
V$LOAD_CC_MSG_ACK
V$LOAD_CC_MSG_BOOKEDIN_BY
V$LOAD_CC_MSG_PUSHEDBACK
V$LOAD_CC_MSG_SCENARIO
V$LOAD_CC_MSG_SOURCE
V$LOAD_CC_MSG_SRC
V$LOAD_CC_MSG_TRIAGED_BY

9 rows selected.

SQL>

In my environment, only these objects (Oracle views) use the materialized views, so I can safely remove the column DWH_CODE.DWH_PIT_DATE (the column, not the join with the table DWH_CODE) from the materialized views and move it to the dependent objects, as sketched below.
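A minimal sketch of the idea on one of the dependent views listed above (the column list and the join are assumptions, not the real view definition):

-- before: DWH_PIT_DATE was a column of the materialized view
-- after : the dependent view reads it directly from DWH_CODE at query time
CREATE OR REPLACE VIEW dwh_load.v$load_cc_dmsg AS
SELECT m.*,
       c.dwh_pit_date
FROM   score.mv$score_st_si_message_history m
CROSS JOIN score.dwh_code c;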

After this design modification, let’s execute the refresh again and check the refresh time:

SQL> SELECT *
  2  FROM ( SELECT OWNER,
  3    MVIEW_NAME,
  4    CONTAINER_NAME,
  5    REFRESH_MODE,
  6    REFRESH_METHOD,
  7    LAST_REFRESH_TYPE,
  8    STALENESS,
  9    ROUND((LAST_REFRESH_END_TIME-LAST_REFRESH_DATE)*24*60,2) as REFRESH_TIME_MINS
 10     FROM ALL_MVIEWS
 11     WHERE LAST_REFRESH_TYPE IN ('FAST','COMPLETE')
 12   )
 13  ORDER BY REFRESH_TIME_MINS DESC;

OWNER     MVIEW_NAME                            CONTAINER_NAME                       REFRES REFRESH_ LAST_REF STALENESS           REFRESH_TIME_MINS
--------- --------------------------------- ------------------------------------ ------ -------- -------- ------------------- ---------------------
SCORE     MV$SCORE_ST_SI_MESSAGE                MV$SCORE_ST_SI_MESSAGE               DEMAND FAST     COMPLETE FRESH                            1.58
SCORE     MV$SCORE_ST_SI_MESSAGE_HISTORY        MV$SCORE_ST_SI_MESSAGE_HISTORY       DEMAND FAST     COMPLETE FRESH                             .28

 

The refresh time (1.86 mins in total) is now faster than the previous one (7.75 mins) and the Oracle optimizer no longer has to full scan the materialized view to populate each row with the same value (DWH_CODE.DWH_PIT_DATE).

Conclusion:

  • We have reduced the refresh time from about 50 mins to 1.86 mins.
  • A Fast refresh is not always faster than a Complete refresh; it depends on the SQL statement loading the view and on the number of rows propagated from the base tables to the container tables through the materialized view logs.
  • To decrease the refresh time, acting only on the refresh options (Fast, Complete, Index, etc.) is not enough; we also have to analyze and modify the SQL statement loading the materialized view.
  • If you have a design problem, never be afraid to modify the SQL statement and even some parts of your architecture (like the dependent objects here). Of course, you have to know very well the impact on your application and on your ETL process.

The article “Oracle Materialized View Refresh : Fast or Complete ?” appeared first on Blog dbi services.

APEX Connect 2020 – Day 2

Sun, 2020-05-10 05:16

For the second and last virtual conference day, I decided to attend presentations on following topics:
– Universal Theme new features
– Oracle APEX Source Code Management and Release Lifecycle
– Why Google Hates My APEX App
– We ain’t got no time! Pragmatic testing with utPLSQL
– Why APEX developers should know FLASHBACK
– ORDS – Behind the scenes … and more!
and the day ended with a keynote from Kellyn Pot’Vin-Gorman about “Becoming – A Technical Leadership Story”

Universal Theme new features

What is the Universal Theme (UT)?
It is the standard user interface of APEX, integrated since APEX version 5.0 and also known as Theme 42 (“Answer to the Ultimate Question of Life, the Universe, and Everything” – The Hitchhiker’s Guide to the Galaxy by Douglas Adams).
New features introduced with UT:
– Template options
– Font APEX
– Full modal dialog page
– Responsive design
– Mobile support
APEX 20.1, released in April, comes with a new version 1.5 of the UT. With that new version, several related components such as the jQuery libraries, Oracle JET and Font APEX have changed as well, so check the release notes.
One of the most relevant new features is the Mega Menu, which introduces a new navigation style that is useful if you need to maximize the display area of your application pages. You can check the UT sample app embedded in APEX to test it.
Some other changes are:
– Theme Roller enhancement
– Application Builder Redwood UI
– Interactive Grid with Control Break editing
– Friendly URL
Note also that Font Awesome is no longer natively supported (since APEX 19.2) so consider moving to Font APEX.
You can find more about UT online with the dedicated page.

Oracle APEX Source Code Management and Release Lifecycle

Source code management with APEX is always a challenging question for developers used to working with other programming languages and source code version control systems like GitHub.
There are different aspects to be considered like:
– 1 central instance for all developers or 1 local instance for each developer?
– Export full application or export pages individually?
– How to best automate application exports?
There are no universal answers to these questions; it depends on the size of the development team and the size of the project.
There are different tools provided by APEX to manage export of the applications:
– ApexExport java classes
– Page UI
– APEX_EXPORT package
– SQLcl
But you need to be careful about workspace and application IDs when you run multiple instances.
Don’t forget that merge changes are not supported in APEX!
You should have a look at the Oracle APEX life cycle management white paper for further insight.
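As an illustration of the APEX_EXPORT option mentioned above, here is a minimal sketch (the application ID 100 is hypothetical; GET_APPLICATION returns a collection of file name/content pairs that you would then write to disk and commit to version control):

DECLARE
  l_files apex_t_export_files;
BEGIN
  -- export the application definition as one or more files held in a collection
  l_files := apex_export.get_application(p_application_id => 100);
  FOR i IN 1 .. l_files.count LOOP
    -- print the file name and size; a real script would write the CLOB content to disk
    dbms_output.put_line(l_files(i).name || ': ' ||
                         dbms_lob.getlength(l_files(i).contents) || ' characters');
  END LOOP;
END;
/

From the command line, SQLcl offers a similar shortcut with its apex export command.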

Why Google Hates My APEX App

When publishing a public web site, Google provides different tools to help you get more out of it, based on:
– Statistics
– Promotion
– Search
– Ads (to get money back)
When checking Google Analytics for the statistics of an APEX application, you realize that the outcome doesn’t really reflect the content of the APEX application, especially in terms of pages. This is mainly due to the way APEX passes page parameters in the f?p= procedure call. That call is quite different from standard URLs, where parameters are separated by “&” (which the Google tools are looking for) and not by “:”.
Let’s hope this is going to improve with the new Friendly URL feature introduced in APEX 20.1.

We ain’t got no time! Pragmatic testing with utPLSQL

Unit testing should be considered right from the beginning while developing new PL/SQL packages.
utPLSQL is an open source PL/SQL package that can help to unit test your code. Tests created as part of the development process deliver value during implementation.
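As a small illustration, a utPLSQL v3 test package could look like this (the tested package SCORE_PKG and its COMPUTE_SCORE function are purely hypothetical):

CREATE OR REPLACE PACKAGE test_score_pkg AS
  --%suite(score_pkg unit tests)

  --%test(compute_score returns 0 for a NULL input)
  PROCEDURE null_input_returns_zero;
END test_score_pkg;
/
CREATE OR REPLACE PACKAGE BODY test_score_pkg AS
  PROCEDURE null_input_returns_zero IS
  BEGIN
    -- ut.expect/to_equal come from the utPLSQL framework
    ut.expect(score_pkg.compute_score(NULL)).to_equal(0);
  END null_input_returns_zero;
END test_score_pkg;
/
-- run the suite with: exec ut.run('test_score_pkg');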
What are the criteria of choice for test automation?
– Risk
– Value
– Cost efficiency
– Change probability
Unit testing can be integrated into test automation which is of great value in the validation of your application.
If you want to know more about test automation you can visit the page of Angie Jones.

Why APEX developers should know FLASHBACK

For most people Flashback is an emergency procedure, but it’s much more in fact!
APEX developers know about Flashback thanks to the “restore as of” functionality on pages in the App Builder.
Flashback is provided at different levels:

  1. Flashback query: allows to restore data associated to a specific query based on the SCN. This can be useful for unit testing.
  2. Flashback session: allows to flashback all queries of the session. By default up to 900 seconds in the past (undo retention parameter).
  3. Flashback transaction: allows to rollback committed transaction thanks to transaction ID (XID) with dbms_transaction.undo_transaction
  4. Flashback drop: allows to recover dropped objects thanks to the user recycle bin. Dropped objects are kept as long as their space is not needed again (advice: keep 20% of free space). BEWARE: this does not work for truncated objects.
  5. Flashback table: allows to recover a table to a given point in time. Only applicable for data and cannot help in case of DDL or drop.
  6. Flashback database: allows to restore the database to a given point in time based on restore points. This is only for DBAs. This can be useful to rollback an APEX application deployment as a lot of objects are changed. As it works with pluggable databases it can be used to produce copies to be distributed to individual XE instances for multiple developers.
  7. Data archive: allows to recover based on audit history. It’s secure and efficient and can be imported from existing application audits. It’s now FREE (unless using compression option).

The different flashback options can be used to roll back mistakes, but not only: they can also be used for unit testing or for reproducing issues. Nevertheless, you should always be careful when using commands like DROP and even more so TRUNCATE.
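As a tiny sketch of a flashback query (the table name MY_TABLE is hypothetical):

-- read the data as it was 15 minutes ago, e.g. to compare it with the current state
SELECT *
FROM   my_table AS OF TIMESTAMP systimestamp - INTERVAL '15' MINUTE;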

ORDS – Behind the scenes … and more!

ORDS provides multiple functionalities:
– RESTful services for the DB
– Web Listener for APEX
– Web Client for the DB
– DB Management REST API
– Mongo style API for the DB
Regarding APEX Web Listener, EPG and mod_plsql are deprecated so ORDS is the only option for the future.
ORDS integrates into different architectures allowing to provide isolation like:
– APEX application isolation
– REST isolation
– LB whitelists
With APEX there are 2 options to use RESTful services:
– Auto REST
– ORDS RESTful services
Developers can choose the best-suited one according to their needs.
The most powerful feature is REST enabled SQL.

Becoming – A Technical Leadership Story

Being a leader is defined by different streams:
– Influencing others
– Leadership satisfaction
– Technical leadership
and more…
A couple of thoughts to keep in mind:
– Leaders are not always managers.
– Mentors are really important because they talk to you, whereas sponsors talk about you.
– Communication is more than speaking.
But what is most important from my point of view is caring about others. How about you?

Thanks to the virtualization of the conference, all the presentations have been recorded, so stay tuned on DOAG and you will be able to watch them and much more! Take some time and watch as much as possible, because everything is precious learning. Thanks a lot to the community.
Keep sharing and enjoy APEX!

The article “APEX Connect 2020 – Day 2” appeared first on Blog dbi services.

Handle DB-Links after Cloning an Oracle Database

Sat, 2020-05-09 18:45
By Clemens Bleile

After cloning e.g. a production database into a database for development or testing purposes, the DBA has to make sure that no activities in the cloned database have an impact on data in other production databases, because after cloning, jobs may still try to modify production data through e.g. db-links. I.e. scheduled database jobs must not start in the cloned DB and applications connecting to the cloned database must not modify remote production data. Most people are aware of this issue and a first measure is to start the cloned database with the DB-parameter

job_queue_processes=0
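If the clone is already up, the same setting can also be applied at runtime; a minimal sketch:

-- use scope=memory instead if the instance was started with a pfile
alter system set job_queue_processes=0 scope=both;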

That ensures that no database job will start in the cloned database. However, before enabling scheduler jobs again, you have to make sure that no remote production data is modified. Remote data is usually accessed through db-links. So the second step is to handle the db-links in the cloned DB.

In a recent project we decided to be strict and drop all database links in the cloned database.
REMARK: Testers and/or developers should recreate the needed db-links later, pointing to non-production data.
But how to do that, given that private DB-Links can only be dropped by their owner? Even a connection with SYSDBA rights cannot drop private database links:

sys@orcl@orcl> connect / as sysdba
Connected.
sys@orcl@orcl> select db_link from dba_db_links where owner='CBLEILE';
 
DB_LINK
--------------------------------
CBLEILE_DB1
PDB1
 
sys@orcl@orcl> drop database link cbleile.cbleile_db1;
drop database link cbleile.cbleile_db1
                   *
ERROR at line 1:
ORA-02024: database link not found
 
sys@orcl@orcl> alter session set current_schema=cbleile;
 
Session altered.
 
sys@orcl@orcl> drop database link cbleile_db1;
drop database link cbleile_db1
*
ERROR at line 1:
ORA-01031: insufficient privileges

We’ll see later on how to drop the db-links. Before doing that we make a backup of the db-links. That can be achieved with expdp:

Backup of db-links with expdp:

1.) create a directory to store the dump-file:

create directory prod_db_links as '<directory-path>';

2.) create the param-file expdp_db_links.param with the following content:

full=y
INCLUDE=DB_LINK:"IN(SELECT db_link FROM dba_db_links)"

3.) expdp all DB-Links

expdp dumpfile=prod_db_links.dmp logfile=prod_db_links.log directory=prod_db_links parfile=expdp_db_links.param
Username: <user with DATAPUMP_EXP_FULL_DATABASE right>

REMARK: Private db-links owned by SYS are not exported by the command above. But SYS must not own user-objects anyway.

In case the DB-Links have to be restored you can do the following:

impdp dumpfile=prod_db_links.dmp logfile=prod_db_links_imp.log directory=prod_db_links
Username: <user with DATAPUMP_IMP_FULL_DATABASE right>

You may also create a script prod_db_links.sql with all ddl (passwords are not visible in the created script):

impdp dumpfile=prod_db_links.dmp directory=prod_db_links sqlfile=prod_db_links.sql
Username: <user with DATAPUMP_IMP_FULL_DATABASE right>

Finally drop the directory again:

drop directory prod_db_links;

Now that we have a backup, we can drop all db-links. As mentioned earlier, private db-links cannot be dropped directly, even by SYS, but you can use the following method to drop them:

As procedures run with definer rights by default, we can create a procedure under the owner of the db-link which drops the db-link. SYS has the privileges to execute that procedure. The following example drops the db-link cbleile.cbleile_db1:

select db_link from dba_db_links where owner='CBLEILE';
 
DB_LINK
--------------------------------
CBLEILE_DB1
PDB1

create or replace procedure CBLEILE.drop_DB_LINK as begin
execute immediate 'drop database link CBLEILE_DB1';
end;
/
 
exec CBLEILE.drop_DB_LINK;
 
select db_link from dba_db_links where owner='CBLEILE';
 
DB_LINK
--------------------------------
PDB1

I.e. the db-link CBLEILE_DB1 has been dropped.
REMARK: Using a proxy-user would also be a possibility to connect as the owner of the db-link, but that cannot be automated in a script that easily.
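For reference, a minimal sketch of the proxy-user alternative (assuming SYSTEM is used as the proxy user):

-- allow SYSTEM to connect as CBLEILE without knowing CBLEILE's password
ALTER USER cbleile GRANT CONNECT THROUGH system;
-- in SQL*Plus: connect system[cbleile]/<system-password>
-- then, connected as CBLEILE:
DROP DATABASE LINK cbleile_db1;
-- clean up the proxy grant afterwards
ALTER USER cbleile REVOKE CONNECT THROUGH system;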

As we have a method to drop private db-links we can go ahead and automate creating the drop db-link commands with the following sql-script drop_all_db_links.sql:

set lines 200 pages 999 trimspool on heading off feed off verify off
set serveroutput on size unlimited
column dt new_val X
select to_char(sysdate,'yyyymmdd_hh24miss') dt from dual;
spool drop_db_links_&&X..sql
select 'set echo on feed on verify on heading on' from dual;
select 'spool drop_db_links_&&X..log' from dual;
select 'select count(*) from dba_objects where status='''||'INVALID'||''''||';' from dual;
REM Generate all commands to drop public db-links
select 'drop public database link '||db_link||';' from dba_db_links where owner='PUBLIC';
REM Generate all commands to drop db-links owned by SYS (except SYS_HUB, which is oracle maintained)
select 'drop database link '||db_link||';' from dba_db_links where owner='SYS' and db_link not like 'SYS_HUB%';
PROMPT
REM Generate create procedure commands to drop private db-link, generate the execute and the drop of it.
declare
   current_owner varchar2(32);
begin
   for o in (select distinct owner from dba_db_links where owner not in ('PUBLIC','SYS')) loop
      dbms_output.put_line('create or replace procedure '||o.owner||'.drop_DB_LINK as begin');
      for i in (select db_link from dba_db_links where owner=o.owner) loop
         dbms_output.put_line('execute immediate '''||'drop database link '||i.db_link||''''||';');
      end loop;
      dbms_output.put_line('end;');
      dbms_output.put_line('/');
      dbms_output.put_line('exec '||o.owner||'.drop_DB_LINK;');
      dbms_output.put_line('drop procedure '||o.owner||'.drop_DB_LINK;');
      dbms_output.put_line('-- Seperator -- ');
   end loop;
end;
/
select 'select count(*) from dba_objects where status='''||'INVALID'||''''||';' from dual;
select 'set echo off' from dual;
select 'spool off' from dual;
spool off
 
PROMPT
PROMPT A script drop_db_links_&&X..sql has been created. Check it and then run it to drop all DB-Links.
PROMPT

Running above script generates a sql-script drop_db_links_<yyyymmdd_hh24miss>.sql, which contains all drop db-link commands.

sys@orcl@orcl> @drop_all_db_links
...
A script drop_db_links_20200509_234906.sql has been created. Check it and then run it to drop all DB-Links.
 
sys@orcl@orcl> !cat drop_db_links_20200509_234906.sql
 
set echo on feed on verify on heading on
 
spool drop_db_links_20200509_234906.log
 
select count(*) from dba_objects where status='INVALID';
 
drop public database link DB1;
drop public database link PDB2;
 
create or replace procedure CBLEILE.drop_DB_LINK as begin
execute immediate 'drop database link CBLEILE_DB1';
execute immediate 'drop database link PDB1';
end;
/
exec CBLEILE.drop_DB_LINK;
drop procedure CBLEILE.drop_DB_LINK;
-- Seperator --
create or replace procedure CBLEILE1.drop_DB_LINK as begin
execute immediate 'drop database link PDB3';
end;
/
exec CBLEILE1.drop_DB_LINK;
drop procedure CBLEILE1.drop_DB_LINK;
-- Seperator --
 
select count(*) from dba_objects where status='INVALID';
 
set echo off
 
spool off
 
sys@orcl@orcl>

After checking the file drop_db_links_20200509_234906.sql I can run it:

sys@orcl@orcl> @drop_db_links_20200509_234906.sql
sys@orcl@orcl> 
sys@orcl@orcl> spool drop_db_links_20200509_234906.log
sys@orcl@orcl> 
sys@orcl@orcl> select count(*) from dba_objects where status='INVALID';
 
  COUNT(*)
----------
   1
 
1 row selected.
 
sys@orcl@orcl> 
sys@orcl@orcl> drop public database link DB1;
 
Database link dropped.
 
sys@orcl@orcl> drop public database link PDB2;
 
Database link dropped.
 
sys@orcl@orcl> 
sys@orcl@orcl> create or replace procedure CBLEILE.drop_DB_LINK as begin
  2  execute immediate 'drop database link CBLEILE_DB1';
  3  execute immediate 'drop database link PDB1';
  4  end;
  5  /
 
Procedure created.
 
sys@orcl@orcl> exec CBLEILE.drop_DB_LINK;
 
PL/SQL procedure successfully completed.
 
sys@orcl@orcl> drop procedure CBLEILE.drop_DB_LINK;
 
Procedure dropped.
 
sys@orcl@orcl> -- Seperator --
sys@orcl@orcl> create or replace procedure CBLEILE1.drop_DB_LINK as begin
  2  execute immediate 'drop database link PDB3';
  3  end;
  4  /
 
Procedure created.
 
sys@orcl@orcl> exec CBLEILE1.drop_DB_LINK;
 
PL/SQL procedure successfully completed.
 
sys@orcl@orcl> drop procedure CBLEILE1.drop_DB_LINK;
 
Procedure dropped.
 
sys@orcl@orcl> -- Seperator --
sys@orcl@orcl> 
sys@orcl@orcl> select count(*) from dba_objects where status='INVALID';
 
  COUNT(*)
----------
   1
 
1 row selected.
 
sys@orcl@orcl> 
sys@orcl@orcl> set echo off
sys@orcl@orcl> 
sys@orcl@orcl> select owner, db_link from dba_db_links;

OWNER				 DB_LINK
-------------------------------- --------------------------------
SYS				 SYS_HUB

1 row selected.

A log-file drop_db_links_20200509_234906.log has been produced as well.

After dropping all db-links you may do the following checks as well before releasing the cloned database for the testers or the developers:

  • disable all jobs owned by non-Oracle-maintained users. You may use the following SQL to generate the commands in sqlplus:

select 'exec dbms_scheduler.disable('||''''||owner||'.'||job_name||''''||');' from dba_scheduler_jobs where enabled='TRUE' and owner not in (select username from dba_users where oracle_maintained='Y');
  • check all directories in the DB and make sure the directory-paths do not point to shared production folders

column owner format a32
column directory_name format a32
column directory_path format a64
select owner, directory_name, directory_path from dba_directories order by 1;
  • mask sensitive data which should not be visible to testers and/or developers (a minimal sketch follows below).
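A purely hypothetical masking sketch (table and column names are invented; real masking is usually done with dedicated tooling or at least a reviewed script):

-- overwrite e-mail addresses and names in the cloned schema
update app_owner.customers
   set email     = 'customer_'||customer_id||'@example.com',
       last_name = 'MASKED_'||customer_id;
commit;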

At that point you can be quite sure that the cloned database will not affect production data, so you can set
job_queue_processes>0
again and give the testers and/or developers access to the cloned database.

The article “Handle DB-Links after Cloning an Oracle Database” appeared first on Blog dbi services.
