Thursday 2 July 2020

RAC - Interview Questions

0. What are Oracle RAC Public IP, Private IP & Virtual IP (VIP)?

Public IP: The public IP address is for the server. This is the same as any server IP address: a unique address which exists in /etc/hosts.
Private IP: Oracle RAC requires "private IP" addresses to manage the CRS, the clusterware heartbeat process, and the cache fusion layer.
Virtual IP: Oracle uses a Virtual IP (VIP) for database access. The VIP must be on the same subnet as the public IP address. The VIP is used for RAC failover (TAF).


In earlier releases of RAC, we had one Public IP and one Private IP per node.

Public IP is for the rest of the world. For this we normally use a stateful protocol, for example TCP/IP (every packet is acknowledged).
Private communication from node to node should be very fast, so in general we use a stateless protocol, for example UDP.

UDP is like SEND & FORGET.

In tnsnames.ora:

Node1 / Host1 Public IP
Node2 / Host2 Public IP.

Basically, these host or node names are resolved by the clients.

Whenever a client needs to establish a connection, it looks for the first resolvable IP in the network file and establishes a connection.
The traditional setup before 10gR2 used the TCP/IP protocol to connect the client to the node.

TCP/IP is basically a stateful protocol, so a certain number of retries or a timeout is required before the client recognizes that a node has gone down.
Suppose the timeout we set is 600s; the client keeps trying to establish a connection. After 600 seconds the client will get an error, something like the one below.

Cannot resolve the hostname, or TNS lost contact.

Only then will the client fail over to the next available node.

==========

This is unacceptable for certain kinds of high-transaction environments.

So Oracle came up with a better solution and introduced a concept called the VIP.

A Virtual IP (VIP) is nothing but another IP that runs on the same interface (eth0) as your Public IP.

This VIP is available on each node individually. Your listener is aware of both the Public IP and the VIP.

It listens on the Public IP and the VIP. In case of a failover, the VIP of Node 1 is shifted to Node 2.

The catch is that as soon as the VIP shifts to Node 2, its MAC address changes, because the MAC address of the hosting network interface is appended to each VIP.

When you start the VIP on eth0 of Node 1, the MAC address of Node 1's eth0 is appended to VIP1.
When VIP1 switches to Node 2, the MAC address of VIP1 becomes the MAC address of the interface card of Node 2, which means a new MAC address is in effect.

This changed MAC address immediately causes Node 2 to issue what is called a gratuitous ARP (ARP being the Address Resolution Protocol).
This means Node 2 immediately broadcasts the new MAC address to all connected clients.

When the connected clients get the notification that the VIP has a new MAC address, they immediately mark all connected sessions as invalid.

They immediately reconnect to the VIP, which is now on the other node, but the listener running on the other node will not listen on VIP1;
it listens only on its own Public IP and VIP2. Once the connection to VIP1 is refused by the listener running on Node 2, the client
reads the second VIP address from tnsnames.ora and connects to VIP2.

THIS WHOLE PROCESS COMPLETES IN ABOUT 20 SECONDS.
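
As an illustration, here is a minimal client-side tnsnames.ora sketch that lists both VIPs with connect-time failover and TAF enabled (the host names node1-vip/node2-vip and the service name orcl are hypothetical):

ORCL =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = ON)
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = orcl)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 3))
    )
  )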

*******************************************************************************************************************************************************
1. Remote Listener Vs Local Listener:
====================================

Let's explain with an example...

Suppose we have a 2-node cluster: host1 and host2, with VIP addresses host1-vip and host2-vip respectively,
and one RAC database (orcl) running on this cluster; instance 1 (orcl1) on host1, and instance 2 (orcl2) on host2.

We have listener_host1 running on host1, and listener_host2 running on host2.

listener_host1 is considered the local listener for the orcl1 instance, while listener_host2 is considered a remote listener for that same orcl1 instance (because that listener is not running on the same machine as the database instance).

Similarly, listener_host2 is considered the local listener for the orcl2 instance, and a remote listener for orcl1.

To make this a real configuration, we configure the two parameters local_listener and remote_listener for both instances as below:

orcl1.local_listener=(address of listener_host1)
orcl1.remote_listener=(addresses of both listener_host1 and listener_host2)

orcl2.local_listener=(address of listener_host2)
orcl2.remote_listener=(addresses of both listener_host1 and listener_host2)

(As you see, we can simply use both listeners for the remote listener, as a simple configuration. But of course you could have configured orcl1.remote_listener=(address of listener_host2) only.)
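
For concreteness, a minimal spfile sketch of these settings, assuming the VIP host names and default port from the example above:

orcl1.local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=host1-vip)(PORT=1521))'
orcl2.local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=host2-vip)(PORT=1521))'
*.remote_listener='(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=host1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=host2-vip)(PORT=1521)))'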

With such a configuration, both listeners in the cluster know about both instances and about both hosts (statistics about host load and instance load), and can decide to forward a client connection request to the other node if it is less loaded. This is the mechanism behind server-side load balancing.

Clients are generally configured with a tnsnames entry containing both VIP addresses of the two hosts (i.e. they can connect to either listener). So if a client attempts the connection to the database with the first IP (which is listener_host1), and host1 happens to be more loaded than host2, listener_host1 knows there is another instance, orcl2, running on host2 that is less loaded. In that case, listener_host1 sends a redirect packet to the client asking it to transparently reconnect to listener_host2 to establish the database connection.

Without such a remote listener configuration, each listener knows only about its local instance and can do nothing but connect the client to the instance running on the same host as the listener. In that case you have only what's called client-side load balancing.

************************************************************************************************************************************************************

2. WHAT IS SCAN LISTENER AND HOW IT WORKS IN ORACLE RAC 11GR2 ?

https://rajeevjhaoracle.wordpress.com/2013/11/15/what-is-scan-listener-and-how-it-works-in-oracle-rac-11gr2/


Terminologies:

Let us understand the components related to SCAN individually and then relate them together to understand SCAN completely.

SCAN VIP:
==========

SCAN VIP is one of the resources you find in the output of the "crsctl status resource -t" command. The number of SCAN VIPs you notice will be the same as the number of SCAN LISTENERS in the setup.

SCAN VIPs are the physical IP addresses that you allocate to the SCAN listeners. In the example that I use later in this blog, 192.168.122.5, 192.168.122.6, and 192.168.122.7 are SCAN VIPs. If you see that the SCAN VIPs are online in the output of "crsctl status resource -t", then the IP addresses are online on the physical network ports. Only when the SCAN VIPs are online can we start the SCAN listeners.


SCAN LISTENER:
==============

SCAN Listener is the Oracle component which runs a service on a port (by default 1521) using the SCAN VIP (IP address). So a SCAN listener does not start if its SCAN VIP is not online. This is the major difference between a SCAN listener and a SCAN VIP. The number of SCAN listeners you notice in the output will be the same as the number of SCAN VIPs that are ONLINE. The name given to the SCAN LISTENER is referred to as the SCAN NAME, and it is registered in the DNS server. In our example, which you will find next, the SCAN name is "SCAN_LISTENER".

So in short, a SCAN LISTENER and a SCAN VIP are always related to each other.


SCAN components in a cluster:
SCAN Name
SCAN IPs (3)
SCAN Listeners (3)
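
You can inspect these components with srvctl, for example:

$ srvctl config scan
$ srvctl config scan_listener
$ srvctl status scan_listener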


Node VIP:
=========

Node VIP is the IP address which has the capability of relocating to another node when the current node goes down. The drawback of using only node VIPs is that the TNS entry of the application server must have all node VIP addresses listed. You can verify that the IP address assigned to the node VIP is part of the LOCAL listener. Test it yourself: run "lsnrctl status" on every node and you will find that the local listener has two IP addresses - 1) the node IP and 2) the node VIP.

The challenge of having multiple VIP addresses in the TNS entry is addressed in 11gR2 by introducing SCAN.
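
With SCAN, the client's tnsnames entry needs only the single SCAN name instead of every node VIP; a minimal sketch (the SCAN name rac-scan.example.com and the service name orcl are hypothetical):

ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )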

For more information visit: https://www.dbatrainings.com/


3. What are the Voting Disk and OCR?
************************************************************************************************************************************

Voting Disk: The voting disk is a shared disk that is accessed by all the member nodes in the cluster. The
voting disk is used as a central reference for all nodes and keeps the heartbeat information between nodes.
If any node is unable to ping the voting disk, the cluster immediately recognizes the communication
failure and evicts the node from the cluster. The voting disk is sometimes called a "quorum device".

OCR ( Oracle Cluster Registry) :-

OCR is shared storage and it is accessed by all nodes in the cluster. CRS (Oracle Cluster Ready Services)
uses the cluster registry to keep configuration information. OCR is the central repository for CRS and
keeps the details of the services and the status of the resources.

How do you back up the Voting disk & OCR?

In releases before 11gR2, the voting disk backup is done using the dd command:
dd if=/dev/raw/vote01 of=/backup/vote01.bak

OCR Backup :
Oracle automatically takes a backup every 4 hours on the master node. You can also take a backup using the ocrconfig
export utility.
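
For example (the backup path is illustrative; run as root):

# ocrconfig -manualbackup
# ocrconfig -export /backup/ocr_export.dmp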


Part 2:


OCR, Voting and OLR files
=========================

Overview:


Oracle Clusterware includes two important components that manage configuration and node membership: Oracle Cluster Registry (OCR), which also includes the local component Oracle Local Registry (OLR), and voting disks.
·       OCR manages Oracle Clusterware and Oracle RAC database configuration information
·       OLR resides on every node in the cluster and manages Oracle Clusterware configuration information for each particular node
·       Voting disks manage information about node membership. Each voting disk must be accessible by all nodes in the cluster for nodes to be members of the cluster
You can store OCR and voting disks on Oracle Automatic Storage Management (Oracle ASM), or a certified cluster file system.



Oracle Universal Installer for Oracle Clusterware 11g release 2 (11.2) does not support the use of raw or block devices. However, if you upgrade from a previous Oracle Clusterware release, then you can continue to use raw or block devices. Oracle recommends that you use Oracle ASM to store OCR and voting disks. Oracle also recommends that you configure multiple voting disks during Oracle Clusterware installation to improve availability.


OCR:


OCR contains information about all Oracle resources in the cluster.
Oracle recommends that you configure:
·       At least three OCR locations, if OCR is configured on non-mirrored or non-redundant storage. Oracle strongly recommends that you mirror OCR if the underlying storage is not RAID. Mirroring can help prevent OCR from becoming a single point of failure.
·       At least two OCR locations if OCR is configured on an Oracle ASM disk group. You should configure OCR in two independent disk groups. Typically this is the work area and the recovery area.

*************************************************************************************************************************************

4. Backing Up Oracle Cluster Registry:

Automatic backups: Oracle Clusterware automatically creates OCR backups every four hours, at the end of each day, and at the end of each week. Oracle Database always retains the last three backup copies of OCR. You cannot customize the backup frequencies or the number of files that Oracle Database retains. These backups are performed by the CRSD process.

Manual backups: Use the following command on a node to force Oracle Clusterware to perform a backup of OCR at any time:

ocrconfig -manualbackup

 The -manualbackup option is especially useful when you want to obtain a binary backup on demand, such as before you make changes to OCR.

NOTE: The OCRCONFIG executable is located in the $GRID_HOME/bin directory.

OCRCONFIG utility:

Use the following command to display the OCR backup files.

ocrconfig -showbackup

To list manually and automatically taken OCR backups separately, use the manual/auto flag as follows:

ocrconfig -showbackup manual
ocrconfig -showbackup auto

The default location for generating backups on Linux or UNIX systems is $GRID_HOME/cdata/cluster_name, where cluster_name is the name of your cluster.

The OCRCONFIG utility creates a log file in $GRID_HOME/log/host_name/client

OCRCHECK Utility


The OCRCHECK utility displays the version of the OCR's block format, total space available and used space, OCRID, and the OCR locations that you have configured. OCRCHECK performs a block-by-block checksum operation for all of the blocks in all of the OCRs that you have configured. It also returns an individual status for each file and a result for the overall OCR integrity check.

You can only use OCRCHECK when the Oracle Cluster Ready Services stack is ONLINE on all nodes in the cluster.

# ocrcheck

==============================================

Voting Disk:

Voting disks manage information about node membership. Each voting disk must be accessible by all nodes in the cluster for nodes to be members of the cluster.

Storing Voting Disks on Oracle ASM

Oracle ASM manages voting disks differently from other files that it stores. If you choose to store your voting disks in Oracle ASM, then Oracle ASM stores all the voting disks for the cluster in the disk group you choose.
Once you configure voting disks on Oracle ASM, you can only make changes to the voting disks' configuration using the crsctl replace votedisk command. This is true even in cases where there are no working voting disks.

Backing Up Voting Disks

In Oracle Clusterware 11g release 2 (11.2), you no longer have to back up the voting disk. The voting disk data is automatically backed up in OCR as part of any configuration change and is automatically restored to any voting disk added.

Restoring Voting Disks

Run the following command as root from only one node to start the Oracle Clusterware stack in exclusive mode, which does not require voting files to be present or usable:

# crsctl start crs -excl

Run the crsctl query css votedisk command to retrieve the list of voting files currently defined:

crsctl query css votedisk

This list may be empty if all voting disks were corrupted, or may have entries that are marked as status 3 or OFF.
If the voting disks are stored in Oracle ASM, then run the following command to migrate the voting disks to the Oracle ASM disk group you specify:

crsctl replace votedisk +asm_disk_group

If you did not store voting disks in Oracle ASM, then run the following command using the File Universal Identifier (FUID) obtained in the previous step:

$ crsctl delete css votedisk FUID

Add a voting disk, as follows:

$ crsctl add css votedisk path_to_voting_disk

Stop the Oracle Clusterware stack as root:

# crsctl stop crs

Restart the Oracle Clusterware stack in normal mode as root:

# crsctl start crs
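
Once the stack is back in normal mode, a quick sanity check of the voting disk configuration:

$ crsctl query css votedisk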

======================================

OLR:

In Oracle Clusterware 11g release 2 (11.2), each node in a cluster has a local registry for node-specific resources, called an Oracle Local Registry (OLR), that is installed and configured when Oracle Clusterware installs OCR. It contains manageability information about Oracle Clusterware, including dependencies between various services. Oracle High Availability Services uses this information. OLR is located on local storage on each node in a cluster.

Its default location is the path $GRID_HOME/cdata/host_name.olr.
To check the OLR status on each node, use the following command:

# ocrcheck -local

If you need to see the contents of the OLR:

# ocrdump -local -stdout

To back up OLR manually:

# ocrconfig -local -manualbackup

To see the contents of the backup OLR file:

ocrdump -local -backupfile olr_backup_file_name

To change the backup location:

ocrconfig -local -backuploc new_olr_backup_path

To restore OLR, follow these steps:

# crsctl stop crs

# ocrconfig -local -restore file_name

# ocrcheck -local

# crsctl start crs

$ cluvfy comp olr

************************************************************************************************************************************************************

5. Oracle RAC: Difference between CRSCTL and SRVCTL

srvctl
=======
Use SRVCTL to manage Oracle-supplied resources such as:
Listeners
Instances
Disk groups
Networks
srvctl is used to manage the elements of the cluster such as databases, instances, listeners, services, etc.

srvctl was introduced with Oracle 9i.

If the resource name begins with "ora", use SRVCTL.

Oracle suggests that DBAs use the srvctl command to start the databases.

SRVCTL manages Oracle resource-related operations:
Starting and stopping database instances and services
From 11gR2 onward, it also manages cluster resources such as networks, VIPs, disks, etc.

We can use SRVCTL to manage configuration information. Use SRVCTL commands to add, remove, start, stop, modify, enable, and disable a number of entities, such as databases, instances, listeners, SCAN listeners, services, Grid Naming Service (GNS), and Oracle ASM.
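
A few illustrative srvctl commands (the database name orcl and instance name orcl1 are hypothetical):

$ srvctl status database -d orcl
$ srvctl start instance -d orcl -i orcl1
$ srvctl config listener
$ srvctl status scan_listener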


crsctl
======
Use CRSCTL for managing Oracle Clusterware and its resources.

crsctl is used to manage the elements of the clusterware such as CRS, CSS, and EVM.


crsctl was introduced with Oracle 10g.

CRSCTL manages Oracle Clusterware related operations like:
Starting and stopping Oracle Clusterware
Enabling and disabling Oracle Clusterware daemons
Checking the health of the cluster
Registering cluster resources
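
A few illustrative crsctl commands (run the start/stop commands as root):

$ crsctl check cluster -all
$ crsctl stat res -t
# crsctl stop crs
# crsctl start crs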


OCRCHECK:
========

 -- Displays health of OCR (Oracle Cluster Registry) / Oracle Local Registry (OLR).

How do I identify the OCR (or OLR) file location?
# ocrcheck


***************************************************************************************************************

6. What is a Thread in RAC?

A thread is a set of redo log files. Each instance mounts one thread of redo logs when it starts up. When using RAC, each instance must have its own thread of redo logs.

To map an instance to a different thread, use the THREAD initialization parameter (for example: THREAD = 2).
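
For illustration, you can list the configured threads and add a redo thread for a second instance as follows (the group numbers and sizes are hypothetical, and the ADD LOGFILE form shown assumes Oracle-managed files/ASM):

SQL> SELECT thread#, status, enabled FROM v$thread;

SQL> ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 4 SIZE 512M, GROUP 5 SIZE 512M;
SQL> ALTER DATABASE ENABLE PUBLIC THREAD 2;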


*****************************************************************************************************************

7. Special init parameters in Oracle RAC

(which are not available in stand alone instance)


instance_number
This Oracle Real Application Clusters (RAC) parameter specifies a unique number that identifies the instance.

instance_groups
This RAC parameter specifies one or more instance groups and assigns the current instance to those groups.

thread
This RAC init parameter specifies the number of the redo thread to be used by an instance.

cluster_database
Oracle Real Application Clusters (RAC) parameter that specifies whether or not Real Application Clusters is enabled.

cluster_database_instances
Oracle Real Application Clusters (RAC) parameter that specifies the number of instances that are configured as part of your cluster database.

cluster_interconnects
This RAC parameter provides Oracle with information about additional cluster interconnects available for use in Real Application Clusters environments.

remote_listener
local_listener
parallel_instance_group
max_commit_propagation_delay

Init parameters in Oracle RAC that must have the SAME VALUE across all instances:
active_instance_count
archive_lag_target
compatible
cluster_database
cluster_database_instances
cluster_interconnects     
control_files
db_block_size
db_domain
db_files
db_name
db_recovery_file_dest
db_recovery_file_dest_size
db_unique_name
dml_locks                     -- when 0
instance_type                -- rdbms or asm
max_commit_propagation_delay
parallel_max_servers
remote_login_password_file
trace_enabled
undo_management

For example,
*.cluster_database=true
*.cluster_database_instances=2
*.compatible='11.2.0.2'
*.undo_management='AUTO'

Init parameters in Oracle RAC that must have UNIQUE VALUES across all instances:
instance_number
instance_name
thread
undo_tablespace/rollback_segments

For example,
INST1.instance_number=1
INST1.instance_name=rac1
INST1.thread=1
INST1.undo_tablespace='UNDOTBS1'

INST2.instance_number=2
INST2.instance_name=rac2
INST2.thread=2
INST2.undo_tablespace='UNDOTBS2'
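
To verify such settings across instances, you can query gv$parameter, for example:

SQL> SELECT inst_id, name, value
       FROM gv$parameter
      WHERE name IN ('instance_number', 'thread', 'undo_tablespace')
      ORDER BY inst_id, name;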
