April 16, 2010

Backup pieces missing from Oracle control file

I recently needed to duplicate a database to point in time more than a week earlier.  I had all of the files from the backupset available on disk.  When I ran a crosscheck, I could see they were available, but when I ran a list, the backup was only a few MB in size, rather  than a few GB, and there was only one file associated.  There were no data files listed, and when I tried to run the restore, I got the following message:

RMAN-01005: not all datafiles have backups than can be recovered to scn 9878899591331;

This was extremely frustrating, because I could see all of the files.  Fortunately, an Oracle support rep told me about the ‘catalog backuppiece’ command, which registers missing backup data in the control file.

rman target /
catalog backuppiece ‘/path/to/backup/file’;

After that I ran a list and the information for the backup was correct in size, and the datafiles were once again associated with it.

It took me a week to get there, but I was able to recover everything.  While it was a very educational week, it was rather frustrating for the people who were waiting for the database to be ready.

Point in time duplication

Have you ever needed to “get back in time”, only to discover that the Delorean wouldn’t start?  I recently had a customer that needed a schema restored from a day where we didn’t take an export.  We use RMAN to provide disaster recovery, but our SLA doesn’t require us to provide daily “I shot myself in the foot” insurance. To make matters worse, they waited over a week to ask for the restore.

The system in question uses RMAN to disk, so I restored the files from the time frame and attempted to duplicate the database to that point in time, only to get an error that there were no backups on record for that SCN.  Huh?  I could see the backups when I ran “list backup”, and when I did the crosscheck, I could see they were available.  Further inspection showed that the backup was in the catalog, but the pieces were not.  I’ll cover that in a separate post, for search purposes.  Once you have everything you need, this is the syntax that will get you a point in time clone:

  1. Create a pfile for the clone
    create pfile ‘initclone.ora’ from spfile
    If the clone is on the same host, be sure to edit it to include the path to the control, log and data files
    Add the following lines to force filename conversion during the restore
    db_file_name_convert=(/path/to/source/,/path/to/clone/)
    log_file_name_convert=(/path/to/source/,/path/to/clone/)
  2. Create the bdump, udump and cdump directories
  3. Add the clone to oratab
  4. Create a password file for the clone.  You will need one for the source database, if you don’t already have one.
    orapwd file=$ORACLE_HOME/dbs/orapwclone password=syspassword
  5. Perform a ’startup nomount’ of the clone
  6. Making sure that the clone is the current ORACLE_SID, run
    rman target sys@<source> nocatalog auxiliary /
    duplicate target database to ‘<clone>’ until time ‘<timestamp>’;

You may need to recover files from tape if all of the backup pieces aren’t on disk.  If they are on disk, but rman doesn’t recognize them, see my next post.

Once you’ve got your clone open, you can export the missing schema objects and save the day.

February 19, 2009

How the 11g DBMS install can bring down RAC…

This is only security related in that the underlying issue is permission, and for a time I thought I had created the problem by changing the permissions to secure the environment.

I upgraded my RAC environments to 11g late last year, and I have found very little about 11g that I do not love.  There is one small thing which is extremely unlovable, but once you know what is wrong, it is easy to fix.

Our process for moving to 11g was intended to be slow and methodical.  We upgraded the clusterware, and then waited a week before installing the 11g DBMS in preparation for the database upgrade.  The first sign of trouble came when I was preparing to shut things down for the database upgrade.  I did a srvctl stop nodeapps and got a VIP error on the remote node.  Alas, I was a Diva with a mission (as in DEADLINE), so I continued with my cold backup, and 11g database build.  I still didn’t make the connection when the 11g database would not start on the remote node.  Eventually my outage window, and my patience expired and I reverted to the cold backup of the 10g database.  Imagine my surprise when it wouldn’t run on the remote node, either.

I started browsing the crsd logs when the following messages caught my eye

… [ CREVT][518] CAAMonitorHandler  0:Could not execute /opt/oracle/product/clusterware/bin/rcgwrap(check) for ora.hostname.vip
category: 1234, operation: scls_canexec, loc: , OS error: 0, other: no exe permission, file /opt/oracle/product/clusterware/bin/racgwrap
… [ CRSAPP][518] CheckResource error for ora.hostname.vip error code = -1

This doesn’t look good.  I checked the permissions for racgwrap and they were set at 750.  I check the other node, and they were 751.  The other node works and the message says “other no exe permission”.  Let’s try setting the permission to 751.

This improves the situation but does not restore full functionality.  I ultimately end up doing an ls -l of the clusterware $ORACLE_HOME/bin directory on each node and find that there are 115 files with 751 permission on the functional node, and only 8 on the problem node. Still thinking that I’ve been overzealous in my effort to secure binaries, I restore the permissions and make plans to upgrade the database another evening.

When a peer had the very same problem, and I knew he hasn’t been making permission changes, I started to wonder if we have a defective root.sh.  I had another RAC pair to upgrade.  The clusterware was done, but the DBMS was not.  Throughout the DBMS install process I ran

ls -lc | grep -c — -rwxr-x–x

on the remote node.  During the “Remote operations” part of the install, the count dropped from 115 to 8.  I then ran

find . -ctime -1 -exec chmod 751 {} \;

This left me with 116 binaries, so I ran the command

ls -lc | grep -v — -rwx-rx–x

on each node and discovered that onsctl should be set to 711.

I was feeling pretty smug at this point.  This never works for me. Without fail I end up discovering I’ve got dog poop on my shoe.  In this case, the dog poop was racgwrap, which is a symbolic link that points to  $ORACLE_HOME/racg/admin/racgwrap.  Once this was set back to 751, my humility was restored and so was the functionality of the remote node.

I have since been told that this problem is documented somewhere in Metalink, but I have not been able to locate it.

While removing world execute from binaries is considered a good security practice, you really don’t want to apply it to your clusterware binaries!

October 7, 2008

Client permissions in 11g

I’m happy to report that I only had to make on adjustment to the 11g client in order to make it available to my customers.  The 751 permissions on sqlplus continue to result in

map : Permission Denied

errors by users who aren’t in the DBA group, but while this was mostly an annoyance in 10g, in 11g it resulted in a fault and a core dump.  Setting the permission to 755 cleared the problem.  Thank goodness I wrote that post back in May.  When I Googled the error, I got my own post!

June 16, 2008

Oracle 10g Client Permissions… Again!

Just when I thought I had the client permission issue put to bed, one of my customers announces that he is getting errors when trying to run oerr from the client machine.

oerr: Unknown facility ‘ora’

The oerr command depends on $ORACLE_HOME/lib/facility.lis to work correctly. The $ORACLE_HOME/lib/facility.lis file must be readable by other, and the directory must have a minimum of execute permission. In addition to the facility.lis file, to be truly functional, oerr users must also have read access to the files in $ORACLE_HOME/rdbms/mesg.

So, here is my latest take on the minimum privileges you will need to grant an Oracle 10gR2 client after a new install:

chmod 755 $ORACLE_HOME
chmod 755 $ORACLE_HOME/bin/sqlplus
chmod 755 $ORACLE_HOME/jdbc
chmod 755 $ORACLE_HOME/jdbc/lib
chmod 755 $ORACLE_HOME/ldap
chmod 755 $ORACLE_HOME/ldap/admin
chmod 644 $ORACLE_HOME/ldap/admin/*
chmod 755 $ORACLE_HOME/ldap/lib
chmod 755 $ORACLE_HOME/ldap/lib32
chmod 755 $ORACLE_HOME/ldap/mesg
chmod 644 $ORACLE_HOME/ldap/mesg/*
chmod 755 $ORACLE_HOME/lib
chmod 644 $ORACLE_HOME/lib/facility.lis
chmod 755 $ORACLE_HOME/network
chmod 755 $ORACLE_HOME/network/admin
chmod 644 $ORACLE_HOME/network/admin/sqlnet.ora
chmod 644 $ORACLE_HOME/network/admin/ldap.ora
chmod 755 $ORACLE_HOME/nls
chmod 755 $ORACLE_HOME/nls/data
chmod 644 $ORACLE_HOME/nls/data/lx1boot.nlb
chmod 755 $ORACLE_HOME/oracore
chmod 755 $ORACLE_HOME/oracore/zoneinfo
chmod 644 $ORACLE_HOME/oracore/zoneinfo/timezlrg.dat
chmod 755 $ORACLE_HOME/rdbms
chmod 755 $ORACLE_HOME/rdbms/mesg
chmod 644 $ORACLE_HOME/rdbms/mesg/*
chmod 755 $ORACLE_HOME/sqlplus
chmod 755 $ORACLE_HOME/sqlplus/mesg
chmod 644 $ORACLE_HOME/sqlplus/mesg/*

If you arrived at this post from a search engine, you might want to check for newer posts, as client permissions have been a moving target.

May 30, 2008

changePerm.sh is not your friend

After vanquishing the map error yesterday, we got to the real problem:

sqlplus user@mydb
SQL*Plus: Release 10.2.0.3.0 - Production on Fri May 30 12:53:53 2008
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

ERROR:
Error while trying to retrieve text for error ORA-12154

When I searched for this error in Metalink and Google, I found references to Pro-C, PHP and VB, but nothing related to SQLPLUS. In a moment of weakness, I decided to run changePerm.sh.

Before running it I did an ls -lR $ORACLE_HOME > before

After running it I did an ls -lR $ORACLE_HOME > after

Then I ran diff before after and discovered 750 changes had been made, including opening permissions to OPatch and other directories that really didn’t seem like they should be open to the world. Oracle did give me fair warning when I started the script.

$> ./changePerm.sh $ORACLE_HOME

——————————————————————————-
Disclaimer: The purpose of this script is to relax permissions on some of the
files in the database Oracle Home so that all clients can access them.
Please note that Oracle Corporation recommends using the most restrictive file
permissions as possible for your given implementation. Running this script
should be done only after considering all security ramifications.
——————————————————————————-

But I was feeling the pressure to “fix it”. I then spent hours changing permissions back to their original setting on most of the directories under $ORACLE_HOME, and along the way I discovered that removing access to $ORACLE_HOME/ldap brought the error back. Aha! Here is my revised list of the minimum permission needed for an Oracle client.

chmod 755 /opt/oracle
chmod 755 $ORACLE_HOME
chmod 755 $ORACLE_HOME/jdbc
chmod 755 $ORACLE_HOME/jdbc/lib
chmod 755 $ORACLE_HOME/ldap
chmod 755 $ORACLE_HOME/ldap/mesg
chmod 644 $ORACLE_HOME/ldap/mesg/*
chmod 755 $ORACLE_HOME/network
chmod 755 $ORACLE_HOME/network/admin
chmod 755 $ORACLE_HOME/sqlplus
chmod 755 $ORACLE_HOME/sqlplus/mesg
chmod 644 $ORACLE_HOME/sqlplus/mesg/sp1us.msb
chmod 755 $ORACLE_HOME/nls
chmod 755 $ORACLE_HOME/nls/data
chmod 644 $ORACLE_HOME/nls/data/lx1boot.nlb
chmod 755 $ORACLE_HOME/oracore
chmod 755 $ORACLE_HOME/oracore/zoneinfo
chmod 644 $ORACLE_HOME/oracore/zoneinfo/timezlrg.dat

The really odd thing is that I still have many clients that have the ldap directory locked down, and they are running without problems, even though they are using ldap for name resolution. I don’t get it, but it’s late Friday evening, and I don’t need to get it. I just need to go home!

May 29, 2008

Setting the right level of protection on the Oracle client

The default permissions on the Oracle 10gR2 client after installation make the binaries unavailable to anyone outside the DBA group. What a great idea! I’ve been looking for ways to eliminate end users for years. I’m constantly advocating their termination whenever cost-cutting is required, but so far these suggestions have gone unappreciated. If I can’t eliminate them, I suppose I can’t completely lock them out of the database, either. Too bad.

Since our security policy requires that we always grant the minimum privilege needed to perform the job, I have discovered that the following permissions achieve that minimum:

chmod 755 /opt/oracle
chmod 755 $ORACLE_HOME
chmod 755 $ORACLE_HOME/jdbc
chmod 755 $ORACLE_HOME/jdbc/lib
chmod 755 $ORACLE_HOME/sqlplus
chmod 755 $ORACLE_HOME/sqlplus/mesg
chmod 755 $ORACLE_HOME/sqlplus/mesg/sp1us.msb
chmod 755 $ORACLE_HOME/nls
chmod 755 $ORACLE_HOME/nls/data
chmod 755 $ORACLE_HOME/nls/data/lx1boot.nlb
chmod 755 $ORACLE_HOME/oracore
chmod 755 $ORACLE_HOME/oracore/zoneinfo
chmod 755 $ORACLE_HOME/oracore/zoneinfo/timezlrg.dat

This has been working rather well for months, but recently we’ve had complaints of the following error when running SQLPLUS from the client machine (if users weren’t such complainers, I might not be so anxious to have them removed from the equation)

$ sqlplus
1220822 : map : Permission Denied
1220822 : map : Permission Denied
1220822 : map : Permission Denied

SQL*Plus: Release 10.2.0.3.0 - Production on Thu May 29 11:50:20 2008

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

Enter user-name: diva@mydb

The errors don’t prevent access, but they are annoying. When I searched Google for the message, I found no hits, which is what inspired me to write this post. If you are reading it, perhaps you also searched Google and found exactly one link.

The reason for the message is that the default permission on the SQLPLUS binary is 751 (rwxr-x–x). Changing the permission to 755 (rwxr-xr-x) makes the messages go away. It doesn’t make the users go away though. In my case eliminating the message just eliminated a red herring, but at least it led to the user telling me what the real problem was, instead of just complaining about the message.

As you probably know, Oracle provides a changePerm.sh script in the install directory, which relaxes permissions on Oracle directories and files. The ORAFAQ site (http://www.orafaq.com/wiki/ChangePerm_sh) suggests that the permissions will be too relaxed and that the script should only be run as a last resort. For now, I’m adding the SQLPLUS binary to my list of relaxed permissions.

chmod 755 $ORACLE_HOME/bin/sqlplus

May 13, 2008

Don’t grant execute on DBMS_SYSTEM!

If you support an Oracle database that is used as a resource manager for BEA WebLogic or IBM’s WebSphere products, chances are someone has come to you with a document that says you need to grant them execute on SYS.DBMS_SYSTEM. When this happens, be a Diva and tell them no. This usually puts customers into shock. It’s fun to see the looks on their faces.

The reason they are asking is because of a feature of the xarecover function. If there is a system crash during a distributed transaction, the xarecover function will attempt to clean up. Part of the clean up is a call to DIST_TXN_SYNC, a procedure that is included in the SYS.DBMS_SYSTEM package. The problem is that there are many procedures in this package, and granting the execute privilege gives access to all of them. Oddly enough, the xarecover function comes from the Oracle JDBC, so if you are looking for someone to blame for this, the culprit is Oracle, not IBM or BEA.

The good news is that Oracle has redeemed themselves by moving the DIST_TXN_SYNC procedure to a new package called DBMS_XA that is available to the public. The even better news is that they have made a patch available so you can install it in 10.2.0.2 or 10.2.0.3. It will be included in 10.2.0.4. Because the call comes from the JDBC libraries on the client machine, you have to patch both the DBMS and the client to make it work. The patch numbers are 5892995 (client) and 5945463 (DBMS).

For more information about XA, read the following white papers. You will find the information quite valuable.

http://www.techworld.com/whitepapers/index.cfm?whitepaperid=3139
http://www.oracle.com/technology/products/database/clustering/pdf/bestpracticesforxaandrac.pdf

Here is a link to the DBMS_XA documentation in Oracle 11.1 from the PSOUG web site

http://www.psoug.org/reference/dbms_xa.html

May 11, 2008

Is your security policy putting you at risk?

Most large organizations have a security policy, which is a good thing. Blind adherence to the policy is not. Typically, security policies are written with the end user in mind, and more often than not they are written for servers, and then applied everywhere. Policies that work well for end users logging into a server could be problematic for accounts used for automated authentication to a database.

The first problem that comes to mind is password expiration. Having a 90 day password expiration policy for end user accounts is a very good idea. It is even a good idea for end user accounts within your database. It may not be a good idea for accounts that are used to authenticate your web applications. These values are often cached, and an outage is often required to change the password and clear the old one from cache. Can your application sustain an outage every 90 days? More importantly, if you have set PASSWORD_LIFE_TIME to 90 days for the account profile, what will happen to your application when day 90 arrives? If you aren’t proactive about changing the password before the expiration data, chances are the application will fail, and you will be limited to logging in with SQLPLUS. If you are going to use PASSWORD_LIFE_TIME for these types of accounts (which I don’t necessarily recommend), set it for a high value like 366, and make changing the database password part of every software release.

The next problem that comes to mind is account locking after failed attempts. The default value of this setting used to be UNLIMITED, but as of 10gR2 the default value is 10. As I said in the profile tutorial, this is a brilliant strategy for end user accounts, but a terrible idea for application accounts, as it sets you up for a denial of service attack. The idea that you need to lock an account when it is being attacked is hard for some people to let go of. When I explained the concept to my security team, they kept suggesting that we raise the number of attempts. How about 50 or 100? The problem is that it is very easy to set up an automated attack to launch that many failed logins. It can happen in a matter of seconds. Even if you only lock the account for a minute or two, the attacker can lock up the account faster than you can type alter user account unlock;

So what do you do? To start with, be sure to use long, strong passwords. I mean really ugly passwords with 12-16 characters, and a good variety of alpha, numbers with a special character or two. If you are passing it from an application, does it really matter if the password is ys6Pklm-3uq0vm?

Next, (and this is just as important) be sure you enable auditing of failed login attempts. If you don’t know how, check out the Audit tutorial. Once you have enabled auditing, actively monitor the DBA_AUDIT_TRAIL. You can do this with an Enterprise Manager or Grid Control alert, but if you do, be sure that you have a strong password (and no account locking) for DBSNMP, or your attacker may be able to disable your effort. If you use Enterprise Manager for most of your monitoring/management activities, by all means set up the alert, but don’t rely on it as your only defense. Schedule a job to run every few minutes and alert you if any account has experienced an excessive number of failed login attempts.

declare
vSender VARCHAR2(30) := ‘oracle@dbserver’;
vRecip varchar(40) := ‘thediva@database-diva.com’;
vSub varchar(40) := ‘Excessive failed login attempts’;
vMType VARCHAR2(30) := ‘text/plain; charset=us-ascii’;
vMsg varchar2(80);
cursor fl_cursor is
select username, count(username) fail_count from dba_audit_trail
where returncode=1017 and timestamp > sysdate - 6/1440
group by username order by count(username);
userinfo fl_cursor%ROWTYPE;
begin
open fl_cursor;
loop
fetch fl_cursor into userinfo;
exit when fl_cursor%NOTFOUND;
if userinfo.fail_count > 19 then
vMsg := userinfo.fail_count||’ failed login attempts by user ‘||userinfo.username;
sys.utl_mail.send(vSender,vRecip,null,null,vSub,vMsg,vMtype,null);
end if;
end loop;
close fl_cursor;
end;

To create the user-defined metric in Enterprise Manager, choose a query with metric type number with SQL output 2 columns. Use the following query:

select username, count(username) from dba_audit_trail
where returncode=1017 and timestamp > sysdate - 1/240
group by username order by count(username) ;

Choose a value for your warning and critical thresholds, and set a schedule.Screen shot of Failed Logins User-Defined Metric in Enterprise Manager

March 7, 2008

Keeping up to date with CPUs

It’s March, and I’ve already installed the January 2008 Critical Patch Update in all of my environments.  This gives me at least 6 weeks of contentment before the April CPU is released.  Normally, I would not have all of the environments patched before the next CPU was released, but timing is everything and an opportunity presented itself.  If you haven’t applied it yet, the January 2008 CPU requires you to install a new opatch version first.   In addition to the normal @catcpu after the binaries have been patched,  you also need to rebuild your views.  If you are running in RAC and looking forward to a rolling upgrade, you will be disappointed.  The catcpu doesn’t require a “startup upgrade”, but the view rebuild does.  Just keep telling the finance manager that RAC will pay for itself the day one of the servers fails ;)

Do you patch your clients?  Whenever a new patch set is released, I apply it to the clients as well as the database servers.  The same is true of Critical Patch Updates. The last CPU to include client patches was the April 2007 CPU.  If you have already applied it, your clients are fully patched.  If not, you should plan to patch the clients while the database is down for the January 2008 CPU.

Clients can be particularly vulnerable because they are often installed by system administrators, developers or end users who probably don’t know that they need to apply patches.

« Previous entries