Occasional DBA

Tuesday, February 10, 2009

Unable to drop table; acquired NOWAIT

Occasionally, you will find that you cannot do something with a particular table, be it an insert, delete, or even a drop. Usually this is caused by a lock on that table. The way around this is to remove the session that has the lock.

In many cases, it may be best to just wait for the lock to go away on its own. This is particularly true if you are working on a busy database during production hours. However, there are times when you cannot wait, or you know that the process is not a user process and can be safely killed. To do this you would use the KILL SESSION syntax of the ALTER SYSTEM command.

What happens if you don"t know the offending session number?

If you have a known locked object, you can utilize the V$LOCKED_OBJECT dba view.

SELECT object_id,
session_id,
oracle_username,
os_user_name,
process,
locked_mode
FROM V$SESSION

If you are lucky, there will only be one object and one session, you can then use the session id and the object_id to get the remaining information. First, we need to verify that we're lookint at the right object

SELECT owner,
object_name,
object_type
FROM dba_objects
WHERE object_id = '%variable%'

(You would, of course, put the number you obtained from the V$LOCKED_OBJECT view in where I have put %variable%) Assuming the query returns the object name and type that you were expecting, we should go ahead and get the serial number of the locking process.

SELECT sid,
serial#
FROM V$SESSION
WHERE sid = '%sessionid%'

Again, replace the variable as appropriate.

Now, we have all the information we need to kill the troublesome session and get on with our day.

ALTER SYSTEM KILL SESSION '%sessionid%,%serial#%' IMMEDIATE

Note that you need BOTH seesionid and the serial# and that they are contained in single quotes and separated by a comma. Also note that IMMEDIATE is optional and is equivalent to a forced command. This command will release the lock on your object, allowing you to finish what you started.

Monday, May 19, 2008

Truncate SQL Logs

I am off today.

Yeah, right.

The webmaster called me in a panic because the Content Management Server was really poking along and causing people to complain of web-site slowness. He determined that we needed, "more RAM, more processors, or maybe a new machine." Which I doubted.

I looked around at the MS SQL Server 2000 that resides behind the CMS and found an 18GB log file slowly growing larger. What to do, what to do?

Truncate the Logs.

When you truncate the transaction logs for a database you are, in essence, telling the database to stop worry about any transaction logs that are older than those that will fit in the allotted space so, there are some important steps that need to be taken before you do this.

First, you must make sure you have a good backup up the db in question. If your last scheduled full backup (and you DO have scheduled backups...right) was relatively recent, you should be fine. In this case, I had one taken on Sunday night. However, if you do not have a good backup, you can quickly run one by either re-running your existing job or just creating a new one. This is important. You MUST alter the RETAINDAYS statement in your existing job to be fewer than the number of days until the next backup. For instance, if you run backups once a week on Sundays, your RETAINDAYS is probably set to 6. If you want to run a job on Wednesday, you would have to alter the RETAINDAYS value to 3, otherwise your backups on Sunday will not run, since the last backup has not yet expired.

If you would rather just run a special job, you can do so rather easily as below:

USE %databasename%;
go
BACKUP DATABASE [%databasename%] TO [%backupdevice%]
WITH FORMAT

%databasename% and %backupdevice% will need to replaced with parameters from your site. The 'use' statement above is a bit redundant, but I always use it in order to be POSITIVE I am dealing with the right DB. Also note that using the WITH FORMAT option will overwrite anything you had previously stored to that device.

Assuming you do not have a backup device built, you can backup to a file directly with:

USE %databasename%;
go
BACKUP DATABASE [%databasename%] TO DISK = 'C:\Path\to\file.bak'
WITH FORMAT

Again, place the appropriate names in the appropriate places.

Once that is done, you will need to actually truncate the logfile. This is done with the code shown below:

USE %databasename%;
BACKUP LOG %databasename% WITH NO_LOG
go;

Rather than using NO_LOG you could use TRUNCATE_ONLY, they are equivalent in this case.

Now that you have gotten rid of the excess transaction logs, you will need to let the database know you have done so and that it can release the space on the disk. Time for some DBCC!

USE %databasename%;
DBCC SHRINKFILE (%logfilename%|%logfileid%);
go;

You may use the fully qualified log filename or you may use the file id. I usually use the file id because it is a smallish number. In my case, it was '2.' You can find the fileid in the sysfiles table in the database you are working with. Like so:

SELECT fileid,
size,
filename
FROM sysfiles

You will now need to make another full backup, as shown above, in case you have some disaster between now and the next backup, you won't need to re-truncate your logs. Also, since most of the transaction logs were just dropped, you need to run a full backup to have a valid retore path anyway.

Thursday, May 15, 2008

Unable to Qualify My Own Domain Name

Today a network engineer called me and said that he needed a file that was supposedly residing on a decommissioned ftp server.

Wonderful.

I spent my morning digging through the pile of yet-to-be-surplussed hardware and found what I think might be the correct machine. Yet another 280R being revived and pressed into service. (Incidentally, 280R's are heavy, do not lift it by yourself as you will probably hurt your back or tear your pants, or both.) Turns out my first pick was the wrong machine. Somebody conveniently removed the machine labels when they went to the surplus pile. My second, equally heavy, pick is the correct machine.

So, I wrangle the beast into place and plug everything in and power it up, I also have a laptop plugged into the RSC card because I know better than to think it will boot happily. Turns out, though, that I'm wrong. It boots up quite nicely, and immediately begins to complain about not being able to qualify it's name:

unable to qualify my own domain name (servername) -- using short name

Now, when a Solaris machine throws that error, you can be sure that nothing to do with the network will function. This error is caused because the machine cannot find its DNS server on the network. That could mean that the DNS server is gone or, more likely, that the network is not configured properly.

Since I know that this machine was in a different building before it was decommissioned, I imagine that the jack it is plugged into is not correctly set up to support the network configuration on the server. To find out what the machine expects, I try an ifconfig -a:

prompt $> ifconfig -a

lo0: flags=2001000849
inet 127.0.0.1 netmask ff000000
eri0: flags=1000843
& inet xxx.xxx.69.99 netmask ffff0000 broadcast xxx.xxx.69.127

Welp, I wasn't root when I ran that command, so the MAC address is not showing. Network engineers ALWAYS want the MAC address so they can trace the machine through the network. So, I'll have to try again as root:

prompt #> ifconfig -a

lo0: flags=2001000849
inet 127.0.0.1 netmask ff000000
eri0: flags=1000843
inet xxx.xxx.69.99 netmask ffff0000 broadcast xxx.xxx.69.255
ether 0:3:ba:xx:xx:xx

OK, we see the IP address and the MAC address. Good. Be aware that the MAC does not display proceeding zeros. Some network engineers will not understand that format. I always relay it like this:

00:03:BA:XX:XX:XX

The other thing I notice is the dreaded interface type. Experience tells me that 'eri0' does not cooperate well with Cisco switches that auto-detect speed settings. So I pass this info along to the engineer who sets up my port correctly and sets the speed to run at 100Mb.

Still doesn't work.

After some arguing with the engineer, I find that the network is set up to use a small CIDR block. That leads me to look at the ifconfig -a output again:

eri0: flags=1000843
inet xxx.xxx.69.99 netmask ffff0000 broadcast xxx.xxx.69.255
ether 0:3:ba:xx:xx:xx

That netmask value, 'ffff0000' looks a lot like a Class B address, so does the broadcast address. I check with the engineer. Sure enough, they are wrong. He tells me the correct netmask and the correct broadcast address. Now I need to set those values. Ifconfig is, again, your friend:

ifconfig eri0 netmask 255.255.255.192 broadcast xxx.xxx.69.127

Now we check to see if it took:

prompt #> ifconfig -a
lo0: flags=2001000849
inet 127.0.0.1 netmask ff000000
eri0: flags=1000843
inet xxx.xxx.69.99 netmask ffffffc0 broadcast xxx.xxx.69.127
ether 0:3:ba:xx:xx:xx

Huzzah! But does the network actually function?

prompt #>nslookup yahoo.com

Server: xxx.xxx.67.1
Address: xxx.xxx.67.1#53

Non-authoritative answer:
Name: yahoo.com
Address: 66.94.234.13
Name: yahoo.com
Address: 216.109.112.135

I call the engineer back and give him the login/password info he needs, and he gets his file! Finally, I can turn the beast off and move it out of my office. It was starting to get hot. As an additional precaution, I take the time to make a copy of the file in question on my PC, so that I won't have to play this game again in 6 months time.

Tuesday, October 16, 2007

Solaris 10 fsck

Today a POS E250 that runs some Sunray kiosks crapped out for no particular reason. I am working from home today, so I tried to SSH into the machine, with no luck. I next tried telnet, also no luck. Ping worked though.

Uh oh...hung system.

I had to get a Windows admin on-site to do a cold boot on the machine, which did not help. I was all set to drive into the office on my flex-day...again...and then I remembered the magic of RSC. E250's come with an RSC (Remote System Control) card and, further, have a neato little utility that shows you a GUI version of the front panel and also allows a command console, even when the server is screwed up or off. Newer versions of RSC are web-based.

Anyway, as I suspected, the idiot server was hung waiting for an fsck to be run on a particular slice. How to handle this? Well, first you will need to enter the root password to get into single-user mode. Then you will need to check the appropriate slice. In this case, the slice having difficulty was listed as:

/dev/dsk/c1t8d0s6

So, now we run fsck on that slice. There are two important points here. First, fsck requires the raw device (if the slice is larger than 2Gb), which means you would run it on:

/dev/rdsk/c1t8d0s6

The second point is the -y flag. If you do not use the -y flag, you will spend the rest of your life pressing 'y' to fix busted inodes. So, the command you want is:

fsck -F ufs -y /dev/rdsk/c1t8d0s6

or

fsck -F ufs -o f,p /dev/rdsk/c1t8d0s6

Both will have the same result. The second method may be slightly safer because the options are based upon the filesystem. In this case, f=force and p=preen (non-interactive mode)

The '-F' is optional and tells what type of filesystem is in use. The default filesystem for the machine is defined in /etc/default/fs and fsck will use that value if you do not provide one. I use the -F out of habit and to be sure I get it right. YMMV.

Once the fsck completes you can either reboot the machine or type exit to continue the boot process. This is especially useful if you have more than one filesystem that is botched, or if you are not sure of the status of the remaining filesystems. I usually use the exit feature to make sure I got everything and then do a graceful reboot to be sure.

-TheDave

Thursday, October 11, 2007

Swap Space

So, today I am building a machine to use as a test box for ensuring that our various databases can survive application upgrades, database upgrades, and OS patches that have been accumulating for the last couple of months. A fairly routine install of Solaris 10 on a 280R went well until it came to the swap space.

I never allow Solaris to automatically lay out my filesystems as having a 600MB / slice and a 70GB /export slice is entirely useless in my environment. So, I went ahead and chopped it up as I saw fit and let the installer work its magic.

The came the time to set up the swap slice. I generally install the swap area at slice 0 and use the beginning cylinders of the disk for the space for the sake of speed. Especially on a database machine, that can be important. So, the first thing I do is go into format and add my swap slice. Then I attempt to add the newly made slice to my running swap config:

# swap -a /dev/rdsk/c0t0d0s0
/dev/rdsk/c0t0d0s0: No such file or directory

oops...I should not be using the raw device. I try again, this time using the actual device path:

# swap -a /dev/dsk/c0t0d0s0
/dev/dsk/c0t0d0s0: No such file or directory

What the hell? OK, time for a sanity check:

# man swap
Reformatting page. Please wait.... done

Before reading a word, I realize that I failed to make a filesystem on the newly created slice.

# newfs -v /dev/rdsk/c0t0d0s0
newfs: /dev/rdsk/c0t0d0s0: No such file or directory

Well, this is NO good. Maybe I forgot to write the label when I formatted the disk? A simple check with format shows that the slice is, indeed, still there. Did I use the raw device when attempting the newfs? Let's try again, just to be sure:

# newfs -v /dev/rdsk/c0t0d0s0
newfs: /dev/rdsk/c0t0d0s0: No such file or directory

Damn. What the hell do I do now?

To make it short, I spend the next 45 minutes reading various man pages and web pages to try to figure out what my problem is. I even had an abortive experiment with mkfs, which is not useful in this situation. Then during my umpteenth view of the output from format I see a small detail that I missed. See if you can catch it:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
0. c1t0d0
1. c1t1d0

Specify disk (enter its number):

Give up?

I tried:

# swap -a /dev/dsk/c0t0d0s0
/dev/dsk/c0t0d0s0: No such file or directory

I was using the wrong damn controller number. The lesson here, ladies and gentlemen, is to PAY ATTENTION to what you are doing so that a 1 minute effort doesn't end up taking a whole hour of your life from you.

Incidentally, if you wanted to make the new swap space permanent across reboots, you would need to add the following line to /etc/vfstab:

/dev/dsk/c1t0d0s0 - - swap - no -

You do not NEED to reboot at this point, but this is a test machine and a new build so I did so anyway.

-TheDave