Sunday, December 14, 2014

vMotion Fails at 14% - Operation Timed Out

While working on a three-host VMware cluster, I was trying to vMotion some servers around so I could begin maintenance and updates.  For the first time I hit the error "Operation Timed Out" and the dreaded red X.

As it turns out, there was a vmx-****.vswp file inside the VM folder that had apparently been left over from a failed DRS migration.  After doing some research I found out that during a DRS migration a VMX swap file is created for both the source and the destination host.  Once the migration completes, one of these is normally removed.  In this case it was not.

If you browse the datastore of the VM that will not move and open the VM's folder, you should see two vmx-****.vswp files.  If you see only one, you have a different issue and this isn't the fix.  If you do see two, their names will end in -1.vswp and -2.vswp.  There may also be a .vswp file for the VM itself.  DON'T DELETE THAT ONE!  I can't stress that enough.

To work out which one to get rid of, look at the timestamps.  If the VM is running it may be hard to tell which is which, so power the VM off before you begin.  Once it's off, whichever vmx-****.vswp file remains in the folder is the one you need to delete.

You can try deleting the file using the datastore browser, but if that doesn't work, here's how you can remove it from the command line.

1. Start the SSH daemon on the host where the VM is registered, then log in via SSH.  I recommend PuTTY.  You can download it HERE along with many other great tools for network and systems maintenance.
2. # cd /vmfs/volumes/{datastore name}
3. # cd {VM folder name}
4. # ls -lash *.vswp (this will show all the files and timestamps just to verify)
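5. # rm vmx-{VM name}-[1-2].vswp (the [1-2] pattern matches either suffix, so run this only with the VM powered off, when just the leftover file remains; otherwise type the exact file name)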

That's all there is to it.  Remember to stop the SSH daemon on the host when you're done.  Power the VM back on and you should be able to vMotion it again while it is running.
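Before running the rm in step 5, it can help to confirm exactly which vmx-*.vswp files are sitting in the VM folder.  The following is only a minimal sketch, not a VMware tool: list_vmx_swap and count_vmx_swap are hypothetical helper names of my own, and it assumes a POSIX-style shell where ls supports -t and -r.  It lists files and never deletes anything.

```shell
# Hypothetical helpers (not part of ESXi): report only, never delete.

# List the vmx-*.vswp files in a VM folder, oldest first.
list_vmx_swap() {
    # -t sorts by modification time, -r reverses so the oldest comes first.
    # Errors (for example, no matching files) are hidden.
    ls -tr "$1"/vmx-*.vswp 2>/dev/null
}

# Count the vmx-*.vswp files in a VM folder.
count_vmx_swap() {
    n=0
    for f in "$1"/vmx-*.vswp; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example (the path parts in braces are placeholders, as in the steps above):
# count_vmx_swap "/vmfs/volumes/{datastore name}/{VM folder name}"
# list_vmx_swap  "/vmfs/volumes/{datastore name}/{VM folder name}"
```

With the VM powered off, a count of 1 means the one file listed is the leftover.  The VM's own .vswp file is never matched because its name does not start with vmx-.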

Saturday, December 13, 2014

How Long Is Left on Database (IN RECOVERY)

While working on a SQL Server 2008 R2 database connection error for a medical company, I found the database showing (IN RECOVERY).  Naturally, no connections could be made while recovery was running.  What I really needed to know was how much longer the process would take.  I found this great script HERE.  For my own purposes I decided to list it here as well.

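-- Name of the database that is showing (IN RECOVERY)
DECLARE @DBName VARCHAR(64) = 'databasename'

-- LogDate is DATETIME so that ORDER BY sorts chronologically rather than as text
DECLARE @ErrorLog AS TABLE([LogDate] DATETIME, [ProcessInfo] VARCHAR(64), [TEXT] VARCHAR(MAX))

-- Read the current SQL Server error log (0 = current file, 1 = SQL Server log),
-- keeping only lines that contain both search strings
INSERT INTO @ErrorLog
EXEC sys.xp_readerrorlog 0, 1, 'Recovery of database', @DBName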

SELECT TOP 5
[LogDate]
,SUBSTRING([TEXT], CHARINDEX(') is ', [TEXT]) + 4,CHARINDEX(' complete (', [TEXT]) - CHARINDEX(') is ', [TEXT]) - 4) AS PercentComplete
,CAST(SUBSTRING([TEXT], CHARINDEX('approximately', [TEXT]) + 13,CHARINDEX(' seconds remain', [TEXT]) - CHARINDEX('approximately', [TEXT]) - 13) AS FLOAT)/60.0 AS MinutesRemaining
,CAST(SUBSTRING([TEXT], CHARINDEX('approximately', [TEXT]) + 13,CHARINDEX(' seconds remain', [TEXT]) - CHARINDEX('approximately', [TEXT]) - 13) AS FLOAT)/60.0/60.0 AS HoursRemaining
,[TEXT]

FROM @ErrorLog ORDER BY [LogDate] DESC

Update for SQL Server 2012.  The call is a bit different, using sp_readerrorlog instead of xp_readerrorlog:

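-- Name of the database that is showing (IN RECOVERY)
DECLARE @DBName VARCHAR(64) = 'databasename'

-- LogDate is DATETIME so that ORDER BY sorts chronologically rather than as text
DECLARE @ErrorLog AS TABLE([LogDate] DATETIME, [ProcessInfo] VARCHAR(64), [TEXT] VARCHAR(MAX))

-- Same arguments as before: current log file, SQL Server log, two search strings
INSERT INTO @ErrorLog
EXEC master..sp_readerrorlog 0, 1, 'Recovery of database', @DBName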

SELECT TOP 5
[LogDate]
,SUBSTRING([TEXT], CHARINDEX(') is ', [TEXT]) + 4,CHARINDEX(' complete (', [TEXT]) - CHARINDEX(') is ', [TEXT]) - 4) AS PercentComplete
,CAST(SUBSTRING([TEXT], CHARINDEX('approximately', [TEXT]) + 13,CHARINDEX(' seconds remain', [TEXT]) - CHARINDEX('approximately', [TEXT]) - 13) AS FLOAT)/60.0 AS MinutesRemaining
,CAST(SUBSTRING([TEXT], CHARINDEX('approximately', [TEXT]) + 13,CHARINDEX(' seconds remain', [TEXT]) - CHARINDEX('approximately', [TEXT]) - 13) AS FLOAT)/60.0/60.0 AS HoursRemaining
,[TEXT]

FROM @ErrorLog ORDER BY [LogDate] DESC

Just copy and paste this script in, change the database name, and you're good to go.

EDIT:  One thing to note is that the latest entry may show something over 85% even though, if you check the database, recovery has already completed.  I suspect this is because the progress messages stop being written once the recovery is done.
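To see what the SUBSTRING/CHARINDEX arithmetic in the script is pulling out, the same extraction can be sketched in shell against a single line.  This is only an illustration: the sample line below is made up (the numbers are not from a real log) and simply follows the shape the script searches for, and the variable names are my own.

```shell
# Made-up sample line in the shape the script expects; the values are illustrative only.
msg="Recovery of database 'databasename' (5) is 42% complete (approximately 900 seconds remain)."

# Text between ") is " and " complete (" is the percent complete.
pct=$(printf '%s\n' "$msg" | sed -n 's/.*) is \(.*\) complete (.*/\1/p')

# Digits between "approximately " and " seconds remain" are the seconds left.
secs=$(printf '%s\n' "$msg" | sed -n 's/.*approximately \([0-9][0-9]*\) seconds remain.*/\1/p')

# Shell arithmetic is integer-only; the T-SQL version divides as FLOAT instead.
mins=$((secs / 60))

echo "$pct complete, about $mins minutes remaining"
```

For this sample line the two extractions give 42% and 900 seconds, which works out to 15 minutes.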